Microsoft Corporation
January 2005
Applies to:
Microsoft SQL Server 2005
Microsoft Visual Studio 2005
Microsoft .NET Framework 2.0
XML and relational data
Summary:
Three options for handling XML data in the Visual Studio 2005/SQL
Server 2005 environment are discussed, including usage scenarios and
guidelines to help you choose between them. (34 printed pages)
Contents
Introduction to System.Xml, SQLXML, and the XML Data Type
XML Usage Scenarios
XML Classes in the .NET Framework
SQLXML
Server-Side Support for Relational/XML Integration (FOR XML/OPENXML)
XML Data Type in SQL Server 2005
Comparison of Different Approaches
Conclusion
For More Information
Introduction to System.Xml, SQLXML, and the XML Data Type
This
section contains a brief history of the XML support provided in
Microsoft SQL Server 2000 and an overview of the three options provided
in the Microsoft Visual Studio 2005/SQL Server 2005 environment for
manipulating XML and relational data. The options are: 1) the classes
in the System.Xml namespace, 2) the SQLXML classes, and 3) the XML data type provided in SQL Server 2005.
XML support was added to Microsoft SQL Server 2000 to provide the user with features such as:
- Exposing relational data as XML
- Shredding XML documents into row sets
- Creating XML views by mapping XML schemas to database schemas using XML-Data Reduced (XDR) schemas
- Creating queries on XML views using XPath
- Exposing data in SQL Server through HTTP
This support was further enhanced in subsequent SQLXML Web releases. Enhancements include:
- Updategrams and XML Bulkload to persist changes made to XML views
- Support
for annotated XML Schema definition language (XSD) to describe mappings
(support for XDR still exists but its use is not recommended)
- Client-side FOR XML
- SQLXML Managed Classes
- Support for Web services
The
Microsoft .NET Framework 1.0 has extensive support for reading,
writing, and processing XML documents. This support has been further
enhanced in the .NET Framework 2.0 to improve the performance and
usability of various XML classes. New classes provided by the .NET
Framework in the System.Xml namespace can be used to map XML data to
and from relational data.
SQLXML is a set of libraries and
technologies that enables the seamless integration of relational data
in SQL Server databases with XML. It is a middle-tier component and
does not include the server-side XML support provided by FOR XML and
OPENXML. SQLXML provides a schema-driven mapping approach to generating
XML from relational source data and loading XML that represents
relational information back into relational tables. SQLXML classes
provide XML support to SQL Server 2000 databases and above.
Microsoft
SQL Server 2005 adds built-in support for XML in the form of the XML
data type. XML data can be stored natively in XML data type columns.
Additionally, an XML data type column can be further constrained by
associating an XML schema collection with this column. XML values
stored in XML data type columns can be manipulated with the help of
XQuery and an XML Data Modification Language (DML). Indexes can be
built on XML data to improve query performance. Furthermore, FOR XML
and OPENXML have been enhanced to support the new XML data type.
New
features introduced in SQL Server 2005 to store and process XML data
coupled with XML features offered in earlier versions of SQL Server
provide developers with several ways in which XML data can be stored
and processed in XML applications. As there are multiple ways in which
XML applications can be built using the alternative approaches provided
by SQL Server 2005, it is important to understand the scenarios,
trade-offs, and synergies of the different technologies in order to
make the right choice. This article provides guidance in selecting the
appropriate alternative for developing XML applications with SQL Server
2005.
XML Usage Scenarios
The areas in which XML is used can be broadly classified as follows:
- XML for Business Integration: Business integration,
also known as enterprise application integration (EAI), includes A2A
(application-to-application), B2B (business-to-business), and B2C
(business-to-consumer) applications. Applications working on disparate
systems communicate with each other using XML-based messages.
- XML for Content Management:
Content management systems based on XML enable users to store,
retrieve, modify, and query XML documents. These systems store XML
documents in their native format.
Next a few scenarios
that fall under the previously mentioned categories are discussed. The
solutions for these scenarios are presented in subsequent sections with
detailed treatments of the different XML options available in the SQL
Server 2005/Visual Studio 2005 environment.
Scenario 1: Insurance Claims
An
auto insurance company providing services on the Internet allows its
insurance buyers or agents to enter insurance claims through the
company's Web site. These claims will be processed by the centralized
system located at the corporate head quarters. Once the processing is
complete, the system is required to store specific information related
to the claim in a specified XML format. Exact copies of these XML
documents must be maintained in the system for legal purposes. This
scenario shows the use of XML for content management.
Scenario 2: Data Exchange Between Automobile Manufacturer and Parts Suppliers I
An
automobile manufacturer interacts with multiple parts suppliers to
procure the parts required for the company. Currently the manufacturer
receives invoices from suppliers. The data corresponding to these
invoices is then manually fed to a legacy invoice processing system.
The invoice processing system stores the data in relational format. The
manufacturer now wants to automate the invoice data feeding process to
the legacy invoice processing system. This scenario is an example of
the use of XML for business integration.
Scenario 3: Data Exchange Between Automobile Manufacturer and Parts Suppliers II
This
scenario involves an automobile manufacturer that interacts with
multiple parts suppliers as mentioned in the previous scenario. The
manufacturer's present system does not provide the facility for the
suppliers to check the status of an invoice or to get a copy of payment
instructions from the manufacturer. Currently this information is
available to the suppliers only over the phone. The automobile
manufacturer needs to be able to expose this information over the Web
so that the suppliers can perform these tasks automatically. This
scenario demonstrates the use of XML for business integration.
Scenario 4: Content Management System
A
company provides information in the fields of medicine, law, and
technology to its customers through various channels including Web,
books, and CD-ROM. The company wants to build a content management
system to help it deliver high quality content to its customers in less
time. This scenario illustrates the use of XML for content management.
Scenario 5: Customer Survey
A
company provides air ticket booking services on the Internet and
conducts surveys for each season to identify the most sought after
destination by its customers for the current season. The questionnaire
used for each season is different and the questionnaire may change in
the future. The company will analyze the information, and the analysis
results are used to design packaged travel deals that will satisfy the
needs of the maximum number of customers. This scenario can be
classified as the use of XML for content management.
XML Classes in the .NET Framework
The
Microsoft .NET Framework has excellent support for developing XML-based
products. In the .NET Framework, the core classes such as XmlTextReader, XmlTextWriter, XmlDocument, XmlValidatingReader,
etc., are available in the System.Xml namespace, which is the root
namespace for all XML classes. These core classes enable users to read,
write, and validate XML documents using both stream-based and DOM-based
(Document Object Model-based) navigation/access models. The System.Xml
namespace contains the following subnamespaces:
- System.Xml.Schema – Contains classes to deal with XML Schemas definition language (XSD) schemas.
- System.Xml.Serialization – Provides classes for the serialization of objects into XML format documents or streams.
- System.Xml.XPath – Includes classes for navigating through XML documents using XPath expressions.
- System.Xml.Xsl – Has classes for performing Extensible Stylesheet Transformations (XSLT).
Enhancements in the System.Xml Namespace
In Visual Studio 2005, new classes such as XsltCommand and enhancements to existing XML classes like XmlDocument can be used to perform various operations on XML data, including modifying XML documents, applying XSL transformations, etc.
Some of the enhancements in Visual Studio 2005 with respect to XML classes in the System.Xml namespace are as follows:
- XML Schema validation support has been added to the XmlDocument class.
- The XmlReader and XmlWriter classes have been enhanced to provide significant performance improvements and to support XML schema types. Also, static Create methods have been added to provide an easier way to create instances of XmlReader and XmlWriter using an XmlReaderSettings and an XmlWriterSettings class to configure the type created.
For more information on enhancements in System.Xml see the
What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release [ http://msdn2.microsoft.com/en-us/library/aa468556.aspx ] white paper.
The
classes in the System.Xml namespace can be used to implement custom XML
parsing, manipulation, and storage logic. Leveraging the common
language runtime (CLR) hosting capability of SQL Server 2005 and using
XML classes in Visual Studio 2005, XML processing can be performed
either in the middle tier or in the database tier.
Use of the .NET Framework XML classes involves storing XML documents as a column of type [n]varchar(max) or varbinary(max)
in a database, or as files on the file system, and processing these
documents in the middle tier or in the database using classes in the
System.Xml namespace. XML classes in the .NET Framework can also be
used to operate on data stored in XML data type.
The .NET Framework XML classes are a suitable choice if:
- You want to get access to all of the .NET
Framework XML functionality, such as streaming parsers, document type
definition (DTD) and XSD validation, XSLT processing, etc.
- You want to use SQL Server merely as a data store for XML documents and do not require fine-grained access inside the database.
- You
do most or all processing on the XML document using the .NET Framework
XML classes and perform updates at the document level.
You can use [n]varchar(max), varbinary(max), or the XML data type to store XML.
If you use [n]varchar(max) or varbinary(max) you get the following benefits:
- You preserve textual fidelity for XML documents with an exact copy of the document including white space and formatting.
- Your application gets the fastest possible performance for insert and retrieval operations on the document as a whole.
The benefits of using the XML data type are given in a later section.
Performing XML Processing in the Middle Tier
XML
processing can be performed in the middle tier using the various XML
classes provided by the .NET Framework. As mentioned earlier, when this
approach is adopted, XML documents can be stored as a column of type [n]varchar(max)
or XML in a database or as files on the file system. In the middle
tier, these documents are retrieved from the database and processed
according to the requirements of the user as follows:
- If you need to read an XML document, use an XmlReader created via the XmlReader.Create() method to load the document obtained from the database. Navigate through the document using Read(). The XmlReader class provides fastest possible read-only, forward-only, non-cached access to the XML document.
- If you need write access to an XML document and require full navigational access to XML data, use the XmlDocument class to load and access the XML document. XmlDocument
is an implementation of the Document Object Model (DOM) in the .NET
Framework, which is an in-memory tree representation of an XML document
that enables navigation and editing of this document.
- If you need to validate an XML document based on DTD/XSD or to obtain XSD information at runtime, use the XmlReader class. Create methods with either XsdValidation or DTDValidation set to true on the XmlReaderSettings class. A ValidationEventHandle() event handler can be also set to handle validation errors encountered during reading.
- If you need to apply an XSL transformation to an XML document, use the XPathDocument class to load the XML document and apply the transformation using the XslCommand class. The XPathDocument class provides a fast, high-performance cache for XML document processing using XSLT.
- If you need to query an XML document using XPath expressions, load the XML document using either XPathDocument (read-only) or XmlDocument (read/write). Create an instance of an XPathNavigator using the CreateNavigator() method, and pass the required XPath expression as an argument to the Select() method of XPathNavigator.
Performing XML Processing in the Database
The
integration of SQL Server 2005 with CLR enables developers to use the
XML classes provided by the .NET Framework to perform processing in the
database tier as well. This integration offers the capability of
writing stored procedures, functions, triggers, and user-defined types
in any language supported by the .NET Framework. Furthermore, CLR
hosting also provides access to the complete .NET Framework base class
library. As a result, the different XML processing options described in
the previous section can also be performed in the database.
The advantages of using CLR integration are as follows:
- It provides the ability to use object-oriented
languages like C# and Visual Basic .NET for authoring database objects
in managed code.
- Managed database objects are more secure than the extended stored procedures available in earlier versions of SQL Server.
- It provides the ability to define user-defined data types and user-defined aggregates.
- Under certain conditions, compiled managed database objects deliver improved performance over Transact-SQL.
In
SQL Server 2005, database developers have two options for writing
stored procedures, triggers, and user-defined functions. They are
Transact-SQL and any language available in the .NET Framework, such as
C# or Visual Basic .NET. The choice of the language depends on the type
of operations that are performed on the data. Transact-SQL is best
suited when the code will perform mostly data access with little or no
procedural logic. Managed classes are best suited for mathematically
intensive functions and procedures such as string handling, date
operations, system resource access, file access, or image processing.
The steps involved in using the XML classes of the .NET Framework in the database tier are as follows:
- Developing a managed assembly. Using
any language available in the .NET Framework, implement processing
functionality as an assembly and package this assembly as a DLL.
Assemblies can also refer to other assemblies.
- Registering an assembly and granting permissions.
Assemblies developed using the .NET Framework can be registered with
SQL Server by using the CREATE ASSEMBLY T-SQL statement. While
registering an assembly, it is also possible to specify the code access
permissions granted to the assembly. An assembly can be unregistered
using the DROP ASSEMBLY T-SQL statement.
- Exposing managed types to T-SQL.
Processing functionality offered by an assembly can be exposed to T-SQL
via scalar-valued user-defined functions, table-valued user-defined
functions, user-defined procedures (UDPs), or user-defined triggers.
Scalar user-defined functions can be used in any scalar expression.
Table-valued user-defined functions can be used in any FROM clause.
UDPs can be invoked in an EXEC statement.
Analyzing Scenarios
Insurance
claims contain data-centric information such as Claim ID, Policy
Number, Claim Settlement Data, etc., and document-centric information
such as a description of the accident damage. XML documents are
excellent at aggregating data-centric and document-centric information.
The key requirement in the present scenario (see the Scenario 1: Insurance Claims
section) is the need to maintain an exact copy of insurance claims in
XML format. With SQL Server, this requirement can be met easily by
storing the insurance claims as a column of type [n]varchar(max) or varbinary(max)
in the database. It is important to note that the XML data type should
not be used to store the documents if information such as insignificant
white spaces, order of attributes, namespace prefixes, and the XML
declaration need to be preserved.
Benefits
The benefits of using [n]varchar(max) or varbinary(max) as a storage medium and classes from the System.Xml namespace for the manipulation of XML documents are as follows:
- It is flexible when you need to make changes
to the schema of an XML document. It is also useful when you want to
store XML documents with different schemas in the same column.
- It provides textual fidelity for XML documents when [n]varchar(max) or varbinary(max) is used to store XML. This may be a requirement for applications that deal with legal documents such as insurance claims.
- You
can take advantage of database features such as transactional updates,
concurrent access, backup, replication, etc., compared to storing XML
instances as files in a file system.
- As this approach does not
depend on XML support offered by the database, the application can
easily be extended to support multiple database servers such as SQL
Server, Oracle, etc.
- It can use the processing power of the
client system, thereby reducing the load on the server. By performing
CPU-intensive XML processing on the middle tier, the server is relieved
of some of the load and is available for other important tasks.
- It offers the best possible performance for document-level insertion and retrieval operations.
- Complex operations such as XSL transforms can be done in the database as stored procedures, triggers, or functions.
Limitations
The limitations of using [n]varchar(max) or varbinary(max) for storing and classes from the System.Xml namespace for processing XML instances can be summarized as follows:
- Coding complexity is high compared to the XML data type (see the XML Data Type in SQL Server 2005 section) or the SQLXML option (see the SQLXML
section). Even though the database logic is simple, implementation of
the code to handle the parsing and processing of XML in the middle tier
or database tier is complicated.
- The amount of code required
to implement this solution is high. As a result, the cost of
maintenance is also high compared to SQLXML option.
- Since the XML document is stored as [n]varchar(max)
in the database, fine-grained updates, inserts, or deletes on the XML
document are not possible. It offers limited querying capabilities.
- The size of an XML document that can be stored in [n]varchar (max) data type is limited to 2GB.
- It is very costly to search a column of documents stored this way based on the XML content.
Example of Using XML Classes in the .NET Framework
Consider the insurance claim scenario that was discussed earlier in this paper (see the Scenario 1: Insurance Claims
section). The insurance company, once the claim is approved, would like
to store the claim information for legal purposes. The claim
information can be stored in a database as [n]varchar(max) data type.
The flow of the application is as follows:
- After processing the claim, the application either approves or rejects the claim.
- Using classes in the System.Xml namespace, the XML document for the claim is generated.
- The generated XML document is sent to a stored procedure.
- The stored procedure inserts the XML document into the table.
The
following code example generates an XML document using the claim
details available with the system and inserts the XML document into the
database.
using System;
using System.Xml;
using System.IO;
using System.Text;
using System.Data;
using System.Data.SqlClient;
namespace InsuranceClaim
{
class Insurance
{
static void Main(string[] args)
{
Insurance.InsertInsuranceClaim();
}
static void InsertInsuranceClaim()
{
StringWriter strWriter = null;
XmlWriter writer = null;
XmlWriterSettings settings = null;
SqlConnection connection = null;
SqlCommand command = null;
try
{
strWriter = new StringWriter();
settings = new XmlWriterSettings();
//Use indenting for readability.
settings.Indent = true;
settings.Encoding = System.Text.Encoding.UTF8;
writer = XmlWriter.Create(strWriter, settings);
//Write the XML delcaration.
writer.WriteStartDocument();
writer.WriteStartElement("InsuranceClaim");
writer.WriteStartElement("ClaimInfo");
writer.WriteElementString("ClaimID", "C1234");
writer.WriteElementString("ClaimType", "3");
writer.WriteStartElement("SettlementDetails");
writer.WriteStartElement("PaymentDetails");
writer.WriteElementString("PaidTo", "Jeff");
writer.WriteElementString("Amount", "2000");
writer.WriteElementString("Date", "05/12/2002");
writer.WriteElementString("ApprovedBy", "Mike");
writer.WriteEndElement();//End of PaymentDetails
writer.WriteEndElement();//End of SettlementDetails
writer.WriteEndElement();//End of ClaimInfo
writer.WriteStartElement("DamageReport");
writer.WriteString("Minor accident occured on ");
writer.WriteElementString("Address", "ABC Street, Sample City, Sample State");
writer.WriteString(" due to ");
writer.WriteElementString("Cause", "bad weather");
writer.WriteString(" resulted in damage to ");
writer.WriteElementString("DamagedItem", "Head Lights");
writer.WriteElementString("DamagedItem", "Engine");
writer.WriteEndElement();//End of DamageReport
writer.WriteEndElement();//End of InsuranceClaim
writer.WriteEndDocument();
//Write the XML to file and close the writer.
writer.Flush();
connection = new SqlConnection();
connection.ConnectionString = @"server=localhost;
database=AdventureWorks;Integrated Security=SSPI;";
command = connection.CreateCommand();
command.CommandText = "InsertInsuranceClaim";
command.CommandType = System.Data.CommandType.StoredProcedure;
command.Parameters.Add("@CustomerID",
System.Data.SqlDbType.Char);
command.Parameters.Add("@Claim",
System.Data.SqlDbType.VarChar);
String xml = strWriter.ToString();
string strCustomerID = "1001";
command.Parameters[0].Value = strCustomerID;
command.Parameters[0].Size = strCustomerID.Length;
command.Parameters[1].Value = xml;
command.Parameters[1].Size = xml.Length;
connection.Open();
command.ExecuteNonQuery();
connection.Close();
}
finally
{
if (connection.State == ConnectionState.Open)
connection.Close();
if (writer != null)
writer.Close();
if (strWriter != null)
strWriter.Close();
}
}
}
}
Here is the script that creates the database table:
CREATE TABLE [InsuranceClaim](
[CustomerID] [char](4) NOT NULL,
[Claim] [varchar](max) NOT NULL,
[ModifiedDate] [datetime] NOT NULL DEFAULT (getdate())
)
The following stored procedure is used to insert the XML document into the database.
CREATE PROCEDURE [dbo].[InsertInsuranceClaim]
@CustomerID [char](4),
@Claim [varchar](max)
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO [InsuranceClaim] ( CustomerID, Claim )
VALUES ( @CustomerID, @Claim )
END;
SQLXML
Introduced
with SQL Server 2000, SQLXML encompasses a whole range of features
related to XML processing on the client side. It is a set of libraries
and technologies enabling the seamless integration of relational data
in SQL Server databases with XML describing relationally structured
data.
In the pre-SQL 2000 days, the developer had to provide
the code layer for interaction between the relational data and the data
in XML format. With the advent of SQLXML, life was made easier, for it
provided the link between relational data and XML. The topic of
discussion in this paper is confined to the SQLXML Managed Classes. For
more information on the applicability of other features of the library,
see the SQLXML [ http://msdn2.microsoft.com/en-us/library/aa286527.aspx ] page on MSDN.
SQLXML comprises many of the features introduced in SQL Server to support XML. These features are:
- Ability to transform the result of a query into XML on the client side
- Ability to create an XML view of relational data by using an annotated XSD mapping schema file, which allows you to
- Define an XPath query against the XML view
- Perform an insert, update, or delete on data in the database using XML templates known as updategrams
- Perform XML bulk load operations
- Ability to access SQL Server using HTTP, which enables you to:
- Specify SQL statements in the URL
- Specify template queries in the URL
- Specify template files in the URL
- Specify XPath queries written against the annotated XSD mapping schema files in the URL
- Ability
to expose functionality offered by stored procedures, user-defined
functions, and template queries as SOAP-based Web services
- Ability to write code in the .NET Framework to take advantage of the XML features offered by SQLXML using SQLXML Managed Classes
Client-side XML formatting.
Specifying a FOR XML clause on the client side causes the middle tier
to perform the FOR XML transformation on the rowset returned by the
server in response to a query. To perform XML formatting on the client
side:
- Set the ClientSideXml property of the SqlXmlCommand object to true if you are using SQLXML Managed Classes.
- Set the ClientSideXML provider-specific property to True if you are using SQLXMLOLEDB Provider.
- Specify client-side-xml="1" in the template, if you are using template queries.
- Select the Run on the Client option on the virtual directory in the Settings tab if you are accessing SQL Server using HTTP.
- The
XML formatting modes that are valid with FOR XML on the client side are
RAW, NESTED, and EXPLICIT. When RAW mode is used, the resulting XML
document contains an XML element for each row in the query result and
an attribute corresponding to each column in a row. When NESTED mode is
specified, the base table names are returned as the element names of
the resulting XML document. The EXPLICIT mode generates XML documents
of any shape by allowing you to specify the format of the desired XML
in the query itself.
- XML views. XML views are created
by using annotated XSD schemas that define the mappings between
relational and XML data. These XML views can be queried using XPath
queries. The relational data exposed through XML views can also be
modified and the modifications can then be submitted to the database
using updategrams. Furthermore, XML views are also useful for inserting
huge XML documents into the database with the help of the COM-based XML Bulk Load object.
- HTTP access to SQL Server.
SQLXML provides a utility known as the IIS Virtual Directory Management
utility, which can be used to set up an IIS virtual directory to expose
the XML features of an SQL Server via HTTP. Support for specifying the
SQL statements, stored procedures, templates queries, template files,
and XPath queries directly at the URL is provided with the help of SQL
ISAPI extension.
- Web services support in SQLXML.
Support for exposing the functionality of an SQL Server as a SOAP-based
Web service was added in SQLXML 3.0. This feature allows the SQL Server
to receive SOAP HTTP requests from clients to execute stored
procedures, user-defined functions, and templates.
- SQLXML Managed Classes.
Access to SQLXML functionality in the .NET Framework is provided
through SQLXML Managed Classes. There are three managed classes in
SQLXML:
- SqlXmlCommand - handles the database connectivity and query execution aspects
- SqlXmlParameter - helps specify the parameters in queries
- SqlXmlAdapter - facilitates interaction with the dataset in the .NET Framework
SQLXML Managed Classes allow you to:
- Execute an SQL query with FOR XML clause
- Execute an XPath query against a mapping schema
- Execute a template query
- Execute a template query file
- Execute an updategram
- Execute a DiffGram
Using SQLXML to expose relational data as XML documents is a suitable choice if:
- Your application receives XML data, which is highly structured and maps well into relational tables.
- Your
application must load huge XML documents received from external
applications into a database and persist them in relational format.
- Your application does not have a requirement to preserve the document order.
- Your application is required to present the same data to multiple data consumers in different formats.
- Performance of DML operations is critical for your application.
- Your application has a requirement to utilize the full potential of the optimizer for query optimization.
- Your application performs fine-grained data manipulations.
- Your application is required to expose existing relational data as XML.
Analyzing Scenarios
In the first data exchange scenario described in the XML usage scenarios (see the Scenario 2: Data Exchange Between Automobile Manufacturer and Parts Suppliers I
section), the interaction between an automobile manufacturer and
multiple parts suppliers presents a classic case for using SQLXML. The
manufacturer has to communicate with different suppliers to exchange
invoice data. The proposed solution uses a Web service and SQLXML to
solve this problem. The manufacturer exposes a Web service that can be
used by the suppliers to send invoices to the manufacturer. The Web
service uses an XSLT that is specific to the customer to transform the
invoice from the supplier format to a common format used by the
manufacturer. The Web service then shreds the XML document using an XML
view that maps the content of the invoice document to the columns of a
relational table. The legacy invoice processing system will be able to
pick the data from the relational table and proceed with processing.
The advantages of using XML views for this scenario are as follows:
- Low maintenance costs. Any changes made to
the invoice schema by the supplier can be accommodated easily by
modifying the XSLT file that is specific to the supplier.
- The coding complexity is less compared to FOR XML (see the Server-Side Support for Relational/XML Integration (FOR XML/OPENXML) section).
- New suppliers can be supported easily by creating an XSLT file specific to the supplier.
Benefits
The advantages of using SQLXML are summarized as follows:
- Creating an annotated mapping schema to map
relational data to the XML data is a relatively simple and more
maintainable solution than the FOR XML EXPLICIT on the server side.
- SQLXML
offers the ability to create updatable bi-directional XML views
compared to read-only XML representations of relational data created
using FOR XML.
- Mapping XSD provides the facility for
accommodating the change requests in XML format without major code
changes. This provides ease for maintenance.
- SQLXML provides the ability to perform XML formatting on the client side by allowing the user to set the ClientSideXML property to true for the SqlXMLCommand class and thereby reducing the load on the server.
Limitations
On the downside, there are a few limitations when SQLXML is used from the client side:
- XML views are not suitable for cases where the hierarchy of an XML document is too deep or recursive of unknown depth.
- SQLXML
is not suitable for descriptive documents containing mixed content
markup and ordered data such as product catalogs, news reports, etc.
- Because the document order is not preserved, it is difficult to reconstruct the original XML document.
- While
decomposing XML documents into relational tables gives excellent
searching performance, the transformation to and from XML can be
expensive.
- When default mapping is used in XSD mapping
schemas, it can expose the database table name and column information,
which can lead to unintentional information leakage. This risk can be
avoided by specifying explicit mapping for tables and columns.
- SQL
statements in the URL should only be used in trusted intranets. The use
of such queries on the Internet can lead to potential security risks.
Example of Using SQLXML
With
the overview of SQLXML complete, the next step is to delve into an
example where SQLXML Managed Classes can be applied. Consider a simple
example—exporting sales order details for a specific customer. The
tables used for the example are available in the AdventureWorks database.
Data
from the database has to be available in the client in XML format,
which can be displayed by the presentation layer. Next you will see how
the relational data in the SQL database can be manipulated as XML data
using the SQLXML class. A mapping XML Schema is used to manipulate and
map the XML node names to table fields. For more information on
manipulating relational data using the SQLXML managed library, see the
SQLXML [ http://msdn2.microsoft.com/en-us/library/aa286527.aspx ] page on MSDN.
The
following annotated XSD schema defines the mappings between the
relational tables [Sales.Customer], [Sales.SalesOrderHeader],
[Sales.SalesOrderDetail] and the target XML representation of sales
order details of a customer. Parent-child relationships in XML can also
be defined using XSD mapping schemas as shown in the following XSD
schema.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:annotation>
<xsd:appinfo>
<sql:relationship name="CustomerOrderHeader"
parent="Sales.Customer"
parent-key="CustomerID"
child="Sales.SalesOrderHeader"
child-key="CustomerID" />
<sql:relationship name="OrderHeaderOrderDetail"
parent="Sales.SalesOrderHeader"
parent-key="SalesOrderID"
child="Sales.SalesOrderDetail"
child-key="SalesOrderID" />
</xsd:appinfo>
</xsd:annotation>
<xsd:element name="Customer" sql:relation="Sales.Customer" >
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Order" sql:relation="Sales.SalesOrderHeader"
sql:relationship="CustomerOrderHeader" maxOccurs="unbounded" >
<xsd:complexType>
<xsd:sequence>
<xsd:element name="OrderDetail"
sql:relation="Sales.SalesOrderDetail"
sql:relationship="OrderHeaderOrderDetail"
maxOccurs="unbounded" >
<xsd:complexType>
<xsd:attribute name="SalesOrderID"
type="xsd:integer" />
<xsd:attribute name="ProductID" type="xsd:integer" />
<xsd:attribute name="OrderQty" type="xsd:integer" />
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="SalesOrderID" type="xsd:integer" />
<xsd:attribute name="CustomerID" type="xsd:integer" />
<xsd:attribute name="OrderDate" type="xsd:date" />
<xsd:attribute name="ShipDate" type="xsd:date" />
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="CustomerID" type="xsd:integer" />
</xsd:complexType>
</xsd:element>
</xsd:schema>
class ExportOrders
{
/// <summary>
/// This method use SqlXmlCommand class to select the records from
/// Sales.Customer, Sales.SalesOrderHeader and Sales.SalesOrderDetail
/// tables.The data is fetched as such from server and formatted into
/// xml at client side. Note that the ClientSideXml is set to true.
/// </summary>
static void Main(string[] args)
{
if (args.Length < 1)
{
Console.WriteLine("Usage");
Console.WriteLine("CustomerOrders <CustomerID> [OrderID]");
return;
}
try
{
StringBuilder strBuilder = new StringBuilder();
strBuilder.Append("/Customer[@CustomerID='");
strBuilder.Append(args[0]);
strBuilder.Append("']");
if (args.Length > 1)
{
strBuilder.Append("/Order[@SalesOrderID='");
strBuilder.Append(args[1]);
strBuilder.Append("']");
}
SqlXmlCommand xmlCommand = new SqlXmlCommand(@"Provider=
SQLOLEDB; Server=localhost; database=AdventureWorks;
Integrated Security=SSPI;");
xmlCommand.ClientSideXml = true;
xmlCommand.RootTag = "CustomerOrders";
xmlCommand.SchemaPath = @"CustomerOrderDetails.xsd";
xmlCommand.CommandType = SqlXmlCommandType.XPath;
xmlCommand.CommandText = strBuilder.ToString();
Stream reader = xmlCommand.ExecuteStream( );
FileStream fsOut = File.Create("CustomerOrder.xml");
StreamWriter sw = new StreamWriter(fsOut);
using (StreamReader sr = new StreamReader(reader))
{
sw.Write(sr.ReadToEnd());
}
sw.Flush();
sw.Close();
fsOut.Close();
}
catch (Exception exception)
{
Console.WriteLine( exception.ToString() );
}
}
}
The
method provided above exports the sales order details for a Customer ID
specified as a command line argument to the application. The data is
converted into XML format on the client side, thereby avoiding
performance issues on the server side. Note that the annotated XSD
schema mapping provided above has to be saved as
CustomerOrderDetails.xsd for the above code fragment to work properly.
Note The code that goes into retrieving the data as XML from the database is minimal as can be seen in the example.
Server-Side Support for Relational/XML Integration (FOR XML/OPENXML)
Support
for returning the results of an SQL query as XML documents on the
server side is provided in SQL Server using the FOR XML extension to
the SELECT statement. On the other hand, the OPENXML keyword provides
the ability to extract rowsets from an XML document.
FOR XML
Server-side FOR XML supports four XML transformation modes—RAW, AUTO, EXPLICIT, and PATH.
By
default, RAW mode maps each row in the query result set to an XML
element and each column in the row to an attribute. When the ELEMENTS
option is specified with ROW mode, each column in a row is mapped to a
subelement of the element that is generated for the row. It is also
possible to request an inline schema for the XML that is generated by
specifying XMLSCHEMA option.
AUTO mode provides support for
generating nested XML elements where each table in the FROM clause, for
which at least one column is listed in the SELECT clause, is mapped to
an XML element and the columns listed in the SELECT clause are mapped
to attributes (or subelements if the ELEMENTS option is specified) by
default.
EXPLICIT mode provides the maximum control over the
shape of the XML that is generated from a query result. It allows you
to generate XML with any shape by specifying the format for the desired
XML in the query itself.
Composing complex XML documents using
the EXPLICT mode is cumbersome. Using the PATH mode together with the
ability to write nested FOR XML queries and the TYPE directive to
return XML type instances provides an alternative to writing complex
EXPLICIT mode queries. The PATH mode maps the columns in the rowset
returned by the SELECT query to attributes and subelements by
interpreting the column names as XPath-like syntax. For more
information on FOR XML enhancements in SQL Server 2005 see the
What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ] white paper.
OPENXML
OPENXML, together with the sp_xml_preparedocument and sp_xml_removedocument system stored procedures, provides a relational rowset view of an XML document. In order to use OPENXML on an XML document, sp_xml_preparedocument
must be used to create an in-memory representation of the XML document.
This stored procedure parses the XML document using the MSXML parser
and returns a handle to the XML document that can be used with OPENXML.
Now, parameters such as XML document handle, rowpattern, which is an
XPath expression that maps nodes of XML data to rows, rowset schema,
and mapping between the rowset columns and the XML nodes can be passed
to OPENXML to obtain the rowset. The XML document must be unloaded from
memory using the sp_xml_removedocument stored procedure when it is no longer required.
Enhancements in FOR XML
FOR XML has been enhanced in SQL Server 2005 with the following features:
- Ability to typecast the results of FOR XML using the new TYPE directive
- Ability to assign the results of FOR XML to a variable of type XML
- Ability to nest FOR XML queries to generate XML hierarchies
- Ability to use the new PATH mode to generate complex XML documents
- Ability to generate an inline schema in XDR or XSD format using the XMLDATA and XMLSCHEMA options, respectively
- Ability to use the ELEMENTS directive with RAW mode to generate element-centric XML
- Ability to use the XSINIL option with the ELEMENT directive to map NULL values to an element with an attribute xsi:nil="true"
Enhancements in OPENXML
In SQL Server 2005, OPENXML has been enhanced to support the following features:
- Ability to pass XML type data to sp_xml_preparedocument
- Ability to use new data types in the WITH clause
Using FOR XML and OPENXML to compose and decompose XML documents is a suitable choice if:
- Your application has a requirement to store data relationally and expose this information to another application as XML.
- Your application does not have the requirement to preserve the order of XML.
- Your application performs heavy-element–level DML operations.
- Your application requires fine-grained data access and update.
- Your application requires you to expose the relational data through a Web service.
Analyzing Scenarios
The requirement in scenario 3 (see the Scenario 3: Data Exchange Between Automobile Manufacturer and Parts Suppliers II
section) is to provide Web services that suppliers can use to obtain
the status of an invoice or get a copy of payment instructions. FOR XML
in conjunction with a Web service provides a solution that allows the
manufacturer to expose these services on the Internet. The supplier
uses a Web service to enquire about the status of his invoice. The Web
service then uses the invoice ID provided by the supplier and generates
the response in XML format from the relational data using a FOR XML
statement. The generated XML document will be returned to the supplier.
In the current scenario, using the FOR XML statement-based approach
provides the following benefits:
- FOR XML provides an easy way to dynamically compose simple XML documents from relational data.
- Maintaining a FOR XML query is easy compared to an XML view when they are used to compose simple XML documents.
Benefits
Some of the benefits of using FOR XML/OPENXML are as follows:
- FOR XML offers a simple way of generating XML from relational data on the server.
- FOR XML provides the ability to expose business information through Web services.
- OPENXML
allows you to pass a rowset to a stored procedure in XML format which
enables you to perform bulk INSERT, UPDATE, and DELETE operations with
just one network roundtrip.
- FOR XML together with XSL can be used for application integration or business integration.
Limitations
The limitations of using FOR XML/OPENXML for composing and decomposing XML documents are:
- Shaping XML structures using FOR XML with the EXPLICT option is a difficult task.
- It is difficult to maintain complex queries written using FOR XML EXPLICIT.
- XML
documents generated by FOR XML AUTO can expose the database table name
and column information leading to inadvertent information disclosure.
This situation can be avoided by specifying aliases for tables and
columns.
Example of Using FOR XML and OPENXML
The following examples use the AdventureWorks
database that comes with SQL Server 2005. Let us look at an example
where FOR XML is used to obtain customer, order, and order detail
information from the [Sales.Customer], [Sales.SalesOrderHeader],
[Production.Product], and [Sales.SalesOrderDetail] tables for a
specified range of customers.
Example: Using FOR XML
SELECT Cust.CustomerID,
OrderHeader.CustomerID,
OrderHeader.SalesOrderID,
Detail.SalesOrderID, Detail.LineNumber,Detail.ProductID,
Product.Name,
Detail.OrderQty
FROM Sales.Customer Cust,
Sales.SalesOrderHeader OrderHeader,
Sales.SalesOrderDetail Detail,
Production.Product Product
WHERE Cust.CustomerID = OrderHeader.CustomerID
AND OrderHeader.SalesOrderID = Detail.SalesOrderID
AND Detail.ProductID = Product.ProductID
AND (Cust.CustomerID BETWEEN 44 AND 46)
ORDER BY OrderHeader.CustomerID,
OrderHeader.SalesOrderID
FOR XML AUTO
Result of the query is as shown below:
<Cust CustomerID="44">
<OrderHeader CustomerID="44" SalesOrderID="53575">
<Detail SalesOrderID="53575" LineNumber="2" ProductID="952" OrderQty="2">
<Product Name="Chain" />
</Detail>
<Detail SalesOrderID="53575" LineNumber="1" ProductID="969" OrderQty="1">
<Product Name="Touring-1000 Blue, 60" />
</Detail>
<Detail SalesOrderID="53575" LineNumber="3" ProductID="972" OrderQty="1">
<Product Name="Touring-2000 Blue, 54" />
</Detail>
</OrderHeader>
<OrderHeader CustomerID="44" SalesOrderID="59024">
<Detail SalesOrderID="59024" LineNumber="1" ProductID="972" OrderQty="3">
<Product Name="Touring-2000 Blue, 54" />
</Detail>
<Detail SalesOrderID="59024" LineNumber="2" ProductID="957" OrderQty="2">
<Product Name="Touring-1000 Yellow, 60" />
</Detail>
</OrderHeader>
</Cust>
<Cust CustomerID="46">
<OrderHeader CustomerID="46" SalesOrderID="48354">
<Detail SalesOrderID="48354" LineNumber="1" ProductID="730" OrderQty="1">
<Product Name="LL Road Frame - Red, 62" />
</Detail>
</OrderHeader>
</Cust>
The following example extracts order details specified in the XML document using OPENXML and XPath expressions.
Example: Using OPENXML
DECLARE @XmlDocumentHandle int
DECLARE @XmlDocument nvarchar(max)
SET @XmlDocument = N'<ROOT>
<Cust CustomerID="44">
<OrderHeader CustomerID="44" SalesOrderID="53575">
<Detail SalesOrderID="53575" LineNumber="2" ProductID="952" OrderQty="2">
<Product Name="Chain" />
</Detail>
<Detail SalesOrderID="53575" LineNumber="1" ProductID="969" OrderQty="1">
<Product Name="Touring-1000 Blue, 60" />
</Detail>
<Detail SalesOrderID="53575" LineNumber="3" ProductID="972" OrderQty="1">
<Product Name="Touring-2000 Blue, 54" />
</Detail>
</OrderHeader>
<OrderHeader CustomerID="44" SalesOrderID="59024">
<Detail SalesOrderID="59024" LineNumber="1" ProductID="972" OrderQty="3">
<Product Name="Touring-2000 Blue, 54" />
</Detail>
<Detail SalesOrderID="59024" LineNumber="2" ProductID="957" OrderQty="2">
<Product Name="Touring-1000 Yellow, 60" />
</Detail>
</OrderHeader>
</Cust>
<Cust CustomerID="46">
<OrderHeader CustomerID="46" SalesOrderID="48354">
<Detail SalesOrderID="48354" LineNumber="1" ProductID="730" OrderQty="1">
<Product Name="LL Road Frame - Red, 62" />
</Detail>
</OrderHeader>
</Cust>
</ROOT>'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @XmlDocumentHandle OUTPUT, @XmlDocument
-- Execute a SELECT statement using OPENXML rowset provider.
SELECT *
FROM OPENXML (@XmlDocumentHandle, '/ROOT/Cust/OrderHeader/Detail',2)
WITH (CustomerID varchar(10) '../@CustomerID',
OrderID int '../@SalesOrderID',
LineNumber int '@LineNumber',
ProductID int '@ProductID',
Quantity int '@OrderQty')
-- Remove the internal representation.
EXEC sp_xml_removedocument @XmlDocumentHandle
Result of the query is as follows:
--------------------------------------------------------
CustomerID OrderID LineNumber ProductID Quantity
--------------------------------------------------------
44 53575 2 952 2
44 53575 1 969 1
44 53575 3 972 1
44 59024 1 972 3
44 59024 2 957 2
46 48354 1 730 1
--------------------------------------------------------
XML Data Type in SQL Server 2005
The
hierarchical nature of XML data makes it hard to model it as relational
data as the structure of the data becomes complex (e.g., depth of
hierarchy increases). Moreover, when XML data is mapped to relational
data, the order of elements in XML instances is also not preserved and
there is a significant cost involved in composing the original XML
document from decomposed relational data. The limitations of the
relational model to store XML data make it ideal to store XML instances
natively. Native XML instances do not suffer from the limitations of
relational models and offer features such as the ability to handle
hierarchical or nested data, the ability to preserve the order of
elements, a straightforward way to store and retrieve XML data, the
flexibility to support multiple schemas, and so on.
Microsoft
SQL Server 2005 provides extensive support for XML data processing.
With SQL Server 2005, XML values can be stored natively in an XML data
type column, which can be typed according to a collection of XML
schemas, or can be left untyped. Fine-grained data manipulation is
supported using XQuery and XML DML, the latter being an extension for
data modification. Furthermore, the XML column can be indexed for
improved query performance.
Typed XML
Typed
XML is ideal for situations where you have XML schemas describing your
XML data. In such cases, you can associate a collection of XML schemas
with the XML column to yield typed XML. Validation on XML type columns
is performed based on the XML schema collection associated with the
column. Additionally, the performance of queries that involve typed XML
data is better compared to untyped XML data, as it requires no runtime
conversion of node values.
Untyped XML
Using
untyped XML is suitable if you have schemas, but you do not want the
server to validate the data, or if no schema is available. You may want
to store untyped XML even when a schema is present if an application:
- Has no fixed schema
- Performs the validation on the client side before storing the data at the server
- Temporarily stores XML data that is invalid according to the schema
- Uses schema components that are unsupported at the server (e.g., key/keyref)
Untyped
XML documents are checked to ensure that they are well formed, even
though they are not associated with any schema. It is important to note
that untyped XML imposes a performance penalty due to the runtime
conversion of node values, as the node values are stored internally as
Unicode strings.
Usage Scenarios of XML Data Type
With the new XML data type in SQL Server 2005, it is now possible to:
- Create a table with one more columns of type XML in addition to the relational columns.
- Create a typed XML column type by associating it with an XML schema collection.
- Create a constraint on an XML column involving other XML or non-XML type columns to enforce business rules.
- Create a variable of type XML that can be used to store an instance of an XML data type.
- Create a parameter of type XML to a stored procedure or user-defined function.
- Return an XML type value from a user-defined function.
- Assign the result of a FOR XML query obtained using the new TYPE directive to a variable of type XML.
- Run a subset of XQuery to query into the XML structure and transform XML data.
- Create a computed column based on a column of type XML.
- Create an XML index on a column of type XML to improve the performance of queries.
- Perform element-level insert, delete, and update operations on XML instances using XML DML.
- Pass an instance of XML type data to sp_xml_preparedocument to prepare an in-memory representation of the XML document.
- Write cross-domain queries that will include both relational and XML columns using XQuery and XML DML.
- Cast or convert the XML type to varchar or nvarchar types using CAST and CONVERT, respectively.
- Convert or cast string data types such as [n]varchar, [n]text, varbinary, and image into XML type using CAST or CONVERT.
XML Data Type Methods and XML DML
Queries
and manipulation on XML data type columns are supported through five
methods. Fragments of an XML document can be extracted using the query() method of XML data type. The query()
method accepts an XQuery expression as an argument and returns an
untyped XML instance. Scalar values can be extracted from an XML
instance using the value() method by specifying an XQuery
expression and the desired SQL type to be returned. Existential checks
on an XML instance can be performed using the exist() method. Decomposing an XML document into relational data is facilitated with the help of the nodes() method.
Data manipulation operations can be performed on an XML instance using the modify()
method. Support for XML DML is provided through insert, delete, and
update keywords added to XQuery. One or more nodes can be inserted,
deleted, and updated using the insert, delete, and update keywords,
respectively.
XML Indexing
The parsing
and shredding involved in query processing operations on XML data type
columns can consume a prohibitively long time if the size of the XML
instance is very big. Query performance on the XML data type can be
improved by creating indexes on these columns. The size of the XML data
and usage scenario play an important role in dictating the type of
index that is required. SQL Server supports two types of
indexes—primary XML indexes and secondary XML indexes; the latter
cannot exist without the former.
The creation of a primary XML
index on an XML column shreds the XML BLOB and stores these values in
an internal table. This results in improved performance during query
execution time as there is no shredding involved at runtime. Depending
on the usage scenario, query performance can be further improved by
creating secondary XML indexes. Three types of secondary XML
indexes—PATH, PROPERTY, and VALUE—can be created to improve the
performance of queries based on path, property, and value respectively.
More information on selecting an appropriate secondary index for an XML
type column is available in the Performance Optimizations for the XML
Data Type white paper.
Storing XML documents as an XML data type is a suitable choice if:
- Your application has a requirement to preserve
the Infoset content of XML instances. The Infoset content of an XML
document includes document hierarchy, order of elements, values of
elements and attributes, and so on. Information such as order of
attributes, prefixes of namespaces, insignificant white spaces, and XML
declaration is not preserved.
- Your application requires element-level modify and query operations on XML documents.
- Your application needs indexes on XML data type columns to speed up query processing.
- Your XML data may or may not have a schema.
- Your
application uses XML documents with a wide range of structures, or XML
documents conforming to different or complex schemas that are too hard
to map to relational structures.
Analyzing Scenarios
Analyzing Scenario: Content Management System
Now, let us analyze the content management system described in the XML usage scenarios (see the Scenario 4: Content Management System
section). Publishing companies deal with information in various forms
including text, images, audio, video, etc. Blocks of information that
can be used independently are collected from various sources and
maintained in a database. These blocks of information are known as components.
By assembling individual components, documents are created. Which
components are included in a document depends on the requirements of
the users. These documents will be delivered to the subscribed users
through various channels. Content management systems typically have
requirements such as the ability to store, retrieve, search, and update
the content with high performance and scalability.
XML as a
unified data model provides an attractive option for storing both XML
data and XML content in the same document. XML also provides the
ability to separate the presentation from the data itself, which is
important because the presentation of the same information to different
users may differ. The native XML data type offered by SQL Server 2005
can be used to meet the requirements of this content management system.
With the XML data type, it is possible to store XML documents, modify
XML documents at element-level using XML DML, and perform queries on
XML documents using XQuery.
Analyzing Scenario: Customer Survey
In the customer survey XML usage scenarios (see Scenario 5: Customer Survey
section), the key requirement is the ability to store the survey
information with multiple schemas. Data that does not have a fixed
schema cannot be modeled using a single relational table. A relational
table with an XML column provides an excellent option for storing such
information. An additional column can be added to the relational table
to store the type of survey. Information corresponding to a particular
type of survey can be analyzed by fetching all records for the survey
type using the survey type column. In a typical survey, customers
usually don't answer all the questions. Therefore, instead of creating
several columns (one for each question in the survey) and storing NULL
values in the database for unanswered questions, storing the
per-customer survey information as an XML document in one column would
be beneficial. Storing the customer survey information as an XML type
column is more appropriate for this scenario because:
- Using XML type columns allows the survey
information with different schemas to be stored in a single XML type
column. Associating the XML type column with an XML schema collection
allows the user to store data for multiple survey types.
- Validation
of survey information can be performed by the user interface and there
is no need to store this as a typed XML data type column to enforce
validation at the database level.
- XQuery can be used to perform analysis on the data.
Benefits
The benefits of storing XML data in an XML data type column can be summarized as follows:
- The XML data type offers a simple and
straightforward way of storing your XML data at the server while
preserving document order and document structure. This is especially
true in the case of documents where the document order and document
structure are very important. Assume a sample scenario where your
application gets an XML document from some data source and your
application is supposed to store the document. Storing it in an nvarchar or text
column does not guarantee well-formed XML and does not provide easy
access to its content. In such cases, it is appropriate to store the
incoming XML document natively in the XML column.
- The XML data
type provides the ability to perform fine-grained query and modify
operations on XML data. Prior to SQL Server 2005, there was no way to
store XML natively in the database. Hence, if you wanted to modify or
query XML data, the data would have to be loaded from the nvarchar or text
column, an XML document created out of the string, and then modified.
Similar steps were required to write the modified data back to the
database. Now with the XML data type, these steps become much easier.
- With the XML data type, it is possible to create indexes on XML data type column for faster query processing.
- When
the XML data type is used, you can use XML schema collection and
constraints on XML data to enforce business rules. The XML schemas are
used to validate the data, add type-based operational semantics,
perform more precise type checks than untyped XML during compilation of
query and data modification statements, and optimize storage and query
processing.
- As the XML type data is stored in the database, it
is included in a variety of database activities such as backup and
restore, SQL Server security, transactions, logging, and so on.
Limitations
The following are some of the limitations that one must be aware of when working with the new XML data type:
- An exact copy of the data is not stored.
Insignificant white spaces, namespace prefixes, order of attributes,
and XML declaration are not preserved.
- The maximal depth of hierarchy in an XML document is 128.
- The maximal size of the internal binary representation of an XML document is 2GB.
- XML instances cannot be compared. Thus:
- An XML column cannot be part of a primary or foreign key constraint.
- An XML column cannot be used in a GROUP BY statement as a grouping value.
- XML cannot be cast to text, ntext, and image data types as these types are deprecated in SQL Server 2005. However, the XML data type can be cast to [n]varchar and [n]varbinary types.
Example of Using the XML Data Type
The sample application uses the Sales.Store table from the AdventureWorks database. The Sales.Store table contains CustomerID as the primary key and Demographics as the XML column. The Demographics
column contains the Store Survey information. The information that gets
stored as part of Store Survey is optional. This means that the Demographics
column may or may not contain all the elements. If the same information
were stored in a relational format, these elements would need to be
created as columns of the table. Since most of the Store Survey
information is optional, these columns will contain NULL values for
most of the data. This would lead to a waste of table space. To avoid
such waste, the Store Survey information in the Demographics column is stored in XML format. The Demographics
column contains the sales information, viz., Annual Sales, Annual
Revenue, Bank name, etc. per customer. These fields are stored as XML
elements in the data.
The sample application performs the following functions:
- Display the list of all the customers along with their demographic information.
- Display the demographic information for a particular customer.
- Insert a new customer in the Sales.Store table along with the demographic information.
- Modify some elements (such as Annual Sales, Annual Revenue, etc.) of the demographic information for a given customer.
- Delete the demographic information for a given customer.
The application uses the System.Data.SqlTypes.SqlXml class to retrieve the data from an XML column. The SQLXML class is a direct mapping to the XML column.
Using the SQLXML class, it is possible to directly retrieve the data from the XML column without any mappings or conversions.
Here,
let us look at an example of how to retrieve the Annual Revenue element
for the customer ID 12. The following code example illustrates the
second function mentioned above.
Public void RetreiveAnnualRevenue ()
{
SqlConnection conn = new SqlConnection();
conn.ConnectionString = @"Server=localhost; Database=AdventureWorks;
integrated security=SSPI";
conn.Open();
SqlCommand command = conn.CreateCommand();
command.CommandText = @"select Demographics.query(
'declare namespace SS="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey"
<StoreInfo>
<AnnualRevenue>
{data(/SS:StoreSurvey/SS:AnnualRevenue)}
</AnnualRevenue>
</StoreInfo>') as
Result from Sales.Store where CustomerID=12";
SqlDataReader datareader = command.ExecuteReader();
System.Text.StringBuilder builder = new System.Text.StringBuilder();
While (datareader.Read())
{
SqlXml sqlxml = datareader.GetSqlXml(0);
builder.Append(sqlxml.Value);
}
//Note: xml1 is a XML web control
this.xml1.DocumentContent = builder.ToString();
this.xml1.TransformSource = @"StoreInfo.xslt";
}
The above method creates an instance of the SqlConnection object to the AdventureWorks database. The command text property contains the query to retrieve the value of Annual Revenue field for the customer ID 12.
This
is an XQuery that will get directly executed on the XML column. The
return value of the query will be an XML fragment, which is then mapped
to the SQLXML class. The XML fragment can then be retrieved using the Value property of the SQLXML class.
The retrieved XML fragment is then displayed in the client application, using an XML Web server control.
Comparison of Different Approaches
Feature | XML classes in the .NET Framework | FOR XML / OPENXML | SQLXML | XML Data Type |
Code complexity | High. There are no classes that directly map between XML data and relational data. | Medium. Writing queries using FOR XML EXPLICIT is difficult. | Low.
The classes provide a mechanism to manipulate the relational data as
XML data and the updategrams provide a facility to update the records
as well. | Low. Since XML data is stored as such in
the column, complexity is less. Besides, Visual Studio 2005 provides
classes to manipulate the XML data type. XML DML can be used for
modifying the XML data. |
Maintainability | Complicated. Changes in the fields of the table or XML require code change. | Difficult. Changes in the fields of the table or XML require changes to the query. | Easy. In most cases, modifying the mapping XSD files can accommodate changes. | Easy. XQuery provides easy syntax for querying the XML columns in the database. |
Installation | Requires no special installation other than the .NET Framework. | No special installations required. | Requires that SQLXML library is installed on the client machines. | No special installation required. |
Security | Secure to a very good extent, since the data type and formats are not exposed on the client side in normal case. | Secure if proper care is taken to avoid disclosure of table names and column names. | The design has to secure the mapping XSD file if it is stored client side instead of mid-tier. | Secure |
Support for .NET Compact Framework | Limited support. XmlDataDocument is not supported in the Microsoft .NET Compact Framework. | Supported | Not supported | Not supported. If XML data type columns from SQL Server are synced to SQL Server mobile, it will be converted to ntext. |
Data Validation | Can be enforced by the client and the server. | Can be enforced by the server. | Can be performed by the client. | Can be enforced by the server using XML Schema. |
Data Storage | [n]varchar(max), XML, or varbinary(max) | Relational tables (can use XML as a field). | Relational tables (can use XML as a field). | XML data type |
Fidelity | Textual fidelity (preserves the XML data at byte level) | Relational fidelity (preserves the hierarchical structure of the data, but the order among elements is ignored) | Relational fidelity | InfoSet fidelity (preserves the InfoSet content of the XML data) |
Data access and update in store | Supports updates at document level. | Supports fine-grained data access and updates. | Supports fine-grained data access and updates. | Supports fine-grained data access and updates. |
Conclusion
This
paper provides the user with the different options for handling XML in
SQL Server 2005. The System.Xml namespace, SQLXML, and the XML data
type are discussed with the relative benefits and limitations of each,
and include sample scenarios. The performance exhibited in ideal
scenarios enables the user to choose the appropriate XML option for the
user's application.
For More Information
Microsoft SQL Server 2005 Web site [ http://www.microsoft.com/sql/ ]
XML Support in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345117.aspx ]
XML Best Practices for Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345115.aspx ]
What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ]
Performance Optimizations for the XML Data Type [ http://msdn2.microsoft.com/en-us/library/ms345118.aspx ]