XML Support in Microsoft SQL Server 2005

Microsoft SQL Server 9.0 Technical Articles

Shankar Pal, Mark Fussell, and Irwin Dolobowsky
Microsoft Corporation

December 2005

Applies to:
Microsoft SQL Server
Microsoft .NET Framework 2.0

Summary: This paper provides an overview of the XML support that is built into Microsoft SQL Server 2005. It includes a discussion of how XML integrates with client-side programming support in both the .NET Framework 2.0 and native code such as OLEDB and SQLXML. (40 printed pages)

Introduction
Motivating Scenarios for XML Storage
Server-Side XML Processing in SQL Server 2005
Client-Side XML Processing in SQL Server 2005
Conclusion
References

Introduction

eXtensible Markup Language (XML) has been widely adopted as a platform-independent format for data representation. It is useful for exchanging information among loosely coupled, disparate systems, such as in business-to-business (B2B) applications and workflow situations. Data interchange has been a major driver of XML technologies.

XML is increasingly present in enterprise applications that are used for modeling semi-structured and unstructured data. One such application is document management. Documents (e-mail messages, for example) are semi-structured by nature. If documents are stored inside a database server as XML, powerful applications can be developed such as:

Applications that retrieve documents based on their content.
Applications that query for partial content, such as finding the section whose title contains the word "Background."
Applications that aggregate documents.

Such scenarios are becoming feasible with the increase in the development and availability of applications that generate and consume XML. For example, the Microsoft Office 2003 System allows users to generate Microsoft Word, Excel, Visio, and Infopath documents as XML markup.

Why Use Relational Databases for XML Data?

Storing XML data in a relational database provides benefits in the areas of data management and query processing. Microsoft SQL Server provides powerful query and data modification capabilities over relational data. In SQL Server 2005, these capabilities are extended to querying and modifying XML data. This allows your company to leverage investments made over past releases, such as investments in the areas of cost-based optimizations and data storage. For example, indexing techniques in relational databases are well-known. These have been extended to indexing XML data so that queries can be optimized using cost-based decisions.

XML data can interoperate with existing relational data and SQL applications. This means that XML can be introduced into the system as data modeling needs arise without disrupting existing applications. The database server also provides administrative functionality for managing XML data (for example, backup, recovery, and replication).

Native XML support within SQL Server 2005 is necessary to address increasing XML usage. Enterprise application development benefits from the XML support in SQL Server 2005.

The following sections give an overview of XML support in SQL Server 2000 and 2005, describe some of the motivating scenarios for XML usage, and include detailed discussions of the server-side and client-side XML feature sets.

XML Support in SQL Server 2000

This section provides a brief, high-level overview of the XML support in SQL Server 2000 and subsequent Web releases of the SQLXML client-side programming platform that provides rich support for mapping data back and forth between the relational and the XML data models.

Server-Side Support

At the server, XML data is generated from tables and query results by using a FOR XML clause in a SELECT statement. This is ideal for data interchange and Web service applications. The converse of FOR XML is a relational rowset generator function named OpenXML; it extracts values from the XML data into columns of a rowset by evaluating XPath 1.0 expressions. OpenXML is used by applications that shred incoming XML data into tables or for querying by using the Transact-SQL language.

Client-Side Support

The client programming support for SQL Server 2000 is referred to as SQLXML. (For more information, see SQLXML [ http://msdn2.microsoft.com/en-us/library/aa286527.aspx ] on the Microsoft Developer Network (MSDN)). At the center of this technology is XML view, which is a bi-directional mapping of an XML Schema to relational tables. SQL Server 2000 supports only the mapping of XDR schemas, though support for XSD was added in later Web releases. The XML view allows querying by using a subset of XPath 1.0 where the mapping is used to translate the path expressions into SQL queries on the underlying tables, and the query results are packaged into XML results.

SQL XML also enables you to create XML templates that you can use to create an XML document that has dynamic sections. Within the XML document, you can embed FOR XML queries and XPath 1.0 expressions over mapping queries. When the XML template is executed, the query block is replaced with the result of the query. In this way you can create XML documents that include both some static content and some dynamic content that is data-driven.

In SQL Server 2000, there are two main ways to access SQLXML functionality.

SQLXMLOLEDB Provider—The SQLXMLOLEDB Provider is an OLE DB provider that exposes Microsoft SQLXML functionality through ADO.
HTTP Access—SQLXML functionality in SQL Server 2000 can also be accessed via HTTP by using the SQLXML ISAPI filter. By using our configuration tool, you can set up a Web site to receive incoming requests to execute the XML templates, and the FOR XML and XPath 1.0 statements over XML views using HTTP.

Limitations in XML Support

The server and client programming platforms provide rich support for the generation and consumption of XML data based on mapping between tabular and XML data. This handles fairly structured XML data well. In SQLXML, the query language is a subset of XPath 1.0 and has some limitations. For example, the descendant axes (//-operator) are not supported. Consequently, restrictions exist that affect the development of certain solutions. For example, XML document order is not preserved, which is so crucial for applications such as document management. Furthermore, recursive XML schemas are not supported. In spite of these limitations, client SQLXML and server XML functionality have been widely used in application development. SQL Server 2005 addresses many of these limitations, extends the relational-XML interchange features, and provides native XML support.

Overview of XML Support in SQL Server 2005

This section provides a brief, high-level overview of the new XML support in SQL Server 2005, which is complemented by support in the .NET Framework 2.0 and by native client data access such as OLE DB.

XML Data Type

The XML data model has characteristics that make it very hard if not practically impossible to map to the relational data model. XML data has a hierarchical structure that may be recursive; relational databases provide weak support for hierarchical data (modeled as foreign key relationships). Document order is an inherent property of XML instances and must be preserved in query results. This is in contrast with relational data, which is unordered; order must be enforced with additional ordering columns. Re-assembling the result during querying is costly for realistic XML schemas that decompose the XML data into a large number of tables.

SQL Server 2005 introduces a native data type called XML. A user can create a table that has one or more columns of type XML in addition to relational columns. XML variables and parameters are also allowed. XML values are stored in an internal format as large binary objects (BLOBs) in order to support the XML model characteristics, such as document order and recursive structures, more faithfully.

SQL Server 2005 provides XML schema collections as a way to manage W3C XML Schemas as metadata. An XML data type can be associated with an XML schema collection to enforce schema constraints on XML instances. When the XML data is associated with an XML schema collection, it is called typed XML; otherwise it is called untyped XML. Both typed and untyped XML are accommodated within a single framework, the XML data model is preserved, and query processing enforces XML semantics. The underlying relational infrastructure is used extensively for this purpose. It supports interoperability between relational and XML data, thereby making way for more widespread adoption of the XML features.

XML Data Type Query and Data Modification

XML instances can be retrieved by using a Transact-SQL SELECT statement. Five built-in methods on the XML data type are provided to query and modify XML instances.

The XML data type methods accept XQuery. XQuery is an emerging W3C standard language (currently in Last Call). It includes the navigational language XPath 2.0. A language for modifying XML data, such as adding or deleting subtrees and updating scalar values, is also available. Together with a large set of functions, embedded XQuery and data modification languages provide rich support for manipulating XML data.

The XQuery type system is aligned with that of W3C XML schema types. Most of the SQL types are compatible with the XQuery type system (for example, decimal). A handful of types (for example, xs:duration) are stored in an internal format and suitably interpreted to be compatible with the XQuery type system.

The compilation phase checks static type correctness of XQuery expressions and data modification statements, and uses XML schemas for type inferences in the case of typed XML. Static type errors are raised if an expression could fail at run time due to a type safety violation.

XML Indexing

Query execution processes each XML instance at run time. This becomes expensive whenever the XML value is large in size or the query is evaluated on a large number of rows in a table. Consequently, a mechanism for indexing XML columns is provided to speed up queries.

B⁺trees have been extensively used for indexing relational data. A primary XML index on an XML column creates a B⁺tree index on all tags, values, and paths of the XML instances in the column. The primary XML index provides efficient evaluation of queries on XML data and reassembly of the XML result from the B⁺tree, while preserving document order and document structure.

Secondary XML indexes can be created on an XML column to speed up different classes of commonly occurring queries: PATH index for path-based queries, PROPERTY index for property bag scenarios, and VALUE index for value-based queries.

XML Schema Processing

XML columns, variables, and parameters can optionally be typed according to a collection of XML schemas that may be related (for example, by using <xs:import>) or unrelated to one another. Each typed XML instance specifies the target namespace from the XML schema collection it conforms to. During data assignment and modification, the database engine validates the instance according to the XML schema.

XML schema information is used in storage and query optimizations. Typed XML instances contain typed values in the internal, binary representation as well as in XML indexes. This provides efficient processing of typed XML data.

Relational and XML Integration

Users can store both relational and XML data within the same database. Briefly, the database engine knows how to honor the XML data model in addition to the relational data model. Relational data and SQL applications continue to behave correctly upon upgrade to SQL Server 2005. XML data residing in files and text or image columns can be moved into XML data type columns at the server. The XML column can be indexed, queried, and modified by using the XML data type methods.

The database leverages existing relational infrastructure and engine components such as the storage engine and the query processor for XML processing. For example, XML indexes create B⁺trees and query plans can be viewed in Showplan output. Because data management functionality, such as backup/restore and replication, is integrated into the relational framework, this functionality is available on XML data. In addition, the new data management features, such as database mirroring and snapshot isolation, work with the XML data type to provide a seamless user experience.

Structured data should be stored in tables and relational columns. The XML data type is a suitable choice for semi-structured and markup data using XML when the application needs to perform fine-grained query and modification of the data.

FOR XML and OpenXML enhancements

The existing FOR XML functionality has been enhanced in several ways. It works over XML data type instances and other new SQL types such as [n]varchar(max). For more information about FOR XML enhancements, see What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ] on MSDN.

The new TYPE directive generates an XML data type instance that can be assigned to an XML column, variable, or parameter, or queried by using XML data type methods. This allows the nesting of SELECT… FOR XML TYPE statements.

The PATH mode allows users to specify the path in the XML tree where a column's value should appear and—together with the aforementioned nesting—is more convenient to write than FOR XML EXPLICIT.

The directive XSINIL, used in conjunction with ELEMENTS, maps NULL to an element with the attribute xsi:nil="true". Also, the new ROOT directive allows a root node to be specified in all modes of FOR XML. The new XMLSCHEMA directive generates an XSD inline schema. FOR XML in SQL Server 2005 also allows users to specify element names to replace the default <row> in FOR XML RAW mode.

OpenXML functional enhancements consist of the ability to accept the XML data type in sp_preparedocument, and to generate XML and new SQL type columns in the rowset.

Client Access to XML Data Type

Clients can access XML data in the server in several ways. Native SQL client access by using ODBC and OLE DB delivers XML data as a Unicode string. OLE DB also provides ISequentialStream access to XML data for streaming Unicode data.

Managed access via ADO.NET in the .NET Framework 2.0 delivers XML data as a new class called SqlXml. It supports a method named CreateReader() that returns an XmlReader instance to read the returned XML. Equally, DataSet can load instances of the XML data type into columns on the mid-tier that can be edited as an XML document and saved back to SQL Server. Both of these enable SQL queries to be issued to the server to retrieve XML columns for manipulation on the mid-tier.

SOAP access directly to HTTP endpoints in SQL Server 2005 can be used to query, retrieve, and modify XML data.

Both native and managed client technologies provide new interfaces for retrieving the XML schema collection that types an XML column.

Motivating Scenarios for XML Storage

XML data is becoming more pervasive. It can represent customer data, with or without XML schemas that describe the data. The XML data and the XML schemas must be managed together. Quite often, the XML schemas for realistic applications are complex. Mapping such XML schemas to tables and columns is a complex task. Maintaining this mapping over time, when XML schemas change or new ones are added to the system, is troublesome. Quite often, XML data is stored in the file system or in text columns at the database server. Using a text column provides the benefits of data management capabilities, such as replication and backup/restore, but it does not allow query support based on the XML structure of the data. With native XML support, application development using XML becomes faster.

Custom Property Management

Some applications, such as user interface software, allow users to choose from a fixed set of properties. Others allow users to define their own custom properties of interest. These custom properties can be managed nicely if they are stored in XML format. Applications can support more types of properties than just scalar properties. They can also support:

Multi-valued properties on objects, such as multiple phone numbers.
Complex properties; for example, the author property of a document might be the author's contact information.

Object properties can be stored in an XML data type column and indexed for efficient query processing.

Data Exchange and Workflow

XML allows a platform-independent way of exchanging data among applications. The data can be modeled as messages with XML markup. Instead of constantly shredding and generating XML messages, it is prudent to store messages in XML format. This fits well with data flow requirements. An XML message reaching a workflow stage carries the current state. Each message is processed, the progress is recorded in the XML content (for example, state change), and the XML data is forwarded to the next stage of workflow processing. Because messages might be of different types or even semi-structured, and have different XML schemas associated with them, mapping them to tables is not always an easy task.

XML-based standards are emerging for different, vertical domains, such as for financial and geo-spatial data. These standards describe the structure of the data, based on which instance data can be queried and updated. Quite often, the actual data is in binary form, while the XML data provides metadata information regarding them.

As a simple example, to pass a table of input parameters to a stored procedure or function, an application converts the data to XML and passes it in as an XML data type parameter. Within the stored procedure or function, the rowset is regenerated from the XML parameter.

Document Management

Suppose a call center maintains patient records and conversations as an XML document. When a patient calls in, the call center wants to recall the previous conversation to set the context of the incoming call. This is possible by querying XML markup, which benefits the application. Furthermore, viewing the details of conversations that occurred on earlier occasions and recording the current conversation are facilitated.

Documents, such as e-mail messages, are semi-structured in nature. Documents with XML markup are becoming easier to create, for example, with Microsoft Office 2003. These XML documents can be stored in XML columns, indexed, queried, and updated. Thus, developers can do more with native XML support.

Server-Side XML Processing in SQL Server 2005

SQL Server 2005 support consists of providing one database in which you can store both relational and XML data.

The XML Data Type

You can create a table with an XML column by using the usual CREATE TABLE statement. The XML column can then be indexed in a special way.

Untyped XML

The SQL Server 2005 XML data type implements the ISO SQL-2003 standard XML data type. As such, it can store not only well-formed XML 1.0 documents but also so-called XML content fragments with text nodes and an arbitrary number of top-level elements. Checks for well-formedness of the data are performed, which does not require the XML data type to be bound to XML schemas. Data that is not well-formed is rejected.

Untyped XML is useful when the schema is not known a priori so that a mapping-based solution is not possible. It is also useful when the schema is known but mapping to a relational data model is very complex and hard to maintain, or multiple schemas exist and are late bound to the data based on external requirements.

Example: Untyped XML Column in Table

The following statement creates a table named docs with an integer primary key named pk and an untyped XML column named xCol.

CREATE TABLE docs (pk INT PRIMARY KEY, xCol XML not null)

A table can also be created with more than one XML or relational column with or without a primary key.

Typed XML

If you have XML schemas that describe your XML data in an XML schema collection, you can associate the XML schema collection with the XML column to yield typed XML. The XML schemas are used to validate the data, perform more precise type checks than untyped XML during compilation of query and data modification statements, and optimize storage and query processing.

Type information is stored in both the XML BLOB and the XML indexes, and uses more space for XML BLOBs. Typed XML yields better performance with value-based queries (that is, where the search value is more selective than, say, the path in which the value occurs) by avoiding run-time value conversions so that index seeks into the XML indexes become possible.

Typed XML columns, parameters, and variables can store XML documents or content, which you can specify as an option (DOCUMENT or CONTENT, respectively, with CONTENT as the default) at the time of declaration. Furthermore, you must provide the collection of XML schemas. Specify DOCUMENT if each XML instance has exactly one top-level element; otherwise, use CONTENT. The query compiler uses the DOCUMENT flag in type checks to infer Singleton top-level elements.

Example: Typed XML Column in Table

XML columns, variables, and parameters can be bound to a collection of XML schemas. (For more details and examples, see XML schema processing later in this paper.) Suppose myCollection is the name of one such collection. The following statement creates a table named XmlCatalog with an XML column Document typed using myCollection. The typed XML column is also specified to accept XML fragments, not just XML documents.

CREATE TABLE XmlCatalog ( 
   ID INT PRIMARY KEY, 
   Document XML(CONTENT myCollection))

Constraining XML Data Type Columns

In addition to typing an XML column, you can use relational (column or row) constraints on typed and untyped XML data type columns. Most SQL constraints are applicable to XML columns as well. The notable exceptions to this are unique, primary key, and foreign key constraints, since XML data type instances cannot be compared. Thus, you can specify an XML column to be nullable or non-nullable, supply a default value, and define CHECK constraints on the column. For example, an untyped XML column can have a CHECK constraint to verify that the stored XML instances confirm to an XML schema.

Use constraints under the following conditions:

Your business rules cannot be expressed in XML schemas. For example, the delivery address of a flower shop must be within 50 miles of its business location. This can be written as a constraint on the XML column. The constraint may involve XML data type methods.
Your constraint involves other XML or non-XML columns in the table. An example is the enforcing the constraint so that the value of Customer ID (that is, the result of the path expression /Customer/@CustId) in an XML instance equals the value of an integer in the CustomerID column.
You want to limit an XML schema collection to the top-level elements or the schema namespaces that are to be permitted in a typed XML column. This is useful when the same XML schema collection is used to type multiple XML columns and each XML column should contain a different top-level element. It is also useful when the insertion of XML instances conforming to only the latest version of an XML schema is allowed.

Example: Constraining XML Column

To ascertain that the <last-name> of an <author> of a <book> is different from the <author>'s <first-name>, the following CHECK constraint CK_name is specified. XML data type methods must be provided within a user-defined function, for which udf_Check_Names() is introduced.

CREATE FUNCTION udf_Check_Names (@xmlData XML) 
RETURNS int AS 
BEGIN
RETURN (SELECT @xmlData.exist('/book/author[first-name = last-name]'))
END
GO

CREATE TABLE docs (pk INT PRIMARY KEY, 
xCol XML not null
CONSTRAINT CK_name CHECK (dbo.udf_Check_Names(xCol) = 0))
GO

Example: Limiting XML Schema Collection

Suppose the XML column Document in the table XmlCatalog (described in the typed XML column in table example) is to be constrained to allow the storage of only a single <dvdstore> element per row and to prevent the insertion of <bookstore> elements. The following CHECK constraint achieves this.

CREATE FUNCTION udf_Check_Top_Level_Nodes (
     @xmlData XML(CONTENT myCollection)) 
RETURNS int AS 
BEGIN
     RETURN (@xmlData.value ('count(/*)', 'int'))
END

CREATE FUNCTION udf_Check_Dvdstore_Nodes (
     @xmlData XML(CONTENT myCollection)) 
RETURNS int AS 
BEGIN
RETURN (@xmlData.value('declare default element namespace "http://myDVD";
      count(/dvdstore)', 'int'))
END

ALTER TABLE XmlCatalog 
ADD CONSTRAINT dvdstore_only_check 
CHECK (dbo.udf_Check_Top_Level_Nodes (Document) = 1 AND 
      dbo.udf_Check_Dvdstore_Nodes (Document) = 1)

Text Encoding

SQL Server 2005 stores XML data as Unicode (UTF-16). XML data retrieved from the server comes out in UTF-16 encoding as well. If you want a different encoding, you must perform the necessary conversion, either by casting or on the mid-tier, after retrieving the data. For example, you could cast your XML data to varchar type on the server, in which case the database engine serializes the XML with an encoding determined by the collation of the varchar.

Storing XML Data

You can supply an XML value for an XML column, parameter, or variable in multiple ways:

As a character or binary SQL type that is implicitly converted to XML data type.
As the content of a file.
As the output of the XML publishing mechanism FOR XML with the TYPE directive that generates an XML data type instance.

The supplied value is checked for well-formedness. An XML column by default allows both XML documents and XML fragments to be stored. If the data fails the well-formedness check, it is rejected with an appropriate error message.

For typed XML, the supplied value is checked for conformance to XML schemas that are registered with the XML schema collection that is typing the XML column. The XML instance is rejected if it fails this validation. Furthermore, the DOCUMENT flag on typed XML restricts accepted values to XML documents only while CONTENT allows both XML document and content to be supplied.

Example: Inserting Data into Untyped XML Column

The following statement inserts a new row into the table named docs with the value 1 for the integer column pk and a <book> instance for the XML column. The <book> data, supplied as a string, is implicitly converted to XML data type and checked for well-formedness during insertion.

INSERT INTO docs VALUES (1, '<book genre="security"
publicationdate="2002" ISBN="0-7356-1588-2">
   <title>Writing Secure Code</title>
   <author>
      <first-name>Michael</first-name>
      <last-name>Howard</last-name>
   </author>
   <author>
      <first-name>David</first-name>
      <last-name>LeBlanc</last-name>
   </author>
   <price>39.99</price>
</book>')
INSERT INTO docs VALUES (2, 
'<doc id="123">
    <sections>
   <section num="1"><title>XML Schema</title></section>
   <section num="3"><title>Benefits</title></section>
   <section num="4"><title>Features</title></section>
    </sections>
</doc>')

Example: Inserting Data into Untyped XML Column from File

The INSERT statement in the following code segment reads the content of file C:\temp\xmlfile.xml as a BLOB by using OPENROWSET. A new row is inserted into the table named docs with a value of 10 for the primary key and the BLOB for the XML column xCol. A well-formedness check occurs when file content is assigned to the XML column.

INSERT INTO docs 
SELECT 10, xCol
FROM    (SELECT * FROM OPENROWSET 
      (BULK 'C:\temp\xmlfile.xml',
      SINGLE_BLOB) AS xCol) AS R(xCol)

Example: Inserting Data into Typed XML Column

Typed XML columns require that the XML instance data specify the target namespace of the XML schema that is used to type it (the namespace may be empty). In the following code segment, this is done via the namespace declaration xmlns=http://myDVD.

INSERT XmlCatalog VALUES(2, 
'<?xml version="1.0"?>
<dvdstore xmlns="http://myDVD">
  <dvd genre="Comedy" releasedate="2003">
    <title>My Big Fat Greek Wedding</title>
    <price>19.99</price>
  </dvd>
</dvdstore>')

Example: Storing XML Data Generated Using FOR XML with TYPE Directive

FOR XML has been enhanced with a TYPE directive to generate the result as an XML data type instance. The resulting XML can be assigned to an XML column, variable, or parameter. In the following statement, the XML instance generated by using FOR XML TYPE is assigned to an XML data type variable @xVar. The variable can be queried using XML data type methods.

DECLARE @xVar XML
SET     @xVar = (SELECT * FROM docs FOR XML AUTO,TYPE)

Storage Representation

XML data type instances are stored in an internal, binary representation that is streamable and optimized for efficient parsing. Tags are mapped to integer values and the mapped values are stored in the internal representation. This yields some compression of the data, as well.

For untyped XML, node values are stored as Unicode (UTF-16) strings, so that run-time type conversion is required to perform operations. For example, to evaluate the predicate /book/price > 9.99, the value of the book's price is converted to decimal. On the other hand, for typed XML, values are encoded in the type specified in the XML schemas. This makes parsing of the data much more efficient and avoids run-time conversion.

The stored binary form is limited to 2 GB per XML instance, which can accommodate most XML data. Furthermore, the depth of XML hierarchy is limited to 128 levels.

The InfoSet content of the XML data is preserved. It may not be an exact copy of the text XML, since the following information is not retained: insignificant white spaces, order of attributes, namespace prefixes, and XML declaration. (For more information about InfoSet, see the WC3 recommendations for XML information at http://www.w3.org/TR/xml-infoset [ http://www.w3.org/tr/xml-infoset ] ).

Data Modeling Considerations

Quite often, a combination of relational and XML data type columns is appropriate for data modeling. Some of the values from your XML data can be stored in relational columns, and the rest, or the entire XML value, stored in an XML column. This may yield better performance and locking characteristics.

Values within the XML data can be promoted to computed columns in the same table for Singleton values (that is, single-valued properties). A multi-valued property requires a separate table for the property, which must be populated and maintained by using triggers. Queries need to be written directly against the property table.

The granularity of the XML data stored in an XML column is critical for locking and update characteristic. SQL Server employs the same locking mechanism for both XML and non-XML data. When the granularity is large, locking large XML instances for updates causes throughput to decline in a multi-user scenario. On the other hand, severe decomposition loses object encapsulation and raises re-assembly cost.

Querying and Modifying XML Data

Querying XML instances stored in an XML column requires parsing binary XML data in the column. Parsing binary XML is much faster than parsing the text form of the XML data. XML indexing avoids reparsing and is discussed in Indexing XML data later in this paper.

Methods on XML Data Type

You can retrieve entire XML values or you can retrieve parts of XML instances. This is possible by using four XML data type methods that take an XQuery expression as argument: query(), value(), exist() and nodes(). A fifth method, modify(), allows modification of XML data and accepts an XML data modification statement as input.

The query() method is useful for extracting parts of an XML instance. The XQuery expression evaluates to a list of XML nodes. The subtree rooted at each of these nodes is returned in document order. The result type is untyped XML.
The value() method extracts a scalar value from an XML instance. It returns the value of the node the XQuery expression evaluates to. This value is converted to a Transact-SQL type specified as the second argument of the value() method.
The exist() method is useful for existential checks on an XML instance. It returns 1 if the XQuery expression evaluates to non-null node list; otherwise it returns 0.
The nodes() method yields instances of a special XML data type, each of which has its context set to a different node that the XQuery expression evaluates to. The special XML data type supports the query(), value(), nodes(), and exist() methods, and can be used in count(*) aggregations and NULL checks. All other uses result in an error.
The modify() method permits modifying parts of an XML instance, such as adding or deleting subtrees, or replacing scalar values such as the price of a book from 9.99 to a 39.99.

Example: Using Query() Method

Consider the following query on the XML column named xCol of table docs that extracts <section> elements anywhere under the <doc> element whose id is 123. The query also retrieves the value from the integer primary key column. The query() method in the SELECT list is evaluated for each row in the table yielding a sequence of <section> elements, which, together with their subtrees, are retrieved in document order. Each XML instance without a <doc> element having id 123 or without a <section> element below it returns no result; the return value of the query() method is empty XML.

SELECT pk, xCol.query('/doc[@id = 123]//section')   
FROM   docs

Empty XML values can be filtered in an outer SELECT statement. Alternatively, the exist() method may be used, as shown in the next example.

Example: Using Exist() Method

Consider the following query. It involves the query() and exist() methods on the XML column xCol of table docs. The exist() method evaluates the path expression /doc[@id = 123], checking for the existence of a top-level <doc> element that has an attribute named id with the value 123. For each such row, the query() method in the SELECT clause is evaluated; in this example, the query() method yields a sequence of <section> elements anywhere under the <doc> element. Any row that returns 0 from the exist() method is skipped.

SELECT xCol.query('/doc[@id = 123]//section')   
FROM   docs
WHERE  xCol.exist ('/doc[@id = 123]') = 1

Example: Using Value() Method

The following query extracts the title of the third section of a document as Unicode string using the value() method. The SQL type nvarchar(max) of the result is specified as the second argument of the value() method. The XQuery function data() extracts a scalar value from the <title> node.

SELECT xCol.value(
   'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)') 
FROM   docs

Example: Using GROUP BY with XML Data Type Methods

XML data type methods are not allowed in the Transact-SQL GROUP BY clause. However, you can extract values from an XML column in a subquery, alias the grouping columns, and use the aliases in the GROUP BY clause. The following query illustrates this by computing the number of books published by authors with the same first name.

SELECT Fname, count(Fname)
FROM    
  (SELECT nref.value('(author/first-name)[1]', 'nvarchar(max)') Fname
   FROM   docs CROSS APPLY xCol.nodes('/book') T(nref)
   WHERE  nref.exist ('author/first-name') = 1) Result
GROUP BY FName
ORDER BY Fname

Example: Executing SQLCMD

Executing XQuery and XML data modification statements requires that the connection option QUOTED_IDENTIFIER be ON. The default value of this option in SQLCMD is OFF. This must be changed to ON by using the –I switch.

sqlcmd -E -I -d <database> -Q "SELECT xCol.query('//author') FROM docs"

Example: Difference in Behavior Between cast() and value() Method

The query() method returns XML data type instances, which have special XML characters entitized when converted to a string type. A value() method, on the other hand, returns an SQL type value, which does not entitize the special characters. This difference is evident in event notification. While the first of the following two queries may contain entitized carriage returns:

SELECT EVENTDATA().value(' 
(/EVENT_INSTANCE/TSQLCommand/CommandText/text())[1]','nvarchar(max)')

the second query does not:

SELECT CAST (EVENTDATA().query(' 
  /EVENT_INSTANCE/TSQLCommand/CommandText/text()') AS nvarchar(max))

The XQuery Language

There are numerous sources of XML from Office documents stored on the file system, Web services, or configuration files. In fact, data is being increasingly generated either in XML format or as virtual XML documents. To cope with this increasing amount of data with a powerful query language, XQuery was conceived. The justification for XQuery is described in the W3C XQuery Language specification [ http://www.w3.org/tr/xquery ] as:

A query language that uses the structure of XML intelligently can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
XQuery is designed to meet the requirements identified by the W3C XML Query Working Group XML Query 1.0 Requirements and the use cases in XML Query Use Cases. It is designed to be a language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.
XQuery can also be summarized in the following statement: The XQuery language is to XML as the SQL language is to relational databases.

A subset of XQuery [ http://www.w3.org/tr/xquery/ ] embedded in Transact-SQL is the language supported for querying the XML data type. The language is under development (currently in Last Call) by the World Wide Web Consortium (W3C) with the participation of all major database vendors including Microsoft Corporation. Our implementation is aligned with the July 2004 draft of XQuery.

XQuery includes XPath 2.0 as a navigation language. The implementation of XQuery by SQL Server 2005 provides constructs for iteration over nodes (FOR), node check (WHERE), returning values (RETURN), and sorting (ORDER BY). It also provides element construction for reshaping data during querying.

SQL Server 2005 also provides language constructs for data modification language (DML) of the XML data type. (For more information, see XML Data Modification later in this document). The following example shows how to use XQuery on the XML data type.

Example: Using Rich Language Constructs in XQuery

The following query shows several XQuery language constructs used together. It returns the title, wrapped in a new tag <topic>, of sections with section number 3 and higher from a document with id 123.

SELECT pk, xCol.query('
   FOR $s in /doc[@id = 123]//section
   WHERE $s/@num >= 3
   RETURN <topic>{data($s/title)}</topic>')   
FROM docs

The FOR iterates over all <section> elements under <doc> elements with id 123, and binds each such <section> to the variable $s. The WHERE clause ensures that the section number (@num attribute of the <section> element) is 3 or higher. The query returns the value of the section <title> in document order wrapped in a constructed element called <topic>.

Query Compilation and Execution

The SQL statement is parsed by the SQL parser. When it encounters the XQuery expression, it jumps into the XQuery compiler, which then compiles the XQuery expression. This yields a query tree that is grafted into the tree for the overall query.

The overall query tree undergoes query optimization and produces a physical query plan that it picked based on cost-based estimates. The Showplan output shows mostly relational operators and some new operators such as UDX for XML processing.

Query execution is tuple-oriented as in the rest of the relational framework. The WHERE clause is evaluated on each row of the table named docs; this involves parsing the XML BLOB at run time to evaluate XML data type methods. If the condition is satisfied, the row is locked and the SELECT clause is evaluated in the row. The result is produced as XML data type for query() method and converted into the specified target type for value() method.

If, on the other hand, the row does not satisfy the conditions in the WHERE clause, it is skipped and execution moves to the next row.

XML Data Modification

SQL Server 2005 provides constructs for data modification as an extension to XQuery. Subtrees can be inserted before or after a specified node, or as the leftmost or rightmost child. Furthermore, a subtree can be inserted into a parent node, in which case it becomes the rightmost child of the parent. Attribute, element, and text node insertions are all supported.

Deletion of subtrees is supported. In this case, the entire subtree is removed from the XML instance.

Scalar values can be replaced with new scalar values.

Example: Insertion of Subtree into XML Instances

This example shows the use of the modify() method to insert a new <section> element to the right of the <section> element whose number is 1.

UPDATE docs SET xCol.modify('
  insert 
   <section num="2">   
        <title>Background</title>
   </section>
  after (/doc//section[@num=1])[1]')

Example: Update Price of this Book to $49.99

The following UPDATE statement replaces the <price> of a book whose ISBN is 1-8610-0311-0 to $49.99. The XML instance is typed with the XML schema http://myBooks, hence the namespace declaration in the XML data modification statement.

UPDATE XmlCatalog
SET    Document.modify ('
    declare namespace bk = "http://myBooks";
    replace value of (/bk:bookstore/bk:book
[@ISBN="1-861003-11-0"]/bk:price)[1] with 49.99')

Type Checking and Static Errors

XQuery introduces type checking. The compilation phase checks the static type correctness of XQuery expressions and data modification statements, and uses XML schemas for type inferences in case of typed XML. It raises static type errors if an expression could fail at run time due to a type safety violation. Examples of static errors are the addition of a string to an integer, receiving a sequence of values where the operation expects a single value, and querying for a non-existent node for typed data. Explicit casting to the proper type is a workaround for static errors resulting from type mismatches. XQuery run-time errors are converted into empty sequences.

Location steps, function parameters, and operators (for example, eq) that require Singletons return an error if the compiler cannot determine whether a Singleton is guaranteed at run time. The problem arises often with untyped data. For example, the lookup of an attribute requires a Singleton parent element; an ordinal selecting a single parent node is adequate.

Example: Type Checks in value() Method

The following query on an untyped XML column requires an ordinal specification on //author/last-name since the value() method expects a Singleton node as the first argument. Without it, the compiler cannot determine whether only one <last-name> node will occur at run time.

SELECT xCol.value('(//author/last-name)[1]',
   'nvarchar(50)') LastName
FROM   docs

The evaluation of the node()-value() combination to extract attribute values may not require the ordinal specification, as shown in the next example.

Example: Known Singleton

The nodes() method in the following statement generates a separate row for each <book> element. The value() method evaluated on a <book> node extracts the value of @genre, which, being an attribute, is a Singleton.

SELECT nref.value('@genre', 'varchar(max)') LastName
FROM   docs CROSS APPLY xCol.nodes('//book') AS R(nref)

Binding Relational Data in XQuery and XML DML

When your data resides in a combination of relational and XML data type columns, you might want to write queries that combine relational and XML data processing. You can convert the data in relational and XML columns to an XML data type instance using FOR XML with the TYPE directive and query it by using XQuery. Conversely, you can generate a rowset from XML values and query it by using Transact-SQL.

A more convenient and efficient way of writing cross-domain queries is to use the value of an SQL variable or column within XQuery or XML data modification expressions as follows:

Apply the value of a SQL variable in your XQuery or XML DML expression using sql:variable().
Use the values from a relational column in XQuery or XML DML context with sql:column().

This approach allows applications to parameterize queries, as shown in the following example. sql:column() is used in a similar way and provides additional benefits. Indexes over the column can be used for efficiency as decided by the cost-based query optimizer. Furthermore, a computed column can be used.

XML and user-defined types cannot be used with sql:variable() and sql:column().

Example: Binding Relational Data Using sql:variable()

In this query, the ISBN of a <book> element is passed in using a SQL variable named @isbn. Instead of using a constant, sql:variable() supplies the value of the ISBN, and the query can be used to search for any ISBN, not just the one whose ISBN is 0-7356-1588-2.

DECLARE @isbn varchar(20)
SET     @isbn = '0-7356-1588-2'
SELECT  xCol
FROM    docs
WHERE   xCol.exist ('/book[@ISBN = sql:variable("@isbn")]') = 1

Example: Specifying Ordinal Position Using sql:variable()

The following query retrieves the <last-name> of the N-th <author> of a <book> where the value of N is specified by using sql:variable():

DECLARE @aNum int
SET     @aNum = 2
SELECT  xCol.value ('(/book/author,
    [sql:variable("@aNum")]/last-name)[1]''nvarchar(64)')
FROM    docs

Rowset Generation from XML Data

In custom property management and data interchange scenarios, applications quite often map some part of the XML data to a rowset. For example, to pass a table of input parameters to a stored procedure or function, an application converts the data to XML and passes it in as an XML data type parameter. Within the stored procedure or function, the rowset is regenerated from the XML parameter.

SQL Server 2000 provides OpenXml() for this purpose. It is a facility for generating a rowset from an XML instance by specifying the relational schema for the rowset and how values inside the XML instance map to columns in the rowset.

Alternatively, the nodes() method can be used to generate node contexts within an XML instance, and use the node contexts in value(), query(), exist(), and nodes() methods to generate the desired rowset. The nodes() method accepts an XQuery expression, evaluates it on each XML instance in an XML column, and uses XML indexes effectively. The next example illustrates the use of the nodes() method for rowset generation.

Example: Extract Properties from XML Instance

Suppose you want to extract first and last names of authors, whose first name is not "David", as a rowset consisting of two columns, FirstName and LastName. Using the nodes() and value() methods, you can achieve this as follows:

SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
       nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM   docs CROSS APPLY xCol.nodes('//author') AS R(nref)
WHERE  nref.exist('.[first-name != "David"]') = 1

In this example, nodes('//author') yields a rowset of references to <author> elements for each XML instance. The first and last names of authors are obtained by evaluating value() methods relative to those references. For good performance, the XML column should be indexed, which is the topic of the next section.

Example: Extract Properties from XML Variable

The CROSS APPLY operator in the previous query is not required when properties are extracted from an XML variable or parameter. This example considers an XML variable named @xVar to which a <book> instance is assigned and retrieves the <first-name> and <last-name> of authors.

DECLARE @xVar XML
SET @xVar = 
  '<book genre="security" publicationdate="2002" ISBN="0-7356-1588-2">
   <title>Writing Secure Code</title>
   <author>
      <first-name>Michael</first-name>
      <last-name>Howard</last-name>
   </author>
   <author>
      <first-name>David</first-name>
      <last-name>LeBlanc</last-name>
   </author>
   <price>39.99</price>
   </book>'

SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
       nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM   @xVar.nodes('//author') AS R(nref)
WHERE  nref.exist('.[first-name != "David"]') = 1

Indexing XML Data

XML data is stored in an internal binary form and can be up to 2 GB in storage. Each query parses the XML blob at run time one or more times in each row of the table. This makes query processing slow. If querying is common in the workload, it is beneficial to index the XML column, although the cost of XML index maintenance during data modification must be taken into account.

XML indexes are created by using a new DDL statement on typed and untyped XML columns. This creates a B⁺tree for all XML instances in the column. The first index on an XML column is the primary XML index. Using it, three types of secondary XML indexes are supported on the XML column to speed up common classes of queries, as described in the following section.

Primary XML Index

The primary XML index requires a clustered index on the primary key of the base table (that is, the table in which the XML column is defined). It creates a B⁺tree on a subset of the Infoset items of XML nodes. Columns of the B⁺tree represent tags such as element and attribute names, node values, and node types. Other columns capture the document order and structure in the XML data, and the path from the root of the XML instance to each node for efficient evaluation of path expressions. The primary key of the base table is duplicated in the primary XML index to correlate index rows with base table rows.

Tags and type names given in XML schemas are mapped to integer values and the mapped values are stored in the B⁺tree to optimize storage. The path column in the index stores a concatenation of the mapped values in reversed order, that is, from a node to the root of the XML instance. The reverse representation allows path values to be matched when the path suffix is known (for example, in a path expression such as //author/last-name).

If the base table is partitioned, the primary XML index is partitioned the same way; that is, using the same partitioning function and partitioning scheme.

Full XML instances are retrieved from XML columns (for example, SELECT * FROM docs or SELECT xCol FROM docs). Queries involving XML data type methods use the primary XML index, returning scalar values, or XML subtrees from the index itself.

Example: Creating Primary XML Index

The following statement creates an XML index named idx_xCol on the XML column xCol of table docs.

CREATE PRIMARY XML INDEX idx_xCol on docs (xCol)

Secondary XML Indexes

Once the primary XML index is created, secondary XML indexes can be created to speed up different classes of queries within a workload. Three types of secondary XML indexes—PATH, PROPERTY, and VALUE—benefit path-based queries, custom property management scenarios, and value-based queries, respectively.

The PATH index builds a B+-tree on the columns (path, value) of the primary XML index. A value for the path is computed from a path expression; a node's value, if one is provided, is used as well. Since the leading fields of the PATH index are known, index seek into the PATH index speeds up evaluation of the path expression. The most common case is the use of the exist() method on XML columns in the WHERE clause of a SELECT statement.
The PROPERTY index creates a B+-tree on the columns (PK, path, value) of the primary XML index, where PK is the primary key of the base table. This index benefits property value lookups within an XML instance.
The VALUE index creates a B+-tree on the columns (value, path) of the primary XML index. This index benefits queries where a node's value is known but its path is imprecisely specified in the query. This typically occurs with descendant axes lookups, such as //author/last-name [. ="Howard"], where <author> elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for <book> elements with some attribute having the value "novel". Furthermore, the VALUE index is useful for value-based range scan of typed XML.

Up to 128 levels of the XML hierarchy are accommodated; XML instances containing longer paths are rejected during insertion and modification.

Similarly, up to the first 128 bytes of a node's value are indexed; longer values are accommodated within the system and are not indexed.

Example: Path-Based Lookup

Suppose the following query is common in your workload.

SELECT xCol
FROM   docs
WHERE  xCol.exist ('/book[@genre = "security"]') = 1

The path expression /book/@genre and the value securitycorrespond to the key fields of the PATH index. Consequently, a secondary XML index of type PATH is helpful for this workload.

CREATE XML INDEX idx_xCol_Path on docs (xCol)
   USING XML INDEX idx_xCol FOR PATH

Example: Fetching Properties of an Object

Consider the following query that retrieves the properties genre, title, and ISBN of a book from each row in table T.

SELECT xCol.value ('(/book/@genre)[1]', 'varchar(50)'),
    xCol.value ('(/book/title)[1]', 'varchar(50)'),
    xCol.value ('(/book/@ISBN)[1]', 'varchar(50)')
FROM    docs

The property index is useful in this case and is created as follows:

CREATE XML INDEX idx_xCol_Property on docs (xCol)
   USING XML INDEX idx_xCol FOR PROPERTY

Example: Value-Based Query

In the following query, the descendant-or-self axis (//-operator) specifies a partial path so that the lookup based on the value of ISBN benefits from the use of the VALUE index.

SELECT xCol
FROM    docs
WHERE    xCol.exist ('//book/@ISBN[. = "0-7356-1588-2"]') = 1

The VALUE index is created as follows:

CREATE XML INDEX idx_xCol_Value on docs (xCol)
   USING XML INDEX idx_xCol FOR VALUE

Content Indexing

You can create a full-text index on XML columns; this indexes the content of the XML values while ignoring the XML markup. Attribute values are not full-text indexed (since they are considered part of the markup) and element tags are used as token boundaries. You can create both XML and full-text indexes on an XML column, and combine full-text search with XML index usage. Full-text search using CONTAINS() and XQuery fn:contains() have different semantics. The latter as implemented is a case-sensitive, substring search, while the former is a token match using stemming.

Use the full-text index as the first filter to narrow down the choices and then apply XQuery to filter further. The overall semantics is that of case-sensitive substring searches. Alternatively, a part of the XML data can be promoted to a computed XML column XC on which a full-text index is created. This defines XC as the full-text search context.

Example: Create Full-Text Index on XML Column

The steps for creating full-text index on an XML column are the same as those for other SQL type columns. A unique key column on the base table is required. The DDL statements are as follows, in which PK__docs__7F60ED59 is the single-column primary key index of the table.

CREATE FULLTEXT CATALOG ft AS DEFAULT
CREATE FULLTEXT INDEX ON dbo.docs (xCol) KEY INDEX PK__docs__7F60ED59

Example: Combining Full-Text Search with XML Querying

The following query checks that an XML value contains the word Secure in the title of a book.

SELECT * 
FROM   docs 
WHERE  CONTAINS(xCol,'Secure') 
AND    xCol.exist('/book/title/text()[fn:contains(.,"Secure")]') =1

The CONTAINS() method uses the full-text index to subset the XML values that contain the word Secure anywhere in the document. The exist() method ensures that the word Secure occurs in the title of a book.

Example: Search Context Defined by XML Data Type Methods

An alternative to the exist() method is to use the LIKE operator on the text value of the context node as shown in the following code. This yields a case-insensitive, substring search.

SELECT pk, xCol 
FROM     docs CROSS APPLY xCol.nodes('/book/title/text()') title(tRef)
WHERE  CONTAINS (xCol, 'Secure')   
AND    title.tRef.value ('.', 'NVARCHAR(MAX)') LIKE '%Secure%'

Query Execution Using XML Indexes

XML indexes speed up query execution. Queries are always compiled against the primary XML index on an XML column, if one exists. A single query plan is produced for the entire query (both relational and XML parts), which is optimized by using the database engine's cost-based optimizer. Secondary XML indexes are selected based on the query optimizer's cost estimates.

Catalog Views for XML Indexes

Catalog views exist to provide metadata information regarding XML indexes. The catalog view sys.indexes contains entries for XML indexes with the index "type" 3. The name column contains the name of the XML index.

XML indexes are also recorded in the catalog view sys.xml_indexes, which contains all the columns of sys.indexes and a few special ones meaningful for XML indexes. The value NULL in the column secondary_type indicates a primary XML index; the values 'P', 'R', and 'V' stand for PATH, PROPERTY, and VALUE secondary XML indexes, respectively. The column secondary_type_desc contains NULL for a primary XML index, and the strings "PATH," "PROPERTY," and "VALUE" for the three types of secondary XML indexes.

Space usage of XML indexes can be found by using the table-valued function sys.dm_db_index_physical_stats(). This function provides information such as the number of disk pages occupied, average row size in bytes, number of records, and other information for all index types, including XML indexes. This information is available for each database partition; XML indexes use the same partitioning scheme and partitioning function of the base table.

Example: Space Usage of XML Indexes

SELECT sum (page_count)
FROM    sys.dm_db_index_physical_stats (db_id(), object_id('docs'), 
DEFAULT, DEFAULT, 'DETAILED') SDPS
    JOIN sys.xml_indexes SXI ON (SXI.index_id = SDPS.index_id)
WHERE    SXI.name = 'idx_xCol_Path'

This statement yields the number of disk pages occupied by the XML index idx_xCol_Path in table T across all partitions. Without the sum() function, the result would return the disk page usage per partition.

XML Schema Processing

XML schemas are optional in the system. As previously mentioned, XML data not bound to XML schemas is considered to be untyped—XML node values are stored as Unicode strings, and XML instances are checked for well-formedness. An untyped XML column can be indexed.

XML typing is done by associating an XML data type with XML schemas that are registered with an XML schema collection. A new DDL statement allows the creation of an XML schema collection with which one or more XML schemas may be registered. An XML column, parameter, or variable bound to an XML schema collection is typed according to all the XML schemas in the collection. Within an XML schema collection, the type system identifies each XML schema using its target namespace.

Each top-level XML element in an XML instance must specify a possible empty target namespace it conforms to. Data is validated during insertion and modification according to the target namespace of each top-level element. The binary XML representation encodes typed values based on the associated XML schema information and is fully described, so that reparsing it is more efficient as compared to reparsing untyped XML. Values are properly typed in XML indexes as well (for example, /book/price is stored as decimal if it is defined in the XML schema as xs:decimal).

During query compilation, XML schemas are used for type checking and static errors are issued for type mismatches. The query compiler also uses XML schemas for query optimizations.

The database engine's metadata subsystem contains XML type information such as XML schema collections and their contained XML schemas, and the mapping between the primitive XSD and relational type systems. Almost all of the W3C XML Schema 1.0 specification is supported. (For more information on the W3C XML Schema 1.0 specification, see http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ [ http://www.w3.org/tr/2001/rec-xmlschema-1-20010502/ ] and http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/ [ http://www.w3.org/tr/2001/rec-xmlschema-2-20010502/ ] ) Comments and annotations in XML schema documents are not preserved, and key/keyref is not supported.

XML Schema Collections

An XML schema collection is a metadata entity, scoped by a relational schema. It contains one or more XML schemas that may be related (for example, using <xs:import>) or unrelated. Individual XML schemas within an XML schema collection are identified by using their target namespace. XML schema collections are securable entities, much like tables.

An XML schema collection is created by using CREATE XML SCHEMA COLLECTION syntax and providing one or more XML schemas. You can then type an XML column by using the XML schema collection. This design yields a flexible data model in which XML that is typed according to different XML schemas can be stored in the same column. This is especially convenient when the number of XML schemas is large. Furthermore, this design supports XML schema evolution to some extent.

Additionally, the option DOCUMENT or CONTENT on a typed XML column specifies whether XML trees or fragments, respectively, can be stored in the XML column. The default behavior is CONTENT. For DOCUMENT, each XML instance must specify the target namespace of its top-level element, according to which it is validated and typed. For CONTENT, on the other hand, each top-level element can specify any one of the target namespaces in the XML schema collection. The XML instance is validated and typed according to all the target namespaces occurring in the instance.

Example: Creating an XML Schema Collection

Suppose you want to use an XML schema with target namespace http://myBooks to type your XML instances. Create an XML schema collection named myCollection and supply the XML schema as the content of myCollection as shown in the following code segment.

CREATE XML SCHEMA COLLECTION myCollection AS 
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
  xmlns="http://myBooks" 
   elementFormDefault="qualified"
  targetNamespace="http://myBooks">
  <xsd:element name="bookstore" type="bookstoreType" />
  <xsd:complexType name="bookstoreType">
   <xsd:sequence maxOccurs="unbounded">
     <xsd:element name="book" type="bookType" />
   </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="bookType">
   <xsd:sequence>
     <xsd:element name="title" type="xsd:string" />
     <xsd:element name="author" type="authorName" />
     <xsd:element name="price" type="xsd:decimal" />
   </xsd:sequence>
   <xsd:attribute name="genre" type="xsd:string" />
   <xsd:attribute name="publicationdate" type="xsd:string" />
   <xsd:attribute name="ISBN" type="xsd:string" />
  </xsd:complexType>
  <xsd:complexType name="authorName">
   <xsd:sequence>
     <xsd:element name="first-name" type="xsd:string" />
     <xsd:element name="last-name" type="xsd:string" />
   </xsd:sequence>
  </xsd:complexType>
</xsd:schema>'

A new metadata entity is created for myCollection in which the XML schema is registered. A new row can be added to the table named XmlCatalog (for the table definition, see Example: typed XML column in table earlier in this document) as follows:

INSERT XmlCatalog VALUES(1, '<?xml version="1.0"?>
<bookstore xmlns="http://myBooks">
  <book genre="autobiography" publicationdate="1981" 
      ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" 
      ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991" 
      ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <first-name>Sidas</first-name>
      <last-name>Plato</last-name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
')

Modifying XML Schema Collections

The ALTER XML SCHEMA COLLECTION statement supports extending an XML schema in an XML schema collection with new top-level schema components, and registering new XML schemas with the XML schema collection. This is illustrated in the following example.

Example: Altering an XML Schema Collection

The following statement shows how a new XML schema with target namespace http://myDVD can be added to the XML schema collection myCollection.

ALTER XML SCHEMA COLLECTION myCollection ADD
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
  xmlns="http://myDVD" 
   elementFormDefault="qualified"
  targetNamespace="http://myDVD">
  <xsd:element name="dvdstore" type="dvdstoreType" />
  <xsd:complexType name="dvdstoreType">
   <xsd:sequence maxOccurs="unbounded">
     <xsd:element name="dvd" type="dvdType" />
   </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="dvdType">
   <xsd:sequence>
     <xsd:element name="title" type="xsd:string" />
     <xsd:element name="price" type="xsd:decimal" />
   </xsd:sequence>
   <xsd:attribute name="genre" type="xsd:string" />
   <xsd:attribute name="releasedate" type="xsd:string" />
  </xsd:complexType>
</xsd:schema>'

Catalog Views for XML Schema Collections

SQL catalog views of XML schema collections allow users to reconstruct the contents of individual XML schema namespaces. XML schema collections are enumerated in the catalog view sys.xml_schema_collections. The XML schema collection "sys" is defined by system and contains predefined namespaces that can be used in all user-defined XML schema collections without having to load them explicitly. This list contains the namespaces for xml, xs, xsi, fn, and xdt.

Two other catalog views worth mentioning are sys.xml_schema_namespaces, which enumerates all namespaces within each XML schema collection; and sys.xml_schema_components, which enumerates all XML schema components within each XML schema.

The built-in function XML_SCHEMA_NAMESPACE() accepts the names of a relational schema, an XML schema collection, and optionally the target namespace of an XML schema. It returns an XML data type instance containing the XML schema. If the target namespace argument is left out, the built-in function returns an XML instance that contains all the XML schemas in the XML schema collection, except for the predefined XML schemas.

Example: Enumerate XML Namespaces in XML Schema Collection

Use the following query for the XML schema collection myCollection.

SELECT XSN.name
FROM    sys.xml_schema_collections XSC 
    JOIN sys.xml_schema_namespaces XSN ON 
        (XSC.xml_collection_id = XSN.xml_collection_id)
WHERE    XSC.name = 'myCollection'

Example: Output a Specified XML Schema from an XML Schema Collection

The following statement outputs the XML schema with target namespace http://myBooks from the XML schema collection myCollection within the (relational) schema dbo.

SELECT XML_SCHEMA_NAMESPACE (N'dbo', N'myCollection', 
   N'http://myBooks')

Access Control on XML Schema Collections

XML schema collections can be secured like any SQL object by using the security model in SQL Server 2005. You can grant a principal the privilege to create XML schema collections within a database. Each XML schema collection supports the permissions ALTER, CONTROL, TAKE OWNERSHIP, REFERENCES, EXECUTE, and VIEW DEFINITION.

The ALTER permission is required to execute an ALTER XML SCHEMA COLLECTION statement.
The TAKE OWNERSHIP permission is required to transfer ownership of the XML schema collection from one principal to another by executing an ALTER AUTHORIZATION statement.
The REFERENCES permission authorizes the principal to use the XML schema collection wherever schema binding is required, such as to type or constraint XML columns and parameters.
The EXECUTE permission is required to validate values inserted or updated by the principal against the XML schema collection. It is also required to query values from a typed XML column, variable, or parameter using the XML data type.
The VIEW DEFINITION permission allows the principal to access rows in catalog views corresponding to the XML schema collection, all XML schemas contained in it, and all schema components contained in those XML schemas.
The CONTROL permission gives the principal the permission to perform any operation on the XML schema collection, including dropping the XML schema collection by using the DROP XML SCHEMA COLLECTION statement. It implies other permissions on the XML schema collection.

Permission on an XML schema collection is required in addition to other permissions on a table or an XML column. For example, to create a table T with an XML column X typed according to an XML schema collection C, the principal requires permission to create tables and REFERENCES permission on the XML schema collection C. A principal with permission to insert data into column X can do so provided the principal has EXECUTE permission on the XML schema collection C. Similarly, a principal requires SELECT permission on column X and EXECUTE permission on C to query the data in column X using XML data type methods. However, SELECT permission on X is adequate to retrieve entire XML values from column X, such as in SELECT X FROM T or SELECT * FROM T.

Permissions can be revoked from a principal, and a principal can be denied permissions as allowed by the security model of SQL Server 2005.

Visibility of Catalog Views

A principal having ALTER, TAKE OWNERSHIP, REFERENCES, VIEW DEFINITION, or CONTROL permission on an XML schema collection can access catalog view rows for the XML schema collection, its contained XML schemas, and their XML schema components. With any of these permissions, the principal can also access the contents of the XML schema collection by using the built-in function XML_SCHEMA_NAMESPACE() and in FOR XML… XMLSCHEMA.

If the principle is denied VIEW DEFINITION permission, the principal cannot access the XML schema collection in catalog views, or using XML_SCHEMA_NAMESPACE() or FOR XML… XMLSCHEMA.

Enhancements to FOR XML

The TYPE directive generates an XML data type instance that can be assigned to an XML column, variable, or parameter, or queried using XML data type methods. This allows the nesting of SELECT… FOR XML TYPE statements.

The PATH mode allows users to specify the path in the XML tree where a column's value should appear and—together with the aforementioned nesting—is more convenient to write than FOR XML EXPLICIT. However, it may not perform as well for deep hierarchies.

The directive XSINIL used in conjunction with ELEMENTS maps NULL to an element with the attribute xsi:nil="true". The new ROOT directive allows a root node to be specified in all modes of FOR XML. The new XMLSCHEMA directive generates an XSD inline schema.

Performance Guidelines

The XML data model is richer and more complex than the relational one. Not only does the XML data model allow you to model complex data, but it also must preserve hierarchical relationships and document order within the data. Document order is maintained by sorting based on XML node identifiers; this simultaneously maintains hierarchical relationships. These contribute to a more complex query plan.

Structured data should be stored in relational columns of tables for better performance. Choose the XML data model for modeling needs when your data is semi-structured, or unstructured, and contains XML markup but not with the expectation of better performance. XML schemas aid in query optimization.

XML Support in SQL Server CLR

Using the SQL Server CLR (common language runtime) support, you can write server-side logic in managed code to enforce business rules. This business logic can be added to XML data in several ways:

You can write SQLCLR functions in managed code to which you pass XML values, and use XML processing capabilities provided by System.Xml namespace. An example is to apply an XSL transformation to XML data, as shown in the next code example. Alternatively, you can deserialize the XML into one or more managed classes and operate on them using managed code.
You can write Transact-SQL stored procedures and functions that invoke processing on the XML column for your business needs.

Example: Applying an XSL Transformation

Consider a CLR function TransformXml.ApplyXslTransform() that accepts an XML data type instance and the file path for an XSL transformation, applies the transformation to the XML data, and returns the transformed XML in the result. A skeleton function written in Microsoft Visual C# is as follows:

using System;
using System.Data.SqlTypes;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public class TransformXml
{
  public static SqlXml ApplyXslTransform (SqlXml XmlData,
  string xslPath){
   // Load XSL transformation
      XslCompiledTransform xform = new XslCompiledTransform();
      xform.Load (xslPath);

   System.IO.MemoryStream ms = new System.IO.MemoryStream ();
      xform.Transform (XmlData.CreateReader(), null, ms);
      ms.Seek (0, System.IO.SeekOrigin.Begin);
   
// Return the transformed value
SqlXml retSqlXml = new SqlXml(ms);
      return (retSqlXml);
  }
}

The transformed result is written into the in-memory stream ms. The stream pointer is reset using the ms.Seek() call before passing ms to the SqlXml class constructor.

The assembly must be registered in the database by using the CREATE ASSEMBLY statement. A user-defined Transact-SQL function SqlXslTransform() corresponding to ApplyTransformXml() must be created by using the statement CREATE FUNCTION. For more information about these statements, see SQL Server 2005 Books Online. Then the SQL function can be invoked from Transact-SQL as in the following query.

SELECT SqlXslTransform (xCol, 'C:\temp\xsltransform.xsl')
FROM docs
WHERE  xCol.exist('/book/title/text()[contains(.,"secure")]') =1

The query result contains a rowset of the transformed XML.

SQLCLR opens up a whole new world that can be used for decomposing XML data into tables or property promotion, and querying XML data using managed classes in the System.Xml namespace. For more information, see SQL Server 2005 Books Online and Visual Studio 2005 Books Online.

Client-Side XML Processing in SQL Server 2005

Client-Side Support for the XML Data Type

Client access to the XML data type is provided by using ADO.NET, SQL native client (SQLNCLI), and SOAP over HTTP. The first two are discussed below; for SOAP access, see SQL Server Books Online.

ADO.NET XML Support in the .NET Framework V2.0

The XML data type is exposed as a class SqlXml in the System.Data.SqlTypes namespace from the SqlDataReader.GetSqlXml() method. You can obtain XmlReader from the SqlXml object by using the SqlXml.CreateReader() function.

The three-part name of the XML schema collection typing an XML column can be obtained from the XML column metadata (by using GetSchemaTable() or GetSqlMetaData (int) on the SqlDataReader object) as three properties indicating the names of the database (XmlSchemaCollectionDatabase), relational schema (XmlSchemaCollectionOwingSchema), and the XML schema collection (XmlSchemaCollectionName).

A new schema rowset named XMLSCHEMA is available for clients to retrieve XML schemas from the server. The XMLSCHEMA rowset contains three columns for an XML schema collection, target namespace, and the XML schema content itself.

The following examples show skeletal code for managed access to XML data type.

Example: In-Process Access to XML Data Type

The following Visual C# code illustrates how the XML data type can be accessed from the in-process provider. The context connection in the example allows you to execute Transact-SQL statements in the same context that the CLR code was invoked. For out-of-process access, a new connection to the database must be established.

using System;
using System.Xml;
using System.Data;
using System.Data.SqlTypes;
using System.Data.SqlClient;

class xmldtADONETReadAccessInProc
{
   static void ReadXmlDataType () {
      // in-proc connection to server
      SqlConnection conn = 
         new SqlConnection("context connection=true");

      // prepare query to select xml data
      SqlCommand cmd = conn.CreateCommand();
      cmd.CommandText = 
"SELECT xCol.query('//section') FROM docs";

      // execute query and retrieve incoming data
      SqlDataReader r = cmd.ExecuteReader();
      r.Read();

      // access XML data type field in rowset
      SqlXml xml = r.GetSqlXml(0);
      new XmlTextWriter(Console.Out).WriteNode( 
xml.CreateReader(), true);
   }
}

Example: Accessing the Result of FOR XML TYPE

FOR XML with the TYPE directive yields an XML data type instance that a managed client can retrieve by using the SqlXml class. Thus, the code in the previous example remains the same when the cmd.CommandText is modified as follows:

cmd.CommandText = 
   "SELECT xCol.query('//section') FROM docs FOR XML AUTO, TYPE";

Example: Updating XML Data Type Column Using SQL Client Provider

The following code segment shows a method WriteXmlDataType() that replaces the value in an XML column using the SQL client provider. The code for the in-process provider is similar.

using System;
using System.Xml;
using System.Data;
using System.Data.SqlTypes;
using System.Data.SqlClient;

class xmldtADONETUpdateAccess
{
   static void WriteXmlDataType () {
      // connection to server
      SqlConnection conn = new SqlConnection("server=server1;" +
" database=XMLtest; Integrated Security=SSPI");
      conn.Open();
   // update XML column at the server
   SqlCommand cmd = conn.CreateCommand();
   cmd.CommandText = "UPDATE docs SET xCol=@x WHERE id=1";
// set value of XML parameter
   SqlParameter p = cmd.Parameters.Add("@x", SqlDbType.Xml);
p.Value = new SqlXml(new XmlTextReader("<hello/>", 
XmlNodeType.Document, null));

// execute update and close connection
   cmd.ExecuteNonQuery();
conn.Close();
   }
}

SQL Native Client Access

In the OLE DB provider of the new SQL native access (SQLNCLI), an XML data type column can be retrieved as Unicode text (DBTYPE_XML, DBTYPE_BYTES, DBTYPE_BSTR, DBTYPE_WSTR, and DBTYPE_VARIANT) or as a Unicode character stream (DBTYPE_UNKNOWN) by using ISequentialStream. The default is DBTYPE_XML. XML data can be sent to the server in UTF-16 encoding, in which case the application must ensure that it is already UTF-16 encoded. The first byte must be a byte-order mark 0xFFFE. An application can send XML data to the server in some other encoding, provided the encoding is compatible with that of the database server. Encoding specified within the XML data is honored. XML should be sent as binary data in all other cases.

The three-part XML schema collection name is carried in three new columns of COLUMNS schema rowset returned by IDBSchemaRowset::GetRowset(): SS_XML_SCHEMACOLLECTION_CATALOGNAME gives the name of the catalog; SS_XML_ SCHEMACOLLECTION SCHEMANAME, the name of the relational schema in which the XML schema collection resides; and SS_XML_SCHEMACOLLECTIONNAME, the name of the XML schema collection. These names are of type DBTYPE_WSTR. These columns have NULL values for untyped XML column for both data retrieval and update.

Similar changes have been made to PROCEDURE_PARAMETERS schema rowset and IColumnRowset:GetColumnRowset().

To retrieve the contents of the XML schema collection, the client can make a separate access to the server by using these names in a call to XML_SCHEMA_NAMESPACE() and get back XML schemas as XML data type. Alternatively, IDBSchemaRowset with the new SS_XMLSCHEMA schema rowset returns catalog name, relational schema name, XML schema collection name, target namespace, and XML schema.

For ODBC access using SQLNCLI, the XML data type is mapped to Unicode variable-length character data called SQL_SS_XML. The 3-part XML schema collection name is surfaced through SQLColAttribute for the XML column as CharacterAttributePtr. These field identifiers are SQL_CA_SS_XML_SCHEMACOLLECTION_CATALOG_NAME, SQL_CA_SS _XML_SCHEMACOLLECTION_SCHEMA_NAME, and SQL_CA_SS _XML_SCHEMACOLLECTION_NAME for the names of the database, relational schema, and XML schema collection, respectively.

Users must install the database server or client tools of SQL Server 2005 to get the SQL native client.

XML data from SQL Server 2005 is available to MDAC APIs by using SQLOLEDB as DBTYPE_WSTR by default. XML columns cannot be returned as the result of a user-defined function to clients that are running versions of SQL Server previous to SQL Server 2005. If this is attempted, an error is returned.

SQLXML - Mapping Between XML and Relational Schemas

You can create logical XML views of your relational data by using SQLXML mapping technology. An XML view, also referred to as a mapping or annotated schema, is created by adding special annotations to a given XSD schema. Other SQLXML technologies can then use this annotated schema to transform queries and updates against the logical XML view to queries and updates against relational tables:

When the XML view is combined with an XPath query, SQLXML generates a FOR XML query to find the requested data and shape it as specified in the schema.
SQLXML updategrams represent changes to an XML instance that, when combined with an annotated schema, persist these changes back to relational changes by using optimistic concurrency to ensure that the proper data is updated.
SQLXML bulkload uses a XML view to "shred" XML data into relational tables.

More information about any of these topics can be found in the SQLXML documentation. This is available on MSDN or by downloading SQLXML from the MSDN SQLXML site [ http://msdn.microsoft.com/sqlxml ] .

Creating an XML view of Relational Tables

To create an XML view of the database, you begin with an XSD Schema for your XML data. The rows in the database table/view map to complex-type elements in the schema. The column values in the database map to attributes or simple-type elements.

By default, if no explicit annotations are given, SQLXML assumes that complex-type elements map to tables and simple-type elements, and attributes map to columns. This only works if the names of the elements and attributes are exactly the same as the names of your tables and columns in your database.

If the name of an element/attribute is not the same name as the table (view) or column name to which it maps, an explicit mapping must be created. The following annotations are used to specify the mapping between an element or attribute in an XML document and the table (view) or column in a database.

sql:relation—Maps an XML element to a database table.
sql:field—Maps an element or attribute to a database column.

Mapping Relationships to Create Hierarchies in XML Views

In your database, tables may be related by foreign key relationships. In XML, these relationships are represented by a nested hierarchy of the elements. In order to construct a proper nesting in your mapping, you must specify how the elements are related. You can establish these relationships among mapping schema elements by using the sql:relationship annotation. Within this annotation, you can specify the parent and child tables as well as which columns in each table should be used to perform the join. SQLXML uses this information to construct the proper nesting hierarchy for your mapping.

Using Overflow to Store Unconsumed Data

Mapping works when your XML data has a regular structure. However, there may be some data in your XML that is unstructured or some data that does not map to a specific column. To store this data and later retrieve it, the sql:overflow annotation can be used. The sql:overflow annotation specifies the column into which all unconsumed data is stored and from where it is retrieved when querying.

The overflow column also allows for expansion of your XML without having to add to your database. Elements and attributes can be added at any time to your XML structure without adding columns to store them in your database. They will simply be stored in the overflow field and retrieved at the proper time.

For More Information

For more information about creating XML views and to see examples of mappings, see Creating XML Views by Using Annotated XSD Schemas [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/ssxsdannotations_0gqb.asp ] on MSDN.

Querying the XML View by Using XPath

Once you've created an XML view of your database, you can query that view as if it were an actual XML document by using the XPath query language. SQLXML supports a subset of the XPath 1.0 query language. When an XPath query is issued against a mapping, SQLXML composes them together and creates a FOR XML EXPLICIT statement that is sent to SQL Server. The proper data is retrieved and then shaped according to the mapping.

For details on the subset of XPath supported over XML views, see the SQLXML documentation. This is available on MSDN or by downloading SQLXML from the MSDN SQLXML page [ http://msdn.microsoft.com/sqlxml ] .

Updating Through XML Views Using Updategrams

You can modify (insert, update, or delete) a database in SQL Server through an XML view by using an updategram against an XML view of your database.

The Structure of an Updategram

An updategram is an XML document with <sync>, <before>, and <after> elements that form the syntax of the updategram. Each <sync> block contains one or more <before> and <after> blocks. The <before> block identifies the existing state (also referred to as "the before state") of the record instance. The <after> block identifies the new state to which data is to be changed. Whether an updategram deletes, inserts, or updates a record instance depends on the contents of the <before> and <after> blocks.

Insert Operations

An updategram indicates an insert operation when a record instance appears in the <after> block, but not in the corresponding <before> block. In this case, the updategram inserts the record in the <after> block into the database.

Delete Operations

An updategram indicates a delete operation when a record instance appears in the <before> block with no corresponding records in the <after> block. In this case, the updategram deletes the record in the <before> block from the database.

If an element that is specified in the updategram either matches more than one row in the table or does not match any table row, the updategram returns an error and cancels the entire <sync> block. Only one record at a time can be deleted by an element in the updategram.

Update Operations

When you are updating existing data, you must specify both the <before> and <after> blocks. The updategram uses the elements that are specified in the <before> block to identify the existing records in the database. The corresponding elements in the <after> block indicate how the records should look after the update operation is executed.

An element in the <before> block must match only one table row in the database. If the element either matches multiple table rows or does not match any table row, the updategram returns an error and cancels the entire <sync> block.

For More Information

For more information about creating and using updategrams to modify data through your XML views, see Using Updategrams to Modify Data [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/updategram_5kkh.asp ] on MSDN.

Bulk Loading XML Data Through the XML View

XML Bulk Load is a COM object that allows you to load XML data into SQL Server tables. You could insert XML data into a SQL Server database by using an INSERT statement and the OPENXML function; however, the bulk load utility provides higher performance when you need to insert large amounts of XML data. XML Bulk Load interprets the mapping schema and identifies the tables into which the XML data is to be inserted. It then shreds your XML data into your relational tables.

Because the source XML document can be large, the entire document is not read into memory for bulk load processing. Instead, XML Bulk Load interprets the XML data as a stream and interprets it reads it. As the utility reads the data, it identifies the database table, generates the appropriate record from the XML data source, and then sends the record to SQL Server for insertion.

For details about how XML Bulk Load works and how to use it, see Performing Bulk Load of XML Data [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/bulkload_7pv0.asp ] on MSDN.

SQLXML Data Access Methods

Since SQL Server 2000, two new ways to access SQLXML functionality have been added:

SQLXML Managed Classes
SQLXML Web Services

In addition, HTTP access to SQL Server has been enhanced to provide support for updategrams within templates.

SQLXML Managed Classes

SQLXML Managed Classes expose the functionality of SQLXML 3.0 inside the Microsoft .NET Framework. With SQLXML Managed Classes, you can write a C# application to access XML data from an instance of SQL Server, bring the data into the .NET Framework environment, process the data, and send the updates back to SQL Server to apply the updates.

For more details on how to use SQLXML managed classes, see SQLXML .NET Support [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/dotnet_0jck.asp ] on MSDN.

Conclusion

This article describes complementary technologies for XML in SQL Server 2005. Server-side features include a native implementation for XML storage, indexing, and query processing. Existing features such as FOR XML and OpenXML have also been enhanced. Client-side support consists of enhancements to ADO.NET to support the XML data type and in the System.Xml to support XQuery for querying differing sources of XML. In addition, the SQLXML mapping technology Web release enhancements have been incorporated into SQL Server 2005.

The server-side and the client-side support are useful in different scenarios. The XML data type provides a simple mechanism for storing XML data by inserting XML data into an untyped XML column. Using XML schemas to define typed XML helps the database engine optimize storage and query processing in addition to providing data validation.

The XML data type preserves document order and is useful for applications such as document management applications. It can also handle recursive XML schemas. The relational data model is still the best choice for structured data with a known schema. Even [n]varchar(max) is suitable for scenarios where fine-grained query and update are unimportant.

The SQLXML mapping technology is useful whenever you want to use an XML-centric programming model over relational data stored in tables at the server. The mapping is based on defining an XML schema as an XML view. The mapping can be used for bulk loading XML data into tables and for querying the tables by using XPath 1.0. Document order is not preserved, so the mapping technology is useful for XML data processing as opposed to XML document processing.

The core XML classes in System.Xml in the .NET Framework 2.0 release enable you to read, write, manipulate, and transform XML. With improvements in performance, usability, typing, and querying, the XML support in the 2.0 release continues to lead the industry in innovation, standards support, and ease of use.

The server- and client-side technologies complement one another. The mapping technology can leverage server-side features to augment features such as maintenance of document order and recursive schemas, and find more areas of applications. On the other hand, the CLR brings extensibility and the wealth of existing XML tools such as XSLT transformation to the XML data type. The XQuery implementation in the server is aligned with the July 2004 draft of the XQuery specification. The XML schema implementations at the server and the client are aligned with each other.

References

XML Best Practices for Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345115.aspx ]

Indexing XML Data Stored in a Relational Database [ http://www.vldb.org/conf/2004/ind5p2.pdf ]

XML Options in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345110.aspx ]

What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ]

About the authors

Shankar Pal is Program Manager in SQL Server engine working on server-side XML technologies. His blog is http://blogs.msdn.com/spal [ http://blogs.msdn.com/spal ] .

Mark Fussell is Lead Program Manager in the Microsoft XML Messaging team. He has worked on data access technologies including the components within the System.Xml and System.Data namespaces of the .NET Framework, Microsoft XML Core Services (MSXML), and Microsoft Data Access Components (MDAC). His blog is http://weblogs.asp.net/mfussell [ http://weblogs.asp.net/mfussell ] .

Irwin Dolobowsky is Program Manager in the Microsoft MSN team.

Contents

Introduction

Why Use Relational Databases for XML Data?

XML Support in SQL Server 2000

Server-Side Support

Client-Side Support

Limitations in XML Support

Overview of XML Support in SQL Server 2005

XML Data Type

XML Data Type Query and Data Modification

XML Indexing

XML Schema Processing

Relational and XML Integration

FOR XML and OpenXML enhancements

Client Access to XML Data Type

Motivating Scenarios for XML Storage

Custom Property Management

Data Exchange and Workflow

Document Management

Server-Side XML Processing in SQL Server 2005

The XML Data Type

Untyped XML

Example: Untyped XML Column in Table

Typed XML

Example: Typed XML Column in Table

Constraining XML Data Type Columns

Example: Constraining XML Column

Example: Limiting XML Schema Collection

Text Encoding

Storing XML Data

Example: Inserting Data into Untyped XML Column

Example: Inserting Data into Untyped XML Column from File

Example: Inserting Data into Typed XML Column

Example: Storing XML Data Generated Using FOR XML with TYPE Directive

Storage Representation

Data Modeling Considerations

Querying and Modifying XML Data

Methods on XML Data Type

Example: Using Query() Method

Example: Using Exist() Method

Example: Using Value() Method

Example: Using GROUP BY with XML Data Type Methods

Example: Executing SQLCMD

Example: Difference in Behavior Between cast() and value() Method

The XQuery Language

Example: Using Rich Language Constructs in XQuery

Query Compilation and Execution

XML Data Modification

Example: Insertion of Subtree into XML Instances

Example: Update Price of this Book to $49.99

Type Checking and Static Errors

Example: Type Checks in value() Method

Example: Known Singleton

Binding Relational Data in XQuery and XML DML

Example: Binding Relational Data Using sql:variable()

Example: Specifying Ordinal Position Using sql:variable()

Rowset Generation from XML Data

Example: Extract Properties from XML Instance

Example: Extract Properties from XML Variable

Indexing XML Data

Primary XML Index

Example: Creating Primary XML Index

Secondary XML Indexes

Example: Path-Based Lookup

Example: Fetching Properties of an Object

Example: Value-Based Query

Content Indexing

Example: Create Full-Text Index on XML Column

Example: Combining Full-Text Search with XML Querying

Example: Search Context Defined by XML Data Type Methods

Query Execution Using XML Indexes

Catalog Views for XML Indexes

Example: Space Usage of XML Indexes

XML Schema Processing

XML Schema Collections

Example: Creating an XML Schema Collection

Modifying XML Schema Collections

Example: Altering an XML Schema Collection

Catalog Views for XML Schema Collections

Example: Enumerate XML Namespaces in XML Schema Collection