Shankar Pal, Mark Fussell, and Irwin Dolobowsky
Microsoft Corporation
December 2005
Applies to:
Microsoft SQL Server
Microsoft .NET Framework 2.0
Summary:
This paper provides an overview of the XML support that is built into
Microsoft SQL Server 2005. It includes a discussion of how
XML integrates with client-side programming support in both the .NET
Framework 2.0 and native code such as OLEDB and SQLXML. (40
printed pages)
Contents
Introduction
Motivating Scenarios for XML Storage
Server-Side XML Processing in SQL Server 2005
Client-Side XML Processing in SQL Server 2005
Conclusion
References
Introduction
eXtensible
Markup Language (XML) has been widely adopted as a platform-independent
format for data representation. It is useful for exchanging information
among loosely coupled, disparate systems, such as in
business-to-business (B2B) applications and workflow situations. Data
interchange has been a major driver of XML technologies.
XML is
increasingly present in enterprise applications that are used for
modeling semi-structured and unstructured data. One such application is
document management. Documents (e-mail messages, for example) are
semi-structured by nature. If documents are stored inside a database
server as XML, powerful applications can be developed such as:
- Applications that retrieve documents based on their content.
- Applications that query for partial content, such as finding the section whose title contains the word "Background."
- Applications that aggregate documents.
Such
scenarios are becoming feasible with the increase in the development
and availability of applications that generate and consume XML. For
example, the Microsoft Office 2003 System allows users to generate
Microsoft Word, Excel, Visio, and Infopath documents as XML markup.
Why Use Relational Databases for XML Data?
Storing
XML data in a relational database provides benefits in the areas of
data management and query processing. Microsoft SQL Server
provides powerful query and data modification capabilities over
relational data. In SQL Server 2005, these capabilities are
extended to querying and modifying XML data. This allows your company
to leverage investments made over past releases, such as investments in
the areas of cost-based optimizations and data storage. For example,
indexing techniques in relational databases are well-known. These have
been extended to indexing XML data so that queries can be optimized
using cost-based decisions.
XML data can interoperate with
existing relational data and SQL applications. This means that XML can
be introduced into the system as data modeling needs arise without
disrupting existing applications. The database server also provides
administrative functionality for managing XML data (for example,
backup, recovery, and replication).
Native XML support within
SQL Server 2005 is necessary to address increasing XML usage.
Enterprise application development benefits from the XML support in
SQL Server 2005.
The following sections give an
overview of XML support in SQL Server 2000 and 2005, describe
some of the motivating scenarios for XML usage, and include detailed
discussions of the server-side and client-side XML feature sets.
XML Support in SQL Server 2000
This
section provides a brief, high-level overview of the XML support in
SQL Server 2000 and subsequent Web releases of the SQLXML
client-side programming platform that provides rich support for mapping
data back and forth between the relational and the XML data models.
Server-Side Support
At
the server, XML data is generated from tables and query results by
using a FOR XML clause in a SELECT statement. This is ideal for data
interchange and Web service applications. The converse of FOR XML is a
relational rowset generator function named OpenXML; it extracts values
from the XML data into columns of a rowset by evaluating XPath 1.0
expressions. OpenXML is used by applications that shred incoming XML
data into tables or for querying by using the Transact-SQL language.
Client-Side Support
The client programming support for SQL Server 2000 is referred to as SQLXML. (For more information, see
SQLXML
[ http://msdn2.microsoft.com/en-us/library/aa286527.aspx ] on the
Microsoft Developer Network (MSDN)). At the center of this technology
is XML view, which is a bi-directional mapping of an XML Schema to
relational tables. SQL Server 2000 supports only the mapping
of XDR schemas, though support for XSD was added in later Web releases.
The XML view allows querying by using a subset of XPath 1.0 where
the mapping is used to translate the path expressions into SQL queries
on the underlying tables, and the query results are packaged into XML
results.
SQL XML also enables you to create XML templates that
you can use to create an XML document that has dynamic sections. Within
the XML document, you can embed FOR XML queries and XPath 1.0
expressions over mapping queries. When the XML template is executed,
the query block is replaced with the result of the query. In this way
you can create XML documents that include both some static content and
some dynamic content that is data-driven.
In SQL Server 2000, there are two main ways to access SQLXML functionality.
- SQLXMLOLEDB Provider—The SQLXMLOLEDB Provider is an OLE DB provider that exposes Microsoft SQLXML functionality through ADO.
- HTTP Access—SQLXML
functionality in SQL Server 2000 can also be accessed via
HTTP by using the SQLXML ISAPI filter. By using our configuration tool,
you can set up a Web site to receive incoming requests to execute the
XML templates, and the FOR XML and XPath 1.0 statements over XML
views using HTTP.
Limitations in XML Support
The
server and client programming platforms provide rich support for the
generation and consumption of XML data based on mapping between tabular
and XML data. This handles fairly structured XML data well. In SQLXML,
the query language is a subset of XPath 1.0 and has some
limitations. For example, the descendant axes (//-operator) are not
supported. Consequently, restrictions exist that affect the development
of certain solutions. For example, XML document order is not preserved,
which is so crucial for applications such as document management.
Furthermore, recursive XML schemas are not supported. In spite of these
limitations, client SQLXML and server XML functionality have been
widely used in application development. SQL Server 2005
addresses many of these limitations, extends the relational-XML
interchange features, and provides native XML support.
Overview of XML Support in SQL Server 2005
This
section provides a brief, high-level overview of the new XML support in
SQL Server 2005, which is complemented by support in the .NET
Framework 2.0 and by native client data access such as OLE DB.
XML Data Type
The
XML data model has characteristics that make it very hard if not
practically impossible to map to the relational data model. XML data
has a hierarchical structure that may be recursive; relational
databases provide weak support for hierarchical data (modeled as
foreign key relationships). Document order is an inherent property of
XML instances and must be preserved in query results. This is in
contrast with relational data, which is unordered; order must be
enforced with additional ordering columns. Re-assembling the result
during querying is costly for realistic XML schemas that decompose the
XML data into a large number of tables.
SQL Server 2005
introduces a native data type called XML. A user can create a table
that has one or more columns of type XML in addition to relational
columns. XML variables and parameters are also allowed. XML values are
stored in an internal format as large binary objects (BLOBs) in order
to support the XML model characteristics, such as document order and
recursive structures, more faithfully.
SQL Server 2005
provides XML schema collections as a way to manage W3C XML Schemas as
metadata. An XML data type can be associated with an XML schema
collection to enforce schema constraints on XML instances. When the XML
data is associated with an XML schema collection, it is called typed XML; otherwise it is called untyped XML.
Both typed and untyped XML are accommodated within a single framework,
the XML data model is preserved, and query processing enforces XML
semantics. The underlying relational infrastructure is used extensively
for this purpose. It supports interoperability between relational and
XML data, thereby making way for more widespread adoption of the XML
features.
XML Data Type Query and Data Modification
XML
instances can be retrieved by using a Transact-SQL SELECT statement.
Five built-in methods on the XML data type are provided to query and
modify XML instances.
The XML data type methods accept XQuery.
XQuery is an emerging W3C standard language (currently in Last Call).
It includes the navigational language XPath 2.0. A language for
modifying XML data, such as adding or deleting subtrees and updating
scalar values, is also available. Together with a large set of
functions, embedded XQuery and data modification languages provide rich
support for manipulating XML data.
The XQuery type system is
aligned with that of W3C XML schema types. Most of the SQL types are
compatible with the XQuery type system (for example, decimal). A handful of types (for example, xs:duration) are stored in an internal format and suitably interpreted to be compatible with the XQuery type system.
The
compilation phase checks static type correctness of XQuery expressions
and data modification statements, and uses XML schemas for type
inferences in the case of typed XML. Static type errors are raised if
an expression could fail at run time due to a type safety violation.
XML Indexing
Query
execution processes each XML instance at run time. This becomes
expensive whenever the XML value is large in size or the query is
evaluated on a large number of rows in a table. Consequently, a
mechanism for indexing XML columns is provided to speed up queries.
B+trees have been extensively used for indexing relational data. A primary XML index on an XML column creates a B+tree
index on all tags, values, and paths of the XML instances in the
column. The primary XML index provides efficient evaluation of queries
on XML data and reassembly of the XML result from the B+tree, while preserving document order and document structure.
Secondary
XML indexes can be created on an XML column to speed up different
classes of commonly occurring queries: PATH index for path-based
queries, PROPERTY index for property bag scenarios, and VALUE index for
value-based queries.
XML Schema Processing
XML
columns, variables, and parameters can optionally be typed according to
a collection of XML schemas that may be related (for example, by using
<xs:import>) or unrelated to one another. Each typed XML instance
specifies the target namespace from the XML schema collection it
conforms to. During data assignment and modification, the database
engine validates the instance according to the XML schema.
XML
schema information is used in storage and query optimizations. Typed
XML instances contain typed values in the internal, binary
representation as well as in XML indexes. This provides efficient
processing of typed XML data.
Relational and XML Integration
Users
can store both relational and XML data within the same database.
Briefly, the database engine knows how to honor the XML data model in
addition to the relational data model. Relational data and SQL
applications continue to behave correctly upon upgrade to
SQL Server 2005. XML data residing in files and text or image
columns can be moved into XML data type columns at the server. The XML
column can be indexed, queried, and modified by using the XML data type
methods.
The database leverages existing relational
infrastructure and engine components such as the storage engine and the
query processor for XML processing. For example, XML indexes create B+trees
and query plans can be viewed in Showplan output. Because data
management functionality, such as backup/restore and replication, is
integrated into the relational framework, this functionality is
available on XML data. In addition, the new data management features,
such as database mirroring and snapshot isolation, work with the XML
data type to provide a seamless user experience.
Structured data
should be stored in tables and relational columns. The XML data type is
a suitable choice for semi-structured and markup data using XML when
the application needs to perform fine-grained query and modification of
the data.
FOR XML and OpenXML enhancements
The
existing FOR XML functionality has been enhanced in several ways. It
works over XML data type instances and other new SQL types such as [n]varchar(max). For more information about FOR XML enhancements, see
What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ] on MSDN.
The
new TYPE directive generates an XML data type instance that can be
assigned to an XML column, variable, or parameter, or queried by using
XML data type methods. This allows the nesting of SELECT… FOR XML TYPE
statements.
The PATH mode allows users to specify the path in
the XML tree where a column's value should appear and—together with the
aforementioned nesting—is more convenient to write than FOR XML
EXPLICIT.
The directive XSINIL, used in conjunction with ELEMENTS, maps NULL to an element with the attribute xsi:nil="true".
Also, the new ROOT directive allows a root node to be specified in all
modes of FOR XML. The new XMLSCHEMA directive generates an XSD inline
schema. FOR XML in SQL Server 2005 also allows users to
specify element names to replace the default <row> in FOR XML RAW mode.
OpenXML functional enhancements consist of the ability to accept the XML data type in sp_preparedocument, and to generate XML and new SQL type columns in the rowset.
Client Access to XML Data Type
Clients
can access XML data in the server in several ways. Native SQL client
access by using ODBC and OLE DB delivers XML data as a Unicode string.
OLE DB also provides ISequentialStream access to XML data for streaming Unicode data.
Managed access via ADO.NET in the .NET Framework 2.0 delivers XML data as a new class called SqlXml. It supports a method named CreateReader() that returns an XmlReader instance to read the returned XML. Equally, DataSet
can load instances of the XML data type into columns on the mid-tier
that can be edited as an XML document and saved back to
SQL Server. Both of these enable SQL queries to be issued to the
server to retrieve XML columns for manipulation on the mid-tier.
SOAP access directly to HTTP endpoints in SQL Server 2005 can be used to query, retrieve, and modify XML data.
Both
native and managed client technologies provide new interfaces for
retrieving the XML schema collection that types an XML column.
Motivating Scenarios for XML Storage
XML
data is becoming more pervasive. It can represent customer data, with
or without XML schemas that describe the data. The XML data and the XML
schemas must be managed together. Quite often, the XML schemas for
realistic applications are complex. Mapping such XML schemas to tables
and columns is a complex task. Maintaining this mapping over time, when
XML schemas change or new ones are added to the system, is troublesome.
Quite often, XML data is stored in the file system or in text columns
at the database server. Using a text column provides the benefits of
data management capabilities, such as replication and backup/restore,
but it does not allow query support based on the XML structure of the
data. With native XML support, application development using XML
becomes faster.
Custom Property Management
Some
applications, such as user interface software, allow users to choose
from a fixed set of properties. Others allow users to define their own
custom properties of interest. These custom properties can be managed
nicely if they are stored in XML format. Applications can support more
types of properties than just scalar properties. They can also support:
- Multi-valued properties on objects, such as multiple phone numbers.
- Complex properties; for example, the author property of a document might be the author's contact information.
Object properties can be stored in an XML data type column and indexed for efficient query processing.
Data Exchange and Workflow
XML
allows a platform-independent way of exchanging data among
applications. The data can be modeled as messages with XML markup.
Instead of constantly shredding and generating XML messages, it is
prudent to store messages in XML format. This fits well with data flow
requirements. An XML message reaching a workflow stage carries the
current state. Each message is processed, the progress is recorded in
the XML content (for example, state change), and the XML data is
forwarded to the next stage of workflow processing. Because messages
might be of different types or even semi-structured, and have different
XML schemas associated with them, mapping them to tables is not always
an easy task.
XML-based standards are emerging for different,
vertical domains, such as for financial and geo-spatial data. These
standards describe the structure of the data, based on which instance
data can be queried and updated. Quite often, the actual data is in
binary form, while the XML data provides metadata information regarding
them.
As a simple example, to pass a table of input parameters
to a stored procedure or function, an application converts the data to
XML and passes it in as an XML data type parameter. Within the stored
procedure or function, the rowset is regenerated from the XML
parameter.
Document Management
Suppose a
call center maintains patient records and conversations as an XML
document. When a patient calls in, the call center wants to recall the
previous conversation to set the context of the incoming call. This is
possible by querying XML markup, which benefits the application.
Furthermore, viewing the details of conversations that occurred on
earlier occasions and recording the current conversation are
facilitated.
Documents, such as e-mail messages, are
semi-structured in nature. Documents with XML markup are becoming
easier to create, for example, with Microsoft Office 2003. These
XML documents can be stored in XML columns, indexed, queried, and
updated. Thus, developers can do more with native XML support.
Server-Side XML Processing in SQL Server 2005
SQL Server 2005 support consists of providing one database in which you can store both relational and XML data.
The XML Data Type
You
can create a table with an XML column by using the usual CREATE TABLE
statement. The XML column can then be indexed in a special way.
Untyped XML
The
SQL Server 2005 XML data type implements the ISO SQL-2003
standard XML data type. As such, it can store not only well-formed
XML 1.0 documents but also so-called XML content fragments with
text nodes and an arbitrary number of top-level elements. Checks for
well-formedness of the data are performed, which does not require the
XML data type to be bound to XML schemas. Data that is not well-formed
is rejected.
Untyped XML is useful when the schema is not known a priori
so that a mapping-based solution is not possible. It is also useful
when the schema is known but mapping to a relational data model is very
complex and hard to maintain, or multiple schemas exist and are late
bound to the data based on external requirements.
Example: Untyped XML Column in Table
The following statement creates a table named docs with an integer primary key named pk and an untyped XML column named xCol.
CREATE TABLE docs (pk INT PRIMARY KEY, xCol XML not null)
A table can also be created with more than one XML or relational column with or without a primary key.
Typed XML
If
you have XML schemas that describe your XML data in an XML schema
collection, you can associate the XML schema collection with the XML
column to yield typed XML. The XML schemas are used to validate
the data, perform more precise type checks than untyped XML during
compilation of query and data modification statements, and optimize
storage and query processing.
Type information is stored in both
the XML BLOB and the XML indexes, and uses more space for XML BLOBs.
Typed XML yields better performance with value-based queries (that is,
where the search value is more selective than, say, the path in which
the value occurs) by avoiding run-time value conversions so that index
seeks into the XML indexes become possible.
Typed XML columns,
parameters, and variables can store XML documents or content, which you
can specify as an option (DOCUMENT or CONTENT, respectively, with
CONTENT as the default) at the time of declaration. Furthermore, you
must provide the collection of XML schemas. Specify DOCUMENT if each
XML instance has exactly one top-level element; otherwise, use CONTENT.
The query compiler uses the DOCUMENT flag in type checks to infer
Singleton top-level elements.
Example: Typed XML Column in Table
XML columns, variables, and parameters can be bound to a collection of XML schemas. (For more details and examples, see XML schema processing later in this paper.) Suppose myCollection is the name of one such collection. The following statement creates a table named XmlCatalog with an XML column Document typed using myCollection. The typed XML column is also specified to accept XML fragments, not just XML documents.
CREATE TABLE XmlCatalog (
ID INT PRIMARY KEY,
Document XML(CONTENT myCollection))
Constraining XML Data Type Columns
In
addition to typing an XML column, you can use relational (column or
row) constraints on typed and untyped XML data type columns. Most SQL
constraints are applicable to XML columns as well. The notable
exceptions to this are unique, primary key, and foreign key
constraints, since XML data type instances cannot be compared. Thus,
you can specify an XML column to be nullable or non-nullable, supply a
default value, and define CHECK constraints on the column. For example,
an untyped XML column can have a CHECK constraint to verify that the
stored XML instances confirm to an XML schema.
Use constraints under the following conditions:
- Your business rules cannot be expressed in XML
schemas. For example, the delivery address of a flower shop must be
within 50 miles of its business location. This can be written as a
constraint on the XML column. The constraint may involve XML data type
methods.
- Your constraint involves other XML or non-XML
columns in the table. An example is the enforcing the constraint so
that the value of Customer ID (that is, the result of the path
expression /Customer/@CustId) in an XML instance equals the value of an
integer in the CustomerID column.
- You want to limit an XML
schema collection to the top-level elements or the schema namespaces
that are to be permitted in a typed XML column. This is useful when the
same XML schema collection is used to type multiple XML columns and
each XML column should contain a different top-level element. It is
also useful when the insertion of XML instances conforming to only the
latest version of an XML schema is allowed.
Example: Constraining XML Column
To ascertain that the <last-name> of an <author> of a <book> is different from the <author>'s <first-name>, the following CHECK constraint CK_name is specified. XML data type methods must be provided within a user-defined function, for which udf_Check_Names() is introduced.
CREATE FUNCTION udf_Check_Names (@xmlData XML)
RETURNS int AS
BEGIN
RETURN (SELECT @xmlData.exist('/book/author[first-name = last-name]'))
END
GO
CREATE TABLE docs (pk INT PRIMARY KEY,
xCol XML not null
CONSTRAINT CK_name CHECK (dbo.udf_Check_Names(xCol) = 0))
GO
Example: Limiting XML Schema Collection
Suppose the XML column Document in the table XmlCatalog (described in the typed XML column in table example) is to be constrained to allow the storage of only a single <dvdstore> element per row and to prevent the insertion of <bookstore> elements. The following CHECK constraint achieves this.
CREATE FUNCTION udf_Check_Top_Level_Nodes (
@xmlData XML(CONTENT myCollection))
RETURNS int AS
BEGIN
RETURN (@xmlData.value ('count(/*)', 'int'))
END
CREATE FUNCTION udf_Check_Dvdstore_Nodes (
@xmlData XML(CONTENT myCollection))
RETURNS int AS
BEGIN
RETURN (@xmlData.value('declare default element namespace "http://myDVD";
count(/dvdstore)', 'int'))
END
ALTER TABLE XmlCatalog
ADD CONSTRAINT dvdstore_only_check
CHECK (dbo.udf_Check_Top_Level_Nodes (Document) = 1 AND
dbo.udf_Check_Dvdstore_Nodes (Document) = 1)
Text Encoding
SQL Server 2005
stores XML data as Unicode (UTF-16). XML data retrieved from the server
comes out in UTF-16 encoding as well. If you want a different encoding,
you must perform the necessary conversion, either by casting or on the
mid-tier, after retrieving the data. For example, you could cast your
XML data to varchar type on the server, in which case the database engine serializes the XML with an encoding determined by the collation of the varchar.
Storing XML Data
You can supply an XML value for an XML column, parameter, or variable in multiple ways:
- As a character or binary SQL type that is implicitly converted to XML data type.
- As the content of a file.
- As the output of the XML publishing mechanism FOR XML with the TYPE directive that generates an XML data type instance.
The
supplied value is checked for well-formedness. An XML column by default
allows both XML documents and XML fragments to be stored. If the data
fails the well-formedness check, it is rejected with an appropriate
error message.
For typed XML, the supplied value is checked for
conformance to XML schemas that are registered with the XML schema
collection that is typing the XML column. The XML instance is rejected
if it fails this validation. Furthermore, the DOCUMENT flag on typed
XML restricts accepted values to XML documents only while CONTENT
allows both XML document and content to be supplied.
Example: Inserting Data into Untyped XML Column
The following statement inserts a new row into the table named docs with the value 1 for the integer column pk and a <book> instance for the XML column. The <book> data, supplied as a string, is implicitly converted to XML data type and checked for well-formedness during insertion.
INSERT INTO docs VALUES (1, '<book genre="security"
publicationdate="2002" ISBN="0-7356-1588-2">
<title>Writing Secure Code</title>
<author>
<first-name>Michael</first-name>
<last-name>Howard</last-name>
</author>
<author>
<first-name>David</first-name>
<last-name>LeBlanc</last-name>
</author>
<price>39.99</price>
</book>')
INSERT INTO docs VALUES (2,
'<doc id="123">
<sections>
<section num="1"><title>XML Schema</title></section>
<section num="3"><title>Benefits</title></section>
<section num="4"><title>Features</title></section>
</sections>
</doc>')
Example: Inserting Data into Untyped XML Column from File
The INSERT statement in the following code segment reads the content of file C:\temp\xmlfile.xml as a BLOB by using OPENROWSET. A new row is inserted into the table named docs with a value of 10 for the primary key and the BLOB for the XML column xCol. A well-formedness check occurs when file content is assigned to the XML column.
INSERT INTO docs
SELECT 10, xCol
FROM (SELECT * FROM OPENROWSET
(BULK 'C:\temp\xmlfile.xml',
SINGLE_BLOB) AS xCol) AS R(xCol)
Example: Inserting Data into Typed XML Column
Typed
XML columns require that the XML instance data specify the target
namespace of the XML schema that is used to type it (the namespace may
be empty). In the following code segment, this is done via the
namespace declaration xmlns=http://myDVD.
INSERT XmlCatalog VALUES(2,
'<?xml version="1.0"?>
<dvdstore xmlns="http://myDVD">
<dvd genre="Comedy" releasedate="2003">
<title>My Big Fat Greek Wedding</title>
<price>19.99</price>
</dvd>
</dvdstore>')
Example: Storing XML Data Generated Using FOR XML with TYPE Directive
FOR
XML has been enhanced with a TYPE directive to generate the result as
an XML data type instance. The resulting XML can be assigned to an XML
column, variable, or parameter. In the following statement, the XML
instance generated by using FOR XML TYPE is assigned to an XML data
type variable @xVar. The variable can be queried using XML data type methods.
DECLARE @xVar XML
SET @xVar = (SELECT * FROM docs FOR XML AUTO,TYPE)
Storage Representation
XML data type
instances are stored in an internal, binary representation that is
streamable and optimized for efficient parsing. Tags are mapped to
integer values and the mapped values are stored in the internal
representation. This yields some compression of the data, as well.
For
untyped XML, node values are stored as Unicode (UTF-16) strings, so
that run-time type conversion is required to perform operations. For
example, to evaluate the predicate /book/price > 9.99, the value of
the book's price is converted to decimal. On the other hand,
for typed XML, values are encoded in the type specified in the XML
schemas. This makes parsing of the data much more efficient and avoids
run-time conversion.
The stored binary form is limited to
2 GB per XML instance, which can accommodate most XML data.
Furthermore, the depth of XML hierarchy is limited to 128 levels.
The
InfoSet content of the XML data is preserved. It may not be an exact
copy of the text XML, since the following information is not retained:
insignificant white spaces, order of attributes, namespace prefixes,
and XML declaration. (For more information about InfoSet, see the WC3
recommendations for XML information at
http://www.w3.org/TR/xml-infoset [ http://www.w3.org/tr/xml-infoset ] ).
Data Modeling Considerations
Quite
often, a combination of relational and XML data type columns is
appropriate for data modeling. Some of the values from your XML data
can be stored in relational columns, and the rest, or the entire XML
value, stored in an XML column. This may yield better performance and
locking characteristics.
Values within the XML data can be
promoted to computed columns in the same table for Singleton values
(that is, single-valued properties). A multi-valued property requires a
separate table for the property, which must be populated and maintained
by using triggers. Queries need to be written directly against the
property table.
The granularity of the XML data stored in an XML
column is critical for locking and update characteristic.
SQL Server employs the same locking mechanism for both XML and
non-XML data. When the granularity is large, locking large XML
instances for updates causes throughput to decline in a multi-user
scenario. On the other hand, severe decomposition loses object
encapsulation and raises re-assembly cost.
Querying and Modifying XML Data
Querying
XML instances stored in an XML column requires parsing binary XML data
in the column. Parsing binary XML is much faster than parsing the text
form of the XML data. XML indexing avoids reparsing and is discussed in
Indexing XML data later in this paper.
Methods on XML Data Type
You
can retrieve entire XML values or you can retrieve parts of XML
instances. This is possible by using four XML data type methods that
take an XQuery expression as argument: query(), value(), exist() and nodes(). A fifth method, modify(), allows modification of XML data and accepts an XML data modification statement as input.
- The query() method is useful for
extracting parts of an XML instance. The XQuery expression evaluates to
a list of XML nodes. The subtree rooted at each of these nodes is
returned in document order. The result type is untyped XML.
- The value()
method extracts a scalar value from an XML instance. It returns the
value of the node the XQuery expression evaluates to. This value is
converted to a Transact-SQL type specified as the second argument of
the value() method.
- The exist() method is
useful for existential checks on an XML instance. It returns 1 if the
XQuery expression evaluates to non-null node list; otherwise it returns
0.
- The nodes() method yields instances of a special
XML data type, each of which has its context set to a different node
that the XQuery expression evaluates to. The special XML data type
supports the query(), value(), nodes(), and exist() methods, and can be used in count(*) aggregations and NULL checks. All other uses result in an error.
- The modify()
method permits modifying parts of an XML instance, such as adding or
deleting subtrees, or replacing scalar values such as the price of a
book from 9.99 to a 39.99.
Example: Using Query() Method
Consider the following query on the XML column named xCol of table docs that extracts <section> elements anywhere under the <doc> element whose id is 123. The query also retrieves the value from the integer primary key column. The query() method in the SELECT list is evaluated for each row in the table yielding a sequence of <section> elements, which, together with their subtrees, are retrieved in document order. Each XML instance without a <doc> element having id 123 or without a <section> element below it returns no result; the return value of the query() method is empty XML.
SELECT pk, xCol.query('/doc[@id = 123]//section')
FROM docs
Empty XML values can be filtered in an outer SELECT statement. Alternatively, the exist() method may be used, as shown in the next example.
Example: Using Exist() Method
Consider the following query. It involves the query() and exist() methods on the XML column xCol of table docs. The exist() method evaluates the path expression /doc[@id = 123], checking for the existence of a top-level <doc> element that has an attribute named id with the value 123. For each such row, the query() method in the SELECT clause is evaluated; in this example, the query() method yields a sequence of <section> elements anywhere under the <doc> element. Any row that returns 0 from the exist() method is skipped.
SELECT xCol.query('/doc[@id = 123]//section')
FROM docs
WHERE xCol.exist ('/doc[@id = 123]') = 1
Example: Using Value() Method
The following query extracts the title of the third section of a document as Unicode string using the value() method. The SQL type nvarchar(max) of the result is specified as the second argument of the value() method. The XQuery function data() extracts a scalar value from the <title> node.
SELECT xCol.value(
'data((/doc//section[@num = 3]/title)[1])', 'nvarchar(max)')
FROM docs
Example: Using GROUP BY with XML Data Type Methods
XML
data type methods are not allowed in the Transact-SQL GROUP BY clause.
However, you can extract values from an XML column in a subquery, alias
the grouping columns, and use the aliases in the GROUP BY clause. The
following query illustrates this by computing the number of books
published by authors with the same first name.
SELECT Fname, count(Fname)
FROM
(SELECT nref.value('(author/first-name)[1]', 'nvarchar(max)') Fname
FROM docs CROSS APPLY xCol.nodes('/book') T(nref)
WHERE nref.exist ('author/first-name') = 1) Result
GROUP BY FName
ORDER BY Fname
Example: Executing SQLCMD
Executing
XQuery and XML data modification statements requires that the
connection option QUOTED_IDENTIFIER be ON. The default value of this
option in SQLCMD is OFF. This must be changed to ON by using the –I
switch.
sqlcmd -E -I -d <database> -Q "SELECT xCol.query('//author') FROM docs"
Example: Difference in Behavior Between cast() and value() Method
The query() method returns XML data type instances, which have special XML characters entitized when converted to a string type. A value()
method, on the other hand, returns an SQL type value, which does not
entitize the special characters. This difference is evident in event
notification. While the first of the following two queries may contain
entitized carriage returns:
SELECT EVENTDATA().value('
(/EVENT_INSTANCE/TSQLCommand/CommandText/text())[1]','nvarchar(max)')
the second query does not:
SELECT CAST (EVENTDATA().query('
/EVENT_INSTANCE/TSQLCommand/CommandText/text()') AS nvarchar(max))
The XQuery Language
There are numerous
sources of XML from Office documents stored on the file system, Web
services, or configuration files. In fact, data is being increasingly
generated either in XML format or as virtual XML documents. To cope
with this increasing amount of data with a powerful query language,
XQuery was conceived. The justification for XQuery is described in the
W3C XQuery Language specification [ http://www.w3.org/tr/xquery ] as:
- A query language that uses the structure of
XML intelligently can express queries across all kinds of data, whether
physically stored in XML or viewed as XML via middleware. This
specification describes a query language called XQuery, which is
designed to be broadly applicable across many types of XML data sources.
- XQuery
is designed to meet the requirements identified by the W3C XML Query
Working Group XML Query 1.0 Requirements and the use cases in XML
Query Use Cases. It is designed to be a language in which queries are
concise and easily understood. It is also flexible enough to query a
broad spectrum of XML information sources, including both databases and
documents.
- XQuery can also be summarized in the following
statement: The XQuery language is to XML as the SQL language is to
relational databases.
A subset of XQuery
[ http://www.w3.org/tr/xquery/ ] embedded in Transact-SQL is the
language supported for querying the XML data type. The language is
under development (currently in Last Call) by the World Wide Web
Consortium (W3C) with the participation of all major database vendors
including Microsoft Corporation. Our implementation is aligned with the
July 2004 draft of XQuery.
XQuery includes XPath 2.0
as a navigation language. The implementation of XQuery by
SQL Server 2005 provides constructs for iteration over nodes
(FOR), node check (WHERE), returning values (RETURN), and sorting
(ORDER BY). It also provides element construction for reshaping data
during querying.
SQL Server 2005 also provides
language constructs for data modification language (DML) of the XML
data type. (For more information, see XML Data Modification later in this document). The following example shows how to use XQuery on the XML data type.
Example: Using Rich Language Constructs in XQuery
The following query shows several XQuery language constructs used together. It returns the title, wrapped in a new tag <topic>, of sections with section number 3 and higher from a document with id 123.
SELECT pk, xCol.query('
FOR $s in /doc[@id = 123]//section
WHERE $s/@num >= 3
RETURN <topic>{data($s/title)}</topic>')
FROM docs
The FOR iterates over all <section> elements under <doc> elements with id 123, and binds each such <section> to the variable $s. The WHERE clause ensures that the section number (@num attribute of the <section> element) is 3 or higher. The query returns the value of the section <title> in document order wrapped in a constructed element called <topic>.
Query Compilation and Execution
The
SQL statement is parsed by the SQL parser. When it encounters the
XQuery expression, it jumps into the XQuery compiler, which then
compiles the XQuery expression. This yields a query tree that is
grafted into the tree for the overall query.
The overall query
tree undergoes query optimization and produces a physical query plan
that it picked based on cost-based estimates. The Showplan output shows
mostly relational operators and some new operators such as UDX for XML
processing.
Query execution is tuple-oriented as in the rest
of the relational framework. The WHERE clause is evaluated on each row
of the table named docs; this involves parsing the XML BLOB at
run time to evaluate XML data type methods. If the condition is
satisfied, the row is locked and the SELECT clause is evaluated in the
row. The result is produced as XML data type for query() method and converted into the specified target type for value() method.
If,
on the other hand, the row does not satisfy the conditions in the WHERE
clause, it is skipped and execution moves to the next row.
XML Data Modification
SQL Server 2005
provides constructs for data modification as an extension to XQuery.
Subtrees can be inserted before or after a specified node, or as the
leftmost or rightmost child. Furthermore, a subtree can be inserted
into a parent node, in which case it becomes the rightmost child of the
parent. Attribute, element, and text node insertions are all supported.
Deletion of subtrees is supported. In this case, the entire subtree is removed from the XML instance.
Scalar values can be replaced with new scalar values.
Example: Insertion of Subtree into XML Instances
This example shows the use of the modify() method to insert a new <section> element to the right of the <section> element whose number is 1.
UPDATE docs SET xCol.modify('
insert
<section num="2">
<title>Background</title>
</section>
after (/doc//section[@num=1])[1]')
Example: Update Price of this Book to $49.99
The following UPDATE statement replaces the <price>
of a book whose ISBN is 1-8610-0311-0 to $49.99. The XML instance is
typed with the XML schema http://myBooks, hence the namespace
declaration in the XML data modification statement.
UPDATE XmlCatalog
SET Document.modify ('
declare namespace bk = "http://myBooks";
replace value of (/bk:bookstore/bk:book
[@ISBN="1-861003-11-0"]/bk:price)[1] with 49.99')
Type Checking and Static Errors
XQuery
introduces type checking. The compilation phase checks the static type
correctness of XQuery expressions and data modification statements, and
uses XML schemas for type inferences in case of typed XML. It raises
static type errors if an expression could fail at run time due to a
type safety violation. Examples of static errors are the addition of a
string to an integer, receiving a sequence of values where the
operation expects a single value, and querying for a non-existent node
for typed data. Explicit casting to the proper type is a workaround for
static errors resulting from type mismatches. XQuery run-time errors
are converted into empty sequences.
Location steps, function
parameters, and operators (for example, eq) that require Singletons
return an error if the compiler cannot determine whether a Singleton is
guaranteed at run time. The problem arises often with untyped data. For
example, the lookup of an attribute requires a Singleton parent
element; an ordinal selecting a single parent node is adequate.
Example: Type Checks in value() Method
The following query on an untyped XML column requires an ordinal specification on //author/last-name since the value() method expects a Singleton node as the first argument. Without it, the compiler cannot determine whether only one <last-name> node will occur at run time.
SELECT xCol.value('(//author/last-name)[1]',
'nvarchar(50)') LastName
FROM docs
The evaluation of the node()-value() combination to extract attribute values may not require the ordinal specification, as shown in the next example.
Example: Known Singleton
The nodes() method in the following statement generates a separate row for each <book> element. The value() method evaluated on a <book> node extracts the value of @genre, which, being an attribute, is a Singleton.
SELECT nref.value('@genre', 'varchar(max)') LastName
FROM docs CROSS APPLY xCol.nodes('//book') AS R(nref)
Binding Relational Data in XQuery and XML DML
When
your data resides in a combination of relational and XML data type
columns, you might want to write queries that combine relational and
XML data processing. You can convert the data in relational and XML
columns to an XML data type instance using FOR XML with the TYPE
directive and query it by using XQuery. Conversely, you can generate a
rowset from XML values and query it by using Transact-SQL.
A
more convenient and efficient way of writing cross-domain queries is to
use the value of an SQL variable or column within XQuery or XML data
modification expressions as follows:
- Apply the value of a SQL variable in your XQuery or XML DML expression using sql:variable().
- Use the values from a relational column in XQuery or XML DML context with sql:column().
This approach allows applications to parameterize queries, as shown in the following example. sql:column()
is used in a similar way and provides additional benefits. Indexes over
the column can be used for efficiency as decided by the cost-based
query optimizer. Furthermore, a computed column can be used.
XML and user-defined types cannot be used with sql:variable() and sql:column().
Example: Binding Relational Data Using sql:variable()
In this query, the ISBN of a <book> element is passed in using a SQL variable named @isbn. Instead of using a constant, sql:variable() supplies the value of the ISBN, and the query can be used to search for any ISBN, not just the one whose ISBN is 0-7356-1588-2.
DECLARE @isbn varchar(20)
SET @isbn = '0-7356-1588-2'
SELECT xCol
FROM docs
WHERE xCol.exist ('/book[@ISBN = sql:variable("@isbn")]') = 1
Example: Specifying Ordinal Position Using sql:variable()
The following query retrieves the <last-name> of the N-th <author> of a <book> where the value of N is specified by using sql:variable():
DECLARE @aNum int
SET @aNum = 2
SELECT xCol.value ('(/book/author,
[sql:variable("@aNum")]/last-name)[1]''nvarchar(64)')
FROM docs
Rowset Generation from XML Data
In
custom property management and data interchange scenarios, applications
quite often map some part of the XML data to a rowset. For example, to
pass a table of input parameters to a stored procedure or function, an
application converts the data to XML and passes it in as an XML data
type parameter. Within the stored procedure or function, the rowset is
regenerated from the XML parameter.
SQL Server 2000 provides OpenXml()
for this purpose. It is a facility for generating a rowset from an XML
instance by specifying the relational schema for the rowset and how
values inside the XML instance map to columns in the rowset.
Alternatively, the nodes() method can be used to generate node contexts within an XML instance, and use the node contexts in value(), query(), exist(), and nodes() methods to generate the desired rowset. The nodes()
method accepts an XQuery expression, evaluates it on each XML instance
in an XML column, and uses XML indexes effectively. The next example
illustrates the use of the nodes() method for rowset generation.
Example: Extract Properties from XML Instance
Suppose you want to extract first and last names of authors, whose first name is not "David", as a rowset consisting of two columns, FirstName and LastName. Using the nodes() and value() methods, you can achieve this as follows:
SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM docs CROSS APPLY xCol.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1
In this example, nodes('//author') yields a rowset of references to <author> elements for each XML instance. The first and last names of authors are obtained by evaluating value()
methods relative to those references. For good performance, the XML
column should be indexed, which is the topic of the next section.
Example: Extract Properties from XML Variable
The
CROSS APPLY operator in the previous query is not required when
properties are extracted from an XML variable or parameter. This
example considers an XML variable named @xVar to which a <book> instance is assigned and retrieves the <first-name> and <last-name> of authors.
DECLARE @xVar XML
SET @xVar =
'<book genre="security" publicationdate="2002" ISBN="0-7356-1588-2">
<title>Writing Secure Code</title>
<author>
<first-name>Michael</first-name>
<last-name>Howard</last-name>
</author>
<author>
<first-name>David</first-name>
<last-name>LeBlanc</last-name>
</author>
<price>39.99</price>
</book>'
SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName,
nref.value('last-name[1]', 'nvarchar(50)') LastName
FROM @xVar.nodes('//author') AS R(nref)
WHERE nref.exist('.[first-name != "David"]') = 1
Indexing XML Data
XML
data is stored in an internal binary form and can be up to 2 GB in
storage. Each query parses the XML blob at run time one or more times
in each row of the table. This makes query processing slow. If querying
is common in the workload, it is beneficial to index the XML column,
although the cost of XML index maintenance during data modification
must be taken into account.
XML indexes are created by using a new DDL statement on typed and untyped XML columns. This creates a B+tree for all XML instances in the column. The first index on an XML column is the primary XML index.
Using it, three types of secondary XML indexes are supported on the XML
column to speed up common classes of queries, as described in the
following section.
Primary XML Index
The
primary XML index requires a clustered index on the primary key of the
base table (that is, the table in which the XML column is defined). It
creates a B+tree on a subset of the Infoset items of XML nodes. Columns of the B+tree
represent tags such as element and attribute names, node values, and
node types. Other columns capture the document order and structure in
the XML data, and the path from the root of the XML instance to each
node for efficient evaluation of path expressions. The primary key of
the base table is duplicated in the primary XML index to correlate
index rows with base table rows.
Tags and type names given in XML schemas are mapped to integer values and the mapped values are stored in the B+tree
to optimize storage. The path column in the index stores a
concatenation of the mapped values in reversed order, that is, from a
node to the root of the XML instance. The reverse representation allows
path values to be matched when the path suffix is known (for example,
in a path expression such as //author/last-name).
If the base
table is partitioned, the primary XML index is partitioned the same
way; that is, using the same partitioning function and partitioning
scheme.
Full XML instances are retrieved from XML columns (for
example, SELECT * FROM docs or SELECT xCol FROM docs). Queries
involving XML data type methods use the primary XML index, returning
scalar values, or XML subtrees from the index itself.
Example: Creating Primary XML Index
The following statement creates an XML index named idx_xCol on the XML column xCol of table docs.
CREATE PRIMARY XML INDEX idx_xCol on docs (xCol)
Secondary XML Indexes
Once the primary
XML index is created, secondary XML indexes can be created to speed up
different classes of queries within a workload. Three types of
secondary XML indexes—PATH, PROPERTY, and VALUE—benefit path-based
queries, custom property management scenarios, and value-based queries,
respectively.
- The PATH index builds a B+-tree on the
columns (path, value) of the primary XML index. A value for the path is
computed from a path expression; a node's value, if one is provided, is
used as well. Since the leading fields of the PATH index are known,
index seek into the PATH index speeds up evaluation of the path
expression. The most common case is the use of the exist() method on
XML columns in the WHERE clause of a SELECT statement.
- The
PROPERTY index creates a B+-tree on the columns (PK, path, value) of
the primary XML index, where PK is the primary key of the base table.
This index benefits property value lookups within an XML instance.
- The
VALUE index creates a B+-tree on the columns (value, path) of the
primary XML index. This index benefits queries where a node's value is
known but its path is imprecisely specified in the query. This
typically occurs with descendant axes lookups, such as
//author/last-name [. ="Howard"], where <author> elements can
occur at any level of the hierarchy. It also occurs in wildcard
queries, such as /book [@* = "novel"], where the query looks for
<book> elements with some attribute having the value "novel".
Furthermore, the VALUE index is useful for value-based range scan of
typed XML.
Up to 128 levels of the XML hierarchy are
accommodated; XML instances containing longer paths are rejected during
insertion and modification.
Similarly, up to the first
128 bytes of a node's value are indexed; longer values are
accommodated within the system and are not indexed.
Example: Path-Based Lookup
Suppose the following query is common in your workload.
SELECT xCol
FROM docs
WHERE xCol.exist ('/book[@genre = "security"]') = 1
The path expression /book/@genre and the value securitycorrespond to the key fields of the PATH index. Consequently, a secondary XML index of type PATH is helpful for this workload.
CREATE XML INDEX idx_xCol_Path on docs (xCol)
USING XML INDEX idx_xCol FOR PATH
Example: Fetching Properties of an Object
Consider the following query that retrieves the properties genre, title, and ISBN of a book from each row in table T.
SELECT xCol.value ('(/book/@genre)[1]', 'varchar(50)'),
xCol.value ('(/book/title)[1]', 'varchar(50)'),
xCol.value ('(/book/@ISBN)[1]', 'varchar(50)')
FROM docs
The property index is useful in this case and is created as follows:
CREATE XML INDEX idx_xCol_Property on docs (xCol)
USING XML INDEX idx_xCol FOR PROPERTY
Example: Value-Based Query
In
the following query, the descendant-or-self axis (//-operator)
specifies a partial path so that the lookup based on the value of ISBN
benefits from the use of the VALUE index.
SELECT xCol
FROM docs
WHERE xCol.exist ('//book/@ISBN[. = "0-7356-1588-2"]') = 1
The VALUE index is created as follows:
CREATE XML INDEX idx_xCol_Value on docs (xCol)
USING XML INDEX idx_xCol FOR VALUE
Content Indexing
You can create a
full-text index on XML columns; this indexes the content of the XML
values while ignoring the XML markup. Attribute values are not
full-text indexed (since they are considered part of the markup) and
element tags are used as token boundaries. You can create both XML and
full-text indexes on an XML column, and combine full-text search with
XML index usage. Full-text search using CONTAINS() and XQuery fn:contains()
have different semantics. The latter as implemented is a
case-sensitive, substring search, while the former is a token match
using stemming.
Use the full-text index as the first filter to
narrow down the choices and then apply XQuery to filter further. The
overall semantics is that of case-sensitive substring searches.
Alternatively, a part of the XML data can be promoted to a computed XML
column XC on which a full-text index is created. This defines XC as the
full-text search context.
Example: Create Full-Text Index on XML Column
The
steps for creating full-text index on an XML column are the same as
those for other SQL type columns. A unique key column on the base table
is required. The DDL statements are as follows, in which PK__docs__7F60ED59 is the single-column primary key index of the table.
CREATE FULLTEXT CATALOG ft AS DEFAULT
CREATE FULLTEXT INDEX ON dbo.docs (xCol) KEY INDEX PK__docs__7F60ED59
Example: Combining Full-Text Search with XML Querying
The following query checks that an XML value contains the word Secure in the title of a book.
SELECT *
FROM docs
WHERE CONTAINS(xCol,'Secure')
AND xCol.exist('/book/title/text()[fn:contains(.,"Secure")]') =1
The CONTAINS() method uses the full-text index to subset the XML values that contain the word Secure anywhere in the document. The exist() method ensures that the word Secure occurs in the title of a book.
Example: Search Context Defined by XML Data Type Methods
An alternative to the exist()
method is to use the LIKE operator on the text value of the context
node as shown in the following code. This yields a case-insensitive,
substring search.
SELECT pk, xCol
FROM docs CROSS APPLY xCol.nodes('/book/title/text()') title(tRef)
WHERE CONTAINS (xCol, 'Secure')
AND title.tRef.value ('.', 'NVARCHAR(MAX)') LIKE '%Secure%'
Query Execution Using XML Indexes
XML
indexes speed up query execution. Queries are always compiled against
the primary XML index on an XML column, if one exists. A single query
plan is produced for the entire query (both relational and XML parts),
which is optimized by using the database engine's cost-based optimizer.
Secondary XML indexes are selected based on the query optimizer's cost
estimates.
Catalog Views for XML Indexes
Catalog views exist to provide metadata information regarding XML indexes. The catalog view sys.indexes contains entries for XML indexes with the index "type" 3. The name column contains the name of the XML index.
XML indexes are also recorded in the catalog view sys.xml_indexes, which contains all the columns of sys.indexes and a few special ones meaningful for XML indexes. The value NULL in the column secondary_type
indicates a primary XML index; the values 'P', 'R', and 'V' stand for
PATH, PROPERTY, and VALUE secondary XML indexes, respectively. The
column secondary_type_desc contains NULL for a primary XML
index, and the strings "PATH," "PROPERTY," and "VALUE" for the three
types of secondary XML indexes.
Space usage of XML indexes can be found by using the table-valued function sys.dm_db_index_physical_stats().
This function provides information such as the number of disk pages
occupied, average row size in bytes, number of records, and other
information for all index types, including XML indexes. This
information is available for each database partition; XML indexes use
the same partitioning scheme and partitioning function of the base
table.
Example: Space Usage of XML Indexes
SELECT sum (page_count)
FROM sys.dm_db_index_physical_stats (db_id(), object_id('docs'),
DEFAULT, DEFAULT, 'DETAILED') SDPS
JOIN sys.xml_indexes SXI ON (SXI.index_id = SDPS.index_id)
WHERE SXI.name = 'idx_xCol_Path'
This statement yields the number of disk pages occupied by the XML index idx_xCol_Path in table T across all partitions. Without the sum() function, the result would return the disk page usage per partition.
XML Schema Processing
XML
schemas are optional in the system. As previously mentioned, XML data
not bound to XML schemas is considered to be untyped—XML node values
are stored as Unicode strings, and XML instances are checked for
well-formedness. An untyped XML column can be indexed.
XML
typing is done by associating an XML data type with XML schemas that
are registered with an XML schema collection. A new DDL statement
allows the creation of an XML schema collection with which one or more
XML schemas may be registered. An XML column, parameter, or variable
bound to an XML schema collection is typed according to all the XML
schemas in the collection. Within an XML schema collection, the type
system identifies each XML schema using its target namespace.
Each
top-level XML element in an XML instance must specify a possible empty
target namespace it conforms to. Data is validated during insertion and
modification according to the target namespace of each top-level
element. The binary XML representation encodes typed values based on
the associated XML schema information and is fully described, so that
reparsing it is more efficient as compared to reparsing untyped XML.
Values are properly typed in XML indexes as well (for example,
/book/price is stored as decimal if it is defined in the XML schema as xs:decimal).
During
query compilation, XML schemas are used for type checking and static
errors are issued for type mismatches. The query compiler also uses XML
schemas for query optimizations.
The database engine's
metadata subsystem contains XML type information such as XML schema
collections and their contained XML schemas, and the mapping between
the primitive XSD and relational type systems. Almost all of the W3C
XML Schema 1.0 specification is supported. (For more information on the
W3C XML Schema 1.0 specification, see
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ [ http://www.w3.org/tr/2001/rec-xmlschema-1-20010502/ ] and
http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
[ http://www.w3.org/tr/2001/rec-xmlschema-2-20010502/ ] ) Comments and
annotations in XML schema documents are not preserved, and key/keyref
is not supported.
XML Schema Collections
An
XML schema collection is a metadata entity, scoped by a relational
schema. It contains one or more XML schemas that may be related (for
example, using <xs:import>) or unrelated. Individual XML
schemas within an XML schema collection are identified by using their
target namespace. XML schema collections are securable entities, much
like tables.
An XML schema collection is created by using CREATE
XML SCHEMA COLLECTION syntax and providing one or more XML schemas. You
can then type an XML column by using the XML schema collection. This
design yields a flexible data model in which XML that is typed
according to different XML schemas can be stored in the same column.
This is especially convenient when the number of XML schemas is large.
Furthermore, this design supports XML schema evolution to some extent.
Additionally,
the option DOCUMENT or CONTENT on a typed XML column specifies whether
XML trees or fragments, respectively, can be stored in the XML column.
The default behavior is CONTENT. For DOCUMENT, each XML instance must
specify the target namespace of its top-level element, according to
which it is validated and typed. For CONTENT, on the other hand, each
top-level element can specify any one of the target namespaces in the
XML schema collection. The XML instance is validated and typed
according to all the target namespaces occurring in the instance.
Example: Creating an XML Schema Collection
Suppose you want to use an XML schema with target namespace http://myBooks to type your XML instances. Create an XML schema collection named myCollection and supply the XML schema as the content of myCollection as shown in the following code segment.
CREATE XML SCHEMA COLLECTION myCollection AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://myBooks"
elementFormDefault="qualified"
targetNamespace="http://myBooks">
<xsd:element name="bookstore" type="bookstoreType" />
<xsd:complexType name="bookstoreType">
<xsd:sequence maxOccurs="unbounded">
<xsd:element name="book" type="bookType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="bookType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string" />
<xsd:element name="author" type="authorName" />
<xsd:element name="price" type="xsd:decimal" />
</xsd:sequence>
<xsd:attribute name="genre" type="xsd:string" />
<xsd:attribute name="publicationdate" type="xsd:string" />
<xsd:attribute name="ISBN" type="xsd:string" />
</xsd:complexType>
<xsd:complexType name="authorName">
<xsd:sequence>
<xsd:element name="first-name" type="xsd:string" />
<xsd:element name="last-name" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>'
A new metadata entity is created for myCollection in which the XML schema is registered. A new row can be added to the table named XmlCatalog (for the table definition, see Example: typed XML column in table earlier in this document) as follows:
INSERT XmlCatalog VALUES(1, '<?xml version="1.0"?>
<bookstore xmlns="http://myBooks">
<book genre="autobiography" publicationdate="1981"
ISBN="1-861003-11-0">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel" publicationdate="1967"
ISBN="0-201-63361-2">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
<book genre="philosophy" publicationdate="1991"
ISBN="1-861001-57-6">
<title>The Gorgias</title>
<author>
<first-name>Sidas</first-name>
<last-name>Plato</last-name>
</author>
<price>9.99</price>
</book>
</bookstore>
')
Modifying XML Schema Collections
The
ALTER XML SCHEMA COLLECTION statement supports extending an XML schema
in an XML schema collection with new top-level schema components, and
registering new XML schemas with the XML schema collection. This is
illustrated in the following example.
Example: Altering an XML Schema Collection
The following statement shows how a new XML schema with target namespace http://myDVD can be added to the XML schema collection myCollection.
ALTER XML SCHEMA COLLECTION myCollection ADD
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://myDVD"
elementFormDefault="qualified"
targetNamespace="http://myDVD">
<xsd:element name="dvdstore" type="dvdstoreType" />
<xsd:complexType name="dvdstoreType">
<xsd:sequence maxOccurs="unbounded">
<xsd:element name="dvd" type="dvdType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="dvdType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string" />
<xsd:element name="price" type="xsd:decimal" />
</xsd:sequence>
<xsd:attribute name="genre" type="xsd:string" />
<xsd:attribute name="releasedate" type="xsd:string" />
</xsd:complexType>
</xsd:schema>'
Catalog Views for XML Schema Collections
SQL
catalog views of XML schema collections allow users to reconstruct the
contents of individual XML schema namespaces. XML schema collections
are enumerated in the catalog view sys.xml_schema_collections.
The XML schema collection "sys" is defined by system and contains
predefined namespaces that can be used in all user-defined XML schema
collections without having to load them explicitly. This list contains
the namespaces for xml, xs, xsi, fn, and xdt.
Two other catalog views worth mentioning are sys.xml_schema_namespaces, which enumerates all namespaces within each XML schema collection; and sys.xml_schema_components, which enumerates all XML schema components within each XML schema.
The
built-in function XML_SCHEMA_NAMESPACE() accepts the names of a
relational schema, an XML schema collection, and optionally the target
namespace of an XML schema. It returns an XML data type instance
containing the XML schema. If the target namespace argument is left
out, the built-in function returns an XML instance that contains all
the XML schemas in the XML schema collection, except for the predefined
XML schemas.
Example: Enumerate XML Namespaces in XML Schema Collection
Use the following query for the XML schema collection myCollection.
SELECT XSN.name
FROM sys.xml_schema_collections XSC
JOIN sys.xml_schema_namespaces XSN ON
(XSC.xml_collection_id = XSN.xml_collection_id)
WHERE XSC.name = 'myCollection'
Example: Output a Specified XML Schema from an XML Schema Collection
The following statement outputs the XML schema with target namespace http://myBooks from the XML schema collection myCollection within the (relational) schema dbo.
SELECT XML_SCHEMA_NAMESPACE (N'dbo', N'myCollection',
N'http://myBooks')
Access Control on XML Schema Collections
XML
schema collections can be secured like any SQL object by using the
security model in SQL Server 2005. You can grant a principal
the privilege to create XML schema collections within a database. Each
XML schema collection supports the permissions ALTER, CONTROL, TAKE
OWNERSHIP, REFERENCES, EXECUTE, and VIEW DEFINITION.
- The ALTER permission is required to execute an ALTER XML SCHEMA COLLECTION statement.
- The
TAKE OWNERSHIP permission is required to transfer ownership of the XML
schema collection from one principal to another by executing an ALTER
AUTHORIZATION statement.
- The REFERENCES permission
authorizes the principal to use the XML schema collection wherever
schema binding is required, such as to type or constraint XML columns
and parameters.
- The EXECUTE permission is required to
validate values inserted or updated by the principal against the XML
schema collection. It is also required to query values from a typed XML
column, variable, or parameter using the XML data type.
- The
VIEW DEFINITION permission allows the principal to access rows in
catalog views corresponding to the XML schema collection, all XML
schemas contained in it, and all schema components contained in those
XML schemas.
- The CONTROL permission gives the principal the
permission to perform any operation on the XML schema collection,
including dropping the XML schema collection by using the DROP XML
SCHEMA COLLECTION statement. It implies other permissions on the XML
schema collection.
Permission on an XML schema collection
is required in addition to other permissions on a table or an XML
column. For example, to create a table T with an XML column X typed
according to an XML schema collection C, the principal requires
permission to create tables and REFERENCES permission on the XML schema
collection C. A principal with permission to insert data into column X
can do so provided the principal has EXECUTE permission on the XML
schema collection C. Similarly, a principal requires SELECT permission
on column X and EXECUTE permission on C to query the data in column X
using XML data type methods. However, SELECT permission on X is
adequate to retrieve entire XML values from column X, such as in SELECT
X FROM T or SELECT * FROM T.
Permissions can be revoked from a
principal, and a principal can be denied permissions as allowed by the
security model of SQL Server 2005.
Visibility of Catalog Views
A
principal having ALTER, TAKE OWNERSHIP, REFERENCES, VIEW DEFINITION, or
CONTROL permission on an XML schema collection can access catalog view
rows for the XML schema collection, its contained XML schemas, and
their XML schema components. With any of these permissions, the
principal can also access the contents of the XML schema collection by
using the built-in function XML_SCHEMA_NAMESPACE() and in FOR XML…
XMLSCHEMA.
If the principle is denied VIEW DEFINITION
permission, the principal cannot access the XML schema collection in
catalog views, or using XML_SCHEMA_NAMESPACE() or FOR XML… XMLSCHEMA.
Enhancements to FOR XML
The
TYPE directive generates an XML data type instance that can be assigned
to an XML column, variable, or parameter, or queried using XML data
type methods. This allows the nesting of SELECT… FOR XML TYPE
statements.
The PATH mode allows users to specify the path in
the XML tree where a column's value should appear and—together with the
aforementioned nesting—is more convenient to write than FOR XML
EXPLICIT. However, it may not perform as well for deep hierarchies.
The directive XSINIL used in conjunction with ELEMENTS maps NULL to an element with the attribute xsi:nil="true".
The new ROOT directive allows a root node to be specified in all modes
of FOR XML. The new XMLSCHEMA directive generates an XSD inline schema.
Performance Guidelines
The XML data model
is richer and more complex than the relational one. Not only does the
XML data model allow you to model complex data, but it also must
preserve hierarchical relationships and document order within the data.
Document order is maintained by sorting based on XML node identifiers;
this simultaneously maintains hierarchical relationships. These
contribute to a more complex query plan.
Structured data should
be stored in relational columns of tables for better performance.
Choose the XML data model for modeling needs when your data is
semi-structured, or unstructured, and contains XML markup but not with
the expectation of better performance. XML schemas aid in query
optimization.
XML Support in SQL Server CLR
Using
the SQL Server CLR (common language runtime) support, you can
write server-side logic in managed code to enforce business rules. This
business logic can be added to XML data in several ways:
- You can write SQLCLR functions in managed
code to which you pass XML values, and use XML processing capabilities
provided by System.Xml namespace. An example is to apply an XSL
transformation to XML data, as shown in the next code example.
Alternatively, you can deserialize the XML into one or more managed
classes and operate on them using managed code.
- You can write Transact-SQL stored procedures and functions that invoke processing on the XML column for your business needs.
Example: Applying an XSL Transformation
Consider a CLR function TransformXml.ApplyXslTransform()
that accepts an XML data type instance and the file path for an XSL
transformation, applies the transformation to the XML data, and returns
the transformed XML in the result. A skeleton function written in
Microsoft Visual C# is as follows:
using System;
using System.Data.SqlTypes;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;
public class TransformXml
{
public static SqlXml ApplyXslTransform (SqlXml XmlData,
string xslPath){
// Load XSL transformation
XslCompiledTransform xform = new XslCompiledTransform();
xform.Load (xslPath);
System.IO.MemoryStream ms = new System.IO.MemoryStream ();
xform.Transform (XmlData.CreateReader(), null, ms);
ms.Seek (0, System.IO.SeekOrigin.Begin);
// Return the transformed value
SqlXml retSqlXml = new SqlXml(ms);
return (retSqlXml);
}
}
The transformed result is written into the in-memory stream ms. The stream pointer is reset using the ms.Seek() call before passing ms to the SqlXml class constructor.
The assembly must be registered in the database by using the CREATE ASSEMBLY statement. A user-defined Transact-SQL function SqlXslTransform() corresponding to ApplyTransformXml()
must be created by using the statement CREATE FUNCTION. For more
information about these statements, see SQL Server 2005 Books
Online. Then the SQL function can be invoked from Transact-SQL as in
the following query.
SELECT SqlXslTransform (xCol, 'C:\temp\xsltransform.xsl')
FROM docs
WHERE xCol.exist('/book/title/text()[contains(.,"secure")]') =1
The query result contains a rowset of the transformed XML.
SQLCLR
opens up a whole new world that can be used for decomposing XML data
into tables or property promotion, and querying XML data using managed
classes in the System.Xml namespace. For more information, see
SQL Server 2005 Books Online and Visual Studio 2005
Books Online.
Client-Side XML Processing in SQL Server 2005
Client-Side Support for the XML Data Type
Client
access to the XML data type is provided by using ADO.NET, SQL native
client (SQLNCLI), and SOAP over HTTP. The first two are discussed
below; for SOAP access, see SQL Server Books Online.
ADO.NET XML Support in the .NET Framework V2.0
The XML data type is exposed as a class SqlXml in the System.Data.SqlTypes namespace from the SqlDataReader.GetSqlXml() method. You can obtain XmlReader from the SqlXml object by using the SqlXml.CreateReader() function.
The three-part name of the XML schema collection typing an XML column can be obtained from the XML column metadata (by using GetSchemaTable() or GetSqlMetaData (int) on the SqlDataReader object) as three properties indicating the names of the database (XmlSchemaCollectionDatabase), relational schema (XmlSchemaCollectionOwingSchema), and the XML schema collection (XmlSchemaCollectionName).
A
new schema rowset named XMLSCHEMA is available for clients to retrieve
XML schemas from the server. The XMLSCHEMA rowset contains three
columns for an XML schema collection, target namespace, and the XML
schema content itself.
The following examples show skeletal code for managed access to XML data type.
Example: In-Process Access to XML Data Type
The
following Visual C# code illustrates how the XML data type can be
accessed from the in-process provider. The context connection in the
example allows you to execute Transact-SQL statements in the same
context that the CLR code was invoked. For out-of-process access, a new
connection to the database must be established.
using System;
using System.Xml;
using System.Data;
using System.Data.SqlTypes;
using System.Data.SqlClient;
class xmldtADONETReadAccessInProc
{
static void ReadXmlDataType () {
// in-proc connection to server
SqlConnection conn =
new SqlConnection("context connection=true");
// prepare query to select xml data
SqlCommand cmd = conn.CreateCommand();
cmd.CommandText =
"SELECT xCol.query('//section') FROM docs";
// execute query and retrieve incoming data
SqlDataReader r = cmd.ExecuteReader();
r.Read();
// access XML data type field in rowset
SqlXml xml = r.GetSqlXml(0);
new XmlTextWriter(Console.Out).WriteNode(
xml.CreateReader(), true);
}
}
Example: Accessing the Result of FOR XML TYPE
FOR XML with the TYPE directive yields an XML data type instance that a managed client can retrieve by using the SqlXml class. Thus, the code in the previous example remains the same when the cmd.CommandText is modified as follows:
cmd.CommandText =
"SELECT xCol.query('//section') FROM docs FOR XML AUTO, TYPE";
Example: Updating XML Data Type Column Using SQL Client Provider
The following code segment shows a method WriteXmlDataType() that replaces the value in an XML column using the SQL client provider. The code for the in-process provider is similar.
using System;
using System.Xml;
using System.Data;
using System.Data.SqlTypes;
using System.Data.SqlClient;
class xmldtADONETUpdateAccess
{
static void WriteXmlDataType () {
// connection to server
SqlConnection conn = new SqlConnection("server=server1;" +
" database=XMLtest; Integrated Security=SSPI");
conn.Open();
// update XML column at the server
SqlCommand cmd = conn.CreateCommand();
cmd.CommandText = "UPDATE docs SET xCol=@x WHERE id=1";
// set value of XML parameter
SqlParameter p = cmd.Parameters.Add("@x", SqlDbType.Xml);
p.Value = new SqlXml(new XmlTextReader("<hello/>",
XmlNodeType.Document, null));
// execute update and close connection
cmd.ExecuteNonQuery();
conn.Close();
}
}
SQL Native Client Access
In the OLE DB
provider of the new SQL native access (SQLNCLI), an XML data type
column can be retrieved as Unicode text (DBTYPE_XML, DBTYPE_BYTES,
DBTYPE_BSTR, DBTYPE_WSTR, and DBTYPE_VARIANT) or as a Unicode character
stream (DBTYPE_UNKNOWN) by using ISequentialStream. The default
is DBTYPE_XML. XML data can be sent to the server in UTF-16 encoding,
in which case the application must ensure that it is already UTF-16
encoded. The first byte must be a byte-order mark 0xFFFE. An
application can send XML data to the server in some other encoding,
provided the encoding is compatible with that of the database server.
Encoding specified within the XML data is honored. XML should be sent
as binary data in all other cases.
The three-part XML schema collection name is carried in three new columns of COLUMNS schema rowset returned by IDBSchemaRowset::GetRowset():
SS_XML_SCHEMACOLLECTION_CATALOGNAME gives the name of the catalog;
SS_XML_ SCHEMACOLLECTION SCHEMANAME, the name of the relational schema
in which the XML schema collection resides; and
SS_XML_SCHEMACOLLECTIONNAME, the name of the XML schema collection.
These names are of type DBTYPE_WSTR. These columns have NULL values for
untyped XML column for both data retrieval and update.
Similar changes have been made to PROCEDURE_PARAMETERS schema rowset and IColumnRowset:GetColumnRowset().
To
retrieve the contents of the XML schema collection, the client can make
a separate access to the server by using these names in a call to
XML_SCHEMA_NAMESPACE() and get back XML schemas as XML data type.
Alternatively, IDBSchemaRowset with the new SS_XMLSCHEMA schema
rowset returns catalog name, relational schema name, XML schema
collection name, target namespace, and XML schema.
For ODBC
access using SQLNCLI, the XML data type is mapped to Unicode
variable-length character data called SQL_SS_XML. The 3-part XML schema
collection name is surfaced through SQLColAttribute for the XML column
as CharacterAttributePtr. These field identifiers are
SQL_CA_SS_XML_SCHEMACOLLECTION_CATALOG_NAME, SQL_CA_SS
_XML_SCHEMACOLLECTION_SCHEMA_NAME, and SQL_CA_SS
_XML_SCHEMACOLLECTION_NAME for the names of the database, relational
schema, and XML schema collection, respectively.
Users must install the database server or client tools of SQL Server 2005 to get the SQL native client.
XML
data from SQL Server 2005 is available to MDAC APIs by using
SQLOLEDB as DBTYPE_WSTR by default. XML columns cannot be returned as
the result of a user-defined function to clients that are running
versions of SQL Server previous to SQL Server 2005. If this is
attempted, an error is returned.
SQLXML - Mapping Between XML and Relational Schemas
You can create logical XML views of your relational data by using SQLXML mapping technology. An XML view, also referred to as a mapping or annotated schema,
is created by adding special annotations to a given XSD schema. Other
SQLXML technologies can then use this annotated schema to transform
queries and updates against the logical XML view to queries and updates
against relational tables:
- When the XML view is combined with an XPath
query, SQLXML generates a FOR XML query to find the requested data and
shape it as specified in the schema.
- SQLXML updategrams
represent changes to an XML instance that, when combined with an
annotated schema, persist these changes back to relational changes by
using optimistic concurrency to ensure that the proper data is updated.
- SQLXML bulkload uses a XML view to "shred" XML data into relational tables.
More
information about any of these topics can be found in the SQLXML
documentation. This is available on MSDN or by downloading SQLXML from
the MSDN SQLXML site [ http://msdn.microsoft.com/sqlxml ] .
Creating an XML view of Relational Tables
To
create an XML view of the database, you begin with an XSD Schema for
your XML data. The rows in the database table/view map to complex-type
elements in the schema. The column values in the database map to
attributes or simple-type elements.
By default, if no explicit
annotations are given, SQLXML assumes that complex-type elements map to
tables and simple-type elements, and attributes map to columns. This
only works if the names of the elements and attributes are exactly the
same as the names of your tables and columns in your database.
If
the name of an element/attribute is not the same name as the table
(view) or column name to which it maps, an explicit mapping must be
created. The following annotations are used to specify the mapping
between an element or attribute in an XML document and the table (view)
or column in a database.
- sql:relation—Maps an XML element to a database table.
- sql:field—Maps an element or attribute to a database column.
Mapping Relationships to Create Hierarchies in XML Views
In
your database, tables may be related by foreign key relationships. In
XML, these relationships are represented by a nested hierarchy of the
elements. In order to construct a proper nesting in your mapping, you
must specify how the elements are related. You can establish these
relationships among mapping schema elements by using the sql:relationship
annotation. Within this annotation, you can specify the parent and
child tables as well as which columns in each table should be used to
perform the join. SQLXML uses this information to construct the proper
nesting hierarchy for your mapping.
Using Overflow to Store Unconsumed Data
Mapping
works when your XML data has a regular structure. However, there may be
some data in your XML that is unstructured or some data that does not
map to a specific column. To store this data and later retrieve it, the
sql:overflow annotation can be used. The sql:overflow annotation specifies the column into which all unconsumed data is stored and from where it is retrieved when querying.
The
overflow column also allows for expansion of your XML without having to
add to your database. Elements and attributes can be added at any time
to your XML structure without adding columns to store them in your
database. They will simply be stored in the overflow field and
retrieved at the proper time.
For More Information
For more information about creating XML views and to see examples of mappings, see
Creating XML Views by Using Annotated XSD Schemas [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/ssxsdannotations_0gqb.asp ] on MSDN.
Querying the XML View by Using XPath
Once
you've created an XML view of your database, you can query that view as
if it were an actual XML document by using the XPath query language.
SQLXML supports a subset of the XPath 1.0 query language. When an
XPath query is issued against a mapping, SQLXML composes them together
and creates a FOR XML EXPLICIT statement that is sent to
SQL Server. The proper data is retrieved and then shaped according
to the mapping.
For details on the subset of XPath supported
over XML views, see the SQLXML documentation. This is available on MSDN
or by downloading SQLXML from the MSDN
SQLXML page [ http://msdn.microsoft.com/sqlxml ] .
Updating Through XML Views Using Updategrams
You
can modify (insert, update, or delete) a database in SQL Server
through an XML view by using an updategram against an XML view of your
database.
The Structure of an Updategram
An updategram is an XML document with <sync>, <before>, and <after> elements that form the syntax of the updategram. Each <sync> block contains one or more <before> and <after> blocks. The <before> block identifies the existing state (also referred to as "the before state") of the record instance. The <after>
block identifies the new state to which data is to be changed. Whether
an updategram deletes, inserts, or updates a record instance depends on
the contents of the <before> and <after> blocks.
Insert Operations
An updategram indicates an insert operation when a record instance appears in the <after> block, but not in the corresponding <before> block. In this case, the updategram inserts the record in the <after> block into the database.
Delete Operations
An updategram indicates a delete operation when a record instance appears in the <before> block with no corresponding records in the <after> block. In this case, the updategram deletes the record in the <before> block from the database.
If
an element that is specified in the updategram either matches more than
one row in the table or does not match any table row, the updategram
returns an error and cancels the entire <sync> block. Only one record at a time can be deleted by an element in the updategram.
Update Operations
When you are updating existing data, you must specify both the <before> and <after> blocks. The updategram uses the elements that are specified in the <before> block to identify the existing records in the database. The corresponding elements in the <after> block indicate how the records should look after the update operation is executed.
An element in the <before>
block must match only one table row in the database. If the element
either matches multiple table rows or does not match any table row, the
updategram returns an error and cancels the entire <sync> block.
For More Information
For more information about creating and using updategrams to modify data through your XML views, see
Using Updategrams to Modify Data [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/updategram_5kkh.asp ] on MSDN.
Bulk Loading XML Data Through the XML View
XML
Bulk Load is a COM object that allows you to load XML data into
SQL Server tables. You could insert XML data into a
SQL Server database by using an INSERT statement and the OPENXML
function; however, the bulk load utility provides higher performance
when you need to insert large amounts of XML data. XML Bulk Load
interprets the mapping schema and identifies the tables into which the
XML data is to be inserted. It then shreds your XML data into your
relational tables.
Because the source XML document can be
large, the entire document is not read into memory for bulk load
processing. Instead, XML Bulk Load interprets the XML data as a stream
and interprets it reads it. As the utility reads the data, it
identifies the database table, generates the appropriate record from
the XML data source, and then sends the record to SQL Server for
insertion.
For details about how XML Bulk Load works and how to use it, see
Performing Bulk Load of XML Data [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/bulkload_7pv0.asp ] on MSDN.
SQLXML Data Access Methods
Since SQL Server 2000, two new ways to access SQLXML functionality have been added:
- SQLXML Managed Classes
- SQLXML Web Services
In addition, HTTP access to SQL Server has been enhanced to provide support for updategrams within templates.
SQLXML Managed Classes
SQLXML
Managed Classes expose the functionality of SQLXML 3.0 inside the
Microsoft .NET Framework. With SQLXML Managed Classes, you can write a
C# application to access XML data from an instance of SQL Server,
bring the data into the .NET Framework environment, process the data,
and send the updates back to SQL Server to apply the updates.
For more details on how to use SQLXML managed classes, see
SQLXML .NET Support [ http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqlxml3/htm/dotnet_0jck.asp ] on MSDN.
Conclusion
This
article describes complementary technologies for XML in
SQL Server 2005. Server-side features include a native
implementation for XML storage, indexing, and query processing.
Existing features such as FOR XML and OpenXML have also been enhanced.
Client-side support consists of enhancements to ADO.NET to support the
XML data type and in the System.Xml to support XQuery for
querying differing sources of XML. In addition, the SQLXML mapping
technology Web release enhancements have been incorporated into SQL
Server 2005.
The server-side and the client-side support
are useful in different scenarios. The XML data type provides a simple
mechanism for storing XML data by inserting XML data into an untyped
XML column. Using XML schemas to define typed XML helps the database
engine optimize storage and query processing in addition to providing
data validation.
The XML data type preserves document order
and is useful for applications such as document management
applications. It can also handle recursive XML schemas. The relational
data model is still the best choice for structured data with a known
schema. Even [n]varchar(max) is suitable for scenarios where fine-grained query and update are unimportant.
The
SQLXML mapping technology is useful whenever you want to use an
XML-centric programming model over relational data stored in tables at
the server. The mapping is based on defining an XML schema as an XML
view. The mapping can be used for bulk loading XML data into tables and
for querying the tables by using XPath 1.0. Document order is not
preserved, so the mapping technology is useful for XML data processing
as opposed to XML document processing.
The core XML classes in System.Xml
in the .NET Framework 2.0 release enable you to read, write,
manipulate, and transform XML. With improvements in performance,
usability, typing, and querying, the XML support in the 2.0 release
continues to lead the industry in innovation, standards support, and
ease of use.
The server- and client-side technologies
complement one another. The mapping technology can leverage server-side
features to augment features such as maintenance of document order and
recursive schemas, and find more areas of applications. On the other
hand, the CLR brings extensibility and the wealth of existing XML tools
such as XSLT transformation to the XML data type. The XQuery
implementation in the server is aligned with the July 2004 draft
of the XQuery specification. The XML schema implementations at the
server and the client are aligned with each other.
References
XML Best Practices for Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345115.aspx ]
Indexing XML Data Stored in a Relational Database [ http://www.vldb.org/conf/2004/ind5p2.pdf ]
XML Options in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345110.aspx ]
What's New in FOR XML in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345137.aspx ]
About the authors
Shankar Pal is Program Manager in SQL Server engine working on server-side XML technologies. His blog is
http://blogs.msdn.com/spal [ http://blogs.msdn.com/spal ] .
Mark
Fussell is Lead Program Manager in the Microsoft XML Messaging team. He
has worked on data access technologies including the components within
the System.Xml and System.Data namespaces of the .NET Framework,
Microsoft XML Core Services (MSXML), and Microsoft Data Access
Components (MDAC). His blog is http://weblogs.asp.net/mfussell [ http://weblogs.asp.net/mfussell ] .
Irwin Dolobowsky is Program Manager in the Microsoft MSN team.