Prasadarao K. Vithanala
Microsoft Corporation
June 2005
Summary:
This white paper provides an introduction to various features of XQuery
implemented in SQL Server 2005 such as the FLWOR statement, operators
in XQuery, if-then-else construct, XML constructors, built-in XQuery
functions, type casting operators, and examples of how to use each of
these features. Non-supported features of XQuery in SQL Server 2005 and
workarounds are described in this article. It also presents three
scenarios where XQuery is useful. (30 printed pages)
Contents
Introduction
The XML Data Type in SQL Server 2005
Introduction to XQuery
Structure of the XQuery Expression
Operators in XQuery
The if-then-else Construct
Constructing XML Using XQuery
Built-in XQuery Functions
Type-Related Expressions
Accessing Relational Columns and Variables
Non-Supported Features and Workarounds
Best Practices and Guidelines
XML Data Modification
XQuery Usage Scenarios
Conclusion
Introduction
XML
was developed for use as a document format. However, the additional
features of XML, such as extensibility, support for
internationalization, the ability to represent both structured and
semi-structured data, and easy readability by humans and machines, have
rendered XML an immensely popular platform-independent data
representation format. As XML gains wide acceptance, users apply XML to
solve complex business problems such as those involving data
integration. There are many scenarios where it is recommended that
information be stored in XML format rather than to store it in tables
and then compose XML messages from the information. For more
information about these scenarios, see the MSDN article
XML Best Practices for Microsoft SQL Server 2005
[ http://msdn2.microsoft.com/en-us/library/ms345115.aspx ] . The
application of XML for storing documents and for representing
semi-structured data has led to the evolution of XML as a data storage
format that simplifies data management at the server.
This
evolution, however, posed a problem—extracting information from the XML
data stored in relational tables as BLOBs required a query language to
extract and shape information represented in the XML. Microsoft SQL
Server 2000 provided OpenXML which could be used for querying, but was
designed for mapping XML data to relational form and thus could not
provide full support for the XML data model (see Table 1).
The
relational data model and the XML data model differ in a lot of ways.
The following table outlines the major differences between the two data
models.
Table 1 Differences between the relational and XML data models
Feature | Relational Data Model | XML Data Model |
Flat structured data | Uses flat tables to store data in a column form. Recommended way to store flat structured data. | Useful when it is required to preserve the document order or when the schema is flexible or unknown. |
Semi-structured data | It is difficult to model semi-structured data using relational model. | Provides excellent support for representing semi-structured data with variable or evolving schema. |
Markup data | Not suitable for storing markup data beyond BLOB storage. | Excellent for storing markup data such as HTML, RTF and so on. |
Nested or hierarchical data structures | Supports
nested data by using multiple tables and linking them with foreign
keys, but the complexity of queries required for searching nested data
stored in relational form increases when the depth of nesting increases
or is unknown. | Provides excellent support for expressing nested and hierarchical data. |
Order of data | Not preserved. | Preserved. |
Input data | Homogeneous. | Heterogeneous. |
Result set | Homogeneous. | Heterogeneous. |
Both
the increasing need to deal with more heterogeneously structured data
and the necessity of having an implied order are two of the most
important reasons why the relational data model is being extended to
store XML documents natively. In addition, the limitations of the SQL
language in handling semi-structured or markup information resulted in
the development of the XQuery language. The XQuery language has been
designed from scratch taking into consideration the nature of XML data
and the issues involved in processing it.
SQL Server 2005 has
built-in support for the native storage of XML data using the XML data
type. XQuery 1.0 is a language that is being defined by the World Wide
Web Consortium (W3C) XML Query Working Group to formulate queries over
XML data. XQuery, like SQL, is a declarative query language that, as we
will see later, can be easily understood with a basic knowledge of SQL
and XPath.
This paper is based on the implementation of XQuery
1.0 in SQL Server 2005, which in turn is based on the XQuery 1.0 July
2004 working draft. The first section of this paper provides an
overview of the new XML data type and its associated features.
Subsequent sections introduce the new XQuery language and its
advantages, the FLWOR statement, various operators in XQuery, built-in
functions in XQuery, type-related expressions, and non-supported
features in SQL Server 2005, respectively. Finally, the last sections
provide information on best practices and guidelines, XML data
modification, and XQuery usage scenarios.
The XML Data Type in SQL Server 2005
The
new XML data type introduced in SQL Server 2005 provides users with the
ability to store XML documents and fragments in the database. An XML
data type can be used to create a column, a parameter to a stored
procedure or function, or a variable.
Additionally, the user
can associate a column of type XML with an XML schema collection to
create a typed XML column. The XML schemas in the collection are used
to validate and type the XML instances.
Typed vs. Untyped XML Data Type
An
XML data type can be associated with an XML schema collection to have
schema constraints enforced on XML instances. If the XML data is
associated with an XML schema collection, it is called typed XML. Otherwise, it is called untyped XML.
The
SQL Server 2005 XML data type implements the ISO SQL-2003 standard XML
data type. It can store not only well-formed XML 1.0 documents, but
also so-called XML content fragments with top-level text nodes and an
arbitrary number of top-level elements. Checks for well-formedness of
the data are performed, and these checks do not require the XML data
type to be bound to XML schemas. Data that is not well-formed (subject
to the SQL-2003 content relaxations) is rejected.
Untyped XML is useful when the schema is not known a priori.
It is also useful when the schema is known but changing rapidly and
thus hard to maintain, or multiple schemas exist and are late bound to
the data based on external requirements. Furthermore, untyped XML is
useful when the XML schemas contain XML Schema constructs not supported
by the database engine (e.g. key/keyref, lax validation). In that case,
you could use the System.XML validator inside a CLR (common language
runtime) user-defined function to provide validation.
If you
have XML schemas in an XML schema collection describing your XML data,
you can associate the XML schema collection with the XML column to
yield typed XML. The XML schemas are used to validate the data, perform
more precise type checks than untyped XML during compilation of query
and data modification statements, and optimize storage and query
processing.
Typed-XML columns, parameters, and variables can
store XML documents or content fragments, which you can specify as an
option (DOCUMENT or CONTENT, respectively, with CONTENT as the default)
at the time of declaration. Furthermore, you have to provide the
collection of XML schemas. Specify DOCUMENT if each XML instance has
exactly one top-level element; otherwise, use CONTENT. The XQuery
compiler uses the DOCUMENT flag information to infer Singleton
top-level elements during static type inference.
Methods of the XML Data Type
The
XML data type supports five methods that can be used to manipulate XML
instances. The methods of the XML data type can be described as follows:
The query()
method takes an XQuery expression that evaluates to a list of XML nodes
and allows the user to extract fragments of an XML document. The result
of this method is an instance of untyped XML.
The value()
method is useful for extracting scalar values from XML documents as a
relational value. This method takes an XQuery expression that
identifies a single node and the desired SQL type to be returned. The
value of the XML node is returned cast to the specified SQL type.
The exist()
method allows the user to perform checks on XML documents to determine
if the result of an XQuery expression is empty or nonempty. The result
of this method is 1 if the XQuery expression returns a nonempty result,
0 if the result is empty, and NULL if the XML instance itself is NULL.
Decomposing an XML document into relational data is facilitated by the nodes() method of the XML data type. The nodes()
method accepts an XQuery expression and returns a rowset in which each
row represents a context node identified by the query expression.
Methods of the XML data type such as query(), value(), exist(), and nodes() can also be invoked on the context nodes returned from the nodes() method.
The modify()
method can be used to modify the content of an XML document. It accepts
XML DML statements to insert, update, or delete one or more nodes from
an XML instance. It raises an error if applied to a NULL value.
For more information, see the
XML Support in Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345117.aspx ] white paper.
Introduction to XQuery
XQuery
is a new language for querying XML data that allows navigational access
based on XPath 2.0. This section provides an overview of various
aspects of the XQuery language such as the relationship between XQuery
and XPath, advantages of using XQuery, application areas of XQuery, the
role of XML Schema in XQuery, and so on.
Overview of XPath 2.0
XPath 1.0
[ http://www.w3.org/tr/xpath ] , as defined by the W3C, is a language
for locating parts of an XML document. XPath uses path-based syntax for
identifying nodes in the XML document. It also defines the core syntax
for both XSLT 1.0 [ http://www.w3.org/tr/xslt ] and
XPointer
[ http://www.w3.org/tr/wd-xptr ] . XPath 1.0 has built-in functions to
handle strings, Boolean values, and floating point numbers. It defines
the syntax for filtering the node set with the ability to specify
filtering criteria. XPath 1.0 is being extended in
XPath 2.0 [ http://www.w3.org/tr/xpath20/ ] to support a more detailed type system and to provide more functionality.
XQuery 1.0
[ http://www.w3.org/tr/xquery/ ] in turn is based on XPath 2.0 and adds
ordering, reshaping, construction, and validation capabilities to the
navigational and filtering aspects of XPath 2.0.
Overview of XQuery
XQuery
is a declarative, typed, functional language designed from scratch by
the XML Query Working Group specifically for the purpose of querying
data stored in XML format. XQuery shares the same data model and the
same XML Schema [ http://www.w3.org/xml/schema ] -based type system with other members of the XML standards family such as XPath 2.0 and
XSLT 2.0
[ http://www.w3.org/tr/xslt20/ ] . XQuery is designed to work with XML
documents that are untyped (no schema associated with the data), typed
with XML schemas, or a combination of both. As previously mentioned,
XQuery 1.0 is basically a superset of XPath 2.0. In addition to the
features of XPath 2.0, it has the following capabilities:
- Adds an order by clause to the FLWOR clause to sort in non-document order.
- Adds a let clause to the FLWOR clause to name results of expressions for further use (not supported in SQL Server 2005).
- Provides a way to specify static context items in the query prolog (such as namespace prefix bindings).
- Provides the ability to construct new nodes.
- Provides the ability to define user-defined functions (not supported in SQL Server 2005).
- Provides the ability to create modules/libraries (not supported in SQL Server 2005).
Advantages of XQuery
- It is easy to learn if knowledge of SQL and XPath is present.
- When queries are written in XQuery, they require less code as compared to queries written in XSLT.
- XQuery
can be used as a strongly typed language when the XML data is typed,
which can improve the performance of the query by avoiding implicit
type casts and provide type assurances that can be used when performing
query optimization.
- XQuery can be used as a weakly typed
language for untyped data to provide high usability. SQL Server 2005
implements static type inferencing with support for both strong and
weak type relationships.
- Because XQuery requires less code to perform a query than does XSLT, maintenance costs are lower.
- XQuery is going to be a W3C recommendation and will be supported by major database vendors.
Caveat regarding XQuery 1.0 language as of the writing of this document:
- The XQuery specification is currently under
development and may change in the future. SQL Server 2005 implements a
stable part of the W3C working draft.
Application Areas of XQuery
Application areas of XQuery can be classified broadly as follows:
- XQuery for query/analysis: XQuery is
excellent for querying huge chunks of data and provides the capability
to filter, sort, order, and repurpose the required information. Typical
applications include querying XML documents that represent
semi-structured information, name-value pair property bags, analysis of
application logs, transaction logs and audit logs to identify potential
application errors and security issues, and so on.
- XQuery for application integration: As
organizations move away from proprietary application integration
approaches and start adopting standards based application integration
approaches, the need for transforming data from internal
application-specific formats to standard exchange formats is gaining
more focus. Because of its ability to construct and transform XML data,
XQuery caters to this need. One typical use of XQuery in the
application-integration domain is translating the vocabulary used by
one application that uses native XML database/relational data source
into a language used by another application that uses XML/relational
data format.
Advantages of Using XQuery at the Server
Performing
XML processing at the server using XQuery has many advantages compared
to client-side XML processing. Some of these advantages can be
summarized as follows:
- Reduced traffic on the network: When
XML data is processed on the server, only the results are forwarded to
the client. This results in reduced traffic on the network.
- More security:
Only the data that is required is sent to the client thereby avoiding
the risk of exposing the entire data to the network as is the case when
using client-side XML processing.
- Better maintainability:
Processing XML on the server results in browser independent code on the
client, which leads to better maintainability on the client side.
- Improved performance:
Queries written using XQuery on the server are subjected to
optimization by the SQL query engine. This results in improved
performance when compared to retrieving the entire data and filtering
the data at the client. Furthermore, indexes can be created on the XML
data type column to achieve enhanced performance.
How XQuery Implementation Uses XML Schemas
The XML schema collection associated with an XML data type is used by the relational engine as follows:
- To validate the XML instances during insertion operations.
- To validate the XML instances during modification operations.
- Type
information contained in the XML schema is used during static type
checking for early error detection and for query performance
improvements by generating better query plans and avoiding many runtime
inspections.
- Type information present in the XML schema is used by the SQL Server to optimize storage.
Structure of the XQuery Expression
An
XQuery expression in SQL Server 2005 consists of two sections—a prolog
and a body. A prolog can in turn contain a namespace declaration
subsection. Namespace declarations are used to define a mapping between
prefix and namespace URI thereby enabling you to use the prefix instead
of the namespace URI in the query body. You can also refer to element
names without the prefix by binding a default namespace for element
names, using the declare default namespace declaration.
The
body of an XQuery expression contains query expressions that define the
result of the query. It can for example be the signature FLWOR
expression (see The FLWOR Statement in this paper), an XPath 2.0 expression (see XPath 2.0 Expressions in this paper), or another XQuery expression such as a construction or arithmetic expression.
Example: Specifying default namespace in prolog section of XQuery
The following query selects all Employment nodes for a candidate whose JobCandidateID is 3. The query defines a default namespace and does not use a namespace prefix:
SELECT Resume.query('
declare default namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
/Resume/Employment
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Specifying namespace using the "WITH XMLNAMESPACES" clause
SQL
Server also supports the SQL-2003 standard extension that allows a user
to declare XML namespace bindings in the SQL WITH clause on a per SQL
query basis, thus avoiding repetitive declarations in multiple XML data
type method invocations. The following query shows the modified version
of the query shown in the previous example. This query declares a
namespace using the WITH XMLNAMESPACES clause:
WITH XMLNAMESPACES( 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume' AS "RES")
SELECT Resume.query('
/RES:Resume/RES:Employment
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
XPath 2.0 Expressions
XQuery
uses XPath 2.0 expressions to locate nodes in a document and to
navigate from one location to another within a single document or
across documents. Navigation paths defined using XPath consist of a
sequence of steps separated by /. A single step comprises an axis, a
node test, and zero or more step qualifiers.
The axis specifies the direction of movement, relative to the context node. Supported axes in SQL Server 2005 are child, descendant, parent, attribute, self and descendant-or-self.
A node test
specifies a condition that all the nodes that are selected by a step
must satisfy. The node condition can be specified based on node name or
node type.
Step qualifiers can be defined by using predicates or dereferences. A predicate is an expression that acts as a filter on a node sequence and is specified within square brackets. A dereference
maps the elements and/or attributes nodes in a node sequence to the
nodes that they reference. A node sequence passed as an input to the
dereference must contain elements or attributes of type IDREF or
IDREFS. The dereference generates a new sequence consisting of the
element nodes whose ID-type attribute values match the IDREF values
extracted from the elements and attributes in the input sequence.
The
steps of an XPath expression are evaluated from left to right.
Execution of a step sets the evaluation context items for the next
step. A context item in a path expression is a node that is selected as
a result of the execution of a step in an XPath expression. A step is
evaluated relative to the context item that was obtained in the
previous step. The result of an XPath expression is a sequence of nodes
in document order that are obtained after executing all the steps in
the expression in the order from left to right in the path expression.
This
example expression uses the column Resume which is of type XML in table
[HumanResources].[JobCandidate] from the AdventureWorks database for
the purpose of illustrating the concept of path expressions. The
following path expression selects all address nodes for which the
address type is set to Home.
//child::ns:Addr.Type[.="Home"]/parent::node()
In the preceding path expression,
- child is the axis specifier.
- :: is the axis separator.
- ns is the namespace prefix.
- Addr.Type is the node test.
- [.="Home"] is the predicate expression where . refers to the context node.
XQuery
also supports abbreviated syntax for specifying the axis. The following
table shows the axis and corresponding abbreviated syntax.
Table 2 Abbreviated syntax for axes
Axis | Abbreviated form |
Attribute | @ |
Child | |
descendant-or-self::node() | // |
parent::node() | .. |
self::node() | . |
Example: Selecting organization names from employment history
The following XPath expression selects the children text nodes of Emp.OrgName elements that are child nodes of node Resume/Employment. Here, text() is used to select the text node of the Emp.OrgName element :
/Resume/Employment/Emp.OrgName/text()
The FLWOR Statement
FLWOR
statements form the core expressions of XQuery and are similar to the
SELECT statements of SQL. The acronym FLWOR (pronounced "flower")
stands for FOR, LET, WHERE, ORDER BY, RETURN. FLWOR expressions in
XQuery enable users to specify operations such as declarative
iteration, variable binding, filtering, sorting, and returning the
results. SQL Server 2005 supports FOR, WHERE, ORDER BY, and RETURN.
For
The for
clause in a FLWOR expression enables users to define a declarative
iteration of a bound variable over an input sequence. The input
sequence can be specified using XPath expressions, sequence of atomic
values, a sequence constructed using literals, or constructor
functions. It is therefore analogous to the SQL SELECT FROM clause and
is not like a programming language "for" construct.
Variable binding is also specified in the for clause.
Example: Selecting all home address elements from resume using the for clause
The following query selects all Address nodes where the type of address is set to Home and the JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A in /RES:Resume/RES:Address/RES:Addr.Type[.="Home"]/..
return
$A
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
where
The where clause filters the results of an iteration by applying the expression specified with the where clause.
Example: Selecting all home address elements using the where clause
The following query selects all Address nodes where the type of address is set to Home and the JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A in /RES:Resume/RES:Address
where $A/RES:Addr.Type[.="Home"]
return
$A
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
order by
The order by keyword enables you to sort the values in the returned result set. The order by keyword accepts a sorting expression, which should return an atomic value. Optionally, you can also specify ascending or descending for the sort order. The default sort order is ascending.
Example: Selecting employment history in ascending order using the order by clause
The following query selects all Employment nodes in ascending order of employment starting date for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
order by $EMP/RES:Emp.StartDate
return
$EMP
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
return
The return clause, which
is analogous to the SELECT clause in SQL, enables users to define the
result of a query. You can specify any valid XQuery expression in the return clause. You can also construct XML structures in the return section by specifying constructors for elements, attributes, etc.
Example: Selecting specific elements of employment history using the return clause
The following query selects StartDate, EndDate, OrgName, JobTitle elements of Employment node for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
order by $EMP/RES:Emp.StartDate
return
<Employment>
{ $EMP/RES:Emp.StartDate }
{ $EMP/RES:Emp.EndDate }
{ $EMP/RES:Emp.OrgName }
{ $EMP/RES:Emp.JobTitle }
</Employment>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Current implementation of XQuery in SQL Server 2005 does not support the let clause. This aspect is discussed further in Non-supported Features and Workarounds in this paper.
FLWOR Expressions vs. XPath Expressions
Using
a FLWOR expression to define the result sequence when the sequence can
be expressed using an XPath expression incurs a performance penalty
because the query plan contains a JOIN operation between the for variable and the body of the for clause. The use of a FLWOR expression is justified only if one or more of the following conditions are satisfied:
- If you want to iterate over a sequence of values that are returned as a result of an expression. This is achieved using the for
clause, which binds a variable to successive values of the result set.
Examples are the construction of new elements within the scope of the for clause and the retention of duplicates.
- When you want to filter the result sequence of the for clause based on a predicate which cannot be defined using simple XPath expressions. The where clause is used to eliminate unwanted values in the result set. An example is as follows:
DECLARE @Result xml
SET @Result = '<Result />'
SELECT @Result.query('
for $i in (1, 2, 3), $j in (3, 4, 5)
where $i < $j
return sum($i + $j)
') as Result
- If you want to sort the result set based on a sorting expression. Sorting is defined on the result set using the order by clause.
- When you want to define the shape of the returned result set using the results obtained from the for clause. The return statement is used to perform the shaping of the result set.
In all other cases, using XPath expressions is recommended.
Operators in XQuery
As
a functional language, XQuery in SQL Server 2005 supports various types
of functions and operators that can be grouped under the following
categories:
- Arithmetic operators
- Comparison operators
- Logical operators
Table 3 Operators supported in SQL Server 2005
Type | Operators |
Arithmetic operators | +,-,*,div, mod |
General comparison operators | =, !=, <, >, <=, >= |
Value comparison operators | eq, ne, lt, gt, le, ge |
Node comparison operator | is |
Node order comparison operators | >>, << |
Logical operators | and, or |
Arithmetic Operators
SQL Server 2005 supports five arithmetic operators. These are +, b, *, div, and mod. Currently, it does not support idiv.
Example: Converting selected values of store survey information
This
example is based on the [Sales].[Store] table in the AdventureWorks
database. The following query returns values of AnnualSales,
AnnualRevenue in yen and the store area in square meters for a store
whose CustomerID is 3:
SELECT Demographics.query('
declare namespace ST="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";
for $S in /ST:StoreSurvey
return
<StoreDetails
SalesInYen ="{ $S/ST:AnnualSales*106.8100 }"
RevenueInYen = "{ $S/ST:AnnualRevenue*106.8100 }"
StoreAreaInSqMeters = "{ $S/ST:SquareFeet*0.0929 }">
</StoreDetails>
') as Result
FROM [Sales].[Store]
WHERE CustomerID = 3
Comparison Operators
SQL Server 2005
has implemented support for four types of comparison operators—general
comparison operators, value comparison operators, node comparison
operators, and node order comparison operators.
General comparison operators
General
comparison operators help compare atomic values, sequences, or a
combination of the two. General comparison operators are =, !=, <, >, <=, and >=. A general comparison is existentially quantified, meaning that any match will result in true.
Example: Selecting all address elements where the address type is not set to Home
The following query selects all Address nodes where the type of address is not set to Home and the JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A in /RES:Resume/RES:Address
where $A/RES:Addr.Type[.!="Home"]
return
$A
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Value comparison operators
Value comparison operators help compare atomic values. Value comparison operators supported by SQL Server 2005 are eq, ne, lt, gt, le, and ge.
The current implementation of XQuery with respect to promoting untyped
atomic values is not aligned with the July 2004 draft of the XQuery
specification. The untyped atomic type is promoted to the type of the
other operand instead of xs:string as specified in the XQuery
specification. This is to maintain consistency across general and value
comparison operators which we consider more important than making the
value comparison transitive.
Example: Selecting all education elements where the GPA is greater than 3.5
The following query selects all Education nodes where the GPA is greater than 3.5 for the candidate whose JobCandidateID is 2:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where xs:decimal($ED/RES:Edu.GPA) gt 3.5
return
$ED
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Node comparison operators
You can use the node comparison operator is
to compare two nodes to determine if they represent the same node or
not. The node comparison operator accepts two operands that are of type
node.
Example: Comparing two address nodes to check their identity
The following query compares two address nodes to check if they represent the same node in the document:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
if ( (/RES:Resume/RES:Address)[1] is (//RES:Address)[1] )
then
<Result>Nodes are equal</Result>
else
<Result>Nodes are not equal</Result>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Node order comparison operators
You can
use the node order comparison operators to ascertain the order of two
nodes in a document. The node order comparison operators supported by
SQL Sever 2005 are >> and << and both the operators accept
two operands. The >> operator returns true if the left operand precedes the right operand in document order, and the << operator returns true if the left operand follows the right operand in document order.
Example: Comparing the order of two address nodes
The following query compares the order of two address nodes for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
if ( (/RES:Resume/RES:Address/RES:Addr.Type[.="Home"])[1] << (/RES:Resume/RES:Address/RES:Addr.Type[.="Permanent"])[1] )
then
<Result>Home address precedes Permanent address</Result>
else
<Result>Home address follows Permanent address</Result>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Logical Operators
The logical operators supported by XQuery in SQL Server 2005 are and and or. The value of any logical expression formed using these operators can be either true or false.
Example: Using the and operator to create a logical expression
The following query returns the candidate's education element that contains bachelor level business degree:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
/RES:Resume/RES:Education[RES:Edu.Level="Bachelor" and RES:Edu.Major="Business"]
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
The if-then-else Construct
Like other functional languages, XQuery supports the if-then-else construct. You can use the if-then-else statement to perform operations based on the value of a conditional expression.
Example: Using a conditional expression
The following query displays the type of address that is specified in the resume for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A in /RES:Resume/RES:Address
return
if ( $A/RES:Addr.Type eq "Home" )
then
<Result>Home Address</Result>
else
<Result>Other Address</Result>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Constructing XML Using XQuery
XQuery
constructors enable you to create XML structures within a query.
Constructors are available for elements, attributes, processing
instructions, text nodes, and comments.
The following examples illustrate these approaches to constructing XML.
Example: Using constant expressions
The following query displays employment history details constructed using constant expressions:
SELECT Resume.query('
<Employer IndustryCategory="ITServices">
<Organization>ABC Technologies</Organization>
<JobTitle>Software Engineer</JobTitle>
<StartDate>2001-10-01</StartDate>
<EndDate>2003-05-09</EndDate>
</Employer>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using data obtained dynamically
The
following query displays employment history details constructed using
the results obtained from a query for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
return
<Employer Organization = "{ $EMP/RES:Emp.OrgName }" >
{ $EMP/RES:Emp.StartDate }
{ $EMP/RES:Emp.EndDate }
{ $EMP/RES:Emp.JobTitle }
</Employer>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using constant names for computed element and attribute constructors
The following query displays the employment history of a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
return
element Employer
{
attribute Organization { $EMP/RES:Emp.OrgName },
element StartDate { string($EMP/RES:Emp.StartDate) },
element EndDate { string($EMP/RES:Emp.EndDate) },
element JobTitle { string($EMP/RES:Emp.JobTitle) }
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Performing XQuery Construction vs. Shaping Using FOR XML
Some
applications have a requirement to generate XML from a rowset. On the
server, XML can be generated using a FOR XML clause or XQuery
constructions or XML DML operations. Recommendations on using FOR XML
and XQuery constructors for constructing XML are as follows:
- If XML is to be aggregated from multiple columns and multiple rows, then FOR XML is the only choice.
- If
a single XML instance is to be reshaped, then it can be done using
XQuery as well as FOR XML. XQuery might be faster since the FOR XML
approach will require multiple invocations of XML data type methods on
the XML instance.
- You can use multiple XML DML statements to
construct an XML instance. This approach is significantly slower than
XQuery construction.
- Use the new TYPE directive available with
the FOR XML clause in SQL Server 2005 to generate the result of a FOR
XML query as an instance of XML data type.
Built-in XQuery Functions
Implementation
of XQuery in SQL Server 2005 supports a subset of the built-in
functions of XQuery 1.0 and XPath 2.0. These functions include data
accessor functions, string manipulation functions, aggregate functions,
context functions, numeric functions, Boolean functions, node
functions, and sequence functions. The following sections explore some
of these functions.
Data Accessors
You can
use the data accessor functions to extract values of nodes as strings
or typed values. XQuery supports two types of data accessor functions: string(), which extracts the string value of an item and data(), which gets the typed value. If the node is not a text node, an attribute node, or an element node then the data() function throws a static error. If the node is a document node of an untyped XML instance, then data() returns a string value of the document. The data() function returns a static error if the node is a complex typed element.
Example: Using the string() function
See the Using constant names for computed element and attribute constructors
example in the section "Constructing XML Using XQuery" in this paper
for a query that generates the employment history of a candidate by
using the string() function and computed element constructors.
Example: Using the data() function
The following query generates the employment history of a candidate by using the data() function and computed element constructors:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where xs:decimal( data($ED/RES:Edu.GPA) ) gt 3.5
return
element Education
{
element Level { data($ED/RES:Edu.Level) },
element Degree { data($ED/RES:Edu.Degree) },
element GPA { data($ED/RES:Edu.GPA) },
element GPAScale { data($ED/RES:Edu.GPAScale) }
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
String Manipulation
XQuery supports four string manipulation functions:
- concat() Helps concatenate two or more strings.
- contains()
Helps determine whether or not a string specified as the first operand
contains another string specified as the second operand. The length of
the search string is limited to 4,000 Unicode characters.
- substring() Helps extract portion of a string from another string known as source string.
- string-length() Helps calculate the length of a string.
The current release of SQL Server 2005 supports only the Unicode codepoint collation.
Example: Using the concat() and substring() functions
The following query generates the employment history of a candidate by concatenating the values of start date, end date, organization name, and job title:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
return
<Employment>
{
concat( substring(string($EMP/RES:Emp.StartDate),1,10)," to ",
substring(string($EMP/RES:Emp.EndDate),1,10), ", ",
string($EMP/RES:Emp.OrgName), ", ",
string($EMP/RES:Emp.JobTitle) )
}
</Employment>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using the contains() function
The following query demonstrates the use of the contains() function by displaying Education details for a node that contains the string science in the value of the element Edu.Degree:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where contains($ED/RES:Edu.Degree, "Science")
return
element Education
{
element Level { data($ED/RES:Edu.Level) },
element Degree { data($ED/RES:Edu.Degree) },
element GPA { data($ED/RES:Edu.GPA) },
element GPAScale { data($ED/RES:Edu.GPAScale) }
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Aggregate Functions
Aggregate functions
operate on a sequence of items and return the aggregate values of the
sequence. Aggregate functions currently supported in the XQuery support
in SQL Server 2005 are count(), min(), max(), avg(), and sum(). The functions min() and max() accept only base types that support the gt
operator (i.e., the three built-in numeric base types, the date/time
base types, xs:string, xs:boolean, and xdt:untypedAtomic). A sequence
of mixed types is not supported in these functions. Furthermore,
xdt:untypedAtomic is treated as xs:double.
For avg() and sum(),
the type of the passed expression needs to be a subtype of one of the
three built-in numeric base types or untypedAtomic (but not a mixture,
xdt:untypedAtomic is treated as xs:double).
The count() function returns the number of items in a sequence.
Example: Using the count() function
The following query displays the count of employment, education, and address elements present in the document using the count() function:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
<Employment>Element count is { count(/RES:Resume/RES:Address) }</Employment>,
<Education>Element count is { count(/RES:Resume/RES:Education) }</Education>,
<Address>Element count is { count(/RES:Resume/RES:Address) }</Address>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using the min() function
The following query displays the education element for which the GPA value is minimum using the min() function:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where $ED/RES:Edu.GPA = min(/RES:Resume/RES:Education/RES:Edu.GPA)
return
$ED
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Example: Using the max() function
The following query displays the education element for which the GPA value is maximum using the max() function:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where $ED/RES:Edu.GPA = max(/RES:Resume/RES:Education/RES:Edu.GPA)
return
$ED
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Example: Using the avg() function
The following query calculates weekly average high and low temperatures for the cities of New York and Boston using the avg() function:
DECLARE @Weather xml
SET @Weather = '
<WeatherInfo>
<NewYork>
<Temp Date="2004-11-01" High="55" Low="45" />
<Temp Date="2004-11-02" High="58" Low="42" />
<Temp Date="2004-11-03" High="60" Low="40" />
<Temp Date="2004-11-04" High="51" Low="47" />
<Temp Date="2004-11-05" High="54" Low="41" />
<Temp Date="2004-11-06" High="55" Low="43" />
<Temp Date="2004-11-07" High="58" Low="47" />
</NewYork>
<Boston>
<Temp Date="2004-11-01" High="53" Low="45" />
<Temp Date="2004-11-02" High="56" Low="42" />
<Temp Date="2004-11-03" High="54" Low="41" />
<Temp Date="2004-11-04" High="52" Low="45" />
<Temp Date="2004-11-05" High="52" Low="36" />
<Temp Date="2004-11-06" High="54" Low="41" />
<Temp Date="2004-11-07" High="56" Low="44" />
</Boston>
</WeatherInfo>'
SELECT @Weather.query('
<WeatherInfo>
<NewYork>
<AvgHigh>{ avg(/WeatherInfo/NewYork/Temp/@High) }</AvgHigh>
<AvgLow>{ avg(/WeatherInfo/NewYork/Temp/@Low) }</AvgLow>
</NewYork>
<Boston>
<AvgHigh>{ avg(/WeatherInfo/Boston/Temp/@High) }</AvgHigh>
<AvgLow>{ avg(/WeatherInfo/Boston/Temp/@Low) }</AvgLow>
</Boston>
</WeatherInfo>
') as Result
Example: Using the sum() function
The following query calculates weekly average high and low temperatures for the cities of New York and Boston using the sum() and count() functions:
DECLARE @Weather xml
SET @Weather = '
<WeatherInfo>
<NewYork>
<Temp Date="2004-11-01" High="55" Low="45" />
<Temp Date="2004-11-02" High="58" Low="42" />
<Temp Date="2004-11-03" High="60" Low="40" />
<Temp Date="2004-11-04" High="51" Low="47" />
<Temp Date="2004-11-05" High="54" Low="41" />
<Temp Date="2004-11-06" High="55" Low="43" />
<Temp Date="2004-11-07" High="58" Low="47" />
</NewYork>
<Boston>
<Temp Date="2004-11-01" High="53" Low="45" />
<Temp Date="2004-11-02" High="56" Low="42" />
<Temp Date="2004-11-03" High="54" Low="41" />
<Temp Date="2004-11-04" High="52" Low="45" />
<Temp Date="2004-11-05" High="52" Low="36" />
<Temp Date="2004-11-06" High="54" Low="41" />
<Temp Date="2004-11-07" High="56" Low="44" />
</Boston>
</WeatherInfo>'
SELECT @Weather.query('
<WeatherInfo>
<NewYork>
<AvgHigh>{ sum(/WeatherInfo/NewYork/Temp/@High) div count(/WeatherInfo/NewYork/Temp/@High) }</AvgHigh>
<AvgLow>{ sum(/WeatherInfo/NewYork/Temp/@Low) div count(/WeatherInfo/NewYork/Temp/@Low) }</AvgLow>
</NewYork>
<Boston>
<AvgHigh>{ sum(/WeatherInfo/Boston/Temp/@High) div count(/WeatherInfo/Boston/Temp/@High) }</AvgHigh>
<AvgLow>{ sum(/WeatherInfo/Boston/Temp/@Low) div count(/WeatherInfo/Boston/Temp/@Low) }</AvgLow>
</Boston>
</WeatherInfo>
') as Result
Context Functions
You can use the
context functions to obtain the contextual properties of a context
item. SQL Server 2005 implements two context functions—last() and position(). The last() function can be used to determine the number of items in a sequence and the position() function can be used to obtain the position of a context item. Both the last() and position()
functions without an argument can only be used in the context of a
context-dependent predicate (i.e., inside []) in SQL Server 2005.
Example: Using the last() function
The following query retrieves the last address element for a candidate using the last() function:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
/RES:Resume/RES:Address[last()]
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using the position() function
The following query retrieves the first two address elements for a candidate using the position() function:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
/RES:Resume/RES:Address[position()<=2]
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Type-Related Expressions
XQuery
supports various types of expressions or operators that are based on
the type information. These expressions can be classified as type
assertion expressions, type inspection expressions, and type casting
expressions. These expressions are discussed briefly in the following
sections.
Type Assertion Expressions
as xs:TYPE clause in for statement
You can use the as clause to specify the type for the binding variable used in the for statement.
When
a type is declared for the binding variable, binding values that are
not of the declared type would result in type error. The xs:TYPE clause
is not a cast expression but it serves as a type assertion.
Example: Using "as xs:TYPE" clause with for statement
The following query binds the address node sequence to a variable $A which is defined as type element(RES:Address):
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A as element(RES:Address) in /RES:Resume/RES:Address
return
$A
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Type Inspection Expressions
instance of xs:TYPE operator
The instance of operator helps identify the runtime type of an item in an XML document.
Example: Using "instance of xs:TYPE" operator
The
following query checks to see if the type of an address node identified
by an XPath expression matches element() type or not:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
if ( (/RES:Resume/RES:Address)[1] instance of element() )
then
<Result>Selected node is an Element</Result>
else
<Result>Selected node is not an Element</Result>
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Type Casting Expressions
Implicit Type Casting
The
XQuery engine performs implicit type casting for numeric types and
untypedAtomic values in expressions that contain arithmetic operations
or function invocations. This process is known as type promotion. Type
promotion occurs when an expression results in a numeric type that is
incompatible with the expected numeric type. Type promotion is
performed by casting the resulting expression to the required type.
Example: Implicit type casting
The
following query performs an arithmetic operation on a decimal value and
a double value. In the current scenario, the values in the expression
are added only after promoting the xs:decimal value to xs:double.
DECLARE @Result xml
SET @Result = '<Result />'
SELECT @Result.query('
<Result>{ xs:decimal("10.55") + xs:double(1.5e1) }</Result>
') as Result
Explicit Type Casting
Typed value constructors
XQuery
provides constructor functions for all built-in types defined in the
XML Schema specification. These constructors are useful for
constructing typed values and also for casting values from one type to
another. XQuery also makes constructors available for types that are
defined in imported schemas.
Example: Using a value constructor for constructing values
The following query returns all Employment nodes for which the StartDate is greater than a value constructed using constructor for type xs:date:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $EMP in /RES:Resume/RES:Employment
where $EMP/RES:Emp.StartDate gt xs:date("1995-01-01")
return
$EMP
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 3
Example: Using a value constructor for typecasting
The following query selects all Education nodes where the GPA is greater than or equal to 3.8 for the candidate whose JobCandidateID is 2. This query uses the value constructor for xs:decimal for typecasting the value of Edu.GPA from xs:string to xs:decimal:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
where xs:decimal( data($ED/RES:Edu.GPA) ) ge 3.8
return
element Education
{
element Level { string($ED/RES:Edu.Level)},
element StartDate { string($ED/RES:Edu.StartDate)},
element EndDate { string($ED/RES:Edu.EndDate)},
element Degree { string($ED/RES:Edu.Degree)},
element GPA { string($ED/RES:Edu.GPA)},
element GPAScale { string($ED/RES:Edu.GPAScale)}
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
cast as xs:TYPE ? Operator
XQuery in SQL Server 2005 supports the cast as TYPE ? operator, which is useful for performing explicit type casting. Explicit type casting can also be performed using the xs:TYPE() constructors, which are more convenient to write than the cast as TYPE ? operator.
Example: Using "cast as xs:TYPE ?" operator
The following query generates an XML that contains typed values of selected elements from Education node set for a candidate whose JobCandidateID is 3:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
return
element Education
{
element Level { $ED/RES:Edu.Level cast as xs:string? },
element StartDate { $ED/RES:Edu.StartDate cast as xs:date? },
element EndDate { $ED/RES:Edu.EndDate cast as xs:date? },
element Degree { $ED/RES:Edu.Degree cast as xs:string? },
element GPA { $ED/RES:Edu.GPA cast as xs:decimal? },
element GPAScale { $ED/RES:Edu.GPAScale cast as xs:decimal? }
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Example: Using "xs:TYPE()" operator
The following query generates the same results as the query in the previous example using the xs:TYPE() operator instead of the cast as xs:TYPE ? operator:
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $ED in /RES:Resume/RES:Education
return
element Education
{
element Level { xs:string($ED/RES:Edu.Level) },
element StartDate { xs:date($ED/RES:Edu.StartDate) },
element EndDate { xs:date($ED/RES:Edu.EndDate) },
element Degree { xs:string($ED/RES:Edu.Degree) },
element GPA { xs:decimal($ED/RES:Edu.GPA) },
element GPAScale { xs:decimal($ED/RES:Edu.GPAScale) }
}
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = 2
Accessing Relational Columns and Variables
When
writing queries using XQuery, it is not uncommon to have the
requirement for accessing relational columns and variables from within
the query. SQL Server 2005 caters to this requirement by implementing
two functions—sql:column() and sql:variable().
The function sql:column()
can be used to access non-XML columns in a relational table from within
a query. This function is useful for scenarios such as aggregating
information from XML and non-XML type columns of one or more tables,
using the values of non-XML columns to filter the results of an XQuery,
and so on. The functions sql:column() and sql:variable() cannot be used with datetime, CLR user-defined functions or XML .
Example: Using sql:column() function
The following query generates an XML that contains values from the CustomerID and Name columns of non-XML data type and values of YearOpened, NumberOfEmployees, AnnualSales, and AnnualRevenue elements from the Demographics column:
SELECT Demographics.query('
declare namespace ST="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";
element CustomerInfo
{
element CustomerID { sql:column("Store.CustomerID") },
element Name { sql:column("Store.Name") },
element YearOpened { string((/ST:StoreSurvey/ST:YearOpened)[1]) },
element NumberOfEmployees { string((/ST:StoreSurvey/ST:NumberEmployees)[1]) },
element AnnualSales { string((/ST:StoreSurvey/ST:AnnualSales)[1]) },
element AnnualRevenue { string((/ST:StoreSurvey/ST:AnnualRevenue)[1]) }
}
') as Result
FROM [Sales].[Store] Store
WHERE Store.CustomerID = 4
Example: Using sql:variable() function
The following stored procedure returns home address nodes of a specified candidate by using the value of the @AddrType variable in the XQuery where clause to filter the results of the query:
CREATE PROCEDURE [GetCandidateAddress]
@JobCandidateID [int],
@AddrType [varchar](20)
AS
BEGIN
SET NOCOUNT ON;
SELECT Resume.query('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
for $A in /RES:Resume/RES:Address
where $A[ RES:Addr.Type = sql:variable("@AddrType") ]
return
$A
') as Result
FROM [HumanResources].[JobCandidate]
WHERE JobCandidateID = @JobCandidateID
END
Example: Using sql:variable() function in conjunction with the exist() method of the XML data type
The following query returns resumes of candidates who hold a Bachelor's degree with a major in Business:
DECLARE @EducationLevel varchar(20)
SET @EducationLevel = 'Bachelor'
DECLARE @Major varchar(20)
SET @Major = 'Business'
SELECT JobCandidateID, Resume
FROM [HumanResources].[JobCandidate]
WHERE Resume.exist ('
declare namespace RES="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
/RES:Resume/RES:Education[ RES:Edu.Level = sql:variable("@EducationLevel") and RES:Edu.Major = sql:variable("@Major") ]') = 1
Non-Supported Features and Workarounds
Current implementation of XQuery in SQL Server 2005 does not support the following features:
- The let clause: The let clause, which is a part of the FLWOR expression is useful for binding a variable to the results of an expression. Workaround—Instead of using the let clause, use an inline expression.
- Range expression (to operator): A range expression can be used to construct a sequence of consecutive integers using the to operator. For example, using a range expression such as (6 to 10), it is possible to construct the sequence (6, 7, 8, 9, 10). Workaround—Instead of using the to operator, list all items in your sequence.
- Type information:
Some features that are based on type system such as typeswitch, treat
as, castable, and validate expressions are currently not supported.
- The functionality of a typeswitch
expression is similar to the switch-case construct available in other
programming languages. In a switch-case statement, the branches/cases
are selected based on the value of the argument passed to the switch.
In a typeswitch expression, the branches are selected based on the type
of the argument passed to typeswitch. Workaround—use if then else and instance of.
- The treat as
expression can be used to change the static type of the result of an
expression to a specific static type and raises a static type error if
the static type of the expression does not match the specified type. It
does not change the dynamic type or value of the expression.
Workaround—none
- The castable expression is useful for checking whether an atomic value can be cast to the specified type. Workaround—Instead of using the expression "$x castable as T?", use the expression "empty(data($x)) or not(empty(T($x)))"
- The validate expression performs validation on its argument based on the schema definitions present in the current scope. Workaround—Instead of using the validate expression, use the Transact-SQL casting to the requested schema collection.
- Reusability:
As in other programming languages, it is possible to write reusable
functions, known as user-defined functions, that contain useful and
complex queries. In addition, it is also possible to package a
collection of user-defined functions as modules. Queries can make use
of the functions available in modules by importing the modules. Modules
can be imported into a query by including them in the prolog section of
the query. No workaround in SQL Server 2005.
- Built-in functions:
The following built-in functions are currently not supported in SQL
Server 2005. For more information about these functions, see
XQuery 1.0 and XPath 2.0 Functions and Operators [ http://www.w3.org/tr/xquery-operators/ ] on the W3C.org Web site.
- Accessors: fn:node-name(), fn:nilled(), fn:base-uri(), fn:document-uri().
- Error function: fn:error().
- Trace function: fn:trace().
- Functions on numeric values: abs(), round-half-to-even().
- String-handling functions:
codepoints-to-string(), string-to-codepoints(), compare(),
string-join(), normalize-space(), normalize-unicode(), upper-case(),
lower-case(), translate(), escape-uri(), starts-with(), ends-with(),
substring-before(), substring-after(), matches(), replace(), tokenize().
- Functions and operators for anyURI: resolve-uri().
- Functions and operators on durations, dates, and time:
years-from-duration(), months-from-duration(), days-from-duration(),
hours-from-duration(), minutes-from-duration(),
seconds-from-duration(), year-from-dateTime(), month-from-dateTime(),
month-from-dateTime(), day-from-dateTime(), hours-from-dateTime(),
minutes-from-dateTime(), seconds-from-dateTime(),
timezone-from-dateTime(), year-from-date(), month-from-date(),
day-from-date(), timezone-from-date(), hours-from-time(),
minutes-from-time(), seconds-from-time(), timezone-from-time(),
adjust-dateTime-to-timezone(), adjust-date-to-timezone(),
adjust-time-to-timezone(),
subtract-dateTimes-yielding-yearMonthDuration(),
subtract-dateTimes-yielding-dayTimeDuration(),
subtract-dates-yielding-yearMonthDuration(),
subtract-dates-yielding-dayTimeDuration(). Also types
xdt:dayTimeDuration and xdt:yearMonthDuration are not supported.
- Functions related to QNames: resolve-QName(), QName(), namespace-uri-for-prefix(), in-scope-prefixes().
- Functions on nodes: name(), lang(), root(). Workaround—use / instead of root().
- Functions and operators on sequences:
fn:boolean(), fn:index-of(), fn:exists(), insert-before(), remove(),
reverse(), subsequence(), unordered(), zero-or-one(), one-or-more(),
exactly-one(), deep-equal(),two-argument version of id(), idref(),
doc(), collection() functions and union, intersect and except
operators. Workarounds—use not(not()) instead of fn:boolean(),
use not(empty()) instead of fn:exists(). Use either an explicit for
iteration or [1] instead of zero-or-one() or exactly-one().
- Context functions: current-dateTime(), current-date(), current-time(), default-collation(), implicit-timezone().
- Positional variable: Positional variables can be defined as part of a FLWOR statement using the at clause and are useful for identifying the location of an item in the result of an expression.
- Order modifier: The order modifier "empty greatest | least" specified with an order by clause is not supported.
- Other features: The following features are not supported:
- The idiv operator.
- Explicit schema import is not supported.
- External variables are not supported.
- Boundary white space preservation option is not supported.
- Concatenation of heterogeneous sequences such as nodes and values is not supported.
- No support for time zone preservation.
- Accessing a text node of an element with a simple typed content is not supported.
- No support for accessing xsi:* attributes in a typed XML data type instance.
Best Practices and Guidelines
- Take
advantage of type information if it is available. This will provide
performance improvements and the ability to perform static type
checking to detect errors.
- Use XQuery for property promotions
if you want to extract the values of a few properties from XML data
type and use them in relational queries. Frequently used properties are
promoted to relational columns and indexed for better performance.
- Use an ordinal, such as [1], or explicit FLWOR to avoid static errors due to cardinality mismatch.
- Use explicit casts to avoid static type errors.
- Index
usage: Create a PATH index if you have queries with heavy use of XPath
expressions. Use a VALUE index when the XQuey query contains XPath
expressions that involve searching for element or attribute values with
imprecisely known paths (e.g., /a/* or //b). PROPERTY index is useful
when the query involves searching for all occurrences of a known
property within an XML instance.
- Use a default namespace when most of the types referred are part of a single namespace. Otherwise, use a prefix.
For more information on best practices, see the MSDN articles
XML Best Practices for Microsoft SQL Server 2005 [ http://msdn2.microsoft.com/en-us/library/ms345115.aspx ] and
Performance Optimizations for the XML Data Type [ http://msdn2.microsoft.com/en-us/library/ms345118.aspx ] .
XML Data Modification
SQL
Server 2005 provides support for modifying XML instances stored in the
database. The current version of the W3C XQuery working draft does not
define a syntax for modifying XML documents. In order to provide a
mechanism for modifying XML documents, Microsoft has developed XML Data
Modification Language (DML). XML documents can be modified using the modify method of the XML data type and specifying the modification using XML DML statements.
XML DML uses the insert, delete, and replace value of
keywords to support insert, delete, and update operations on XML
documents. Modification of a typed XML instance is subjected to
validation checks according to the schema constraints defined on the
XML data type.
The following table and XML instance are used to illustrate XML DML operations:
Table:
CREATE TABLE [CandidateInfoXMLDataType]
(
[JobCandidateID] [int] IDENTITY(1,1) NOT NULL,
[Resume] [xml] NOT NULL,
[ModifiedDate] [datetime] NOT NULL DEFAULT (getdate())
)
Sample XML Instance:
<JobCandidate>
<Name>
<FirstName>Mike</FirstName>
<MiddleName></MiddleName>
<LastName>Chen</LastName>
</Name>
<Address>
<Address1>34 181st Place SE</Address1>
<Address2>Apt 3344</Address2>
<City>Redmond</City>
<State>WA</State>
<Country>US</Country>
<PhoneNumber>9870909023</PhoneNumber>
</Address>
<Education>
<BachelorDegree>BS</BachelorDegree>
<MasterDegree>MS</MasterDegree>
</Education>
<Skills>
<Skill>ASP.NET</Skill>
<Skill>SQL</Skill>
</Skills>
<Employement>
<Employer>
<OrgName>ABC Technologies</OrgName>
<Location>NY, US</Location>
<StartDate>20/02/2000</StartDate>
<EndDate>10/04/2004</EndDate>
<JobTitle>Project Leader</JobTitle>
<Responsibility>Responsible for design,development,testing activities</Responsibility>
</Employer>
</Employement>
</JobCandidate>
The Insert Operation
You can use the insert keyword to insert one or more nodes in an XML document. The insert
keyword accepts an XQuery expression that identifies the nodes to be
inserted and another XQuery expression that specifies the reference
node.
In addition, you can include keywords such as into, after, and before to specify the position of new nodes in relation to the reference node. When you specify the into keyword, new nodes are inserted as children of the reference node. If you include the into keyword, you must also specify the as first or the as last
keyword to indicate the position of inserted nodes with respect to the
existing child nodes of the reference node. You can also insert new
nodes as sibling nodes after or before the reference node by specifying
the after or the before keyword.
If the target
expression (Expression2) does not identify a single node statically,
the insert operation will fail with a static error.
Syntax:
insert
Expression1 (
{{{as first | as last} into} | after | before}
Expression2 )
Example: Inserting a skill
The following stored procedure allows the user to insert a new skill for a specified candidate. This stored procedure uses the sql:variable()
function to access a Transact-SQL variable inside the XML DML
statements. For details on how to bind non-XML relational data inside
XML, see Accessing Relational Columns and Variables.
The
following stored procedure is written based on the assumption that the
user will pass a string value of one skill as the second argument to
the stored procedure. This stored procedure can be modified to accept
an XML fragment that contains one or more skill elements, thereby
enabling the user to insert multiple skill nodes, with a single call,
to the stored procedure.
/* Stored procedure to insert a new skill element for a candidate */
CREATE PROCEDURE [InsertSkillInfo]
@JobCandidateID [int],
@Skill [varchar](200)
AS
BEGIN
SET NOCOUNT ON;
UPDATE [CandidateInfoXMLDataType]
SET Resume.modify('
insert element Skill {sql:variable("@Skill")}
as last
into (/JobCandidate/Skills)[1]
')
WHERE JobCandidateID = @JobCandidateID
END
The Delete Operation
The delete keyword enables you to delete one or more nodes from an XML instance. The delete keyword accepts an XQuery expression that identifies one or more nodes to be deleted from the XML document.
Syntax:
delete Expression
Example: Deleting a skill
The following example illustrates the use of the delete keyword to delete a skill for a specified candidate.
The
following stored procedure is written based on the assumption that the
user will pass a string value of one skill as the second argument to
the stored procedure. This stored procedure can be modified to accept
an XML fragment that contains one or more skill elements, thereby
allowing the user to delete multiple skill nodes with a single
invocation of the stored procedure.
/* Stored procedure to delete a specified skill element for a candidate */
CREATE PROCEDURE [DeleteSkillInfo]
@JobCandidateID [int],
@Skill [varchar](200)
AS
BEGIN
SET NOCOUNT ON;
UPDATE [CandidateInfoXMLDataType]
SET Resume.modify('
delete (/JobCandidate/Skills/Skill[.=sql:variable("@Skill")])
')
WHERE JobCandidateID = @JobCandidateID
END
This stored procedure can easily be modified to accept an XML
fragment which contains one or more skill elements thereby allowing the
user to delete multiple skill nodes with a single invocation of stored
procedure.
The Update Operation
The replace value of
keyword enables you to modify the value of an existing node. To modify
the value of an existing node, you have to specify an XQuery expression
that identifies the node whose value is to be updated and another
expression that specifies the new value of the node.
While
modifying an untyped XML instance, the target expression should return
a simply typed node. In the case of a typed XML instance, the target
expression should evaluate to the same type or a subtype of the source
expression.
Syntax:
replace value of
Expression1
with
Expression2
Example: Updating a skill
The following example shows how to update an existing skill value for a specified candidate using the replace value of keyword.
/* Stored procedure to update a specified skill element for a candidate */
CREATE PROCEDURE [UpdateSkillInfo]
@JobCandidateID [int],
@SkillOld [varchar](200),
@SkillNew [varchar](200)
AS
BEGIN
SET NOCOUNT ON;
UPDATE [CandidateInfoXMLDataType]
SET Resume.modify('
replace value of (/JobCandidate/Skills/Skill[.=sql:variable("@SkillOld")]/text())[1]
with sql:variable("@SkillNew")
')
WHERE JobCandidateID = @JobCandidateID
END
Unlike the delete operation, update and insert operations can only affect one node in one operation.
XQuery Usage Scenarios
Scenario 1: Performance Appraisal System
The
Human Resources department of an organization needs a performance
appraisal system which can handle typical appraisal-related activities
for the company. Typically, appraisal records contain information that
is mostly descriptive in nature such as employee performance measured
against the key objective set for the appraisal period, defining key
objectives for the next appraisal period, identifying employee training
needs, and so on. XML is best suited for handling such information. The
appraisal information for an employee can be stored in a column of type
XML which would allow the users of the system to query and analyze the
performance history of employees using XQuery. Furthermore, XML DML can
be used to modify the appraisal records.
Let us assume that the
training department of the organization is conducting a training
session on ASP.NET and would like to invite all employees who requested
training in ASP.NET. The following query selects all employees who
opted for a training session on ASP.NET during their appraisal process.
SELECT Appraisal.query('
for $PA in /PerformanceAppraisal,
$Skill in $PA/TrainingNeeds/Technical/Skill
where contains($Skill, "ASP.NET")
return
element Employee
{
element EmpID { data($PA/Employee/EmployeeID) },
element EmpName { data($PA/Employee/EmployeeName) },
element EMail { data($PA/Employee/EMailID) },
element Skill { data($Skill) }
}
') as Result
FROM [EmployeePerformanceAppraisal]
WHERE Appraisal.exist('/PerformanceAppraisal/TrainingNeeds/Technical/Skill/text()[contains(.,"ASP.NET")]') = 1
Scenario 2: Medical Records System
A
hospital needs a system which can capture medical information related
to patients. This information includes patient details, insurance
information, admission details, diagnosis information, treatment
information, and so on. This information will be used for research or
reference purposes. XML is the format of choice for storing this
information as the patient medical data such as symptoms, lab reports,
and treatment information contains descriptive information. Patient
data can be stored in a column of type XML. The hospital can use XQuery
for analyzing this information.
The following table and the XML instance are used to demonstrate the use of XQuery in a Patient Medical Records scenario:
Table:
CREATE TABLE [MedicalRecords](
[PatientID] [int] IDENTITY(1,1) NOT NULL,
[PatientRecord] [xml] NOT NULL,
[ModifiedDate] [datetime] NOT NULL DEFAULT (getdate())
,
PRIMARY KEY CLUSTERED
(
[PatientID] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE PRIMARY XML INDEX idx_PatientRecord on [MedicalRecords] (PatientRecord)
GO
CREATE XML INDEX idx_PatientRecord_Path on [MedicalRecords] (PatientRecord) USING XML INDEX idx_PatientRecord FOR PATH
Sample XML Instance:
<PatientRecord>
<PatientDetails>
<Name>Robert</Name>
<Gender>Male</Gender>
<Age>5</Age>
<InsuranceInfo>
<Company>Blue Cross Blue Shield</Company>
<ID>D8456798</ID>
</InsuranceInfo>
</PatientDetails>
<HospitalDetails>
<Name>KK Hospital</Name>
<Department>Pediatrics</Department>
</HospitalDetails>
<AdmissionDetails>
<RegistrationNo>D4321</RegistrationNo>
<DateAdmitted>2004-05-02</DateAdmitted>
<DateDischarged>2004-05-08</DateDischarged>
</AdmissionDetails>
<ProblemDetails>
<Symptoms>
<Symptom>Abdominal pain</Symptom>
<Symptom>Dehydration</Symptom>
</Symptoms>
<Diagnosis>Diarrhea</Diagnosis>
<TreatmentInfo>
<Therapy>Oral Rehydration Therapy</Therapy>
<PrescriptionDetails>
<Item>Pepto-Bismol</Item>
<Item>Electrolyte Solutions</Item>
</PrescriptionDetails>
</TreatmentInfo>
</ProblemDetails>
</PatientRecord>
Let us assume that a doctor is interested in viewing the
medical records of patients who were admitted with the symptoms of
"Fever" and "Abdominal Pain". The following query retrieves records of
patients who were admitted with the symptoms of "Fever" and "Abdominal
Pain".
SELECT PatientID, PatientRecord.query('
element PatientName { data(/PatientRecord/PatientDetails/Name) },
element MedicalInfo { /PatientRecord/ProblemDetails }
') as Result
FROM [MedicalRecords]
WHERE PatientRecord.exist('/PatientRecord/ProblemDetails/Symptoms/Symptom/text()[contains(.,"Fever")]') = 1
AND PatientRecord.exist('/PatientRecord/ProblemDetails/Symptoms/Symptom/text()[contains(.,"Abdominal Pain")]') = 1
Scenario 3: Asset Management System
The
Information Technology department of an organization needs to develop
an application which can manage assets such as hardware and software.
This application should be capable of tracking information related to
hardware assets including Asset ID, user details, system information
such as processor, memory, BIOS, motherboard, video card details,
software installed on the system, purchase information, and warranty
information. The application should also maintain information related
to the software assets of the organization such as software type,
number of licenses purchased, etc. The application should be extensible
to handle any new asset types that will be defined in the future. This
asset management system will be used for generating reports such as
hardware usage information, software license usage information, and
hardware items due for maintenance. In the current scenario, there is a
need for storing information confirming to different schemas in the
same column of a table. Untyped XML can be used to store information
with varying schemas. Typed XML with a schema collection can also be
used to store such information. The asset information stored as an XML
data type can be queried using XQuery.
Let us assume that the
Information Technology department is interested in selecting a list of
computer systems where both Microsoft Windows XP and Microsoft Office
2000 are installed. The following query selects records of hardware
assets where both Windows XP and Office 2000 are installed.
SELECT AssetID, AssetDetails.query('
<Asset>
{
element UserName { data(/AssetInfo/UserInfo/Name) },
element ComputerName { data(/AssetInfo/SystemInfo/ComputerName) },
element OS { data(/AssetInfo/SystemInfo/OS) }
}
</Asset>
') as Result
FROM [Assets]
WHERE Category = 'Hardware'
AND AssetDetails.exist('/AssetInfo/SystemInfo/OS/text()[contains(.,"Windows XP")]') = 1
AND AssetDetails.exist('/AssetInfo/SoftwaresInstalled/Software/text()[contains(.,"Office 2000")]') = 1
Conclusion
This
paper serves as the starting point for readers who are interested in
learning the basics of XQuery in the context of SQL Server 2005.
Different features of the XQuery language that are implemented in SQL
Server 2005 along with non-supported features are discussed in the
paper. Examples illustrating the use of different features of XQuery
language are also presented. The XQuery usage scenarios in this paper
will help the user to identify scenarios where using XQuery is
appropriate.
For more information: