Konstantin Surkov
June 2006
Applies to:
Microsoft SQL Server
Summary:
This article describes different ways of improving the performance of
SQL Server queries, with a focus on index optimization. (6 printed
pages)
Contents
Introduction
Data and Index Organization
Optimization Rules of Thumb
Introduction
The
purpose of this document is to describe different ways of improving the
performance of SQL Server queries. Most of this document will describe
index optimization, with occasional references to particular code
snippets. In other words, this document will describe how to achieve
the best performance, given the tables and queries to run against.
Database
design issues and entity-relationship modeling techniques are out of
scope of this document, even though flawed design can severely impact
the performance in many ways.
Data and Index Organization
Index
and data organization will be discussed together for two reasons.
First, there is rarely a table without an index. Even a small lookup
table has a primary key that allows other table to refer to it, and a
primary key is always based on the index. Second, data organization
depends on the indexes that exist for the table; a table with a
clustered index is organized differently than a table without one.
Non-Clustered Indexes/Heap-Based Tables
Figure 1 shows the organization of a non-clustered index defined on a heap-based table.

Figure 1. Non-clustered index/heap-based table
A
heap table, by definition, is a table that doesn't have any clustered
indexes. Different pages of the heap-based table occupy different
non-contiguous areas on a disk, and they are not linked together in any
way. Each non-clustered index defined on the table will have a
corresponding entry in a sysindexes table with an indid
between 2 and 254, pointing to the first IAM (Index Allocation Map)
page. IAM pages represent the linked list of all pages used by the
database object (table or index) and used by SQL Server for allocating
and deallocating storage space. It is important to remember that IAM
pages are not used to search through data or index pages, but only to
allocate and deallocate them.
An observant reader might ask at
this point: What if there is no index defined for a table at all? Well,
the address of the first IAM page of the heap table itself is stored in
the sysindexes table with indid = 0 (see Figure 2).
In that respect, the abbreviation IAM is a little misleading; it would
be better called SAM (Storage Allocation Map or Space Allocation Map).

Figure 2. Table with no index defined
All the indexes (clustered and non-clustered) are B-Trees, where B
stands for "balanced" (not "binary," as is sometimes thought), which
basically means that it will take about the same number of jumps to get
to the leaf node for any value of the key. The number of jumps required
to get to the leaf node has a direct impact on the total time required
to read the data using that index, so let's explore this a little bit
more.
All pages, including non-leaf index pages, have a size of
8192 bytes, including 96 bytes occupied by the header—this leaves 8096
bytes for the index data. Let's calculate how many jumps will be
required for the index built on the single field of the integer type.
Each pointer to the index page of the next level or to the data page in
the heap takes 6 bytes, so the total size of one index entry is 10
bytes. This means that each index page can contain up to 800 index
entries. Let's assume that the average number of entries on each index
page is 400. This means that for a table holding up to 160,000 rows of
data, it will take only three jumps to get to the data page. The first
jump is to the root index page, the second jump is to the
leaf-node-level index page, and the third jump is to the data page
itself. For a table holding 160,000 to 64,000,000 rows of data, it will
take four jumps.
This calculation demonstrates that the width of
the key directly affects the number of required jumps. If we have the
key defined on the char (200) column, it will take only three jumps for
a table holding 400 rows or less; four jumps for a table holding up to
8,000 rows; and five jumps for a table holding up to 160,000 rows. This
leads to the conclusion that indexes built on wider keys perform
searches slower (in addition to taking more space).
Covering Indexes
The term covering index
does not mean a separate kind of index having a different internal
structure. Rather, this term is used to describe a certain technique
that is used to improve performance. It is discussed here because, as
will be shown later, for most practical purposes, a clustering index
can be thought of as a covering index.
Let's consider a simple but typical example. Suppose that we need to join two tables—Master, defined as (MasterId int primary key, Data int), and Detail, defined as (DetailId int primary key, Data int, MasterId int)—and select columns (MasterID, Detail.Data) where Master.Data satisfies a certain condition. The condition can be anything that is built directly on the value of Master.Data—for example, Data = @data, Data between @start and @end, or Data in (1, 2, 3). Let's also assume that, besides primary key indexes, the Master table has an index idxMaster built on (Data, MasterID), and the Detail table has an index idxDetail built on (MasterID). The query plan will look as shown in Figure 3.

Figure 3. Query plan
What does that mean, exactly? Let's analyze it step by step:
- The query engine searches idxMaster, to find the entries that satisfy our condition for Master.Data, and it builds a list of corresponding Master.MasterID values. This is fast, because:
- Index pages are sorted by Master.Data, so the engine can jump directly to leaf index pages that satisfy our condition.
- Those leaf index pages already contain values of Master.MasterId, so there is no need to jump somewhere else to read them.
- The engine selects entries from idxDetail having values of Detail.MasterID that match the list of Master.MasterID built during Step 1. This can be done by simply looping through the list of Master.MasterID and looking up each corresponding entry in idxDetail, or by using more complicated hash or merge algorithms.
The query engine determines which algorithm to use in each particular
case, based on the relative cost of each algorithm. This, in turn,
depends on the following criteria:
- Selectivity of the index
- Estimated number of lookups
- Estimated number of rows to find.
For
the low relative values of the second and third criteria, the loop
algorithm is preferred, whereas for higher values, the hash/merge
algorithms become more efficient. The output of this step is a list of (MasterID, Detail:FID:PN:RN) values (see Figure 1).
- A list of heap pointers (FID:PN:RN), also known as bookmarks, is used to read randomly located heap data pages of the Detail table, and to retrieve values of the Data field in order to compile the requested result set (MasterID, Detail.Data). This step is called bookmark lookup,
and, in our case, it is the most expensive one. The cost of bookmark
lookup can be anywhere from 90 to 99 percent of the total query cost.
This means that eliminating this step can improve the performance
ten-fold or even one-hundred-fold. The cost of bookmark lookup depends
on the fragmentation of data pages, the number of data pages to read,
that availability of read-ahead optimization, the IO-to-CPU cost ratio,
and other factors.
Please note that, in this example, bookmark lookup has only been performed on the Detail table, not on the Master table. This is because the required output field MasterID was a part of the index. This leads us to the definition of covering index:
A covering index is the index that contains all output fields required
by the operation performed on that index. Please note that if the Master
table contained more fields, and at least one of these fields was
present in the SELECT clause, the query engine would have to perform
bookmark lookup on the Master table, and therefore idxMaster
could not serve as a covering index anymore. That is why it is not
possible to tell whether the index is covering without taking into
consideration the query it is used in, and even the particular
execution plan.
It is very easy now to devise a way to improve the performance of our query. Adding Detail.Data as a second field to idxDetail eliminates the need for bookmark lookup and considerably improves the performance.
Good candidates for covering indexes are junction tables
(tables that exist to resolve many-to-many relationships). In the
simplest case of a many-to-many relationship between T1 and T2, it is
enough to have two indexes, (T1_Id, T2_Id) and (T2_Id, T1_Id), in order to save the query engine from ever doing bookmark lookups on this table.
It
is rarely feasible to use covering indexes for the fact tables, since
there are usually many fields requested from these. Putting all
required fields into the index, in order to make it covering, would in
a way make it a very ineffective clustered index, which brings us to
the next topic.
Clustered Indexes/Index-Based Tables
Figure 4 shows the organization of an index-based table (a table that has a clustered index).

Figure 4. Clustered index/index-based table
As
you can see, this diagram is very similar to the diagram of a
non-clustered index (Figure 1), except that leaf nodes contain
data, instead of pointers to data. Therefore, the easiest way to
understand a clustered index is to treat it as a covering index
consisting of all the fields in the table, with three additional
advantages:
- Data stored in leaf nodes is not duplicated.
On the other hand, in the case of a covering index on a heap table,
data will be stored both in the index and in the heap.
- Table scan (called clustered index scan
in the case of an index-based table) is faster than table scan of a
heap table, because leaf nodes of the clustered index are stored
together. The internal implementation of modern hard drives allows
storing all the data read from one logical cylinder of the hard drive
in its internal cache; in the case of an index-based table, it is more
likely that this data will contain more data pages of the table being
scanned.
- If hash or merge algorithms are selected by the query
engine for the join, data retrieved from the table should be sorted by
the join key. If a clustered index is built on the join key, retrieved
data is already sorted.
Several special considerations have to be kept in mind about index-based tables. The most important one is that all other (non-clustered) indexes of the index table contain values of the clustering key, instead of heap pointers.
In the available sources, it is never mentioned what exactly was the
reason for this design decision, so we can only guess that the
additional cost of searching through the clustering index is outweighed
by the cost saved during the reallocation of the clustered index pages.
Indeed, nothing would prevent non-clustered indexes of the index-based
table from pointing directly to the row in the leaf node that contains
the data, but this pointer should be updated each time this row is
moved to another location. Also, obviously, only one clustered index
can exist for the table. So out of all possible indexes that could be
clustered, it is necessary to select one that will give maximum
performance gain.
Good candidates for clustered indexes are:
- Primary keys of the lookup/reference/dimension/master tables
- Foreign keys of the fact/detail tables
- Datetime fields of the tables queried by the date range
Optimization Rules of Thumb
- Always
look at the query plan first. It will show you the optimal current
execution plan from the query engine's point of view. Find the most
expensive part of the execution plan and start optimizing from there.
However, even before that, make sure that the statistics on all tables
in your query are up to date, by running the update statistics <TableName> command on all tables in your query.
- If
you see table scan, optimize. Table scan is the slowest possible way of
execution. Table scan means not only that no index is used, but that
there is no clustered index for this table at all. Even if you can only
replace table scan with clustered index scan, it is still worth it.
- If
you see clustered index scan, find out whether it can be replaced with
index seek. For that, find what conditions are applied to this table.
Usually, conditions exist for two or three fields of the table. Find
out the most selective condition (that is, the condition that would
produce the smallest number of records if applied alone), and see
whether an index on this field exists. Any index that lists this field
first will qualify. If there is no such index, create it and see
whether the query engine picks it up.
- If the query engine is
not picking up the existing index (that is, if it is still doing a
clustered index scan), check the output list. It is possible that seek
on your index is faster than clustered index scan, but involves
bookmark lookup that makes the combined cost greater than use of a
clustered index. Clustered index operations (scan or seek) never need
bookmark lookup, since a clustered index already contains all the data.
If the output list is not big, add those fields to the index, and see
whether the query engine picks it up. Please remember that the combined
size is more important than the number of fields. Adding three integer
fields to the index is less expensive than adding one varchar field
with an average data length of 20.
Summarizing this rule, try to make your index covering, and see
whether it works better than clustered index scan. Please note that it
is not always possible to make the query engine pick up your index
automatically. A small table or a low-selectivity index will produce
clustered index scan, even if your index is covering.
- If you see bookmark lookup, it means that your index is
not covering. Try to make it covering if it makes sense (see the
preceding guidelines).
- The execution plan selected by the
query engine may be not the best one. The query engine makes certain
assumptions about disk subsystem and CPU cost versus IO cost. These
assumptions sometimes can be incorrect. If you don't believe that the
query engine's selection is the best one, run a query in the loop for
10 to 15 minutes with automatic selection, change the query to use your
index (you will have to use index hint to force it), and then run it
for 10 to 15 minutes again. Compare the results to see which one works
better.
- Avoid any operations on the fields, where possible.
Some operations will prevent the use of the index on this field even if
it exists—for example, the infamous ltrim(rtrim(FieldName)); other operations will degrade the performance. For example, instead of using the condition cast(DateField as varchar(20)) = @dateString, try to convert @dateString to an expression of datetime type first, and then compare it to DateField.
- Please
note that the query engine cost estimate does not include the cost of
embedded procedure or function calls. If you compare between plain join
and select from table-value functions, the latter would seem to
have smaller cost, but it usually does not. In such a situation, use
your own metrics to find out which query performs better.
- When it is not possible to avoid operation on the field, use an index built on that expression. This can be done in two ways:
- Create a calculated field based on your expression.
- Create a view, and build an index on it.
Note
SQL Server requires certain conditions to be met in order to allow the
use of calculated fields and indexed views (set quoted_identifier on, set arithabort on, and so on).
- Indexed
views are a good way to further speed up the query if you are not
satisfied with the results. Indexed view is a clustered index built
over the view's select list. You can also define additional indexes for
the indexed view, just as you can for any regular table. Indexed views
take disk space and involve some maintenance overhead (every time
underlying tables change, the indexed view also has to change), but
they usually provide a good boost in performance, even after all other
optimization techniques are exhausted.
Special thanks to Andrei Volkov for many interesting discussions about the SQL Server internals.