Building Reliable, Asynchronous Database Applications Using Service Broker
Roger Wolter
February 2005
Updated June 2005
Applies to:
Microsoft SQL Server 2005 Service Broker
Summary:
Microsoft SQL Server 2005 Service Broker is a new platform for building
distributed asynchronous database applications. Including an
asynchronous, reliable messaging feature in the SQL Server database
makes it possible to build a variety of database applications that were
difficult, if not impossible, to build before. Service Broker and the
other developer features in SQL Server 2005, such as common language
runtime (CLR) integration and the XML data type, combine to make SQL
Server 2005 a great platform for building highly scalable database
applications. (13 printed pages)
Contents
Introduction
Why Write Asynchronous, Queued Applications?
Why Are Queued Applications Hard to Write?
Queuing at the Grocery Store
Queuing at the Airport
Service Broker Programming
Why Do Messaging in the Database?
Scenarios
Conclusion
Introduction
One
of the major pushes in the development of Microsoft SQL Server 2005 was
to enable reliable, scalable, and functionally rich database
applications. CLR (common language runtime) integration made it possible for developers to incorporate significant business logic into stored procedures, and new functionality in Transact-SQL and XML expanded the range of data manipulation and storage functions available to the developer. The other significant feature in this area is SQL Server Service Broker, which adds reliable, scalable, distributed, asynchronous functionality to database applications.
Why Write Asynchronous, Queued Applications?
When
we talked to people who have successfully built large, scalable
database applications, we found that almost all of their applications
had one or more operations that were executed in an asynchronous,
queued manner. Stock trading systems queue the settlement activity so
it can execute in the background while the front end goes on to handle
other trades. Order entry systems put the shipping information in a
queue to be read later by a shipping system that is running on another
server—often in another location. Travel booking systems make the
actual reservations after the customer has completed the itinerary and
a confirmation e-mail message is sent after the reservations are made.
In all these cases, the responsiveness of the system is improved by
doing much of the work asynchronously so the interactive user doesn't
have to wait for it to complete before receiving a response.
In
most large systems, a careful analysis will uncover parts of the
processing that can be done asynchronously. While the queued parts of
the application don't need to be done immediately, the system has to
ensure that these tasks are done reliably. Service Broker makes the
execution of asynchronous, queued tasks reliable and simple to
implement.
Another advantage to making parts of your application
asynchronous is that this part of the processing can be done when
resources are available. If the shipping tasks in an order entry system
are executed from a queue, the shipping system doesn't have to be able
to handle the peak order load. In many order entry systems, the peak
order rate might be two or three times the average order rate. Since
the order entry system must be able to handle the peak load, at slack
times there is a significant amount of unused processing power
available. If asynchronous tasks are queued during peak times and
executed during slack times, system usage efficiency is significantly
increased.
Why Are Queued Applications Hard to Write?
If
asynchronous, queued applications have so many advantages, why aren't
all applications written that way? Because writing them is hard! Many
developers who set out to use a database table as a queue in their
application find out that it's a lot harder than it looks. A table that
is used as a queue has multiple processes simultaneously inserting,
reading, and deleting a small number of records. This leads to
concurrency issues, performance challenges, and frequent deadlocks.
While many developers have successfully met this challenge, it's a lot
harder than it should be. Service Broker addresses these issues by
making queues first-class database objects in the SQL Server 2005
database. Most of the gnarly issues associated with writing queues are
already handled so the developer can concentrate on writing
asynchronous applications rather than writing the queuing
infrastructure. The rest of this section goes into the hard problems
involved in writing queued applications and explains how Service Broker
helps resolve them.
Message Integrity
In
many asynchronous, queued applications, the queued messages are
actually valuable business objects. For example, in an order entry
system, if you put shipments in a queue to be processed later, losing
the queued data means that orders won't get shipped. Many reliable
messaging systems persist messages to the file system so the loss of a
disk drive means the messages are lost. Service Broker messages are
persisted in hidden database tables, so all the data integrity features
that SQL Server offers for protecting your data work equally well for
protecting your queued messages. If you are using database mirroring
for disaster recovery, when your database fails over to the secondary
site, all your messages are already there and the Service Broker
applications continue to run with no data loss.
Multi-Reader Queues
Multi-reader
queues are one of the most effective means of scaling out asynchronous
applications. To demonstrate this, I would like to show how queuing
works in a couple of scenarios we're all familiar with.
Queuing at the Grocery Store
Most
grocery stores scale out with multiple queues. Each checkout counter
has its own queue so when you want to buy your groceries, you have to
pick one. If you are like me, most of the time the queue you pick will
be the one behind the cart that requires a price check on every other
item and the customer who is paying with a post-dated third-party
check. Customers that were still at home when you got in line will be
done paying for their groceries before you get to the front of your
line. This demonstrates one problem with scaling out by using multiple
queues—a task queued behind a long running task doesn't get serviced
promptly.
The other major issue with multiple queues is that
adding a queue requires the rebalancing of tasks between queues, which
can lead to a lot of wasted movement between queues. Think of the
grocery cart demolition derby that happens when a new checkout counter
opens.
Queuing at the Airport
While
I hesitate to use airlines as an example of how to do things
efficiently, the way most ticket counters work is a better model for
efficient queuing than the grocery store. This is because multiple
ticket agents service the same queue. There is only one queue so you
don't have to worry about getting in the wrong one. If a particular
passenger takes a long time, the other ticket agents can continue to
service the line (assuming there is more than one ticket agent).
The
single queue with multiple readers also scales without difficulty. If
the line is getting too long, other agents can be added without
disrupting the line. Agents can also leave after servicing their
current passenger without causing too much chaos in the queue.
At
the risk of stretching this analogy too far, we can use the airport
queue to illustrate another common issue with queue-based applications.
Think about what happens when several people in the line are part of a
group. For example, my family is checking in for a trip. The family is
scattered throughout the line because we all arrived at different
times. If we want to sit together, an agent will have to reserve a
block of seats. If my wife and I end up at different ticket agents at
the same time, I might reserve five seats in row 4 and my wife five
seats in row 47. This is one of the key problems with multi-reader
queues—if related messages are processed on different threads
simultaneously, coordination can be difficult. Think of an order header
and an order line being processed simultaneously. The reader processing
the order line would think there was no header for the order because it
wasn't in the database yet. To work correctly, the reader processing the order line would have to retry the check for the order header multiple times to make sure the header was merely delayed and not missing.
An easier way to
make this work would be if the first person to get to a ticket agent
called all of the related passengers to the front so they could all be
processed by the same ticket agent. Service Broker does a similar thing
by putting a lock on related messages when one of the messages is
received. The reader holding the lock can receive any of the messages
in the queue that belong to the same group but no other reader can read
one of them. The lock is held until the transaction it was taken in
commits. This lock is called a conversation group lock. A conversation group
is a developer-defined grouping of related messages. For example, all
of the dialogs necessary to process a particular order—order header,
order lines, inventory, shipping, billing, etc.—might be put into the
same conversation group. When a message from one of these conversations
is read, a lock is placed on the group so that only the reader holding
the lock can process any of the related messages in the queue. It's
important to note that only the messages from a single group are
locked. There may be hundreds of groups being processed simultaneously
but each group is processed on only one thread at a time. The lock
lasts until the transaction that it was created under commits or rolls
back.
The last issue I would like to illustrate with this analogy is what happens when another message from a conversation group arrives after the transaction holding the conversation group lock commits.
In the case of the ticket counter analogy, this would be one of my kids
getting to the airport after all the rest of us have checked in. Since
the initial transaction is over, the last passenger may get processed
by any of the ticket agents. The only way the new agent will know where
to seat this passenger is if the initial agent left a note indicating
where the rest of the party was seated. In the same way, once a
transaction that processes a related group of messages is complete, it
must record the "state" of the conversation so that when the next
message belonging to this group arrives, the queue reader that receives
the message will know where the previous transaction left off. Since
this is a database application, the natural place to store this state
is in a database table. Service Broker provides a handy way to tie the
state of the conversation to the messages in the conversation—the conversation group ID. This is a uniqueidentifier that appears with every message in the conversation group. If this uniqueidentifier
is used as a key in the table where the state is stored, it's easy for
the message processing logic to find the state that is associated with
every message it receives. Also, because only one queue reader can
handle messages from a particular conversation group at a time, the
developer doesn't have to worry about a state row being updated by two
transactions at the same time, thereby causing the loss of state
information.
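To make this concrete, here is a minimal Transact-SQL sketch of the pattern just described; the table, queue, and column names are invented for illustration and are not part of Service Broker itself. The state table is keyed on the conversation group ID, so the queue reader that receives a message can find the saved state for the group it has locked with a single keyed lookup:
    -- Hypothetical state table: one row per conversation group (for example, per order).
    CREATE TABLE dbo.OrderConversationState
    (
        ConversationGroupId uniqueidentifier NOT NULL PRIMARY KEY,
        OrderId             int              NULL,
        ProcessingStage     nvarchar(50)     NOT NULL DEFAULT (N'New'),
        LastUpdated         datetime         NOT NULL DEFAULT (GETUTCDATE())
    );

    -- Inside the queue reader: every received message carries its conversation
    -- group ID, so the saved state is one keyed lookup away.
    DECLARE @groupId uniqueidentifier, @msgType sysname, @body varbinary(max);

    RECEIVE TOP (1)
        @groupId = conversation_group_id,
        @msgType = message_type_name,
        @body    = message_body
    FROM dbo.OrderQueue;    -- assumed queue name

    SELECT ProcessingStage
    FROM dbo.OrderConversationState
    WHERE ConversationGroupId = @groupId;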
As you can see from these examples, a multi-reader
queue is a simple, effective way to scale out a large application. The
conversation group locking mechanism provided by Service Broker makes applications that use multi-reader queues as easy to write as applications that use a single-reader queue.
Distribution
So
far, we have been talking about queues as if they all exist in a single
database. To build the kind of loosely coupled, distributed database
applications required for many business scenarios, we have to expand
this to include queues in many databases scattered across a network, communicating through reliable messaging. We need reliable
messaging because it doesn't make sense to use a database to ensure the
integrity of messages in a queue and then take a chance on losing them
when they are transferred to another database.
Service Broker uses a reliable messaging protocol called the dialog protocol
to ensure that messages sent to a remote queue arrive exactly once and
in order. Because dialogs are bidirectional conversations, the dialog
protocol supports message transfer in both directions simultaneously.
Dialog
messages have a header that ensures that the message gets delivered
securely to the proper destination in the correct order. It contains a
sequence number, an identifier for the dialog it is in, the name of the
service it is being sent to, security information, and some other
information that is used to control message delivery. When a
destination successfully receives a message, it acknowledges receipt of
the message so that the source knows it was delivered successfully.
Whenever possible, this acknowledgement is sent in the header of
another message going back to the source so the number of messages is
minimized. If the source doesn't receive an acknowledgement within a
time limit, the message is resent until it is delivered successfully.
Message
delivery systems often have problems delivering large messages. A
gigabyte-sized message can take several minutes to send, which can
effectively tie up a network connection for a significant amount of
time. If a network error causes the message to be resent multiple
times, network performance can be severely affected. The Service Broker
dialog protocol deals with this issue by splitting very large messages
up into several smaller fragments which are sent individually. If a
network error causes a resend, only the message fragment that failed is
resent. This is the reason Service Broker is able to support 2-GB
message sizes while many reliable messaging systems can only send
messages of 100 MB or less.
Transactional Messaging
"Exactly
once" message processing requires transactional messages. To see why
this is true, think about an application that crashes part way through
processing a message. When the application restarts, how does it know
whether to process the message it was processing when it crashed? The database may already have been updated with the results of that processing, so reprocessing the message could duplicate the data. As you can see, the only safe way to handle this is to make
receiving the message part of the same transaction that updates the
database. That way, if the system crashes, both the database update and
the message receive are rolled back so the database and the message
queue are in the same state as they were before the crash.
Because
all Service Broker operations take place in the context of a database
transaction, the transactional integrity of messaging operations is
preserved. A typical Service Broker message-processing transaction will
have the following steps:
- Begin transaction.
- Receive one or more messages from a conversation group.
- Retrieve the state of the conversation from the state tables.
- Process the messages and make one or more updates to application data based on the message contents.
- Send
out some Service Broker messages—either responses to incoming messages
or messages to other services required to process the incoming message.
- Read and process more messages for this conversation group if any are available.
- Update the conversation state tables with the new state of the conversation.
- Commit the transaction.
The
powerful part of Service Broker transactional messaging is that if the
system crashes or the application fails, the transaction rolls back and
everything is back to the way it was when the transaction started—the
state is unchanged, the application data is unchanged, no messages are
sent, and the received messages are back on the queue. This makes error
handling in this type of application very straightforward.
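As a rough Transact-SQL sketch of that transaction pattern, under the assumption that the queue, state table, and message type shown here exist (the names are invented for illustration), the skeleton of a queue reader might look like this:
    DECLARE @groupId uniqueidentifier,
            @handle  uniqueidentifier,
            @msgType sysname,
            @body    varbinary(max);

    BEGIN TRANSACTION;

    -- Lock the next available conversation group, waiting up to five seconds.
    WAITFOR (
        GET CONVERSATION GROUP @groupId FROM dbo.OrderQueue
    ), TIMEOUT 5000;

    IF @groupId IS NOT NULL
    BEGIN
        -- Receive a message that belongs to the locked group.
        RECEIVE TOP (1)
            @handle  = conversation_handle,
            @msgType = message_type_name,
            @body    = message_body
        FROM dbo.OrderQueue
        WHERE conversation_group_id = @groupId;

        -- Process the message and update application tables here (omitted).

        -- Send a response on the same dialog.
        SEND ON CONVERSATION @handle
            MESSAGE TYPE [//example/OrderAck]    -- assumed message type
            (N'<ack/>');

        -- Persist the new conversation state before committing.
        UPDATE dbo.OrderConversationState
        SET ProcessingStage = N'Acknowledged', LastUpdated = GETUTCDATE()
        WHERE ConversationGroupId = @groupId;
    END

    COMMIT TRANSACTION;
If anything fails before the COMMIT, the received message, the SEND, and the state update all roll back together, which is exactly the recovery behavior described above.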
Queue-Reader Management
Message
processing in a Service Broker application starts when the queue reader
receives a message from a queue. Since messages are always pulled from
a queue, the receiving application must be running when messages arrive
in a queue. This is an issue with many asynchronous messaging
applications—how do you ensure that the queue reader will be running
when it is required? The two traditional approaches have been to make
the queue reader a service that runs continuously or to use triggers
that the messaging system fires when each message arrives. The
Microsoft Windows service approach means that an application is running
even when there are no messages to process. The trigger approach can
have performance issues because the queue reader starts and stops
frequently.
Service Broker takes a middle-ground approach to queue-reader management called activation.
To set up activation, the database administrator (DBA) associates a
stored procedure with a Service Broker queue. When the first message
arrives in the queue, the activation logic will start the specified
stored procedure. The stored procedure is responsible for receiving and
processing messages until the queue is empty. Once the queue is empty,
the stored procedure can terminate to save resources.
If
Service Broker determines that messages are being added to the queue
faster than the stored procedure is able to process them, the
activation logic will start additional copies of the stored procedure
until either the stored procedures keep up with the incoming rate or
the maximum number of stored procedures that are configured for the
queue is reached. Since the number of queue readers servicing the queue
expands and contracts as the incoming message rate changes, the right
number of queue readers will be running at all times.
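A minimal sketch of how this might be configured follows; the queue name, the activation procedure (assumed to already exist), and the reader limit are illustrative choices, not prescribed values:
    -- Associate an activation stored procedure with the queue and cap the
    -- number of queue readers Service Broker may start in parallel.
    CREATE QUEUE dbo.OrderQueue
    WITH STATUS = ON,
         ACTIVATION (
             STATUS = ON,
             PROCEDURE_NAME = dbo.ProcessOrderQueue,   -- assumed existing stored procedure
             MAX_QUEUE_READERS = 5,                    -- upper bound on concurrent readers
             EXECUTE AS SELF
         );
MAX_QUEUE_READERS is the ceiling on the scaling behavior described above; Service Broker starts additional copies of the procedure only while messages are arriving faster than the current readers can drain them.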
Service Broker Programming
Service
Broker programming is designed to be familiar to a database programmer.
Configuring a Service Broker application is done with the familiar
CREATE, ALTER, and DROP data definition language (DDL) statements that
are used to configure other database objects. The commands to create
Service Broker dialogs and to send and receive messages on them are
Data Manipulation Language (DML) extensions to the Transact-SQL
language. The receive command syntax is similar to a select command and
it returns a rowset containing messages, just as a select command
returns a rowset containing rows. Developers accustomed to Transact-SQL
programming will find it very easy to learn Service Broker programming.
The client APIs that are used to program the Service Broker are the
same as the APIs used for all database programming—OLE DB, ODBC
(Open Database Connectivity), ADO (ActiveX Data Objects), ADO.NET, and
so on.
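To illustrate that programming model, here is a hedged, end-to-end sketch in which every object name is invented: the DDL defines the message types, contract, queues, and services, and the DML opens a dialog, sends a message, and receives it as a rowset:
    -- DDL: define the messaging objects.
    CREATE MESSAGE TYPE [//example/ExpenseRequest] VALIDATION = WELL_FORMED_XML;
    CREATE MESSAGE TYPE [//example/ExpenseReply]   VALIDATION = WELL_FORMED_XML;

    CREATE CONTRACT [//example/ExpenseContract]
    (
        [//example/ExpenseRequest] SENT BY INITIATOR,
        [//example/ExpenseReply]   SENT BY TARGET
    );

    CREATE QUEUE dbo.ExpenseRequestQueue;
    CREATE QUEUE dbo.ExpenseReplyQueue;

    CREATE SERVICE [//example/ExpenseService]   ON QUEUE dbo.ExpenseRequestQueue ([//example/ExpenseContract]);
    CREATE SERVICE [//example/SubmitterService] ON QUEUE dbo.ExpenseReplyQueue   ([//example/ExpenseContract]);

    -- DML: open a dialog, send a request, and receive it on the target side.
    DECLARE @handle uniqueidentifier;

    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//example/SubmitterService]
        TO SERVICE   '//example/ExpenseService'
        ON CONTRACT  [//example/ExpenseContract]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @handle
        MESSAGE TYPE [//example/ExpenseRequest]
        (N'<expense amount="42.00"/>');

    -- On the target side, RECEIVE returns a rowset of messages, much like SELECT.
    RECEIVE TOP (1) conversation_handle, message_type_name, message_body
    FROM dbo.ExpenseRequestQueue;
The RECEIVE at the end returns a rowset with one row per message, which is why it feels so much like a SELECT to a Transact-SQL programmer.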
Why Do Messaging in the Database?
One
of the questions frequently asked about Service Broker is, "Why build
messaging into the database? Aren't there already enough reliable
messaging systems out there?"
I hope that the information
presented in this paper has helped to explain the decision to build
Service Broker into the database engine, but here are a few more
reasons why it makes sense to have messaging in the database:
- Single client connection for messages and
data. In addition to the unified programming model mentioned in the
previous section, this offers some other substantial advantages:
- An application can receive messages
transactionally when running on any client that can connect to the
database. Many messaging systems only allow transactional receives when
the receiver is running on the same computer as the queue.
- Transactional
messaging doesn't require a distributed transaction or two-phase
commit, unlike message systems that don't store messages in the
database.
- Integrated management, deployment, and
operations between data and messages. All the tools and techniques you
use to safeguard and manage your database data apply equally well to
messages:
- Backing up and restoring the database also backs up and restores queued messages.
- If you use clustering or database mirroring to protect your database from failures, your messages enjoy the same protection.
- Since queues have relational views available, it's simple to find out what's happening in a queue (see the query sketch after this list). Want to know how many messages are in the queue? Select count(*) from the queue. Want to know which messages haven't been delivered yet? Select * from sys.transmission_queue. If your messages contain XML data, you can use XQuery to search them. You can join messages in a queue with state information and even data tables to find out the complete status of a particular order in your order entry system.
- There are also some significant performance advantages for messaging built into the database.
- As previously mentioned, two-phase commits are not required for transactional messages.
- The
message updates, the state changes, and the data updates are all logged
in the same transaction log so only a single log write is required to
commit the transaction.
- Reliable message delivery usually
transfers the message from a transmit queue to a receive queue. If
Service Broker detects that the receive queue is in the same database
instance as the transmit queue, the message is put directly onto the
receive queue, thereby saving extra I/O and a transaction commit.
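As referenced in the list above, here is a rough sketch of such queries; the queue and state-table names are carried over from the earlier illustrative examples and are not built-in objects (sys.transmission_queue, by contrast, is a real catalog view):
    -- How many messages are waiting in a queue?
    SELECT COUNT(*) AS pending_messages
    FROM dbo.OrderQueue;                    -- assumed queue name

    -- Which outbound messages have not been delivered yet?
    SELECT to_service_name, enqueue_time, message_type_name
    FROM sys.transmission_queue;

    -- Join queued messages to application state to see where each order stands.
    SELECT q.conversation_group_id, s.ProcessingStage, q.message_type_name
    FROM dbo.OrderQueue AS q
    JOIN dbo.OrderConversationState AS s
        ON s.ConversationGroupId = q.conversation_group_id;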
Scenarios
Now
that we've seen how Service Broker works, let's look at a few of the
many possible applications that you can build with Service Broker.
Order Entry
Order
entry is a commonly used scenario because it's one process that just
about everyone understands. Even developers who haven't worked on an
order entry system have at least used one to order something.
The Service Broker-based order entry system uses queues to connect its subsystems. This not only increases parallelism and improves throughput, but it also provides a great deal of flexibility in configuration and architecture.
In this scenario,
Service Broker is used to connect four loosely coupled services used to
process each order. As the order entry service inserts the order
headers and order lines into the database, it queues messages to the
billing, shipping, inventory, and credit limit services to finish
processing the order. Service Broker allows all four services to run in
parallel, improving the response time of the system.
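One plausible sketch of that fan-out (the service, contract, and message type names are invented, and only two of the four services are shown) opens a dialog to each downstream service inside the order entry transaction and ties the dialogs into one conversation group:
    DECLARE @orderXml        xml,
            @shippingHandle  uniqueidentifier,
            @billingHandle   uniqueidentifier;

    SET @orderXml = N'<order id="1234"/>';    -- hypothetical order payload

    BEGIN TRANSACTION;

    -- The first dialog starts a new conversation group for this order.
    BEGIN DIALOG CONVERSATION @shippingHandle
        FROM SERVICE [//example/OrderEntryService]
        TO SERVICE   '//example/ShippingService'
        ON CONTRACT  [//example/OrderContract]
        WITH ENCRYPTION = OFF;

    -- Additional dialogs join the same conversation group.
    BEGIN DIALOG CONVERSATION @billingHandle
        FROM SERVICE [//example/OrderEntryService]
        TO SERVICE   '//example/BillingService'
        ON CONTRACT  [//example/OrderContract]
        WITH RELATED_CONVERSATION = @shippingHandle, ENCRYPTION = OFF;

    -- The inventory and credit limit dialogs would follow the same pattern.
    SEND ON CONVERSATION @shippingHandle MESSAGE TYPE [//example/NewOrder] (@orderXml);
    SEND ON CONVERSATION @billingHandle  MESSAGE TYPE [//example/NewOrder] (@orderXml);

    COMMIT TRANSACTION;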
Depending
on the business requirements of the system, the order entry service can
either wait for responses from all four services before it returns to
the user, or it can return as soon as the initial transaction is
committed and let the rest of the services execute in the background.
Implementing either behavior is a minor change to the order entry
service. The other services are the same in either case.
Using
Service Broker to link the services also offers several deployment
options. All five services can run on the same server, or they can be split across up to five servers (or more if services are load balanced) as needed to provide the required throughput. The services can also be deployed as stored procedures or as external applications.
This flexibility is especially attractive to Independent Software
Vendors (ISVs), who can create a single code base and deploy it in a
wide variety of configurations depending on the customer's performance,
redundancy, and throughput requirements.
Figure 1 illustrates the relationships among the queues and services that constitute the loosely coupled order entry system.

Figure 1. Loosely coupled order entry system
Parallel Stored Procedure
When
triggers were first invented, database companies used to talk about
using a trigger on an inventory table to automatically place a purchase
order when the inventory level dropped too low. While this was a great
concept, very few systems use it because the extra overhead of
executing a lot of code in a trigger makes the database update too
slow. Service Broker makes this kind of application possible by
allowing the trigger to queue the work to be done so that the trigger
can complete and allow the original transaction to commit. The queued
work is then done later in a separate transaction—maybe even in a
different database. SQL Server 2005 uses this model for query
notifications and event notifications.
Service Broker also
allows a stored procedure to start several other stored procedures in
parallel. This can significantly improve response time. For example,
consider a call center application that uses caller-ID on incoming
calls to look up all the information about the customer who is calling
so that the customer service representative has all the relevant
information available. In many cases, this information must be obtained
from different databases on different systems. Using remote queries to
obtain all this information is possible but the response time may not
be acceptable. The Service Broker can queue requests to all the remote
services simultaneously and wait for the results in its input queue.
Since all the requests are processed in parallel, the overall response
time is improved. Figure 2 illustrates this processing.

Figure 2. Parallel stored procedure
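A rough sketch of the "gather" half of this pattern follows, assuming the requests have already been sent on separate dialogs whose replies arrive on the procedure's input queue; the queue name, reply message type, and reply count are invented for illustration:
    DECLARE @repliesExpected int,
            @repliesReceived int,
            @handle          uniqueidentifier,
            @msgType         sysname,
            @body            varbinary(max);

    SET @repliesExpected = 3;    -- for example, three remote customer-data services were queried
    SET @repliesReceived = 0;

    -- Collect replies as they arrive; each wait gives up after ten seconds.
    WHILE @repliesReceived < @repliesExpected
    BEGIN
        SET @handle = NULL;

        WAITFOR (
            RECEIVE TOP (1)
                @handle  = conversation_handle,
                @msgType = message_type_name,
                @body    = message_body
            FROM dbo.CallCenterReplyQueue     -- assumed reply queue
        ), TIMEOUT 10000;

        IF @handle IS NULL
            BREAK;    -- timed out; continue with whatever has arrived so far

        IF @msgType = N'//example/CustomerDataReply'    -- assumed reply message type
            SET @repliesReceived = @repliesReceived + 1;
    END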
Batch Processing
One
of the significant use cases for Service Broker applications is a large
batch-processing system. Most batch processes are made up of many
small, semi-independent processes that must be scheduled and
coordinated. Independent execution of subprocesses improves throughput
by allowing each subsystem to execute at its own optimal pace.
In
the example illustrated in Figure 3, the inputs to the batch planning
process—orders, forecasts, returns, etc.—are accumulated in input
queues throughout the day. When the planning engine runs, it reads
inputs from the queues, analyzes them, and then queues requests to the
subprocesses that process the plan outputs. The output queues allow the subprocesses to execute independently, in parallel on one server
or on a number of servers. This allows the process to scale out to as
many servers as are required to handle the processing load.

Figure 3. Batch processing
Travel Booking
In
a previous life, I taught a distributed database class for a Master's
degree program. One example I used for distributed transactions was a
travel agent that booked plane reservations, hotel rooms, and rental
cars as part of the same transaction because the customer wouldn't want
to book the hotel if the airline trip wasn't available. Like many
examples, this worked until a programmer from a real travel agency
showed up in class one day. He told me that no real travel agency could
work this way. Airlines won't let anyone hold locks on their
reservation tables and seat maps while an agent looks for a hotel
reservation. Reservations are made based on current availability and if
the seat or room is gone when the real reservation is made, the
customer is called back to make sure the new reservation is OK.
In
the Travel Booking Web Site scenario, bookings are made based on data
in the flight and hotel availability tables. These tables are populated
with information from the airlines and hotels. The information is
updated frequently but it is always a little out of date. The actual
booking doesn't take place until after the customer has completed
booking his trip and possibly even after he has logged off.
This
kind of delayed activity is ideally suited to the Service Broker
architecture. The transaction that records the customer's travel arrangements queues messages to back-end services that do the actual booking. The booking services read messages from the queue and process
each booking in a separate transaction. The booking service
communicates through a variety of protocols to the systems that book
the airline and hotel reservations. These communications might include
XML Web services, SNA (Systems Network Architecture), HTTP, EDI
(Electronic Data Interchange), faxes, Service Broker, etc. Because the
input to the booking service is queued, the different latencies of
these protocols aren't a problem. If the queue gets too long, Service
Broker activation can start more queue readers to handle the load. If
the volume reaches a point where the booking servers can't keep up,
more servers can be added simply by adding rows to the routing tables.
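In practice, "adding rows to the routing tables" means creating a route in each Web server's database; as a sketch (the route name, service name, and address are invented), adding a second booking server might look like this:
    -- A route tells Service Broker which network address hosts a remote service.
    CREATE ROUTE BookingServiceRoute2
    WITH SERVICE_NAME = '//example/BookingService',                 -- assumed remote service name
         ADDRESS      = 'TCP://booking-server-2.example.com:4022';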
When
all the reservations for a particular itinerary have been booked
successfully, a message is queued to the SQL Mail server to send a
confirmation e-mail message to the customer. If one or more of the
reservations fails, a customer service representative is notified to
help the customer rebook the trip.
The Process Updates service
runs in the background and periodically receives availability
information from hotels and airlines. This information is massaged into
a common format and then published to the servers in the Web farm
through Notification Services to update their availability tables.
The
loosely coupled Service Broker architecture allows Web servers to be
added to the server farm by loading the software, configuring the
database, and subscribing to the availability feeds. This can all be
scripted so it happens with a minimal amount of manual intervention. To
add more back-end servers to the booking services, the service must be
installed and configured and then the new server address must be added
to the routing tables in the Web servers.

Figure 4. Travel booking
Conclusion
SQL
Server 2005 Service Broker is a new platform for building distributed
asynchronous database applications. Including an asynchronous, reliable
messaging feature in the SQL Server database makes it possible to build
a variety of database applications that were difficult, if not
impossible, to build before. Service Broker, combined with the other developer features in SQL Server 2005 such as CLR integration and the XML data type, makes SQL Server 2005 a great platform for building highly scalable database applications.