Data Transformation Services for SQL Server 2005-Frequently Asked Design Questions
Huntting Buckley, Donald Farmer, Amit Goyal, Craig Guyer, James Howey
Microsoft Corporation
July 2004
Applies to:
Microsoft SQL Server 2005
Summary:
This paper answers some of the questions most frequently asked about
Data Transformation Services in SQL Server 2005. In particular it
addresses some of the design issues–it is a "why" FAQ rather than a
"how" FAQ. (6 printed pages)
Contents
Why Does the Wizard Fail When Transferring Data for Several Thousands of Tables?
Why Can I Not Copy Objects in the Import / Export Wizard?
Why Can I Not Design Packages in the SQL Server Management Studio?
What Are Data Sources and Data Source Views? How Do They Relate to Connections in My Package?
Why Is There No Microsoft Excel or Microsoft Access Data Source?
Why Is the Fast Parse Option Not Used by Default in the Data Flow?
Why Do We Have Separate WMI Data Tasks and Event Tasks?
Why Is the Expression Language Not SQL or Visual Basic?
Why Are There Functions Missing from the Expression Language?
Why Do I Have to Configure Sort Order at the Source Adapter, Rather than at the Consuming Component, for Example Merge?
Why Do I Have to Deal with Having to Lock Variables in My Package?
Why Is There No Notification to the Component When an Output Path Is Detached?
Why Is There a Windows Service for DTS?
Why Am I Not Able to See Running Packages Under My DTS Server in Mgt Studio?
Why Is Package Logging No Longer in the System Database MSDB by Default?
Why Aren't Changes in the Schema of an XML Document Seen in the Source Adapter?
Why Am I Having Problems Migrating Packages That Reference DTSGlobalVariables Parent?
Why Am I Having Problems Migrating Data Pump Tasks?
Why Can I Not Migrate a Dynamic Properties Task?
Why Aren't Global Variables Available in the ActiveX Script Task?
Why Don't We Include the Data Transformation Task and the Data Driven Query Task with DTS?
Why Is DTS No Longer Client-Side Redistributable?
Why Does DTS Not Support an MSI-Based Deployment?
Why Does the Wizard Fail When Transferring Data for Several Thousands of Tables?
Current
wizard architecture creates a DTS package data flow to transfer data.
There are scalability limits for several thousands of tables. We feel
this is reasonable since package designers are not likely to use
several thousands of sources and destinations in one data flow. There
may be some changes to this in Beta 3.
Why Can I Not Copy Objects in the Import / Export Wizard?
The
Import / Export wizard is designed to enable the user to easily move
data to and from SQL Server. Copying database objects between servers
is really a management operation rather than a data movement operation,
even though data is, in fact, moved.
The Copy Database Wizard in
the SQL Server Management workbench enables users to copy database
objects, and is specialized for doing this.
The DTS Import /
Export wizard, on the other hand, simplifies typical DTS
operations–loading data from diverse sources including flat files,
spreadsheets and OLEDB data sources.
Why Can I Not Design Packages in the SQL Server Management Studio?
The
SQL Server Management Studio is an environment for managing the storage
and execution of deployed packages. It has special features for doing
this, including integration with the DTS Service, and the ability to
enumerate packages on remote servers. But it is not a design
environment.
The Business Intelligence Design Studio is an
environment for designing packages, organizing them in Solutions and
Projects, debugging them and managing source and version control for
multi-user projects.
So in SQL Server 2005, users design packages in a specialist environment and manage them in a separate specialist environment.
More
information on designing and managing packages, and on deploying
packages between environments, can be found in Books Online.
What Are Data Sources and Data Source Views? How Do They Relate to Connections in My Package?
Data Sources and Data Source Views
are not objects within your DTS package, instead they are separate
objects saved in a Data Transformation Project in the Business
Intelligence Development Studio. As they are in the project and not in
a DTS Package, they can be shared between DTS Packages allowing a Data Source View to be defined for a star schema and used consistently by every package which loads that star schema.
Data Sources
define how to connect to an OLEDB or ADO.NET database. They hold the
connection string, and some other properties including a friendly name
for the connection and a description field.
Data Source Views are subsets of a relational schema that is pointed to by a Data Source.
These views can be elaborated with named queries, calculated columns,
virtual relationships and friendly names. For example, when working
with a large schema you may only want to use a small set of tables from
the source; using a Data Source View enables you to define this subset.
Data Sources and Data Source Views are design-time objects which can only be used within BI Development Studio projects. At design-time, a Data Source can be used by one or more DTS Packages in a project, but as at runtime a deployed DTS Package will not have access to these design-time objects. DTS Packages contain Connections, which enable the package designer to manage connectivity to databases, files, and so on. At design-time a Connection can reference a Data Source or a Data Source View, but at runtime the DTS Connection has all the necessary metadata to be used stand-alone. This way the user can design different packages which share Data Source references and Data Source Views, and then run them independently of each other and deploy them separately.
If a DTS Connection references a Data Source, then an object using the DTS Connection can use any Data Source View which also uses the Data Source. In this way, a DTS Package can easily refer to only a small subset of a large schema.
Finally, DTS Connections are more generic than Data Sources–they
can connect to text files, HTTP connections, FTP connections and many
other scenarios where a connection string is needed–not just to OLEDB
and ADO.NET managed providers.
More information on Connections, Data Sources and Data Source Views is available in Books Online.
Why Is There No Microsoft Excel or Microsoft Access Data Source?
Excel
and Access data files can be imported using a properly configured OLEDB
Connection Manager. You can also easily use the DTS Import / Export
Wizard which will build the Connection Manager for you.
For more information search Books Online for "Creating a Package Using the DTS Import/Export Wizard."
Why Is the Fast Parse Option Not Used by Default in the Data Flow?
The
Fast Parse option is opportunistic. It doesn't support locale-specific
parsing or date formats other than year-month-date. It is only useful
for limited data formats, for which faster parsing is possible compared
to the default standard parse. Therefore, users have to explicitly
choose the option based on their data format.
For more information search Books Online for "Fast Parse."
Why Do We Have Separate WMI Data Tasks and Event Tasks?
System
management data can be useful for influencing a DTS control flow. For
example, before running an ETL flow, you can check whether SQL Server
is running, whether there is a C: drive on the machine or if there is
enough disk space available. WMI data task is used to retrieve such
system data.
System management events can be useful for taking
real-time actions within a DTS control flow. For example, only run the
ETL flow when memory usage drops below 50% or only start a data flow
after a new source file is available on the file system. The WMI event
task allows definition and handling of such system events.
These
two tasks are also fundamentally different in the functionality they
provide. In this release we have been more aggressive about separating
tasks based on the features they provide to support granular building
of packages. Another example of this separation is the File System Task
and the FTP Task. The features they provided were combined into one
task in DTS 2000, but in this release have been separated to make the
packages easier to build, understand and support.
Why Is the Expression Language Not SQL or Visual Basic?
Neither
SQL nor Visual Basic meets the requirements of the DTS data flow. For
example, we wanted to support a range of operators such as bitwise
operators manipulating bitmasks that are used in many scenarios as
flags. Also, in order to make best use of memory (and thus get the best
performance) the DTS data flow is strict about data types–this is also
reflected in the expression syntax.
The DTS expression syntax
enables you to design high-performance data flows with complex handling
of strings and data types. If there are functions you need which are
not supported by the expression language, or if you would prefer to use
Visual Basic syntax, you can use the Script Component which enables you
to work with Visual Basic .NET in the data flow.
Why Are There Functions Missing from the Expression Language?
Resources
didn't permit implementation of every function now found in SQL Server
TSQL. We surveyed customers and developers to assemble our initial set,
and will attend carefully to beta feedback as we prepare for future
betas and eventual release.
Why Do I Have to Configure Sort Order at the Source Adapter, Rather than at the Consuming Component, for Example Merge?
The
order of rows is set at the source adapter and cannot be changed by
most downstream transforms (Sort would be one exception). Source
adapters that are cognizant of the order of sourced data can set this
information without user assistance, but in most cases insufficient
metadata is provided to a source component to allow it to set output
column sort information; as such, the user becomes responsible for this
effort.
Why Do I Have to Deal with Having to Lock Variables in My Package?
The
DTS runtime uses multiple threads for execution when appropriate to
improve concurrency and by doing so improve performance and
scalability. In order to avoid having more than one thread attempt to
access the same variable at the same time, all access to variables must
be made "safe" by locking them for your operation; not doing this could
result in packages with a range of responses, from ones that "sometimes
don't work" to ones that return completely incorrect results. This can
be an issue even if the package is completely linear: for example, if
the user were to change the concurrency setting in a loop.
Why Is There No Notification to the Component When an Output Path Is Detached?
DTS
Designer users should be able to delete a path, then reattach it,
without breaking their package or changing its behavior. By providing
an OnOutputPathDetached method, we would allow ill-behaved
components to break this understanding. To satisfy component authors
who wish to provide a variable number of outputs, the DTS object model
provides a property on the output called DeleteOutputOnPathDetached that will auto-delete the output when the path is detached if the property is set to true.
Why Is There a Windows Service for DTS?
The
new Windows service for DTS will allow an operator to see and stop DTS
packages running on a particular server. The service is exposed inside
the SQL Management Studio. Note: The DTS service needs to be started
first as it is off by default.
See the FAQ topic "Why am I not able to see running packages under my DTS server in Mgt Studio?"
For more information search Books Online for "DTS Service" and "Managing DTS Service."
Why Am I Not Able to See Running Packages Under My DTS Server in Mgt Studio?
The
DTS service is disabled by default. This behavior is by design so that
the service is not running unless the system administrator wants it
running. To start the DTS Service right-click on My computer and choose Manage (from either the Start button or Desktop), then inside the Computer Management window click Service and Applications and then Services to see all the Windows Services listed on the right side. Find DTS Server in the list and right-click to choose Start. (You can also double-click to open its properties and change the Startup Type to Automatic if you would like the service to automatically start each time the machine starts.)
For more information search Books Online for "DTS Service" and "Managing DTS Service."
Why Is Package Logging No Longer in the System Database MSDB by Default?
SQL
2005 DTS adds many new options for logging information about your
database. This allows you to define one or more log providers per
package that can be of different types. Log Provider Types include Text
Files, SQL Server, Windows Event Log, and so on. Within your package
you can add the log providers by going to the DTS menu and choosing Logging.
For more information search Books Online for "DTS Log Providers" and "Setting Logging Options in Packages."
Why Aren't Changes in the Schema of an XML Document Seen in the Source Adapter?
Changes
in the schema of an XML file significantly affect the metadata of the
data flow. This requires the user to explicitly change the schema and
take account of the metadata changes.
Why Am I Having Problems Migrating Packages That Reference DTSGlobalVariables Parent?
The
scope of changes incorporated in DTS 2005 required a complete object
model rewrite. As a consequence after migration to SQL Server 2005 DTS,
ActiveX script references to SQL Server 2000 DTS objects accessed via
the DTSGlobalVariables Parent property will no longer be valid.
Why Am I Having Problems Migrating Data Pump Tasks?
The
DTS 2005 Data Flow Task replaces the DTS 2000 Data Transformation
Tasks, offering greatly enhanced functionality. As a result though,
there is no unambiguous mapping between the 2000 and 2005 offerings. To
ensure continued correct operation of legacy Data Transformation Tasks,
they are not completely migrated to SQL Server 2005. The migration
wizard currently wraps such a task in a dedicated SQL Server 2000
package, and invokes it from an Execute SQL Server 2000 DTS Package
Task.
Why Can I Not Migrate a Dynamic Properties Task?
As
this task is directly dependent on the no-longer-present DTS 2000
object model, the migration wizard currently offers only limited
support for this task. Instances of this task are replaced with script
tasks containing comments describing the properties once set by this
task. Manual modification is required to restore prior functionality.
Why Aren't Global Variables Available in the ActiveX Script Task?
Access to global variables is not supported in Beta 2.
Why Don't We Include the Data Transformation Task and the Data Driven Query Task with DTS?
SQL
Server 2005 includes the Data Flow task in place of these two SQL 2000
tasks. The Data Flow task places no limits on the number of sources,
destinations and transforms, unlike in SQL 2000 where the user was
limited to a singe source, destination, and transform per task. The new
object model required to support this expanded functionality could not
accommodate these two less robust tasks.
Why Is DTS No Longer Client-Side Redistributable?
DTS
has been rewritten from a useful utility in SQL Server 2000 to a
feature rich application and platform in SQL Server 2005. This change
comes with a responsibility of providing service and support to DTS
users best handled by managing and licensing the DTS engine
functionality similar to other SQL server applications. Therefore,
while users can easily deploy packages with the new Deployment wizard,
they will be required to run the SQL Server 2005 setup and install DTS
on each machine where they want to run the packages. This will not only
help ensure the user/machine is using a correct version of the DTS
executables but will also assist in updating the software in the future.
Why Does DTS Not Support an MSI-Based Deployment?
MSI-based
deployment has the advantage of a complete stand-alone install, but it
would require the re-distribution of DTS binaries. If an important
security patch is required, it would be impossible to get the fix out
to every deployment target in a managed fashion. The current deployment
utility requires target servers to have an existing installation of DTS
as this is a much more serviceable model for administrators.
For more information search Books Online for "Deployment Utility."