Data Archiving On Demand
Microsoft’s Data Transformation Services puts your users in control of their data’s destiny by taking the bloat out of databases
Just about every organization eventually faces the problem of oversized databases. As they grow, these monsters tend to slow down the applications that rely on them. The obvious solution is to purge unused or outdated data.
The obvious isn’t always easy. It’s not always a simple matter to convince end users responsible for the data to purge anything; there’s always the fear that as soon as something is purged, someone is going to need it. You could always try archiving the data onto some type of removable media, but retrieving the data later may not be that easy—not to mention that getting to the data as fast as the user wants is usually impossible.
One way around this is to move data to an archive database, removing it from the primary database in the process, but keeping it available for user access when needed. This type of task has often been a database administration-intensive one, however. Someone has to define the SQL for the extraction of the purged data into a form useful for archival purposes, and manage the archive database once it’s been created. And depending on how the data is going to be used—for OLAP analysis in a data warehouse or data mart, or just for historical records—the data will have to be transformed into a format (and moved onto a server platform) appropriate for the task.
Microsoft’s SQL Server 7.0 offers a nice feature called Data Transformation Services (DTS) that allows you to import or export data between heterogeneous data sources using an OLE DB-based architecture. DTS Designer is a graphical design environment that will allow you to further define the workflow. A wizard inside SQL Server’s Enterprise Manager guides you through the process of creating a transformation package, in which you define the tasks necessary to move data between sources and the flow between these tasks. You can have your package perform as many unique tasks as necessary, such as sending an e-mail to someone, letting them know the process is complete. The package may be executed immediately, scheduled for execution, or saved for later use.
All this is great if you are a database administrator, or someone with direct access to your SQL server. However, end users, and even most programmers, are at the mercy of a (usually overworked) DBA to set up the packages for them. Typically, issues such as these fall to the back of the DBA’s to-do list—until the database gets so big that no one can reasonably use the system until data is purged.
Fortunately, DTS offers the ability to dynamically create packages using Microsoft’s Visual Basic, C++, Visual Basic Scripting Edition, or Microsoft JScript. This allows you to create a custom application that builds and executes DTS packages. The Custom Purge project provides a generic VB6 application (see Figure 1) that allows end users to review data to be purged, archive it to SQL Server, and then purge it from the source database. The source data can come from any heterogeneous data source that you can connect to with a Data Source Name (DSN).
Setting up the application
The benefit of setting up a custom program using DTS objects is the ability to minimize the time needed to set up the database. For this project, you will need only to create a database on SQL Server that you wish to purge into. Once the database is established and rights are assigned, the DBA does not need to be involved. For our example, create a database named PubsArchive. If you are not able to create a new database on your SQL Server, then find a current database that you have rights to create tables for. Substitute this database name wherever the name PubsArchive is referenced.
Then you’ll need to create a DSN that points to the source database. Do this by going to the ODBC Data Source Administrator through Control Panel. For our example, add a new Data Source, named Pubs, which points to your SQL Server’s Pubs database. Of course, you can go back later and create any ODBC-compliant data source that you want to purge from. For instance, you may want to purge legacy data from an Informix database or even from a Btrieve database. This is fine as long as you have a valid ODBC driver that can be used to connect to your database.
There are certain DTS files that you will need to install on your machine before executing the source code. The files are included with the SQL Server 32-bit tools and are installed automatically when you install SQL Server or the client installation. They are typically installed into the MSSQL7Binn directory. Verify that you have access to dtspkg.rll, dtspump.rll, and sqldmo.rll. The third file referenced is used to access SQL’s Distributed Management Objects (DMO), which will be used to determine if the destination table already exists.
Reviewing the functions
I’ve written a sample “Custom Purge Utility”, available for download off Enterprise Development’s website. This utility provides three main functions: Review Data, Archive Data, and Purge Data. The first function, Review Data, is a simple routine that allows the user to pull selected data into a grid. Users will enter a valid table name into the Table to be Purged text box.
Once the user has selected the necessary criteria, he or she will click “Review Data” to load the grid. Data will be loaded into the grid dynamically and the Fields collection will be used to display the field names at the top of the grid. Since some queries may return very large result sets, the program will ask the user if he or she wants to continue loading after it hits 1,000 records. This should provide enough records to spot-check data to be archived. If the result set attempts to load too many records into the data grid, the program may run out of memory. This can be changed, however, depending on your company’s individual needs.
The main routine for this program is executed when the user clicks “Archive Data”. It will first populate a string variable with a dynamically built SELECT statement based on the parameters entered by the user. The string is built with the same routine used to populate the data grid so as to ensure the data queried is the same that was reviewed. You may choose to customize the application so that a DSN-less connection may be used for the source instead. Since the application assumes we are archiving data to SQL Server so it may easily be retrieved, we will value the destination connection with DSN-less parameters.
If the destination table does not already exist, then we will open an Active Data Object (ADO) connection and initiate a transaction. If the table does exist, however, then we’ll first want to execute a SQL statement that deletes any records previously inside the destination table. In this project, I chose to delete the records using an ADO connection inside of a transaction.
Another alternative would have been to delete the data in the destination table using a DTS task. Figure 2 provides a snapshot of what the package would look like if it had been designed using SQL Server’s DTS Designer. As mentioned earlier, DTS enables you to import data inside a process called a package. Each package can contain one or more tasks that are executed as steps.
A package contains three types of objects: connection, task, and step objects. The connection object is used to define both the OLE DB source and destination databases. The task object defines the actual thing to be done, such as executing a SQL statement or ActiveX script. The step object defines the sequence in which the task objects are executed. Precedence constraints can be used to determine each step’s dependence on another step.
In this project, a precedence constraint was not used. A precedence constraint would have been used if I had chosen to delete data from the destination table using a DTS task as opposed to an ADO connection. The package is executed, and any errors will be written to an Exception file named DTSLog.txt that is located in the directory where the application resides. If a failure is encountered and the destination table already exists, then the ADO transaction created earlier will be rolled back. Otherwise, the application will indicate that the package executed successfully and should write a status message to the NT Event log.
Once the archive has been executed successfully, the user is able to purge selected data from the source database.
Running the application
When executing the application for the first time, you will be prompted to provide certain settings. The settings are used to define the locations of source and destination databases. For this project, enter the following values (see Figure 3):
Enter “authors” in the Table to be Purged text box. This will cause the Field Name combo boxes to be populated with valid field values. For this project, enter query parameters such that the au_lname field is equal to the value ‘White’. This should cause one record to be returned.
After archiving the data, a message should populate the status bar indicating that the DTS archive was successful. In this example, the purge should fail since the deletion of this field violates a constraint in the Title Authors table. To test the program with another source and destination database, click Settings and specify different values.
The code in the Custom Purge project provides a fairly generic program that you may have to customize only slightly. However, if you wish to create your own customized application, refer to the samples provided with SQL Server. The samples have been developed using both Visual Basic and Visual C++ and can be found in the Mssql7DevtoolsSamplesDTS directory on the SQL Server version 7.0 CD. Additional sample programs are installed in a self-extracting zip file, DTSDemo.exe that is usually found in C:Mssql7DevtoolsSamplesDTS.
Another invaluable resource for developing your own custom DTS applications is the ScriptPkg tool. This tool can be used to generate a script for a pre-developed data transformation. You can first create a package using the DTS wizards and Designer utility. Once the package has been established and tested, the ScriptPkg tool can be used to generate some of the actual code needed. This is especially important when you are importing data from another OLE DB provider other than SQL Server. For more information about using this tool, see Microsoft Knowledge-Base article on ScriptPkg.
The COM interface provides a wonderful method of allowing any user to leverage the power of DTS. Custom applications can be developed that allow you to create simple or even complex packages on the fly. The task of handling data archival and purging is not a glamorous task. Using this method you can return the job of managing the data back to the people responsible for it. Best of all, data is stored in a SQL Server database where it can later be queried easily.
Sara Rea offers tips on combating database bloat. Rea specializes in ActiveX technologies. Reach her at [email protected].
|Figure 1: Urge to purge. The Custom Purge Utility provides three main functions: Review Data, Archive Data, and Purge Data.|
|Figure 2: Data by design. Here’s an object package designed using SQL Server’s DTS Designer. DTS lets you import data inside a package, which can contain one or more tasks that are executed as steps.|
|Figure 3: Set the table. When executing an application for the first time, you’ll be prompted to provide settings. These are used to define the locations of source and destination databases.|