Extracting and Loading Data in Microsoft SQL Server 2005 Integration Services
- By Hitachi Consulting
After completing this chapter, you will be able to:
Understand and create connection managers.
Extract and load data from different data sources to different destinations.
Use data sources and data source views to extend the functionality of a regular connection manager.
In Chapter 2, “Building Your First Package,” you learned about how to build your first package, and you explored SQL Server Business Intelligence Development Studio (BIDS) and its basic components. In this chapter, you’ll learn how to set up a new Microsoft SQL Server Integration Services (SSIS) project, add a data flow task to extract data from a source, and load the results into a destination. Specifically, you’ll learn how to create and configure a connection manager for Microsoft Office Excel, SQL DB, and flat files. You will also learn how to use BIDS data sources and data source views to extend the functionality of a regular connection manager.
A connection manager is an SSIS object that contains the information required to create a physical connection to data stores as well as the metadata describing the structure of the data. In the case of a flat file, a connection manager contains the file path, file name, and metadata identifying rows and columns. A connection manager for a relational data source contains the name of the server, the name of the database, and the credentials for authenticating access to the data. Connection managers are the bridge between package objects and physical data structures. They are used by tasks that require a connection (such as the Execute SQL task), by data adapters that define sources and destinations, and by transformations that perform lookups to a reference table.
Connection Manager Types
A connection manager is a logical representation of a connection. At design time, the properties of a connection manager describe the physical connection that Integration Services creates when the package runs. For example, a connection manager includes the ConnectionString property that is set at design time; at run time, a physical connection is created, using the value in the ConnectionString property.
Many tasks use connections. For example, an Execute SQL task (that runs SQL statements) requires a connection to a relational database. The sources and destinations in package data flows use connections to extract and load data. Some transformations also require connections to do their work. For example, the Lookup transformation uses a connection to access a reference table to look up and retrieve values. The following is the list of connection managers available in SSIS:
This list represents the typical connection managers. However, SSIS gives developers the ability to write source components that can connect to custom data sources and supply data from those sources to other components in a data flow task.
Creating a New Integration Services Project
The process of creating a SQL Server Integration Services project consists of several steps. The first step is to define a name and location for your project and solution. You can also define a new name for the default package that SSIS creates as part of this initial step. The second step in building an SSIS project is to create connection managers for data source and data destinations. You need to know where your data is stored, what the server name that hosts the data is, and which database or file stores the data. Verify that you have all the required credentials to retrieve that data and store the new data in a destination database or file. The third step in creating your new SSIS project is the creation of at least one data flow, for instance, to extract and load data. To create a data flow task to extract and load data, you will need to specify data adapters linked to the source and destination connection managers you define. You can create more than one data flow in a control flow and, indeed, you can connect them in a logical sequence. You will learn more about how to manage a set of data flows in Chapter 5, “Managing Control Flow.”
Now you will create a new Integration Services project to which you will add a data flow task. You will create a new package to extract data from a source table and load the data to an Office Excel file. These transformation processes simulate data-delivering routines that you might perform when working in a data warehouse or enterprise environment.
Create a new Integration Services project
Start SQL Server Business Intelligence Development Studio. Your screen should look similar to this:
On the File menu, point to New, and then click Project.
Make sure that the Project Type is set to Business Intelligence Projects, and then click the Integration Services Project template.
Type a name for the project: Chap03
Change the location for the project to C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap03 and confirm that the Create Directory For Solution check box is selected. The New Project dialog box should look like this:
Click OK to continue.
In Solution Explorer, right-click the package and choose Rename to change the package name to CopyTable.dtsx. Click Yes when prompted.
Click Yes to rename the package object as well. Now you should see this:
Adding Connection Managers
The second step in building an SSIS project is to create connection managers for data source and data destinations. As described before, connection managers are logical representations of a connection. Connection managers you can add include connections to Oracle, FTP, and HTTP sites; Analysis Services databases; flat files; and more. Each connection manager has its own configuration, depending on the type of connection you want to set.
In the next two procedures, you’ll add a connection manager for a SQL Server 2005 database and another connection manager for Office Excel.
Add an OLE DB connection manager for the is2005sbs database
Right-click anywhere in the Connection Managers pane at the bottom of the Control Flow tab and click New OLE DB Connection.
Click New to define a new connection, click the Provider drop-down list to review available providers, and then click Cancel to keep default: Native OLE DB\SQL Native Client.
Type localhost for the Server Name.
Select Use Windows Authentication.
Choose is2005sbs as the database.
Click the Test Connection button. The following window will appear:
Click OK twice.
Add an Office Excel connection manager to the Employee.xls file
Create a new folder called Data in C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap03.
Right-click anywhere in the Connection Managers pane at the bottom of the Control Flow tab and click New Connection.
In the Add SSIS Connection Manager dialog box, click EXCEL (connection manager for Excel files) and click Add.
Browse to C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap03\Data\.
Type Employee in the File Name box, click Open, and then click OK.
Right-click Excel connection manager, select Rename, and rename the connection Employee.
Creating a Data Flow
An SSIS package needs at least one component in a control flow. This component could be a data flow task or any component from Control Flow Items or Maintenance Plan Tasks in the Microsoft Visual Studio Toolbox. Basically, you build a control flow by adding tasks or control flow components to the Control Flow tab.
The third step in creating your new SSIS project is the creation of at least one data flow, for instance, to extract and load data. To create a data flow task to extract and load data, you will need to specify data adapters linked to the source and destination connection managers you define. There are different ways to create data flows in a control flow. In this procedure, you’ll create a data flow task.
Create a data flow task
Click the Data Flow tab.
Click the message link in the center of the page to add a task.
In the Properties pane, change the Name property to Data Flow Task – Copy Employee.
Adding Data Adapters
Now you are ready to add data adapters to your data flow task. The term data adapter refers to a set of objects that provide the ability to connect to, and interact with, databases, files, and other resources that provide data storage. Data adapters are used to read, insert, modify, and delete data from these various data storage devices. Within a data flow task, data sources and data destinations are specific implementation types of data adapters.
A data adapter is an object that can be used only in the data flow task and requires a connection manager to be established.
In this procedure, you’ll add and map source and destination data adapters.
Add an OLE DB source data adapter
Open the Toolbox and review the available objects.
Drag OLE DB Source from the Toolbox to the grid.
In the Properties pane, change the Name property to OLE DB Source - Employee.
On this step, you’ll add the connection manager to the source adapter.
Add the localhost.is2005sbs Connection Manager to the OLE DB Source data adapter
Double-click the OLE DB Source – Employee data adapter to open the OLE DB Source Editor and click the OLE DB Connection Manager drop-down list.
In the drop-down list, select localhost.is2005sbs, and then click OK.
Click the Data Access Mode drop-down list to see the different access mode.
Select Table Or View.
In the Name Of The Table Or The View drop-down list, select the [dbo].[Employee] table.
Click the Preview button to see sample data of employees, and then click Close.
Map the connection manager to the data adapter
Click Columns from the left panel of the Editor. This action maps columns from the connection manager to output columns of the adapter.
Now you have a data adapter that has been associated with a connection manager and is now ready to be used in a transformation.
Add an Excel Destination data adapter
Open the Toolbox and expand the Data Flow destinations.
Drag Excel Destination from the Toolbox to the grid.
In the Properties pane, change the Name property to Excel Destination – Employee.
Add the Employee connection manager to the Excel Destination data adapter
Double-click the Excel Destination – Employee data adapter.
Click the OLE DB Source – Employee adapter and connect it to the Excel Destination adapter by dragging the green arrow from OLE DB Source – Employee to Excel Destination – Employee.
Double-click the Excel Destination – Employee data adapter to open the Excel Destination Editor and verify that Employee is selected in the OLE DB Connection Manager drop-down list.
In the Name Of The Excel Sheet drop-down list, click New.
Change the name of the sheet to Employee by replacing the current name, Excel Destination, next to the CREATE TABLE statement. Keep the quotation marks and change the size of the LoginID column to NVARCHAR(50).
Click Preview and see that the new table is empty, and then click Close.
Click Mappings in the left panel of the Editor.
Executing the Package
Once you have created a new SSIS project with connection managers for sources and destinations, created a data flow task with source and destination data adapters, and mapped the columns that you want to transfer from your source table to your destination Office Excel file, you are ready to run this package.
When you execute a package, Integration Services validates the package and executes the tasks defined in the control flow. You can change certain properties to optimize the processing time. You can learn more about optimization in Chapter 11, “Optimizing SSIS Packages.” In this procedure, you’ll execute the package you have built.
Execute the package
Right-click the CopyTable.dtsx package and choose Execute Package.
Click the Stop Debugging button on the Debug toolbar.
Using Windows Explorer, navigate to the C:\Documents and Settings\<username>\My Documents\Microsoft Press\IS2005sbs\Chap03\Data\folder.
Open the Employee.xls file to confirm that data appears in the file.
Click the Employee tab, and data should appear.