Working with Partitions
Partitions make it possible for you to create extremely large cubes. You can effectively create small, medium-sized, and even remarkably large cubes without using partitions. But partitions are useful when you need to create very large, enterprise-wide applications. For that reason, the ability to manage multiple partitions is available only with Microsoft SQL Server 2008 Enterprise Edition.
Understanding Partition Strategies
Each Analysis Services measure group consists of at least one partition. You design storage modes and aggregations at the partition level. Whether a measure group contains only a single partition or many partitions, the process of designing storage is the same.
One of the benefits of creating multiple partitions is that you can design different storage for different portions of the measure group. For example, say that you have one partition that contains information for the current year and one previous year. You access this information frequently, so you specify multidimensional OLAP (MOLAP) storage with aggregations to provide a 30 percent performance gain. A second partition contains values for the third, fourth, and fifth years. These years are usually accessed only at a summary level (if at all), and the relational warehouse is also occasionally accessed, so you specify hybrid OLAP (HOLAP) storage, with aggregations that provide a 15 percent performance gain. A third partition contains several previous years. Those years are infrequently accessed and the relational warehouse is never used, so you specify MOLAP storage with aggregations to the 5 percent performance level and then archive the relational warehouse to tape.
A second major benefit of creating partitions is that you can process a partition independently of the rest of the cube. As a fairly extreme example, suppose that you have an Analysis Services cube used to monitor manufacturing activities and you want to update the information in that cube every 10 minutes. You don’t have time to completely process the database every 10 minutes. By putting the current day into a separate partition, you can process that partition every 10 minutes, without having to process the rest of the cube. In effect, creating a partition for the current day is like performing an incremental update on the cube, except that you can completely replace the values in that one partition every 10 minutes, guaranteeing consistency with the relational data source.
A client application has no awareness of—let alone control over—partitions used on the server. You can modify the design of partitions without affecting any client application. The most important task when creating partitions is to make sure that each appropriate value from the fact table (or fact tables) makes it into one and only one partition.
When you’re creating partitions, make sure each partition gets unique data. Otherwise, it’s easy to double-count values in multiple partitions. The dangers of creating partitions are similar to the dangers of executing an incremental update on a cube. This similarity is not coincidental. In fact, when you perform an incremental update on a cube, the Analysis Services server creates a new partition, loads values into the new partition, and then merges the two partitions. Analysis Services provides two techniques to avoid double-counting:
Create a separate fact table for each partition.
Specify a filter (a SQL WHERE clause) to restrict rows from the fact table.
In this section, you will create three partitions in the AdventureWorks cube Reseller Sales measure group. One partition will contain data for CY 2011, another for CY 2010, and a final partition will contain data for all prior years. In the first procedure, you will modify the existing partition so that it contains the most current data, and then in the following procedure you will create two partitions to contain data from prior years.
Modify a partition
Switch to BIDS. In the BIDS Solution Explorer, right-click AdventureWorks.cube and select View Designer. In the Cube Designer, click the Partitions tab.
In the Reseller Sales measure group, right-click the Fact Reseller Sales partition and select Properties. In the Properties window, change the name of the partition to Fact Reseller Sales CY 2011.
In the Properties window, select the Source property and then click the ellipsis button that appears on the right. You can ensure that the data in a partition is unique by having each partition select data from a different table or by having each partition select records using a SQL SELECT statement that includes a WHERE clause. The WHERE clause for each partition must filter a unique set of fact records for each partition. In the Partition Source dialog box, you choose whether the partition will be retrieving data from a table or whether it will use a SQL query.
In the Partition Source dialog box, select Query Binding from the Binding Type list. When you select Query Binding, the SELECT statement that the partition is currently using appears followed by WHERE. You need to complete the WHERE clause so that the partition selects data for CY 2011.
In the Query text box, change the WHERE clause to WHERE OrderDateKey BETWEEN 20110101 AND 20111231. The Partition Source dialog box should look like this:
Click Check. This will check the SQL query syntax, but it is up to you to ensure that partitions do not contain duplicated data by setting the WHERE clause appropriately. In the Partition Filter dialog box, click OK. In the Partition Source dialog box, click OK.
Now that you have modified the original partition so that it only contains CY 2011 data, you can create two other partitions to contain the rest of the fact table data.
On the Partitions tab of the Cube Designer, click the New Partition link in the Reseller Sales measure group section.
On the Welcome page of the Partition Wizard, click Next. On the Specify Source Information page, the Measure Group value is Reseller Sales. This value defines the measure group for the new partition. The Look In value defines which data source or data source view contains the source table for the partition. Any tables that match the structure of the measure group’s source table will be displayed. If multiple tables are selected from the Available Tables list, a new partition is created for each table.
In The Available Tables list, select FactResellerSales. Click Next.
On the Restrict Rows page, select Specify A Query To Restrict Rows. In the Query text box, change the SQL SELECT statement to SELECT * FROM [dbo].[FactResellerSales] WHERE OrderDateKey BETWEEN 20100101 AND 20101231. The Restrict Rows page of the wizard should look like this:
Click Check. In the Partition Filter dialog box, click OK. On the Restrict Rows page of the wizard, click Next. If you have a very large Analysis Services database, you can allocate a measure group’s partitions across multiple Analysis Services servers. This allows multiple servers to share the processing and query load. To learn more, see the SQL Server Books Online article “Creating and Managing a Remote Partition.” You can also allocate a measure group’s partitions across multiple storage locations. The Processing And Storage Locations page of the Partition Wizard allows you to configure these options. You’ll accept the default options for this procedure.
On the Processing And Storage Locations page, click Next.
On the Completing The Wizard page, change the name of the partition to Fact Reseller Sales CY 2010.
A new partition needs to have an aggregation design. You can choose to create a new aggregation design using the Aggregation Design Wizard, copy an aggregation design from an existing partition, or create or copy an aggregation design later. For this partition, you will copy the aggregation design used by the Fact Reseller Sales CY 2011 partition.
Select Copy The Aggregation Design From An Existing Partition. This will copy the aggregation design from the partition in the Copy From list. The CY 2010 partition will be assigned the 30 Percent AggregationDesign aggregation design, the same one used by the CY 2011 partition. The Completing The Wizard page should look like the following.
Click Finish. The Reseller Sales measure group now contains two partitions. Notice that both partitions are using the 30 Percent AggregationDesign. In the next step, you will create one more partition that will contain the data from all prior years. Because this historical data will be queried less frequently than the more recent data, you can choose to have fewer aggregations. With fewer aggregations, the partition can be processed more quickly and will use less storage space.
Repeat steps 1 through 9 to create one more partition with the properties shown in the following table.
SELECT * FROM [dbo].[FactResellerSales] WHERE OrderDateKey < 20100101
Fact Reseller Sales History
Design Aggregations Later
In the Cube Designer, click the Aggregations tab.
On the Aggregations tab, right-click 15 Percent AggregationDesign in the Reseller Sales measure group section and select Assign Aggregation Design.
In the Assign Aggregation Design dialog box, select Fact Reseller Sales History from the Destination Partitions list.
Click OK and then click the Partitions tab. The Reseller Sales measure group should now contain three partitions. The CY 2010 and CY 2011 partitions should be using the 30 Percent AggregationDesign, and the history partition should be using the 15 Percent AggregationDesign.
One benefit of having multiple partitions is that you can process the partitions individually, or you can take advantage of parallel processing and process several partitions at the same time. In the next procedure, you will demonstrate these capabilities.
In the Cube Designer, right-click the Fact Reseller Sales CY 2011 partition on the Partitions tab and then select Process. A dialog box may appear with the warning that the server content appears to be out of date. Click Yes and the project will be deployed to the server. When deployment is complete, the Process Partition dialog box will appear.
In the Process Partition dialog box, click Run. When processing is complete, the status will change to Process Succeeded.
In the Process Progress dialog box, click Close, and then click Close in the Process Partition dialog box.
Click the Browser tab of the Cube Designer.
In the Browser tab Metadata pane, expand the Measures folder, expand the Reseller Sales measure group, and drag Reseller Order Quantity to the Report pane totals area.
In the Metadata pane, expand the Date dimension and drag Calendar Year to the Report pane rows area. Right-click the Report pane and select Show Empty Cells.
Because you processed the Fact Reseller Sales CY 2011 partition and did not process the other partitions, the report only displays CY 2011 data.
You will need to process the other two partitions in the Reseller Sales measure group so that you can see all four years of data.
Click the Partitions tab. Click the Fact Reseller Sales CY 2010 partition. Hold down Ctrl and click the Fact Reseller Sales History partition so that both partitions are selected. On the Partitions tab toolbar, click Process. In the Process Object(s) dialog box, click Run.
In the Process Progress dialog box, click Close, and then click Close in the Process Partition dialog box.
Click the Browser tab and click Reconnect. The report is refreshed and now displays data for four years.
Consider the situation described earlier in the section titled “Understanding Partition Strategies,” where you created a new partition each day for a manufacturing cube. Each month you would create up to 31 additional partitions in the cube. Simply keeping the partitions straight would be extraordinarily confusing. One solution is to use only two partitions: one for the current day and one for all previous time. Each night, merge the current day partition with the previous time partition, and then create a new current day partition for the next day. Merged partitions don’t run significantly faster than separate partitions, but they can be much easier to manage.
In the next procedure, you will learn how to merge the CY 2010 partition with the partition containing historical data.
Switch to SSMS. Expand the AdventureWorks cube, expand the Measure Groups folder, expand the Reseller Sales measure group, and expand the Partitions folder.
Right-click the Fact Reseller Sales History partition and select Merge Partitions.
In the Merge Partition dialog box, select Fact Reseller Sales CY 2010 from the Source Partitions list. The Merge Partitions dialog box should look like this:
Click OK. Right-click the Partitions folder and select Refresh. You can see that the Fact Reseller Sales CY 2010 partition no longer appears in the list of the Reseller Sales measure group partitions. The data that was in the CY 2010 partition has been merged into the history partition.
You will need to modify Fact Reseller Sales History partition so that it selects all fact records dated prior to January 2011.
Right-click the Fact Reseller Sales History partition and select Properties.
On the General page of the Partition Properties dialog box, select the Source property and click the ellipsis button that appears on the right.
Change the WHERE clause of the SQL query to WHERE OrderDateKey < 20110101.
Click OK. In the Partition Properties dialog box, click OK. Close SSMS.
Because the partitions have been merged in SQL Server Management Studio, the AdventureWorks SSAS database that is deployed on the Analysis Services server is no longer synchronized with the solution that is currently open in BIDS. Whenever changes are made to an Analysis Services database in SSMS, you should import those design changes into a new project in BIDS. Then use the new project for any future design changes that will be deployed back to the Analysis Services server.
In the next procedure, you will learn how to import an Analysis Services database design into a new project in BIDS.
Import an Analysis Services database design
On the File menu in BIDS, point to New and select Project.
In the New Project dialog box, verify that the project type is Business Intelligence Projects. In the Templates pane, select Import Analysis Services 2008 Database.
Change the project name to Import SSAS DB and change the location to C:\Microsoft Press\Analysis Services 2008 SBS\Chapter 14. The New Project dialog box should look like the following image.
When you click OK, the AdventureWorks BI solution will be closed and the Import Analysis Services Database Wizard will start.
Click OK. On the Welcome page of the Import Analysis Services Database Wizard, click Next.
On the Source Database page, enter localhost in the Server text box and then select AdventureWorks SSAS from the Database list.
Click Next. On the Completing The Wizard page, click Finish. You should look at the partitions in the Reseller Sales measure group to confirm that the changes you made in SSMS have been imported into this new project.
In the Cubes folder of Solution Explorer, right-click AdventureWorks.cube and select View Designer.
In the Cube Designer, click the Partitions tab. The Reseller Sales measure group should contain only the Fact Reseller Sales CY 2011 and Fact Reseller Sales History partitions.
In this chapter, you learned that Analysis Services has three storage modes: ROLAP, HOLAP, and MOLAP. ROLAP mode leaves dimension, detail, and aggregate data in the source relational database; MOLAP stores dimension, detail, and aggregate data in proprietary data structures on the Analysis Services server; and HOLAP leaves detail data in the source relational database and stores aggregate data on the Analysis Services server. Cube data is stored in measure group partitions. A measure group can have multiple partitions. Partitions in a measure group can have different storage modes and can use different aggregation designs. You can add, delete, or merge the partitions in a measure group.
You also learned that when the data in a source database is modified, you need to reprocess MOLAP dimensions and partitions to bring those changes into an Analysis Services database. The most simple and reliable method is to fully process all affected dimensions and partitions. However, if the source database is large, time might not allow for a full process. Analysis Services allows you to update dimensions and to use incremental processing to add new source data into a measure group partition. Keeping a source database and an Analysis Services database in sync can become quite complicated. Analysis Services provides proactive caching that allows you to automate Analysis Services processing.