Managing Analysis Services Processing
The easiest way to make sure that an Analysis Services database is completely consistent with the data warehouse, and with itself, is to process the entire database. When you process the database, you discard the contents of every dimension and measure group partition in the database and rebuild them from the data warehouse. This takes place as a single transaction, which means that client applications can continue to use the existing cubes until processing is complete. It also means that if an error occurs at any point during processing, the entire change is rolled back, again ensuring that the database is internally consistent.
Processing the entire database is the simplest option, provided that you have sufficient time and storage space available. Although processing a large database can consume a considerable amount of time, users can continue to access the existing database while data is being updated in a new version of the database. For example, suppose you have an Analysis Services database that you update every day and that requires 10 hours to fully process. Assuming you have sufficient disk space, you could still choose to process the entire database, perhaps by starting the nightly processing after 19:00 or as soon as new data is available. Users would then have access to the updated database by the next morning. You would not have to exclude users from the system or wait for them to leave for the day.
Processing a large database can also consume a considerable amount of disk space. The Analysis Services server not only creates a second copy of all the dimension and partition files created during the transaction, but it also uses additional temporary files to accumulate aggregations, particularly when creating aggregations from a large fact table.
Consequently, some databases are simply too large to process as a single transaction. Analysis Services provides several options for processing individual components of a database. These options allow you to create and manage extremely large databases, but they also require much more work to provide users with continuous access to the cubes and to prevent invalid or inconsistent values from entering the database.
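If you automate full database processing (for example, from a SQL Server Agent job), you can submit the request to the server as an XMLA command rather than using the UI. The following is a minimal sketch; the DatabaseID shown is an assumption and may differ from the database name on your server, so verify it in the database properties before running the command.

```xml
<!-- Fully process the entire database in a single transaction.
     DatabaseID is assumed here; it may differ from the database name. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>AdventureWorks SSAS</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```

Because the batch runs as one transaction, users continue to query the existing database until the command completes.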
Processing a Dimension
When you process a dimension, the server creates a map that includes the path for each member of that dimension. Every cube that includes the dimension uses that map. When you process an existing dimension, the map is destroyed and a new map is created. Destroying the dimension map invalidates all the cubes that use the dimension. When you process an entire database, the dimensions are processed first and then all the cubes are processed as well. But when you process a single dimension, you make all the cubes that use that dimension inaccessible to client applications.
Fortunately, you can make certain changes to a dimension without destroying the existing map. If you don’t destroy the existing map, you don’t invalidate existing cubes. Analysis Services allows you to update a dimension, applying changes that don’t destroy the dimension map. The most useful such change is adding new members to a dimension. You can also rename, delete, or re-parent members. For example, you could re-parent the Helmets member by changing its category from Accessories to Clothing.
Unfortunately, when you update a dimension, you delete the aggregations related to that dimension. To avoid deleting aggregations when you update a dimension, you need to change the attribute relationship property RelationshipType from its default value of Flexible to Rigid. Changing this property allows you to update a dimension without deleting all of the related aggregations. However, if this property is set to Rigid and you try to update a dimension, Analysis Services will fail the operation if it detects changes in the dimension table that aren’t allowed. Deleting or re-parenting members of a rigid relationship requires the dimension to be fully reprocessed, which in turn forces each cube using the dimension to be reprocessed.
The Date dimension is a good candidate for rigid attribute relationships, because you should not delete dates, and the month, quarter, and year that a date belongs to shouldn’t change. A product dimension with categories and subcategories may not be a good candidate for rigid attribute relationships because products may be deleted or recategorized.
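In the dimension’s definition, the RelationshipType property is set on each attribute relationship. The following sketch shows roughly what the Date – Month attribute relationship looks like once it is made rigid; the element names follow the Analysis Services DDL, and the attribute ID is an assumption based on the dimension used in this chapter.

```xml
<!-- Attribute relationship fragment from a dimension definition.
     The default RelationshipType is Flexible; Rigid preserves related
     aggregations during Process Update, but deleting or re-parenting
     members then forces a full reprocess of the dimension. -->
<AttributeRelationship>
  <AttributeID>Month</AttributeID>
  <Name>Month</Name>
  <RelationshipType>Rigid</RelationshipType>
</AttributeRelationship>
```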
In the next procedure, you will learn how to change the value of the attribute relationship property RelationshipType from Flexible to Rigid.
Create rigid attribute relationships
Switch to Business Intelligence Development Studio (BIDS). In Solution Explorer, right-click Date.dim and select View Designer.
In the Attributes pane of the Dimension Designer, point at the Date dimension. Design Warnings will appear, including the warning “Define attribute relationships as ‘Rigid’ where appropriate.” You should be careful about responding to this warning. Only define attribute relationships as Rigid if you are sure that when dimension members are deleted or re-parented, you will be able to fully reprocess the dimension and all cubes that contain the dimension.
Switch to the Attribute Relationships tab.
In the Attribute Relationships pane, right-click the Date – Month attribute relationship and select Edit Attribute Relationship.
In the Edit Attribute Relationship dialog box, change the Relationship Type to Rigid. Click OK. The Date - Month attribute relationship is now displayed with a solid arrow indicating that it is a rigid attribute relationship.
Repeat the previous two steps for all of the other Date dimension attribute relationships.
Close the Dimension Designer and save the changes to the Date dimension.
In the procedure titled “Insert Source Data” earlier in the chapter, you inserted Mexico into the DimSalesTerritory dimension table, but Mexico has not yet appeared in the Geography MOLAP dimension. In the next procedure, you will update the dimension using SQL Server Management Studio so that it will contain this new member.
Update a dimension
Switch to SQL Server Management Studio (SSMS). In Object Explorer, expand the Dimensions folder, right-click the Geography MOLAP dimension, and select Process. The Process Dimension dialog box is displayed.
Notice that the Process Options value is Process Update. Analysis Services suggests the processing option with the lowest impact that synchronizes the dimension with the database design and the data warehouse. In this case, because the dimension has already been processed and no structural changes are detected in the dimension design, Analysis Services determines that full processing is not required, and suggests an update instead.
The processing option to update the dimension, Process Update, is appropriate when you make changes to the data in the dimension table. However, you will change this value to Process Full to observe the potential impact on other objects in the database. Notice the current setting for Process Affected Objects, visible at the bottom of the dialog box, is Do Not Process. Thus, any objects dependent on this dimension, such as partitions in the cube, will not be processed. You can review the impact of processing the Geography MOLAP dimension on other objects in the database by using the Impact Analysis feature.
In the Process Dimension dialog box, select Process Full from the Process Options list.
Click Impact Analysis. The Impact Analysis dialog box shows that the MOLAP cube and all of its measure group partitions would need to be processed if you were to perform a Process Full on the Geography MOLAP dimension. You can select objects in the Process Object column so that they will be processed immediately after the dimension is processed.
In the Impact Analysis dialog box, click Cancel. To avoid the need to process the MOLAP cube, change the processing options so that the dimension is updated.
In the Process Dimension dialog box, select Process Update from the Process Options list. Click Impact Analysis. The Impact Analysis dialog box shows that when you update a dimension, you don’t need to process any other objects. You are now ready to update the dimension.
In the Impact Analysis dialog box, click Cancel. In the Process Dimension dialog box, click OK. The Process Progress dialog box appears, displaying the steps Analysis Services takes as it processes the dimension. When processing is complete, the Status will change to Process Succeeded. You can view additional information, including the SQL queries used to select data from the dimension table, by expanding the steps and substeps.
The “Process Cube ‘MOLAP’ completed” step makes it appear that the MOLAP cube was processed. However, if you expand the substeps, you will see that no SQL query was executed, meaning that no data was loaded into any of the partitions.
In the Process Progress dialog box, click Close. Now that the Geography MOLAP dimension has been updated, you should be able to see the new member, Mexico, when you browse the MOLAP cube.
Select the MOLAP [Browse] window. On the Browser toolbar, click Reconnect. Mexico doesn’t appear. Because the MOLAP cube was not processed, no values are associated with Mexico. You will be able to see Mexico if you choose to display empty cells.
Right-click the Report pane and select Show Empty Cells. The report now displays Mexico as a member of the Country attribute.
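Under the covers, SSMS submits the dimension update to the server as an XMLA Process command. A rough sketch of the equivalent command follows; the database and dimension IDs are assumptions, and you can click the Script button in the Process Dimension dialog box to capture the exact command for your server.

```xml
<!-- Update the dimension in place. ProcessUpdate rereads the dimension
     table and applies member inserts, updates, and deletes without
     invalidating the cubes that use the dimension. IDs are assumed. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>AdventureWorks SSAS</DatabaseID>
      <DimensionID>Geography MOLAP</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
</Batch>
```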
You will now learn about the options available to you when you process a cube, measure group, or partition. You will then complete a procedure that loads the Mexico sales order into the MOLAP cube.
Processing a Cube
When you click the Process command for a cube that has already been processed, the default processing option is always Process Full. When you fully process a cube, Analysis Services checks to see whether any design changes have been made to any of the dimensions used by the cube. If changes have been made, the server processes the dimensions before processing the cube. The server then generates a set of temporary files containing replacement data for the cube. As soon as processing has completed successfully, the server deletes the current files for the cube and renames the temporary files with the permanent names.
The Process Data option for a cube is virtually identical to the Process Full option. In both cases, the server generates all the files for a new cube, swapping the files into place when the processing is complete. The only real difference is that the Process Data option doesn’t check to see whether you have made any changes to the dimension schema. It processes the cube using the existing dimension files.
Another option, Process Incremental, is both powerful and dangerous. The Process Incremental option allows you to process additional fact records into a cube. Analysis Services does not have the capability to identify which fact records are new. When you incrementally process a cube, you must provide Analysis Services with a SQL query that identifies the new records or identifies a table or view that contains the new records. You must be very careful to ensure that you do not include records that have already been processed into the cube and that you do not exclude any fact records. You can’t use Process Incremental to delete or update records that have already been processed into the cube.
Process Incremental creates new cube files—precisely as if you were using the Process Full option. When the processing is complete, however, the server doesn’t replace the old files with the new ones. Rather, it merges the two sets of files, creating a third set of cube files. Finally, it deletes all but the third set of files and renames those files to become the final cube files. One implication of this operation is that for a single cube, the Process Incremental option might actually require more disk space than the Process Full option because it creates three sets of files, rather than just two. A more important implication is that if you use the Process Incremental option with a fact table that includes values already stored in the cube, those values will be double-counted after you process the cube. If that happens, you can recover by using the Process Data option, which clears the data in the cube and reloads it from the fact table as defined for each partition.
In the procedure titled “Insert Source Data” earlier in this chapter, you inserted a sales order from Mexico on December 31, 2011, into the FactResellerSales fact table, but this sales order has not yet appeared in the MOLAP cube. In the next procedure, you will use the Process Incremental processing option to add this record to the cube.
Incrementally process a partition
In the SSMS Object Explorer, expand the MOLAP cube, expand the Measure Groups folder, expand the Reseller Sales measure group, and expand the Partitions folder.
Right-click the Fact Reseller Sales partition and select Process.
In the Process Partition dialog box, select Process Incremental from the Process Options list. In the Settings column, click Configure. In the Incremental Update dialog box, you can select the table or view that contains the fact records that you want to add to the partition or you can enter a SQL query that selects the additional fact records.
In the Incremental Update dialog box, select Query. In the Text Of The Query text box, enter the following SQL query, or you can copy the code from the file C:\Microsoft Press\Analysis Services 2008 SBS\Chapter 14\SQL\Incremental Process.txt.
SELECT * FROM FactResellerSales WHERE OrderDateKey = '20111231'
The Incremental Update dialog box should look like the following image.
Click OK. In the Process Partition dialog box, click OK. The Process Progress dialog box will appear and display the steps Analysis Services takes as it processes the partition. When processing is complete, the Status will change to Process Succeeded. You can view additional information, including the SQL query used to select data from the fact table, by expanding the steps and substeps.
In the Process Progress dialog box, expand all of the steps and substeps and then select the substep that begins SELECT [dbo_FactResellerSales].[OrderQuantity] AS. Click View Details. The View Details dialog box displays the SQL query that Analysis Services used to select records from the fact table. The SQL query you entered earlier appears as a subquery.
In the View Details dialog box, click Close, and then click Close in the Process Progress dialog box.
Select the MOLAP [Browse] window. On the Browser toolbar, click Reconnect. The report shows that the sales order from Mexico has been added to the MOLAP cube. The number of items in the newly added order, 122, is displayed at the intersection of Mexico and CY 2011. Aggregate values have also been updated. The total for CY 2011 has increased by 122 to 45,252 and the total for all years and all countries has increased by 122 to 214,500.
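The incremental update you just performed can also be scripted as an XMLA ProcessAdd command (ProcessAdd is the XMLA name for the Process Incremental option) with an out-of-line binding that restricts the run to the new fact rows. The sketch below assumes all object IDs match the names shown in Object Explorer, including the data source ID; use the Script button in the Process Partition dialog box to capture the exact command for your database.

```xml
<!-- Incrementally process one partition. The out-of-line Bindings
     element overrides the partition's normal binding for this run only,
     so only the new fact rows are added. All IDs are assumptions. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>AdventureWorks SSAS</DatabaseID>
      <CubeID>MOLAP</CubeID>
      <MeasureGroupID>Reseller Sales</MeasureGroupID>
      <PartitionID>Fact Reseller Sales</PartitionID>
    </Object>
    <Type>ProcessAdd</Type>
  </Process>
  <Bindings>
    <Binding>
      <DatabaseID>AdventureWorks SSAS</DatabaseID>
      <CubeID>MOLAP</CubeID>
      <MeasureGroupID>Reseller Sales</MeasureGroupID>
      <PartitionID>Fact Reseller Sales</PartitionID>
      <Source xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:type="QueryBinding">
        <DataSourceID>Data Warehouse</DataSourceID>
        <QueryDefinition>
          SELECT * FROM FactResellerSales WHERE OrderDateKey = '20111231'
        </QueryDefinition>
      </Source>
    </Binding>
  </Bindings>
</Batch>
```

As the chapter warns, the query in the binding must select exactly the rows not yet in the partition; ProcessAdd cannot detect or remove duplicates.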
As you’ve learned in this chapter, keeping an Analysis Services database up to date can be challenging. You have to consider when to process changes to database objects relative to changes in the data warehouse, how processing one object impacts other objects, and how the processing option you use affects user queries. In the next section, you will learn about proactive caching, an Analysis Services feature that simplifies database processing by managing the details for you.
Proactive Caching
The data that Analysis Services needs to respond to a query can exist in one of three storage locations, also called caches. Analysis Services can respond to a query most quickly if the data exists in memory. If the data is not in memory, Analysis Services can use the second-quickest option and retrieve the data from MOLAP disk storage. If the data is not available from memory or MOLAP storage, the slowest option is to retrieve the data from ROLAP storage—that is, Analysis Services has to query the source relational database.
A potential problem arises when the data in the source relational database changes. These changes cause the ROLAP cache to be out of sync with the memory and MOLAP caches. To get the three caches back in sync, you must process the dimensions and partitions that use MOLAP or HOLAP storage, and the memory cache must be cleared and repopulated. Because processing dimensions and partitions takes time and computer resources, you need to decide when you want this processing to occur and what cache(s) should be used to respond to queries while the caches are out of sync. Fortunately, Analysis Services provides proactive caching, which simplifies managing these issues.
To configure proactive caching, you need to consider the following questions:
Should processing of the cube occur in fixed intervals of time or only when data in the underlying source has changed?
While the cube is being processed, how should user queries be resolved—from the most recent version of the cube (which might contain old data) or from the underlying source relational database (which contains new data, but might be slower to return query results)?
If processing will be triggered by changes to the source database, how should Analysis Services be notified of a change?
In the next procedure, you will review the available options and configure proactive caching for a partition.
Configure partition proactive caching
Switch to Business Intelligence Development Studio (BIDS). In Solution Explorer, right-click MOLAP.cube and select View Designer. In the Cube Designer, click the Partitions tab.
On the Partitions tab, expand the Internet Sales measure group, right-click the Fact Internet Sales partition, and select Storage Settings.
In the Partition Storage Settings dialog box, drag the slider to Scheduled MOLAP. Click Options. Notice that Proactive Caching is enabled, and the cache is configured to rebuild in one-day intervals. In this case, the MOLAP storage for this partition will automatically update once every day, unless you change the setting to a shorter interval. Unfortunately, you can change only the frequency of the rebuild, not the time of day that the rebuild takes place. The Rebuild Interval can be as often as every second or it can be set as an interval of many days.
By using Scheduled MOLAP, you force a periodic update of the MOLAP cache, whether or not new data has appeared in the warehouse. This may result in more frequent processing of the partition than necessary if data is not regularly added to the warehouse. On the other hand, the partition may not be processed frequently enough if the intervals are too long relative to the frequency of updates to the warehouse.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Automatic MOLAP. Click Options. Automatic MOLAP also enables proactive caching, but updates the cache only when the data changes, instead of on a periodic basis. The default settings start cache processing after a 10-second Silence Interval, with a Silence Override Interval of 10 minutes. The Silence Interval setting prevents processing from starting until data updates in the warehouse have stopped for the specified interval. The situation is analogous to waiting for rush-hour traffic to end: if your warehouse is routinely updated during the first 10 minutes of every hour, that period is the rush period. As soon as updates stop and 10 seconds (or the amount of time specified by the Silence Interval) have elapsed with no further changes, the cache is processed.
But what if there are so many updates to the fact table during the rush period that processing cannot start even after 10 minutes? The Silence Override Interval setting tells the Analysis Services server to go ahead and start processing the cache if 10 minutes have elapsed and data is still being added to the warehouse. The server uses a snapshot to isolate the records that it will include in the update from the records that are added to the warehouse after processing has begun.
Click the Notifications tab. You use the notification options to specify the conditions that indicate that data has changed in the warehouse. The default value is SQL Server. Any dimension and fact tables used for the current partition are monitored for changes, but you can also specifically identify tables to be monitored. Alternatively, you can choose to have a client application send notification of changes by using the Client Initiated option, or you can choose to poll specific tables by using the Scheduled Polling option. Scheduled Polling is useful if your warehouse is stored in a relational database other than SQL Server. For more information about polling queries, refer to SQL Server Books Online.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Medium-Latency MOLAP and click Options. The settings on this page are similar to the Automatic MOLAP settings, but now the Drop Outdated Cache option is enabled with a default value for Latency of 4 hours. With medium-latency MOLAP, the Analysis Services server will eliminate the MOLAP cache if it hasn’t been processed within the past four hours (or the period that you specify if you change the default value).
When the MOLAP cache is dropped by the server, any queries will be answered from the relational data source (ROLAP cache) until a new MOLAP cache is created. Notice that the Bring Online Immediately option is enabled for medium-latency MOLAP. This option tells the Analysis server that as soon as it drops the outdated cache, queries should be resolved from the relational data source until the new MOLAP cache is available. This setting is useful when processing has started but has not completed within a desired period of time, providing users with relatively current data in response to queries. If you disable this option and you have the Drop Outdated Cache option enabled, queries cannot be answered until the new MOLAP cache is created. If you prefer to continue answering queries with the old cache while a new cache is being built, you must disable the Drop Outdated Cache option.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Low-latency MOLAP. Click Options. The only difference between medium-latency MOLAP and low-latency MOLAP is that the Latency property changes from 4 hours to 30 minutes.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Real-time HOLAP. Click Options. The storage mode has now been changed to HOLAP. In HOLAP storage, only summarized aggregations are stored in MOLAP storage. All queries that require detail data will be sent directly to the relational database.
Because the Silence Interval value is 0, the MOLAP aggregations will be rebuilt immediately when Analysis Services is notified of a change to the data. The Drop Outdated Cache option causes the Analysis Services memory cache to be cleared as soon as there is a change to the data source fact table.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Real-time ROLAP. Click Options. The storage mode has now changed to ROLAP. The Update The Cache When Data Changes option has been cleared, because no MOLAP cache will be created. Similar to HOLAP storage, the Drop Outdated Cache option causes the Analysis Services memory cache to be cleared as soon as there is a change to the data source fact table.
Click Cancel. In the Partition Storage Settings dialog box, drag the slider to Automatic MOLAP. Click OK. You have configured the partition to use Automatic MOLAP proactive caching to manage processing. The partition will automatically be processed starting 10 seconds after the last update to the source database. If 10 minutes pass since the first update to the source database and there has not been a 10-second silence interval, processing will begin.
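For reference, the Automatic MOLAP preset you just selected corresponds roughly to the following ProactiveCaching element in the partition’s definition. This is a sketch: the element names come from the Analysis Services DDL, durations use XSD duration syntax (PT10S is 10 seconds, PT10M is 10 minutes, and a negative value such as -PT1S disables a setting), and the exact defaults the designer writes may differ slightly. You can confirm them by scripting the cube from SSMS.

```xml
<!-- Approximate Automatic MOLAP settings for a partition.
     Process 10 seconds after updates stop (SilenceInterval), or after
     10 minutes of continuous updates (SilenceOverrideInterval); never
     drop the outdated cache (Latency) and never rebuild on a schedule
     (ForceRebuildInterval). -->
<ProactiveCaching>
  <SilenceInterval>PT10S</SilenceInterval>
  <SilenceOverrideInterval>PT10M</SilenceOverrideInterval>
  <Latency>-PT1S</Latency>
  <ForceRebuildInterval>-PT1S</ForceRebuildInterval>
  <Source xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:type="ProactiveCachingInheritedBinding" />
</ProactiveCaching>
```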
Close the Cube Designer and save the changes to the MOLAP cube.
You can also configure proactive caching for dimensions. In the next procedure, you will configure Automatic MOLAP proactive caching for the Geography MOLAP dimension.
Configure dimension proactive caching
In Solution Explorer, right-click Geography MOLAP.dim and select View Designer.
In the Attributes pane of the Dimension Designer, right-click the Geography MOLAP dimension and select Properties.
In the Properties window, select the ProactiveCaching property and click the ellipsis button that appears on the right.
In the Dimension Storage Settings dialog box, drag the slider to Automatic MOLAP. Click Options. The same options are available for dimension and partition proactive caching.
Click OK. In the Dimension Storage Settings dialog box, click OK.
Close the Dimension Designer and save the changes to the Geography MOLAP dimension. In the next procedure, you will have an opportunity to see proactive caching in action, so you need to deploy your changes to the Analysis Services server.
On the Build menu, select Deploy AdventureWorks SSAS.
In the next procedure, you will create a report using Internet sales data. You will then execute a SQL script that will insert Japan into the DimSalesTerritory table and add a record for a sale in Japan to the FactInternetSales table. You will then refresh the report and see the order for Japan, even though you haven’t processed the cube.
Insert source data and browse the cubes
Switch to SSMS. In the MOLAP [Browse] window, click Reconnect and then remove Reseller Order Quantity from the Report pane.
In the Metadata pane, expand the Internet Sales measure group and drag Internet Order Quantity to the Report pane totals area. Notice that Japan is not one of the countries listed in the report.
On the SSMS File menu, point to Open and select File. In the Open File dialog box, browse to the C:\Microsoft Press\Analysis Services 2008 SBS\Chapter 14\SQL folder, select Update Warehouse 2.sql, and click Open.
On the Query menu, point to Connection and select Connect. In the Connect To Database Engine dialog box, change the Server Name to localhost and click Connect.
On the SSMS toolbar, click Execute. After the query executes successfully, close the query window. Proactive caching has been configured to automatically process the dimension and the partition after a 10-second silence interval, so wait at least 10 seconds before refreshing the report.
Back in the MOLAP [Browse] window, click Reconnect on the Browser toolbar. The report shows that the Internet order from Japan has been added to the MOLAP cube. The number of items in the newly added order, 1, is displayed at the intersection of Japan and CY 2011. Aggregate values have also been updated. The total for CY 2011 has increased by 1 to 30,057, and the total for all years and all countries has increased by 1 to 58,190.