Should you have SQL Server Enterprise Edition for Data Warehouse?
Posted on Thu, Jul 14, 2011
I have recently been teaching a lot of Analysis Services classes at QuickStart, and as we go through the class, features and functionality come up that are only available in SQL Server Enterprise Edition. Some of these are nice to have, but others seem more basic or necessary, depending on your circumstances. In these times of economic hardship and tight budgets, many organizations make the decision to go the SQL Server 2008, but tell their DBA/developers: “We’re getting Standard Edition”. Some students are moving from SQL Server 2000 to 2008, while others are getting started on a data warehouse, and the organization is getting ready to start the project. 
Here I’m going to talk only about the first of two SQL Server 2008 Enterprise Edition features: data partitioning. First you need to understand that a data warehouse database is a denormalized database. There is a fact table, which has the measures your business is interested in. A fact table has measures: sales, weight, units shipped, number of visits, length of call, or number of times an ad has run. Measures or facts are given meaning by dimensions. Since we’re looking at partitioning, we’re only interested in the fact table.
Just as with a relational database, a fact table can be partitioned. This is usually done by date, though it can be done in some other way – regions, product lines, etc. Let’s assume we’re doing it by date. Remember that the reason we have a Data Warehouse with a cube, is that the cube has pre-aggregated data, and it has hierarchies of data. Hierarchies can be for products: Product Category (Bikes), Product Sub Category (Racing Bikes), Product (Racing Bike model LA, 42”). Another common hierarchy is time: Years, Quarters, Months, Days or the level of detail that your source data allows you to have.
In order to get aggregations at all those levels, the cube has to be processed. Depending on how many transactions are loaded in to your fact table, it can have millions of rows. And of course, it keeps growing as you add the most recent transactions. Reprocessing the whole fact table can therefore start taking more and more time. But what if you can partition your data.
Let’s say that you have data for this year – 2011. It’s now November. You are only making accounting adjustments to data in October. So all the data for the prior months is stable, and the aggregations should be stable. Similarly for the prior years – those aggregations could be recalculated each time you process the cube, but since the measures or facts are not changing the aggregations don’t change.
With partitioning of your fact table, you can choose to only reprocess the most recent partitions: October and November. In fact, using the new merge functionality, you could have merged the January, February and March partitions into a Q1 partition, the April, May and June into a Q2, and the July, August and September into Q3, while the prior year’s data could be a single partition, and earlier years another single multi-year partition. Now your processing time is greatly reduced.
By having these partitions SQL Server 2008 Enterprise Edition enables you to take advantage of another feature: partition specific aggregation design. In many businesses, more recent data is of the most interest: this year compared to last year, this quarter to the same quarter a year ago, this quarter to last quarter, etc. Given the high interest in recent data, you can design an aggregation scheme with a higher level of aggregation for recent data, and a lower or even zero level of aggregation for old data. Combining this with usage based optimization of aggregations will give you the right aggregations for the most frequently accessed data, and little or no aggregation for less accessed data.
In addition to being able to choose an aggregation design for a partition, SQL Server 2008 Enterprise Edition enables you to choose a storage mode. Your main choices are storing data and aggregations in the cube – multidimensional OLAP (MOLAP), or storing the data (and possibly the aggregations) in the relational database – relational OLAP (ROLAP). For many companies, data older than a few years is accessed using the cube only rarely. Let’s assume access is frequent enough we want to keep the data accessible in the cube, that we’re not going to prune it out of the cube. In that case, we can minimize storage and processing both by using ROLAP storage and a zero level of aggregation for the archived partitions. (We use a zero level since the aggregated numbers are rarely used. ) For current year and prior year data we use MOLAP storage and an appropriate level of aggregation to optimize query responsiveness when users are analyzing the data.
Without fact table partitioning, you will have to reprocess the aggregations for the whole fact table, and have the same aggregation design for the whole fact table. SQL Server 2008 Enterprise Edition allows for fact table partitioning which we can tune our cube so that data that needs to be pre-aggregated for frequent analysis is processed with the appropriate aggregations, while older data, or rarely used data, can be in separate partitions that is stored efficiently in ROLAP storage.
So, one big reason for using SQL Server Enterprise Edition is the functionality of partitioning, which allows you to optimize the design of your data storage, and optimize the design of your data aggregations. In a following article, I will address another feature of SQL Server Enterprise Edition that many organizations find useful: semi-additive measures.
To see the scenario comparison: http://www.microsoft.com/sqlserver/2008/en/us/editions.aspx
To see a complete listing of the features in SQL Server 2008 Enterprise Edition compared to Standard Edition: http://www.microsoft.com/sqlserver/2008/en/us/editions-compare.aspx. Note that there are two categories to look at – Data Warehousing and Analysis Services.
To learn more about SQL Server 2008 Enterprise Edition and our SQL Server training classes, check out our complete course catalog. Or if you have any questions or comments, please reply below.
Thank you!
Saskia Schott, Systems Engineer, QuickStart Intelligence