Data Warehouse Infrastructure- A holistic view

By rajgupta

Estimating the infrastructure for a Data Warehouse is a tricky task. This is because one can never take an intelligent guess on the level of usage, number of users, the kind of queries and the kind of usage that Data-Warehouse will be put to.

Just to clarify, Data Warehouse is one among many components of business Intelligencce platform. The other important components are:

  1. ETL tools (which extract and load data into Data Warehouse)
  2. OLAP Server (which picks data from Data Warehouse and load it in analysis friendly multi-dimensional form)
  3. End-user tools (like enterprise reporting, analytics tools, data mining tools…), which sit over OLAP and Data Warehouse to make use of the data stored in them. In other words, they create information out of the data.

Data Warehouse is core, as it provides the sanitized, integrated and consistent data to the end-user tools. These end user tools generally access this data not directly from Data Warehouse, but through OLAP. They go to the data-warehouse, when they need transaction level data. The summary level data is generally available at OLAP level.

Given this back-ground, one has to understand that all query processing in business intelligence does not happen at Data Warehouse level. Out of 1000 users accessing business intelligence platform, only 100 may be accessing data warehouse. Rest of them may be accessing and querying the OLAP Database, or enterprise reports repository.

As you decide upon you Data Warehouse infrastructure needs (including licensing), you have to consider the load of ETL, and the volume of data which you will store. However, for the infrastructure related to the number of users, query load etc, one needs to take into account the entire architecture of OLAP and End-user tools.

More details are available in my recent field tip Data Warehouse Infrastructure Estimate in my portal Business Intelligence and Performance Management.

Leave a Reply