Archive for the ‘Uncategorized’ Category

Using Synonyms and Views

October 18, 2008

Using Synonyms and Views to better handle the database level changes

This tip is equally applicable for OLTP scenario.

Synonyms or Alias

Synonym is like an alias for a table name at a physical level. The idea is that the query which an end-user tool fires to access the data, should be ideally using a synonym than the actual table name. A synonym (or alias) is just like the name of the book-mark in your internet browser, while the underlying link is pointing to the actual URL. You can maintain the same name of the book mark, while changing the underlying hyperlink.

Similarly, if you use synonym for providing access to the users and end-user tools, the advantage is that you can rename the underlying tables (as your dimensional model evolves), while still maintaining the same aliases or synonym. You can create synonyms for tables, views, stored procedure, function and some other objects. The idea is to have user-friendly synonyms and also to have a better change management.

Views

A view is a virtual table on which the end-user tools or users can run queries. The views make it easy and user-friendly access to the database. Views are typically a single table made out of multi-table, complex queries. For example, you may have a snow-flake schema, which you may convert into a star-schema by combining the dimensional table and its extension.  OR you may like to create a single virtual table combining the fact table and dimensional table.

Content of this page is referred from using synonyms and views from Business Intelligence and Performance Management Institute.

Don’t be guided only by the business requirements for your Business Intelligence

October 15, 2008

There is much talked about business requirements gathering phase in your BI initiatives. As part of this BR phase, you go to the users and ask for-

  1. Their information requirements- Reporting as well as analytics needs.
  2. Look at the current set of reports, dashboards and scorecards.
  3. Check on any additional potential information needs they might have

This phase will provide you with inputs for modeling and designing your BI platform. The tip here is that this input will not be the single driving factor for you BI plans. The reasons are that information requirements keep on changing. By the time you deliver the data warehouse as per the business requirements, you may realize that half of them have changed. Therefore, one has to model and design your BI environment, so that it can-

  1. Accommodate wider set of information needs, without changing the model
  2. Be Flexible in such a way so that you can add new dimensions and measures without re-writing your data-warehouse.
  3. Respond quickly to the changed information requirements.

This is achieved by following means

Educate your users on the different styles of business requirements

Instead of putting requirements only in form of exact reports and MIS (which should still be done), users can articulate their requirements in form of-

  1. Which dimensions they need to have the ‘cuts’ and reporting?
  2. Which measures they want to be reported?
  3. What are the business hierarchies they want to have their information on?
  4. What is the level of detail (mainly linked to a hierarchy level); they need to have the information on?

Users need to be made aware that once you have a schema, you can churn out million different reports or analytic views. The idea is to define that schema. Point to note is that we are not doing a holistic modeling here.

Use conformed and foundation dimensions

I think we have beaten this subject of foundation dimensions & measures to death (we can’t help it as it is core to your BI success). Even if users are not asking for all attributes of customer (say) dimension, you should have a holistic customer dimension from the day one and populate it as well. This becomes easy, if the entire data is coming from a single customer master.

NOTE- Foundation dimensions and measures are important, but if they are coming in the way of your first set of data-marts, you may skip them so to stay with-in the given investment boundaries. For example, if you have to make an ETL routine to pull a holistic customer data (to populate the foundation dimension) from five different sources, you may skip it, as sponsors will not buy this kind of spend for ‘future flexibility’. However, if you are creating an enterprise data warehouse, don’t skip the foundation dimensions and facts.

Use domain expertise to guess future information requirements.

There are some standard measures and reports which are used in a given industry or function. Even if your users have not asked for it, we would suggest that you should include it in your scope.

The content for this post is referred from the page don’t rely only on business requirements for BI modeling from Business Intelligence and Performance Management Institute.

Big-Bang Enterprise Data Warehouse is a pipe-dream

October 10, 2008

You might have seen product providers’ sales team as they walk into their clients board-room, open-up their presentations carrying the first slide on what we call a ‘dream DW architecture’. A nice assortment of polygons and arrows talk about the every possible data source being integrated into a central data warehouse, which is provider of accurate and reliable information. This is followed by the ‘ideal’ benefits of an enterprise data warehouse with statements like ‘right information at the right time at the right place’, ‘Get all what you need to know on a click of a button’…etc

What intriguing is that why service and product providers make this pitch, when they perhaps know best that building a true enterprise data warehouse in one-go is next to impossible for any medium or large size organization. Surely intent is positive! they might be selling the ‘point of arrival’, but at the same time, they need to be educating their clients that this POA may be many years away.
Let’s talk about what a big-bang Data-Warehouse may involve:

  1. Having all possible business dimensions
  2. Addressing all possible business themes
  3. Picking data from all source systems
  4. Creating an enterprise data warehouse metadata repository…etc…

You must refer Data warehouse has unique challenges, to understand uniqueness of DW initiatives.

  1. Fuzziness of business requirements
  2. Lack of focus from the users, as life can go on without DW.
  3. Unpredictable use of DW environment…etc…

Set a right foundation and go phase wise:

If you feel that what we are talking make sense, one can adopt million different ways on how to phase out a DW initiative. This depends a lot on your Organization Readiness, how much you have invested into DW initiatives already, the level of success (or failures), you have created in building DW etc…What we have seen in our experience is that:

  1. There are million different business intelligence ‘current states’
  2. There are generally similar issues related to BI. If we talk to 50 different companies on their BI issues, there is more than 75% chance that they will come out with the similar set of answers.
  3. The solutions to these similar problems can be very different depending on an organization.

One more aspect one has to be clear about it that BI point of arrival (POA) is generally the same (enterprise integration of data, virtualization of information, high-availability of information…), in terms of BI capabilities. However the actual architecture implementation could be very different. It depends a lot on the journey to POA and also on how large or complex an organization is. Very few architectures will be anywhere close to (even in POA) the dream DW architecture

Coming to DW Journey to Point of Arrival

My professional network keeps on asking me, if there are standard rules for designing your journey to the point of arrival? Here is my ‘very simple’ tentative answer:

  1. Start with stand-alone data marts, and let users taste the blood.
  2. Combine the data marts at a functional level. Fix your functional level agenda first.
  3. Move gradually towards an enterprise level integrated data warehouse. You can refer integrating data-marts

There is a  rider to the above approach. It is must that you create a central governance structure around the Business Intelligence. You may call it Business Intelligence Competency Centre or something else. This central practice can set common standards, disciplines, oversight to ensure that you are not creating the stuff, which you may have a hard time integrating later. Moreover, too many independent data-marts can overload your source systems, if they are not managed well. This central practice can help you a lot in terms of providing inputs on how to have efficient design etc…

The contents of this page are referred from Big-bang Enterprise Data Warehouse is a pipe-dream

Maximizing Data Warehouse ROI and Usage?- Keep Most Granular Data

July 25, 2008

This is one of those common-sensical  approaches to get the maximum out of your Data Warehouse. As you make your investments in Data Warehouse environment, you can enhance your ROI, by using this environment for diverse applications. Having most granular (or detailed) transaction level data is core to broad-basing the Data Warehouse applications.

Traditional use of Data Warehouse environment for the purpose of back-room analytics, is no longer applicable. It can now serve as a single reference source for any of your BI related information needs. Some examples are:

  1. Summary Analytics
  2. Enterprise Reporting (needs transaction level data)
  3. Performance Management
  4. Data Mining (needs transaction level data-mostly)
  5. Business Modeling (sophisticated models need transaction level data)
  6. Operational BI like single customer view for telemarketers, customer support (needs transaction level data)
  7. Ad-hoc queries by operational staff (needs transaction level data)
  8. Root Cause investigation for issues, where you need to drill down to problem areas (needs transaction level data)
  9. Business Applications with Embedded BI modules (may need transaction data)

There are reasons for using Data Warehouse as a single reference information source. This helps you to:

  1. Maintain consistency: If your summary MIS/analysis and enterprise reports are coming from different sources, you will struggle to keep the numbers in synch.
  2. If your production data needs an offline fix (like standardizing customer and product IDs), its better to do that data-fix in one place. If you have separate enterprise reporting and analysis platforms, you will need to do that data transformation at two places, instead of one.
  3. Data Auditability: A single information reference point having detailed data will provide a good audit-trail of your summary transactions/analysis.
  4. ETL synergy: If you have diverse systems, and you want to have some level of information integration, its better to do it at one place. Doing ETL for summary data warehouse and a detailed reporting database, will almost double your efforts.
  5. Overall platform ease: You maintain only one information infrastructure (administration, scheduling, publishing, performance tuning…).
  6. Ease of Change Management: Any change in your information requirements, or changes in your source systems will be managed and done at one place.

Then the question is- If this approach is so good, why many companies use Data Warehouse only for summary data? Keeping granular data in data-warehouse has its own challenges and has its own demands:

  1. Brings forth the real issues with transactional data: In summary data warehouses, you can ignore some of the transaction level data issues and do some patch-work to ensure that aggregated data has a level of acceptable quality. Bringing in granular data, will need more incisive surgery on your data issues. This will extend the time of implementation.
  2. ETL efforts go up: This is related to the first point. Your key plumbing task in DW will become larger and more complex.
  3. Existing robust and stable reporting and querying platforms: Why to fix, which ain’t broken? etc…

My answer to the above reasons will be that while you can be flexible for functional level data marts, you should go for granular data from Day 1 for your enterprise data warehouse. If you are in a hurry, create a quick-fix data mart on the side. For an enterprise level data warehouse, you can start with few high-priority business themes, but ensure that they are designed for long-term usage (i.e. granular data).

You can refer field tips for BI and Data Warehouse in my portal Business Intelligence and Performance Management.

What is Metadata Management?

July 24, 2008

Dear Readers,

Generally my posts are complete by themselves. However, this is a long subject, and that’s why I am not able to write whole of it in my post. Please refer What is Metadata? in my portal Business Intelligence and Performance Management. I have tried to give the definition of Metadata in its practical form, as well as its differences from Data Warehouse, Business Intelligence and Master Data Management.

Please share your feedback and question you may have related to this page.

BI and Data Warehouse Requirements-Your Home Work

July 23, 2008

In this post, I assume that people who are gathering business intelligence requirements are business domain experts and knowledgeable dimensional modelers (logical and not physical techie modelers).

The challenges in gathering business intelligence requirements are:

Business folks are too busy for BI: In a transactional system, business owner has a fairly good idea on what functionality, control or feature they are looking for. This is because it is linked to their day-to-day operations and survival.  Life still can go on without BI for another day.

Business does not know what it wants- there are too many choices: In OLTP, one has mostly linear and limited options, one can choose from. In case of BI, you can have hundred of different information requirements.

Crises Driven: As its difficult for business to pin-down what it wants from information perspective, the information requirements skew towards immediate need.

One way to manage this challenge is to do an intensive home-work, before you meet the business owner. You home work should include going through:

  1. Strategy blue-print including SWOT, competition land-scape etc..
  2. Enterprise Business plan
  3. Functional Business Plan
  4. Minutes of the management meetings
  5. Goal-sheets of the key stakeholders
  6. Current MIS and Reporting Packs

This will give you a good idea on the business priorities and issues. In other words, you will need to carry with you,  the business landscape at functional ( for which you are building a Data Mart) and at an enterprise level. It does not mean that you would be leading the interviewee. This will enable you a better questioning and conscious-keeping for comprehensiveness. The biggest advantage is to be able to related to the interviewee. If interviewee feels that you know his world, he will open up much more.

NOTE- the above home-work may not only be done by the interviwer but also for the interviewee. While the business user is aware of his world and priorities, going through the above-said documents will re-inforce and systematize his thinking.

For more details around the business requirements, you can refer Data Warehouse Business Requirements Gathering in my portal Business Intelligence and Performance Management Institute.

Universal Data Warehouse Dimensions- Is it possible?

July 22, 2008

I have talked a lot about having universal & foundation dimensions in the data warehouse in my portal Business Intelligence and Performance Management Institute. In brief, Data Warehouse is typically a dimensional model, which is different from relational/OLTP model. In dimensional model, you have dimensions and their attributes in their own tables linking through a central ‘Fact Table’, which carries all the numbers (or measures).

Now, the idea is that one should have one standard dimension (and attributes) for a business entity. Lets say that there are 20 star-schemas (or cubes in OLAP lingo) in a data warehouse, and ten out of them are using customer dimension. The concept of foundation dimension is that all these ten cubes should be using an identical customer dimension. This helps in doing analytics across multiple cubes and significantly reduces the change and development effort.

One reality check one has to take is that for large organization, it is extremely difficult to have a single customer dimension. This is because they may be operating very diverse markets or products. Getting all the business heads to agree to a single dimension may take years. Many of my clients have asked on if these ’standard’ dimensions are practically possible.

My answer to them is that yes its possible, if we do apply the following techniques:

Create Super-Sets: Take a convenient path of creating ’super-set’ dimension, which can absorb diverse entitites. For example creating a super set customer dimension, which can absorb retail, corporate and group-customers.

Create different foundation dimensions:  You do not need to create a single customer dimension for the whole organization. You can create ‘retail customer’ dimension, ‘corporate customer dimension’ etc… Both these dimensions will be different, but they will be universal within themselves. The assumption is that you will not be required to do cross-navigation across retail and corporate customers.

Start with high priority Dimensions: Instead of trying to make universal dimensions for all entities, identify top 5-6 which are most critical. Pareto principle equally applies here. The critical dimensions can be customer, vendor, product, location, sales channel etc…

Data Warehouse Infrastructure- A holistic view

July 16, 2008

Estimating the infrastructure for a Data Warehouse is a tricky task. This is because one can never take an intelligent guess on the level of usage, number of users, the kind of queries and the kind of usage that Data-Warehouse will be put to.

Just to clarify, Data Warehouse is one among many components of business Intelligencce platform. The other important components are:

  1. ETL tools (which extract and load data into Data Warehouse)
  2. OLAP Server (which picks data from Data Warehouse and load it in analysis friendly multi-dimensional form)
  3. End-user tools (like enterprise reporting, analytics tools, data mining tools…), which sit over OLAP and Data Warehouse to make use of the data stored in them. In other words, they create information out of the data.

Data Warehouse is core, as it provides the sanitized, integrated and consistent data to the end-user tools. These end user tools generally access this data not directly from Data Warehouse, but through OLAP. They go to the data-warehouse, when they need transaction level data. The summary level data is generally available at OLAP level.

Given this back-ground, one has to understand that all query processing in business intelligence does not happen at Data Warehouse level. Out of 1000 users accessing business intelligence platform, only 100 may be accessing data warehouse. Rest of them may be accessing and querying the OLAP Database, or enterprise reports repository.

As you decide upon you Data Warehouse infrastructure needs (including licensing), you have to consider the load of ETL, and the volume of data which you will store. However, for the infrastructure related to the number of users, query load etc, one needs to take into account the entire architecture of OLAP and End-user tools.

More details are available in my recent field tip Data Warehouse Infrastructure Estimate in my portal Business Intelligence and Performance Management.

Don’t wait for Data Warehouse- Start with a Data Mart

July 15, 2008

An organization, how-much-so ever big or small it is, should not start with a big-bang enterprise data warehouse. It should start with few high importance Data-Marts. The reasons are as follows:

An Enterprise Data Warehouse is a long term commitment: This means that there are many imperatives (or foundations), which are key for a Data Warehouse. The examples of these imperatives are foundation or conformed dimensions, fine-grained granular data, comprehensive star-schemas etc…An organization will need high level of readiness and investments to build these foundations. These foundations (though great for data marts as well) can be compromised for initial set of Data-marts.

Business Learning- Initial set of data-marts will provide great learning, less on the IT side and more on the business side. Here are the set of learnings from business side:

  • • Creating business themes
  • • Building Data-Mart Business Requirements
  • • Building Dimensional Model
  • • Testing of Data-Mart
  • • Taking business decisions around the extraction and transformation
  • • Generating the information out of the Data-Mart through end-user tools (like reporting and analytics application)

Examples of IT Learnings:

  • • Extraction, Transformation and Loading design
  • • Processing Load Management
  • • Handling Data Explosion (data goes up exponentially as you add sparse fields- where most of the records are blank)
  • • Change Management (end-to-end impact analysis if you make a change in the Data Mart Model)

Show-case for sponsors: A successful Data-Mart may make the sponsorship of a Data Warehouse much easier.

Quick-hit: A Data-mart is a quick hit and gives earlier gratification to business.

Non-Disruptive: It does not take away the attention of an organization from other big things.

You can refer my portal Business Intelligence and Performance Management for more details.

IBMs Data Governance Prediction- My Anxiety

July 12, 2008

I came across a news item talking about the IBMs top 5 predictions for Data Governance (http://www.ebizq.net/news/9883.html?rss). These predictions are coming out of IBM Data Governance Council, comprising 50 top-notch companies from wide range of sectors.

 

 

While the predictions were interesting and point to a brighter future for data governance, this one raised some level of dissonance. As a general disclaimer, I beg pardon if I have mis-interpreted the text of this prediction. These are only initial thoughts, and I do plan to dig out more. Please cascade this blog post to your network as it can generate some healthy discussion.

 

The text of that prediction is-

 

‘The role of the Chief Information Officer (CIO) will change making this corporate officer responsible for reporting on data quality and risk to the Board of Directors. The CIO will have the mandate to govern the use of information and report on the quality of the information provided to shareholders.’

 

My Dissonance- I don’t agree that CIO role should be taking the ownership of reporting on information quality as well as governing the use of information. It may lead to role conflict and de-focus the CIO role. These are the questions, I will seek clarifications on to address my dissonance: 

  1. CIO’s core role is to be a strategic internal service partner to the business and operations to ‘make it happen’. Should we be mixing the role of a service provider to a quasi-governance and oversight role.
  2. Will we not create a confusion between the roles of Audit, CFO, Internal Control and Data Steward (if you have one)? I feel that there can be enough roles to oversee and audit. The issue today is more towards owning and delivering on the tasks related to data governance.
  3. A CIO cannot have the mandate to report on the quality of the information provided to shareholder. This may set him/her up for failure. This has to be a business role, which encompasses not only IT, but the business processes, manual controls, compliance and regulatory checks outside of systems etc…Aren’t we conflicting this role with that of a CFO and CEO? CIO may be responsible for certifying that the data lying in the systems is consistent. However, how can the CIO take the ownership to govern the manual adjustment figures entered by finance at the period-end processing??
  4. If something is wrong with the information provided to shareholders, where the buck will stop? Will it be the CIO or CFO?
  5. CIO cannot have the mandate to govern the use of information. The use of information is defined by the user access matrix and distribution lists of various reports and outputs made by the system. This access matrix has to be defined by the Business and internal control. How can a CIO decide on which groups and which functions can use that information?. Secondly, not all information is in the systems, and lot of it is manual.
  6. From my point of view, the data governance and quality needs more business ownership at all the levels, as Data Governance goes much beyond the system boundaries. Much of the data issues are either due to faulty data entered in the systems or a lack of robust business specifications for IT systems. Isn’t the prediction seem to be recommeding the move in the other direction?

Looking forward to your comments. This post should be taken as invoking comments from readers, and inviting discussion. I will add more to this post. You may also refer my portal Business Intelligence and Performance Management Institute. Our main theme has been that data quality and data governance is much more a business issue than an IT issue.