Realizing the Potential of the Data Driven Enterprise
Webinar Replay and Additional Resources
In 2010 we all know that if we have the data, we can manage our businesses better. But if that data doesn’t reflect the business’s needs, is riddled with quality problems, or doesn’t arrive in time to inform key decisions, then the potential of a truly data-driven enterprise cannot be realized. Fortunately, we have a formidable array of techniques and best practices that can make our data assets relevant, trustworthy, and timely.
In this webinar, Dr. Kimball describes in detail how to:
- identify the right business requirements,
- build and manage a data quality infrastructure, and
- source and deliver timely data.
In addition, Informatica and MicroStrategy discuss approaches and considerations for building data-driven solutions while leveraging your existing EDW and BI infrastructure.
You may also download the presentation slides from Ralph Kimball, Informatica, and MicroStrategy.
Below is the list of additional resources mentioned during the webinar.
Other Information Resources
The following is a list of questions & comments submitted during the webinar and the responses provided by the speakers.
Please define the relationships between, and relative importance of, data warehousing, data mining, ontologies, and the semantic web.
Ralph Kimball: Data warehousing (in my opinion) is the capture of an organization’s data and its delivery for presentation and analysis. While data warehousing has traditionally focused on structured text-and-number data, it is clear that even unstructured data comes under this umbrella if it is subject to analysis. Data mining, again in my opinion, is a client of the data warehouse.
On system design for effective business decisions based on business needs: should it be top-down domain modeling vs. bottom-up data mining and analytics, or both together for business decisions?
Ralph Kimball: An effective EDW must be a combination of top down and bottom up. Top down is required to decide what the overall business requirements are and therefore what data should be captured, cleansed, integrated, and delivered for analysis. And that includes even the most granular atomic data. Bottom up then trolls this data with a variety of approaches including traditional reports, alerts and dashboards, as well as undirected data mining routines that look for interesting new patterns. But I want to emphasize that there is no such thing as pure bottom-up since the substantial commitment required to collect and expose any data implies a significant top-down decision.
Your virtual view is what is commonly called a "conceptual schema" based on domain modeling.
Ralph Kimball: I don’t use the words “virtual view” or “conceptual schema,” so I am not sure I align with your assumptions here. Our modeling approach is based on real data that exists in physical, documented sources. We never base our modeling on idealized entity-relationship models, since these are not populated with real data.
For Informatica Data Services Version 9, how is the license structured, and is mainframe connectivity included in the license or are there additional costs?
Julianna DeLua / Informatica: This is a separate license. To learn more, click here. For mainframe connectivity, click here to look at PowerExchange for Mainframe.
Is this Data Services part of the base platform v9 or does it require buying an add-on component?
Julianna DeLua / Informatica: It is a new product called Informatica Data Services. To learn more, click here.
Alert, trigger report -- it seems the trigger is a basic part of a DBMS, so what is different about a trigger report?
Ralph Kimball: When building the data pipeline for a particular application, the latency (data freshness) requirement plays a big role. I showed that on the last slide of the webinar (the “latency triage”). So if the requirement is to provide business alerts with a guarantee of timing, then the database itself is not the hardest part. It is the extraction, cleaning, integrating, and delivery of the data TO the database that takes more thought and development. Then the last step of actually sending out the alerts is indeed interesting, since it can range from a query interface that periodically probes the database with a conventional query, all the way to a daemon that runs inside the database and pushes results out to the final user interfaces.
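The query-interface end of that spectrum can be sketched very simply: a process that periodically probes the warehouse with a conventional query and turns qualifying rows into alerts. This is a minimal illustration only; the `fact_orders` table, its columns, and the threshold are all hypothetical names chosen for the example.

```python
import sqlite3

def poll_for_alerts(conn, threshold):
    """Query-side alerting: probe the warehouse with a conventional
    query and emit an alert for each row exceeding the threshold.
    (Hypothetical table and columns, for illustration only.)"""
    rows = conn.execute(
        "SELECT order_id, amount FROM fact_orders WHERE amount > ?",
        (threshold,),
    ).fetchall()
    return [f"ALERT: order {oid} amount {amt}" for oid, amt in rows]

# Stand-in warehouse: an in-memory SQLite database with a tiny fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO fact_orders VALUES (?, ?)",
                 [(1, 50.0), (2, 900.0), (3, 1200.0)])

alerts = poll_for_alerts(conn, threshold=1000.0)
print(alerts)  # only the order over the threshold triggers an alert
```

In a real deployment this query would run on a schedule (or the logic would live in a push-based daemon), but the hard part, as noted above, is getting clean, integrated data into the table the query probes.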
Is there any example star schema for university or educational sector?
Ralph Kimball: There are lots of dimensional models used in education, but a university, for example, has dozens of separate business processes (data sources), ranging from admissions to student tracking to facilities to budgets to employee records to research grants and many more. Each of these business processes would have its own appropriate fact and dimension tables, and they would be loosely coupled through conformed dimensions. Please see the Kimball Group books, especially The Data Warehouse Toolkit (Wiley, 2002), for design techniques and some of these education use cases.
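To make the fact/dimension split concrete, here is a minimal sketch of one such university business process (course enrollments) as a star schema. The table and column names are invented for illustration; a real design would follow the requirements-gathering process described in the webinar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One business process (enrollments) as a fact table surrounded by
# dimension tables; other processes (admissions, grants, ...) would
# share conformed dimensions such as dim_student.
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT, major TEXT);
CREATE TABLE dim_course  (course_key  INTEGER PRIMARY KEY, title TEXT, dept TEXT);
CREATE TABLE dim_term    (term_key    INTEGER PRIMARY KEY, term  TEXT);
CREATE TABLE fact_enrollment (
    student_key INTEGER REFERENCES dim_student(student_key),
    course_key  INTEGER REFERENCES dim_course(course_key),
    term_key    INTEGER REFERENCES dim_term(term_key),
    credits     INTEGER);
""")
conn.executemany("INSERT INTO dim_student VALUES (?, ?, ?)",
                 [(1, "Ann", "Biology"), (2, "Bob", "Physics")])
conn.executemany("INSERT INTO dim_course VALUES (?, ?, ?)",
                 [(1, "Genetics", "Biology"), (2, "Mechanics", "Physics"),
                  (3, "Ecology", "Biology")])
conn.execute("INSERT INTO dim_term VALUES (1, 'Fall 2010')")
conn.executemany("INSERT INTO fact_enrollment VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 4), (1, 3, 1, 3), (2, 2, 1, 4)])

# A typical star join: facts constrained and grouped by dimension attributes.
rows = conn.execute("""
    SELECT d.dept, SUM(f.credits)
    FROM fact_enrollment f
    JOIN dim_course d ON f.course_key = d.course_key
    GROUP BY d.dept
    ORDER BY d.dept
""").fetchall()
print(rows)
```

The query pattern (a central fact table joined to a handful of small dimension tables) is what gives the star schema its query performance, as the next answer notes.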
The Virtual data model is good but how do you get performance?
Ralph Kimball: As stated previously, I don't use the term "virtual" data model, but if you are referring to a dimensional model, or star schema, more than 20 years of industry experience shows these are the fastest data schemas to query, especially compared to complex normalized data models involving dozens of joined tables. Of course, the star schema is the foundation for most OLAP deployments as well.
How is security handled in the virtual database provisioned by the data federation product?
Wei Zheng / Informatica: Security in the virtual database provided by Informatica Data Services is handled very much like real database security. Fine-grained controls over database, table, and even column access can be granted individually to users, groups, or specific roles. User management is integrated with Informatica’s UUM (unified user management) interface, which has external connectivity to and integration with LDAP and Active Directory systems. Authentication is managed through a single access layer.
Is there such a thing as real-time data warehousing?
Ralph Kimball: The term “real time” must be divided into specific latency requirements. That is what my slide on the latency triage discussed. If you are referring to “zero latency” data delivery, then yes, you can do that. Such an extreme form of data delivery is often grouped under the term Enterprise Information Integration (EII), and true EII places some restrictions on how much cleaning and integrating you can do before the data appears on the user’s screens. But EII is eminently possible if you are willing to live with the compromises. To review the latency triage slide from the webinar, click here!
What are you doing to reach the business leadership directly as well as the IT and EA staff in organizations?
Ralph Kimball: I alluded in the webinar to using the “agile” approach to data warehouse development, which has business sponsorship as its cornerstone. If you pursue this approach, as opposed to the waterfall approach, then you start with the business community. Much has been written on the agile approach. My only warning is not to become too much of a “methodologist” but to harvest the basic lessons that come from the agile community.
Can the source be multidimensional?
Ralph Kimball: Yes, of course. I assume that you are thinking of an actual OLAP technology being the source point. Having such a source participate in your enterprise data warehouse is eminently feasible. I would look closely at whether you can conform the dimensions of this source with the rest of your EDW to make it participate effectively.
Can you define the schema for the service interface so you don't see any database-specific objects or anything tied to Informatica?
Wei Zheng / Informatica: Yes. The Informatica Logical Data Object functionality, a core component of the Data Services architecture, allows you to create a data abstraction layer so that source-specific metadata are not exposed directly to the application or end consumers. Once the data objects are published as externally consumed services, no Informatica-specific metadata will be seen by the consumers; the services are presented just like another database or web service.
How can we integrate code from multiple teams towards the end of development?
Ralph Kimball: On one of my last slides I recommended a series of architectural sprints at the beginning of an agile EDW effort. The most important such sprint is the definition of the conformed dimensions. An EDW can be profoundly distributed, with far flung locations and incompatible technologies. The beauty of conformed dimensions is that at query/analysis time you can “drill across” these distributed sources and assemble an integrated answer set if your dimensions are conformed. So in many cases, integrating “code” should not be necessary. Even many of the back room data hand-offs can be done through simple services defined in an SOA framework that by definition are above specific code implementations. I am thinking especially of master data management services like “fetch new dimension members”, etc. There is a wealth of information and guidance on Master Data Management from Informatica.
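The drill-across idea above can be sketched in a few lines: each business process answers its own query independently, and the answer sets are then merged on the shared, conformed dimension value. The department names and measures below are hypothetical; the point is only the merge step.

```python
def drill_across(*answer_sets):
    """Merge per-source answer sets on the conformed dimension value.
    Each answer set maps a conformed dimension value (e.g. department)
    to that source's measure; outer-join semantics, so a row appears
    if any source reported it (missing sources yield None)."""
    keys = sorted(set().union(*answer_sets))  # union of dimension values
    return {k: tuple(s.get(k) for s in answer_sets) for k in keys}

# Hypothetical sources: grant dollars (millions) and faculty headcount
# by department, each produced independently by its own fact table.
grants    = {"Biology": 2.5, "Physics": 4.1}
headcount = {"Biology": 40, "History": 12}

merged = drill_across(grants, headcount)
print(merged)
```

Because both sources label their rows with the same conformed dimension values, the integrated answer set can be assembled at query time without ever merging the sources' code or storage.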
Note: To view PDF files, download the latest version of Acrobat.