Wednesday, October 10, 2012

ETL Design and Implementation

It is a good practice to have a job control and logging mechanism in place early in the initial design phase of an ETL package. The following questions can help to determine what control and logging functionalities are necessary to be part of the package.

How does the program know where to pick it up incrementally when daily refresh happens? How does it achieve automatic recovery when the ETL is interrupted before it completes? What are the procedures needed in the event of truncated refresh or dev-to-prod pushing? How to suspend the fact table loading if any error occurs in previous dimension tables? How to pinpoint exact error sources among all the steps? How to keep a history of data manipulation languages (DMLs) in your data warehouse/marts? Enable your ETL package to have some sort of intelligence to keep an eye on itself along the progression and to uncover any irregular events that may break the job. Some simple techniques have done the magic for us

No comments:

Post a Comment