ETL (Extracting, Transforming and Loading):
An ETL tool is software used mainly for extracting, transforming, and loading data.
In today’s data-driven world, organizations, machines, and gadgets of every size generate huge amounts of data. For example, each time you browse the web on your mobile phone, some amount of data is generated.
Let’s go through each step of the three-step process (Extract, Transform, Load):
1. Extract: Extraction is the most important step of ETL. It involves accessing the data from all the source storage systems, which can include relational databases (RDBMS), Excel files, XML files, flat files, ISAM (Indexed Sequential Access Method) files, hierarchical databases (IMS), visual information, etc. Because so much depends on it, extraction needs to be designed so that it doesn’t affect the source systems negatively. The extraction process also makes sure that every item’s parameters are uniquely identified, irrespective of its source system.
2. Transform: Transformation is the next process in the pipeline. In this step, the entire data set is analyzed and various functions are applied to it to transform it into the required format. Typical transformation operations are conversion, filtering, sorting, standardizing, removing duplicates, translating, and verifying the consistency of the various data sources.
3. Load: Loading is the final stage of the ETL process. In this step, the processed data, i.e. the extracted and transformed data, is loaded into a target data repository, which is usually a database. The load must be performed accurately while using minimal resources. You also have to maintain referential integrity while loading so that the data stays consistent. Once the data is loaded, you can pick any chunk of data and compare it with other chunks easily.
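The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not how any particular ETL tool works: the two sources (an in-memory CSV export and a list of dictionaries), the customer/country tables, and the function names extract, transform, and load are all hypothetical, and SQLite from the standard library stands in for the target repository.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two "systems" exporting customers in different shapes.
CSV_SOURCE = """id,name,country
1, Alice ,us
2,Bob,DE
2,Bob,DE
"""
DICT_SOURCE = [{"customer_id": 3, "full_name": "carol", "country_code": "fr"}]


def extract():
    """Extract: read every source and map its fields onto one common set of keys."""
    rows = []
    for rec in csv.DictReader(io.StringIO(CSV_SOURCE)):
        rows.append({"id": rec["id"], "name": rec["name"], "country": rec["country"]})
    for rec in DICT_SOURCE:
        rows.append({"id": rec["customer_id"], "name": rec["full_name"],
                     "country": rec["country_code"]})
    return rows


def transform(rows):
    """Transform: convert types, standardize text, drop duplicates, sort by id."""
    seen = set()
    clean = []
    for row in rows:
        record = (int(row["id"]),
                  row["name"].strip().title(),
                  row["country"].strip().upper())
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return sorted(clean)


def load(conn, records):
    """Load: write to the target tables, parents first, in a single transaction."""
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the reference
    conn.execute("CREATE TABLE IF NOT EXISTS country (code TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "country TEXT NOT NULL REFERENCES country(code))"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT OR IGNORE INTO country (code) VALUES (?)",
                         [(r[2],) for r in records])
        conn.executemany(
            "INSERT OR REPLACE INTO customer (id, name, country) VALUES (?, ?, ?)",
            records,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    print(conn.execute("SELECT id, name, country FROM customer ORDER BY id").fetchall())
    # prints [(1, 'Alice', 'US'), (2, 'Bob', 'DE'), (3, 'Carol', 'FR')]
```

Loading the country rows before the customer rows is what keeps the foreign-key reference valid, which is one simple way to preserve referential integrity during the load.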