Introduction:
The main objective of this ETL process is to extract data from three types of sources, ingest the raw data into Azure Synapse, and transform it to load the fact and dimension tables.
The ingest pipeline design describes how raw data is moved from the source systems to the sink (Synapse) and shows how Azure Data Factory activities are used during the data ingestion phase.
Raw data ingestion design
The diagram below shows a high-level design for copying data from the sources (ARGUS on SQL Server, SAP ECC, and flat files) to the target data warehouse (sink), Azure Synapse Analytics, in the cloud.
The process uses a configuration-driven framework to copy data from source to target: a CSV file stored in ADLS Gen2 holds the source and destination schema, table, and path information. The pipeline reads these configuration files at runtime and passes the values to its activities dynamically.
Step 1:
The pipeline reads the config file to get the database, table, and path details (an illustrative config row is sketched below)
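A minimal sketch of what one config row might look like (the column names and values here are illustrative assumptions, not the framework's actual schema):

    source_system,source_schema,source_table,sink_schema,sink_table,adls_path
    ARGUS,dbo,CASE_MASTER,stg,CASE_MASTER,raw/argus/case_master/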
Step 2:
Using ADF linked service and dataset objects, copy the data from the source to the sink
Step 3:
All raw data ingestion loads are configured to perform a “truncate and load”, i.e., a full reload on every run (see the sketch below)
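On the Synapse side, a truncate-and-load run amounts to clearing the staging table before each copy, along these lines (the table name is a hypothetical example):

    -- Full reload: clear the staging table before each copy run
    TRUNCATE TABLE stg.CASE_MASTER;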
Azure Synapse destination:
The pipeline auto-creates the target tables based on the source column names and data types
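Conceptually, the auto-created table mirrors the source metadata, something like the following (hypothetical columns; the actual DDL is generated by ADF from the source schema):

    -- Illustration of a table ADF might auto-create from source metadata
    CREATE TABLE stg.CASE_MASTER
    (
        CASE_ID      INT,
        CASE_NUMBER  VARCHAR(50),
        CREATED_DATE DATETIME2
    );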
Data transformation design
Data transformation describes how the raw data is transformed and restructured into fact and dimension tables according to the data model designed using a star schema.
Data transformation is implemented using two approaches:
- SQL script driven – the pipeline reads the config file to get the database, table, and path details and executes the T-SQL transformation scripts
- Visual transformation (code free) – the ADF Data Flow activity is used to transform and load data into Synapse
Transformation using T-SQL
Both our dimension and fact loads are implemented using the Slowly Changing Dimension (SCD) Type 1 approach in T-SQL, i.e., changed rows are overwritten in place and no history is kept.
Dimension And Fact Load:
Step 1: Create SQL views for each dimension that hold the transformation logic (a sketch follows this list)
- Surrogate key – MD5 hash based on the natural key
- MD5_HASH column – a hash over all dimensional attributes, used to track changes
- LAST_LOAD_TS – housekeeping column
- Dimensional attributes
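A minimal sketch of such a staging view, assuming a product dimension with a varchar natural key (all table and column names are hypothetical):

    CREATE VIEW stg.VW_DIM_PRODUCT
    AS
    SELECT
        -- Surrogate key: MD5 hash of the natural key
        CONVERT(CHAR(32), HASHBYTES('MD5', s.PRODUCT_CODE), 2) AS PRODUCT_SK,
        -- Natural key and dimensional attributes
        s.PRODUCT_CODE,
        s.PRODUCT_NAME,
        s.CATEGORY,
        -- MD5_HASH over all dimensional attributes, used to detect changes
        CONVERT(CHAR(32), HASHBYTES('MD5',
            CONCAT(s.PRODUCT_NAME, '|', s.CATEGORY)), 2) AS MD5_HASH,
        -- Housekeeping column
        GETDATE() AS LAST_LOAD_TS
    FROM stg.PRODUCT AS s;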
Step 2: Create a stored procedure to perform the inserts/updates for loading SCD Type 1 dimensions (sketched below). The procedure takes the source table name, target table name, and primary key column as inputs
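A minimal sketch of such a procedure, using dynamic SQL and the MD5_HASH column to detect changes. The object names are hypothetical, and the delete-then-insert pattern shown here is just one way to realize SCD Type 1; the actual procedure may update rows in place instead:

    CREATE PROCEDURE dbo.USP_LOAD_SCD1_DIM
        @source_table NVARCHAR(256),  -- staging view, e.g. 'stg.VW_DIM_PRODUCT'
        @target_table NVARCHAR(256),  -- dimension table, e.g. 'dw.DIM_PRODUCT'
        @pk_column    NVARCHAR(128)   -- natural key column, e.g. 'PRODUCT_CODE'
    AS
    BEGIN
        DECLARE @sql NVARCHAR(MAX);

        -- Remove target rows whose attribute hash differs from the source
        -- (SCD Type 1: changed rows are overwritten, no history is kept)
        SET @sql = N'DELETE FROM ' + @target_table + N' WHERE EXISTS ('
                 + N'SELECT 1 FROM ' + @source_table + N' s '
                 + N'WHERE s.' + @pk_column + N' = ' + @target_table + N'.' + @pk_column
                 + N' AND s.MD5_HASH <> ' + @target_table + N'.MD5_HASH)';
        EXEC (@sql);

        -- Insert rows that are new or were just removed as changed
        SET @sql = N'INSERT INTO ' + @target_table
                 + N' SELECT s.* FROM ' + @source_table + N' s '
                 + N'WHERE NOT EXISTS (SELECT 1 FROM ' + @target_table + N' t '
                 + N'WHERE t.' + @pk_column + N' = s.' + @pk_column + N')';
        EXEC (@sql);
    END;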
Step 3: Create and load the dimension tables from the staging views using the stored procedure, as in the usage example below
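Loading a dimension then reduces to a single call per table (names match the hypothetical sketches above):

    -- Refresh the product dimension from its staging view
    EXEC dbo.USP_LOAD_SCD1_DIM
        @source_table = 'stg.VW_DIM_PRODUCT',
        @target_table = 'dw.DIM_PRODUCT',
        @pk_column    = 'PRODUCT_CODE';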