Building Metadata-Driven Pipelines in Azure Data Factory

1. Introduction to Metadata-Driven Pipelines

Metadata-driven pipelines in Azure Data Factory (ADF) provide a dynamic and scalable approach to orchestrating data workflows. Instead of hardcoding pipeline configurations, metadata (stored in a database or JSON file) defines:

  • Source & destination locations
  • File formats & schemas
  • Transformation logic
  • Processing rules

This approach enhances reusability, reduces maintenance efforts, and allows for seamless pipeline modifications without redeploying code.

2. Storing and Managing Metadata

Metadata can be stored in:

  • Azure SQL Database: Structured metadata for multiple pipelines
  • Azure Blob Storage (JSON/CSV files): Unstructured metadata for flexible processing
  • Azure Table Storage: NoSQL metadata storage for key-value pairs

For this blog, we’ll cover two practical examples:

  1. Using a JSON file stored in Azure Blob Storage
  2. Using a metadata table in Azure SQL Database

3. Example 1: JSON-Based Metadata in Azure Blob Storage

Step 1: Define Metadata JSON File

Create a JSON file (metadata.json) in Azure Blob Storage to define source and destination details:

json
{
  "pipelines": [
    {
      "pipeline_name": "CopyDataPipeline",
      "source": {
        "type": "AzureBlobStorage",
        "path": "source-container/raw-data/"
      },
      "destination": {
        "type": "AzureSQLDatabase",
        "table": "ProcessedData"
      },
      "file_format": "csv"
    }
  ]
}

Step 2: Create a Lookup Activity in ADF

  • Add a Lookup Activity in ADF to read the JSON metadata from Azure Blob Storage.
  • Configure the Dataset to point to the JSON file.
  • Enable the First row only option; since the file contains a single JSON object, the whole metadata object is returned under the activity's firstRow output property.
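
As a rough sketch, the Lookup activity definition in the pipeline JSON might look like the following (the dataset name MetadataJsonDataset is an assumption; use whatever name your dataset has):

json
{
  "name": "Lookup",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "JsonSource"
    },
    "dataset": {
      "referenceName": "MetadataJsonDataset",
      "type": "DatasetReference"
    },
    "firstRowOnly": true
  }
}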

Step 3: Use Metadata in a ForEach Activity

  • Add a ForEach Activity and set its Items to @activity('Lookup').output.firstRow.pipelines so it iterates over the metadata records.
  • Inside the loop, use a Copy Activity to dynamically move data based on the current metadata item.
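
A minimal sketch of the ForEach definition, assuming the Lookup activity above is named Lookup:

json
{
  "name": "ForEachPipelineConfig",
  "type": "ForEach",
  "dependsOn": [
    {
      "activity": "Lookup",
      "dependencyConditions": [ "Succeeded" ]
    }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('Lookup').output.firstRow.pipelines",
      "type": "Expression"
    },
    "activities": []
  }
}

The Copy Activity from the next step goes inside the activities array.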

Step 4: Configure Dynamic Parameters

In the Copy Activity, set dynamic parameters from the current ForEach item:

  • Source Dataset path: @item().source.path
  • Destination Table: @item().destination.table

Now, the pipeline dynamically reads metadata and copies data accordingly.

4. Example 2: SQL-Based Metadata for Pipeline Execution

Step 1: Create Metadata Table in Azure SQL Database

Execute the following SQL script to create a metadata table:

sql
CREATE TABLE MetadataPipelineConfig (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    PipelineName NVARCHAR(100),
    SourceType NVARCHAR(50),
    SourcePath NVARCHAR(255),
    DestinationType NVARCHAR(50),
    DestinationTable NVARCHAR(100),
    FileFormat NVARCHAR(50)
);

INSERT INTO MetadataPipelineConfig
    (PipelineName, SourceType, SourcePath, DestinationType, DestinationTable, FileFormat)
VALUES
    ('CopyDataPipeline', 'AzureBlobStorage', 'source-container/raw-data/', 'AzureSQLDatabase', 'ProcessedData', 'csv');

Step 2: Use a Lookup Activity to Fetch Metadata

  • Add a Lookup Activity in ADF.
  • Configure the Source Dataset to point to the MetadataPipelineConfig table.
  • Fetch all metadata records by disabling the First row only option.
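
A sketch of what this Lookup could look like in the pipeline JSON (the dataset name MetadataSqlDataset is an assumption; with First row only disabled, the rows come back as an array under the activity's output.value property):

json
{
  "name": "Lookup",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT PipelineName, SourceType, SourcePath, DestinationType, DestinationTable, FileFormat FROM MetadataPipelineConfig"
    },
    "dataset": {
      "referenceName": "MetadataSqlDataset",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}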

Step 3: Use ForEach Activity and Copy Activity

  • Add a ForEach Activity to loop over the metadata rows, setting its Items to @activity('Lookup').output.value.
  • Inside the loop, configure a Copy Activity with dynamic expressions:
      ◦ Source Dataset path: @item().SourcePath
      ◦ Destination Table: @item().DestinationTable
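
As a sketch, the Copy Activity inside the ForEach might look like this. The dataset names (SourceBlobDataset, SinkSqlDataset) and their parameters (folderPath, tableName) are assumptions; the referenced datasets must declare matching parameters for the expressions to bind:

json
{
  "name": "CopyFromMetadata",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "SourceBlobDataset",
      "type": "DatasetReference",
      "parameters": {
        "folderPath": {
          "value": "@item().SourcePath",
          "type": "Expression"
        }
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "SinkSqlDataset",
      "type": "DatasetReference",
      "parameters": {
        "tableName": {
          "value": "@item().DestinationTable",
          "type": "Expression"
        }
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink" }
  }
}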

Step 4: Deploy and Run the Pipeline

Once the pipeline is deployed, it dynamically pulls metadata from SQL and executes data movement accordingly.

5. Benefits of Metadata-Driven Pipelines

 ✅ Flexibility: Modify metadata without changing pipeline logic
 ✅ Scalability: Handle multiple pipelines with minimal effort
 ✅ Efficiency: Reduce redundant pipelines and enhance maintainability

6. Conclusion

Metadata-driven pipelines in Azure Data Factory significantly improve the efficiency of data workflows. Whether using JSON files in Azure Blob Storage or structured tables in Azure SQL Database, this approach allows for dynamic and scalable automation.

WEBSITE: https://www.ficusoft.in/azure-data-factory-training-in-chennai/
