Azure Data Factory

When I first heard about Azure Data Factory (ADF), I thought it was a new technology for storing data. I never thought Microsoft would come up with another data load tool or service besides SSIS.

Although ADF is used to move data from on-premises databases to the cloud, or between different cloud storage services, its basic functionality is not different from that of SSIS.

The main difference between ADF and SSIS is that ADF can handle unstructured data and can also run scripts in a variety of languages, such as Python, Node.js, and U-SQL. ADF was created mainly to satisfy the cloud needs of the enterprise, whereas SSIS is designed to integrate data from traditional data sources. ADF can even execute SSIS packages.

I started working with ADF v1 and found it very difficult to implement, as it requires JSON scripts to create datasets, pipelines, and linked services.
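To give a feel for what those JSON scripts involve, here is a minimal sketch in Python that builds the three kinds of definitions as dictionaries and prints them as JSON. All names (ExampleStorageLinkedService, ExampleBlobDataset, ExampleSqlDataset, ExampleCopyPipeline) and placeholder values are hypothetical, and the field layout only approximates the v1-style authoring format, so treat it as an illustration rather than a complete or validated schema.

```python
import json

# Hypothetical names and placeholder values throughout. The layout below
# approximates ADF's JSON authoring format and is not a complete schema.

# A linked service describes how to connect to a data store.
linked_service = {
    "name": "ExampleStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {"connectionString": "<connection-string>"},
    },
}

# A dataset points at data inside a store, through a linked service.
dataset = {
    "name": "ExampleBlobDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": linked_service["name"],
        "typeProperties": {"folderPath": "<container>/<folder>"},
    },
}

# A pipeline groups activities; this one has a single copy activity.
# "ExampleSqlDataset" is assumed to be defined elsewhere.
pipeline = {
    "name": "ExampleCopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyToBlob",
                "type": "Copy",
                "inputs": [{"name": "ExampleSqlDataset"}],
                "outputs": [{"name": dataset["name"]}],
            }
        ]
    },
}


def to_json(definition):
    """Serialize one definition to indented JSON text."""
    return json.dumps(definition, indent=2)


for definition in (linked_service, dataset, pipeline):
    print(to_json(definition))
```

Even in this trimmed-down form, the three definitions have to reference each other by name, which is where hand-written JSON tends to get error-prone.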

With the release of ADF v2, Microsoft eliminated this difficulty by including a graphical interface to create datasets, pipelines, and linked services.

ADF can be used to pull data from an on-premises source and load it into Blob storage, then use U-SQL to transform the data and load it into Data Lake Store. It can move the Data Lake data to Azure SQL Database, or process the data using Spark, Hive, or Pig and store the results back in Data Lake Store. All of these activities can be included in a single pipeline.
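A rough sketch of how such a chain of activities might be expressed, again in Python. The activity names are hypothetical, the dependency order is my assumption about how the steps above would be wired together, and the "dependsOn" layout approximates the v2-style pipeline JSON. The run_order helper is not part of ADF; it is only there to show the order the dependencies imply.

```python
import json


def activity(name, activity_type, depends_on=()):
    """Build one activity entry that runs after the named activities succeed."""
    return {
        "name": name,
        "type": activity_type,
        "dependsOn": [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in depends_on
        ],
    }


# Hypothetical activity names; the ordering is an assumed reading of the steps.
activities = [
    activity("CopyOnPremToBlob", "Copy"),
    activity("TransformWithUSql", "DataLakeAnalyticsU-SQL", ["CopyOnPremToBlob"]),
    activity("CopyLakeToAzureSql", "Copy", ["TransformWithUSql"]),
    activity("ProcessWithHive", "HDInsightHive", ["TransformWithUSql"]),
]

pipeline = {
    "name": "ExampleEndToEndPipeline",
    "properties": {"activities": activities},
}


def run_order(acts):
    """Return activity names in an order that respects every dependsOn entry."""
    remaining = {a["name"]: {d["activity"] for d in a["dependsOn"]} for a in acts}
    order = []
    while remaining:
        # An activity is ready once all of its dependencies are already ordered.
        ready = [name for name, deps in remaining.items() if deps <= set(order)]
        if not ready:
            raise ValueError("circular or missing dependency")
        for name in ready:
            order.append(name)
            del remaining[name]
    return order


print(json.dumps(pipeline, indent=2))
print(run_order(activities))
```

Here the copy to Azure SQL Database and the Hive step both wait on the U-SQL transform, so either could run once the data has landed in Data Lake Store.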

