Showing posts from February, 2020

Azure Data Factory - Pull files from SFTP

Recently a customer asked me to pull data from their Linux-based SFTP server. ADF was not able to connect to the SFTP server because of firewall settings, so we raised the issue with Microsoft. Unfortunately, Microsoft said we would need to wait a couple of months for a fix. The business was not ready to wait that long, so I came up with another solution. We already had an SSIS license, and I decided to make use of it. Below is the high-level architecture that I proposed.

Use SSIS to pull the files from the Linux SFTP server and download them into a local folder. A separate folder is created for each feed, and the files are downloaded based on their last modified date. ADF picks up all the files from the Windows FTP based on the date, loops through each file, and loads it into the RAW layer of Azure Data Lake Store, and later into the analytic layer. The RAW-to-analytic layer processing is done by a Databricks script that is called from inside ADF. EST_GET_FEEDS task hit