Showing posts from August, 2017

Azure - Data lake

a data lake is a repository for large quantities and varieties of both structured and unstructured data in their native formats. if you have azure subscription, it is very easy to create new Azure data lake quickly. Click on new from the left side pan from Azure screen and select Data + Analytics after selecting the Data Lake Store, click on create button which will open another window to provide some more options. after creation, we can see the data lake properties from the dashboard as shown below.

Azure Data lake - Query

I was going through data lake job creation and re usability of the scripts written. Important point I would like to point out is that in data lake we cannot go back and correct any mistake in the query created or any modification in the existing query but have to create new query. I believe this has been done to track the changes in each and every query. It is easy to create any new query by selecting the sample queries provided by Microsoft and change the syntax and table names based on our new requirements and finally saving with different names. Sample query to extract the records from file. @Searchlog =   Extract         UserID              int ,                       Start                  Datetime ,                      Region               String ,                      Query                String ,                      Duration            int ,                      Urls                   String ,                      ClickedUrls       String   FROM @"/Samples/Data/

SSIS - Automation of ETL packages

We are proposing to automate the ETL packages to our logistics customer. SSIS is using project deployment and not package deployment. We have developed detailed framework for package deployment which will not work here.   More than 150 packages used to integrate legacy systems to Salesforce. All the legacy systems push their feed into FTP and SSIS data load framework pull it from this FTP path. Some of the packages are having dependency on other packages. All the source to stage packages will fail if they didn’t find the files on FTP. ETL project implemented in three releases and all the releases using different solution approach. We can use master package to call all these data load packages based on the scheduled time specified. As the packages are deployed in different projects, master package should have access to all these projects. Normally SSIS master package can access those packages in the same solution. To overcome this issue we can use below script in execute S