Showing posts from 2017

Azure - Data lake Analytics

Azure Data Lake Analytics helps you analyze data stored in Azure Data Lake Store. Its biggest benefit is that you do not have to think about networking, servers, and so on: the service scales up or down based on the job requirements, and with the pay-as-you-go option you are charged only for what you use. An Azure Data Lake Store account must exist before creating the Analytics account, although it can also be created during the Data Lake Analytics setup. In the portal, click the + button, which opens a new window, and select the Data + Analytics option. Once you select it, you can see all the analytics-related apps on the right side; choose Data Lake Analytics from there. This opens another window where you can set options such as the resource group, the Data Lake Store, etc. Click Create and you are set to use Data Lake Analytics :)
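If you prefer scripting over portal clicks, the same account can be provisioned with the Azure CLI (`az dla account create`). The sketch below only builds the command; the account name, resource group, location, and store name are placeholder values, and the CLI must be installed and logged in (`az login`) before actually running it.

```python
import subprocess

def build_dla_create_command(account, resource_group, location, default_store):
    """Build the Azure CLI command that creates a Data Lake Analytics account.

    All argument values are hypothetical placeholders; substitute your own.
    """
    return [
        "az", "dla", "account", "create",
        "--account", account,
        "--resource-group", resource_group,
        "--location", location,
        # Data Lake Analytics requires an existing Data Lake Store account:
        "--default-data-lake-store", default_store,
    ]

cmd = build_dla_create_command("myanalytics", "my-rg", "eastus2", "mydatalakestore")
print(" ".join(cmd))
# To actually create the account (after `az login`):
# subprocess.run(cmd, check=True)
```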

Azure - Data lake

A data lake is a repository for large quantities and varieties of both structured and unstructured data in their native formats. If you have an Azure subscription, it is very easy to create a new Azure Data Lake Store. Click New in the left-side pane of the Azure portal and select Data + Analytics; after selecting Data Lake Store, click the Create button, which opens another window where you can provide some more options. After creation, the data lake's properties are visible from its dashboard.
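The store account can likewise be created from the Azure CLI (`az dls account create`) instead of the portal. A minimal sketch, with placeholder names:

```python
def build_dls_create_command(account, resource_group, location):
    # `az dls account create` provisions a Data Lake Store account;
    # the names passed in below are placeholders, not real resources.
    return [
        "az", "dls", "account", "create",
        "--account", account,
        "--resource-group", resource_group,
        "--location", location,
    ]

print(" ".join(build_dls_create_command("mydatalakestore", "my-rg", "eastus2")))
# Run it with subprocess.run(cmd, check=True) after `az login`.
```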

Azure Data lake - Query

I was going through Data Lake job creation and the reusability of the scripts written. An important point I would like to call out: in Data Lake we cannot go back and correct a mistake in a query we have created, or modify an existing query; we have to create a new query instead. I believe this has been done to track the changes in each and every query. It is easy to create a new query by selecting one of the sample queries provided by Microsoft, changing the syntax and table names based on our new requirements, and finally saving it under a different name. Sample query to extract the records from a file:

    @Searchlog =
        EXTRACT UserID      int,
                Start       DateTime,
                Region      string,
                Query       string,
                Duration    int,
                Urls        string,
                ClickedUrls string
        FROM @"/Samples/Data/

SSIS - Automation of ETL packages

We are proposing to automate the ETL packages for our logistics customer. SSIS here uses project deployment, not package deployment, so the detailed framework we developed for package deployment will not work. More than 150 packages are used to integrate legacy systems with Salesforce. All the legacy systems push their feeds to FTP, and the SSIS data load framework pulls them from that FTP path. Some packages depend on other packages, and all the source-to-stage packages fail if they do not find their files on FTP. The ETL project was implemented in three releases, and each release uses a different solution approach. We can use a master package to call all these data load packages at their scheduled times. As the packages are deployed in different projects, the master package must have access to all these projects, but normally an SSIS master package can only access packages within the same solution. To overcome this issue we can use the below script in an Execute S
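The dependency handling that the master package needs can be sketched outside SSIS as well. The snippet below is an illustrative Python sketch, not our actual framework: it topologically orders packages by their declared dependencies and shows where a `dtexec /ISServer` call against the SSIS catalog would run each one. The package names and catalog folder/project path are hypothetical.

```python
import subprocess
from collections import deque

# Hypothetical package dependency map: package -> packages it depends on.
DEPENDENCIES = {
    "LoadOrders.dtsx": ["StageOrders.dtsx"],
    "StageOrders.dtsx": [],
    "LoadCustomers.dtsx": ["StageCustomers.dtsx"],
    "StageCustomers.dtsx": [],
}

def execution_order(deps):
    """Return packages ordered so every dependency runs first (Kahn's algorithm)."""
    pending = {pkg: set(d) for pkg, d in deps.items()}
    ready = deque(sorted(p for p, d in pending.items() if not d))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for other, d in pending.items():
            if pkg in d:
                d.remove(pkg)
                if not d and other not in order and other not in ready:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("circular dependency between packages")
    return order

def run_package(package):
    # dtexec invocation against the SSIS catalog; folder/project are placeholders.
    cmd = ["dtexec", "/ISServer",
           r"\SSISDB\Logistics\SalesforceLoads" + "\\" + package,
           "/Server", "localhost"]
    print("would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment on a machine with dtexec

for pkg in execution_order(DEPENDENCIES):
    run_package(pkg)
```

Because each stage package has no dependencies, the stages run first and the load packages follow, which mirrors the source-to-stage ordering described above.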

SSIS - dataflow task stuck

I had a strange issue with TFS and an SSIS project solution file in my current project. The SSIS package has multiple tasks in it, including a loop container, an Execute SQL task, a data flow task, etc. The package had run successfully multiple times, both locally and from the server. TFS contained multiple projects under the same solution file. One fine day when I opened the solution file, I could see all the projects except one. I tried the different methods described in blogs and on MSDN to get my project back, but I did not succeed. Finally I deleted the files from TFS, created a new project under the solution, and added all the packages into the new project. As usual I tried to execute the package from my local machine, and it executed up to the data flow task but would not move ahead and complete. After scratching my head for an hour, I found the issue with the package: when I deleted the package from TFS to get back the project under the project solution file,

SSIS - Pull the data from Web service (REST API)

I got the opportunity to pull data from a web service and load it into a SQL database. SSIS 2014 has a Web Service task, but it is not suitable for a REST API. To achieve this, we need to use a Script Component in the data flow task. Double-click the Script Component and provide the required variables, then go to the editor and add a reference to web extensions: Project --> Add Reference --> Browse and choose System.Web.Extensions.dll. The remaining scripting is very easy even if you are not familiar with C#. Include all the required namespaces as displayed below.

    #region Namespaces
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;
    using System.Net;
    using Microsoft.SqlServer.Dts.Runtime;
    using System.Windows.Forms;
    using System.IO;
    using System.Web;
    using System.Xml;
    using System.Collections.Generic;
    using System.Data.Sql
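Stripped of the SSIS plumbing, the core of the component is just an HTTP GET followed by JSON parsing, one output row per record. As a language-neutral illustration of that flow (not the actual C# component), here is a Python sketch; the endpoint, field names, and sample payload are made-up placeholders standing in for the live response:

```python
import json

# In the real component the JSON body would come from the live endpoint,
# e.g. urllib.request.urlopen("https://api.example.com/orders").read().
# The fields below are hypothetical.
body = '''
[
  {"OrderId": 101, "Customer": "Acme",   "Amount": 250.0},
  {"OrderId": 102, "Customer": "Globex", "Amount": 99.5}
]
'''

def to_rows(payload):
    """Turn the JSON payload into (OrderId, Customer, Amount) tuples for a DB insert."""
    return [(rec["OrderId"], rec["Customer"], rec["Amount"])
            for rec in json.loads(payload)]

for row in to_rows(body):
    print(row)  # each tuple maps to one output row of the data flow
```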