Posts

Showing posts from 2013

Hadoop - how to learn

Enterprise-level data analysis and processing requirements keep changing day by day, and it has been shown that, to make correct judgments, it is better to dig into as much data as possible. Companies are adopting big data technologies to analyse the large volumes of data coming into their ecosystems. Some years back, market analysts thought this trend was driven purely by peer pressure, but the pioneers who implemented Hadoop (an open source big data framework) soon started reaping the benefits, and most companies are now adopting the Hadoop framework for their data problems. There are multiple technologies that can handle large volumes of data, such as massively parallel processing (MPP) systems like Greenplum, Microsoft SQL Server MPP systems, Teradata etc. The problem with these databases is the cost involved in implementation and maintenance. Hadoop is completely open source and can be implemented using inexpensive commodity servers. There are a lot of players in the market who specialize in thei

MS-Excel - Trick to format the data

In most of our day-to-day work we need to use MS-Excel to summarize or calculate business data, and we may have to bucket that data into various formats. For example, I need to analyse last year's sales data month on month, but the date should be in the format YYYY-MM-DD while the spreadsheet stores it as YYYYMMDD. We can change the format in two ways: 1. Add a custom format by right-clicking and selecting the Custom option (Fig 1); give the required format as shown in Fig 2, which also makes the same format easy to reuse in other parts of the workbook. 2. The second option is as easy as any other formula in Excel (Fig 3).

SQL Server - Restore the database using an MDF file

Today I had to restore a database from its MDF file, and the log file associated with the database was not available. The MDF file is where all the data is stored, while the log file stores the log information that helps restore the database to its previous working state in case of failure. It is very easy to attach the database using SSMS. Right-click on the Databases folder and select Attach (Fig 0.1), which opens the window shown in Fig 0.2. Click Add and browse to the MDF file in its stored path (Fig 0.3). Although the log file is not available, a reference to it still exists in the MDF, so we need to remove the log file path information from here before proceeding (Fig 0.4). So, for future use, we can either take a backup of the database or copy the MDF and LDF files (or only the MDF file).
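For reference, the same attach can also be scripted in T-SQL instead of using the SSMS dialog. This is a minimal sketch with a hypothetical database name and file path; FOR ATTACH_REBUILD_LOG creates a fresh log file when the original LDF is missing.

-- Attach from the MDF only; the database name and path below are placeholders.
CREATE DATABASE SalesDB
ON (FILENAME = N'C:\Data\SalesDB.mdf')
FOR ATTACH_REBUILD_LOG;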

SQL - SQL Server 2008 R2 (Provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

Today I installed SQL Server 2008 R2 Express Edition on my local machine, but when I tried to log in to the server I got the error message below (Fig 0.1). I had installed the Express Edition some years back, was successful at that time, and do not remember this issue coming up then. I tried the steps below but still had no success:              - Enabled the TCP/IP port              - Added exceptions for SQL Server and SQL Browser in the Windows firewall              - Double-checked whether the server was running. But I was not lucky enough to get into the server. Ignoring small things can sometimes cost you a lot: in this scenario I had forgotten to give the server name and the instance together before hitting the login button. I was able to log in to the Express Edition once I gave the server name as .\SQLEXPRESS. The catch here is that the server will not listen if you give only a dot (.) or [localhost].

SQL - Identify the name of SQL instance

It is very simple to identify the names of the installed instances. I installed SQL Server 2008 R2 Express Edition on my system and could not find out the exact name. I know that SQL Server by default names the instance SQLExpress, but we can confirm the name from the registry: - Go to Start -> Run and type regedit, which opens the Registry Editor window (Fig 0.1) - Click on HKEY_LOCAL_MACHINE - Expand SOFTWARE - Expand Microsoft - Expand Microsoft SQL Server - Go to Instance Names to find the actual instance name (Fig 0.2)
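As an alternative to the registry lookup, once you do manage to connect to the instance you can confirm its name with a quick query. A minimal sketch:

-- SERVERPROPERTY('InstanceName') returns NULL for a default instance,
-- otherwise the named instance (e.g. SQLEXPRESS).
SELECT SERVERPROPERTY('MachineName')  AS MachineName,
       SERVERPROPERTY('InstanceName') AS InstanceName,
       @@SERVERNAME                   AS FullServerName;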

SSIS - Loop and get all the file data using Foreach

It is easy to enumerate all the files in a specific disk folder and fetch their data for further processing using SSIS. SSIS has a Foreach Loop container; drag this task from the toolbox to the work area. To demonstrate the usage of this task, I am going to pull data from multiple CSV files copied into a folder on my system, M:\Jithesh (Fig 0.1). Map the folder name in the Foreach container properties and decide what kind of files to use for data collection: it can find any type of file inside this folder, or we can restrict it based on our requirement; here I would like to consider only CSV files (Fig 0.2). As you can see in Fig 0.2, provide the folder path and the file type in the Files section; the *.csv mask I mentioned picks up all the CSV files in this path for processing. We can retrieve the file name and pass it to the tasks inside the Foreach loop in different ways: Fully qualified picks the

SSAS - Custom drill through

Here I would like to give a very useful tip on SSAS cube drill-through reports. One of my clients required a specific order of fields in the report, whereas normally a drill-through report in a cube follows the order of the dimensions selected. This can be achieved using the Analysis Services Stored Procedures (ASSP) project; please visit http://asstoredprocedures.codeplex.com/ for more details. Now download and register the DLL; the steps can be found at the link below: http://asstoredprocedures.codeplex.com/wikipage?title=Installation%20Instructions Here I will demonstrate how to make use of ASSP to get the custom drill-through. Once you have installed it, it appears under the Assemblies folder of Solution Explorer (Fig 0.1). Now go to the Actions tab of the cube and create a new action (Fig 0.2). Provide a name for the action as per the project standard; here I have given Custom Drill Through. The Target type property decides wh

SSIS - [OLE DB Source [392]] Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager

If you are an ETL developer working with SSIS, you will definitely have faced this issue at least once. I have spent hours of development effort sorting it out, so I hope this post will help developers in future. I was pulling data from an Oracle database and pushing the information into an MS-Excel report after cleansing the data. I had used the ADO.NET provider for Oracle and given the required information such as user name, password, server name etc. I tested the connection and it was perfect, but when I executed the whole package together it started throwing this error. As every developer does, I searched for this issue on the internet and spent hours getting an answer, but most of the answers say the problem was fixed by making the changes below to the SSIS package:    - Change the DelayValidation property of the task to true so that it will not validate the connection immediately    - Change the ProtectionLevel of the package to DontSaveSensitive    - Change the Run64BitRuntime property 

SQL - Concatenate multiple column values

Yesterday I got a requirement to concatenate multiple values of the same column into one row based on a specified group. Initially I tried it with MS-Excel but it failed, so I switched to SQL Server and finished the job in no time. I thought it would be better to share the code for future use. Requirement: Due to restrictions on sharing the actual requirement, I have masked the details and also reduced the total number of columns. The actual requirement involves millions of records; here I reduced the data to 6 records for easy understanding. Look at Fig 0.1: for the same customer and city there are multiple product serials, and the output should look like Fig 0.2. Solution: Create a table with an identity column in SQL Server and insert the raw data into it. The identity column helps us go through each row in the table. CREATE TABLE RequirementData (     ctr INT IDENTITY (1,1),     [Customer] [NVARCH
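The excerpt cuts off before the full solution, but a common way to do this kind of grouped concatenation on SQL Server 2008 R2 is FOR XML PATH with STUFF. The following is a minimal sketch under that assumption, not necessarily the exact code from the post; the ProductSerial column name is hypothetical and follows the scenario described above:

-- Roll up product serials per customer/city into a single comma-separated row.
SELECT  r.[Customer],
        r.[City],
        STUFF((SELECT ', ' + r2.[ProductSerial]
               FROM RequirementData AS r2
               WHERE r2.[Customer] = r.[Customer]
                 AND r2.[City] = r.[City]
               ORDER BY r2.ctr
               FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS ProductSerials
FROM RequirementData AS r
GROUP BY r.[Customer], r.[City];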

SSAS: Storage Modes

As we know, a cube pulls data from a relational database and stores it in the cube, and the storage mode controls how that data is stored for measure groups and dimensions. There are three storage modes available in SSAS:                                 - MOLAP                                 - ROLAP                                 - HOLAP MOLAP, or Multidimensional Online Analytical Processing, is the default storage mode of an SSAS cube; MOLAP takes a copy of the facts and dimensions. ROLAP leaves the fact and dimension data in the relational tables. HOLAP uses both relational and multidimensional storage: typically aggregations are stored in MOLAP while facts and dimensions stay in ROLAP. An important note is that a dimension can use only MOLAP or ROLAP, unlike a cube, where all three storage modes are available. Now we can look at the advantages and disadvantages of these storage modes, which helps us choose the proper storage mode based on the requirement.

SSAS: Transactional deployment

There are different options available to deploy and process a cube, and based on the project requirements we can carefully select the appropriate deployment options. Here I would like to explain the use of the Transactional deployment property among the cube deployment settings. Normally, while a cube is being processed users cannot access it, but if you enable this property users can still query data from the cube during processing, and the changes are committed only if the deployment succeeds.

SSIS - Error while pulling data from the DB [Excel Destination [197]]: SSIS Error Code DTS_E_OLEDBERROR

This is an interesting error that I got while loading data into an Excel report. I created a very simple package to copy an Excel report template from my template folder to a destination folder and to load data into it from my database, but it threw the error shown below (Fig 0.1). Look at the package in Fig 0.2: it picks up the template from the template folder, and the file name is shippedReport.xls as shown in Fig 0.3. The name of the output file is different from the input template; it is ShippedSerials_+date and is generated dynamically with the help of a variable (Fig 0.4). Here comes the problem: while executing the task, SSIS tries to check whether the destination file already exists in the folder, and if it cannot find the file it throws the above-mentioned error. We can avoid this issue by setting the DelayValidation property of the loading

SSAS - Ignore unrelated dimensions

It is good to ignore any unrelated dimension while comparing multiple measure values together. Let's consider the example below to explain the use of the IgnoreUnrelatedDimensions property in the cube structure of SSAS. I have Case and Installed Base details in the cube: Case stores all the information about the cases created by customers, and Installed Product stores all the information related to the products. Look at the dimension relationships shown in Fig 0.1. The Person dimension is not related to the ZINSTALLBAEproactive measure. Now let's look at the Excel sheet below, in which I compared the count of cases and the count of installed products for the Person dimension (Fig 0.2). Here we can see that the serial number count of installed products is repeating, which may confuse end users and prevent proper decisions. We can avoid this by changing the IgnoreUnrelatedDimensions property of the measure group to false, as shown in Fig 0.3  Fig 0.

SSIS - Load data from text

In the data warehousing field, it is very common to load data from text files, and dealing with their formatting issues is a developer's headache. In this post I would like to show one typical issue with text files which can be tackled very easily. I have a file with student, mark and address details which needs to be loaded into a table. I created the flat file connection in SSIS and realised that it could not recognize the columns because of some extra junk lines in the text file before the start of the actual required details, so everything was treated as a single column as shown in Figure 1. Find out the number of junk lines and skip them by providing that number in the property Header rows to skip, as shown in Figure 2. Also tick the property Column names in the first data row, which treats the first line after the skipped lines as the column names (Figure 2). Now go back to
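Outside SSIS, the same skip-the-junk-lines idea applies when loading such a file with plain T-SQL. A minimal sketch under assumed conditions; the table name, file path, delimiters and line counts are all placeholders, not values from the post:

-- Assume 3 junk lines plus 1 header line, so the first real data row is row 5.
BULK INSERT dbo.StudentMarks
FROM 'M:\Data\students.txt'
WITH (
    FIRSTROW = 5,          -- skip junk lines and the header row
    FIELDTERMINATOR = ',', -- column delimiter
    ROWTERMINATOR = '\n'   -- row delimiter
);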

Microsoft Excel - Data analysis

MS-Excel is one of the most wonderful tools available today for data analysis, with a number of lovely features for the job. Here I would like to show you one simple but very time-saving and important technique: bucket analysis. In the data analysis field, it is very important to bucket the data and then deep-dive into each area separately based on the bucket results. I have a file with population data as shown in Figure 1. Now the requirement is to identify, for each 20-year bucket starting from 0, how many people belong to it. We can easily analyse this by loading the table data into a pivot table and grouping accordingly, as demonstrated below. Step 1: Create the pivot table (Figure 2). Step 2: Pull the People field into the values area of the pivot and Age into the row labels, as in Figure 3. Step 3: Right-click on the row label (the bucketing field), select Group and provide the details as in Figure 4

SSIS - SQL Server 2012

I have been thinking of exploring SQL Server 2012 since its launch on April 2nd, 2012. I have had the opportunity to work with the SQL Server 2005, 2008 and 2008 R2 BI tools, and now with SQL Server 2012, which was code-named DENALI. Microsoft is boasting about Denali's capabilities in self-service and corporate business intelligence, and we can only judge that claim after a few years. One of the main changes that I liked in SSIS is its look, which is totally different from its predecessor. The interface is improved to include details about the selected task and a sample to learn from, one more window has been added to work with package parameters, and the toolbox has two more sections for favourites and common tasks, where the user can choose which tasks to add. Another useful but minor feature that developers will like is undo and redo, which was not available in earlier versions. In this post I would like to explain another wonderful feature which will definitely be a great help to th