Dataframe introduction



A DataFrame is a Dataset organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. I tried to explain the creation of dataframe using csv file and manipulate the data and store the processed records into another file or table for further processing. Data transformation using spark data frame is very easy and spark provided various functions to help the transformation.




I used databricks community edition for this demo.






Comments

Post a Comment

Popular posts from this blog

Microsoft BI Implementation - Cube back up and restore using XMLA command

Databricks - incorrect header check