Azure Databricks - bulk insert to Azure SQL
Recently I had a request to load processed data from Databricks into an Azure SQL database. The Databricks program processes around 300 to 400 million records and aggregates them into buckets; even after aggregation, about 40 million records go into the Azure SQL database. We found that the insertion was happening row by row, so we decided to use the bulk insert option provided by Databricks.
Databricks provides good documentation on bulk insert, and I simply followed it.
To achieve this we need to install the Spark connector library, which can be done from the cluster's Libraries option.
Click the library option, provide the Maven coordinate, and create the library as shown in the figure below.
Once the library is created, we used the code below to execute the bulk insert. The database name, user name, password, and table name mentioned here are for illustration only. We had 25 columns in total. We can either provide the column metadata here or leave it blank, but providing it is recommended because it improves performance: if the metadata is not provided, Databricks has to match against the target system's metadata before the actual bulk load.
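For reference, a library can be created from a Maven coordinate instead of an upload. The coordinate below is the azure-sqldb-spark connector published by Microsoft; the version shown is an assumption, so pick the latest available in your workspace.

```
com.microsoft.azure:azure-sqldb-spark:1.0.2
```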
%scala
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// DataFrame holding the aggregated records to be bulk-loaded
val deftable = spark.sql("select column1, column2 from tablename")

// Column metadata for the target table (only two of the 25 columns shown here)
var bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "ID", java.sql.Types.VARCHAR, 100, 0)
bulkCopyMetadata.addColumnMetadata(2, "Name", java.sql.Types.VARCHAR, 1000, 0)
// Connection and bulk-copy settings (values are illustrative)
val bulkCopyConfig = Config(Map(
  "url"               -> "myproj-sql-server.database.windows.net",
  "databaseName"      -> "databasename",
  "user"              -> "Username",
  "password"          -> "Password",
  "dbTable"           -> "dbo.tabletempbulk",
  "bulkCopyBatchSize" -> "1000000",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "6000"
))

// Push the DataFrame into Azure SQL using bulk copy
deftable.bulkCopyToSqlDB(bulkCopyConfig, bulkCopyMetadata)
Ref :- https://docs.azuredatabricks.net/spark/latest/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
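As a side note, on runtimes where the azure-sqldb-spark connector is not available, a rough alternative is Spark's built-in JDBC writer. It is slower than bulk copy, and the server, database, table, and credentials below are the same illustrative placeholders used above; this is a sketch, not the method from the documentation referenced here.

```scala
// Sketch: write the same DataFrame through Spark's built-in JDBC data source.
// No extra library is needed; all names and credentials are placeholders.
deftable.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://myproj-sql-server.database.windows.net:1433;databaseName=databasename")
  .option("dbtable", "dbo.tabletempbulk")
  .option("user", "Username")
  .option("password", "Password")
  .option("batchsize", "100000") // rows per JDBC batch; tune for your workload
  .mode("append")
  .save()
```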
Comment: The above won't work with Databricks Runtime 7.3 and above. Any suggestions to run the above code on these DBRs?