Azure Databricks: Mounting to ADLS

The Databricks File System (DBFS) lets you store all processed or unprocessed records in its own file system. My customer is not ready to keep any data in DBFS, as they believe it is not as secure as Azure Data Lake Store (ADLS).

ADLS is not mounted to Databricks by default, so it is my turn to mount ADLS as the source layer that stores the data for Databricks to process.

Before continuing with mounting ADLS to Databricks, make sure the steps below have been completed successfully.

1. Create a Databricks workspace
2. Create an Azure Data Lake Store account
3. Register Databricks with Azure Active Directory, which is required to link Databricks with AD. Once you register the Databricks app, you will get a service principal (application) ID, and this ID must be provided at the time of mounting.

Let's go through the app registration process first.

Steps: click on Azure Active Directory and select App registrations from the left side of the window.

Now click on New application registration, which opens the registration window.


We can give any name here; there is no specific rule, but we can follow customer standards. The application type should be Web app/API, and the URL here is just for the sake of a name, meaning it is not used anywhere in the application. Once you create the registration, it will generate an application ID, which is required for the mounting step.

So we need the below details from the app registration to mount:
  • Application ID
  • Authentication Key
  • Tenant ID 
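To see how these three values fit together, here is a minimal sketch that assembles them into the OAuth configuration used in the mount step later. All three values here are hypothetical placeholders; substitute the ones from your own app registration.

```python
# Placeholder values from the app registration (hypothetical; replace with your own).
application_id = "11111111-2222-3333-4444-555555555555"   # Application ID
authentication_key = "dbrkkey-secret-value"               # key generated under Settings > Keys
tenant_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"        # Directory ID from AAD properties

# The OAuth token refresh URL is built from the tenant (directory) ID.
refresh_url = "https://login.microsoftonline.com/{}/oauth2/token".format(tenant_id)

# This is the extra_configs dictionary the mount call expects.
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": application_id,
           "dfs.adls.oauth2.credential": authentication_key,
           "dfs.adls.oauth2.refresh.url": refresh_url}
```

In a real notebook, the authentication key is better pulled from a Databricks secret scope (e.g. `dbutils.secrets.get(...)`) than hard-coded.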

Click on Settings and select Keys from the left-hand side. Provide a suitable name for the key; here I gave the name dbrkkey. Once you save this setting, it will generate a key, and we have to save this key somewhere safe, as we will not be able to see it again.

The next value required for mounting is the tenant ID. Let's see how we can get this value. It is actually the directory ID found in Azure Active Directory: click on Active Directory, go to Properties, and copy the Directory ID.

Now that we have all three required values, let's go to Databricks and mount ADLS.
Here is the code to mount.

Python:

# Create the mount point directory first.
dbutils.fs.mkdirs("/mnt/mountdatalakejithesh")

# OAuth configuration: replace ApplicationID, AuthenticationKey and
# DirectoryID with the values collected from the app registration.
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": "ApplicationID",
           "dfs.adls.oauth2.credential": "AuthenticationKey",
           "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/DirectoryID/oauth2/token"}

# Mount the ADLS account (replace StorageAccount with your account name).
dbutils.fs.mount(
    source = "adl://StorageAccount.azuredatalakestore.net/",
    mount_point = "/mnt/mountdatalakejithesh",
    extra_configs = configs)


Now the data lake has been mounted and the files are ready to access from Databricks.
We can also provide a more specific path into the Azure Data Lake Store so that Databricks can only access data from that path onwards.
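Once mounted, files under the mount point are addressed with ordinary DBFS paths. A small sketch, using a hypothetical helper name, of how a lake path maps onto the mount point:

```python
def mount_file_path(mount_point, relative_path):
    """Join a mount point and a path inside the lake into one DBFS path."""
    return mount_point.rstrip("/") + "/" + relative_path.lstrip("/")

# A file stored at Demo/sample.csv in the lake (hypothetical file) is read via:
path = mount_file_path("/mnt/mountdatalakejithesh", "Demo/sample.csv")

# Inside a Databricks notebook you could then, for example:
# df = spark.read.csv(path, header=True)
# dbutils.fs.ls("/mnt/mountdatalakejithesh")
```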

To understand this clearly, I am pasting my data lake store file explorer details below.

So if we would like to give Databricks access only to the Demo folder, change the source path as below.
source = "adl://StorageAccount.azuredatalakestore.net/Demo"

Changing the path alone is not enough to give Databricks access; we also have to grant the newly registered app read/write/execute permissions on that folder.
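Scoping the mount to the Demo folder would then look like the sketch below. The storage account name is a placeholder, and the `dbutils` calls are commented out because they only exist inside a Databricks notebook; `configs` is the same dictionary built in the mount step.

```python
# Hypothetical sketch: remount so Databricks sees only the Demo folder.
storage_account = "StorageAccount"  # placeholder data lake store name
scoped_source = "adl://{}.azuredatalakestore.net/Demo".format(storage_account)

# In a notebook (requires dbutils, available only inside Databricks):
# dbutils.fs.unmount("/mnt/mountdatalakejithesh")
# dbutils.fs.mount(source = scoped_source,
#                  mount_point = "/mnt/mountdatalakejithesh",
#                  extra_configs = configs)
```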
