How to read csv files located in blob storage with sparklyr, without downloading them?


I'm using the following credentials to authenticate to Blob Storage from R:

library(AzureStor)

account_endpoint <- "https://mycorporation.blob.core.windows.net"
account_key      <- "mykey"
container_name   <- "mycorporation"

bl_endp_key <- storage_endpoint(account_endpoint, key = account_key)
cont        <- storage_container(bl_endp_key, container_name)
w_con       <- textConnection("foo", "w") 

I need to read a lot of huge csv files located in mycorporation/my_folder, reading them sequentially with sparklyr, without downloading them first.

What is the best way to do this?

1 Answer

Answer by Vamsi Bitra:

If you only need to access a small number of files, the WASBS Blob Storage path is a simple and direct way to read them. To access a large number of files or more complex data sets, use a mount point. Choose between the two depending on your requirements.

Note: R cannot perform the actual mounting, so the workaround is to mount the container from another language such as Python and then read the files with the sparklyr library, as shown below.

Mount using python:

%python
dbutils.fs.mount(
    source = "wasbs://<container>@<storage_account>.blob.core.windows.net/",
    mount_point = "/mnt/<mount_path>",
    extra_configs = {"fs.azure.account.key.<Storage_account>.blob.core.windows.net":"Access_key"})
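Both the mount source URL and the Spark configuration key follow a fixed naming pattern built from the container and storage account names. As a small illustration (the helper functions below are hypothetical, not part of any package), the patterns are:

```r
# Hypothetical helpers showing the naming pattern used above:
# the wasbs:// source URL and the Spark config key for the account key.
wasbs_source <- function(container, account) {
  sprintf("wasbs://%s@%s.blob.core.windows.net/", container, account)
}

account_key_conf <- function(account) {
  sprintf("fs.azure.account.key.%s.blob.core.windows.net", account)
}

wasbs_source("mycorporation", "mycorporation")
# "wasbs://mycorporation@mycorporation.blob.core.windows.net/"
account_key_conf("mycorporation")
# "fs.azure.account.key.mycorporation.blob.core.windows.net"
```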

R notebook with the sparklyr library:

library(sparklyr)

# Connect to the Databricks cluster and read the mounted csv files
sc  <- spark_connect(method = "databricks")
df2 <- spark_read_csv(sc, name = "df2", path = "/mnt/dem123",
                      header = TRUE, infer_schema = TRUE)
df2
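Since the question mentions many huge csv files, note that spark_read_csv also accepts a directory or wildcard path, so an entire folder of csv files can be read as a single Spark DataFrame without downloading anything. A minimal sketch, assuming the container was mounted as above and the files share a common schema (the mount path and folder name are illustrative):

```r
library(sparklyr)

sc <- spark_connect(method = "databricks")

# One wildcard path reads every matching csv into a single Spark DataFrame;
# the data stays in blob storage and is processed lazily by Spark.
all_files <- spark_read_csv(
  sc,
  name         = "all_files",
  path         = "/mnt/dem123/my_folder/*.csv",
  header       = TRUE,
  infer_schema = TRUE
)
```

This only registers the files with Spark; rows are pulled from storage on demand as you run dplyr verbs against all_files.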


Or configure the Blob Storage account access key directly in the Spark configuration:

%python
spark.conf.set("fs.azure.account.key.<storage_account>.blob.core.windows.net","Access_key")

Then read the csv file using R:

library(sparklyr)

# Connect and read the csv directly from the wasbs:// path
sc   <- spark_connect(method = "databricks")
path <- "wasbs://<container>@<storage_account>.blob.core.windows.net/read-employees-csv.csv"
df1  <- spark_read_csv(sc, name = "df1", path = path,
                       header = TRUE, infer_schema = TRUE)
df1
