How to read csv files located in blob storage with sparklyr, without downloading them?


I'm using the following credentials to authenticate to Blob Storage from R:

library(AzureStor)

account_endpoint <- "https://mycorporation.blob.core.windows.net"
account_key      <- "mykey"
container_name   <- "mycorporation"

bl_endp_key <- storage_endpoint(account_endpoint, key = account_key)
cont        <- storage_container(bl_endp_key, container_name)
w_con       <- textConnection("foo", "w") 

I need to read a lot of huge csv files located in mycorporation/my_folder, reading them sequentially with sparklyr, without downloading them first.

What is the best way to do this?

1 Answer

Answer by Vamsi Bitra:

If you only need to access a small number of files, the WASBS Blob Storage path is a simple and direct way to read them. To access a large number of files or more complex data sets, use a mount point. Choose between the two depending on your requirements.

Note: R cannot perform the actual mounting, so the workaround is to mount the container from another language such as Python and then read the files with the sparklyr library, as shown below.

Mount using python:

%python
dbutils.fs.mount(
    source = "wasbs://<container>@<storage_account>.blob.core.windows.net/",
    mount_point = "/mnt/<mount_path>",
    extra_configs = {"fs.azure.account.key.<Storage_account>.blob.core.windows.net":"Access_key"})
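Both the mount source URL and the Spark configuration key follow a fixed naming pattern built from the container and storage account names. As a small illustration (the helper functions below are hypothetical, not part of any package), the patterns are:

```r
# Hypothetical helpers showing the naming pattern used above:
# the wasbs:// source URL and the Spark config key for the account key.
wasbs_source <- function(container, account) {
  sprintf("wasbs://%s@%s.blob.core.windows.net/", container, account)
}

account_key_conf <- function(account) {
  sprintf("fs.azure.account.key.%s.blob.core.windows.net", account)
}

wasbs_source("mycorporation", "mycorporation")
# "wasbs://mycorporation@mycorporation.blob.core.windows.net/"
account_key_conf("mycorporation")
# "fs.azure.account.key.mycorporation.blob.core.windows.net"
```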

R notebook with the sparklyr library:

library(sparklyr)

# Connect to the Databricks cluster and read the mounted csv files
sc  <- spark_connect(method = "databricks")
df2 <- spark_read_csv(sc, name = "df2", path = "/mnt/dem123",
                      header = TRUE, infer_schema = TRUE)
df2
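Since the question mentions many huge csv files, note that spark_read_csv also accepts a directory or wildcard path, so an entire folder of csv files can be read as a single Spark DataFrame without downloading anything. A minimal sketch, assuming the container was mounted as above and the files share a common schema (the mount path and folder name are illustrative):

```r
library(sparklyr)

sc <- spark_connect(method = "databricks")

# One wildcard path reads every matching csv into a single Spark DataFrame;
# the data stays in blob storage and is processed lazily by Spark.
all_files <- spark_read_csv(
  sc,
  name         = "all_files",
  path         = "/mnt/dem123/my_folder/*.csv",
  header       = TRUE,
  infer_schema = TRUE
)
```

This only registers the files with Spark; rows are pulled from storage on demand as you run dplyr verbs against all_files.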


Or configure the Blob Storage account access key directly in the Spark configuration:

%python
spark.conf.set("fs.azure.account.key.<storage_account>.blob.core.windows.net","Access_key")

Then read the csv file using R:

library(sparklyr)

# Connect and read the csv directly from the wasbs:// path
sc   <- spark_connect(method = "databricks")
path <- "wasbs://<container>@<storage_account>.blob.core.windows.net/read-employees-csv.csv"
df1  <- spark_read_csv(sc, name = "df1", path = path,
                       header = TRUE, infer_schema = TRUE)
df1
