How to import/read a CSV file into R Server on an Azure HDInsight ML Services cluster


Overview:
Azure HDInsight
Cluster Type: ML Services (R Server)
Version: R Server 9.1 (HDI 3.6)

I am trying to import a CSV file from Azure Blob Storage into the R Server environment. It's obviously not as easy as I thought it would be, or as easy as reading a file locally.

The first thing I tried was installing the sparklyr package and setting up a connection.

#install.packages("devtools")
#devtools::install_github("rstudio/sparklyr")
install.packages("sparklyr")
library(sparklyr)
sc <- spark_connect(master = "yarn")

But because of the older Spark version installed on HDI, there's an error message:

Error in start_shell(master = master, spark_home = spark_home, spark_version = version,  : 
  sparklyr does not currently support Spark version: 2.1.1.2.6.2.38
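One thing I haven't verified yet is whether passing the base Spark version to spark_connect explicitly gets around this check. A sketch (the "2.1.0" version string is just my guess at the underlying Spark version behind the HDP suffix):

library(sparklyr)

# Untested idea: give sparklyr the base Spark version so it doesn't try to
# parse the vendor-suffixed string "2.1.1.2.6.2.38" on its own.
sc <- spark_connect(master  = "yarn",
                    version = "2.1.0",
                    config  = spark_config())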

Then I tried rxSparkConnect, but that didn't work either.

#Sys.setenv(SPARK_HOME_VERSION="2.1.1.2.6.2.38-1")

cc <- rxSparkConnect(interop = "sparklyr")
sc <- rxGetSparklyrConnection(cc)

origins <- file.path("wasb://CONTAINERNAME@STORAGEACCOUNT.blob.core.windows.net", "FILENAME.csv")
spark_read_csv(sc, path = origins, name = "df")
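The next thing on my list is to skip sparklyr entirely and use RevoScaleR directly, since ML Services ships with it. This is only a sketch I haven't gotten working yet (CONTAINERNAME, STORAGEACCOUNT and FILENAME.csv are placeholders):

library(RevoScaleR)

# HDInsight's HDFS is backed by the wasb:// storage account, so point
# RevoScaleR at the HDFS file system and read the blob as a text data source.
hdfsFS  <- RxHdfsFileSystem()
csvPath <- "wasb://CONTAINERNAME@STORAGEACCOUNT.blob.core.windows.net/FILENAME.csv"

csvData <- RxTextData(file = csvPath, fileSystem = hdfsFS)

# Pull it into an in-memory data frame (fine for smaller files).
df <- rxImport(inData = csvData)
head(df)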

How would you read a CSV file from an Azure Storage blob into the R Server environment?
I'm a little frustrated that this is taking so long; it shouldn't be this complicated. Please help me out! Thanks in advance!

There is 1 answer below.

timxymo1225:

An imperfect workaround I found is to upload the data into the "local" environment (the files pane in the bottom-right corner) and simply read the CSV file from there.

There's gotta be a better way to do it: it's a lot of manual work, probably impractical if the data is big, and it wastes the blob storage you already have.
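If the file is already in the cluster's default storage container, something like this might avoid the manual upload (an untested sketch; "/FILENAME.csv" and "/tmp/FILENAME.csv" are placeholder paths):

# Copy the blob-backed HDFS file down to the edge node's local disk,
# then read it like any ordinary local CSV.
rxHadoopCopyToLocal(source = "/FILENAME.csv", dest = "/tmp/FILENAME.csv")

df <- read.csv("/tmp/FILENAME.csv", stringsAsFactors = FALSE)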