I want to read the orders data, which is stored as a sequence file in HDFS on the Cloudera VM, and create an RDD from it. Below are my steps:
1) Importing orders data as sequence file:
sqoop import --connect jdbc:mysql://localhost/retail_db --username retail_dba --password cloudera --table orders -m 1 --target-dir /ordersDataSet --as-sequencefile
2) Reading the file in Spark with Scala (Spark 1.6):
val sequenceData=sc.sequenceFile("/ordersDataSet",classOf[org.apache.hadoop.io.Text],classOf[org.apache.hadoop.io.Text]).map(rec => rec.toString())
3) When I try to read data from the above RDD, it throws the error below:
Caused by: java.io.IOException: WritableName can't load class: orders
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:77)
at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:2108)
... 17 more
Caused by: java.lang.ClassNotFoundException: Class orders not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:75)
... 18 more
I don't know why it says that it can't find orders. Where am I going wrong?
I also referred to the code from these two links, but no luck:
1) Refer sequence part
2) Refer step no. 8
I figured out the solution to my own problem. Well, I am going to write a lengthy solution but I hope it will make some sense.
1) When I tried to read the data that was imported into HDFS using Sqoop, it gave an error for the following reasons:
A) A sequence file is all about key-value pairs. When I import the table with Sqoop, the imported data is not in the key-value form that my read expects, which is why reading it throws an error.
B) If you read the first few characters of the file, from which you can figure out the two classes that must be passed as input when reading a sequence file, you see only one class, i.e. org.apache.hadoop.io.LongWritable. When I pass this class while reading the sequence data, it throws the error mentioned in the post.
I don't think that point B is the main reason for that error, but I am quite sure that point A is the real culprit.
2) Below is how I solved my problem.
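A minimal sketch of this kind of workaround in the Spark 1.6 shell, offered as an illustration under assumptions rather than the exact code that was run. It assumes the table was re-imported with Sqoop's --as-avrodatafile option into a hypothetical directory /ordersAvro, that the Databricks spark-avro data source is on the classpath, and that the first column is a non-null order id. The output path /ordersSeq and all value names are made up for the example.

```scala
// Run inside spark-shell (Spark 1.6), where sc and sqlContext already exist.
// Assumption: the spark-avro data source is available, for example because
// the shell was started with --packages com.databricks:spark-avro_2.10:2.0.1.
import org.apache.hadoop.io.Text

// Hypothetical paths: /ordersAvro holds the Avro import, /ordersSeq is new.
val ordersDF = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("/ordersAvro")

// Build (key, value) pairs: first column as the key, the whole row joined
// with commas as the value. Assumes the first column is never null.
val ordersKV = ordersDF.rdd.map(row => (row.get(0).toString, row.mkString(",")))

// String keys and values are written as org.apache.hadoop.io.Text.
ordersKV.saveAsSequenceFile("/ordersSeq")

// Read the file back with the classes that were actually written, and copy
// each Writable to a String right away because Hadoop reuses the objects.
val ordersBack = sc
  .sequenceFile("/ordersSeq", classOf[Text], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }

ordersBack.take(5).foreach(println)
```

Writing both key and value as plain strings keeps the file readable with only standard Hadoop classes, so nothing table-specific has to be on the classpath when reading it.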
I imported the data as an Avro data file into a different target directory using Sqoop, and then created a DataFrame from the Avro data.
Next I created key-value pairs from it and saved them as a sequence file.
Now when I read the first few characters of the file written above, it shows the two classes I need to pass when reading the file, and when I read it with those classes and print the data, the data is displayed.
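The "first few characters" mentioned above are the sequence file header, which starts with the bytes SEQ, a version byte, and then the key class name and the value class name. Below is a minimal sketch of pulling those two names out of the leading bytes of a file. It assumes the current header layout, where each name is stored as a one-byte length followed by UTF-8 text. SeqFileHeader and its method names are hypothetical, chosen only for this illustration.

```scala
import java.nio.charset.StandardCharsets

object SeqFileHeader {
  // Reads one length-prefixed string at `pos`: a length byte followed by that
  // many UTF-8 bytes. Returns the string and the position just after it.
  // Assumption: the length fits in a single byte (0..127), which holds for
  // ordinary class names.
  private def readShortString(bytes: Array[Byte], pos: Int): (String, Int) = {
    val len = bytes(pos).toInt
    require(len >= 0, "multi-byte lengths are not handled in this sketch")
    val start = pos + 1
    (new String(bytes, start, len, StandardCharsets.UTF_8), start + len)
  }

  // Returns (keyClassName, valueClassName) from the first bytes of a file.
  def keyAndValueClass(header: Array[Byte]): (String, String) = {
    require(
      header.length > 4 &&
        new String(header, 0, 3, StandardCharsets.US_ASCII) == "SEQ",
      "not a sequence file header")
    // Byte 3 is the format version; the class names start at byte 4.
    val (keyClass, next) = readShortString(header, 4)
    val (valueClass, _) = readShortString(header, next)
    (keyClass, valueClass)
  }
}
```

For the Sqoop import in the question, a check like this would be expected to report org.apache.hadoop.io.LongWritable as the key class and orders as the value class, which matches the getValueClass frame in the stack trace. orders is presumably a record class generated by Sqoop that is not on Spark's classpath, whereas a file written from String pairs names org.apache.hadoop.io.Text for both.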
Last but not least, thank you everyone for your much appreciated efforts. Cheers!!