Hive 3.x throws an error for compressed (bz2) JSON in an external table


I have some JSON data (about 60 GB) that I have to load into a Hive external table. I am using Hive 3.x with Hadoop 3.x. The schema of the table is as follows:

    CREATE EXTERNAL TABLE people(a string, liid string, link string, n string, t string, e string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
    STORED AS TEXTFILE LOCATION '/data/db/';
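
For reference, the JsonSerDe expects one JSON object per line of the text file; a record matching this schema could look like the following (the field values here are made up purely for illustration):

    {"a":"1","liid":"42","link":"http://example.com","n":"john","t":"2021-01-01","e":"extra"}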

I have also loaded the JAR for the SerDe:

    ADD JAR /usr/hive/lib/hive-hcatalog-core-3.1.2.jar;

If I copy (or load) a small plain-text JSON file, DML queries (select etc.) work fine. Since the data file is very large, I compressed it with bzip2 (about 20 GB now) and loaded this compressed file into the Hive table created above.
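
Roughly, the file was prepared and placed like this (file names are placeholders, not taken from the question):

    # bzip2 is a codec Hive reads transparently for TEXTFILE tables
    bzip2 people.json                          # produces people.json.bz2 (~20 GB)
    hdfs dfs -put people.json.bz2 /data/db/    # copy it into the table's LOCATION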

hive> select * from people;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Field name expected
Time taken: 0.096 seconds
hive> 

It works fine with uncompressed data. What is the issue here?

I have tried some solutions like this one, but without success.

1 Answer

Answered by Hafiz Muhammad Shafiq:

I found the solution myself. The actual issue was that two of the columns are arrays in the JSON, so they should be mapped to ARRAY in Hive; the sample record I used to derive the schema did not contain those arrays. Changing the field type to array<string> for one of the columns solved my issue.
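
As a sketch (the answer does not name the affected columns, so liid is assumed here to be the one holding a JSON array), the corrected DDL would look like this:

    CREATE EXTERNAL TABLE people(a string, liid array<string>, link string, n string, t string, e string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
    STORED AS TEXTFILE LOCATION '/data/db/';

After recreating the external table (or adjusting the column type with ALTER TABLE ... CHANGE COLUMN), the SerDe can parse the array fields and the query should no longer fail.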