I have some JSON data (about 60GB) that I have to load in Hive external table. I am using Hive 3.x with Hadoop 3.x. The schema of table is as follows:
CREATE TABLE people(a string, liid string, link string, n string, t string, e string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
STORED AS TEXTFILE LOCATION '/data/db/';
I have also loaded the jar for serde as follows:
ADD JAR /usr/hive/lib/hive-hcatalog-core-3.1.2.jar;
If I copy a simple text json (or load) then DML queries (select etc.) works fine. As data file is very large and thus I have compressed it (20GB now). I have loaded this compressed file into Hive table (created above).
hive> select * from people;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Field name expected
Time taken: 0.096 seconds
hive>
It is working fine with uncompressed data. What is the issue with this ?
I have tried some solutions like this but not successful
I found the solution myself. Actual the issue was there are two columns that are arrays in json. They should be mapped to ARRAY in hive. The sample I taken for schema did not contain these array. Hence, by changing the field type to
array<<string>>for one column solved my issue.