Glue Crawler cannot classify and create table with snappy compressed json files


I have a KFH application that writes Snappy-compressed JSON files to an S3 bucket. I also have a Glue crawler that builds the schema from that bucket. However, the crawler classifies the table as UNKNOWN; it does not detect that the files are JSON. According to the doc below, the Glue crawler supports Snappy compression for JSON files, but I couldn't get it to work. https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in

Thanks.

Rishabh Sahrawat On

This can happen when the JSON files don't all share the same schema, or when the schema is too complicated for the built-in classifiers to classify.

  1. If the JSON files have different schemas, filter out the files whose schema differs. You can test this by running the crawler on just a few JSON files.

  2. If you are sure that the schema is the same but the crawler still can't read the files, build your own custom JSON classifier. You can read about it here. Once it is built, attach it to your crawler; the crawler should then be able to read the files, and the classification should change from UNKNOWN to your classifier's name.
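Both suggestions can be sketched roughly in Python. This is only an illustration under a few assumptions: the sample records have already been decompressed into newline-delimited JSON objects, `glue` is a boto3 Glue client (for example `boto3.client("glue")`), and the helper names, the crawler and classifier names, and the default JSON path `$[*]` are placeholders that you would adapt to your own data.

```python
import json
from collections import defaultdict


def schema_signature(record):
    """Summarise a JSON object as a sorted tuple of (key, value type name)."""
    return tuple(sorted((key, type(value).__name__) for key, value in record.items()))


def group_by_schema(json_lines):
    """Group already-decompressed JSON lines by their top-level schema signature.

    Returns a dict mapping each signature to the 1-based line numbers that use it.
    More than one key in the result means the sample mixes different schemas.
    """
    groups = defaultdict(list)
    for line_number, line in enumerate(json_lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        groups[schema_signature(json.loads(line))].append(line_number)
    return dict(groups)


def attach_json_classifier(glue, crawler_name, classifier_name, json_path="$[*]"):
    """Create a custom JSON classifier and attach it to an existing crawler.

    `glue` is assumed to be a boto3 Glue client. The JSON path is a placeholder
    and has to match the layout of your files. Note that update_crawler sets the
    crawler's list of custom classifiers to exactly the list passed here.
    """
    glue.create_classifier(
        JsonClassifier={"Name": classifier_name, "JsonPath": json_path}
    )
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])
```

If `group_by_schema` returns a single group for a representative sample, the schema is probably consistent and the custom classifier route is the one to try; after attaching the classifier, re-run the crawler and check the table's classification.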