Why does Apache Drill change the column names when extractHeader is enabled?


I have the following data in the CSV file:

PRODUCTID,PRODUCTNAME,SUPPLIERID,CATEGORYID,UNIT,PRICE
1,Chais,1,1,10 boxes x 20 bags,18
2,Chang,1,1,24 - 12 oz bottles,19
3,Aniseed Syrup,1,2,12 - 550 ml bottles,10
4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22
5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35

I've enabled extractHeader in the csv format config of the dfs storage plugin.
Apache Drill version: apache-drill-1.21.0
No. of drillbits: Single
OS: Windows

When I query the CSV file with the following query:

SELECT * FROM dfs.`/var/lib/PRODUCT.csv`

Case 1:
The output is shown in the screenshot (image: drill-output).

Why does Drill change the ID column name like that?

Case 2: Drill makes further modifications when a column name contains special characters.
For example:

#UNITS is changed to col_UNITS

FINANCIAL$RECORD is changed to FINANCIAL_RECORD

Are there specific criteria on which these changes are made?


The problem is that a SELECT query using the original column names returns no output. I've gone through the documentation and the Apache Drill JIRAs but didn't find anything helpful.
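
As a stopgap, a query can reference the names Drill reports and alias them back to the original headers. This is only a minimal sketch: `/var/lib/SALES.csv` is a hypothetical file assumed to contain the `#UNITS` and `FINANCIAL$RECORD` headers from Case 2, and it does not explain the renaming rule itself.

```sql
-- Sketch only: /var/lib/SALES.csv is a hypothetical file whose header row
-- contains #UNITS and FINANCIAL$RECORD (the Case 2 examples).
-- Select by the names Drill reports, then alias back to the original headers.
-- Backticks quote identifiers that contain special characters.
SELECT col_UNITS        AS `#UNITS`,
       FINANCIAL_RECORD AS `FINANCIAL$RECORD`
FROM dfs.`/var/lib/SALES.csv`
```

Running `SELECT * ... LIMIT 1` on the file first shows which names Drill has actually assigned, so they don't have to be guessed.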

Thanks in advance.
