I have a Parquet file in a Google Cloud Storage bucket that I want to query with Drill.
Following one of the answers, I added this configuration to core-site.xml under $DRILL_HOME/conf:
<configuration>
<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
<name>fs.gs.project.id</name>
<value><my_project_id></value>
</property>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
</property>
<property>
<name>google.cloud.auth.service.account.json.keyfile</name>
<value><path_to_json></value>
</property>
</configuration>
Then I added this to storage-plugins-override.conf:
{
"name": "gcs",
"config": {
"connection": "gs://<my_bucket>",
"enabled": true,
"formats": {
"json": {
"type": "json"
}
}
}
}
After saving this, I restarted Drill. When I run show schemas;, the gcs schema does not appear, which blocks me from querying the Parquet file in GCS.
When I run use gcs;
I get this error:
Error: SYSTEM ERROR: ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
Then I checked sqlline.log, which shows:
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
What am I missing? Thanks in advance.
I also tried creating a storage plugin for gcs in the Drill web UI. The configuration looks like this:
{
"type": "file",
"connection": "gs://my-bucket",
"config": {
"store.format": "parquet"
},
"formats": {
"parquet": {
"type": "parquet"
}
},
"enabled": true
}
That does not work either.
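The ClassNotFoundException means the jar that provides com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem is not on Drill's classpath. You should install Google's Cloud Storage connector for Hadoop into the jars/3rdparty directory on each of your Drillbits, then restart them so the new jar is picked up.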
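As a rough sketch of the install step, the helper below copies a connector jar that has already been downloaded into Drill's jars/3rdparty directory. The function name install_gcs_connector and both of its arguments (the jar location and the Drill install root) are hypothetical; adjust them to your environment.

```shell
#!/bin/sh
# Hypothetical helper: copy an already-downloaded GCS connector jar
# into the jars/3rdparty directory of a Drill installation.
install_gcs_connector() {
    jar_path="$1"      # assumed: where you saved the connector jar
    drill_home="$2"    # assumed: root of the Drill installation
    target_dir="$drill_home/jars/3rdparty"

    # Fail early if the jar or the target directory is missing.
    [ -f "$jar_path" ] || { echo "connector jar not found: $jar_path" >&2; return 1; }
    [ -d "$target_dir" ] || { echo "missing directory: $target_dir" >&2; return 1; }

    cp "$jar_path" "$target_dir/" &&
        echo "installed $(basename "$jar_path") into $target_dir"
}
```

Run it on every Drillbit node, for example install_gcs_connector /path/to/connector.jar "$DRILL_HOME", and then restart Drill (for a distributed install, $DRILL_HOME/bin/drillbit.sh restart) before running show schemas; again.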