Apache Drill - Unable to start Drill in distributed mode (In GCP Dataproc)


I am trying to run Apache Drill in distributed mode on Google Cloud Dataproc, but I am unable to start a Drillbit on each node in the cluster.

I have created a basic cluster (1 master, 2 workers) with the GCP Dataproc service, using the initialization scripts and instructions provided on the Apache Drill website.

Installing Drill in Distributed Mode in Dataproc

The setup script was configured with Apache Drill 1.19.0 and Apache ZooKeeper 3.6.3. Cluster provisioning in Dataproc was successful, and I am able to connect to each node using SSH. When I checked the status of ZooKeeper by running telnet localhost 2181 and entering stats, it showed the following:

Zookeeper Status

I then tried to start the drillbit service on each node using the command bin/drillbit.sh start, as described in Starting Drill in Distributed Mode, which printed:

Starting drillbit, logging to /opt/drill/log/drillbit.out

When I check the status of Drill using bin/drillbit.sh status, it displays:

/opt/drill/drillbit.pid file is present but drillbit is not running.

Kindly provide help on how to resolve the issue and set up Apache Drill in distributed mode.


There is 1 best solution below

Dzamo Norton

I don't know Dataproc, but the contributed scripts you're using, specifically automation.sh and apache-drill.sh, already contain commands to start ZooKeeper and Drill. So you shouldn't be using drillbit.sh to start up Drillbits yourself. You can check whether Drill is running by going to its web UI at http://[drillbit-host]:8047. Note that there is no master node in a Drill cluster, so you can use the host name of any one of the Drillbits in the web UI URL.
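If you would rather check from an SSH session than a browser, something along these lines might do as a rough sketch. It assumes curl is installed on the node and that the web UI is on port 8047 as above. The function names drill_ui_url and drillbit_is_up are made up for illustration and are not part of Drill or of the contributed scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the web UI URL for one Drillbit.
# 8047 is the web UI port mentioned above; pass a second argument to override it.
drill_ui_url() {
  local host="$1"
  local port="${2:-8047}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Hypothetical helper: exit status 0 if something answers HTTP on that URL
# without an error status, non-zero otherwise (connection refused, timeout, 4xx/5xx).
drillbit_is_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$(drill_ui_url "$@")"
}
```

For example, `drillbit_is_up localhost && echo up || echo "not reachable"` run on each node in turn. A non-zero result only tells you that nothing answered on that port, so /opt/drill/log/drillbit.out is still the place to look for the reason.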

Footnote: Drill has moved on a bit since 1.19 so you might try making the following change on line 10 of apache-drill.sh.

readonly DRILL_VERSION='1.21.1'
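If you prefer to script that edit rather than change the file by hand, a minimal sketch could look like the following. It assumes you are working on a local copy of apache-drill.sh and that the version line has exactly the form shown above. The function name set_drill_version is hypothetical, and matching on the line's text rather than on line number 10 is my own choice in case the line has moved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the DRILL_VERSION line in a local copy of the script.
# Usage: set_drill_version <path-to-script> <version>
# A backup of the original is kept next to it with a .bak suffix.
set_drill_version() {
  local script="$1"
  local version="$2"
  sed -i.bak "s/^readonly DRILL_VERSION=.*/readonly DRILL_VERSION='${version}'/" "$script"
}
```

Calling `set_drill_version apache-drill.sh 1.21.1` and then `grep DRILL_VERSION apache-drill.sh` should show whether the substitution took effect. The modified script would then need to be the copy that your cluster creation actually uses.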