Configure Apache Drill to read XML files in the MapR distribution


I have a project where I need to read XML files with Apache Drill in order to process them. Can someone tell me how to configure it? NB: I use the MapR distribution.

I tried to add the configuration through the configuration UI, but I get an error (see image). Thanks in advance.


There are 2 best solutions below

Dzamo Norton On BEST ANSWER

You'll need to use a Drill distribution based on Apache Drill >= 1.19 for the XML format plugin.
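For reference, here is a minimal sketch of what enabling the XML format on a file-system storage plugin might look like, written in Python so the resulting JSON can be pasted into the Storage tab of the Drill web UI. The `xml` entry with `type`, `extensions`, and `dataLevel` follows the Drill XML format plugin documentation as I understand it; the helper name `add_xml_format` and the starting config are illustrative assumptions, so check the result against your own plugin definition.

```python
import copy
import json


def add_xml_format(plugin_config, data_level):
    """Return a copy of a storage plugin config with an XML format entry added.

    The shape of the "xml" entry is an assumption based on the Drill >= 1.19
    XML format plugin docs; verify the option names for your Drill version.
    data_level is the nesting level at which the useful records start.
    """
    updated = copy.deepcopy(plugin_config)
    formats = updated.setdefault("formats", {})
    formats["xml"] = {
        "type": "xml",
        "extensions": ["xml"],
        "dataLevel": data_level,
    }
    return updated


# Hypothetical starting point: a bare file-system storage plugin.
base_config = {
    "type": "file",
    "connection": "maprfs:///",
    "workspaces": {},
    "formats": {},
}

xml_enabled = add_xml_format(base_config, data_level=2)
print(json.dumps(xml_enabled, indent=2))
```

If the UI still rejects an `xml` format entry like this one, the most likely cause is that the installed Drill predates 1.19 and does not know the format type at all.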

Ted Dunning On

So this is more of a Drill question than a MapR question.

There are two key steps here:

  • make sure that Drill can access whatever you use to store your data (it sounds like your data is XML files in MapR, which is now called HPE Ezmeral Data Fabric)

  • make sure that Drill can understand the data you have. I am not current on Drill, but reading many kinds of XML should be doable.

There are two major paths to accessing files on Ezmeral Data Fabric. One path is to mount the data fabric as a conventional file system on all the nodes running Drillbits. This is often done using NFS mounts, but can also be done with the FUSE driver provided with the data fabric.

The other major approach to getting data access is to use the HDFS API framework to access data via maprfs://... path names. This requires installing the data fabric client on all of the nodes running Drillbits.
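To make the difference between the two access paths concrete, here is a rough Python sketch that builds a Drill file-system storage plugin config for each. The `connection` and `workspaces` fields are standard Drill storage plugin options; the workspace name `xmldata`, the cluster name `my.cluster.com`, and both directory paths are made up for illustration.

```python
import json


def make_dfs_config(connection, data_root):
    """Build a minimal file-system storage plugin config for Drill.

    connection: "file:///" for a locally mounted file system (NFS or FUSE),
                or "maprfs:///" when going through the HDFS API.
    data_root:  directory holding the XML files (hypothetical paths below).
    """
    return {
        "type": "file",
        "connection": connection,
        "workspaces": {
            "xmldata": {
                "location": data_root,
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        "formats": {},
    }


# Path 1: data fabric mounted as an ordinary file system on every Drillbit
# node. The /mapr/<cluster name> mount point is an assumed convention.
mounted = make_dfs_config("file:///", "/mapr/my.cluster.com/data/xml")

# Path 2: HDFS API access through the data fabric client.
via_api = make_dfs_config("maprfs:///", "/data/xml")

print(json.dumps(mounted, indent=2))
print(json.dumps(via_api, indent=2))
```

Either way, Drill only sees a directory of files; the choice mostly affects what has to be installed and mounted on the Drillbit nodes.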

It sounds like you are running the version of Drill that is packaged with the old MapR or current HPE Ezmeral system. This is the easiest approach since the packaged version is integrated with the client libraries needed to use the HDFS API with maprfs:// resources (it also provides access to the tables and streams in the data fabric).
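Once Drill can see the files and understands the format, reading one is an ordinary SQL query. As a sketch, the Python below prepares such a query for Drill's REST endpoint (`/query.json`) without sending it. The host, port 8047, the `dfs` plugin name, and the file path are assumptions, and a secured cluster will also need authentication that is not shown here.

```python
import json
import urllib.request


def build_query_request(drill_url, sql):
    """Prepare (but do not send) a POST request for Drill's REST query API."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=drill_url.rstrip("/") + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file path; backticks quote the path inside Drill SQL.
sql = "SELECT * FROM dfs.`/data/xml/sample.xml` LIMIT 10"
request = build_query_request("http://localhost:8047", sql)

print(request.full_url)
print(request.data.decode("utf-8"))

# To run it against a live Drillbit, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same query can be pasted into the Query tab of the web UI or run from `sqlline`, which is usually the quicker way to test a new format configuration.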