Hive Create table over S3 in RIAK CS


I have the Hive service running on a Hadoop cluster. I'm trying to create a Hive table over Eucalyptus (Riak CS) S3 data. I have configured the AccessKeyID and SecretAccessKey in core-site.xml and hive-site.xml. When I execute the CREATE TABLE command and specify the S3 location using the s3n scheme, I get the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.http.conn.ConnectTimeoutException: Connect to my-bucket.s3.amazonaws.com:443 timed out)

If I try using the s3a scheme, I get this error instead:

FAILED: AmazonClientException Unable to load AWS credentials from any provider in the chain

I was able to change the endpoint URL for the distcp command using JetS3t, but the same approach didn't work for Hive. Any suggestions for pointing Hive at the Eucalyptus S3 endpoint are welcome.
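For reference, the CREATE TABLE statement I'm running looks roughly like this (the bucket path and column list below are illustrative placeholders, not my real schema):

```sql
-- Hypothetical sketch; my-bucket, the directory, and the columns are placeholders.
CREATE EXTERNAL TABLE my_table (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3n://my-bucket/data/';
```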

2 Answers

kuenishi

I'm not familiar with Hive, but as far as I know it uses MapReduce as its backend processing system. MapReduce uses JetS3t as its S3 connector, and changing the JetS3t configuration worked for me in both MapReduce and Spark. Hope this helps: http://qiita.com/kuenishi/items/71b3cda9bbd1a0bc4f9e

Configurations like these might work for you:

s3service.https-only=false
s3service.s3-endpoint=yourdomain.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8080
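JetS3t reads these settings from a jets3t.properties file found on the classpath, so one way to apply them is to write that file and drop it into the Hadoop configuration directory (the endpoint, ports, and paths below are assumptions; adjust for your Riak CS setup and distribution):

```shell
# Write the JetS3t endpoint overrides (endpoint and ports are placeholder values).
cat > jets3t.properties <<'EOF'
s3service.https-only=false
s3service.s3-endpoint=yourdomain.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8080
EOF
# Then place it somewhere on the Hadoop/Hive classpath, e.g.:
#   cp jets3t.properties "$HADOOP_CONF_DIR"/
```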

Veronica

I have upgraded to HDP 2.3 (Hadoop 2.7), and now I'm able to use the s3a scheme for Hive-to-S3 access.
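For anyone else in the same situation, a core-site.xml sketch along these lines should point s3a at a non-AWS endpoint. The property names are the standard Hadoop 2.7 s3a keys; the endpoint host and credential values are placeholders for your Riak CS deployment:

```xml
<!-- Sketch: replace the endpoint and keys with your own Riak CS values. -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.yourdomain.com</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```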