Apologies beforehand if this turns out to be a silly question; I am new to the Hadoop environment.

I have two Hadoop clusters, `my-prod-cluster` and `my-bcp-cluster`, both accessible over the same network.

Is there any way to configure my clusters such that, when I am in BCP mode, all my queries to `my-prod-cluster` get routed to `my-bcp-cluster` (on the basis of some config parameter or environment variable)?
So when `flag=prod`:

`hadoop fs -ls /my-prod-cluster/mydir` translates to `hadoop fs -ls /my-prod-cluster/mydir` and fetches the data in `/my-prod-cluster/mydir`.

And when `flag=bcp`:

`hadoop fs -ls /my-prod-cluster/mydir` translates to `hadoop fs -ls /my-bcp-cluster/mydir` and fetches the data from `/my-bcp-cluster/mydir`.
I am using the [MapR][1] flavour of Hadoop (provided by HPE), version 6.1, in case that matters.
You could easily make a shell wrapper script that prepends the NameNode address to each query.

For example, a fully-qualified command would look like this (a sketch; the host name and the default NameNode RPC port 8020 are placeholders for your actual cluster address):
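```bash
# list /mydir on the prod cluster, addressed explicitly
hadoop fs -ls hdfs://my-prod-cluster:8020/mydir
```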
So, refactoring that, you could have a script like the sketch below (the cluster addresses are illustrative; the script name `my-hdfs-ls` matches the invocation that follows):
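```bash
#!/bin/bash
# my-hdfs-ls: run "hadoop fs -ls" against the cluster selected by the flag.
# Usage: my-hdfs-ls <prod|bcp> <path>

case "$1" in
  prod) CLUSTER="hdfs://my-prod-cluster:8020" ;;
  bcp)  CLUSTER="hdfs://my-bcp-cluster:8020" ;;
  *)    echo "Usage: $0 <prod|bcp> <path>" >&2; exit 1 ;;
esac

hadoop fs -ls "${CLUSTER}$2"
```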
Then execute something like
```bash
my-hdfs-ls prod /mydir
```

If you need something more complex than that, like Kerberos tickets and such, then creating a separate `HADOOP_CONF_DIR` variable with unique `core-site.xml` and `hdfs-site.xml` XMLs for each cluster would be recommended.
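For instance, a minimal sketch of that approach, assuming you have copied each cluster's client configs into per-cluster directories (the paths below are hypothetical):

```bash
# Point the Hadoop client at the BCP cluster's configs
export HADOOP_CONF_DIR=/etc/hadoop/conf.bcp
hadoop fs -ls /mydir    # resolves against my-bcp-cluster

# Switch back to prod
export HADOOP_CONF_DIR=/etc/hadoop/conf.prod
hadoop fs -ls /mydir    # resolves against my-prod-cluster
```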