I am looking for a way to get functionality in GCP similar to what Cloudera Backup and Disaster Recovery (BDR) offers for AWS.
Will the below approach work?
- adding GCP connector to an on-prem Cloudera cluster
- then copying with
hadoop distcp
- then syncing hdfs source directory to gcs directory with
gsutil rsync [OPTION]... src_url dst_url
If the above approach is not possible then is there any other alternative to achieve Cloudera BDR in Google Cloud Storage (GCS)?
At the moment, Cloudera Manager's Backup and Disaster Recovery does not support Google Cloud Storage; this is listed in its limitations. Please check the full documentation on Configuring Google Cloud Storage Connectivity.
The above approach will work. We just need to add a few steps to begin with:
- Use DistCp commands to move your data.
For more detailed information, you may check the full documentation on Using DistCp to copy your data to Cloud Storage.
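As a rough sketch of the DistCp step, assuming the Cloud Storage connector is already installed and configured on the cluster: the HDFS path and bucket name below are hypothetical placeholders, not values from the question. The script only prints the command unless RUN=1 is set, so it can be reviewed before anything is copied.

```shell
# Sketch only: SRC and DST defaults are made-up placeholders; set your own.
SRC="${SRC:-hdfs:///data/warehouse}"
DST="${DST:-gs://example-backup-bucket/warehouse}"

# Print the DistCp invocation for a given source and destination URI.
# -update copies only files that are missing or differ at the destination.
distcp_cmd() {
  printf 'hadoop distcp -update %s %s\n' "$1" "$2"
}

# Show what would be run.
distcp_cmd "$SRC" "$DST"

# Execute only when explicitly requested, e.g. RUN=1 sh copy_to_gcs.sh
if [ "${RUN:-0}" = "1" ]; then
  hadoop distcp -update "$SRC" "$DST"
fi
```

Printing first is a deliberate choice here: with -update the contents of the source directory land directly under the destination path, so it is worth checking both URIs before a large copy.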
Google also has its own BDR and you can check this Data Recovery planning guide.
Please be advised that Google Cloud Storage cannot be the default file system for the cluster.
You can also check this link: Working with Google Cloud partners
You could use the connector in any of the following ways:
- the gs:// prefix
- hadoop fs -ls gs://bucket/dir/file
- the gsutil cp or gsutil rsync commands
You can check the full documentation on using connectors.
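For illustration, the hadoop fs and gsutil rsync options might be combined as below. This is a sketch under assumptions: the bucket and local directory are hypothetical, and the rsync source is a local path because gsutil works on local files and bucket URLs rather than hdfs:// URIs. Nothing runs unless RUN=1 is set.

```shell
# Sketch only: both defaults are made-up placeholders; set your own.
BUCKET_DIR="${BUCKET_DIR:-gs://example-backup-bucket/warehouse}"
LOCAL_DIR="${LOCAL_DIR:-/mnt/export/warehouse}"

# Print a listing command that goes through the Cloud Storage connector.
ls_cmd() {
  printf 'hadoop fs -ls %s\n' "$1"
}

# Print a recursive sync command from a local directory to the bucket.
rsync_cmd() {
  printf 'gsutil rsync -r %s %s\n' "$1" "$2"
}

# Show what would be run.
ls_cmd "$BUCKET_DIR"
rsync_cmd "$LOCAL_DIR" "$BUCKET_DIR"

# Execute only when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  hadoop fs -ls "$BUCKET_DIR"
  gsutil rsync -r "$LOCAL_DIR" "$BUCKET_DIR"
fi
```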
Let me know if you have questions.