Using DataLad with Google Cloud Storage


I am new to DataLad. I want to keep version history and capture commit details for every person who makes changes to my DataLad dataset.

So far, I have created a sibling of my local dataset on a Google Cloud Storage (GCS) bucket and can export the DataLad dataset to that GCS sibling.

Specifically, I am trying to achieve the following:

  1. Whenever files change in my DataLad dataset, the commit should capture the details of the user who made the change.

Currently, commits carry the git config identity that I set during git installation. Is there a way to pass these values dynamically through DataLad when making a commit?

  2. I don’t want my local disk to maintain the history of the files; I want only the metadata and version history, stored in a GCS bucket.

Currently, I can push all files and folders (except the .git folder, which contains the history) to the GCS sibling using the git-annex export command. Is there a way to push the version history to the GCS bucket and get insight from there, instead of storing everything locally?

  3. Most of the commands I am using are git-annex commands. Is there a DataLad API for the same operations?

Any insights will be helpful.
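On the first point: DataLad commits through git, so git's standard environment overrides can set the commit identity per invocation. A minimal sketch, assuming the `datalad` CLI is on PATH (the helper names here are my own):

```python
import os
import subprocess

def author_env(name: str, email: str) -> dict:
    """Environment overrides that make the next git commit use this identity."""
    return dict(
        os.environ,
        GIT_AUTHOR_NAME=name,
        GIT_AUTHOR_EMAIL=email,
        GIT_COMMITTER_NAME=name,
        GIT_COMMITTER_EMAIL=email,
    )

def save_as(name: str, email: str, message: str, dataset: str = ".") -> None:
    """Run `datalad save` so the resulting commit is attributed to `name`."""
    subprocess.run(
        ["datalad", "save", "-d", dataset, "-m", message],
        env=author_env(name, email),
        check=True,
    )
```

The same effect is possible without a wrapper by prefixing the shell call, e.g. `GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com datalad save -m "..."`.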
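On the second point: `git annex export` deliberately publishes only the worktree, not `.git`. One workaround (plain git, not a DataLad feature) is to snapshot the full history into a single `git bundle` file and upload it next to the export. A sketch in which the bucket and bundle names are placeholders:

```python
def history_backup_cmds(bucket: str, bundle: str = "history.bundle") -> list:
    """Commands that pack the full git history into one file and upload it
    to a GCS bucket with gsutil. Run them from the dataset root."""
    return [
        # --all bundles every branch and tag, i.e. the complete history
        ["git", "bundle", "create", bundle, "--all"],
        ["gsutil", "cp", bundle, f"gs://{bucket}/{bundle}"],
    ]
```

The history can later be restored anywhere with `git clone history.bundle`, so the local clone does not have to remain the only copy of it.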
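On the third point: DataLad commands are also exposed as Python functions in the `datalad.api` module. A rough correspondence for the operations mentioned (the function names are real DataLad API entry points, but exact parameters vary by version, so check the documentation for yours):

```python
# CLI operation -> datalad.api equivalent (called as e.g. datalad.api.save(...))
DATALAD_API_EQUIVALENTS = {
    "git add + git commit": "datalad.api.save",
    "git annex get": "datalad.api.get",
    "git annex export --to <sibling>": "datalad.api.push",  # push handles export-type siblings
    "git status": "datalad.api.status",
    "git clone": "datalad.api.clone",
}
```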


There is 1 answer below.

Answered by Chris32:

Since, as I understand it, a DataLad history file is a plain-text file, I can say for your third question that you can consume a text file from Cloud Storage without downloading it locally. You can do this by accessing the file through its storage URL, i.e.: "https://storage.cloud.google.com/{MyBucket}/{MytxtFile}.txt"

From there you can fetch the content dynamically: a GET request to that URL returns the file content.
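A minimal Python sketch of that GET. Note that `storage.cloud.google.com` is the browser endpoint, which may redirect to a login page; for programmatic access the direct endpoint `storage.googleapis.com` is usually used, and the object must be readable by the caller (public, or with credentials attached):

```python
from urllib.parse import quote
from urllib.request import urlopen

def gcs_object_url(bucket: str, obj: str) -> str:
    """Direct download URL for an object in a GCS bucket."""
    return f"https://storage.googleapis.com/{quote(bucket)}/{quote(obj)}"

def fetch_text(bucket: str, obj: str) -> str:
    """GET the object and decode it as UTF-8 text; nothing is written to disk."""
    with urlopen(gcs_object_url(bucket, obj)) as resp:
        return resp.read().decode("utf-8")
```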

Now, it would be useful if you shared an example of what exactly you want to achieve, i.e. which commands you are using. Per the DataLad get documentation, it seems to expect a local file, and I'm not sure you could make it work without one (e.g. through curl).

A possible middle ground between Cloud Storage and local files is Cloud Storage FUSE, which lets you mount Cloud Storage buckets as file systems on Linux or macOS. You can then access and manipulate your files locally, and these changes will be reflected in the bucket.
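A sketch of that mount step, assuming the `gcsfuse` binary is installed and authenticated (bucket and mount-point names are placeholders):

```python
import subprocess

def gcsfuse_mount_cmd(bucket: str, mountpoint: str) -> list:
    """Command that mounts a GCS bucket at a local directory via gcsfuse.
    --implicit-dirs makes objects with '/' in their names appear as folders."""
    return ["gcsfuse", "--implicit-dirs", bucket, mountpoint]

def mount(bucket: str, mountpoint: str) -> None:
    """Run the mount; afterwards the bucket contents appear under `mountpoint`."""
    subprocess.run(gcsfuse_mount_cmd(bucket, mountpoint), check=True)
```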