I have a project that uses several databases. To avoid versioning huge files in git, I used DVC to manage them on gdrive.
I followed these steps in DVC:
Start DVC (dvc init)
dvc add #dataset zip#
dvc remote add --default #drive_name# gdrive://#Folder ID#
dvc push
I did this for each dataset. But when I try to download one dataset individually with
dvc pull --remote #drive_name#
it simply downloads all the files to my machine, not just the ones I specified. I've already run dvc remote list, and I can even see in gdrive that the files are stored separately. Why can't I get them individually?
If you need to store certain parts of the DVC project in one remote and other parts in a different remote storage, there are two ways to do this (or a mix of the two).
Option 1: set the remote: field in the .dvc files or in dvc.yaml.
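As a rough sketch of what that field could look like, assuming a remote named myremote has already been created with dvc remote add (the file names data.zip and data.xml, the stage name prepare, and its command are hypothetical placeholders). In a .dvc file:

```yaml
# data.zip.dvc (hypothetical; the hash and size fields written by dvc add are omitted)
outs:
- path: data.zip
  remote: myremote   # name of a remote created with dvc remote add
```

And the equivalent for a stage output in dvc.yaml:

```yaml
stages:
  prepare:                  # hypothetical stage name
    cmd: python prepare.py  # hypothetical command
    outs:
    - data.xml:
        remote: myremote    # this output is pushed to and pulled from myremote
```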
In this case, you don't have to use --remote for dvc pull or dvc push. DVC will know automatically which remote to use for each dataset, model, or output in general.
Option 2: use --remote. In this case (and this is probably where the issue is in your setup), you need to always use dvc push carefully to avoid pushing all data to the default remote storage by mistake. Always run dvc push --remote <dataset>. Better still, don't use --default and don't specify a default remote at all. As you can see, this can be a bit tedious, to be honest.
In both options, I would avoid creating a default remote (unless you have some objects that you always want to go to some default). Also, yes, you still need to use dvc remote add ... commands to create all these named remotes.