Is double disk storage required in JFrog Artifactory for two files with the same MD5 value?


For instance, I upload FILE1 to Artifactory at PATH1. Then I upload FILE1 (the same file) to Artifactory at PATH2 (a different path).

Will these two share the same disk storage, since they have the same MD5 value?

I hope Artifactory can treat them as the same file to save disk storage, for example by storing only the original file on disk and maintaining a reference count.

There are 2 best solutions below

0
Hanan On BEST ANSWER

Artifactory uses checksum-based storage: files are stored by their SHA-1 checksum, so each unique file is stored only once in the filestore but may have several references in the database. Two identical files uploaded to different paths therefore share the same storage.
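To illustrate the idea, here is a toy model of checksum-based storage in plain shell. It is only a sketch and not Artifactory's actual implementation: the STORE directory, its filestore/refs layout, and the put_file function are all hypothetical names made up for this example, and it assumes sha1sum from GNU coreutils is available.

```shell
#!/bin/sh
# Toy model of checksum-based storage (NOT Artifactory's real implementation).
# STORE, the filestore/refs layout and put_file are hypothetical names.
STORE="${STORE:-$(mktemp -d)}"
mkdir -p "$STORE/filestore" "$STORE/refs"

put_file() {
    # $1 = local file to upload, $2 = logical repository path
    src=$1
    ref=$2
    sum=$(sha1sum "$src" | cut -d' ' -f1)
    # Store the content only if this checksum has not been seen before.
    [ -e "$STORE/filestore/$sum" ] || cp "$src" "$STORE/filestore/$sum"
    # Record the path -> checksum reference (standing in for the database row).
    mkdir -p "$STORE/refs/$(dirname "$ref")"
    printf '%s\n' "$sum" > "$STORE/refs/$ref"
}
```

Calling put_file twice with the same file but different paths (say PATH1/FILE1 and PATH2/FILE1) leaves one blob under filestore and two small reference entries under refs, which is the behaviour the question is hoping for.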

1
Bruno On

You can deploy the file by checksum instead of uploading it in full a second time.

Something like this should work:

curl -X PUT "${repository}/${newPath}" -H "X-Checksum-Deploy: true" -H "X-Checksum-Sha1: ${sha1checksum}" -H "Content-Length: 0"

You can either add the credentials to curl manually (e.g. -u ...:...), or let the JFrog CLI do it for you if you're using it (jf rt curl ... should handle that).

This only seems to work for SHA-1 or SHA-256, but I'm assuming if you're able to compute the MD5 checksum of your initial file, you should be able to compute the SHA-1/SHA-256 quite easily too.
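As a rough sketch of how the checksum could be computed and fed into the deploy-by-checksum request, something like the following might work. The function names sha1_of, sha256_of and deploy_by_checksum are made up for this example, it assumes sha1sum and sha256sum from GNU coreutils (on macOS, shasum would be the closest equivalent), and only the SHA-1 header from the command above is used.

```shell
#!/bin/sh
# Helper names below are hypothetical; they are not part of any JFrog tool.

sha1_of() {
    # Print only the hex SHA-1 digest of the file given as $1.
    sha1sum "$1" | cut -d' ' -f1
}

sha256_of() {
    # Print only the hex SHA-256 digest of the file given as $1.
    sha256sum "$1" | cut -d' ' -f1
}

deploy_by_checksum() {
    # $1 = local file, $2 = target URL (repository URL plus the new path).
    # Any remaining arguments (e.g. -u user:password) are passed to curl.
    file=$1
    url=$2
    shift 2
    curl "$@" -X PUT "$url" \
        -H "X-Checksum-Deploy: true" \
        -H "X-Checksum-Sha1: $(sha1_of "$file")"
}

# Example call (not executed here; the URL is a placeholder):
# deploy_by_checksum ./FILE1 "${repository}/${newPath}" -u ...:...
```

No file body is sent, so the request can only succeed if Artifactory already holds content with that checksum.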

I'm not sure what happens if you actually upload the file a second time, but I would assume Artifactory still saves space internally by comparing the checksums. (This may also depend on some checksum policy settings.)