Why does the Gradle cache contain dependencies multiple times?


We upload our Gradle caches as a ZIP to S3 from GitLab pipeline jobs. Unpacking one of those ZIP files (which contains just the .gradle folder) showed that many dependencies are present more than once in the exact same version: once in jars-9 and once in modules-2.

Why does this happen, and how can we avoid it? Our CI caches are 20-30% bigger than they need to be because of this, especially for big dependencies like the Kotlin compiler.

The size differences between the JARs can be attributed to JAR file compression being on or off; content-wise they are identical.

The official explanation on how the .gradle folder is structured did not help.

VonC On BEST ANSWER

Gradle's dependency cache is designed for efficiency and reliability. It consists of two storage types, both located under $GRADLE_USER_HOME/caches:

  1. A file-based store of downloaded artifacts, including binaries such as JARs and raw downloaded metadata such as POM and Ivy files. The storage path of a downloaded artifact includes its SHA1 checksum, so two artifacts with the same name but different content can be cached side by side.

  2. A binary store of resolved module metadata, including the results of resolving dynamic versions, module descriptors, and artifacts.
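To make the checksum-in-the-path idea concrete, here is a minimal sketch that splits a path from the file-based store into its parts. It assumes the layout usually seen on disk, `modules-2/files-2.1/<group>/<name>/<version>/<sha1>/<file>`; the names `CachedArtifact` and `parse_files_store_path` are made up for this illustration and are not part of Gradle.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class CachedArtifact(NamedTuple):
    group: str
    name: str
    version: str
    checksum_dir: str  # directory named after the file's SHA1 checksum
    file_name: str


def parse_files_store_path(path: str) -> Optional[CachedArtifact]:
    """Split a path inside modules-2/files-2.1 into its components.

    Assumed layout: .../files-2.1/<group>/<name>/<version>/<sha1>/<file>
    Returns None when the path does not have that shape.
    """
    parts = PurePosixPath(path).parts
    if "files-2.1" not in parts:
        return None
    tail = parts[parts.index("files-2.1") + 1:]
    if len(tail) != 5:
        return None
    return CachedArtifact(*tail)
```

Two files with the same name and version but different content would differ only in the `checksum_dir` component, which is what lets them coexist in the cache.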

Both of these stores live under the modules-2 directory. Downloaded files sit in modules-2/files-2.1, in a path that includes the SHA1 checksum, and the resolved module metadata sits in a binary metadata-* subdirectory next to it. Keying downloads by checksum lets Gradle cache two artifacts with the same name but different content, and it avoids downloading an artifact again when it is already present with the same checksum.

The jars-* directory is not one of those two stores. It appears to hold processed copies of JARs that end up on a classpath, which is why the same library can show up there a second time.

The two directories are distinct because they serve different purposes, even when some of the files inside them look identical.

[Image: Gradle architecture]

As discussed with Vampire in the comments, I added:

When Gradle resolves a dependency, it downloads the dependency's JAR file and stores it in the modules-2 directory.

If a classpath transformation is applied to the dependency, the result of the transformation (which could be identical to the original if an identity transformation is used) is stored in a jars-* directory. This allows Gradle to cache the result of the transformation, avoiding the need to perform the transformation again if the same dependency is used with the same transformation in the future.

And that would explain the identical content.
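If you want to check how many JARs really are content-identical copies, a small script can compare them entry by entry instead of by file size. This is only a sketch: it assumes the downloaded JARs sit under modules-2/files-2.1 and the copies under jars-*, and it matches candidates by file name. The function names are made up for this example.

```python
import zipfile
from pathlib import Path


def jar_fingerprint(jar_path):
    """Return {entry name: (CRC-32, uncompressed size)} for every file entry.

    The CRC is computed over the uncompressed data, so the result does not
    depend on whether the JAR was written with compression on or off.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return {
            info.filename: (info.CRC, info.file_size)
            for info in jar.infolist()
            if not info.is_dir()
        }


def find_duplicate_jars(caches_dir):
    """Yield (modules-2 path, jars-* path) pairs whose entries all match."""
    caches = Path(caches_dir)
    downloaded = {}
    # Index the downloaded JARs by file name (assumed location).
    for jar in caches.glob("modules-2/files-2.1/**/*.jar"):
        downloaded.setdefault(jar.name, []).append(jar)
    # Compare every same-named JAR under jars-* against the downloads.
    for copy in caches.glob("jars-*/**/*.jar"):
        for original in downloaded.get(copy.name, []):
            if jar_fingerprint(original) == jar_fingerprint(copy):
                yield original, copy
```

Calling `list(find_duplicate_jars(Path.home() / ".gradle" / "caches"))` would list the pairs for a default Gradle user home. JARs that a transformation actually changed will not show up as pairs, which is the intended behavior here.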


In terms of reducing the size of your CI caches, the duplication of a dependency across cache directories is probably not avoidable, given the way Gradle's cache works.

However, you might be able to configure your CI/CD pipeline to cache only the directories it actually needs, or rely on techniques like incremental builds to reduce the amount of data that has to be cached. You might also consider cleaning the Gradle cache regularly, manually or programmatically, to remove unused or obsolete files.
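As one way to cache only what is needed, the sketch below zips just caches/modules-2 and leaves out lock files and gc.properties, which, as far as I understand Gradle's guidance on copying the dependency cache, should not be carried over. Whether dropping the jars-* directories is acceptable for your pipeline is an assumption: Gradle would have to recreate their contents on the next run. The function name is made up for this example.

```python
import zipfile
from pathlib import Path

# Files left out of the archive (assumption: they should not be copied
# between machines along with the dependency cache).
SKIPPED_NAMES = {"gc.properties"}
SKIPPED_SUFFIXES = {".lock"}


def pack_dependency_cache(gradle_user_home, archive_path):
    """Zip only caches/modules-2, skipping lock files and gc.properties.

    archive_path should point outside the cache directory.
    Returns the archived paths, relative to gradle_user_home.
    """
    home = Path(gradle_user_home)
    source = home / "caches" / "modules-2"
    archived = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            if path.name in SKIPPED_NAMES or path.suffix in SKIPPED_SUFFIXES:
                continue
            # Keep the caches/modules-2/... structure inside the archive.
            relative = path.relative_to(home).as_posix()
            archive.write(path, relative)
            archived.append(relative)
    return archived
```

The resulting ZIP can then be uploaded in place of the full .gradle folder and unpacked into the Gradle user home with the same directory structure before the next build.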