Checksum of checksums of a local file downloaded from S3 does not match SHA-256 checksum of the remote file


Background:

I have a fairly large file (around 2.5 GB) stored in AWS S3. I chose SHA-256 as the additional checksum algorithm when uploading it:

[Screenshot: the "Additional checksums" upload setting]

I then followed the official AWS user guide titled Checking object integrity. Specifically, I copied the validateExistingFileAgainstS3Checksum function from the "Using the AWS SDKs" section and made a few changes:

  • Extracted some code into a new function called getPartBreak, as suggested by my IDE.
  • Removed some System.out.print lines
  • Added my own logging while trying to make sense of what's going on
  • Refactored the error handling
  • I had to cast the long returned by getPartBreak to int, or the code wouldn't compile. I checked that this should not be an issue, since the value is never large enough to matter.

The Code

Here's what my code looks like now:

package app.service.vendor.amazon;

import app.service.vendor.amazon.exception.ChecksumValidationException;
import jakarta.inject.Inject;
import jakarta.inject.Singleton;
import org.slf4j.Logger;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectAttributesRequest;
import software.amazon.awssdk.services.s3.model.GetObjectAttributesResponse;
import software.amazon.awssdk.services.s3.model.ObjectAttributes;
import software.amazon.awssdk.services.s3.model.ObjectPart;

import java.io.*;
import java.nio.channels.FileChannel;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

@Singleton
public class S3ChecksumValidator {

    private final S3Client client;

    private final Logger logger;

    @Inject
    public S3ChecksumValidator(S3Client client, Logger logger) {
        this.client = client;
        this.logger = logger;
    }

    public boolean validateMultipartUpload(File file, String bucket, String s3Key) throws ChecksumValidationException {

        int chunkSize = 5 * 1024 * 1024;

        GetObjectAttributesResponse
                objectAttributes = client.getObjectAttributes(GetObjectAttributesRequest.builder().bucket(bucket).key(s3Key)
                .objectAttributes(ObjectAttributes.OBJECT_PARTS, ObjectAttributes.CHECKSUM).build());

        try (InputStream localInput = new FileInputStream(file)) {
            MessageDigest sha256ChecksumOfChecksums = MessageDigest.getInstance("SHA-256");
            MessageDigest sha256Part = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[chunkSize];
            int currentPart = 0;

            long partBreak = getPartBreak(objectAttributes, currentPart);
            int totalRead = 0;
            int read = localInput.read(buffer);
            while (read != -1) {
                totalRead += read;
                if (totalRead >= partBreak) {
                    int difference = totalRead - (int) partBreak;
                    byte[] partChecksum;
                    if (totalRead != partBreak) {
                        sha256Part.update(buffer, 0, read - difference);
                        partChecksum = sha256Part.digest();
                        sha256ChecksumOfChecksums.update(partChecksum);
                        sha256Part.reset();
                        sha256Part.update(buffer, read - difference, difference);
                    } else {
                        sha256Part.update(buffer, 0, read);
                        partChecksum = sha256Part.digest();
                        sha256ChecksumOfChecksums.update(partChecksum);
                        sha256Part.reset();
                    }
                    String base64PartChecksum = Base64.getEncoder().encodeToString(partChecksum);
                    if (!base64PartChecksum.equals(objectAttributes.objectParts().parts().get(currentPart).checksumSHA256())) {
                        logger.info(String.format("Part checksum of local file does not match s3 file '%s'.", s3Key));
                        return false;
                    }
                    currentPart++;
                    if (currentPart < objectAttributes.objectParts().totalPartsCount()) {
                        partBreak += objectAttributes.objectParts().parts().get(currentPart - 1).size();
                    }
                } else {
                    sha256Part.update(buffer, 0, read);
                }
                read = localInput.read(buffer);
            }
            logger.info(String.format("local parts: %s , remote parts: %s", currentPart + 1, objectAttributes.objectParts().totalPartsCount()));

            if (currentPart != objectAttributes.objectParts().totalPartsCount()) {
                currentPart++;
                byte[] partChecksum = sha256Part.digest();
                sha256ChecksumOfChecksums.update(partChecksum);
                String base64PartChecksum = Base64.getEncoder().encodeToString(partChecksum);
            }

            String base64CalculatedChecksumOfChecksums = Base64.getEncoder().encodeToString(sha256ChecksumOfChecksums.digest());

            if (!base64CalculatedChecksumOfChecksums.equals(objectAttributes.checksum().checksumSHA256())) {
                logger.info(String.format("Checksum of checksums of local file does not match s3 file '%s'.", s3Key));
                logger.info(String.format("%s vs %s", base64CalculatedChecksumOfChecksums, objectAttributes.checksum().checksumSHA256()));
                return false;
            }
        }
        catch (IOException | NoSuchAlgorithmException e) {
            String msg = String.format("Could not read local checksum - %s", e.getMessage());
            throw new ChecksumValidationException(msg, e);
        }

        return true;
    }

    private static long getPartBreak(GetObjectAttributesResponse objectAttributes, int currentPart) throws ChecksumValidationException {

        if(objectAttributes.objectParts() == null) {
            String msg = "Not a multipart upload - object attributes -> object parts is null";
            throw new ChecksumValidationException(msg);
        }

        List<ObjectPart> parts = objectAttributes.objectParts().parts();

        if(parts.isEmpty()) {
            String msg = "File was uploaded without checksum algorithm - object attributes -> object parts is empty";
            throw new ChecksumValidationException(msg);
        }

        return parts.get(currentPart).size();
    }
}
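For comparison, here is a minimal, stdlib-only sketch of the same checksum-of-checksums calculation. It assumes the part sizes have already been collected, in part-number order, from the ObjectPart.size() values of all parts; CompositeSha256, compute and Result are hypothetical names used only for illustration. Each part is hashed on its own, and the raw (not Base64) part digests are then fed into a second SHA-256 digest. The remaining byte count of each part is tracked in a long, because a 2.5 GB file holds more bytes than Integer.MAX_VALUE (2,147,483,647), so an int running total cannot represent every offset in it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: recomputes the SHA-256 "checksum of checksums" locally.
final class CompositeSha256 {

    /** Base64 digest of every part, plus the Base64 checksum of checksums. */
    static final class Result {
        final List<String> partChecksums;
        final String checksumOfChecksums;

        Result(List<String> partChecksums, String checksumOfChecksums) {
            this.partChecksums = partChecksums;
            this.checksumOfChecksums = checksumOfChecksums;
        }
    }

    /**
     * Hashes each part separately, then hashes the concatenated raw part digests.
     * partSizes is assumed to list every part's size in part-number order.
     */
    static Result compute(InputStream in, List<Long> partSizes)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest checksumOfChecksums = MessageDigest.getInstance("SHA-256");
        List<String> partChecksums = new ArrayList<>();
        byte[] buffer = new byte[64 * 1024];

        for (long partSize : partSizes) {
            MessageDigest partDigest = MessageDigest.getInstance("SHA-256");
            long remaining = partSize; // long, so values past 2 GiB cannot overflow
            while (remaining > 0) {
                // Never read past the end of the current part.
                int toRead = (int) Math.min(buffer.length, remaining);
                int read = in.read(buffer, 0, toRead);
                if (read == -1) {
                    throw new IOException("Stream ended before all listed parts were read");
                }
                partDigest.update(buffer, 0, read);
                remaining -= read;
            }
            byte[] digest = partDigest.digest();
            checksumOfChecksums.update(digest); // raw digest bytes, not Base64 text
            partChecksums.add(Base64.getEncoder().encodeToString(digest));
        }
        if (in.read() != -1) {
            throw new IOException("Stream has more bytes than the listed parts cover");
        }
        return new Result(partChecksums,
                Base64.getEncoder().encodeToString(checksumOfChecksums.digest()));
    }
}
```

The caller would then compare each entry of partChecksums with the matching part's checksumSHA256() and checksumOfChecksums with objectAttributes.checksum().checksumSHA256(). Reading each part by its own remaining-byte count, rather than splitting buffers at cumulative break points, avoids the boundary arithmetic of the original loop.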

The Problem

I first tested this code with a smaller file (~1.5 GB), and everything worked. With this larger file, however, validation fails: the function returns false and logs the following:

[2024-02-01 18:34:08,768]-[Execution worker] INFO  app.App - local parts: 125 , remote parts: 144                                                            
[2024-02-01 18:34:08,768]-[Execution worker] INFO  app.App - Checksum of checksums of local file does not match s3 file 'shared/database.mmdb'.
[2024-02-01 18:34:08,768]-[Execution worker] INFO  app.App - faceAGZGc36kITYRStsK5zEw+iBJTgttwRWbmnQC+jQ= vs j3L01d+7qyiJ4zYSadr0/+N+Q8IfYbpWM7JTYvXrIlw=  

It seems that every part checksum that gets compared matches, but for some reason the validator misses the tail end of the local file (125 vs. 144 parts), even though the part counts matched for the smaller file I tested with.

I have confirmed that the file size is exactly the same locally and in S3. Any ideas as to what might be causing this would be appreciated.

Accepted answer by Quassnoi

Your code does not appear to check whether the list of object parts is truncated.

From the docs:

IsTruncated

Indicates whether the returned list of parts is truncated. A value of true indicates that the list was truncated. A list can be truncated if the number of parts exceeds the limit returned in the MaxParts element.

NextPartNumberMarker

When a list is truncated, this element specifies the last part in the list, as well as the value to use for the PartNumberMarker request parameter in a subsequent request.

You should call GetObjectAttributes or ListParts in a loop, passing the NextPartNumberMarker value as the PartNumberMarker of the next request, until you have all the parts.
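A minimal sketch of that loop, under stated assumptions: PartPaginator, PartsPage and collectAllPartSizes are hypothetical names, and fetchPage stands in for a single GetObjectAttributes call that receives the marker to send (null on the first request) and returns the parts from that response. Keeping the SDK call behind a function lets the pagination logic be tested on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: collects every part of a multipart object across paginated responses.
final class PartPaginator {

    /** Hypothetical stand-in for the ObjectParts element of one GetObjectAttributes response. */
    static final class PartsPage {
        final List<Long> partSizes;
        final boolean truncated;
        final Integer nextPartNumberMarker;

        PartsPage(List<Long> partSizes, boolean truncated, Integer nextPartNumberMarker) {
            this.partSizes = partSizes;
            this.truncated = truncated;
            this.nextPartNumberMarker = nextPartNumberMarker;
        }
    }

    /**
     * Requests pages until one reports it is not truncated, passing each page's
     * NextPartNumberMarker as the PartNumberMarker of the next request.
     * fetchPage receives null for the first request.
     */
    static List<Long> collectAllPartSizes(Function<Integer, PartsPage> fetchPage) {
        List<Long> allSizes = new ArrayList<>();
        Integer marker = null;
        while (true) {
            PartsPage page = fetchPage.apply(marker);
            allSizes.addAll(page.partSizes);
            if (!page.truncated) {
                return allSizes;
            }
            // Guard against looping forever on a truncated page with no new marker.
            if (page.nextPartNumberMarker == null || page.nextPartNumberMarker.equals(marker)) {
                throw new IllegalStateException("Truncated page without a usable NextPartNumberMarker");
            }
            marker = page.nextPartNumberMarker;
        }
    }
}
```

With the AWS SDK for Java v2, fetchPage could presumably build a GetObjectAttributesRequest with .objectAttributes(ObjectAttributes.OBJECT_PARTS) and .partNumberMarker(marker), then read parts(), isTruncated() and nextPartNumberMarker() from the response's objectParts(). Since isTruncated() returns a Boolean, a null-safe check such as Boolean.TRUE.equals(...) is advisable; verify these accessor names against the SDK version you use.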