Background
I have a somewhat large file of around 2.5 GB stored in AWS S3. I used SHA-256 as the checksum algorithm when uploading this file:
I then followed the official AWS user guide titled Checking object integrity. To be specific, I copied the validateExistingFileAgainstS3Checksum function from the "Using the AWS SDKs" section. I then made a couple of changes:

- Extracted some code into a new function called `getPartBreak`, as suggested by the IDE.
- Removed some `System.out.println` lines.
- Added my own logging while trying to make sense of what's going on.
- Refactored the error handling.
- Cast the `long` returned by `getPartBreak` to `int`, or the code wouldn't compile. I checked that this should not be an issue, since the value is never large enough to matter.
The Code
Here's what my code looks like now:
package app.service.vendor.amazon;

import app.service.vendor.amazon.exception.ChecksumValidationException;
import jakarta.inject.Inject;
import jakarta.inject.Singleton;
import org.slf4j.Logger;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectAttributesRequest;
import software.amazon.awssdk.services.s3.model.GetObjectAttributesResponse;
import software.amazon.awssdk.services.s3.model.ObjectAttributes;
import software.amazon.awssdk.services.s3.model.ObjectPart;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
@Singleton
public class S3ChecksumValidator {

    private final S3Client client;
    private final Logger logger;

    @Inject
    public S3ChecksumValidator(S3Client client, Logger logger) {
        this.client = client;
        this.logger = logger;
    }

    public boolean validateMultipartUpload(File file, String bucket, String s3Key) throws ChecksumValidationException {
        int chunkSize = 5 * 1024 * 1024;
        GetObjectAttributesResponse objectAttributes = client.getObjectAttributes(
                GetObjectAttributesRequest.builder()
                        .bucket(bucket)
                        .key(s3Key)
                        .objectAttributes(ObjectAttributes.OBJECT_PARTS, ObjectAttributes.CHECKSUM)
                        .build());
        try (InputStream localInput = new FileInputStream(file)) {
            MessageDigest sha256ChecksumOfChecksums = MessageDigest.getInstance("SHA-256");
            MessageDigest sha256Part = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[chunkSize];
            int currentPart = 0;
            long partBreak = getPartBreak(objectAttributes, currentPart);
            int totalRead = 0;
            int read = localInput.read(buffer);
            while (read != -1) {
                totalRead += read;
                if (totalRead >= partBreak) {
                    int difference = totalRead - (int) partBreak;
                    byte[] partChecksum;
                    if (totalRead != partBreak) {
                        sha256Part.update(buffer, 0, read - difference);
                        partChecksum = sha256Part.digest();
                        sha256ChecksumOfChecksums.update(partChecksum);
                        sha256Part.reset();
                        sha256Part.update(buffer, read - difference, difference);
                    } else {
                        sha256Part.update(buffer, 0, read);
                        partChecksum = sha256Part.digest();
                        sha256ChecksumOfChecksums.update(partChecksum);
                        sha256Part.reset();
                    }
                    String base64PartChecksum = Base64.getEncoder().encodeToString(partChecksum);
                    if (!base64PartChecksum.equals(objectAttributes.objectParts().parts().get(currentPart).checksumSHA256())) {
                        logger.info(String.format("Part checksum of local file does not match s3 file '%s'.", s3Key));
                        return false;
                    }
                    currentPart++;
                    if (currentPart < objectAttributes.objectParts().totalPartsCount()) {
                        partBreak += objectAttributes.objectParts().parts().get(currentPart - 1).size();
                    }
                } else {
                    sha256Part.update(buffer, 0, read);
                }
                read = localInput.read(buffer);
            }
            logger.info(String.format("local parts: %s , remote parts: %s", currentPart + 1, objectAttributes.objectParts().totalPartsCount()));
            if (currentPart != objectAttributes.objectParts().totalPartsCount()) {
                currentPart++;
                byte[] partChecksum = sha256Part.digest();
                sha256ChecksumOfChecksums.update(partChecksum);
                String base64PartChecksum = Base64.getEncoder().encodeToString(partChecksum);
            }
            String base64CalculatedChecksumOfChecksums = Base64.getEncoder().encodeToString(sha256ChecksumOfChecksums.digest());
            if (!base64CalculatedChecksumOfChecksums.equals(objectAttributes.checksum().checksumSHA256())) {
                logger.info(String.format("Checksum of checksums of local file does not match s3 file '%s'.", s3Key));
                logger.info(String.format("%s vs %s", base64CalculatedChecksumOfChecksums, objectAttributes.checksum().checksumSHA256()));
                return false;
            }
        } catch (IOException | NoSuchAlgorithmException e) {
            String msg = String.format("Could not read local checksum - %s", e.getMessage());
            throw new ChecksumValidationException(msg, e);
        }
        return true;
    }

    private static long getPartBreak(GetObjectAttributesResponse objectAttributes, int currentPart) throws ChecksumValidationException {
        if (objectAttributes.objectParts() == null) {
            String msg = "Not a multipart upload - object attributes -> object parts is null";
            throw new ChecksumValidationException(msg);
        }
        List<ObjectPart> parts = objectAttributes.objectParts().parts();
        if (parts.isEmpty()) {
            String msg = "File was uploaded without checksum algorithm - object attributes -> object parts is empty";
            throw new ChecksumValidationException(msg);
        }
        return parts.get(currentPart).size();
    }
}
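For context, what the code above recomputes is S3's composite "checksum of checksums": each uploaded part gets its own SHA-256 digest, and the object-level checksum is the SHA-256 of those part digests concatenated, not of the raw bytes. A minimal, self-contained sketch of that computation (the part size and data below are made up for illustration):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

public class ChecksumOfChecksums {

    // S3-style composite checksum: SHA-256 over the concatenated
    // SHA-256 digests of each fixed-size part of the data.
    static String composite(byte[] data, int partSize) throws NoSuchAlgorithmException {
        MessageDigest outer = MessageDigest.getInstance("SHA-256");
        for (int offset = 0; offset < data.length; offset += partSize) {
            int end = Math.min(offset + partSize, data.length);
            MessageDigest part = MessageDigest.getInstance("SHA-256");
            part.update(data, offset, end - offset);
            outer.update(part.digest()); // feed the part digest, not the part bytes
        }
        return Base64.getEncoder().encodeToString(outer.digest());
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] data = new byte[1000];
        Arrays.fill(data, (byte) 7);
        // The same bytes split at different part boundaries yield
        // different object-level checksums.
        System.out.println(composite(data, 400));
        System.out.println(composite(data, 1000));
    }
}
```

The composite value depends on the part boundaries, which is why the validator has to replay the exact part sizes reported by GetObjectAttributes rather than hashing the file in one pass.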
The Problem
I first tested this code with another, slightly smaller file of ~1.5 GB, and everything worked great. However, that is not the case for this slightly larger file. Instead of a successful validation, the function returns false, and I'm getting this output:
[2024-02-01 18:34:08,768]-[Execution worker] INFO app.App - local parts: 125 , remote parts: 144
[2024-02-01 18:34:08,768]-[Execution worker] INFO app.App - Checksum of checksums of local file does not match s3 file 'shared/database.mmdb'.
[2024-02-01 18:34:08,768]-[Execution worker] INFO app.App - faceAGZGc36kITYRStsK5zEw+iBJTgttwRWbmnQC+jQ= vs j3L01d+7qyiJ4zYSadr0/+N+Q8IfYbpWM7JTYvXrIlw=
It seems that the checksum validation of each part of the S3 object succeeds, but the validator is somehow missing the tail end of the local file (125 vs. 144 parts), even though the part counts matched for the other file I tested with.

Any ideas as to what might be causing things to go wrong here would be appreciated. I have confirmed that the size of the file is exactly the same locally and in S3.

Your code does not seem to be checking whether the list of object parts is truncated.

From the docs:

"You should call `GetObjectAttributes` or `ListParts` in a loop, passing the next part marker, until you get all the parts."
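The fix is a standard pagination loop: request pages of parts until a response is no longer truncated, feeding each page's next marker into the following request. The sketch below models just that pattern with a hypothetical `PartPage` record standing in for the SDK's `GetObjectAttributesParts` (which exposes `isTruncated()`, `nextPartNumberMarker()`, and `parts()`); in the real loop you would re-issue `GetObjectAttributesRequest` with `.partNumberMarker(...)` set from the previous response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class PartPagination {

    // Hypothetical stand-in for one page of the SDK's object-parts listing.
    record PartPage(List<Long> partSizes, boolean truncated, int nextPartNumberMarker) {}

    // Collect every part size by looping until a page reports it is not
    // truncated, passing each page's next marker into the following request.
    static List<Long> fetchAllPartSizes(IntFunction<PartPage> fetchPage) {
        List<Long> all = new ArrayList<>();
        int marker = 0; // first request carries no marker; 0 models that here
        PartPage page;
        do {
            page = fetchPage.apply(marker);
            all.addAll(page.partSizes());
            marker = page.nextPartNumberMarker();
        } while (page.truncated());
        return all;
    }

    public static void main(String[] args) {
        // Simulate a 144-part object served in pages of 100 parts,
        // roughly matching the scenario in the question.
        IntFunction<PartPage> stub = marker -> {
            int end = Math.min(marker + 100, 144);
            List<Long> sizes = new ArrayList<>();
            for (int i = marker; i < end; i++) sizes.add(17_000_000L);
            return new PartPage(sizes, end < 144, end);
        };
        System.out.println(fetchAllPartSizes(stub).size()); // prints 144
    }
}
```

With only the first page consumed, your validator sees fewer parts than the object actually has, which lines up with the "local parts: 125 , remote parts: 144" mismatch in your log.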