Trying to use BufferedInputStream and Base64 to Encode a large file in Java


I am new to Java I/O, so please help.

I am trying to process a large file (e.g. a 50 MB PDF) using the Apache Commons libraries. At first I tried:

byte[] bytes = FileUtils.readFileToByteArray(file);
String encodeBase64String = Base64.encodeBase64String(bytes);
byte[] decoded = Base64.decodeBase64(encodeBase64String);

But since FileUtils.readFileToByteArray in org.apache.commons.io loads the whole file into memory, I tried to use a BufferedInputStream to read the file piece by piece:

BufferedInputStream bis = new BufferedInputStream(inputStream);
StringBuilder pdfStringBuilder = new StringBuilder();
int byteArraySize = 10;
byte[] tempByteArray = new byte[byteArraySize];
while (bis.available() > 0) {
    if (bis.available() < byteArraySize) { // reaching the end of file
        tempByteArray = new byte[bis.available()];
    }
    int len = Math.min(bis.available(), byteArraySize);
    int read = bis.read(tempByteArray, 0, len);

    if (read != -1) {
        pdfStringBuilder.append(Base64.encodeBase64String(tempByteArray));
    } else {
        System.err.println("End of file reached.");
    }
}
byte[] bytes = Base64.decodeBase64(pdfStringBuilder.toString());

However, the two decoded byte arrays are not the same. In fact, the second one contains only 10 bytes, which is my temp array size.

Can anyone please help:

  • What am I doing wrong when reading the file piece by piece?
  • Why does the decoded byte array contain only 10 bytes in the second solution?

Thanks in advance:)


There is 1 solution below

Evan_HZY On

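After some digging, it turns out that each chunk passed to the encoder has to be a multiple of 3 bytes long. Base64 turns every 3 input bytes into 4 characters, so a 10-byte chunk leaves 1 byte over and its output is padded with "==". Concatenating such chunks puts padding in the middle of the string, and the decoder apparently stops at the first padding it meets, which is why only 10 bytes came back. With a temp array size that is a multiple of 3, only the last chunk can be padded, and the program works.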
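The padding effect can be seen on a small input. This is only a sketch: it uses the JDK's java.util.Base64 instead of Commons Codec so that it runs without extra dependencies, and the class name PaddingDemo and the helper encodeInChunks are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class PaddingDemo {
    // Encodes each chunk of chunkSize bytes on its own and concatenates the
    // results, which is what the loop in the question effectively does.
    static String encodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            byte[] chunk = Arrays.copyOfRange(data, off, off + len);
            sb.append(Base64.getEncoder().encodeToString(chunk));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 30 bytes of sample data
        byte[] data = "abcdefghijklmnopqrstuvwxyz0123".getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);

        // 10-byte chunks: every chunk becomes 16 characters ending in "==",
        // so padding lands in the middle of the concatenated string.
        System.out.println(encodeInChunks(data, 10));

        // 9-byte chunks: chunk boundaries fall on 3-byte groups, so the
        // concatenation matches encoding the data in one go.
        System.out.println(encodeInChunks(data, 9).equals(whole)); // true
    }
}
```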

I simply changed

int byteArraySize = 10;

to

int byteArraySize = 1024 * 3;
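
For reference, here is a minimal sketch of what the whole loop could look like with that change. It is not the exact code I ended up with: it assumes Java 9+ for InputStream.readNBytes, uses the JDK's java.util.Base64 rather than Commons Codec so it is self-contained, and the class name ChunkedBase64 is hypothetical. Besides the chunk size, it also avoids available() and encodes only the bytes that were actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

class ChunkedBase64 {
    // A multiple of 3, so that only the final chunk can end in padding.
    static final int CHUNK_SIZE = 1024 * 3;

    static String encode(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[CHUNK_SIZE];
        int read;
        // readNBytes keeps reading until the buffer is full or the stream
        // ends, and returns 0 at end of stream, so every chunk except the
        // last one is exactly CHUNK_SIZE bytes long.
        while ((read = in.readNBytes(buf, 0, buf.length)) > 0) {
            // Encode only the bytes that were read, not the whole buffer.
            sb.append(Base64.getEncoder().encodeToString(Arrays.copyOf(buf, read)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: 10,000 bytes, not a multiple of CHUNK_SIZE.
        byte[] original = new byte[10_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        String encoded = encode(new ByteArrayInputStream(original));
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(Arrays.equals(original, decoded)); // true
    }
}
```

Note that the StringBuilder still holds the entire encoded text in memory, so this only avoids loading the raw file in one piece.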