I have a .gz file which I'm trying to read using GZIPInputStream, fixed number of bytes at a time.
I am using GZIPInputStream.read(byte[] buf, int off, int len) (ref: doc) to do this.
But the method reads only 397 bytes (the return value of the above method) when the length supplied is 490.
It didn't throw any exception.
I'm wondering in which cases the return value of the method will be less than the supplied length parameter.
What I understood from this question is that some of the bytes we want to read might be in the next chunk (which has not been decompressed yet), and that we might need to read again (I'm not sure whether this is the correct interpretation, though). But the documentation of GZIPInputStream.read(...) doesn't mention any such chunking.
I uncompressed the .gz file manually and tried reading the uncompressed file using RandomAccessFile.readFully(byte[] b) (ref: doc), which reads all 490 bytes properly.
I'm expecting the GZIPInputStream.read(...) method also to read all 490 bytes properly.
I'm going to answer this question in the reverse order that you asked it.
While you may expect this, the javadoc for read(buf, off, len) does not state that that will happen. What it actually says is this:

"Reads uncompressed data into an array of bytes. If len is not zero, the method will block until some input can be decompressed; otherwise, no bytes are read and 0 is returned."

It does NOT state ANYWHERE that read will return as many (available) bytes as will fit.

So, basically, it is your expectation that is wrong. You shouldn't write code that assumes that read will "properly" read all 490 bytes in one call.

The code is complicated. It reads from the underlying stream in blocks. Then, when it inflates the stream, it may turn a small number of compressed bytes into a large number of uncompressed bytes. So the InflaterInputStream layer has to deal with cases where advancing the input by one byte results in ... more bytes than will fit in the remainder of the user's buffer.

So (to my mind) it is unsurprising that they would take the simple (and efficient) approach of not entirely filling up the buffer, leaving the unconsumed (compressed) bytes for the next read call.[1]

And then there are the mysterious "GZIP trailer members", which are dealt with at the GZIPInputStream layer.

Like I said ... it is complicated.
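To make this concrete, here is a small sketch (not part of the original answer) that records what successive read(buf, 0, 490) calls on a GZIPInputStream return. It assumes an in-memory .gz built from two concatenated gzip members of 300 bytes each, which is a valid gzip file; the names ShortReadDemo, gzip and readCounts are illustrative. With 600 bytes available, a naive expectation would be counts of [490, 110], but the stream is free to stop early, for example at the boundary between members. The exact counts may differ between JDK versions; the only guarantee is that each call returns at least 1 and at most 490 bytes, or -1 at the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ShortReadDemo {

    // Compresses 'data' into a single gzip member held in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Calls read(buf, 0, len) until end of stream (-1) and records
    // every return value, so short reads become visible.
    static List<Integer> readCounts(byte[] gz, int len) throws IOException {
        List<Integer> counts = new ArrayList<>();
        byte[] buf = new byte[len];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            int n;
            while ((n = in.read(buf, 0, len)) != -1) {
                counts.add(n);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] part = new byte[300];
        Arrays.fill(part, (byte) 'a');
        // Two gzip members back to back, 600 uncompressed bytes in total.
        ByteArrayOutputStream both = new ByteArrayOutputStream();
        both.write(gzip(part));
        both.write(gzip(part));
        System.out.println(readCounts(both.toByteArray(), 490));
    }
}
```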
In short, there could be a number of cases where you may not get a number of bytes equal to len. But it won't help you to know all of the details, especially since they could depend on which version of Java you use!

What you need to know is that it is incorrect for your code to assume that it will get all of the available bytes in a single read call, even if the byte buffer that you provide is big enough. You already have evidence that that assumption is incorrect.

[1] Though it does have to deal with that scenario in the pathological case where you call read with len == 1.

No. The use-case is fine. The problem is that your code is using the GZIPInputStream.read method incorrectly.

For what it is worth, this read method behaves roughly the same way as a read on a SocketInputStream would behave if the "other end" of the socket were writing sporadically. It gives you what is available now.
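A minimal sketch of the correct usage, assuming you want fixed-size records of 490 bytes: keep calling read until the buffer is full or the stream ends. The helper readUpTo and the class ReadLoop are hypothetical names introduced here; the test data (1000 bytes compressed in memory) is also an assumption for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReadLoop {

    // Hypothetical helper: calls read() repeatedly until 'len' bytes have been
    // stored in buf[off .. off+len) or the stream reports end of data (-1).
    // Returns the number of bytes read; it is less than 'len' only at EOF.
    static int readUpTo(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1) {
                break; // end of stream: return what we have so far
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[1000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i % 251);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(original);
        }

        byte[] buf = new byte[490];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            int n;
            // Read fixed-size records; only the last one may be short.
            while ((n = readUpTo(in, buf, 0, buf.length)) > 0) {
                System.out.println("record of " + n + " bytes");
            }
        }
    }
}
```

For the 1000-byte input this prints records of 490, 490 and 20 bytes, regardless of how the individual read calls happened to split the data. If you are on Java 9 or later, InputStream.readNBytes(byte[] b, int off, int len) provides the same "read until len bytes or EOF" behavior, and wrapping the stream in a DataInputStream gives you readFully(byte[]), the closest analogue of RandomAccessFile.readFully, which throws EOFException if the stream ends before the buffer is full.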