I am learning Haskell but having some difficulty understanding how exactly lazy ByteStrings work. Hackage says that "Lazy ByteStrings use a lazy list of strict chunks which makes it suitable for I/O streaming tasks". In contrast, a strict ByteString is stored as one large array.
What are these "chunks" in lazy ByteStrings? How does the compiler know how large a chunk should be? Further, I understand that the idea behind a lazy list is that you don't have to store the entire thing, which allows for infinite lists and the like. But how is this storage implemented? Does each chunk have a pointer to the next chunk?
Many thanks in advance for the help :)
You can find the definition of the lazy `ByteString` in the `bytestring` source: it is a data type with two constructors, `Empty` and `Chunk`.
`Chunk` is one data constructor. Its first field is a strict (`!`) field holding a strict (`S.`) `ByteString`, and its second field is the recursive (lazy) `ByteString`, which supplies more `Chunk`s or `Empty`.
Note that the second field does not have the `!`, so it can be a GHC thunk (the lazy stuff in Haskell) that is only forced when you need it (for example, when you pattern-match on it).
That means a lazy `ByteString` is either `Empty`, or a strict chunk of the complete string (you can think of it as already loaded) together with a lazy remainder (the rest, or tail) `ByteString`.
As for the size: that depends on the code that generates the lazy `ByteString`; the compiler does not come into this.
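The shape described above can be sketched with a toy type. This is only an illustration: `LazyBS`, `repeatChunk` and `takeChunks` are hypothetical names made up for this example, not the library's own definitions, and the real type also carries an unpacking pragma that is left out here.

```haskell
import qualified Data.ByteString.Char8 as S

-- Toy model (hypothetical name LazyBS) mirroring the shape described above:
-- a strict chunk in the first field, a lazy tail in the second field.
data LazyBS = Empty | Chunk !S.ByteString LazyBS

-- An infinite value: each tail is a thunk until somebody inspects it.
repeatChunk :: S.ByteString -> LazyBS
repeatChunk c = Chunk c (repeatChunk c)

-- Collect the first n strict chunks; only n cells of the tail get forced.
takeChunks :: Int -> LazyBS -> [S.ByteString]
takeChunks n _ | n <= 0 = []
takeChunks _ Empty = []
takeChunks n (Chunk c rest) = c : takeChunks (n - 1) rest

main :: IO ()
main = print (takeChunks 3 (repeatChunk (S.pack "ab")))
-- prints ["ab","ab","ab"]
```

Because the second field is lazy, `takeChunks 1 (Chunk (S.pack "x") undefined)` also works: the tail is never looked at, so the `undefined` is never forced.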
You can see this for `hGetContents`, which reads its input in chunks of `defaultChunkSize`. That constant is defined to be `32 * 1024 - 2 * sizeOf (undefined :: Int)`, so a bit less than 32 kB.
And yes, the rest (the second argument to `Chunk`) can be seen as a pointer to the next `Chunk` or `Empty`, just like with a normal list.
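To see that the producer, not the compiler, picks the chunk boundaries, here is a small sketch using `fromChunks` and `toChunks` from `Data.ByteString.Lazy`. The helper `chunkSizes` is a hypothetical name introduced for this example, and the sketch assumes `defaultChunkSize` is importable from `Data.ByteString.Lazy.Internal`.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Internal (defaultChunkSize)

-- Sizes of the strict chunks that make up a lazy ByteString
-- (chunkSizes is a hypothetical helper for this example).
chunkSizes :: L.ByteString -> [Int]
chunkSizes = map S.length . L.toChunks

main :: IO ()
main = do
  -- The producer decides the boundaries: here, chunks of 3 and 5 bytes.
  let manual = L.fromChunks [S.replicate 3 0, S.replicate 5 0]
  print (chunkSizes manual)   -- prints [3,5]
  -- The chunk size used by readers such as hGetContents.
  print defaultChunkSize
```

Following the formula quoted above, the second line printed should be 32752 on a platform where `Int` is 8 bytes.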