I am trying to concatenate the bytes of multiple NumPy arrays into a single bytearray to send in an HTTP POST request.
The most efficient way of doing this that I can think of is to create a sufficiently large bytearray object and then write the bytes from all the numpy arrays into it contiguously.
The code will look something like this:
import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
cb = bytearray(total_nb_bytes)  # preallocate the full buffer

# Too Lazy Didn't do: generate list of delimiters and information to decode the concatenated bytes array

# concatenate the bytes, filling the preallocated buffer slice by slice
# (cb.extend() would instead append after the preallocated zeros)
offset = 0
for arr in list_arr:
    _bytes = arr.tobytes()
    cb[offset:offset + arr.nbytes] = _bytes
    offset += arr.nbytes
The method tobytes() isn't a zero-copy method. It will copy the raw data of the numpy array into a bytes object.
In Python, buffer objects allow access to an object's raw inner data (this is called the buffer protocol at the C level; see the Python documentation). NumPy used to expose this through the getbuffer() function, but it was deprecated and removed in NumPy 1.13.
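For example, on Python 3 a memoryview gives this kind of zero-copy access through the buffer protocol. A rough sketch (the slice-assignment pattern here is just illustrative, and assumes a C-contiguous array):

import numpy as np

arr = np.array([1, 2, 3])
cb = bytearray(arr.nbytes)

# memoryview(arr) exposes the array's raw memory without copying;
# .cast("B") reinterprets it as plain bytes so it can fill the bytearray slice
cb[0:arr.nbytes] = memoryview(arr).cast("B")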
What is the right way of doing this?
You can make a numpy-compatible buffer out of your message bytearray and write to that efficiently using np.concatenate's out argument.
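A minimal sketch of that approach, reusing list_arr, total_nb_bytes, and cb from the question (total_size, the total element count, is a name introduced here):

import numpy as np

list_arr = [np.array([1, 2, 3]), np.array([4, 5, 6])]
total_nb_bytes = sum(a.nbytes for a in list_arr)
total_size = sum(a.size for a in list_arr)  # total element count
cb = bytearray(total_nb_bytes)

# wrap cb in an ndarray and let concatenate write straight into it
np.concatenate(list_arr, out=np.ndarray(total_size, dtype=list_arr[0].dtype, buffer=cb))

And sure enough, cb then holds the raw bytes of both arrays.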
This method has the implication that your output is all the same format. To fix that, view your original arrays as np.uint8:
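Again as a sketch, reusing the same names (each input array must be contiguous for the byte view to work):

# view each input as raw bytes so arrays of different dtypes can share one buffer
np.concatenate(
    [a.view(np.uint8) for a in list_arr],
    out=np.ndarray(total_nb_bytes, dtype=np.uint8, buffer=cb),
)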
This way, you don't need to compute total_size either, since you've already computed the number of bytes. This approach is likely more efficient than looping through the list of arrays. You were right that the buffer protocol is your ticket to a solution: you can create an array object wrapped around the memory of any object supporting the buffer protocol using the low-level np.ndarray constructor. From there, you can use all the usual numpy functions to interact with the buffer.
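For instance (hypothetical usage, not from the original answer), writes to the wrapper go straight through to the underlying bytearray, since no copy was made:

wrapped = np.ndarray(total_nb_bytes, dtype=np.uint8, buffer=cb)
wrapped[:3] = 255          # mutates cb in place
assert cb[0] == 255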