Python data persistence


Whenever a Python object needs to be stored or sent over a network, it is first serialized. I guess the reason is that both storage and network transfer are based on bits. My question is more of a computer-science fundamentals question than a Python question: what format do Python objects take when they are in memory? Aren't they already represented as bits? If so, why not just use those bits to store or send the object, and why bother with serialization?


arainone (BEST ANSWER)

Bit Representation

The same object can have different bit representations on different machines:

  • think endianness (byte order)
  • and architecture (32-bit vs. 64-bit)

So an object's bit representation on the sender machine could mean nothing (or worse, mean something else) when it arrives at the receiver.

Take a simple integer, 1025, as an illustration of the problem:

  • On a big-endian machine, the bytes in memory are:
    • binary: 00000000 00000000 00000100 00000001
    • hexadecimal: 0x00000401
  • while on a little-endian machine:
    • binary: 00000001 00000100 00000000 00000000
    • hexadecimal: 0x01040000

That's why, to understand each other, two machines have to agree on a convention, a protocol. For example, the IP protocol specifies network byte order (big-endian).
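You can observe both conventions directly with the standard-library struct module (a small sketch; the format code I here assumes a 4-byte unsigned integer):

```python
import struct

# Pack the integer 1025 as 4 bytes in each byte order.
big = struct.pack(">I", 1025)     # big-endian (network byte order)
little = struct.pack("<I", 1025)  # little-endian

print(big.hex())     # 00000401
print(little.hex())  # 01040000

# unpack reverses the process; mixing up the conventions yields a
# different number entirely, which is exactly the problem:
print(struct.unpack("<I", big)[0])  # 17039360, not 1025
```

This is why networking code converts to an agreed byte order before sending and back after receiving.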

More on endianness in this question

Serialization (and Deserialization)

We can't directly send an object's underlying bit representation over the network, for the reasons described above, and byte order is not the only problem.

An object can internally reference another object through a pointer (the in-memory address of that second object). This address is, again, platform-dependent, and it even changes from one run to the next.
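In CPython (an implementation detail, not a language guarantee), id() happens to return an object's memory address, which makes this easy to observe:

```python
inner = [1, 2, 3]
outer = {"payload": inner}  # internally, outer holds a pointer to inner

# In CPython, id() returns the object's memory address. The number
# differs between runs and between machines, so it would be
# meaningless to a receiver on another host.
print(id(inner))
```

Serialization replaces such addresses with a self-contained description of the referenced objects.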

Python solves this with a serialization algorithm called pickling, which transforms an object hierarchy into a byte stream. The pickle format itself is platform-independent, but both ends still have to agree on using it (and on a pickle protocol version) to understand each other.
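A minimal round-trip with the pickle module:

```python
import pickle

# An object hierarchy: a dict that references a nested list.
data = {"name": "example", "values": [1, 1025, 3]}

# Pickling: object hierarchy -> byte stream.
blob = pickle.dumps(data)
print(type(blob))  # <class 'bytes'>

# Unpickling: byte stream -> an equivalent object hierarchy.
restored = pickle.loads(blob)
print(restored == data)  # True
print(restored is data)  # False: a new object, rebuilt from the stream
```

Note that unpickling reconstructs new objects at new addresses; only the structure and values survive the trip, which is the whole point.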

Pickle module documentation

Aprillion

The key point for I/O is achieving interoperability, e.g. the JSON you send over the network might need to be transferred over HTTP and then parsed by JavaScript. And the data you store on disk might need to be readable the next time you run Python (a different runtime environment, different memory allocations, ...).
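For example, with the standard json module the serialized form is plain text that any JSON parser (JavaScript's included) can read back:

```python
import json

record = {"id": 7, "tags": ["a", "b"]}

# Serialize to a portable, language-agnostic text format.
text = json.dumps(record)
print(text)  # {"id": 7, "tags": ["a", "b"]}

# Any JSON parser, in any language, can rebuild the data from the text.
print(json.loads(text) == record)  # True
```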

But for code execution, you usually want higher performance than interoperable formats would allow, e.g. using memory addresses to access object methods or dict items, or optimizing for the processor cache as much as possible.
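A rough way to see that the in-memory layout is implementation-specific and distinct from any serialized form (the exact sizes below vary by Python version and platform, so none are shown):

```python
import sys
import pickle
import json

data = list(range(100))

# sys.getsizeof reports the list object's own in-memory footprint
# (not counting the int objects it points to) -- an implementation detail.
print(sys.getsizeof(data))

print(len(pickle.dumps(data)))  # size of the portable pickle byte stream
print(len(json.dumps(data)))    # size of the interoperable JSON text
```

Three different sizes for the same logical data: the in-memory form is tuned for fast access on this machine, the serialized forms for surviving the trip to another one.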

For details on how exactly Python is implemented, you can have a look at one of the interpreter implementations, e.g. CPython.