The Python jsons library produces duplicated serialized output with underscores prepended to the field names. Can this be prevented?

The original title of this question was going to be

Why does the Python jsons library produce serialized output with underscores prepended to the field names?

However, while writing the question I realized why this is happening. The question is now whether this can be prevented and, if so, how.

Here are some examples to demonstrate the problem:

Example 1: Dictionary serialized as expected

import jsons

with open('tmp.txt', 'w') as ofile:
    d = {'key': 'value'}
    ofile.write(jsons.dumps(d))

# produces:
# {"key": "value"}

Example 2: Class (object) serialized as expected

import jsons

class Test:
    def __init__(self, value: str) -> None:
        self.value = value

with open('tmp.txt', 'w') as ofile:
    test = Test('value')
    ofile.write(jsons.dumps(test))

# produces:
# {"value": "value"}

Example 3: As soon as properties are introduced, we begin to see duplicated output

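import jsons

class TestProperty:
    def __init__(self, value: str) -> None:
        # This assignment goes through the property setter, which stores the data in self._value
        self.value = value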

    @property
    def value(self) -> str:
        return self._value

    @value.setter
    def value(self, the_value) -> None:
        self._value = the_value

with open('tmp.txt', 'w') as ofile:
    test = TestProperty('value')
    ofile.write(jsons.dumps(test))

# produces:
# {"_value": "value", "value": "value"}

Example 4: If we replace the property with plain getter/setter methods, the problem goes away

import jsons

class TestFunction:
    def __init__(self, value: str) -> None:
        self.value = value

    def get_value(self) -> str:
        return self.value

    def set_value(self, the_value) -> None:
        self.value = the_value

with open('tmp.txt', 'w') as ofile:
    test = TestFunction('value')
    ofile.write(jsons.dumps(test))

# produces:
# {"value": "value"}

What problems are created?

As Example 3 shows, once a property is introduced, the data is effectively duplicated.

  • I don't fully understand why this is happening.
  • I expected a property to behave like the methods in Example 4, but it does not: the property itself is serialized as a field.
  • I do understand why _value is serialized in Example 3. It is an instance attribute, the field where the actual data is stored.

There are three potential issues with this:

  • Messages are harder to inspect when debugging. The duplicated data makes the JSON harder to read and uses up screen space, which matters when the message is large.
  • Slower message transfer (up to twice as much data is sent over the network).
  • Wasted disk space. This matters when storing large quantities of data, because each record can take up to twice the space it needs.

Why is the data duplicated in Example 3, and is there a way to prevent it?

Answer by FreelanceConsultant:

Passing strip_privates=True omits attributes whose names start with an underscore, so in Example 3 only "value" is serialized and "_value" is dropped.

jsons.dumps(obj, strip_privates=True)
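
As for why both keys show up in Example 3, here is a rough standard-library sketch of where each one could come from. This is not the jsons implementation; the helpers instance_attributes and property_values are hypothetical names made up for illustration. The assumption is simply that a serializer which looks at both the data stored on the instance and the properties defined on its class will find the same value twice.

```python
class TestProperty:
    def __init__(self, value: str) -> None:
        self.value = value  # routed through the setter, stored as self._value

    @property
    def value(self) -> str:
        return self._value

    @value.setter
    def value(self, the_value: str) -> None:
        self._value = the_value


def instance_attributes(obj) -> dict:
    """Hypothetical helper: data stored on the instance itself."""
    return dict(vars(obj))


def property_values(obj) -> dict:
    """Hypothetical helper: current value of every @property on the class."""
    cls = type(obj)
    return {
        name: getattr(obj, name)
        for name in dir(cls)
        if isinstance(getattr(cls, name, None), property)
    }


test = TestProperty('value')
stored = instance_attributes(test)    # the backing field
computed = property_values(test)      # the public property
combined = {**stored, **computed}     # what a naive "collect everything" pass sees

print(stored)    # {'_value': 'value'}
print(computed)  # {'value': 'value'}
print(combined)  # {'_value': 'value', 'value': 'value'}
```

The getter/setter methods in Example 4 do not show up because methods are not data, and there the value is stored directly under the name value, so there is only one attribute to find.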