Can a dataclass field format its value for the repr?


I have a Node class holding RGB data in both hex and HSV form. I'll be using this to sort colors in various ways and would prefer the HSV tuple to remain in float form for comparisons, instead of converting from a string on every use. Is there a way to tell a dataclass field to format its value in a specific way for the repr, similar to how default_factory supplies default values, i.e. a repr_factory?

def RGB2HSV(r, g, b):
    '''Returns HSV values in the range H = [0, 360], S = [0, 100], V = [0, 100]'''
    r, g, b = r / 255, g / 255, b / 255
    maxRGB = max(r, g, b)
    minRGB = min(r, g, b)
    delta = maxRGB - minRGB

    V = maxRGB
    if V == 0:
        return 0, 0, V
    
    S = delta / V * 100
    if S == 0:
        return 0, S, V * 100
    
    if V == r:
        H = (g - b) / delta
    elif V == g:
        H = 2 + (b - r) / delta
    else:
        H = 4 + (r - g) / delta
    H *= 60
    if H < 0:
        H += 360
    
    return H, S, V * 100

from dataclasses import dataclass, field

@dataclass
class Node:
    r: int = field(repr=False)
    g: int = field(repr=False)
    b: int = field(repr=False)
    hex: tuple[int, int, int] = field(init=False)
    hsv: tuple[float, float, float] = field(init=False)

    def __post_init__(self):
        self.hex = self.r, self.g, self.b  # Pack the r, g, b values into one tuple
        self.hsv = RGB2HSV(*self.hex)  # Unpack the tuple; RGB2HSV returns a tuple of floats

While I'm working out the different sorts, I'm printing out the Nodes, and seeing 10 unnecessary digits of each float is distracting. Would I be better off implementing my own __repr__ for the class instead of relying on the one the dataclass generates?

The reason I'm looking at the __repr__ value is that it's automatically generated by the dataclass and makes distinguishing between nearly identical colors easier than just looking at the visual output. It'll be easier to work out what to change or do next if I know the actual numbers behind a color. A portion of the end of the output:

Node(hex=(238, 0, 0), hsv=(0.0, 100.0, 93.33333333333333))
Node(hex=(238, 17, 0), hsv=(4.285714285714286, 100.0, 93.33333333333333))
Node(hex=(238, 34, 0), hsv=(8.571428571428571, 100.0, 93.33333333333333))
Node(hex=(238, 51, 0), hsv=(12.857142857142858, 100.0, 93.33333333333333))
Node(hex=(255, 0, 0), hsv=(0.0, 100.0, 100.0))
Node(hex=(255, 17, 0), hsv=(4.0, 100.0, 100.0))
Node(hex=(255, 34, 0), hsv=(8.0, 100.0, 100.0))
Node(hex=(255, 51, 0), hsv=(12.0, 100.0, 100.0))

Basically, can a format be specified to a dataclass field, similar to how a function can be specified to default_factory, in order for the generated __repr__ to format the field for me so I don't have to write my own?

...
    hsv: tuple[float, float, float] = field(init=False, repr_factory=lambda hsv: "({})".format(", ".join("{:.3f}".format(x) for x in hsv)))  # repr_factory is hypothetical
...
Node(hex=(238, 51, 0), hsv=(12.857, 100.000, 93.333))

There are 2 best solutions below

Jasmijn (best answer)

The dataclasses library currently does not support formatting fields like that. The code generated in the default __repr__ for each included field is always of the form f'field={self.field!r}'. You will have to write your own __repr__.
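
A minimal sketch of such a hand-written __repr__ for the Node class from the question. To keep the block runnable on its own, it assumes the standard library's colorsys.rgb_to_hsv as a stand-in for the question's RGB2HSV, scaled to the same ranges; the three-decimal format is simply the one the question asked for.

```python
import colorsys
from dataclasses import dataclass, field


@dataclass  # a __repr__ defined in the class body is kept, not overwritten
class Node:
    r: int = field(repr=False)
    g: int = field(repr=False)
    b: int = field(repr=False)
    hex: tuple[int, int, int] = field(init=False)
    hsv: tuple[float, float, float] = field(init=False)

    def __post_init__(self):
        self.hex = (self.r, self.g, self.b)
        # Stand-in for the question's RGB2HSV: colorsys works on 0..1 values,
        # so scale the result to H = [0, 360], S = [0, 100], V = [0, 100].
        h, s, v = colorsys.rgb_to_hsv(self.r / 255, self.g / 255, self.b / 255)
        self.hsv = (h * 360, s * 100, v * 100)

    def __repr__(self):
        # Only the printed text is rounded; self.hsv keeps its full precision.
        hsv = ", ".join(f"{x:.3f}" for x in self.hsv)
        return f"{type(self).__name__}(hex={self.hex!r}, hsv=({hsv}))"


print(Node(238, 51, 0))
```

Because hsv is still a tuple of floats, sorting and comparisons are unaffected; only the displayed text changes.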

Harvey

Here's a working proof-of-concept implementation that follows Jasmijn's answer and writes its own __repr__. I probably processed the fields in the worst way possible, but it's a start. Replace the awkward field name and value access with a better method.

from dataclasses import dataclass
import struct


@dataclass
class ElfHeader:
    """ELF Header 32-bit"""
    magic: bytes
    bitwidth: int
    endianess: int
    version: int
    osabi: int
    abi: int
    # padding: int
    filetype: int
    machine: int
    version2: int
    entry_address: int
    phoff: int
    shoff: int
    flags: int
    header_size: int
    ph_entry_size: int
    ph_num: int
    sh_entry_size: int
    sh_num: int
    sh_string_index: int

    @staticmethod
    def from_bytes(data_bytes):
        """Construct an instance from binary data."""
        # https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
        return ElfHeader(*struct.unpack('<4s5B7x2H5L6H', data_bytes[:52]))
        
    def __post_init__(self):
        self.reformat_hex_fields = {"machine", "entry_address"}

    def __repr__(self):
        """Just like the default __repr__ but supports reformatting some values."""
        def hexConvert(name, value):
            return hex(value) if name in self.reformat_hex_fields else f'{value!r}'
        
        fields = (
            f'{name}={hexConvert(name, value)}'
            for field in self.__dataclass_fields__.values() if field.repr
            # This just assigns shorter names to code to improve readability above.
            # It's like the new assignment operator.
            for name, value in ((field.name, self.__getattribute__(field.name)),)
            )
        return f'{self.__class__.__name__}({", ".join(fields)})'


data_header = b'\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00i\x00\x01\x00\x00\x00X\xb3\x00\x00\x0c\xe4\x1a\x00L\xe5\x1a\x00\x00\x00\x00\x004\x00 \x00\n\x00(\x00l\x00k\x00'
print(ElfHeader.from_bytes(data_header))

Output (Modified then Original):

ElfHeader(magic=b'\x7fELF', bitwidth=1, endianess=1, version=1, osabi=0, abi=0, filetype=2, machine=0x69, version2=1, entry_address=0xb358, phoff=1762316, shoff=1762636, flags=0, header_size=52, ph_entry_size=32, ph_num=10, sh_entry_size=40, sh_num=108, sh_string_index=107)
ElfHeader(magic=b'\x7fELF', bitwidth=1, endianess=1, version=1, osabi=0, abi=0, filetype=2, machine=105, version2=1, entry_address=45912, phoff=1762316, shoff=1762636, flags=0, header_size=52, ph_entry_size=32, ph_num=10, sh_entry_size=40, sh_num=108, sh_string_index=107)
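
One possible way to tidy up the field access above, and to get something close to the repr_factory the question asks for, is to store a formatter in each field's metadata and read it back with dataclasses.fields(). This is only a sketch: the "repr_format" key, the FormattedReprMixin class, and the shortened Header class are names made up for this example and are not part of the dataclasses library.

```python
from dataclasses import dataclass, field, fields


class FormattedReprMixin:
    """Dataclass-style repr that honours an optional per-field formatter."""

    def __repr__(self):
        parts = []
        for f in fields(self):
            if not f.repr:
                continue  # respect field(repr=False), like the generated repr
            value = getattr(self, f.name)
            # "repr_format" is a key chosen for this sketch; fall back to repr().
            formatter = f.metadata.get("repr_format", repr)
            parts.append(f"{f.name}={formatter(value)}")
        return f"{type(self).__name__}({', '.join(parts)})"


# repr=False stops @dataclass from generating a __repr__ that would
# replace the one inherited from the mixin.
@dataclass(repr=False)
class Header(FormattedReprMixin):
    magic: bytes
    machine: int = field(metadata={"repr_format": hex})
    entry_address: int = field(metadata={"repr_format": hex})
    flags: int = 0
    checksum: int = field(default=0, repr=False)


print(Header(b"\x7fELF", 105, 45912))
```

The same mixin would cover the Node case with a formatter such as lambda t: "(" + ", ".join(f"{x:.3f}" for x in t) + ")" on the hsv field. The metadata mapping is intended for this kind of third-party extension, so no private attributes are touched.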