Can a paired hierarchy of data and metadata classes be annotated with type variables?

I'm adding type annotations to an old code base that attempts to provide uniform access to several data file formats, including multiple versions of the same format. Generally, the data itself is a straightforward binary block and the versions differ only in the header, so each file has the structure File(Header, Data).

The classes form a paired hierarchy: each newer version of a format subclasses the previous version of that format, and each newer header class likewise subclasses the previous header class.

Here is an example hierarchy that I am trying to type annotate:

class MetaDataMixin:
    def __init__(self, metadata=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metadata = {}
        if metadata:
            self.metadata.update(metadata)

class DataFile:
    def __init__(self, header=None, data=None):
        self.header = header or self.header_class()
        self.data = data

class HeaderV1:
    magic_number = b'HDR'
    format_version = 1

class DataFileV1(DataFile):
    header_class = HeaderV1

class HeaderV2(HeaderV1, MetaDataMixin):
    format_version = 2

class DataFileV2(DataFileV1):
    header_class = HeaderV2

I would like type checkers to recognize that DataFileV1().header has type HeaderV1 and DataFileV2().header has type HeaderV2. A critical constraint is that, for backwards compatibility, DataFileV1 must be both an instantiable class and a superclass of DataFileV2.

I tried making DataFile generic over a header type variable that annotates both the header_class and header attributes. But once DataFileV1 binds that variable to HeaderV1, there doesn't seem to be a way for DataFileV2 to override it with HeaderV2. Here's my attempt:

import typing as ty

class Header:
    pass

HdrT = ty.TypeVar('HdrT', bound=Header)    

class MetaDataMixin:
    metadata: dict[str, str]

    def __init__(self, metadata=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metadata = {}
        if metadata:
            self.metadata.update(metadata)

class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]
    
    def __init__(self, header: HdrT | None = None, data: ty.Any = None):
        self.header = header or self.header_class()
        self.data = data

class HeaderV1(Header): 
    magic_number: bytes = b'HDR'
    format_version: int = 1

class DataFileV1(DataFile[HeaderV1]):
    header_class = HeaderV1

class HeaderV2(HeaderV1, MetaDataMixin):
    format_version = 2

class DataFileV2(DataFileV1, DataFile[HeaderV2]):
    header_class = HeaderV2

file1 = DataFileV1()
file2 = DataFileV2()

print(file2.header.format_version)
if ty.TYPE_CHECKING:
    # Shows HeaderV1, but I would like it to be HeaderV2
    reveal_type(file2.header)
else:
    # Acts like HeaderV2
    print(file2.header.metadata)

I can manually annotate header: HeaderV2 in DataFileV2, but I had hoped to eliminate that extra boilerplate by using type variables.
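
For comparison, here is a minimal self-contained sketch of that manual workaround: DataFileV2 keeps DataFileV1 as its only base and simply re-declares `header`. MetaDataMixin is left out and `self.data = data` is added purely to keep the sketch short; neither is part of the original code. mypy accepts this kind of narrowing of an inherited mutable attribute by default, but stricter settings or other checkers (pyright, for example) may report it as an incompatible override, since narrowing a writable attribute is not strictly type-safe.

```python
from __future__ import annotations

import typing as ty


class Header:
    pass


HdrT = ty.TypeVar("HdrT", bound=Header)


class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]

    def __init__(self, header: HdrT | None = None, data: ty.Any = None) -> None:
        # Fall back to a fresh instance of the class-level header type.
        self.header = header or self.header_class()
        self.data = data


class HeaderV1(Header):
    magic_number: bytes = b"HDR"
    format_version: int = 1


class HeaderV2(HeaderV1):  # MetaDataMixin omitted for brevity
    format_version = 2


class DataFileV1(DataFile[HeaderV1]):
    header_class = HeaderV1


class DataFileV2(DataFileV1):
    # Explicit re-declaration narrows the inherited HeaderV1 annotation.
    header: HeaderV2
    header_class = HeaderV2


file2 = DataFileV2()
print(file2.header.format_version)  # prints 2
```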

Swapping the order of the base classes in DataFileV2 breaks the method resolution order (DataFile would have to come before its own subclass DataFileV1, so Python raises a TypeError), so that's not an option. Another idea was to create HdrV1T = ty.TypeVar('HdrV1T', bound=HeaderV1) and declare class DataFileV1(DataFile[HdrV1T]), but then DataFileV1's annotations become Any whenever it is used without a type argument, as in a plain DataFileV1().
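
The HdrV1T idea may be salvageable with a type-variable default (PEP 696), which tells a checker what to assume when DataFileV1 is used without a type argument. The sketch below assumes a type checker that implements PEP 696 and understands the `default=` parameter of `typing_extensions.TypeVar` (Python 3.13 also accepts it on `typing.TypeVar`). The default is declared only under `TYPE_CHECKING`, so the runtime needs neither a new Python nor typing_extensions. Because HeaderV1 is only the default for HdrV1T rather than every possible binding, a checker will likely reject the `header_class = HeaderV1` assignment, hence the ignore comment.

```python
from __future__ import annotations

import typing as ty


class Header:
    pass


class HeaderV1(Header):
    magic_number: bytes = b"HDR"
    format_version: int = 1


class HeaderV2(HeaderV1):  # MetaDataMixin omitted for brevity
    format_version = 2


HdrT = ty.TypeVar("HdrT", bound=Header)

if ty.TYPE_CHECKING:
    # Assumption: the checker supports PEP 696 TypeVar defaults.
    from typing_extensions import TypeVar

    HdrV1T = TypeVar("HdrV1T", bound=HeaderV1, default=HeaderV1)
else:
    # At runtime the default is irrelevant, so the stdlib TypeVar suffices.
    HdrV1T = ty.TypeVar("HdrV1T", bound=HeaderV1)


class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]

    def __init__(self, header: HdrT | None = None, data: ty.Any = None) -> None:
        self.header = header or self.header_class()
        self.data = data


class DataFileV1(DataFile[HdrV1T]):
    # HeaderV1 is only the default for HdrV1T, so checkers likely flag this.
    header_class = HeaderV1  # type: ignore[assignment]


class DataFileV2(DataFileV1[HeaderV2]):
    # Rebinds HdrV1T to HeaderV2 while staying a subclass of DataFileV1.
    header_class = HeaderV2


file1 = DataFileV1()
file2 = DataFileV2()
print(type(file1.header).__name__, type(file2.header).__name__)  # HeaderV1 HeaderV2
```

With such a checker, `reveal_type(DataFileV1().header)` should fall back to HeaderV1 through the default, and `reveal_type(DataFileV2().header)` should show HeaderV2, while DataFileV2 remains a runtime subclass of DataFileV1.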


While my question is specifically about how to retrofit typing onto an old structure, I would also be interested in how someone would design an API like this now, with typing as a first-class consideration.
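
Without the backwards-compatibility constraint, one common typing-first layout is to put shared behavior in a generic base and make each version a concrete leaf that binds the type variable exactly once. This is a sketch of that general pattern, not a drop-in for the existing code: `BaseDataFileV1` and `describe()` are hypothetical names used only for illustration. The trade-off is that DataFileV2 is no longer a subclass of DataFileV1, so `isinstance` checks have to target the shared base instead.

```python
from __future__ import annotations

import typing as ty


class Header:
    pass


class HeaderV1(Header):
    magic_number: bytes = b"HDR"
    format_version: int = 1


class HeaderV2(HeaderV1):
    format_version = 2


HdrT = ty.TypeVar("HdrT", bound=Header)
HdrV1T = ty.TypeVar("HdrV1T", bound=HeaderV1)


class DataFile(ty.Generic[HdrT]):
    header: HdrT
    header_class: type[HdrT]

    def __init__(self, header: HdrT | None = None, data: ty.Any = None) -> None:
        self.header = header or self.header_class()
        self.data = data


class BaseDataFileV1(DataFile[HdrV1T]):
    """Hypothetical generic base holding logic shared by all versions."""

    def describe(self) -> str:
        # self.header is typed as HdrV1T, so HeaderV1 attributes are available.
        return f"{self.header.magic_number!r} v{self.header.format_version}"


class DataFileV1(BaseDataFileV1[HeaderV1]):
    header_class = HeaderV1


class DataFileV2(BaseDataFileV1[HeaderV2]):
    header_class = HeaderV2


for cls in (DataFileV1, DataFileV2):
    print(cls().describe())
# b'HDR' v1
# b'HDR' v2
```

Each leaf binds the type variable once, so checkers can infer DataFileV2().header as HeaderV2 without overrides, ignore comments, or TypeVar defaults.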
