

@mementum
Last active August 15, 2023 21:01
PEP-07xx draft

PEP: 7xx
Title: Dataclasses - Annotated support for field as metainformation
Author:
Sponsor:
Discussions-To: https://discuss.python.org/t/dataclasses-make-use-of-annotated/30028/22
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jul-2023
Python-Version: 3.12 (draft)
Post-History: 23-Jul-2023

Abstract

This PEP proposes a new syntax for the dataclasses module that uses Annotated context-specific metadata to attach per-field information to an annotation: the "field-annotated" syntax.

It can be seen as an update to the current "field-default-value" syntax, in which the per-field information is conveyed as the default value of a field. That creates a mismatch with the type hint in the annotation and sometimes adds a default value where the field has none.

Motivation

Back when dataclasses was introduced, Annotated did not yet exist. The "field-default-value" syntax therefore used the default value to convey extra per-field information, and the dataclasses machinery later replaces or removes that default value. Example:

from dataclasses import dataclass, field

@dataclass
class A:
    # attributes with no dataclass meta-information
    a: int  # no default value
    b: int = 5  # default value, type matches type hint

    # attributes with meta-information
    c: int = field(init=False)  # no default, but "field" is the default
    d: int = field(init=False, default=5)  # default in "field"

c and d need per-field information because both are excluded from the generated __init__. This has several consequences:

  1. The default value is a field (actually a Field instance) instead of an int, so it does not match the type hint.
  2. Attributes that have no default value but need per-field information still get a default value. The dataclasses machinery deals with this by deleting the attribute at the end of class processing.
  3. Type checkers need to treat dataclasses as a special case and look inside the field() call to type-check the attribute properly.
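Point 2 can be observed with the dataclasses module as it exists today. The snippet below uses only the current standard library (nothing from this proposal) to show that c is removed from the class namespace, while d's Field is swapped for its real default:

```python
from dataclasses import dataclass, field, fields


@dataclass
class A:
    a: int
    b: int = 5
    c: int = field(init=False)  # class body binds c to a Field object
    d: int = field(init=False, default=5)  # d is also bound to a Field object


# After @dataclass runs, d's Field has been replaced by its real default,
# while c, which has no default, has been deleted from the class namespace.
print("c" in A.__dict__)  # False
print(A.d)                # 5

# The per-field information now lives only in the Field objects.
print([f.name for f in fields(A) if not f.init])  # ['c', 'd']

# Instances never get a value for c unless something assigns one.
print(hasattr(A(1), "c"))  # False
```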

The proposed "field-annotated" syntax addresses the issues laid out above for c and d. Example:

from dataclasses import dataclass, field
from typing import Annotated

@dataclass
class A:
    # attributes with no dataclass meta-information
    a: int  # no default value
    b: int = 5  # default value, type matches type hint

    # attributes with meta-information
    c: Annotated[int, field(init=False)]  # no default
    d: Annotated[int, field(init=False)] = 5  # default specified as usual

With the new syntax:

  • c, which has no default value, does not get one assigned. This is consistent, unlike with the "field-default-value" syntax.

  • d, which does have a default value, gets it, and the value's type matches the type hint in the annotation (the first argument to Annotated).

This also makes it straightforward for type checkers to handle c and d without treating dataclasses as a special case.

Extra-Motivation

As users pointed out during the discussion, using Annotated to convey extra information, in ways similar to this proposal, is becoming common in the Python community. This is to be expected: Annotated was created precisely to carry this kind of context-specific metadata.

Examples:

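  • Pydantic:

    from typing import Annotated
    from pydantic import BaseModel, Field
    class Cat(BaseModel):
        name: Annotated[str, Field(title="Name")] = "unknown"
    
  • FastAPI (path parameters are always required, so no default is given):

    from typing import Annotated
    from fastapi import FastAPI, Path
    app = FastAPI()
    @app.get('/{id}')
    def get_cat(id: Annotated[str, Path()]):
        pass
    
  • Typer:

    from typing import Annotated
    import typer
    def main(name: Annotated[str, typer.Argument()]):
        print(f"Hello {name}")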
    

Specification

When the "field-annotated" syntax is used, i.e. a field is annotated with Annotated, the following rules apply:

  • Annotated shall be detected whether the annotation has been stringified or not.

    See the implementation section for more details on this.

  • The metadata arguments after the type hint will be scanned, in order, for a Field instance:

    @dataclass
    class A:
        # Usually the Field instance will be the only meta-information
        a: Annotated[int, field(default=5)]
    
        # the ``Field`` can be the 3rd or later parameter
        b: Annotated[int, {"my": "info"}, field(default=5)]
    
  • If one is found, it will be used as the Field instance for the attribute, rather than instantiating a new one, so that the information the user put into that Field is retained.

  • The type hint (the first argument to Annotated) will be recorded as Field.type.

  • Once a Field instance has been found, the remaining metadata arguments are not examined, which means any further Field instance is ignored:

    @dataclass
    class A:
        # The second Field instance is ignored
        a: Annotated[int, field(default=5), field(init=False)]
    
  • If the Field instance has a default value and a default value has also been assigned to the attribute, a ValueError exception will be raised:

    @dataclass
    class A:
        # This raises a ValueError exception
        a: Annotated[int, field(init=False, default=5)] = 7
    
  • If the Field instance has a default_factory and a default value has also been assigned to the attribute, the standard behavior of field() is retained and a ValueError exception will be raised:

    @dataclass
    class A:
        # This raises a ValueError exception
        a: Annotated[int, field(init=False, default_factory=int)] = 7
    
  • If there is a Field instance and another Field instance is assigned as the default value, a ValueError exception will be raised:

    @dataclass
    class A:
        # This raises a ValueError exception
        a: Annotated[int, field(init=False)] = field(default=5)
    

Backwards Compatibility

This PEP is backwards compatible. The "field-default-value" syntax is still valid.

This PEP also introduces no compatibility issues with type checkers:

  • The special cases introduced to handle the "field-default-value" syntax in dataclasses are not affected.
  • Type checking the "field-annotated" syntax is entirely standard and works as for Annotated annotations in non-dataclass classes.

Performance Impact

The dataclasses implementors were concerned with two kinds of performance hits:

  • Importing the heavyweight typing module if the user has not imported it. Therefore, dataclasses checks whether the user has already imported it, and only in that case are non-stringified annotations checked.

  • Having to eval stringified annotations to detect dataclasses.InitVar and typing.ClassVar, the latter also requiring typing to have been imported already.

    This concern was addressed with regular-expression matching, which avoids evaluation, followed by a performance-oriented check of the matched annotation to recognize typing.ClassVar and dataclasses.InitVar. As the comments in the source code (and the related GitHub issue discussion) acknowledge, not all possible use cases are covered, but this was considered sufficient.

The implementation of this PEP:

  • Will reuse the original dataclasses technique, including the regular-expression matching used to identify typing.ClassVar and dataclasses.InitVar, to identify typing.Annotated, whether the annotation is stringified or not.

  • Consequently, not all cases will be identified. Quoting an example from the comments in the dataclasses implementation:

    # With string annotations, cv0 will be detected as a ClassVar:
    #   CV = ClassVar
    #   @dataclass
    #   class C0:
    #     cv0: CV
    
    # But in this example cv1 will not be detected as a ClassVar:
    #   @dataclass
    #   class C1:
    #     CV = ClassVar
    #     cv1: CV
    
  • In order to handle the per-field information specified as context-specific metadata with Annotated, two options are possible:

    1. Using eval and then iterating over the results until a Field instance is found.

      This evaluates not only the Field instance but also the type hint and any other metadata.

    2. Applying a regular expression (as is done to identify ClassVar and InitVar) to isolate the Field definition, and applying eval only to it.

  • Because the user has specified the per-field information in the eval'ed Field instance, that instance is kept for later processing instead of creating a blank one, which would add a performance cost.
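Option 1 can be sketched as follows. This is a hypothetical illustration, not the reference implementation: looks_like_annotated and field_from_string are invented names, and the regular expression is only modeled on the identifier pattern dataclasses uses for ClassVar and InitVar. Like that technique, it recognizes Annotated (or module.Annotated) only at the start of the string and looks names up in the module namespace, so an alias defined inside the class body is missed, as in the ClassVar example above. It returns early when typing has not been imported, and it reads the metadata through Annotated's __metadata__ attribute so that the helpers themselves do not import typing:

```python
import re
import sys
from dataclasses import Field

# Modeled on dataclasses' identifier regex: optional "module." then a name.
_MODULE_IDENTIFIER_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def looks_like_annotated(annotation: str, module_ns: dict) -> bool:
    """Cheap pre-check: does the string start with (module.)Annotated?"""
    typing = sys.modules.get("typing")
    if typing is None:
        return False  # typing was never imported, so Annotated cannot be in use
    match = _MODULE_IDENTIFIER_RE.match(annotation)
    if not match:
        return False
    module_name, name = match.groups()
    if module_name is None:
        return module_ns.get(name) is typing.Annotated
    return (module_ns.get(module_name) is typing
            and getattr(typing, name, None) is typing.Annotated)


def field_from_string(annotation: str, module_ns: dict):
    """Option 1: eval the whole annotation, then return the first Field."""
    if not looks_like_annotated(annotation, module_ns):
        return None
    value = eval(annotation, module_ns)  # also evaluates the type hint and metadata
    for meta in getattr(value, "__metadata__", ()):
        if isinstance(meta, Field):
            return meta
    return None


# --- usage: the namespace of a module that imported typing and field ---
import typing
from dataclasses import field
from typing import Annotated

ns = {"typing": typing, "Annotated": Annotated, "field": field}
print(looks_like_annotated("Annotated[int, field(init=False)]", ns))  # True
print(looks_like_annotated("int", ns))                                # False
print(field_from_string("typing.Annotated[int, field(default=5)]", ns).default)  # 5
```

Option 2 would replace the single eval call with a second regular expression that isolates the field(...) call, trading simplicity for less evaluation.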

How to Teach This

  1. Documentation:

The documentation shall be updated to include the field-annotated syntax as the primary and preferred way to provide meta-information for a field.

Ideally, the field-default-value syntax will be mentioned only once, with a note that it is no longer the preferred way.

  2. Type Checkers:

    This PEP proposes that type checkers flag the field-default-value syntax with a warning and recommend the field-annotated syntax instead.

Reference Implementation

An initial proof-of-concept implementation, which does not cover the entire PEP yet, is available in:

Rejected Ideas

Empty (still in draft)

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
