PEP: 7xx
Title: Dataclasses - Annotated support for field as metainformation
Author:
Sponsor:
Discussions-To: https://discuss.python.org/t/dataclasses-make-use-of-annotated/30028/22
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jul-2023
Python-Version: 3.12 (draft)
Post-History: 23-Jul-2023
This PEP proposes a new syntax for the dataclasses facility: per-field
information is attached to a field's annotation as Annotated
context-specific metadata, i.e. a "field-annotated" syntax.
It can be seen as an update to the current "field-default-value" syntax, in
which the per-field information is conveyed as the default value of a
field. That creates a type mismatch with the type hint in the annotation and
sometimes adds a default value when the field has none.
When dataclasses was introduced, Annotated did not yet exist. The
"field-default-value" syntax therefore used the default value as the most
practical way to convey extra per-field information, with the dataclasses
machinery manipulating that default value afterwards. Example::

    from dataclasses import dataclass, field

    @dataclass
    class A:
        # attributes with no dataclass meta-information
        a: int  # no default value
        b: int = 5  # default value, type matches type hint

        # attributes with meta-information
        c: int = field(init=False)  # no default, but "field" is the default
        d: int = field(init=False, default=5)  # default in "field"

c and d need per-field information because both are excluded from the
generated __init__. Considerations:
- The ``default`` value is a ``field`` (actually a ``Field``) instead of an ``int``.
- Attributes which do not have a ``default`` value, but which need per-field information, still get a ``default`` value. This is dealt with by the ``dataclasses`` machinery by deleting the attribute at the end of class processing.
- Type checkers need to handle ``dataclasses`` as a special case and look into the ``Field`` to ensure proper type checking.
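The considerations above can be observed on current Python versions. The following is a minimal sketch, not part of the proposal itself; the variable names ``c_removed``, ``d_class_value`` and ``excluded`` are only illustrative.

```python
from dataclasses import dataclass, field, fields


@dataclass
class A:
    a: int
    b: int = 5
    c: int = field(init=False)             # the class-level value is a Field
    d: int = field(init=False, default=5)  # the default lives inside the Field


# The Field temporarily stored on ``c`` is deleted once the class is processed.
c_removed = not hasattr(A, "c")

# For ``d`` the Field is replaced by the actual default value.
d_class_value = A.d

# Both fields are excluded from the generated __init__.
excluded = [f.name for f in fields(A) if not f.init]

obj = A(1)  # only ``a`` (and optionally ``b``) are accepted by __init__
```

Until ``c`` is assigned explicitly, reading ``obj.c`` raises ``AttributeError``, because neither the class nor the instance holds a value for it.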
The proposed "field-annotated" syntax addresses the issues laid out above for
c and d. Example:

    from dataclasses import dataclass, field
    from typing import Annotated

    @dataclass
    class A:
        # attributes with no dataclass meta-information
        a: int  # no default value
        b: int = 5  # default value, type matches type hint

        # attributes with meta-information
        c: Annotated[int, field(init=False)]  # no default
        d: Annotated[int, field(init=False)] = 5  # default specified as usual

With the new syntax:

- ``c``, which does not have a ``default`` value, doesn't get one assigned, which is consistent, unlike in the case of the "field-default-value" syntax.
- ``d``, which does have a ``default`` value, gets one and has a type which matches the type hint in the annotation (first argument to ``Annotated``).

This also makes it straightforward for type checkers to handle the ``c`` and ``d`` cases without considering ``dataclasses`` a special case.
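As an illustration of why no special case is needed, the standard ``typing`` helpers already separate the type hint from the metadata. This is a sketch on a plain (non-dataclass) class, so that it does not depend on the behaviour proposed here; the names ``plain_hints``, ``rich_hints`` and ``c_metadata`` are illustrative only.

```python
from dataclasses import Field, field
from typing import Annotated, get_args, get_type_hints


class A:
    c: Annotated[int, field(init=False)]      # no default
    d: Annotated[int, field(init=False)] = 5  # default matches the type hint


# Without extras, the annotations reduce to the plain type hints.
plain_hints = get_type_hints(A)

# With extras, the Field instance is available as Annotated metadata.
rich_hints = get_type_hints(A, include_extras=True)
c_metadata = get_args(rich_hints["c"])[1:]
```

The value assigned to ``d`` is an ordinary ``int``, so it agrees with the first argument to ``Annotated`` without any knowledge of ``Field``.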
As contributed by users during the discussion, using Annotated to convey
extra information, in ways similar to this proposal, is becoming common
practice in the Python community. This is to be expected, given that
Annotated was created to carry context-specific metadata.
Examples:

- Pydantic::

      class Cat(BaseModel):
          name: Annotated[str, Field(title="Name")] = "unknown"

- FastAPI::

      @app.get('/{id}')
      def get_cat(id: Annotated[str, Path()] = "Garfield"):
          pass

- Typer::

      def main(name: Annotated[str, typer.Argument()]):
          print(f"Hello {name}")

When the "field-annotated" syntax is applied and Annotated is how a field
is typed, the following applies:
- ``Annotated`` shall be detected whether the annotation has been stringified or not.

  See the implementation section for more details on this.

- The args after the type hint will be iterated to look for a ``Field`` instance::

      @dataclass
      class A:
          # Usually the Field instance will be the only meta-information
          a: Annotated[int, field(default=5)]

          # the ``Field`` can be the 3rd or later parameter
          b: Annotated[int, {"my": "info"}, field(default=5)]

- If one is found, it will be used as the ``Field`` instance for the attribute, rather than instantiating a new one, in order to retain the information the user has put into that ``Field``.

- The type hint, first argument to ``Annotated``, will be recorded as the ``Field.type``.

- Once a ``Field`` instance has been found, further meta-information args will be ignored and not examined, which means any further ``Field`` instance will be ignored::

      @dataclass
      class A:
          # Second Field instance is ignored
          a: Annotated[int, field(default=5), field(init=False)]

- If the ``Field`` instance has a ``default`` value for the attribute and a ``default`` value has also been assigned to the attribute, a ``ValueError`` exception will be raised::

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False, default=5)] = 7

- If the ``Field`` instance has a ``default_factory`` value for the attribute and a ``default`` value has also been assigned to the attribute, the standard behavior of the ``field`` function will be retained and a ``ValueError`` exception will be raised::

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False, default_factory=int)] = 7

- If there is a ``Field`` instance and another ``Field`` instance is assigned as the ``default`` value, a ``ValueError`` exception will be raised::

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False)] = field(default=5)

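The rules above can be summarized in a short sketch. It is not the reference implementation: ``resolve_annotated_field`` is a hypothetical helper, the error messages are invented for illustration, and only already-evaluated (non-stringified) annotations are handled.

```python
from dataclasses import MISSING, Field, field
from typing import Annotated, get_args, get_origin


def resolve_annotated_field(annotation, class_default=MISSING):
    """Hypothetical helper: return the Field carried by an Annotated
    annotation, or None if the annotation carries no Field."""
    if get_origin(annotation) is not Annotated:
        return None
    type_hint, *metadata = get_args(annotation)

    # Only the first Field is used; any later Field instance is ignored.
    found = next((m for m in metadata if isinstance(m, Field)), None)
    if found is None:
        return None

    if isinstance(class_default, Field):
        raise ValueError("Field given both in Annotated and as default value")
    if class_default is not MISSING:
        if found.default is not MISSING or found.default_factory is not MISSING:
            raise ValueError("default given both in the Field and on the attribute")
        found.default = class_default

    # The first argument to Annotated is recorded as the field type.
    found.type = type_hint
    return found


# The Field may be the 3rd or a later argument; the second Field is ignored.
f = resolve_annotated_field(
    Annotated[int, {"my": "info"}, field(default=5), field(init=False)]
)
```

In this sketch the user's own ``Field`` object is mutated and returned, which mirrors the rule that the found instance is reused rather than replaced by a new one.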
This PEP is backwards compatible. The "field-default-value" syntax is still valid.
This PEP also introduces no compatibility issues with type checkers:

- The special cases introduced to handle the "field-default-value" syntax in ``dataclasses`` are not affected.
- Type checking the "field-annotated" syntax is 100% standard and done as for ``Annotated`` annotations in non-dataclass classes.
The dataclasses implementors were concerned with two kinds of performance hits:

- Importing the heavyweight ``typing`` module if it has not been imported by the user. Therefore a check is made to see whether the user has imported it, and only in that case are non-stringified annotations checked.

- Having to ``eval`` stringified annotations to detect ``dataclasses.InitVar`` and ``typing.ClassVar``. The latter requires ``typing`` to be already imported.

  This concern was addressed by using regular expression matching to avoid evaluation, followed by a performance-oriented check of the detected annotation to recognize ``typing.ClassVar`` and ``dataclasses.InitVar``. As the comments in the source code (and the GitHub issue discussion) reveal, not all possible use cases are covered, but this was considered enough.
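The two techniques described above can be sketched as follows. This is an approximation for illustration only: the pattern is modelled on the module-prefix regular expression used in ``dataclasses``, and ``leading_name`` and ``typing_already_imported`` are hypothetical helper names, not part of any existing API.

```python
import re
import sys

# Captures an optional "module." prefix and the first identifier of a
# stringified annotation, without evaluating the annotation.
_MODULE_IDENTIFIER_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def leading_name(annotation: str):
    """Return (module_prefix_or_None, identifier) for a string annotation."""
    match = _MODULE_IDENTIFIER_RE.match(annotation)
    return match.groups() if match else None


def typing_already_imported() -> bool:
    """Check for ``typing`` without importing it as a side effect."""
    return sys.modules.get("typing") is not None


# The same lookup could be used to spot a stringified Annotated annotation.
module_name, identifier = leading_name("typing.Annotated[int, field(init=False)]")
```

Such a match only identifies names; deciding whether the identifier really refers to ``typing.Annotated`` still requires a check against the class's module namespace, which is where aliased names can be missed.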
The implementation of this PEP:

- Will re-use and re-apply the original ``dataclasses`` technique, including the regular expression matching used to identify ``typing.ClassVar`` and ``dataclasses.InitVar``, in order to identify ``typing.Annotated``, whether it is in stringified form or not.

  Because of it, not all cases will be identified. Quoting an example from the comments in the ``dataclasses`` implementation::

      # With string annotations, cv0 will be detected as a ClassVar:
      #   CV = ClassVar
      #   @dataclass
      #   class C0:
      #     cv0: CV

      # But in this example cv1 will not be detected as a ClassVar:
      #   @dataclass
      #   class C1:
      #     CV = ClassVar
      #     cv1: CV

- In order to handle the per-field information specified as context-specific metadata with ``Annotated``, two options are possible:

  1. Using ``eval`` and then iterating over the results until a ``Field`` instance is found.

     This will of course not only ``eval`` the ``Field`` instance but also the type hint and other bits of metadata.

  2. Applying a regular expression (as is done to identify ``ClassVar`` and ``InitVar``) to isolate the ``Field`` definition and only apply ``eval`` to it.

  Because the user has specified per-field information in the ``eval``'ed ``Field`` instance, this instance is kept for later processing, rather than creating a blank one, which would add a performance hit.
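The two options can be compared with a small sketch. Both functions are hypothetical (``field_via_full_eval`` and ``field_via_regex`` are names invented here), and the regular expression is deliberately simplistic: it assumes the ``field(...)`` call contains no nested parentheses.

```python
import re
from dataclasses import Field, field
from typing import Annotated, get_args


def field_via_full_eval(annotation: str, namespace: dict):
    """Option 1: eval the whole annotation, then look for a Field."""
    evaluated = eval(annotation, namespace)
    for meta in get_args(evaluated)[1:]:
        if isinstance(meta, Field):
            return meta
    return None


# Assumption: the call is spelled ``field(...)`` without nested parentheses.
_FIELD_CALL_RE = re.compile(r"\bfield\s*\([^()]*\)")


def field_via_regex(annotation: str, namespace: dict):
    """Option 2: isolate the field(...) call and eval only that part."""
    match = _FIELD_CALL_RE.search(annotation)
    if match is None:
        return None
    return eval(match.group(0), namespace)


source = 'Annotated[int, {"my": "info"}, field(init=False, default=5)]'
names = {"Annotated": Annotated, "field": field}

by_eval = field_via_full_eval(source, dict(names))
by_regex = field_via_regex(source, {"field": field})
```

The second option does not need the type hint to be resolvable at class-creation time, but a pattern this simple would miss calls such as ``field(default_factory=lambda: make())``, so a real implementation would need a more careful way to find the end of the call.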
- Documentation: The documentation shall be updated to include the field-annotated syntax as the primary and preferred way to provide meta-information for a field. Ideally, the field-default-value syntax will be mentioned only once, indicating that it is no longer the preferred way.
- Type checkers: This PEP proposes that type checkers flag the field-default-value syntax with a warning and recommend using the field-annotated syntax.
An initial proof-of-concept implementation, which does not cover the entire PEP yet, is available in:
Empty (still in draft)
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.