PEP: 7xx
Title: Dataclasses - Annotated support for field as metainformation
Author:
Sponsor:
Discussions-To: https://discuss.python.org/t/dataclasses-make-use-of-annotated/30028/22
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jul-2023
Python-Version: 3.12 (draft)
Post-History: 23-Jul-2023

This PEP proposes a new syntax for the dataclasses facility: using Annotated, which attaches context-specific metadata to an annotation, to add per-field information, i.e. a "field-annotated" syntax.

It can be seen as an update to the current "field-default-value" syntax, in which the per-field information is conveyed as the default value of a field, creating a type mismatch with the type hint in the annotation and sometimes adding a default value when the field has none.

Back when dataclasses was introduced, Annotated did not yet exist. The "field-default-value" syntax treated the default value as the best available place to convey extra per-field information, with the dataclasses machinery adjusting the default value afterwards. Example:

    from dataclasses import dataclass, field

    @dataclass
    class A:
        # attributes with no dataclass meta-information
        a: int  # no default value
        b: int = 5  # default value, type matches type hint

        # attributes with meta-information
        c: int = field(init=False)  # no default, but "field" is the default
        d: int = field(init=False, default=5)  # default in "field"

c and d need per-field information because both are excluded from the generated __init__. Considerations:

- The default value is a field (actually a Field) instead of an int.
- Attributes which do not have a default value, but which need per-field information, still get a default value. The dataclasses machinery deals with this by deleting the attribute at the end of class processing.
- Type checkers need to handle dataclasses as a special case and look into the Field to ensure proper type checking.

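The second consideration can be observed on current Python. The following is a minimal check, not part of the proposal, that reuses the class from the example above; the variable names are illustrative only.

```python
import inspect
from dataclasses import dataclass, field, fields


@dataclass
class A:
    a: int
    b: int = 5
    c: int = field(init=False)
    d: int = field(init=False, default=5)


# c and d are excluded from the generated __init__
init_params = list(inspect.signature(A).parameters)

# The Field placeholder assigned to c was deleted during class processing
c_is_class_attr = "c" in vars(A)

# d keeps its real default as a plain class attribute
d_class_value = A.d

# The per-field information remains available through fields()
init_flags = {f.name: f.init for f in fields(A)}
```

Here `init_params` contains only `a` and `b`, `c` is no longer a class attribute, and `A.d` is the plain integer default.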
The proposed "field-annotated" syntax addresses the issues laid out above for c and d. Example:

    from dataclasses import dataclass, field
    from typing import Annotated

    @dataclass
    class A:
        # attributes with no dataclass meta-information
        a: int  # no default value
        b: int = 5  # default value, type matches type hint

        # attributes with meta-information
        c: Annotated[int, field(init=False)]  # no default
        d: Annotated[int, field(init=False)] = 5  # default specified as usual

With the new syntax:

- c, which does not have a default value, doesn't get one assigned, which is consistent, unlike in the case of the "field-default-value" syntax.
- d, which does have a default value, gets one, and its type matches the type hint in the annotation (the first argument to Annotated).

This also makes it straightforward for type checkers to handle the c and d cases without treating dataclasses as a special case.

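As a rough illustration of what the proposal builds on, the sketch below shows that the metadata placed in Annotated is already reachable with the standard typing helpers. It assumes current Python, where dataclasses ignores a Field found inside Annotated, so the class `B` here (a hypothetical name) only demonstrates how the information is stored, not the proposed behavior.

```python
from dataclasses import Field, dataclass, field
from typing import Annotated, get_args, get_origin


@dataclass
class B:
    c: Annotated[int, field(init=False)]
    d: Annotated[int, field(init=False)] = 5


annotation = B.__annotations__["c"]

# Annotated is recognizable through its origin
is_annotated = get_origin(annotation) is Annotated

# The first argument is the type hint, the rest is the metadata
type_hint, *metadata = get_args(annotation)

# No placeholder default is ever assigned to c, and d holds a real int
c_is_class_attr = "c" in vars(B)
d_class_value = B.d
```

The type hint and the Field can be separated without any dataclass-specific knowledge, which is what lets a type checker treat the annotation like any other Annotated annotation.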
As users pointed out during the discussion, using Annotated to convey extra information, in ways similar to this proposal, is becoming common practice in the Python community. This is to be expected, given that Annotated was designed to carry this kind of metadata.

Examples:
Pydantic:

    class Cat(BaseModel):
        name: Annotated[str, Field(title="Name")] = "unknown"

FastAPI:

    @app.get('/{id}')
    def get_cat(id: Annotated[str, Path()] = "Garfield"):
        pass

Typer:

    def main(name: Annotated[str, typer.Argument()]):
        print(f"Hello {name}")

When the "field-annotated" syntax is applied and Annotated is how a field is typed, the following applies:

- Annotated shall be detected whether the annotation has been stringified or not. See the implementation section for more details on this.
- The arguments after the type hint will be iterated to look for a Field instance:

      @dataclass
      class A:
          # Usually the Field instance will be the only meta-information
          a: Annotated[int, field(default=5)]
          # the ``Field`` can be the 3rd or later parameter
          b: Annotated[int, {"my": "info"}, field(default=5)]

- If one is found, it will be used as the Field instance for the attribute, rather than instantiating a new one, in order to retain the information the user has put into that Field.
- The type hint, the first argument to Annotated, will be recorded as the Field.type.
- Once a Field instance has been found, further meta-information arguments will not be examined, which means any further Field instance will be ignored:

      @dataclass
      class A:
          # Second Field instance is ignored
          a: Annotated[int, field(default=5), field(init=False)]

- If the Field instance has a default value for the attribute and a default value has also been assigned to the attribute, a ValueError exception will be raised:

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False, default=5)] = 7

- If the Field instance has a default_factory value for the attribute and a default value has also been assigned to the attribute, the standard behavior of the field function will be retained and a ValueError exception will be raised:

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False, default_factory=int)] = 7

- If there is a Field instance and another Field instance is assigned as the default value, a ValueError exception will be raised:

      @dataclass
      class A:
          # This raises a ValueError exception
          a: Annotated[int, field(init=False)] = field(default=5)

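The rules above could be sketched roughly as follows. This is not the PEP's implementation: `resolve_annotated_field` is a hypothetical helper, the error messages are invented for illustration, and it only handles annotations that have already been evaluated (not stringified ones).

```python
from dataclasses import MISSING, Field, field
from typing import Annotated, get_args, get_origin


def resolve_annotated_field(annotation, default=MISSING):
    """Hypothetical helper: return (type_hint, Field or None)."""
    if get_origin(annotation) is not Annotated:
        return annotation, None

    type_hint, *metadata = get_args(annotation)
    # Use the first Field found; any later Field instance is ignored.
    found = next((m for m in metadata if isinstance(m, Field)), None)
    if found is None:
        return type_hint, None

    if isinstance(default, Field):
        raise ValueError("Field given both in Annotated and as default value")
    if default is not MISSING:
        if found.default is not MISSING or found.default_factory is not MISSING:
            raise ValueError("default given both in Field and by assignment")
        # The assigned value becomes the default of the user's Field.
        found.default = default

    # The first argument to Annotated is recorded as the field type.
    found.type = type_hint
    return type_hint, found
```

For example, `resolve_annotated_field(Annotated[int, field(init=False)], 5)` returns the user's own Field with `init` still `False` and `default` set to `5`, while passing `7` alongside `field(init=False, default=5)` raises `ValueError`.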
This PEP is backwards compatible. The "field-default-value" syntax is still valid.
This PEP also introduces no compatibility issues with type checkers:

- The special cases introduced to handle the "field-default-value" syntax in dataclasses are not affected.
- Type checking the "field-annotated" syntax is entirely standard and done as for Annotated annotations in non-dataclass classes.

The dataclasses implementors were concerned with two kinds of performance hits:

- Importing the heavyweight typing module if it has not been imported by the user. Therefore a check is made to see if the user has imported it, and only in this case are non-stringified annotations checked.
- Having to eval stringified annotations to detect dataclasses.InitVar and typing.ClassVar, the latter requiring that typing already be imported.

This second concern was addressed with regular expression matching to avoid evaluation, followed by a performance-oriented check of the detected annotation to recognize typing.ClassVar and dataclasses.InitVar. As the comments in the source code (and the GitHub issue discussion) reveal, not all possible use cases are covered, but it was considered enough.

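The two techniques described above can be approximated as follows. This is a simplified sketch, not the dataclasses source: the names `_PREFIX_RE`, `starts_with_name` and `typing_already_imported` are hypothetical, and the check is purely textual, whereas the real machinery also resolves the matched name against the defining module.

```python
import re
import sys

# Optional "module." prefix followed by an identifier, at the start
# of a stringified annotation.
_PREFIX_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def starts_with_name(annotation: str, names: set[str]) -> bool:
    """Return True if the annotation string starts with one of names."""
    match = _PREFIX_RE.match(annotation)
    return match is not None and match.group(2) in names


def typing_already_imported() -> bool:
    """Check for typing without importing it as a side effect."""
    return "typing" in sys.modules
```

With this sketch, `"typing.ClassVar[int]"` and `"Annotated[int, field(init=False)]"` are recognized without calling `eval`, while an alias such as `"CV"` is not, which mirrors the kind of uncovered case mentioned above.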
The implementation of this PEP will re-use and re-apply the original dataclasses technique used to identify typing.ClassVar and dataclasses.InitVar, including the regular expression matching, to identify typing.Annotated, whether it is in stringified form or not.

Because of this, not all cases will be identified. Quoting an example from the comments in the dataclasses implementation:

    # With string annotations, cv0 will be detected as a ClassVar:
    #   CV = ClassVar
    #   @dataclass
    #   class C0:
    #     cv0: CV

    # But in this example cv1 will not be detected as a ClassVar:
    #   @dataclass
    #   class C1:
    #     CV = ClassVar
    #     cv1: CV

In order to handle the per-field information specified as context-specific metadata with Annotated, two options are possible:

- Using eval and then iterating over the results until a Field instance is found. This will of course eval not only the Field instance but also the type hint and the other bits of metadata.
- Applying a regular expression (as is done to identify ClassVar and InitVar) to isolate the Field definition and only apply eval to it.

Because the user has specified per-field information in the eval'ed Field instance, this instance is kept for later processing, rather than creating a blank one, which would add a performance hit.

- Documentation:
The documentation shall be updated to include the field-annotated syntax as the primary and preferred way to provide meta-information for a field.
Ideally the field-default-value syntax will be simply mentioned once, indicating it is no longer the preferred way.
- Type checkers:
This PEP proposes that type checkers flag the field-default-value syntax with a warning and recommend using the field-annotated syntax.
An initial proof-of-concept implementation, which does not cover the entire PEP yet, is available in:
Empty (still in draft)
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.