sharwell/Documentation.md Secret

Last active May 11, 2024 15:22

Documentation comments revised

Overview

Markdown documentation comments are a backwards-compatible replacement for XML documentation comments.

- If the first non-whitespace character of the comment is <, it is treated as an XML documentation comment.
- Otherwise, the comment is treated as a Markdown documentation comment.
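A minimal sketch of this rule, assuming the comment arrives as raw `///` lines; it handles only that form, not `/** */` blocks. `IsXmlDocumentationComment` is a hypothetical helper, not a Roslyn API.

```csharp
using System;

// Hypothetical helper (not a Roslyn API): decides whether a block of ///
// lines is an XML documentation comment. Only the /// form is handled.
static bool IsXmlDocumentationComment(string[] commentLines)
{
    foreach (var line in commentLines)
    {
        // Strip indentation and the /// prefix, then any whitespace after it.
        var content = line.TrimStart();
        if (content.StartsWith("///", StringComparison.Ordinal))
            content = content.Substring(3);

        content = content.TrimStart();
        if (content.Length == 0)
            continue; // blank comment line: keep looking

        // The first non-whitespace character decides the comment kind.
        return content[0] == '<';
    }

    return false; // an empty comment contains no '<', so it is not XML
}

var xmlComment = new[] { "    /// <summary>Adds two numbers.</summary>" };
var markdownComment = new[] { "    ///", "    /// Adds two numbers." };

Console.WriteLine(IsXmlDocumentationComment(xmlComment));      // True
Console.WriteLine(IsXmlDocumentationComment(markdownComment)); // False
```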

Unlike XML documentation, Markdown comments are allowed anywhere a line or block comment is allowed in the language.

🔗 dotnet/csharplang#891

Language and Compiler

Language changes

While XML documentation files will remain the standard for shipping documentation with assemblies, the language will relax its rules on the form these comments take in source code.

- The behavior of a documentation comment whose first non-whitespace character is not < is implementation-defined.
- The behavior of a documentation comment not placed on a type or member is implementation-defined.
- Documentation comments are allowed to contain arbitrary valid XML. In addition to the elements defined in earlier versions of the C# language, documentation rendering tools are encouraged to support the following elements:
  - `<em>`
  - `<strong>`
  - `<inheritdoc>`
  - `<a href="">`
  - `<see href="">`
  - `<see langword="keyword">`

Compiler changes

The compiler translates documentation comments for exposed types and members to XML during the build. For new documentation comments, the compiler delegates the translation to a documentation analyzer, which is responsible for:

- Translation of documentation comments to XML form for inclusion in compiler outputs
- Analysis of documentation comments for any diagnostics

The compiler provides a default documentation analyzer which handles XML documentation comments. It may also provide a minimal documentation analyzer for non-XML documentation comments, based on a minimal subset of CommonMark, which is used if no other documentation analyzer is provided.
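The delegation can be pictured as a single dispatch step. This is a sketch under several assumptions: the analyzer is modeled as a plain `Func<string, string>` rather than whatever extensibility API the compiler would expose, diagnostics are left out, and the fallback merely escapes the text and wraps it in `<summary>` instead of implementing CommonMark. `TranslateToXml` is a hypothetical name.

```csharp
#nullable enable
using System;

// Hypothetical dispatch: picks which analyzer translates a comment to XML.
// 'comment' is the comment text with the /// prefixes already removed.
static string TranslateToXml(string comment, Func<string, string>? markdownAnalyzer)
{
    // XML documentation comments go to the compiler's default analyzer,
    // which in this sketch simply passes the XML through.
    if (comment.TrimStart().StartsWith("<", StringComparison.Ordinal))
        return comment;

    // A documentation analyzer supplied by the project takes precedence...
    if (markdownAnalyzer is not null)
        return markdownAnalyzer(comment);

    // ...otherwise a minimal fallback is used. This stand-in only escapes the
    // text and wraps it in <summary>; it does not implement CommonMark.
    return "<summary>" + EscapeXml(comment.Trim()) + "</summary>";

    static string EscapeXml(string text) =>
        text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
}

Console.WriteLine(TranslateToXml("<summary>Adds two numbers.</summary>", null));
Console.WriteLine(TranslateToXml("Adds a & b.", null));
Console.WriteLine(TranslateToXml("Adds two numbers.", text => $"<summary><para>{text}</para></summary>"));
```

Checking for XML first preserves backwards compatibility: an existing XML comment never reaches a Markdown analyzer.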

IDE extensibility

Documentation analyzers interact at a low level with the compiler. The documentation analyzer specifies a content type for the documentation contents, which IDEs may use to provide a default editing experience. A separate documentation presenter can be provided which interacts with higher-level IDE features. It is responsible for:

- Classification
- Find references
- Get symbol info (determine the symbol(s) referenced by a specific location within the comment)
- Complexification and simplification
- Rename

Sections

Sections are treated as an extension of the CommonMark thematic break syntax.

`@summary`

The first section of a comment is the summary. This section may optionally start with a @summary thematic break. The @summary break typically does not need to be specified explicitly. However, a user may want to include it for either of the following reasons:

- The content of the summary section starts with a < character, which would otherwise cause the compiler to treat the comment as an XML documentation comment.
- The content of the summary section includes more than one paragraph, and the comment does not include a remarks section.

`@remarks`

The remarks section starts following the @remarks thematic break.

Other sections

🚧 Other sections may be supported by this design. Possible approaches include:

- Restrict the sections to @summary and @remarks.
- Allow additional sections, but restrict the set to an allowlist.
- Allow any section in the form `^\s*@\w+\s*$`, or perhaps a more restricted form focusing on identifiers.
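To compare the three options, this sketch treats a line as a section break when it matches the pattern from the third option, then filters the name by policy. The extra allowlist entries (`example`, `seealso`) and the helper `SectionBreak` are illustrative assumptions, not part of the proposal.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The line form from the third option: a line containing only @word.
var sectionLine = new Regex(@"^\s*@(\w+)\s*$");

// First option: only @summary and @remarks.
var restricted = new HashSet<string> { "summary", "remarks" };

// Second option: an allowlist. The extra names are illustrative, not proposed.
var allowlist = new HashSet<string> { "summary", "remarks", "example", "seealso" };

// Returns the section name if 'line' starts a section under the given policy,
// or null otherwise. Passing null for 'allowed' models the third option.
string? SectionBreak(string line, HashSet<string>? allowed)
{
    var match = sectionLine.Match(line);
    if (!match.Success)
        return null;

    var name = match.Groups[1].Value;
    return allowed is null || allowed.Contains(name) ? name : null;
}

foreach (var sample in new[] { "@remarks", "  @example  ", "@custom", "see @remarks" })
{
    Console.WriteLine(
        $"'{sample}': option 1 = {SectionBreak(sample, restricted) ?? "-"}, " +
        $"option 2 = {SectionBreak(sample, allowlist) ?? "-"}, " +
        $"option 3 = {SectionBreak(sample, null) ?? "-"}");
}
```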

Implicit breaks

If the @summary and @remarks thematic breaks are omitted, a @remarks thematic break is implicitly added immediately following the first paragraph of the summary section.
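The summary and remarks rules, including the implicit break, can be sketched as follows. The sketch assumes the `///` prefixes are already removed and recognizes only `@summary` and `@remarks`. `SplitSections` is a hypothetical name. The second call shows why an explicit `@summary` can matter: it keeps a multi-paragraph summary together, and the summary text may begin with `<` because the comment itself now begins with `@`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Splits Markdown comment content into summary and remarks text.
static (string Summary, string Remarks) SplitSections(IReadOnlyList<string> lines)
{
    var summary = new List<string>();
    var remarks = new List<string>();
    var current = summary;
    bool sawExplicitBreak = false;

    foreach (var line in lines)
    {
        switch (line.Trim())
        {
            case "@summary":
                current = summary;
                sawExplicitBreak = true;
                break;
            case "@remarks":
                current = remarks;
                sawExplicitBreak = true;
                break;
            default:
                current.Add(line);
                break;
        }
    }

    if (!sawExplicitBreak)
    {
        // Implicit @remarks break: the summary ends at the first blank line
        // that follows some non-blank content (the end of its first paragraph).
        int start = summary.FindIndex(l => l.Trim().Length > 0);
        int end = start < 0 ? -1 : summary.FindIndex(start, l => l.Trim().Length == 0);
        if (end >= 0)
        {
            remarks.AddRange(summary.Skip(end + 1));
            summary.RemoveRange(end, summary.Count - end);
        }
    }

    return (Join(summary), Join(remarks));

    static string Join(List<string> part) => string.Join("\n", part).Trim();
}

// No explicit breaks: the second paragraph becomes the remarks.
var implicitResult = SplitSections(new[] { "Adds two numbers.", "", "Overflow is not checked." });

// Explicit @summary: both paragraphs stay in the summary.
var explicitResult = SplitSections(new[] { "@summary", "<T> values are compared by reference.", "", "Nulls compare equal." });

Console.WriteLine($"[{implicitResult.Summary}] [{implicitResult.Remarks}]");
Console.WriteLine($"[{explicitResult.Summary}] [{explicitResult.Remarks}]");
```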

Parameters

Parameters are documented using an extension to the list syntax.

🚧 The delimiter syntax is not finalized for this, but may look like one of the following:

- `name:`
- `@param name`
- `@name`

The documentation for a parameter follows the list delimiter under the same rules as bulleted or numbered lists.
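A sketch of parameter parsing, assuming the `name:` and `@param name` candidates. The `@name` form is left out because, without extra rules, it cannot be told apart from a section break such as `@remarks`. Indented lines continue the current item, loosely following CommonMark list items. `ParseParameters` is hypothetical. A real analyzer would presumably also check each name against the member's actual parameters; otherwise, with the `name:` form, an unrelated line such as `Note: ...` would be read as a parameter.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two of the candidate delimiter forms (neither is final):
//   name: description
//   @param name description
var itemStart = new Regex(@"^(?:(?<name>\w+):|@param\s+(?<name>\w+))\s*(?<text>.*)$");

Dictionary<string, string> ParseParameters(IEnumerable<string> lines)
{
    var result = new Dictionary<string, string>();
    string? current = null;

    foreach (var line in lines)
    {
        var match = itemStart.Match(line);
        if (match.Success)
        {
            // A new item: the text after the delimiter starts its documentation.
            current = match.Groups["name"].Value;
            result[current] = match.Groups["text"].Value.Trim();
        }
        else if (line.Trim().Length == 0)
        {
            continue; // blank lines do not end an item
        }
        else if (current is not null && char.IsWhiteSpace(line[0]))
        {
            // Indented continuation line, as in a CommonMark list item.
            result[current] = (result[current] + " " + line.Trim()).Trim();
        }
        else
        {
            current = null; // unindented non-item text ends the list
        }
    }

    return result;
}

// Both forms are mixed here only to exercise the parser.
var parameters = ParseParameters(new[]
{
    "left: The first operand.",
    "@param right The second operand,",
    "    which may be negative.",
});

foreach (var (name, text) in parameters)
    Console.WriteLine($"{name}: {text}");
```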

Type parameters

Type parameters would be documented in a manner similar to parameters.

🚧 The name portion of the delimiter syntax is not finalized, but could be either `T` or `<T>` for a type parameter `T`.

Return values

Return values would be documented in the same list as parameters and/or type parameters.

🚧 The delimiter syntax is not finalized, but could be one of the following:

- `return:`
- `returns:`
- `@return`
- `@returns`
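Recognizing a return item is a small variation on parameter parsing. This sketch accepts all four candidate spellings at once, which a final design presumably would not. `ReturnDescription` is a hypothetical helper.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Accepts "return:", "returns:", "@return" and "@returns". The \b keeps
// words such as "@returnValue" from being treated as a return item.
var returnItem = new Regex(@"^(?:returns?:|@returns?\b)\s*(?<text>.*)$");

// Returns the return-value description, or null if the line is not a return item.
string? ReturnDescription(string line)
{
    var match = returnItem.Match(line);
    return match.Success ? match.Groups["text"].Value.Trim() : null;
}

foreach (var sample in new[] { "return: The sum.", "@returns The sum.", "@returnValue The sum." })
    Console.WriteLine($"{sample} -> {ReturnDescription(sample) ?? "(not a return item)"}");
```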

Tuple elements

Tuple elements may be documented in the same manner as parameters, appearing as a nested list under the item whose type is a tuple.

```csharp
/// point: The point to scale
///     x: The x-coordinate of the point to scale
///     y: The y-coordinate of the point to scale
/// scale: The amount by which to scale the point
/// return: The scaled point
///     x: The x-coordinate of the scaled point
///     y: The y-coordinate of the scaled point
(double x, double y) Scale((double x, double y) point, double scale);
```
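One way an analyzer could interpret this nesting is to track indentation with a stack and record each item under a dotted path such as `point.x`. The sketch assumes the `name:` delimiter and ignores continuation lines. The dotted-path representation and the `FlattenItems` helper are illustrative only; the proposal does not define an XML shape for tuple elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Flattens a nested "name: description" list into dotted paths such as
// "point.x". Indentation decides which earlier item a nested item belongs to.
static Dictionary<string, string> FlattenItems(IEnumerable<string> lines)
{
    var result = new Dictionary<string, string>();
    var parents = new Stack<(int Indent, string Name)>();

    foreach (var line in lines)
    {
        int colon = line.IndexOf(':');
        if (colon < 0)
            continue; // continuation text is ignored in this sketch

        int indent = line.Length - line.TrimStart().Length;
        string name = line.Substring(0, colon).Trim();
        string description = line.Substring(colon + 1).Trim();

        // Pop items at the same or deeper indentation: they are siblings or
        // their descendants, not ancestors of this item.
        while (parents.Count > 0 && parents.Peek().Indent >= indent)
            parents.Pop();

        // Stack<T> enumerates from the top, so reverse to get outermost first.
        var path = parents.Reverse().Select(p => p.Name).Append(name);
        result[string.Join(".", path)] = description;
        parents.Push((indent, name));
    }

    return result;
}

// The comment from the example above, with the /// prefixes removed.
var tupleDocs = FlattenItems(new[]
{
    "point: The point to scale",
    "    x: The x-coordinate of the point to scale",
    "    y: The y-coordinate of the point to scale",
    "scale: The amount by which to scale the point",
    "return: The scaled point",
    "    x: The x-coordinate of the scaled point",
    "    y: The y-coordinate of the scaled point",
});

foreach (var (key, text) in tupleDocs)
    Console.WriteLine($"{key}: {text}");
```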

Code and References

By default, code within a comment is validated. In their simplest forms, inline code and code blocks are treated as code in the same language as the containing document.

Inline code may be treated as "plain" code by using one more set of backticks than is necessary for escaping purposes.
- `semantic`
- ``"Semantic string with backtick (`)"``
- ``plain``
- ```plain backtick (`)```
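The "one more set of backticks" rule can be made precise using the CommonMark code span rule: a span delimited by N backticks cannot contain a run of exactly N backticks, so the minimal delimiter is the smallest N that does not occur as a run inside the content. This sketch treats any longer delimiter as plain. The text does not say whether only exactly one extra backtick should count. The helper names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Smallest delimiter length N >= 1 such that the content has no backtick
// run of exactly N (a run of N would close the code span early).
static int MinimalDelimiterLength(string content)
{
    var runLengths = Regex.Matches(content, "`+").Select(m => m.Length).ToHashSet();
    int length = 1;
    while (runLengths.Contains(length))
        length++;
    return length;
}

// Assumed reading of the rule: any delimiter longer than the minimal one
// marks the span as "plain"; otherwise it is semantic.
static bool IsPlainCodeSpan(int delimiterLength, string content) =>
    delimiterLength > MinimalDelimiterLength(content);

// The four examples from the list above, split into delimiter length and content.
var examples = new (int Delimiter, string Content)[]
{
    (1, "semantic"),
    (2, "\"Semantic string with backtick (`)\""),
    (2, "plain"),
    (3, "plain backtick (`)"),
};

foreach (var (delimiter, text) in examples)
    Console.WriteLine($"{text} -> {(IsPlainCodeSpan(delimiter, text) ? "plain" : "semantic")}");
```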

Fenced code may be treated as "plain" code in the current language by including `plain` in the info string.

```
// In a C# source file, this is treated as C# code and semantically validated
void Method() { }
```

```csharp
// This is semantically validated
void Method() { }
```

```csharp plain
// This is highlighted as C# code but not semantically validated
void Method() { }
```
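The info-string handling for the three fences above can be sketched as follows. The first word other than `plain` is taken as the language, defaulting to the file's language, so an info string of just `plain` keeps the file's language. `plain` anywhere disables validation. Leaving code in other languages unvalidated is an assumption the text does not state, and `ParseInfoString` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Interprets a fence info string relative to the containing file's language.
static (string Language, bool Validate) ParseInfoString(string info, string fileLanguage)
{
    var words = info.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    bool plain = words.Contains("plain");
    string language = words.FirstOrDefault(w => w != "plain") ?? fileLanguage;

    // Assumption: semantic validation only applies to code in the file's own
    // language, and never to blocks marked "plain".
    bool validate = !plain && language == fileLanguage;
    return (language, validate);
}

foreach (var info in new[] { "", "csharp", "csharp plain", "plain", "json" })
{
    var (language, validate) = ParseInfoString(info, "csharp");
    Console.WriteLine($"'{info}' -> {language}, validate = {validate}");
}
```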

Resolving references

- For comments not placed in a code block, resolve references from a pseudo-context "inside" the documented element (i.e. parameters resolve first, then the element name, then containers...).
- For comments preceding a statement which can have child statements, resolve references from the beginning of the first child statement.
- For comments preceding a standalone statement, resolve references from the end of the statement.
- For comments at the end of a code block, resolve references from the current location.
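The first rule amounts to an innermost-first lookup. This sketch models each scope as a set of names for a made-up method `Scale(point, scale)` in a type `Geometry`. The scope contents and the `Resolve` helper are hypothetical. The other three rules, which change where lookup starts inside method bodies, are not modeled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scope chain for a comment on Scale(point, scale) declared in
// a type Geometry, innermost scope first. All names here are made up.
var scopes = new List<(string Scope, HashSet<string> Names)>
{
    ("parameter", new HashSet<string> { "point", "scale" }),
    ("element", new HashSet<string> { "Scale" }),
    ("containing type", new HashSet<string> { "Scale", "Translate", "Origin" }),
    ("namespace", new HashSet<string> { "Geometry", "Vector" }),
};

// Returns the innermost scope declaring 'symbol', or null if none does.
string? Resolve(string symbol) =>
    scopes.FirstOrDefault(s => s.Names.Contains(symbol)).Scope;

foreach (var reference in new[] { "point", "Scale", "Translate", "Vector", "Missing" })
    Console.WriteLine($"{reference} -> {Resolve(reference) ?? "unresolved"}");
```

Because `Scale` is found in the element scope before the containing type is consulted, an inner name wins over an outer one with the same spelling.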

mosra commented Jan 10, 2019

Background: I'm the author of the m.css content authoring/documentation framework, which, among other things, contains a Doxygen-based tool for documenting C++ projects (example). I'm now creating a similar tool for C#, based on the XMLDoc output.

Why I'm commenting: I have some experience with both writing and parsing JavaDoc/Doxygen syntax and I think some of my insights could be useful to you. Mainly to avoid repeating the same mistakes Doxygen did :) Please note that the experience below comes from documenting C++ code, but it applies equally to C#, as it's mainly about the doc block syntax.

The proposal above reminds me a lot of what Markdown-enabled Doxygen looks like, which immediately raises the question of making the syntax compatible with Doxygen. An argument for doing that would be to make it easier for users (no new syntax to learn); however, there are a lot of counter-arguments:

- The syntax has a few really nasty corner cases. One of them is the @ref command, which is used to reference symbols. The argument to @ref is "anything in the following text as long as it looks like a symbol", which leads to a very complicated parser implementation. In your proposal above, you wrap the reference in backticks, which makes much more sense.
- Doxygen has various notions of specially-styled blocks -- @note, @warning etc., which for example put the next paragraph in a yellow box to make it more visible. This is a very useful feature to have (I don't see it in the above proposal yet), but the problem is that it's limited to a single paragraph. Often it's desired to have more paragraphs in a @note and, in order to support that, Doxygen had to implement complex handling on the parser side that merges adjacent @notes into a single block.
- Similarly to @notes, general nesting of block-level elements is problematic. Markdown, as I see it, was not designed for complex layout capabilities, and in order to do more complex things users often have to resort to writing plain HTML inside (which in turn means most Markdown parsers have to implement HTML parsing at some point as well). The usual cases I'm hitting very often are:
  - a code block or a @note in a list (have to use HTML <ul> to achieve correct nesting)
  - code blocks, lists or generally anything more complex than plain text inside tables (again I have to use HTML <table> to make my way around the limitations)

To give an example, a real-world case of a more complex documentation layout could be this: Magnum::Animation::Easing -- it combines a responsive table-like layout containing embedded SVG images, together with math rendering and custom styled elements. While not absolutely essential, having such options at hand when writing docs is what makes the difference.

So, knowing the limitations of the Markdown/Doxygen syntax, an alternative idea I am now toying with is to use the same syntax as Sphinx has -- reStructuredText. The main syntactical difference for your above examples would be that it's :foo: instead of foo: / @foo, but the rest would stay mostly the same -- references to symbols with backticks, inline code with double backticks. The main advantages of this syntax are:

- It's already used by Sphinx for Python projects (and reStructuredText alone is used by many for authoring web content in tools like Pelican), so users can again reuse something well known (and it can open the possibilities for C# support in Sphinx or https://readthedocs.org).
- Paragraph nesting and other advanced layouts are not a problem; a code-block in a note in a list in a table is a completely valid use case, parsable in a completely unambiguous way.
- The syntax is well-documented and made to be easily extensible with new inline and block elements without resorting to hacks (so you could introduce a builtin `.. exceptions::` directive, for example).
- Since the syntax is standard and there are many parsers for it, it'll be easy to add support for it to 3rd party tools without forcing them to implement their own modified parsers.

To visualize how this could look, here's an example taken from this tweet (sorry, it's an image, don't have the original code anymore) --- again the particular code here is C++, using /** */-style comments, but the doc block syntax is language-agnostic. Resulting rendered docs, for comparison, are here.

I hope this rather lengthy comment is of some use to this discussion :)

CyrusNajmabadi commented Apr 5, 2019

@sharwell Definitely curious about how you would like symbol-references to be encoded. Do you have thoughts on that?

CyrusNajmabadi commented Apr 5, 2019

Definitely curious about more details on things like:

```csharp
// This is semantically validated
void Method() { }
```

Even syntactic validation is tricky given the desire to write potentially arbitrary code, without clear indications about what scope that code would be contained in. For example, I could easily see someone writing code that would only be valid at the namespace/type level, or only valid inside a type, or only valid as a statement, or only valid as an expression. Both syntactic and semantic validation are def tricky here.

If there is to be validation, I would suggest something like `csharp=statement`. If the scope wasn't provided, perhaps the IDE would make some reasonable efforts to try to figure out what was going on, but I wouldn't then validate.

Thoughts?

Note: I really like this proposal :) Not trying to poke holes, just trying to start good discussions on thorny problems!

nanoant commented Nov 20, 2019

@sharwell Hi Sam, is there a chance to push this forward? For example, by moving this spec RFC to the C# language issue list so it can be referenced by dotnet/csharplang#2394 and tracked? I'm afraid that if there's not enough pressure, we're gonna end up without any viable alternative to XML comments.
And to be honest, the example you're showing https://github.com/tunnelvisionlabs/dotnet-threading/blob/3e99a9d13476a1e8224d81f282f3cedad143c1bc/Rackspace.Threading/TaskBlocks.cs#L16-L68 clearly demonstrates what a bad and unreadable eyesore XML comments are.

CyrusNajmabadi commented Nov 23, 2019

A separate documentation presenter can be provided which interacts with higher-level IDE features. It is responsible for:

Note: this sounds very similar to the IEmbeddedLanguage system I built for embedded json/regex literals. When you get to this part, I would like to be part of the discussion, and I think it would be good to evaluate how we might build a system where we have one single concept here instead of multiple similar concepts.

Note: a recent thought I've been having here is that all of these areas should simply be represented as (Contained)?Documents. Documents already have a defined and understandable way to get at structure and to expose services. And we already have the concept of embedded documents in the ASP/razor system. The only real differences we need are:

- Arbitrary nesting levels. We would expect to potentially have a 'markdown' doc embedded in a C# doc. Then we would expect to potentially have 'semantically inert c#' docs embedded in the 'markdown doc'.
- An appropriate registration/discovery system for the language processors here.
- A system to load/embed processing of the different sections to the different processors. Note that this would have to be collaborative, i.e. the markdown provider would be the one that would have to figure out which sections of itself would then have to be loaded and processed by a different language processor.

The system I built was effectively this. Though I didn't reuse the Document abstraction, as I was worried too much about the potential size impact on the rest of the system. I intentionally tried to keep things explicitly separated for simplicity. However, I would want to not do that in the future if this is a first-class concept in the workspace and presentation models.