@qnighy
Last active April 17, 2022 01:04
HTML syntax notes

Tokenization state reference

https://html.spec.whatwg.org/multipage/parsing.html#tokenization

| state | example | notes |
| --- | --- | --- |
| data | `""` | The default state |
| RCDATA | `"<title>"` | Only character references and the end tag are recognized |
| RAWTEXT | `"<xmp>"` | Only the end tag is recognized |
| script data | `"<script>"` | RAWTEXT plus special nesting rules (character references are not decoded) |
| PLAINTEXT | `"<plaintext>"` | Cannot be exited |
| tag open | `"<"` | |
| end tag open | `"</"` | |
| tag name | `"<a"` | |
| RCDATA less-than sign | `"<title><"` | |
| RCDATA end tag open | `"<title></"` | |
| RCDATA end tag name | `"<title></a"` | |
| RAWTEXT less-than sign | `"<xmp><"` | |
| RAWTEXT end tag open | `"<xmp></"` | |
| RAWTEXT end tag name | `"<xmp></a"` | |
| script data less-than sign | `"<script><"` | |
| script data end tag open | `"<script></"` | |
| script data end tag name | `"<script></a"` | |
| script data escape start | `"<script><!"` | |
| script data escape start dash | `"<script><!-"` | |
| script data escaped | `"<script><!-- "` | |
| script data escaped dash | `"<script><!-- -"` | |
| script data escaped dash dash | `"<script><!--"` | |
| script data escaped less-than sign | `"<script><!--<"` | |
| script data escaped end tag open | `"<script><!--</"` | |
| script data escaped end tag name | `"<script><!--</a"` | |
| script data double escape start | `"<script><!--<a"` | |
| script data double escaped | `"<script><!--<script>"` | |
| script data double escaped dash | `"<script><!--<script>-"` | |
| script data double escaped dash dash | `"<script><!--<script>--"` | |
| script data double escaped less-than sign | `"<script><!--<script><"` | |
| script data double escape end | `"<script><!--<script></"` | |
| before attribute name | `"<a "` | |
| attribute name | `"<a a"` | |
| after attribute name | `"<a a "` | |
| before attribute value | `"<a a="` | |
| attribute value (double-quoted) | `"<a a=\""` | |
| attribute value (single-quoted) | `"<a a='"` | |
| attribute value (unquoted) | `"<a a=a"` | |
| after attribute value (quoted) | `"<a a='a'"` | |
| self-closing start tag | `"<a/"` | |
| bogus comment | `"<?"` | |
| markup declaration open | `"<!"` | |
| comment start | `"<!--"` | |
| comment start dash | `"<!---"` | |
| comment | `"<!-- "` | |
| comment less-than sign | `"<!--<"` | |
| comment less-than sign bang | `"<!--<!"` | |
| comment less-than sign bang dash | `"<!--<!-"` | |
| comment less-than sign bang dash dash | `"<!--<!--"` | |
| comment end dash | `"<!-- -"` | |
| comment end | `"<!----"` | |
| comment end bang | `"<!----!"` | |
| DOCTYPE | `"<!doctype"` | |
| before DOCTYPE name | `"<!doctype "` | |
| DOCTYPE name | `"<!doctype a"` | |
| after DOCTYPE name | `"<!doctype a "` | |
| after DOCTYPE public keyword | `"<!doctype a public"` | |
| before DOCTYPE public identifier | `"<!doctype a public "` | |
| DOCTYPE public identifier (double-quoted) | `"<!doctype a public \""` | |
| DOCTYPE public identifier (single-quoted) | `"<!doctype a public '"` | |
| after DOCTYPE public identifier | `"<!doctype a public ''"` | |
| between DOCTYPE public and system identifiers | `"<!doctype a public '' "` | |
| after DOCTYPE system keyword | `"<!doctype a system"` | |
| before DOCTYPE system identifier | `"<!doctype a system "` | |
| DOCTYPE system identifier (double-quoted) | `"<!doctype a system \""` | |
| DOCTYPE system identifier (single-quoted) | `"<!doctype a system '"` | |
| after DOCTYPE system identifier | `"<!doctype a system ''"` | |
| bogus DOCTYPE | `"<!doctype a a"` | |
| CDATA section | `"<svg><![CDATA["` | Only inside SVG and MathML; only `]]>` is recognized |
| CDATA section bracket | `"<svg><![CDATA[]"` | |
| CDATA section end | `"<svg><![CDATA[]]"` | |
| character reference | `"&"` | |
| named character reference | `"&a"` | |
| ambiguous ampersand | `"&aa"` | |
| numeric character reference | `"&#"` | |
| hexadecimal character reference start | `"&#x"` | |
| decimal character reference start | `"&#1"` | |
| hexadecimal character reference | `"&#x1"` | |
| decimal character reference | `"&#1"` | |
| numeric character reference end | `"&#1;"` | |
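
The tag-related rows above can be checked with a small transition table. What follows is a minimal sketch in Python, not the spec algorithm verbatim: it covers only the states from data through self-closing start tag (plus bogus comment), and it ignores EOF, NUL handling, parse errors, and character references. The names `RULES` and `state_after` are made up for this note.

```python
import string

WS = "\t\n\f "                # whitespace recognized inside tags
ALPHA = string.ascii_letters  # "ASCII alpha" in the spec

# state -> ([(trigger chars, next state, reconsume?), ...], (default next, reconsume?))
RULES = {
    "data": ([("<", "tag open", False)], ("data", False)),
    "tag open": ([("!", "markup declaration open", False),
                  ("/", "end tag open", False),
                  (ALPHA, "tag name", True),
                  ("?", "bogus comment", True)],
                 ("data", True)),
    "end tag open": ([(ALPHA, "tag name", True), (">", "data", False)],
                     ("bogus comment", True)),
    "bogus comment": ([(">", "data", False)], ("bogus comment", False)),
    "tag name": ([(WS, "before attribute name", False),
                  ("/", "self-closing start tag", False),
                  (">", "data", False)],
                 ("tag name", False)),
    "before attribute name": ([(WS, "before attribute name", False),
                               ("/>", "after attribute name", True),
                               ("=", "attribute name", False)],
                              ("attribute name", True)),
    "attribute name": ([(WS + "/>", "after attribute name", True),
                        ("=", "before attribute value", False)],
                       ("attribute name", False)),
    "after attribute name": ([(WS, "after attribute name", False),
                              ("/", "self-closing start tag", False),
                              ("=", "before attribute value", False),
                              (">", "data", False)],
                             ("attribute name", True)),
    "before attribute value": ([(WS, "before attribute value", False),
                                ('"', "attribute value (double-quoted)", False),
                                ("'", "attribute value (single-quoted)", False),
                                (">", "data", False)],
                               ("attribute value (unquoted)", True)),
    "attribute value (double-quoted)": ([('"', "after attribute value (quoted)", False)],
                                        ("attribute value (double-quoted)", False)),
    "attribute value (single-quoted)": ([("'", "after attribute value (quoted)", False)],
                                        ("attribute value (single-quoted)", False)),
    "attribute value (unquoted)": ([(WS, "before attribute name", False),
                                    (">", "data", False)],
                                   ("attribute value (unquoted)", False)),
    "after attribute value (quoted)": ([(WS, "before attribute name", False),
                                        ("/", "self-closing start tag", False),
                                        (">", "data", False)],
                                       ("before attribute name", True)),
    "self-closing start tag": ([(">", "data", False)],
                               ("before attribute name", True)),
}


def state_after(prefix: str) -> str:
    """Return the state reached after consuming `prefix`, starting from data."""
    state = "data"
    for ch in prefix:
        reconsume = True
        while reconsume:  # run the same character again when a rule says "reconsume"
            if state not in RULES:
                raise NotImplementedError(f"state not covered by this sketch: {state}")
            rules, default = RULES[state]
            state, reconsume = next(
                ((nxt, rec) for chars, nxt, rec in rules if ch in chars), default
            )
    return state
```

For example, `state_after("<a a ")` returns `"after attribute name"`, matching the corresponding row. Keeping the rules as data rather than as nested `if` chains makes each entry easy to compare with the spec text one state at a time.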

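Differences by tag:

| Kind | Applies to | Matching end tag | Character references | Comments | Tags | CDATA |
| --- | --- | --- | --- | --- | --- | --- |
| data | Everything except the elements below | yes | yes | yes | yes | yes [1] |
| RCDATA | `<textarea>`, `<title>` | yes | yes | no | no | no |
| RAWTEXT | `<iframe>`, `<noembed>`, `<noframes>`, `<noscript>` [2], `<style>`, `<xmp>` | yes | no | no | no | no |
| script data | `<script>` | yes [3] | no | no | no | no |
| PLAINTEXT | `<plaintext>` | no | no | no | no | no |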
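
As a rough illustration of the "applies to" column, the mapping from a start tag to the tokenizer state used for its content might look like the sketch below. It assumes elements in the HTML namespace only (inside SVG, for instance, `<title>` is not RCDATA) and ignores the other conditions the tree builder checks. The function name `content_state` and its `scripting_enabled` parameter are hypothetical.

```python
RCDATA_ELEMENTS = {"textarea", "title"}
RAWTEXT_ELEMENTS = {"iframe", "noembed", "noframes", "style", "xmp"}


def content_state(tag_name: str, scripting_enabled: bool = True) -> str:
    """Return the tokenizer state used for the content after a start tag."""
    name = tag_name.lower()  # HTML tag names are ASCII case-insensitive
    if name in RCDATA_ELEMENTS:
        return "RCDATA"
    if name in RAWTEXT_ELEMENTS:
        return "RAWTEXT"
    if name == "noscript":
        # With scripting disabled, <noscript> is treated as an ordinary tag.
        return "RAWTEXT" if scripting_enabled else "data"
    if name == "script":
        return "script data"
    if name == "plaintext":
        return "PLAINTEXT"
    return "data"
```

For example, `content_state("noscript", scripting_enabled=False)` returns `"data"`, while the default call returns `"RAWTEXT"`.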

Footnotes

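  1. Only inside SVG and MathML.

  2. When scripting is disabled, it is treated as an ordinary tag.

  3. Inside a range opened by `<!--` (and not yet closed by `-->`), if `<script>` ... `</script>` appear as a pair, that `</script>` is not treated as an end tag.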
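
Footnote 3 is easier to follow with a small scanner. The sketch below is a simplified approximation of the script data states, not the spec algorithm itself: it tracks three modes (plain, escaped after `<!--`, double escaped after a nested `<script>`) and reports where the script content ends. It assumes newlines are already normalized (no bare CR), and `find_script_end` is a hypothetical name.

```python
import re

# "<script" or "</script" only counts when followed by whitespace, "/" or ">".
_OPEN = re.compile(r"<script[\t\n\f />]", re.IGNORECASE)
_CLOSE = re.compile(r"</script[\t\n\f />]", re.IGNORECASE)


def find_script_end(text: str) -> int:
    """Return the index of the "</script" that ends the script content, or -1."""
    mode = "plain"  # "plain" | "escaped" | "double"
    i = 0
    while i < len(text):
        if _CLOSE.match(text, i):
            if mode != "double":
                return i          # a real end tag
            mode = "escaped"      # pairs with the nested <script>; not an end tag
            i += len("</script")
        elif mode == "plain" and text.startswith("<!--", i):
            mode = "escaped"
            i += 2                # keep "--" so that "<!-->" closes right away
        elif mode != "plain" and text.startswith("-->", i):
            mode = "plain"
            i += 3
        elif mode == "escaped" and _OPEN.match(text, i):
            mode = "double"
            i += len("<script")
        else:
            i += 1
    return -1
```

With the input `<!-- <script> </script> --> </script>`, the first `</script>` is swallowed by the nested pair and the function returns the index of the second one. Without the trailing part, as in `<!-- <script> </script>`, it returns -1, meaning the script element is never closed.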
