https://html.spec.whatwg.org/multipage/parsing.html#tokenization
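
state | example (input consumed so far) | notes |
---|---|---|
data | `""` | Default (initial) state |
RCDATA | `"<title>"` | Only character references and the matching end tag are recognized |
RAWTEXT | `"<xmp>"` | Only the matching end tag is recognized |
script data | `"<script>"` | Like RAWTEXT (no character references), plus special nesting rules for `<!--` and a nested `<script>` |
PLAINTEXT | `"<plaintext>"` | No way out: everything up to EOF is text |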
tag open | `"<"` | |
end tag open | `"</"` | |
tag name | `"<a"` | |
RCDATA less-than sign | `"<title><"` | |
RCDATA end tag open | `"<title></"` | |
RCDATA end tag name | `"<title></a"` | |
RAWTEXT less-than sign | `"<xmp><"` | |
RAWTEXT end tag open | `"<xmp></"` | |
RAWTEXT end tag name | `"<xmp></a"` | |
script data less-than sign | `"<script><"` | |
script data end tag open | `"<script></"` | |
script data end tag name | `"<script></a"` | |
script data escape start | `"<script><!"` | |
script data escape start dash | `"<script><!-"` | |
script data escaped | `"<script><!-- "` | |
script data escaped dash | `"<script><!-- -"` | |
script data escaped dash dash | `"<script><!--"` | |
script data escaped less-than sign | `"<script><!--<"` | |
script data escaped end tag open | `"<script><!--</"` | |
script data escaped end tag name | `"<script><!--</a"` | |
script data double escape start | `"<script><!--<a"` | |
script data double escaped | `"<script><!--<script>"` | |
script data double escaped dash | `"<script><!--<script>-"` | |
script data double escaped dash dash | `"<script><!--<script>--"` | |
script data double escaped less-than sign | `"<script><!--<script><"` | |
script data double escape end | `"<script><!--<script></"` | |
before attribute name | `"<a "` | |
attribute name | `"<a a"` | |
after attribute name | `"<a a "` | |
before attribute value | `"<a a="` | |
attribute value (double-quoted) | `"<a a=\""` | |
attribute value (single-quoted) | `"<a a='"` | |
attribute value (unquoted) | `"<a a=a"` | |
after attribute value (quoted) | `"<a a='a'"` | |
self-closing start tag | `"<a/"` | |
bogus comment | `"<?"` | |
markup declaration open | `"<!"` | |
comment start | `"<!--"` | |
comment start dash | `"<!---"` | |
comment | `"<!-- "` | |
comment less-than sign | `"<!--<"` | |
comment less-than sign bang | `"<!--<!"` | |
comment less-than sign bang dash | `"<!--<!-"` | |
comment less-than sign bang dash dash | `"<!--<!--"` | |
comment end dash | `"<!-- -"` | |
comment end | `"<!----"` | |
comment end bang | `"<!----!"` | |
DOCTYPE | `"<!doctype"` | |
before DOCTYPE name | `"<!doctype "` | |
DOCTYPE name | `"<!doctype a"` | |
after DOCTYPE name | `"<!doctype a "` | |
after DOCTYPE public keyword | `"<!doctype a public"` | |
before DOCTYPE public identifier | `"<!doctype a public "` | |
DOCTYPE public identifier (double-quoted) | `"<!doctype a public \""` | |
DOCTYPE public identifier (single-quoted) | `"<!doctype a public '"` | |
after DOCTYPE public identifier | `"<!doctype a public ''"` | |
between DOCTYPE public and system identifiers | `"<!doctype a public '' "` | |
after DOCTYPE system keyword | `"<!doctype a system"` | |
before DOCTYPE system identifier | `"<!doctype a system "` | |
DOCTYPE system identifier (double-quoted) | `"<!doctype a system \""` | |
DOCTYPE system identifier (single-quoted) | `"<!doctype a system '"` | |
after DOCTYPE system identifier | `"<!doctype a system ''"` | |
bogus DOCTYPE | `"<!doctype a a"` | |
CDATA section | `"<svg><![CDATA["` | Only inside SVG and MathML (foreign content); only `]]>` is recognized |
CDATA section bracket | `"<svg><![CDATA[]"` | |
CDATA section end | `"<svg><![CDATA[]]"` | |
character reference | `"&"` | |
named character reference | `"&a"` | |
ambiguous ampersand | `"&aa"` | |
numeric character reference | `"&#"` | |
hexadecimal character reference start | `"&#x"` | |
decimal character reference start | | |
hexadecimal character reference | | |
decimal character reference | | |
numeric character reference end | | |
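
The RCDATA/RAWTEXT "end tag open" and "end tag name" states above only leave the text state for an appropriate end tag, i.e. one whose name matches the element that switched the tokenizer into that state. The following is a minimal Python sketch of that check, not the spec algorithm itself. It assumes already-preprocessed input (CR/CRLF normalized to LF), uses a hypothetical helper name `find_text_end`, and ignores both character references (RCDATA) and the script data escape rules.

```python
# Hypothetical helper: finds where RCDATA/RAWTEXT text for an element ends.
# Assumptions: input is already preprocessed (newlines normalized to LF),
# and <script> escape rules / character references are NOT modeled.

_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")
# Characters that may follow the name of an appropriate end tag:
# tab, LF, FF, space (attributes follow), "/" (self-closing) or ">" (emit).
_END_TAG_TERMINATORS = {"\t", "\n", "\f", " ", "/", ">"}


def find_text_end(source: str, tag_name: str, start: int = 0) -> int:
    """Return the index of the "</" that closes `tag_name`, or len(source)."""
    name = tag_name.translate(_ASCII_LOWER)
    i = start
    while True:
        i = source.find("</", i)
        if i == -1:
            # No appropriate end tag: everything up to EOF stays text.
            return len(source)
        name_start = i + 2
        name_end = name_start + len(name)
        # Tag names are compared ASCII case-insensitively.
        candidate = source[name_start:name_end].translate(_ASCII_LOWER)
        following = source[name_end:name_end + 1]
        if candidate == name and following in _END_TAG_TERMINATORS:
            return i
        # "</" not followed by this element's own name is just text.
        i = name_start


print(find_text_end("a <b>bold</b> </TITLE> rest", "title"))  # prints 14
```

Checking the character right after the name is what keeps `</titlex>` as plain text: a longer name is not an appropriate end tag, so the tokenizer goes back to the text state.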
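
Differences by element:

state | elements | matching end tag | character references | comments | tags | CDATA |
---|---|---|---|---|---|---|
data | all elements not listed below | ✓ | ✓ | ✓ | ✓ | △¹ |
RCDATA | `<textarea>` `<title>` | ✓ | ✓ | | | |
RAWTEXT | `<iframe>` `<noembed>` `<noframes>` `<noscript>`² `<style>` `<xmp>` | ✓ | | | | |
script data | `<script>` | △³ | | | | |
PLAINTEXT | `<plaintext>` | | | | | |

¹ Only inside SVG and MathML (foreign content); in HTML content, `<![CDATA[` starts a bogus comment.

² Only when scripting is enabled; with scripting disabled, `<noscript>` content is tokenized in the data state.

³ Inside a double-escaped section (`<!--<script>` …), `</script>` does not end the script.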
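
As a rough sketch, the element-to-state mapping in the table can be written as a lookup. `content_state` and its `scripting` parameter are hypothetical names. The sketch assumes an HTML-namespace start tag whose name the tokenizer has already lowercased, and ignores the tree-construction details (insertion modes, fragment parsing, foreign content such as `<svg><title>`) that decide when the switch actually happens.

```python
# Hypothetical lookup mirroring the table: which tokenizer state is used for
# an element's content. Assumes an HTML-namespace element with a lowercased
# tag name; tree-construction details are deliberately ignored.

RCDATA_ELEMENTS = frozenset({"textarea", "title"})
RAWTEXT_ELEMENTS = frozenset({"iframe", "noembed", "noframes", "style", "xmp"})


def content_state(tag_name: str, scripting: bool = True) -> str:
    """Return the tokenizer state the element's content is read in."""
    if tag_name in RCDATA_ELEMENTS:
        return "RCDATA"
    if tag_name in RAWTEXT_ELEMENTS:
        return "RAWTEXT"
    if tag_name == "noscript":
        # RAWTEXT only when scripting is enabled; otherwise parsed normally.
        return "RAWTEXT" if scripting else "data"
    if tag_name == "script":
        return "script data"
    if tag_name == "plaintext":
        return "PLAINTEXT"
    return "data"


for tag in ("title", "noscript", "script", "div"):
    print(tag, content_state(tag), content_state(tag, scripting=False))
```

Keeping `scripting` as an explicit argument makes the `<noscript>` special case visible; every other row of the table is independent of it.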