@cmalven
Last active September 19, 2024 10:19
Shortest (useful) HTML5 Document
<!-- http://www.brucelawson.co.uk/2010/a-minimal-html5-document/ -->
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>blah</title>
</head>
<body>
<p>I'm the content</p>
</body>
</html>
@mindplay-dk

Okay, but are we authoring for flat files on somebody's local filesystem, or for uploading to a webserver?

Declaring encoding in the file is dodgy, in my opinion:

For one, the potential conflict with the Content-Type header invites confusion: if the two don't match, the Content-Type header wins, which is baffling for someone trying to debug an encoding issue... "why the heck doesn't my tag work - it worked locally!"
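To make the precedence concrete, here is a minimal Python sketch of the lookup order described above. It is a simplification, not the browser's actual algorithm: it ignores byte order marks and every other sniffing step, and the function names `charset_from_content_type` and `effective_encoding` are made up for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def effective_encoding(header_value, meta_charset):
    """Simplified precedence: the HTTP header wins over the in-document tag.

    header_value is None when there is no HTTP response at all,
    e.g. a file opened straight from the local filesystem.
    """
    header_charset = charset_from_content_type(header_value) if header_value else None
    return header_charset or meta_charset


# Served with a conflicting header: the header's charset is used.
print(effective_encoding("text/html; charset=ISO-8859-1", "utf-8"))
# Served without a charset parameter, or opened locally: the tag is used.
print(effective_encoding("text/html", "utf-8"))
print(effective_encoding(None, "utf-8"))
```

This is the "it worked locally" trap in miniature: with no header, the tag decides; once a server adds its own charset parameter, the tag is silently overridden.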

For another, declaring the charset this way doesn't even work unless the tag appears within the first 1024 bytes of the document... which it very probably will, but, you know... it's just one of those things that could send somebody down a deep rabbit hole when, for no apparent reason, the page goes bonkers because you added 1 KB of JavaScript before that tag.
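The 1024-byte limit can be checked with a rough Python sketch like the one below. It assumes a plain regular expression is good enough to spot the tag; the real prescan is a proper tokenizer-like algorithm, so treat `prescan_charset` (a hypothetical helper) as an approximation only.

```python
import re

PRESCAN_LIMIT = 1024  # bytes inspected when looking for a charset declaration

# Rough pattern for <meta charset=...>, with or without quotes (an approximation).
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_-]+)""",
    re.IGNORECASE,
)


def prescan_charset(html_bytes):
    """Return the declared charset if it starts within the first 1024 bytes."""
    match = META_CHARSET.search(html_bytes[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


early = b"<!doctype html><meta charset=utf-8><title>blah</title>"
# Same declaration, but pushed past the limit by 1 KB of script content.
late = b"<!doctype html><script>" + b" " * 1024 + b"</script><meta charset=utf-8>"

print(prescan_charset(early))  # utf-8
print(prescan_charset(late))   # None
```

Keeping the declaration as the first element inside `<head>`, as the gist does, avoids the problem entirely.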

I mean, if we're talking about a minimal HTML document, it's certainly not required - but doesn't really seem like a good idea either way.

@hh-lohmann

Modern authoring tools, IDEs, editors, etc. will surely assume UTF-8 as the default. But imagine something like Leftpad, i.e. something deep inside the dependency graph of your complex tool and module chain (= outside your awareness) that breaks everything, especially at higher levels, just because you had no charset declaration. Maybe it was pulled in by another dependency you are not even aware of, written before the whole world was UTF-8, or it defaults to ISO-8859-1 for some other reason (the parallel to the Leftpad disaster is being killed by something you did not even know about). It's trivial to add a charset declaration before trouble arises; it can be very time-consuming to find out that a missing charset declaration was the cause. Note that such a dev tool dependency can hit you even if you never need a single non-ASCII character.

Often small companies (maybe your clients) have no control over the outdated server space they bought some years ago from someone who bought it from someone else, and so on. You may have to reverse-engineer which HTTP headers are sent, and you cannot change them, so it's quite nice that a charset declaration in your code keeps you safe whenever the server sends no charset of its own. You may get hit on such a server only years later, when a new requirement arises for, e.g., a French version, and out of the blue there are strange replacement characters just because of quotation marks that are not in your implicit default charset.
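The "strange replacements" are easy to reproduce. The Python sketch below assumes the page was saved as UTF-8 and is decoded as windows-1252, which is how browsers treat the ISO-8859-1 label; the sample string is made up for illustration.

```python
# A short French fragment with a typographic apostrophe (U+2019) and accents.
text = "l\u2019\u00e9t\u00e9"  # l’été

utf8_bytes = text.encode("utf-8")

# Decoded with the charset it was saved in: the text round-trips.
print(utf8_bytes.decode("utf-8"))   # l’été

# Decoded with a legacy default instead: every non-ASCII character
# turns into two or three unrelated characters.
print(utf8_bytes.decode("cp1252"))  # lâ€™Ã©tÃ©

# Pure ASCII content is identical under both, which is why the problem
# can stay hidden for years.
print(b"blah".decode("utf-8") == b"blah".decode("cp1252"))  # True
```

The last line is the reason the breakage tends to appear late: an English-only page gives no hint that the charset was never pinned down.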

Anyway, you will always have the better argument with stakeholders if things break even though you adhered to standards.

@alemens

alemens commented Oct 24, 2022

Why does everyone include <meta charset="utf-8">?

UTF-8 is the only valid encoding for HTML5 documents. That means if you have <!DOCTYPE html> at the top of an HTML file, then the charset is implied.

@punund

punund commented Sep 15, 2023

"if we're talking about a minimal HTML document, it's certainly not required - but doesn't really seem like a good idea either way."

Now imagine serving files with different encodings off the same web server.
Encoding is inseparable from the document itself. You can't usefully change the encoding without changing the text.
