I'm glad to announce the first release of `mrmime`, a parser and a generator of emails. This library provides an OCaml way to analyze and craft an email. The longer-term goal is to build the entire email stack (such as SMTP or IMAP) so that we can then provide tools and unikernels around the email service.
In this article, we show what is currently possible with `mrmime` and some other libraries around it, and we present our next plans.
Some years ago, I gave [a talk][talk-mrmime] about what an email really is. Behind its human-readable format (or rich document, as we said a long time ago), an email has many details that complicate the process of analyzing it (and that can lead to security lapses).
First of all, email is mainly described by 3 RFCs:
- [RFC822][rfc822]
- [RFC2822][rfc2822]
- [RFC5322][rfc5322]
Even though they remain compatible with each other, some archaeological work is needed to parse emails in the most legacy-tolerant way. In fact, some emails still follow old standards whose designs were not yet recognized (in the 1970s) as bad or ugly.
The latest RFC about email (RFC5322) tries to fix them and provides a better [ABNF][abnf-rfc] to describe the format, but of course it comes with plenty of obsolete rules which still need to be implemented. So, throughout the standard, you find each grammar rule next to its obsolete version.
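To make the "rule plus obsolete rule" pairing concrete, here is a minimal sketch based on the year of a date: RFC5322 expects at least 4 digits, while the obsolete form accepts 2 or 3 digits that have to be reinterpreted. The function names are hypothetical, it uses only the standard library, and it is not how `mrmime` implements it.

```ocaml
(* Hypothetical helpers, not part of mrmime. *)
let is_digit = function '0' .. '9' -> true | _ -> false

let all_digits s =
  let ok = ref (s <> "") in
  String.iter (fun chr -> if not (is_digit chr) then ok := false) s ;
  !ok

(* Interpret the digits of a year as RFC5322 asks for the obsolete forms:
   a 2-digit year from 00 to 49 gets 2000 added, any other 2-digit year and
   any 3-digit year get 1900 added, a year of 4 digits or more is kept. *)
let normalize_year digits =
  if not (all_digits digits) then None
  else
    match String.length digits, int_of_string_opt digits with
    | 2, Some year when year <= 49 -> Some (2000 + year)
    | (2 | 3), Some year -> Some (1900 + year)
    | _, Some year when String.length digits >= 4 -> Some year
    | _ -> None
```

With this sketch, `normalize_year "19"`, `normalize_year "119"` and `normalize_year "2019"` all denote the same year.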
Of course, in the end, respecting the rules described by the RFCs is not enough to analyze every email from the real world. Implementations that generate emails can sometimes produce wrong ones. So `mrmime` is tested against a corpus of 2 billion emails to check that it can parse everything, even if it does not always produce the expected result.
So we relaxed some details of the ABNF to be able to parse these malformed emails when the same defect appears multiple times.
Of course, even if the definition of an email fits in only 3 RFCs, you would then miss the internationalization of mail ([RFC6532][rfc6532]), the MIME format ([RFC2045][rfc2045], [RFC2046][rfc2046], [RFC2047][rfc2047], [RFC2049][rfc2049]), or the details needed to be interoperable with SMTP ([RFC5321][rfc5321]), not to mention other RFCs which add elements to an email, like S/MIME or the Content-Disposition field.
So we took the most general RFCs and tried to provide an easy way to deal with them. Of course, the main difficulty is the multipart parser (anyone who has tried to write an HTTP 1.1 parser knows about that).
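To give an idea of what the multipart problem looks like, here is a minimal sketch that splits a body on its boundary delimiter lines. It is only an illustration under strong assumptions: `split_multipart` is a hypothetical name, the whole body is held in memory, the padding allowed after a delimiter is ignored, and nested multiparts and the headers of each part are not handled. The real parser in `mrmime` works on a stream and is much more involved.

```ocaml
(* Hypothetical helper, not part of mrmime: split a multipart body held in
   memory into its parts, given the boundary from the Content-Type field. *)
let split_multipart ~boundary body =
  let delimiter = "--" ^ boundary in
  let close_delimiter = delimiter ^ "--" in
  (* Drop a trailing '\r' so that both CRLF and LF line endings work. *)
  let chomp line =
    let len = String.length line in
    if len > 0 && line.[len - 1] = '\r' then String.sub line 0 (len - 1)
    else line in
  let lines = List.map chomp (String.split_on_char '\n' body) in
  (* [current] is [None] in the preamble and [Some lines] (in reverse order)
     while we are inside a part. *)
  let flush current parts =
    match current with
    | None -> parts
    | Some acc -> String.concat "\n" (List.rev acc) :: parts in
  let rec go current parts = function
    | [] -> List.rev (flush current parts)
    | line :: _ when line = close_delimiter ->
      (* Everything after the close delimiter is the epilogue: ignore it. *)
      List.rev (flush current parts)
    | line :: rest when line = delimiter ->
      go (Some []) (flush current parts) rest
    | line :: rest ->
      (match current with
       | None -> go None parts rest
       | Some acc -> go (Some (line :: acc)) parts rest) in
  go None [] lines
```

Even this toy version has to deal with a preamble, an epilogue and two kinds of delimiters, which hints at why the streaming version is the hard part.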
One proof of concept of the usability of `mrmime` is `ocaml-dkim`, which extracts a specific field from your mail and then verifies that the hash and the signature correspond to what is expected.
`ocaml-dkim` is used with the latest implementation of `ocaml-dns` to request the public key needed to verify the email.
Another point about `ocaml-dkim`, and the most important one, is that it is able to verify your email in one pass. Indeed, some current implementations of DKIM need 2 passes to verify your email (one to extract the DKIM signature, the other to digest some fields and the body).
So we mostly focused on that, to be able to provide later a unikernel which acts as an SMTP relay and verifies the emails it receives.
OCaml is a good language for making a little DSL that serves our purpose. So we took advantage of OCaml to let the user easily craft an email from scratch.
The idea is to build OCaml values and then let the generator turn them into a stream that can be used, for example, in an SMTP implementation.
This snippet shows how to make a little email header:
```ocaml
#require "mrmime" ;;
#require "ptime.clock.os" ;;

open Mrmime

(* The sender: a local part and a domain. *)
let romain_calascibetta =
  let open Mailbox in
  Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "gmail"; a "com" ])

(* The recipient, with a display name. *)
let john_doe =
  let open Mailbox in
  Local.[ w "john" ] @ Domain.(domain, [ a "doe"; a "org" ])
  |> with_name Phrase.(v [ w "John"; w "D." ])

let now () =
  let open Date in
  of_ptime ~zone:Zone.GMT (Ptime_clock.now ())

let subject =
  Unstructured.[ v "A"; sp 1; v "Simple"; sp 1; v "Mail" ]

(* The header is built field by field. *)
let header =
  let open Header in
  Field.(Subject $ subject)
  & Field.(Sender $ romain_calascibetta)
  & Field.(To $ Address.[ mailbox john_doe ])
  & Field.(Date $ now ())
  & empty

let stream = Header.to_stream header

(* Consume the stream and print each chunk until it is exhausted. *)
let () =
  let rec go () =
    match stream () with
    | Some buf -> print_string buf ; go ()
    | None -> () in
  go ()
```
It produces:

```
Date: 2 Aug 2019 14:10:10 GMT
To: John "D." <john@doe.org>
Sender: romain.calascibetta@gmail.com
Subject: A Simple Mail
```
One aspect of email and SMTP is the set of historical rules about how to generate them. One of them is a limit on the number of bytes per line. Indeed, a mail generator should emit at most 80 bytes per line (78 characters plus the final CRLF) and, of course, it should emit the email line by line.
So `mrmime` has its own encoder which tries to wrap your mail within this limit. It was mostly inspired by [Faraday][faraday] and [Format][format], powered with GADTs, to easily describe how to encode/generate parts of an email.
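As an illustration of the wrapping problem only, here is a minimal greedy sketch with the standard library. `fold` is a hypothetical name and this is not the `mrmime` encoder (which is driven by a `Format`-like description): it folds a list of words into lines of at most `limit` characters, CRLF excluded, and starts each continuation line with a space. A single word longer than the limit is emitted as is.

```ocaml
(* Hypothetical helper, not the mrmime encoder: greedily fold a list of words
   into lines of at most [limit] characters (CRLF excluded). Continuation
   lines start with a space, which is how a header field is folded. *)
let fold ?(limit = 78) words =
  let buf = Buffer.create 128 in
  (* [column] is the length of the line currently being written. *)
  let column = ref 0 in
  List.iter
    (fun word ->
      let len = String.length word in
      if !column = 0 then begin
        Buffer.add_string buf word ;
        column := len
      end else if !column + 1 + len <= limit then begin
        Buffer.add_char buf ' ' ;
        Buffer.add_string buf word ;
        column := !column + 1 + len
      end else begin
        (* The word does not fit: emit CRLF, then a space (folding). *)
        Buffer.add_string buf "\r\n " ;
        Buffer.add_string buf word ;
        column := 1 + len
      end)
    words ;
  Buffer.add_string buf "\r\n" ;
  Buffer.contents buf
```

For example, `fold [ "Subject:"; "A"; "Simple"; "Mail" ]` fits on one line, while a smaller `~limit` forces the tail of the field onto continuation lines.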
Of course, the main point about email is to be able to generate a multipart email, if only to send file attachments. A lot of work went into this: making parts, composing them under specific Content-Type fields and merging them into one email.
In the end, you can easily make a stream which respects the rules (78 characters per line excluding CRLF, emitted line by line) and use it directly in an SMTP implementation.
This is what we did with the project [`facteur`][facteur]. It's a little command-line tool, in pure OCaml, to send mails with file attachments, but for now it works only on UNIX operating systems.
Even if you are able to parse and generate an email, we still need to do some work before giving you results.
Indeed, email is a unit of exchange between people, and the biggest challenge is to find a common way to ensure that they can understand each other. Here, encoding is probably the most important piece: when a French person wants to communicate with a latin1 encoding, an American person may still use ASCII.
To solve this problem, we chose to unify all contents to UTF-8, as the most general encoding in the world. So, first, thanks to [@dbuenzli][dbuenzli] for [`uutf`][uutf]; then, we wrote some libraries which map an encoded flow to Unicode code points. Finally, we use `uutf` to normalize it to UTF-8.
The main goal is to spare the user a headache: even if the contents of the mail are encoded in latin1, we make sure to translate them correctly (and according to the RFCs) to UTF-8.
This project is [`rosetta`][rosetta] and it comes with:
- [`uuuu`][uuuu] for the ISO-8859 encodings
- [`coin`][coin] for the KOI8-{R,U} encodings
- [`yuscii`][yuscii] for the UTF-7 encoding
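The normalization to UTF-8 described above can be sketched for the simplest case, latin1 (ISO-8859-1), where each byte maps to the Unicode code point of the same value. This is a standard-library illustration with a hypothetical function name, not the API of `rosetta`, `uuuu` or `uutf`; it assumes OCaml 4.06 or later for `Buffer.add_utf_8_uchar`.

```ocaml
(* Hypothetical helper, not the rosetta API: ISO-8859-1 maps each byte to the
   Unicode code point of the same value, so we only have to re-encode each
   code point as UTF-8. *)
let latin1_to_utf8 s =
  let buf = Buffer.create (String.length s * 2) in
  String.iter
    (fun chr -> Buffer.add_utf_8_uchar buf (Uchar.of_int (Char.code chr)))
    s ;
  Buffer.contents buf
```

Pure ASCII input comes out unchanged, while a byte such as `\xe9` (é in latin1) becomes the two bytes `\xc3\xa9`. The other ISO-8859 parts, KOI8 and UTF-7 need real mapping tables or state, which is what the libraries above provide.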
Then, bodies can be encoded in 2 ways:
- A base64 encoding, used to store your file
- A quoted-printable encoding
The `base64` package comes with a sub-package, `base64.rfc2045`, which handles the special case of encoding a body according to RFC2045 and the SMTP limitation.
Then, `pecu` was made to encode and decode quoted-printable contents. It was tested and fuzzed, of course, like any other MirageOS library.
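To show what a quoted-printable encoding does, here is a deliberately simplified encoder written with the standard library only. `quoted_printable_encode` is a hypothetical name, not the `pecu` API. The sketch ignores the 76-character line limit with its soft line breaks as well as the rule about trailing whitespace, and it encodes CR and LF like any other byte.

```ocaml
(* Hypothetical helper, not the pecu API: printable ASCII characters (except
   '=') and spaces are kept, every other byte becomes "=XX" in uppercase
   hexadecimal. *)
let quoted_printable_encode s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun chr ->
      let code = Char.code chr in
      if (code >= 33 && code <= 126 && chr <> '=') || chr = ' '
      then Buffer.add_char buf chr
      else Buffer.add_string buf (Printf.sprintf "=%02X" code))
    s ;
  Buffer.contents buf
```

Text which is mostly ASCII stays readable after encoding, which is the reason to prefer quoted-printable over base64 for such contents.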
These libraries are needed for another historical reason: the bytes used to store a mail should use only 7 bits instead of 8. This is the purpose of the base64 and quoted-printable encodings, which use only the 128 values of a byte that fit in 7 bits. Again, this limitation comes from the SMTP protocol.
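The 7-bit constraint itself is easy to state in code. A minimal check, with a hypothetical name and the standard library only, could look like this:

```ocaml
(* Hypothetical helper: a string is 7-bit clean when every byte is below 128,
   that is, when the high bit of each byte is 0. *)
let is_7bit s =
  let rec go i =
    i >= String.length s || (Char.code s.[i] < 128 && go (i + 1)) in
  go 0
```

A latin1 body such as `"caf\xe9"` fails this check, while its quoted-printable form `"caf=E9"` passes it.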
`mrmime` can be considered a huge project, since it tries to parse and generate emails according to 50 years of usage, several RFCs and legacy rules. So it is still an experimental project. We have reached the first version because we are now able to parse many mails and then generate them back correctly.
Of course, a bug (a malformed mail, a server which does not respect the standards or a bad use of our API) can easily appear where we did not test everything. But we felt that it was time to release it and let people use it.
The best feedback about `mrmime`, and the best source of improvement, is you. So don't be afraid to use it and start hacking your emails with it.