@nicolo-ribaudo
Created July 12, 2024 22:03

MessageFormat 2 modules integration

This is a proposal for better integration of MessageFormat 2 (Unicode proposal, TC39 proposal) with the rest of the web platform.

TLDR

import message from "./message.mf2" with { type: "messageformat" };

export function Notifications({ count }) {
  return html`<p>${message.format({ count })}</p>`;
}

While I am strongly convinced that MessageFormat belongs in Intl, as it is an internationalization API, it differs significantly from all the other Intl.* APIs: it needs a lot of developer-provided data (all the translations!), rather than relying mostly on data from CLDR.

The TC39 proposal glosses over how developers are meant to retrieve the translations, and instead only shows examples with inline strings. In practice, applications will look similar to this:

const locale = getUserLocale();

const message = await fetch("/messages/notifications-count.mf2?lang=" + locale)
  .then(response => response.text())
  .then(raw => new Intl.MessageFormat(locale, raw));

export function Notifications({ count }) {
  return html`
    <p>${message.format({ count })}</p>
  `;
}

or, if the developer is pre-bundling their MessageFormat messages in a JSON file, it could look like this:

const locale = getUserLocale();

// The fetch would actually be in its own module to be deduplicated
// among all the components that need it
const message = await fetch("/messages.json?lang=" + locale)
  .then(response => response.json())
  .then(raw => new Intl.MessageFormat(locale, raw.notifications));

export function Notifications({ count }) {
  return html`
    <p>${message.format({ count })}</p>
  `;
}
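
The deduplication mentioned in the comment above could be sketched as a small shared loader. This is only an illustration under stated assumptions: `createMessageLoader`, `loadMessages` and the injectable `fetchImpl` parameter are hypothetical names that are not part of any proposal, and the `/messages.json?lang=` URL is simply the one used in the example above.

```javascript
// Hypothetical helper: keeps one request per locale so that every
// component asking for the same locale shares a single fetch.
function createMessageLoader(fetchImpl = globalThis.fetch) {
  const cache = new Map(); // locale -> Promise of the parsed JSON bundle

  return function loadMessages(locale) {
    if (!cache.has(locale)) {
      const url = "/messages.json?lang=" + encodeURIComponent(locale);
      const pending = fetchImpl(url).then((response) => {
        if (!response.ok) {
          throw new Error(`Failed to load messages for ${locale}: ${response.status}`);
        }
        return response.json();
      });
      // Drop failed requests from the cache so that a later call can retry.
      pending.catch(() => cache.delete(locale));
      cache.set(locale, pending);
    }
    return cache.get(locale);
  };
}

// Possible usage inside a component module (assumes Intl.MessageFormat exists):
//   const loadMessages = createMessageLoader();
//   const raw = await loadMessages(locale);
//   const message = new Intl.MessageFormat(locale, raw.notifications);
```

Even with such a helper, each component still has to construct its own Intl.MessageFormat objects by hand.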

Given that MessageFormat messages are a data resource used to render the app, the loading boilerplate could be abstracted away, similar to how it has been done for JSON and CSS. We can introduce a new module type specifically for MessageFormat, so that its usage would become as follows:

import message from "/messages/notifications.mf2" with { type: "messageformat" };

export function Notifications({ count }) {
  return html`
    <p>${message.format({ count })}</p>
  `;
}

When importing a type: "messageformat" module, the following happens:

  • as part of module loading, the browser fetches the imported file from the server
  • the server chooses which language to provide to the client, through one of:
    • the Accept-Language header in the HTTP request
    • whatever preference they have stored in their database for the user
    • the referrer URL (for websites using, for example, en.example.com/my-page or example.com/en/my-page)
  • the server will respond to the HTTP request with the message, together with an indication of the message language, through one of:
    • the Content-Language HTTP header
    • some in-band annotation stored in the MessageFormat file (such as using a .lang keyword)
    • maybe with a fallback to navigator.language
  • the browser will parse the MessageFormat contents, and create an Intl.MessageFormat object with the language defined by the server
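
The language-selection steps above can be sketched as two small functions. This is a rough illustration, not a specification: `pickLanguage` and `resolveMessageLocale` are hypothetical names, the matching logic is a simplification of real language negotiation, and the `.lang it` first-line syntax is only an assumed shape for the in-band annotation mentioned above.

```javascript
// Server side (assumption): pick the best supported language from an
// Accept-Language header value such as "es-MX,es;q=0.9,en;q=0.5".
function pickLanguage(acceptLanguage, supported, defaultLanguage) {
  const ranges = (acceptLanguage || "")
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      return { tag: tag.trim().toLowerCase(), q: q ? Number(q.slice(2)) : 1 };
    })
    .filter((range) => range.tag && range.q > 0)
    .sort((a, b) => b.q - a.q);

  for (const { tag } of ranges) {
    // Exact match first, then the primary language subtag ("es-mx" -> "es").
    const exact = supported.find((s) => s.toLowerCase() === tag);
    if (exact) return exact;
    const primary = tag.split("-")[0];
    const partial = supported.find((s) => s.toLowerCase() === primary);
    if (partial) return partial;
  }
  return defaultLanguage;
}

// Client side (assumption): resolve the locale of a fetched message in the
// order listed above: Content-Language header, then an in-band annotation,
// then a fallback such as navigator.language.
function resolveMessageLocale(contentLanguage, source, fallbackLocale) {
  const fromHeader = (contentLanguage || "").split(",")[0].trim();
  if (fromHeader) return fromHeader;

  // Hypothetical in-band syntax: a first line of the form ".lang it".
  const match = /^\s*\.lang\s+([A-Za-z0-9-]+)/.exec(source);
  if (match) return match[1];

  return fallbackLocale;
}

// The host would then do the equivalent of (assumes Intl.MessageFormat exists):
//   new Intl.MessageFormat(resolveMessageLocale(header, source, navigator.language), source)
```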

This can work both for standalone messages and for hypothetical "message bundles" (https://github.com/eemeli/message-resource-wg/). A message bundle could have an annotation setting the language for all the messages in the file (e.g. @lang it at the beginning), and the messages could be exposed as named exports of the module.

While in many cases the language would be defined by the server, it's possible that it is client-controlled (for example, with an EN/CH switch at the top of the page that re-renders the page without reloading). In this case, applications could still pass it as a query parameter using dynamic import:

const { default: message } = await import(
  "/messages/notifications.mf2?lang=" + lang,
  { with: { type: "messageformat" } }
);
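
As a small sketch of how the specifier above might be built safely when the language comes from user input, the helper below escapes the value before appending it. `messageSpecifier` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: builds the module specifier for a given language,
// escaping the value so that it cannot break out of the query parameter.
function messageSpecifier(path, lang) {
  const params = new URLSearchParams({ lang });
  return `${path}?${params}`;
}

// Possible usage with the proposed module type:
//   const { default: message } = await import(
//     messageSpecifier("/messages/notifications.mf2", lang),
//     { with: { type: "messageformat" } }
//   );
```

Because each distinct specifier identifies a distinct module, switching back to a previously loaded language would be expected to reuse the already-loaded module rather than fetch it again.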

I am proposing this feature for multiple reasons:

  • Static analyzability: Imports are much easier for tools to analyze than fetch calls, so with an import-based syntax it would be possible to have:
    • bundlers that automatically bundle and tree-shake messages based on how they are used in the app
    • linters or type-checkers that check that you are passing the correct values to message.format
  • Ergonomics: The logic to fetch messages and construct the Intl.MessageFormat objects is always the same, and this would abstract it away to a one-liner. It is the same reason we had for adding JSON modules to the platform.
  • Syntax ownership: TC39 expressed that we are not sure whether we want to own the parsing logic for the syntax defined by MessageFormat, or whether we want to provide only the formatting/stringifying logic. I am convinced that a feature that does only half of the job, and thus requires loading a third-party library, is unfortunate, but also that having an Intl-related API in its own spec rather than together with all the other Intl APIs is unfortunate. Developing this feature in a well-integrated way, splitting responsibility between the JavaScript standard and other web standards, would avoid having to choose one of the two unfortunate directions.
@eemeli

eemeli commented Jul 13, 2024

Module integration is an interesting approach that I had not considered. Some initial thoughts:

  • The main reason why the Intl.MessageFormat proposal glosses over where the strings are coming from is that it can; there are multiple potentially valid ways to solve the problem, but all of them do need a formatter that it's providing. Also, the parseResource() part of it was split off into its own proposal. That might be an appropriate space for us to continue some parts of this conversation?

  • When considering a scope greater than what's happening with the formatting of one specific message, we almost always want to do something with multiple messages that need to correlate with each other; if a dialog prompts you to "Click OK to continue", the button label had better be "OK". In other words, something like an imported module should always resolve as a bundle of messages.

  • The locale dependency makes resource loading a challenging problem to solve, as it means that a static identifier like /messages/notifications is not enough; we also need to ask for a specific locale or even a locale fallback chain (e.g. es-MX, es, fr, en). If we are to implement this in JS, we probably need to provide a solution that works really well in browsers as well as server-side, where module loading doesn't otherwise consider HTTP headers or user preferences. This raises the question of whether TC39 is the right place to solve the problem, rather than e.g. WHATWG?

@nicolo-ribaudo
Author

nicolo-ribaudo commented Jul 15, 2024

The main reason why the Intl.MessageFormat proposal glosses over where the strings are coming from is that it can; there are multiple potentially valid ways to solve the problem, but all of them do need a formatter that it's providing. Also, the parseResource() part of it was split off into its own proposal. That might be an appropriate space for us to continue some parts of this conversation?

Thanks for the link, I'll take a look :)

When considering a scope greater than what's happening with the formatting of one specific message, we almost always want to do something with multiple messages that need to correlate with each other; if a dialog prompts you to "Click OK to continue", the button label had better be "OK". In other words, something like an imported module should always resolve as a bundle of messages.

That was my original idea as well, with something like

import { ok_prompt, ok_button } from "./messages" with { type: "messageformat" }

I wrote the examples for a single message just because I couldn't find the resources proposal.

The locale dependency makes resource loading a challenging problem to solve, as it means that a static identifier like /messages/notifications is not enough; we also need to ask for a specific locale or even a locale fallback chain (e.g. es-MX, es, fr, en). If we are to implement this in JS, we probably need to provide a solution that works really well in browsers as well as server-side, where module loading doesn't otherwise consider HTTP headers or user preferences. This raises the question of whether TC39 is the right place to solve the problem, rather than e.g. WHATWG?

100% agree that this module integration should be done outside of TC39, since it needs more than "just JS".
