@kosivantsov
Last active May 20, 2026 17:36
Creating DSL dictionary files

Creating DSL Dictionaries

The DSL (Dictionary Specification Language) format is a plain-text dictionary format that OmegaT reads to display rich dictionary lookups. Every file must begin with a header giving the dictionary name, index language, and contents language (the #NAME, #INDEX_LANGUAGE, and #CONTENTS_LANGUAGE lines). The dictionary entries follow: each headword starts flush left, and every definition line is indented with at least one tab or space.
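
The layout described above can be sketched with a few printf calls, which make the leading tab on the definition line explicit. This is only an illustration: the dictionary name, the single entry, and the temporary file are placeholders, and the result is a UTF-8 draft, not yet a file OmegaT will load.

```shell
set -eu

# Placeholder location for the UTF-8 draft; removed when the script exits.
draft=$(mktemp)
trap 'rm -f "$draft"' EXIT

{
  # Header: dictionary name, index language, contents language.
  printf '#NAME "Demo Dictionary"\n'
  printf '#INDEX_LANGUAGE "English"\n'
  printf '#CONTENTS_LANGUAGE "Polish"\n'
  printf '\n'
  # Headword: starts in the first column.
  printf 'compile\n'
  # Definition: printf turns \t into a real tab, so the line is indented.
  printf '\tkompilować\n'
} > "$draft"

# List the headwords: lines that are not empty and start with
# neither '#' (header) nor whitespace (definition).
headwords=$(grep -v -e '^#' -e '^[[:space:]]' -e '^$' "$draft")
echo "$headwords"
```

Listing the headwords this way is a quick check that no definition line has lost its indentation, since an unindented definition would show up in the output as an extra headword.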

Common DSL Tags

Tags in DSL are used to format text and organize definitions into logical sections. They function similarly to HTML tags and must be opened and closed properly around the targeted text.

  • [m1], [m2], etc.: Sets the left margin indentation level for the text block to create visual hierarchy. The closing tag is [/m].
  • [trn]...[/trn]: Wraps the core translation or definition of the given word.
  • [ex]...[/ex]: Indicates an example sentence illustrating how the word is used.
  • [com]...[/com]: Adds a comment or explanatory note for the translator.
  • [b]...[/b], [i]...[/i], [u]...[/u]: Apply basic text formatting (bold, italics, underline).
  • [c color]...[/c]: Changes the text color (e.g., [c blue]text[/c]).
  • [ref]word[/ref]: Creates a clickable cross-reference link to another headword within the same dictionary.
  • [*]...[/*]: Hides the enclosed content by default, requiring the user to expand the view to read it.
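
Because every tag has to be closed, a rough per-line count of opening and closing tags can catch simple slips before the file is converted. The sketch below is an assumption-heavy illustration, not a DSL parser: count and check_line are hypothetical helper names, only a handful of tags are checked, the [m...] and [c ...] tags are skipped because their opening forms vary, and nesting order is not verified.

```shell
set -eu

# Count how often a literal string ($1) occurs in a line of text ($2).
count() {
  # -o prints each match on its own line; -F disables regex syntax,
  # so the square brackets are matched literally.
  printf '%s\n' "$2" | grep -oF -- "$1" | wc -l | tr -d ' '
}

# Return 0 if every checked tag is opened as often as it is closed.
check_line() {
  line=$1
  status=0
  for tag in trn ex com ref b i u; do
    if [ "$(count "[$tag]" "$line")" -ne "$(count "[/$tag]" "$line")" ]; then
      echo "unbalanced [$tag]"
      status=1
    fi
  done
  return "$status"
}

good='[m1][trn][b]1.[/b] kompilator[/trn][/m]'
bad='[m1][trn][b]1. kompilator[/trn][/m]'

check_line "$good" && echo "good line: balanced"
check_line "$bad" || echo "bad line: needs fixing"
```

To scan a whole draft, the same function could be called once per line from a `while IFS= read -r line` loop over the UTF-8 file.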

Linux Encoding Fix

OmegaT requires DSL files to be encoded in UTF-16LE with a Byte Order Mark (BOM), which Linux text editors like Geany usually omit upon saving. Without the BOM, the Java-based OmegaT engine cannot identify the file encoding, and the dictionary pane stays empty. To resolve this, save your draft as plain UTF-8 (without a BOM) in your editor, then use the terminal to prepend the BOM and convert the stream to UTF-16LE. The BOM is inserted in its UTF-8 form, the three bytes EF BB BF, which encode the character U+FEFF; iconv then converts that character into the two bytes FF FE that open a UTF-16LE file.

Run this in your terminal to process the UTF-8 file (the \xHH escapes are a GNU sed extension, so GNU sed is assumed):

sed '1s/^/\xef\xbb\xbf/' my_draft.utf8.dsl | iconv -f UTF-8 -t UTF-16LE > my_dictionary.dsl
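
To confirm that the conversion worked, one option is to look at the first two bytes of the output, which should be FF FE. The following sketch runs the same sed and iconv pipeline on a throwaway draft and prints those bytes. The temporary directory and the one-entry draft are placeholders for illustration, and GNU sed is assumed.

```shell
set -eu

# Throwaway working directory; removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
draft="$workdir/my_draft.utf8.dsl"
out="$workdir/my_dictionary.dsl"

# A tiny UTF-8 draft: one header line, a headword, a tab-indented definition.
printf '#NAME "Demo"\ncompile\n\t[m1][trn]kompilować[/trn][/m]\n' > "$draft"

# Same pipeline as above: prepend the UTF-8 BOM, then convert to UTF-16LE.
sed '1s/^/\xef\xbb\xbf/' "$draft" | iconv -f UTF-8 -t UTF-16LE > "$out"

# Dump the first two bytes as hex and strip the spacing od adds.
bom=$(head -c 2 "$out" | od -An -tx1 | tr -d ' \n')
echo "leading bytes: $bom"

if [ "$bom" = "fffe" ]; then
  echo "UTF-16LE BOM present"
else
  echo "BOM missing or wrong encoding"
fi
```

The same head and od check can be pointed at a real my_dictionary.dsl when the dictionary pane stays empty, to rule out the encoding before looking for other causes.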

Sample Dictionary File

Below is a complete example of a functional DSL file containing three entries. Together they demonstrate different margin levels, text formatting, multiple examples, hidden comments, and cross-references.

#NAME "English-Polish Tech Dictionary"
#INDEX_LANGUAGE "English"
#CONTENTS_LANGUAGE "Polish"

compile
	[m1][trn][b]1.[/b] [c blue]kompilować[/c] (kod źródłowy)[/trn][/m]
	[m2][ex][i]The IDE will compile the program automatically.[/i] — Środowisko automatycznie skompiluje program.[/ex][/m]
	[m2][ex][i]It takes a long time to compile this kernel.[/i] — Kompilacja tego jądra zajmuje dużo czasu.[/ex][/m]
	[m1][trn][b]2.[/b] sporządzać, opracowywać (np. listę)[/trn][/m]
	[m2][ex][i]She compiled a list of the best developers.[/i] — Ona sporządziła listę najlepszych programistów.[/ex][/m]
	[m1][*][com]See also: [ref]compiler[/ref][/com][/*][/m]

compiler
	[m1][trn]kompilator[/trn][/m]
	[m2][ex][i]The C++ compiler found three syntax errors.[/i] — Kompilator C++ znalazł trzy błędy składniowe.[/ex][/m]
	[m2][ex][i]You need to install the latest compiler version.[/i] — Musisz zainstalować najnowszą wersję kompilatora.[/ex][/m]
	[m2][ex][i]A good compiler optimizes the code.[/i] — Dobry kompilator optymalizuje kod.[/ex][/m]
	[m1][*][com]A program that converts source code into executable code. Related verb: [ref]compile[/ref][/com][/*][/m]

script
	[m1][trn][b]1.[/b] [c blue]skrypt[/c] (w programowaniu)[/trn][/m]
	[m2][ex][i]I wrote a bash script to automate the backup.[/i] — Napisałem skrypt bash, aby zautomatyzować kopię zapasową.[/ex][/m]
	[m2][ex][i]The script uses sed and iconv to fix encoding.[/i] — Skrypt używa sed i iconv do naprawy kodowania.[/ex][/m]
	[m1][trn][b]2.[/b] scenariusz (filmowy, teatralny)[/trn][/m]
	[m2][ex][i]The actors are reading the script now.[/i] — Aktorzy czytają teraz scenariusz.[/ex][/m]