Extending Exuberant Ctags

Configuration

Like most CLI programs, Ctags can have its behaviour changed by passing it options like -R, to make it work recursively, or -f badaboom, to make it generate a file called badaboom.

Teaching Ctags about a new language or extending the rules of a supported language is also done with options like --langmap or --regex-<lang>. For example, we could call Ctags with these options:

$ ctags -R --regex-javascript=<regex> --regex-javascript=<regex> [...] --regex-javascript=<regex> .

but that will become unwieldy quickly, even if we alias it.

To make customisation easier, Ctags lets us put all our options in a specific file: $HOME/.ctags (among others), each option on its own line:

--regex-javascript=<regex>
--regex-javascript=<regex>
[...]
--regex-javascript=<regex>

If you don't have that file already, create it and open it in your favourite editor: we are going to extend the existing support for JavaScript and add support for Rust.
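As a rough mental model of how the options file relates to the long command line shown earlier, here is a small Python sketch that turns the contents of such a file into the equivalent list of arguments. It assumes one option per non-blank line, as described above; `options_to_argv` and `SAMPLE_CONFIG` are hypothetical names, and the way Ctags actually reads the file may differ in details.

```python
# Sketch only: models "each option on its own line" as described in the text.
# options_to_argv is a hypothetical helper, not part of Ctags.

def options_to_argv(config_text, paths=(".",)):
    """Build the command line that an options file is shorthand for."""
    options = [line.strip() for line in config_text.splitlines() if line.strip()]
    return ["ctags", "-R", *options, *paths]


# Sample options borrowed from later sections of this document.
SAMPLE_CONFIG = r"""
--langdef=Rust
--langmap=Rust:.rs
--regex-Rust=/^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)/\2/m,modules,module names/
"""

argv = options_to_argv(SAMPLE_CONFIG)
for arg in argv:
    print(arg)
```

With the file in place, the short `$ ctags -R .` does the work of the whole list printed above.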

Extending language support

Let's check if our language is supported by Ctags with the --list-languages option:

$ ctags --list-languages
[...]
JavaScript
[...]

Good. How about the file extensions associated by default with JavaScript?

$ ctags --list-maps=javascript
JavaScript *.js

Well, that's as good a starting point as any.

Mappings

By default, Ctags only considers *.js files as JavaScript but we have to work with *.vue files as well, and *.jsx, and we would like Ctags to index those, too. This is done with the --langmap option:

--langmap=javascript:+.vue.jsx

where the leading + means "add these extensions to the already defined ones".

$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx

If we want full control over the list, we can drop the + and spell out every extension:

--langmap=javascript:.js.vue.jsx

which gives the same result:

$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx
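The difference between extending and replacing a map can be sketched in a few lines of Python. This is only a model of the behaviour described above, assuming that a single leading + switches from "replace" to "append"; `apply_langmap` is a hypothetical helper and not how Ctags is implemented.

```python
# Sketch only: models the --langmap behaviour described in the text.
# apply_langmap is a hypothetical helper, not part of Ctags.

def apply_langmap(maps, value):
    """Return a copy of `maps` updated by a value such as 'javascript:+.vue.jsx'."""
    lang, _, spec = value.partition(":")
    lang = lang.lower()
    append = spec.startswith("+")       # leading '+': extend the current list
    if append:
        spec = spec[1:]
    extensions = [ext for ext in spec.split(".") if ext]
    result = {name: list(exts) for name, exts in maps.items()}
    merged = list(result.get(lang, [])) if append else []
    for ext in extensions:
        if ext not in merged:
            merged.append(ext)
    result[lang] = merged
    return result


defaults = {"javascript": ["js"]}
extended = apply_langmap(defaults, "javascript:+.vue.jsx")
replaced = apply_langmap(defaults, "javascript:.js.vue.jsx")
print(extended["javascript"])
print(replaced["javascript"])
```

Both calls end up with the same three extensions, which mirrors the two --list-maps outputs above.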

Rules

Custom rules follow this syntax:

--regex-<lang>=/<regexp>/<replacement>/<kinds>/<flags>

where:

<lang> is a supported language like javascript or c,
<regexp> is a regular expression that matches the line of a potential tag, with the tag itself wrapped in a sub-expression (or "capture group"),
<replacement> is the name of the tag, usually a back-reference to a sub-expression of <regexp>,
<kinds> is a comma-separated kind specification used to qualify the tag: a one-letter kind, its name, and a description,
<flags> modify the behaviour of the regular expression engine.
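To make the four fields easier to see, here is a Python sketch that splits such an option on its unescaped separators. It is a rough model under stated assumptions: `split_rule` is a hypothetical helper, the sample rule is made up for the occasion, and Ctags' own parser may treat edge cases differently.

```python
# Sketch only: a rough model of how a --regex-<lang> option breaks down.
# split_rule is a hypothetical helper; Ctags' own parser may differ.

def split_rule(option):
    """Split '--regex-<lang>=/<regexp>/<replacement>/<kinds>/<flags>' into parts."""
    name, _, spec = option.partition("=")
    lang = name[len("--regex-"):]
    separator = spec[0]
    fields, current = [], []
    i = 1
    while i < len(spec):
        char = spec[i]
        if char == "\\" and i + 1 < len(spec):
            current.append(spec[i:i + 2])   # keep escaped characters, e.g. \/
            i += 2
        elif char == separator:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(char)
            i += 1
    fields.append("".join(current))         # whatever follows the last separator
    fields += [""] * (4 - len(fields))      # missing trailing fields become ""
    regexp, replacement, kinds, flags = fields[:4]
    return {
        "lang": lang,
        "regexp": regexp,
        "replacement": replacement,
        "kinds": kinds.split(",", 2) if kinds else [],
        "flags": flags,
    }


# Hypothetical rule, used only to exercise the function.
rule = split_rule(
    r"--regex-javascript=/^\/\/ SECTION: (.*)/\1/s,sections,section markers/"
)
for key, value in rule.items():
    print(key, "=", value)
```

Note how the escaped \/ stays inside the regexp field instead of ending it, which is why slashes in a pattern must be escaped.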

Here is a concrete example:

--regex-javascript=/\/\/[ \t]*\(FIXME\)[ \t]*:*.*/\1/T,Todo,Todo messages/b

where:

\/\/[ \t]*\(FIXME\)[ \t]*:*.* is a regular expression that matches:
- //,
- 0 or more spaces or tabs,
- FIXME in capture group number one,
- 0 or more spaces or tabs,
- 0 or more colons (BRE has no ? quantifier, so * is how we make the colon optional),
- anything,
\1 is a back-reference to the first capture group, used as the name of the tag (FIXME, here),
T,Todo,Todo messages tells Ctags that the tag is of kind 'T', short for 'Todo', with description 'Todo messages',
the b flag means that we are using the most portable (but also the most limited) regexp syntax available, BRE.

In practical terms, the rule above will match this line in foo.js:

// FIXME: implementation is too slow

and index it with the tag FIXME, allowing us to:

run :tselect FIXME in Vim and get a list of all FIXMEs in our code base,
start Vim on the first FIXME with $ vim -t FIXME, with all the others queued and accessible with :tnext/:tprevious.
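The FIXME rule can be tried outside of Ctags with a few lines of Python. This is a sketch under an important assumption: Python's `re` module stands in for the BRE engine, so the capture group is written (FIXME) instead of \(FIXME\) and the slashes need no escaping. `FIXME_RULE` and `fixme_tag` are hypothetical names.

```python
import re

# Sketch only: Python's re module stands in for the POSIX BRE engine that
# Ctags uses with the b flag. The pattern is a translation of the rule,
# not the rule itself.
FIXME_RULE = re.compile(r"//[ \t]*(FIXME)[ \t]*:*.*")


def fixme_tag(line):
    """Return the tag name the rule would produce for `line`, or None."""
    match = FIXME_RULE.search(line)
    return match.expand(r"\1") if match else None


print(fixme_tag("// FIXME: implementation is too slow"))  # FIXME
print(fixme_tag("const x = 1; //FIXME"))                  # FIXME
print(fixme_tag("// TODO: write tests"))                  # None
```

Because the tag name is the back-reference \1 and not the whole match, every matching line is indexed under the same name, which is what makes :tselect FIXME list them all.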

Adding language support

As we can see with the --list-languages option, Rust is not supported. Instead of writing Rust support on our own, we are going to look at the language's official Ctags config.

Definition

Since Ctags doesn't know about Rust, we use the --langdef option to define a new language:

--langdef=Rust

Mappings

Rust is a sane language with a short history so we only have to deal with a single extension:

--langmap=Rust:.rs

Rules

--regex-Rust=/^[ \t]*(#\[[^\]]\][ \t]*)*(pub[ \t]+)?(extern[ \t]+)?("[^"]+"[ \t]+)?(unsafe[ \t]+)?fn[ \t]+([a-zA-Z0-9_]+)/\6/f,functions,function definitions/

Here, we define a regular expression that matches:

^ anchor at the beginning of the line,
0 or more spaces or tabs,
0 or more groups containing an outer attribute followed by 0 or more spaces or tabs,
0 or 1 group containing the pub keyword followed by 1 or more spaces or tabs,
0 or 1 group containing the extern keyword followed by 1 or more spaces or tabs,
0 or 1 group containing an ABI identifier followed by 1 or more spaces or tabs,
0 or 1 group containing the unsafe keyword followed by 1 or more spaces or tabs,
the keyword fn followed by 1 or more spaces or tabs,
a capture group containing the name of the function.

Then, we use the 6th capture group as the name of our tag and we qualify it as a function (kind f).
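To see which capture group ends up holding the function name, the pattern can be run against a few lines in Python. This is a sketch, assuming Python's `re` module as a stand-in for the regex engine used by Ctags; the sample lines are hypothetical and deliberately avoid attributes, where the two engines may not agree. `FN_RULE` and `function_tag` are illustrative names.

```python
import re

# Sketch only: Python's re module stands in for the regex engine used by
# Ctags. The pattern is copied from the rule above.
FN_RULE = re.compile(
    r'^[ \t]*(#\[[^\]]\][ \t]*)*(pub[ \t]+)?(extern[ \t]+)?'
    r'("[^"]+"[ \t]+)?(unsafe[ \t]+)?fn[ \t]+([a-zA-Z0-9_]+)'
)


def function_tag(line):
    """Return the contents of the 6th capture group, or None if no match."""
    match = FN_RULE.match(line)
    return match.group(6) if match else None


samples = [
    "fn main() {",
    "    pub fn area(&self) -> f64 {",
    'pub extern "C" fn callback(code: i32) {',
    "pub unsafe fn poke(ptr: *mut u8) {",
    "let f = fn_ptr;",
]
for line in samples:
    print(repr(line), "->", function_tag(line))
```

The optional groups still count even when they match nothing, which is why the name is always in group 6 regardless of which keywords are present.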

And so on…

--regex-Rust=/^[ \t]*(pub[ \t]+)?type[ \t]+([a-zA-Z0-9_]+)/\2/T,types,type definitions/
--regex-Rust=/^[ \t]*(pub[ \t]+)?enum[ \t]+([a-zA-Z0-9_]+)/\2/g,enum,enumeration names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?struct[ \t]+([a-zA-Z0-9_]+)/\2/s,structure names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)/\2/m,modules,module names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(static|const)[ \t]+(mut[ \t]+)?([a-zA-Z0-9_]+)/\4/c,consts,static constants/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?trait[ \t]+([a-zA-Z0-9_]+)/\3/t,traits,traits/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?impl([ \t\n]*<[^>]*>)?[ \t]+(([a-zA-Z0-9_:]+)[ \t]*(<[^>]*>)?[ \t]+(for)[ \t]+)?([a-zA-Z0-9_]+)/\5 \7 \8/i,impls,trait implementations/
--regex-Rust=/^[ \t]*macro_rules![ \t]+([a-zA-Z0-9_]+)/\1/d,macros,macro definitions/

Fun fact: the identifier patterns used in those rules, [a-zA-Z0-9_]+, don't conform to the spec. They accept names starting with a digit, for instance.
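The remaining rules can be exercised the same way. The sketch below, again assuming Python's `re` module as a stand-in for the Ctags regex engine, pairs each pattern with its replacement and kind letter and shows how a replacement with several back-references, like the one in the impl rule, builds a composite tag name. `RULES`, `tags_for` and the sample lines are hypothetical.

```python
import re

# Sketch only: (pattern, replacement, kind letter) copied from the rules above.
RULES = [
    (r"^[ \t]*(pub[ \t]+)?type[ \t]+([a-zA-Z0-9_]+)", r"\2", "T"),
    (r"^[ \t]*(pub[ \t]+)?enum[ \t]+([a-zA-Z0-9_]+)", r"\2", "g"),
    (r"^[ \t]*(pub[ \t]+)?struct[ \t]+([a-zA-Z0-9_]+)", r"\2", "s"),
    (r"^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)", r"\2", "m"),
    (r"^[ \t]*(pub[ \t]+)?(static|const)[ \t]+(mut[ \t]+)?([a-zA-Z0-9_]+)", r"\4", "c"),
    (r"^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?trait[ \t]+([a-zA-Z0-9_]+)", r"\3", "t"),
    (r"^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?impl([ \t\n]*<[^>]*>)?[ \t]+"
     r"(([a-zA-Z0-9_:]+)[ \t]*(<[^>]*>)?[ \t]+(for)[ \t]+)?([a-zA-Z0-9_]+)",
     r"\5 \7 \8", "i"),
    (r"^[ \t]*macro_rules![ \t]+([a-zA-Z0-9_]+)", r"\1", "d"),
]


def tags_for(line):
    """Return (kind, tag name) for every rule that matches `line`."""
    found = []
    for pattern, replacement, kind in RULES:
        match = re.match(pattern, line)
        if match:
            found.append((kind, match.expand(replacement)))
    return found


for line in [
    "pub type Id = u64;",
    "pub static mut COUNTER: u32 = 0;",
    "pub unsafe trait Zeroable {}",
    "impl<T> fmt::Display for Wrapper<T> {",
    "macro_rules! square {",
]:
    print(repr(line), "->", tags_for(line))
```

For the impl line, groups 5, 7 and 8 hold the trait, the for keyword and the type, so the tag reads like the declaration itself.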

Reference

$ man ctags, of course.
$ man re_format.

romainl/dotctags.md
