Like most CLI programs, Ctags can have its behaviour changed by passing it options like -R, to make it work recursively, or -f badaboom, to make it generate a file called badaboom.
Teaching Ctags about a new language or extending the rules of a supported language is also done with options like --langmap or --regex-<lang>. For example, we could call Ctags with these options:
$ ctags -R --regex-javascript=<regex> --regex-javascript=<regex> [...] --regex-javascript=<regex> .
but that will become unwieldy quickly, even if we alias it.
To make customisation easier, Ctags lets us put all our options in a specific file: $HOME/.ctags (among others), each option on its own line:
--regex-javascript=<regex>
--regex-javascript=<regex>
[...]
--regex-javascript=<regex>
If you don't have that file already, create it and open it in your favourite editor: we are going to extend the existing support for JavaScript and add support for Rust.
Let's check if our language is supported by Ctags with the --list-languages option:
$ ctags --list-languages
[...]
JavaScript
[...]
Good. How about the file extensions associated by default with JavaScript?
$ ctags --list-maps=javascript
JavaScript *.js
Well, that's as good a starting point as any.
By default, Ctags only considers *.js files as JavaScript but we have to work with *.vue files as well, and *.jsx, and we would like Ctags to index those, too. This is done with the --langmap option:
--langmap=javascript:+.vue+.jsx
where the + before .vue and .jsx means "add this extension to the already defined ones".
$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx
If we want full control over the list, we can simply drop the pluses:
--langmap=javascript:.js.vue.jsx
Which would have the same result:
$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx
Custom rules follow this syntax:
--regex-<lang>=/<regexp>/<replacement>/<kinds>/<flags>
where:
- <lang> is a supported language like javascript or c,
- <regexp> is a regular expression that matches the line of a potential tag, with the tag itself wrapped in a sub-expression (or "capture group"),
- <replacement> is a back-reference to a sub-expression of <regexp>,
- <kinds> is a comma-separated list of keywords used to qualify the tag,
- <flags> modify the behaviour of the regular expression engine.
Here is a concrete example:
--regex-javascript=/\/\/[ \t]*\(FIXME\)[ \t]*:*\(.*\)/\1/T,Todo,Todo messages/b
where:
- \/\/[ \t]*\(FIXME\)[ \t]*:*\(.*\) is a regular expression that matches:
  - //,
  - 0 or more spaces or tabs,
  - FIXME in capture group number one,
  - 0 or more spaces or tabs,
  - 0 or more colons, which makes the colon optional,
  - anything,
- \1 is a back-reference to the first capture group, used as the name of the tag (FIXME, here),
- T,Todo,Todo messages tells Ctags that the tag is of kind 'T', short for 'Todo', with description 'Todo messages',
- /b means that we are using the most portable (but also the most limited) regexp syntax available, BRE.
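To make the four fields of such a rule more tangible, here is a minimal Python sketch that splits an option into its parts. The function name split_regex_option is made up for this illustration, and treating the first character after the = sign as the separator is an assumption; this is not how Ctags itself is implemented.

```python
def split_regex_option(option: str) -> dict:
    """Split --regex-<lang>=/<regexp>/<replacement>/<kinds>/<flags> into fields."""
    name, _, spec = option.partition("=")
    lang = name[len("--regex-"):]
    sep = spec[0]  # assumption: the first character of the spec is the separator
    fields, current, i = [], [], 1
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            # Keep an escaped character (e.g. an escaped separator) in one piece.
            current.append(spec[i:i + 2])
            i += 2
            continue
        if ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))      # whatever follows the last separator
    fields += [""] * (4 - len(fields))   # kinds and flags may be absent
    regexp, replacement, kinds, flags = fields[:4]
    return {
        "lang": lang,
        "regexp": regexp,
        "replacement": replacement,
        "kinds": kinds,
        "flags": flags,
    }


rule = r"--regex-javascript=/\/\/[ \t]*\(FIXME\)[ \t]*:*\(.*\)/\1/T,Todo,Todo messages/b"
for field, value in split_regex_option(rule).items():
    print(f"{field:12} {value}")
```

The escaped slashes at the start of the regular expression stay inside the first field, which is why they have to be escaped in the option in the first place.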
In practical terms, the rule above will match this line in foo.js:
// FIXME: implementation is too slow
and index it with the tag FIXME, allowing us to do:
- :tselect FIXME in Vim and get a list of all FIXMEs in our code base,
- start Vim on the first FIXME (and all the others queued and accessible with :tnext/:tprevious) with $ vim -t FIXME.
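If we want to check what the rule matches without running Ctags, we can approximate it in Python. This is only a sketch: Python's re module has no BRE mode, so the escaped parentheses \( and \) are written as plain ( and ), and the helper name tag_for is invented for this example.

```python
import re

# The rule's BRE rewritten for Python's `re`: \( \) become ( ), the rest is unchanged.
FIXME_RULE = re.compile(r"//[ \t]*(FIXME)[ \t]*:*(.*)")


def tag_for(line: str):
    """Return the tag name the rule would give for `line`, or None if it doesn't match."""
    match = FIXME_RULE.search(line)
    # \1 is the same back-reference used in the <replacement> field of the rule.
    return match.expand(r"\1") if match else None


print(tag_for("// FIXME: implementation is too slow"))  # FIXME
print(tag_for("const x = 1; // no marker here"))        # None
```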
As we can see with the --list-languages option, Rust is not supported. Instead of adding Rust support entirely on our own, we are going to look at the language's official Ctags config.
Since Ctags doesn't know about Rust, we use the --langdef option to define a new language:
--langdef=Rust
Rust is a sane language with a short history so we only have to deal with a single extension:
--langmap=Rust:.rs
Next come the rules themselves, starting with the one for function definitions:
--regex-Rust=/^[ \t]*(#\[[^\]]\][ \t]*)*(pub[ \t]+)?(extern[ \t]+)?("[^"]+"[ \t]+)?(unsafe[ \t]+)?fn[ \t]+([a-zA-Z0-9_]+)/\6/f,functions,function definitions/
Here, we define a regular expression that matches:
- ^ anchor at the beginning of the line,
- 0 or more spaces or tabs,
- 0 or more groups containing an outer attribute followed by 0 or more spaces or tabs,
- 0 or 1 group containing the pub keyword followed by 1 or more spaces or tabs,
- 0 or 1 group containing the extern keyword followed by 1 or more spaces or tabs,
- 0 or 1 group containing an ABI identifier followed by 1 or more spaces or tabs,
- 0 or 1 group containing the unsafe keyword followed by 1 or more spaces or tabs,
- the keyword fn followed by 1 or more spaces or tabs,
- a capture group containing the name of the function.
Then, we use the 6th capture group as the name of our tag and we qualify it as "function".
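Counting capture groups by eye is error-prone, so here is a small Python sketch that runs the same pattern against a few sample lines and prints group 6. Python's re dialect is close to, but not identical to, the extended regexp syntax Ctags uses for this rule (backslashes inside brackets, for instance, may be treated differently), so take it as an approximation. The helper function_tag and the sample lines are ours, not part of the official config.

```python
import re

# The pattern from the function rule, split in two only for readability.
FN_RULE = re.compile(
    r'^[ \t]*(#\[[^\]]\][ \t]*)*(pub[ \t]+)?(extern[ \t]+)?("[^"]+"[ \t]+)?'
    r'(unsafe[ \t]+)?fn[ \t]+([a-zA-Z0-9_]+)'
)


def function_tag(line: str):
    """Return the function name captured by group 6, or None if the line doesn't match."""
    match = FN_RULE.search(line)
    return match.group(6) if match else None


samples = [
    "fn main() {",
    "    pub fn new(x: i32) -> Self {",
    'pub extern "C" fn callback() {',
    "pub unsafe fn poke(ptr: *mut u8) {",
    "let f = 1;",
]
for line in samples:
    print(f"{line!r:40} -> {function_tag(line)}")
```

Whatever optional keywords are present, the name always lands in group 6, because a group that doesn't take part in the match still counts in the numbering.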
And so on…
--regex-Rust=/^[ \t]*(pub[ \t]+)?type[ \t]+([a-zA-Z0-9_]+)/\2/T,types,type definitions/
--regex-Rust=/^[ \t]*(pub[ \t]+)?enum[ \t]+([a-zA-Z0-9_]+)/\2/g,enum,enumeration names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?struct[ \t]+([a-zA-Z0-9_]+)/\2/s,structure names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)/\2/m,modules,module names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(static|const)[ \t]+(mut[ \t]+)?([a-zA-Z0-9_]+)/\4/c,consts,static constants/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?trait[ \t]+([a-zA-Z0-9_]+)/\3/t,traits,traits/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?impl([ \t\n]*<[^>]*>)?[ \t]+(([a-zA-Z0-9_:]+)[ \t]*(<[^>]*>)?[ \t]+(for)[ \t]+)?([a-zA-Z0-9_]+)/\5 \7 \8/i,impls,trait implementations/
--regex-Rust=/^[ \t]*macro_rules![ \t]+([a-zA-Z0-9_]+)/\1/d,macros,macro definitions/
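To get a feel for what these rules pick up together, here is a toy tagger in Python that applies a handful of them to a made-up Rust snippet. It is a sketch under assumptions: the patterns are copied from the rules above but run with Python's re module rather than Ctags's regexp engine, the impl rule is left out for brevity, and tags_for and SAMPLE are names invented for this example.

```python
import re

# (kind letter, pattern, number of the group holding the tag name)
RULES = [
    ("T", r"^[ \t]*(pub[ \t]+)?type[ \t]+([a-zA-Z0-9_]+)", 2),
    ("g", r"^[ \t]*(pub[ \t]+)?enum[ \t]+([a-zA-Z0-9_]+)", 2),
    ("s", r"^[ \t]*(pub[ \t]+)?struct[ \t]+([a-zA-Z0-9_]+)", 2),
    ("m", r"^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)", 2),
    ("c", r"^[ \t]*(pub[ \t]+)?(static|const)[ \t]+(mut[ \t]+)?([a-zA-Z0-9_]+)", 4),
    ("t", r"^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?trait[ \t]+([a-zA-Z0-9_]+)", 3),
    ("d", r"^[ \t]*macro_rules![ \t]+([a-zA-Z0-9_]+)", 1),
]
COMPILED = [(kind, re.compile(pattern), group) for kind, pattern, group in RULES]


def tags_for(source: str):
    """Return (tag name, kind letter) pairs, trying every rule on every line."""
    tags = []
    for line in source.splitlines():
        for kind, pattern, group in COMPILED:
            match = pattern.search(line)
            if match:
                tags.append((match.group(group), kind))
    return tags


SAMPLE = """\
pub mod geometry;
pub struct Point { x: f64, y: f64 }
enum Shape { Circle, Square }
pub trait Area { fn area(&self) -> f64; }
static mut COUNTER: u32 = 0;
type Pair = (Point, Point);
macro_rules! square { ($x:expr) => { $x * $x }; }
"""

for name, kind in tags_for(SAMPLE):
    print(kind, name)
```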
Fun fact: the identifier patterns used in those rules, [a-zA-Z0-9_]+, don't conform to the spec: they accept names that start with a digit, for example, which Rust does not allow.
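A quick way to see the mismatch is to compare the pattern from the rules with a slightly stricter one. In the Python sketch below, STRICTER_IDENT is our own ASCII-only approximation that merely forbids a leading digit; it is not the official Rust grammar and neither pattern tries to model it in full.

```python
import re

RULE_IDENT = re.compile(r"[a-zA-Z0-9_]+")               # pattern used in the rules
STRICTER_IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")  # hypothetical stricter variant


def accepted_by(pattern, name: str) -> bool:
    """True if `name` as a whole matches `pattern`."""
    return pattern.fullmatch(name) is not None


for name in ("point_2d", "9lives"):
    print(name, accepted_by(RULE_IDENT, name), accepted_by(STRICTER_IDENT, name))
```

In practice the looser pattern is harmless here: a name starting with a digit would not compile as Rust anyway, so Ctags is unlikely to meet one after fn or struct.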
- $ man ctags, of course.
- $ man re_format.