Like most CLI programs, Ctags can have its behaviour changed by passing it options like -R, to make it work recursively, or -f badaboom, to make it generate a file called badaboom.
Teaching Ctags about a new language or extending the rules of a supported language is also done with options, like --langmap or --regex-<lang>. For example, we could call Ctags with these options:
$ ctags -R --regex-javascript=<regex> --regex-javascript=<regex> [...] --regex-javascript=<regex> .
but that will become unwieldy quickly, even if we alias it.
To make customisation easier, Ctags lets us put all our options in a specific file, $HOME/.ctags (among others), with each option on its own line:
--regex-javascript=<regex>
--regex-javascript=<regex>
[...]
--regex-javascript=<regex>
If you don't have that file already, create it and open it in your favourite editor: we are going to extend the existing support for JavaScript and add support for Rust.
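Creating the file without clobbering an existing one can be sketched as below. The function name ensure_options_file is hypothetical and only exists for this example; the default path is the $HOME/.ctags mentioned above. If your ctags turns out to be Universal Ctags rather than Exuberant Ctags, the expected location is, as far as we know, a *.ctags file under $HOME/.ctags.d/ instead, so treat the default as an assumption.

```shell
# Hypothetical helper: create the options file only if it is missing,
# then print its path. An existing file is left untouched.
ensure_options_file() {
  options_file=${1:-$HOME/.ctags}
  [ -e "$options_file" ] || : > "$options_file"
  printf '%s\n' "$options_file"
}

# Demonstration in a scratch directory, so nothing in $HOME is modified.
demo_dir=$(mktemp -d)
ensure_options_file "$demo_dir/.ctags"
```

Called without an argument, the helper falls back to $HOME/.ctags, so something like "${EDITOR:-vi}" "$(ensure_options_file)" would create the file if needed and open it.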
Let's check if our language is supported by Ctags with the --list-languages option:
$ ctags --list-languages
[...]
JavaScript
[...]
Good. How about the file extensions associated by default with JavaScript?
$ ctags --list-maps=javascript
JavaScript *.js
Well, that's as good a starting point as any.
By default, Ctags only considers *.js files as JavaScript, but we have to work with *.vue and *.jsx files as well, and we would like Ctags to index those, too. This is done with the --langmap option:
--langmap=javascript:+.vue.jsx
where the + right after the colon means "add these extensions to the already defined ones".
$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx
If we want full control over the list, we can leave out the + and list every extension ourselves:
--langmap=javascript:.js.vue.jsx
which gives the same result:
$ ctags --list-maps=javascript
JavaScript *.js *.vue *.jsx
Custom rules follow this syntax:
--regex-<lang>=/<regexp>/<replacement>/<kinds>/<flags>
where:
- <lang> is a supported language like javascript or c,
- <regexp> is a regular expression that matches the line of a potential tag, with the tag itself wrapped in a sub-expression (or "capture group"),
- <replacement> is a back-reference to a sub-expression of <regexp>,
- <kinds> is a comma-separated list of keywords used to qualify the tag,
- <flags> modify the behaviour of the regular expression engine.
Here is a concrete example:
--regex-javascript=/\/\/[ \t]*\(FIXME\)[ \t]*:*.*/\1/T,Todo,Todo messages/b
where:
- \/\/[ \t]*\(FIXME\)[ \t]*:*.* is a regular expression that matches:
  - //,
  - 0 or more spaces or tabs,
  - FIXME in capture group number one,
  - 0 or more spaces or tabs,
  - an optional colon (written :* because ? is an ordinary character in BRE),
  - anything,
- \1 is a back-reference to the first capture group, used as the name of the tag (FIXME, here),
- T,Todo,Todo messages tells Ctags that the tag is of kind 'T', short for 'Todo', with description 'Todo messages',
- the b flag means that we are using the most portable (but also the most limited) regexp syntax available, BRE.
In practical terms, the rule above will match this line in foo.js:
// FIXME: implementation is too slow
and index it with the tag FIXME, allowing us to:
- do :tselect FIXME in Vim and get a list of all FIXMEs in our code base,
- start Vim on the first FIXME (with all the others queued and accessible with :tnext/:tprevious) with $ vim -t FIXME.
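To see what the FIXME rule captures without running Ctags at all, we can feed the same kind of line to sed, which also uses BRE by default. This is only a sketch: fixme_tag is a hypothetical helper, [[:blank:]] stands in for [ \t], and sed is an approximation of the regexp engine Ctags is built with, not the engine itself.

```shell
# Hypothetical helper: print what the rule's \1 would hold for a given line,
# or nothing if the line does not match. sed uses BRE by default, like the
# b flag of the rule.
fixme_tag() {
  printf '%s\n' "$1" |
    sed -n 's|^.*//[[:blank:]]*\(FIXME\)[[:blank:]]*:*.*$|\1|p'
}

fixme_tag '// FIXME: implementation is too slow'   # prints FIXME
fixme_tag '//FIXME no colon and no space'          # prints FIXME
fixme_tag 'let x = 1; // nothing to see here'      # prints nothing
```

Trying a rule this way before adding it to the options file makes it easier to tell a regexp mistake from a Ctags configuration mistake.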
As we can see with the --list-languages option, Rust is not supported. Instead of adding Rust support all on our own, we are going to look at the language's official Ctags config.
Since Ctags doesn't know about Rust, we use the --langdef option to define a new language:
--langdef=Rust
Rust is a sane language with a short history so we only have to deal with a single extension:
--langmap=Rust:.rs
--regex-Rust=/^[ \t]*(#\[[^\]]\][ \t]*)*(pub[ \t]+)?(extern[ \t]+)?("[^"]+"[ \t]+)?(unsafe[ \t]+)?fn[ \t]+([a-zA-Z0-9_]+)/\6/f,functions,function definitions/
Here, we define a regular expression that matches:
- ^ anchor at the beginning of the line,
- 0 or more spaces or tabs,
- 0 or more groups containing an outer attribute followed by 0 or more spaces or tabs,
- 0 or 1 group containing the pub keyword followed by 1 or more spaces or tabs,
- 0 or 1 group containing the extern keyword followed by 1 or more spaces or tabs,
- 0 or 1 group containing an ABI identifier followed by 1 or more spaces or tabs,
- 0 or 1 group containing the unsafe keyword followed by 1 or more spaces or tabs,
- the keyword fn followed by 1 or more spaces or tabs,
- a capture group containing the name of the function.
Then, we use the 6th capture group as the name of our tag and we qualify it as a "function".
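The breakdown above can be checked against a few sample lines with sed -E, which uses extended regular expressions like the rule does. This is a sketch under assumptions: rust_fn_tag is a hypothetical helper, [[:blank:]] replaces [ \t], the attribute body is written [^]]* so that sed reads it as "anything up to the closing bracket", and the sample Rust lines are made up for the occasion.

```shell
# Hypothetical helper: print the function name the rule would use as the tag
# (capture group 6), or nothing if the line does not match.
rust_fn_tag() {
  printf '%s\n' "$1" | sed -n -E \
    's/^[[:blank:]]*(#\[[^]]*\][[:blank:]]*)*(pub[[:blank:]]+)?(extern[[:blank:]]+)?("[^"]+"[[:blank:]]+)?(unsafe[[:blank:]]+)?fn[[:blank:]]+([a-zA-Z0-9_]+).*$/\6/p'
}

rust_fn_tag 'fn main() {'                          # prints main
rust_fn_tag 'pub unsafe fn poke(addr: usize) {'    # prints poke
rust_fn_tag '    pub extern "C" fn callback() {}'  # prints callback
rust_fn_tag '#[inline] pub fn fast() {}'           # prints fast
rust_fn_tag 'let fnord = 1;'                       # prints nothing
```

Every optional group that is absent from a line simply matches nothing, which is why the function name always ends up in group 6 whatever the combination of keywords.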
And so on…
--regex-Rust=/^[ \t]*(pub[ \t]+)?type[ \t]+([a-zA-Z0-9_]+)/\2/T,types,type definitions/
--regex-Rust=/^[ \t]*(pub[ \t]+)?enum[ \t]+([a-zA-Z0-9_]+)/\2/g,enum,enumeration names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?struct[ \t]+([a-zA-Z0-9_]+)/\2/s,structure names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?mod[ \t]+([a-zA-Z0-9_]+)/\2/m,modules,module names/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(static|const)[ \t]+(mut[ \t]+)?([a-zA-Z0-9_]+)/\4/c,consts,static constants/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?trait[ \t]+([a-zA-Z0-9_]+)/\3/t,traits,traits/
--regex-Rust=/^[ \t]*(pub[ \t]+)?(unsafe[ \t]+)?impl([ \t\n]*<[^>]*>)?[ \t]+(([a-zA-Z0-9_:]+)[ \t]*(<[^>]*>)?[ \t]+(for)[ \t]+)?([a-zA-Z0-9_]+)/\5 \7 \8/i,impls,trait implementations/
--regex-Rust=/^[ \t]*macro_rules![ \t]+([a-zA-Z0-9_]+)/\1/d,macros,macro definitions/
Fun fact: the identifier patterns used in those rules, [a-zA-Z0-9_]+, don't conform with the spec.
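One mismatch we can be sure of is that [a-zA-Z0-9_]+ accepts a name starting with a digit, which Rust does not. It also rejects the non-ASCII identifiers that recent Rust versions allow, although the sketch below doesn't exercise that case. Both helper names are hypothetical, and the stricter pattern is only an ASCII approximation, not the spec's actual grammar.

```shell
# The pattern used by the rules, anchored so that it tests a whole word.
matches_rule_ident() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_]+$'
}

# Hypothetical stricter pattern (ASCII only): no digit in first position.
matches_ascii_ident() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z_][a-zA-Z0-9_]*$'
}

for word in main _private 9lives; do
  if matches_rule_ident "$word"; then loose=yes; else loose=no; fi
  if matches_ascii_ident "$word"; then strict=yes; else strict=no; fi
  printf '%-10s rule pattern: %-3s stricter pattern: %s\n' "$word" "$loose" "$strict"
done
```

In practice the looseness is harmless for tagging, since code that compiles won't contain a function called 9lives in the first place.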
- $ man ctags, of course.
- $ man re_format.