This step-by-step guide uses issue #42307 ("RegEx: add a way to get the positions of groups") as a concrete example.
Ask the Dart team before jumping into the implementation. The best place to ask is the issue tracker. You can also try the hackers-dart channel on the Flutter Discord or one of the channels on the Dart Community Discord, but most SDK developers are on neither of them.
Why ask the team?
- They might give you suggestions for the design and implementation
- They might warn you about potential challenges
- They might have tried implementing such a feature before, faced unexpected complications and had to back out.
- They might tell you that this feature goes against Dart's priorities and that they can't accept it.
Get the source code and configure the build environment. Follow the instructions from Building.md in the SDK.
Checking out the SDK (once you have installed depot_tools) is as simple as:
$ mkdir dart
$ cd dart
$ fetch dart
To keep your checkout up to date, run:
$ git pull --rebase
$ gclient sync -f
Building is handled by tools/build.py (which delegates to GN and Ninja). For example, to build a release-mode dart binary, run:
$ tools/build.py -m release dart
To build the full SDK, run:
$ tools/build.py -m release create_sdk
SDK tests are usually placed in one of the subdirectories of the tests/ directory: tests/corelib is for core library tests. Test file names should end in _test so that the test runner picks them up automatically. In our particular case we can place the test in tests/corelib/regexp/regexp_captures_test.dart.
You can start with something simple, e.g.
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
import 'package:expect/expect.dart';
void main() {
final pattern = RegExp('(?:(?<=[^\\\\])|^){{(?<name>\\w*)}}');
final match = pattern.firstMatch('A captured word I {{capture}}')!;
Expect.equals((start: 18, end: 29), match.captures[0]);
Expect.equals((start: 20, end: 27), match.captures[1]);
Expect.equals((start: 20, end: 27), match.namedCaptures['name']);
}
Note
- Any new source file should include a copyright header: just copy it from a neighboring file and update the year.
- SDK tests don't use package:test, because you want to minimize the number of external dependencies of each individual test. Instead, tests use a barebones helper which resides in package:expect/expect.dart.
But eventually you want to cover as many parts of the feature as you can: e.g. named groups, groups that don't participate in the match, nested groups, non-capturing groups, etc.
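Working out the expected offsets for these cases by hand is error-prone. dart2js, DDC and dart2wasm delegate matching to the JavaScript RegExp engine. One option is therefore to cross-check expected values in a JavaScript engine that supports the d flag (ES2022 match indices), such as a recent Node.js or the d8 shell. The sketch below uses a hypothetical offsets helper. Its output is only a starting point for writing Expect calls, not a substitute for running the test on every backend.

```javascript
// Hypothetical helper: runs `pattern` with the ES2022 `d` flag and returns
// the [start, end] offsets of every capture group (index 0 is the whole
// match), plus the offsets of named groups. Groups that did not participate
// in the match come back as `undefined`. Returns null if nothing matches.
function offsets(pattern, input) {
  const m = new RegExp(pattern, 'd').exec(input);
  if (m === null) return null;
  return {
    captures: Array.from(m.indices),
    named: m.indices.groups ? { ...m.indices.groups } : {},
  };
}

// The case from the test above: expect captures [[18, 29], [20, 27]]
// and named { name: [20, 27] }.
console.log(offsets('(?:(?<=[^\\\\])|^){{(?<name>\\w*)}}', 'A captured word I {{capture}}'));

// A group that does not participate: captures [[0, 1], undefined, [0, 1]].
console.log(offsets('(a)|(b)', 'b'));

// Nested and non-capturing groups: (?:b) gets no entry, so the captures
// are [[1, 4], [1, 4], [1, 2], [3, 4]].
console.log(offsets('((a)(?:b)(c))', 'xabc'));
```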
Tests are executed using two scripts: tools/test.py is a low-level test runner, and tools/test.dart is a wrapper integrated with the results database, which tracks temporarily approved failures in the test suite. Before using tools/test.py you need to build the necessary parts of the SDK.
Here are some examples:
# Build and run VM tests
$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures
# Build core libraries for dart2js and run test on dart2js
$ tools/build.py -m release dart2js_platform.dill
$ tools/test.py -c dart2js corelib/regexp_captures
# Build necessary bits of DDC and run tests on it
$ tools/build.py -m release ddc_stable_test_local
$ tools/test.py -c ddc corelib/regexp_captures
# Build necessary parts of dart2wasm and run test in Wasm mode
$ tools/build.py -m release dart2wasm
$ tools/test.py -c dart2wasm corelib/regexp_captures
Initially the test will just fail:
$ tools/test.py -m release corelib/regexp_captures
...
FAILED: dartk-vm release_arm64 corelib/regexp/regexp_captures_test
Expected: Pass
Actual: CompileTimeError
...
=== 0 tests passed, 1 failed ===
Now the hardest part: you need to understand which files of the SDK to change and actually make the necessary changes. Most of the core libraries are located in sdk/lib, and sdk/lib/_internal contains platform-specific code:
- VM versions are in sdk/lib/_internal/{vm,vm_shared}. runtime/vm, runtime/lib and runtime/bin contain the C++ part (bin contains dart:io related code, lib contains non-dart:io natives, and vm contains the low-level implementation of runtime components).
- dart2js versions are in sdk/lib/_internal/{js_runtime,js_runtime_shared}.
- DDC versions are in sdk/lib/_internal/{js_dev_runtime,js_runtime_shared}.
- Wasm versions are in sdk/lib/_internal/{wasm,vm_shared}.
sdk/lib/libraries.yaml describes how these files are all used together to form the individual dart:* libraries on each platform.
In our example we start by looking for the implementations of class RegExpMatch:
$ git grep 'class RegExpMatch'
sdk/lib/core/regexp.dart:abstract interface class RegExpMatch implements Match {
$ git grep 'implements RegExpMatch'
sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {
sdk/lib/_internal/js_runtime/lib/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {
sdk/lib/_internal/vm/lib/regexp_patch.dart:class _RegExpMatch implements RegExpMatch {
sdk/lib/_internal/wasm/lib/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {
sdk/lib/core/regexp.dart contains the platform-independent interface of the RegExp functionality. We edit it in the following way:
diff --git a/sdk/lib/core/regexp.dart b/sdk/lib/core/regexp.dart
index 7cac56f34ed..59fcda21f48 100644
--- a/sdk/lib/core/regexp.dart
+++ b/sdk/lib/core/regexp.dart
@@ -483,5 +483,31 @@ abstract interface class RegExpMatch implements Match {
/// The names of the named capture groups of [pattern].
Iterable<String> get groupNames;
+ /// The capture groups of this match.
+ ///
+ /// An unmodifiable list of slices for each capture group of this
+ /// regular expression which participated in the match.
+ ///
+ /// The list has length [groupCount] + 1, and has an entry for each
+ /// capture group of the regular expression, plus an entry for the
+ /// entire match, treated as capture group zero.
+ ///
+ /// The entry for a capture is `null` if the capture did not participate in
+ /// the entire match.
+ List<({int start, int end})?> get captures;
+
+ /// The named capture groups of this match.
+ ///
+ /// An unmodifiable map containing slices for each named capture group
+ /// of this regular expression which participated in the match.
+ ///
+ /// Unlike [captures], which has an entry for every capture group, this
+ /// map only has entries for named capture groups which participated in
+ /// the match. A named capture group which did not participate in the
+ /// match has no entry in the map, so the values of this map are never
+ /// `null`. Each value has the same `(start, end)` form as the elements
+ /// of [captures].
+ Map<String, ({int start, int end})> get namedCaptures;
+
RegExp get pattern;
}
Now we need to actually provide platform specific implementations.
Important
As we make this change, we should notice that RegExpMatch is an abstract interface class, which means it can be implemented outside of the dart:core library. This makes adding members to it a breaking change: perfectly valid Dart code which implements RegExpMatch will stop compiling, because it does not implement the newly added members or implements them with conflicting signatures. The Dart SDK does not outright disallow breaking changes, but it does require you to follow the special process described in docs/process/breaking-changes.md.
The VM implementation resides in sdk/lib/_internal/vm/lib/regexp_patch.dart.
Note
Teaching you how to navigate an unfamiliar code base is out of scope for this guide. The key is to read the code. For example, here you can notice that _RegExpMatch has two private methods, int _start(int groupIdx) and int _end(int groupIdx), which return exactly what you need to implement this feature. Similarly, you can look at the existing implementations of _RegExpMatch.namedGroup and _RegExpMatch.groupNames to figure out how to obtain the name-to-index mapping for named groups.
We change it in the following way:
diff --git a/sdk/lib/_internal/vm/lib/regexp_patch.dart b/sdk/lib/_internal/vm/lib/regexp_patch.dart
index 5a80d108e86..e67a4399491 100644
--- a/sdk/lib/_internal/vm/lib/regexp_patch.dart
+++ b/sdk/lib/_internal/vm/lib/regexp_patch.dart
@@ -135,7 +135,35 @@ class _RegExpMatch implements RegExpMatch {
return _regexp._groupNames;
}
- final RegExp _regexp;
+ List<({int start, int end})?> get captures {
+ final result = List<({int start, int end})?>.filled(groupCount + 1, null);
+ for (var i = 0; i <= groupCount; i++) {
+ if (_start(i) != -1) {
+ result[i] = (start: _start(i), end: _end(i));
+ }
+ }
+ return List.unmodifiable(result);
+ }
+
+ Map<String, ({int start, int end})> get namedCaptures {
+ final nameList = _regexp._groupNameList;
+ final result = <String, ({int start, int end})>{};
+ if (nameList != null) {
+ for (int i = 0; i < nameList.length; i += 2) {
+ final groupName = nameList[i] as String;
+ final groupIdx = nameList[i] as int;
+
+ final groupStart = _start(groupIdx);
+ final groupEnd = _end(groupIdx);
+ if (groupStart != -1) {
+ result[groupName] = (start: groupStart, end: groupEnd);
+ }
+ }
+ }
+ return Map.unmodifiable(result);
+ }
+
+ final _RegExp _regexp;
final String input;
final List<int> _match;
static const int _MATCH_PAIR = 2;
Once we have applied this change, we are ready to test:
Tip
When working on core libraries you can get faster compilation cycles by setting the DART_GN_ARGS=precompile_tools=true environment variable.
$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures
...
FAILED: dartk-vm release_arm64 corelib/regexp/regexp_captures_test
Expected: Pass
Actual: RuntimeError
--- Command "vm" (took 803ms):
DART_CONFIGURATION=ReleaseARM64 xcodebuild/ReleaseARM64/dart --sound-null-safety -Dtest_runner.configuration=custom-configuration-1 --ignore-unrecognized-flags --packages=/Users/vegorov/src/dart/sdk/.dart_tool/package_config.json /Users/vegorov/src/dart/sdk/tests/corelib/regexp/regexp_captures_test.dart
exit code:
255
stderr:
Unhandled exception:
type 'String' is not a subtype of type 'int' in type cast
#0 _RegExpMatch.namedCaptures (dart:core-patch/regexp_patch.dart:154:38)
#1 main (file:///Users/vegorov/src/dart/sdk/tests/corelib/regexp/regexp_captures_test.dart:12:45)
#2 _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:315:19)
#3 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:194:12)
I intentionally made a typo in this code. Instead of:
final groupName = nameList[i] as String;
final groupIdx = nameList[i] as int;
it should have been (each group name in the list is followed by its index):
final groupName = nameList[i] as String;
final groupIdx = nameList[i + 1] as int;
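Here is a small JavaScript model of the corrected loop, showing why the index lives at i + 1. It assumes, as the fixed code implies, that _groupNameList stores each group name followed by its index, and that _start returns -1 for groups that did not participate in the match. The start/end callbacks and the sample offsets are hypothetical. The explicit typeof check stands in for Dart's `as int` cast, which is what failed in the run above.

```javascript
// JavaScript model of the corrected VM loop. `nameList` alternates group
// names and group indices; `start(i)` / `end(i)` are hypothetical stand-ins
// for _RegExpMatch._start / _end and return -1 for non-participating groups.
function namedCaptures(nameList, start, end) {
  const result = new Map();
  for (let i = 0; i < nameList.length; i += 2) {
    const groupName = nameList[i];    // even slot: the group name
    const groupIdx = nameList[i + 1]; // odd slot: the group index (the bug read nameList[i])
    // Mimics Dart's `as int` cast failing on a non-integer value.
    if (typeof groupIdx !== 'number') throw new TypeError(`${groupIdx} is not an int`);
    if (start(groupIdx) !== -1) {
      result.set(groupName, { start: start(groupIdx), end: end(groupIdx) });
    }
  }
  return result;
}

// Offsets from the test: group 0 is 18..29, group 1 ("name") is 20..27.
// Group 2 ("other") is made up to show a group that did not participate.
const starts = [18, 20, -1];
const ends = [29, 27, -1];
const m = namedCaptures(['name', 1, 'other', 2], (i) => starts[i], (i) => ends[i]);
console.log(m); // Map(1) { 'name' => { start: 20, end: 27 } }
```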
Now let us try again:
$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures
...
=== All 1 test passed ===
The test passes! Hooray! Ready to ship...
Not really: we still have to implement it in dart2js, DDC and dart2wasm.
The dart2js implementation resides in sdk/lib/_internal/js_runtime/lib/regexp_helper.dart.
Note
The VM ships its own RegExp engine, but the dart2js, DDC and Wasm implementations all fall back to the JS RegExp. That's why you will see a lot of JS interop code directly accessing properties on the underlying JS RegExp and match objects.
The implementation of these methods looks like this:
diff --git a/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart b/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
index de9100b6378..c35bb4e7b8b 100644
--- a/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
+++ b/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
@@ -96,8 +96,9 @@ class JSSyntaxRegExp implements RegExp {
} catch (e) {
return e;
}
- })(#, # + # + # + # + #)''',
+ })(#, # + # + # + # + # + #)''',
source,
+ 'd', // Always request indices.
m,
i,
u,
@@ -229,6 +230,46 @@ class _MatchImplementation implements RegExpMatch {
}
return Iterable.empty();
}
+
+ List<({int start, int end})?> get captures {
+ var result = List<({int start, int end})?>.filled(_match.length, null);
+ JSExtendableArray indices = JS('JSExtendableArray', '#.indices', _match);
+ for (var i = 0; i <= groupCount; i++) {
+ JSExtendableArray? slice = JS('JSExtendableArray|Null', '#', indices[i]);
+ if (slice != null) {
+ result[i] = (
+ start: JS('int', '#', slice[0]),
+ end: JS('int', '#', slice[1]),
+ );
+ }
+ }
+ return List.unmodifiable(result);
+ }
+
+ Map<String, ({int start, int end})> get namedCaptures {
+ var result = <String, ({int start, int end})>{};
+ var groups = JS('=Object|Null', '#.indices.groups', _match);
+ if (groups != null) {
+ var names = JSArray<String>.markGrowable(
+ JS('returns:JSExtendableArray;new:true', 'Object.keys(#)', groups),
+ );
+ for (var i = 0; i < names.length; i++) {
+ JSExtendableArray? value = JS(
+ 'JSExtendableArray|Null',
+ '#[#]',
+ groups,
+ names[i],
+ );
+ if (value != null) {
+ result[names[i]] = (
+ start: JS('int', '#', value[0]),
+ end: JS('int', '#', value[1]),
+ );
+ }
+ }
+ }
+ return Map.unmodifiable(result);
+ }
}
class _AllMatchesIterable extends Iterable<RegExpMatch> {
Note that in addition to defining the captures and namedCaptures getters, we had to change the definition of JSSyntaxRegExp.makeNative to include the d flag when constructing the RegExp. This tells the JS engine to compute the indices property and add it to the match object.
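The effect of the d flag is easy to observe in any JavaScript engine that supports ES2022 match indices. The sketch below shows that a match object only gets an indices property when d is present. It then converts indices into an array of {start, end} objects (null for groups that did not participate), which is roughly what the captures getters above do through JS interop. The captures function is a hypothetical JavaScript stand-in, not SDK code.

```javascript
// Hypothetical JS counterpart of the Dart `captures` getter: convert
// m.indices into {start, end} objects, or null for groups that did not
// participate in the match (the engine reports those as `undefined`).
function captures(m) {
  return Array.from(m.indices, (slice) =>
    slice === undefined ? null : { start: slice[0], end: slice[1] });
}

const re = /{{(?<name>\w*)}}|(x)/;
console.log(re.exec('A {{capture}}').indices); // undefined: no `d` flag

const m = new RegExp(re.source, 'd').exec('A {{capture}}');
console.log(captures(m));
// [ { start: 2, end: 13 }, { start: 4, end: 11 }, null ]
console.log(m.indices.groups.name); // [ 4, 11 ]
```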
$ tools/build.py -m release dart2js_platform.dill
$ tools/test.py -c dart2js corelib/regexp_captures
...
=== All 1 test passed ===
We have changed the implementation to unconditionally pass d to the RegExp constructor, so we have to ask ourselves whether this adds any cost. To answer this question we need to benchmark it a bit, e.g.:
// Build inputs
let K = 3;
let prefix = Array(K).join('-');
let inputs = [];
for (var i = 0; i < 1000; i++) {
  inputs.push(`${prefix} {{v${i}}} ---`);
  inputs.push(`${prefix} --------- ---`);
}
// Run benchmark applying re to each input.
function benchmark(name, re, N) {
  let start = performance.now();
  for (let i = 0; i < N; i++) {
    let sum = 0;
    for (var j = 0; j < inputs.length; j++) {
      sum += inputs[j].match(re) ? 0 : 1;
    }
    if (sum != 1000) throw `Unexpected: ${sum} at ${i}?`;
  }
  let end = performance.now();
  print(`${name} took ${(end - start) * 1000 * 1000 / (inputs.length * N)} ns per match`);
}
benchmark('without d', /(?:(?<=[^\\])|^){{(\w*)}}/, 10000);
benchmark('with d', /(?:(?<=[^\\])|^){{(\w*)}}/d, 10000);
This yields:
$ ~/.jsvu/bin/v8 /tmp/benchmark.js
without d took 30.658349999999995 ns per match
with d took 70.65830000000001 ns per match
$ ~/.jsvu/bin/jsc /tmp/benchmark.js
without d took 289.93300000000005 ns per match
with d took 301.391 ns per match
$ ~/.jsvu/bin/sm /tmp/benchmark.js
without d took 33.14544677734375 ns per match
with d took 74.343359375 ns per match
Note
SpiderMonkey and V8 use the same regexp engine, so it is no surprise that their numbers are so close.
Changing the value of K in the benchmark shows that the cost of the d modifier does not depend on the length of the string, and that for this particular regexp it is around 40ns per match in V8 and SpiderMonkey.
Similarly, you can change the number of capture groups in the regexp (e.g. (\w*), ((\w*)), ((((\w*)))), etc.) and observe that the cost grows with the number of groups: 40ns for one group, 60ns for two, 80ns for four, 120ns for eight.
This information should be included in the review to help the reviewer assess the cost of the change.
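The benchmark can be parameterized to repeat the K and group-count experiments without editing it by hand. The sketch below keeps the original inputs and regexp. makeInputs, makePattern and nsPerMatch are hypothetical names, and the sketch uses console.log instead of print so that it also runs under Node.js. Timings vary by engine and machine, so no numbers are shown.

```javascript
// Rebuilds the benchmark inputs with a (K - 1)-character '-' prefix:
// half of the inputs contain a {{...}} placeholder, half do not.
function makeInputs(K) {
  const prefix = Array(K).join('-');
  const inputs = [];
  for (let i = 0; i < 1000; i++) {
    inputs.push(`${prefix} {{v${i}}} ---`);
    inputs.push(`${prefix} --------- ---`);
  }
  return inputs;
}

// Wraps \w* in `groups` nested capture groups:
// 1 -> (\w*), 2 -> ((\w*)), 4 -> ((((\w*)))).
function makePattern(groups, flags) {
  const body = '('.repeat(groups) + '\\w*' + ')'.repeat(groups);
  return new RegExp('(?:(?<=[^\\\\])|^){{' + body + '}}', flags);
}

// Average nanoseconds per match of `re` over `inputs`, repeated N times.
function nsPerMatch(re, inputs, N) {
  const start = performance.now();
  let misses = 0;
  for (let i = 0; i < N; i++) {
    for (const input of inputs) {
      if (!input.match(re)) misses++;
    }
  }
  const end = performance.now();
  // Sanity check: exactly the placeholder-free half of the inputs must miss.
  if (misses !== (inputs.length / 2) * N) throw new Error(`Unexpected misses: ${misses}`);
  return ((end - start) * 1e6) / (inputs.length * N);
}

const inputs = makeInputs(3);
for (const groups of [1, 2, 4, 8]) {
  const without = nsPerMatch(makePattern(groups, ''), inputs, 100);
  const withD = nsPerMatch(makePattern(groups, 'd'), inputs, 100);
  console.log(`${groups} group(s): d adds ~${(withD - without).toFixed(1)} ns per match`);
}
```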
The DDC implementation resides in sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart.
We make the following change:
diff --git a/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart b/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
index b2e16aca7a4..0215af59aeb 100644
--- a/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
+++ b/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
@@ -122,12 +122,13 @@ class JSSyntaxRegExp implements RegExp {
'',
'(function() {'
'try {'
- 'return new RegExp(#, # + # + # + # + #);'
+ 'return new RegExp(#, # + # + # + # + # + #);'
'} catch (e) {'
'return e;'
'}'
'})()',
source,
+ 'd', // Always request indices.
m,
i,
u,
@@ -256,6 +257,48 @@ class _MatchImplementation implements RegExpMatch {
}
return Iterable.empty();
}
+
+ List<({int start, int end})?> get captures {
+ var result = List<({int start, int end})?>.filled(_match.length, null);
+ List indices = JS('JSExtendableArray', '#.indices', _match);
+ for (var i = 0; i <= groupCount; i++) {
+ JSExtendableArray? slice = JS('JSExtendableArray|Null', '#', indices[i]);
+ if (slice != null) {
+ result[i] = (
+ start: JS('int', '#', slice[0]),
+ end: JS('int', '#', slice[1]),
+ );
+ }
+ }
+ return List.unmodifiable(result);
+ }
+
+ Map<String, ({int start, int end})> get namedCaptures {
+ var result = <String, ({int start, int end})>{};
+ var groups = JS('=Object|Null', '#.indices.groups', _match);
+ if (groups != null) {
+ JSExtendableArray names = JS(
+ 'JSExtendableArray',
+ 'Object.keys(#)',
+ groups,
+ );
+ for (var i = 0; i < names.length; i++) {
+ JSExtendableArray? value = JS(
+ 'JSExtendableArray|Null',
+ '#[#]',
+ groups,
+ names[i],
+ );
+ if (value != null) {
+ result[names[i]] = (
+ start: JS('int', '#', value[0]),
+ end: JS('int', '#', value[1]),
+ );
+ }
+ }
+ }
+ return Map.unmodifiable(result);
+ }
}
class _AllMatchesIterable extends Iterable<RegExpMatch> {
$ tools/build.py -m release ddc_stable_test_local
$ tools/test.py -c ddc corelib/regexp_captures
...
=== All 1 test passed ===
The Wasm implementation resides in sdk/lib/_internal/wasm/lib/regexp_helper.dart.
We apply the following patch:
diff --git a/sdk/lib/_internal/wasm/lib/regexp_helper.dart b/sdk/lib/_internal/wasm/lib/regexp_helper.dart
index 29364fcf9f5..0fea9ebc657 100644
--- a/sdk/lib/_internal/wasm/lib/regexp_helper.dart
+++ b/sdk/lib/_internal/wasm/lib/regexp_helper.dart
@@ -33,6 +33,30 @@ extension type JSNativeMatch._(JSArray _) implements JSArray {
external JSNumber get index;
external JSObject? get groups;
external JSAny? pop();
+ external JSIndices get indices;
+}
+
+extension type JSIndices._(JSArray _) implements JSArray {
+ external JSObject? get groups;
+}
+
+js_types.JSArrayImpl<JSAny> _namedGroupIndices(JSNativeMatch o) {
+ return js_types.JSArrayImpl<JSAny>(
+ JS<WasmExternRef>(r"""m => {
+ let result = [];
+ if (typeof m.indices !== 'undefined' &&
+ typeof m.indices.groups !== 'undefined') {
+ let groups = m.indices.groups;
+ for (let key of Object.keys(groups)) {
+ let indices = groups[key];
+ if (typeof indices !== 'undefined') {
+ result.push(key, indices[0], indices[1]);
+ }
+ }
+ }
+ return result;
+ }""", o.toExternRef),
+ );
}
extension type JSNativeRegExp._(JSObject _) implements JSObject {
@@ -117,7 +141,7 @@ class JSSyntaxRegExp implements RegExp {
String u = unicode ? 'u' : '';
String s = dotAll ? 's' : '';
String g = global ? 'g' : '';
- String modifiers = '$m$i$u$s$g';
+ String modifiers = 'd$m$i$u$s$g';
// The call to create the regexp is wrapped in a try catch so we can
// reformat the exception if need be.
final result = JS<WasmExternRef?>(
@@ -238,6 +262,32 @@ class _MatchImplementation implements RegExpMatch {
}
return Iterable.empty();
}
+
+ List<({int start, int end})?> get captures {
+ final result = List<({int start, int end})?>.filled(_match.length, null);
+ for (var i = 0; i <= groupCount; i++) {
+ final slice = _match.indices[i] as JSArray?;
+ if (slice != null) {
+ result[i] = (
+ start: (slice[0] as JSNumber).toDartInt,
+ end: (slice[1] as JSNumber).toDartInt,
+ );
+ }
+ }
+ return List.unmodifiable(result);
+ }
+
+ Map<String, ({int start, int end})> get namedCaptures {
+ final result = <String, ({int start, int end})>{};
+ final groups = _namedGroupIndices(_match);
+ for (var i = 0; i < groups.length; i += 3) {
+ result[(groups[i] as JSString).toDart] = (
+ start: (groups[i + 1] as JSNumber).toDartInt,
+ end: (groups[i + 2] as JSNumber).toDartInt,
+ );
+ }
+ return Map.unmodifiable(result);
+ }
}
class _AllMatchesIterable extends Iterable<RegExpMatch> {
$ tools/build.py -m release dart2wasm
$ tools/test.py -c dart2wasm corelib/regexp_captures
...
=== All 1 test passed ===
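The Wasm version reads named groups differently from dart2js and DDC. In the patch above, _namedGroupIndices runs a small JavaScript function that flattens m.indices.groups into a single array [name0, start0, end0, name1, ...], skipping groups without indices. namedCaptures then walks that array in steps of three. The JavaScript sketch below mimics that round trip outside the SDK. flattenNamedGroups and rebuildNamedCaptures are hypothetical names.

```javascript
// Mirrors the JS function inside _namedGroupIndices: flatten the named
// group indices into [name, start, end, name, start, end, ...], skipping
// groups that did not participate in the match.
function flattenNamedGroups(m) {
  const result = [];
  if (m.indices !== undefined && m.indices.groups !== undefined) {
    const groups = m.indices.groups;
    for (const key of Object.keys(groups)) {
      const slice = groups[key];
      if (slice !== undefined) result.push(key, slice[0], slice[1]);
    }
  }
  return result;
}

// Mirrors the Dart namedCaptures getter: walk the flat array in steps of
// three and build a name -> {start, end} map.
function rebuildNamedCaptures(flat) {
  const result = new Map();
  for (let i = 0; i < flat.length; i += 3) {
    result.set(flat[i], { start: flat[i + 1], end: flat[i + 2] });
  }
  return result;
}

const m = /(?<word>\w+)(?<bang>!)?/d.exec('hello world');
const flat = flattenNamedGroups(m);
console.log(flat); // [ 'word', 0, 5 ]: `bang` did not participate
console.log(rebuildNamedCaptures(flat));
```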
The Dart SDK uses the Gerrit instance at https://dart-review.googlesource.com/ for code reviews. The basics of the Gerrit workflow are described here. This is our main and preferred way of contributing, though we also support smaller contributions via GitHub PRs, which are automatically mirrored into Gerrit.
To work with Gerrit (e.g. to reply to review comments) you'll need a Google account. To sign in to Gerrit, go to https://dart-review.googlesource.com/ and press Sign in in the top right corner. Once signed in, navigate to Obtain password, which will generate a cookie to place in .gitcookies.
Once these steps are done you are ready to send a change for review:
$ git new-branch regexp-captures
$ git add ...
$ git commit
$ git cl upload
Note
By default git-cl will squash the history of your branch and upload a single CL, so the history of your local branch does not really matter.
Important
It is a good idea to follow Commit Message Best Practices when writing the CL description, because the CL description will become the commit message once your CL is reviewed and landed.
CLs start in the WIP (Work in Progress) state. You will need to send your CL to an appropriate reviewer (you can use the OWNERS files in the code base to find one). Core library changes should be sent to Lasse Nielsen, who owns the core libraries.
Before sending, you can optionally walk through the code and leave additional comments to guide reviewers. Once you are ready, send the CL by pressing REPLY, filling in the Reviewer field and then pressing SEND. This sends the reviewer a link to the CL along with all associated comments.
Note
If you don't know an appropriate reviewer, or are struggling with Gerrit for some other reason, you can also just share a link to your Gerrit CL on the issue tracker and ask for a review.
And now you wait for a review.