
@mraleph
Created February 24, 2025 13:15

This step-by-step guide uses issue #42307 (RegEx: add a way to get the positions of groups) as a concrete example.

Step 0: Ask the team

Ask the Dart team before jumping into the implementation. The best place to ask is the issue tracker. You can also try the hackers-dart channel on the Flutter Discord or one of the channels on the Dart Community Discord, but most SDK developers are on neither of them.

Why ask the team?

  • They might give you suggestions for the design and implementation
  • They might warn you about potential challenges
  • They might have tried implementing such a feature before, faced unexpected complications and had to back out.
  • They might tell you that this feature goes against Dart's priorities and they can't take it.

Step 1: Get the source

Get the source code and configure the build environment. Follow the instructions from Building.md in the SDK.

Checking out the SDK (once you have installed depot_tools) is as simple as:

$ mkdir dart
$ cd dart
$ fetch dart

To keep your checkout up-to-date, run:

$ git pull --rebase
$ gclient sync -f 

Building is handled by tools/build.py (which delegates to GN and Ninja).

For example, to build a release-mode dart binary, run:

$ tools/build.py -m release dart

To build the full SDK, run:

$ tools/build.py -m release create_sdk

Step 2: Write a test

SDK tests usually live in one of the subdirectories of the tests/ directory; tests/corelib is for core library tests. Test file names should end with _test so that the test runner picks them up automatically.

In our particular case we can place the test into tests/corelib/regexp/regexp_captures_test.dart.

You can start with something simple, e.g.

// Copyright (c) 2025, the Dart project authors.  Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

import 'package:expect/expect.dart';

void main() {
  final pattern = RegExp('(?:(?<=[^\\\\])|^){{(?<name>\\w*)}}');
  final match = pattern.firstMatch('A captured word I {{capture}}')!;
  Expect.equals((start: 18, end: 29), match.captures[0]);
  Expect.equals((start: 20, end: 27), match.captures[1]);
  Expect.equals((start: 20, end: 27), match.namedCaptures['name']);
}

Note

  1. Any new source file should include a copyright header - just copy it from a neighboring file and update the year.
  2. SDK tests don't use package:test because the goal is to minimize the number of external dependencies of each individual test. Instead, tests use a barebones helper library, package:expect/expect.dart.

But eventually you want to cover as many parts of the feature as you can: e.g. named groups, groups that don't participate in the match, nested groups, non-capturing groups, etc.
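
When writing expectations for such cases, it can help to compute reference positions with a JS engine's d (hasIndices) flag, which the dart2js, DDC and Wasm implementations build on. The sketch below assumes Node.js 16+ or another engine with ES2022 support; captureSlices is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: returns one {start, end} slice per capture group
// (group 0 is the whole match), or null for groups that did not participate.
// Relies on the ES2022 `d` flag, which adds an `indices` array to the match.
function captureSlices(re, input) {
  const m = re.exec(input);
  if (m === null) return null; // No match at all.
  return m.indices.map((pair) =>
    pair === undefined ? null : { start: pair[0], end: pair[1] });
}

// A group that does not participate in the match.
console.log(captureSlices(/(a)|(b)/d, 'b'));
// Nested groups: the outer group contains the inner one.
console.log(captureSlices(/((a)b)/d, 'xab'));
// Non-capturing groups get no entry.
console.log(captureSlices(/(?:a)(b)/d, 'ab'));
```

The resulting positions can then be written into the Dart test as (start: ..., end: ...) records, with null for groups that did not participate.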

Tests are executed using two scripts: tools/test.py is a low-level test runner, and tools/test.dart is a wrapper integrated with the results database, which tracks temporarily approved failures in the test suite. Before using tools/test.py, make sure to build the necessary parts of the SDK.

Here are some examples:

# Build and run VM tests
$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures

# Build core libraries for dart2js and run test on dart2js
$ tools/build.py -m release dart2js_platform.dill
$ tools/test.py -c dart2js corelib/regexp_captures

# Build necessary bits of DDC and run tests on it
$ tools/build.py -m release ddc_stable_test_local
$ tools/test.py -c ddc corelib/regexp_captures

# Build necessary parts of dart2wasm and run test in Wasm mode
$ tools/build.py -m release dart2wasm
$ tools/test.py -c dart2wasm corelib/regexp_captures

Initially the test will just fail:

$ tools/test.py -m release corelib/regexp_captures
...
FAILED: dartk-vm release_arm64 corelib/regexp/regexp_captures_test
Expected: Pass
Actual: CompileTimeError
...
=== 0 tests passed, 1 failed ===

Step 3: Make changes

Now the hardest part: you need to understand which files of the SDK to change and actually make the necessary changes. Most of the core libraries are located in sdk/lib, while sdk/lib/_internal contains platform-specific code:

  • VM versions are in sdk/lib/_internal/{vm,vm_shared}
    • runtime/vm, runtime/lib and runtime/bin contain the C++ parts (bin contains dart:io-related code, lib contains natives for libraries other than dart:io, and vm contains the low-level implementation of runtime components).
  • dart2js versions are in sdk/lib/_internal/{js_runtime,js_runtime_shared}
  • DDC versions are in sdk/lib/_internal/{js_dev_runtime,js_runtime_shared}
  • Wasm versions are in sdk/lib/_internal/{wasm,vm_shared}.

sdk/lib/libraries.yaml describes how these files are all used together to form individual dart:* libraries on each platform.

In our example we start by looking for the implementation of the RegExpMatch class:

$ git grep 'class RegExpMatch'
sdk/lib/core/regexp.dart:abstract interface class RegExpMatch implements Match {
$ git grep 'implements RegExpMatch'
sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {
sdk/lib/_internal/js_runtime/lib/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {
sdk/lib/_internal/vm/lib/regexp_patch.dart:class _RegExpMatch implements RegExpMatch {
sdk/lib/_internal/wasm/lib/regexp_helper.dart:class _MatchImplementation implements RegExpMatch {

Changing the platform-independent interface

sdk/lib/core/regexp.dart contains the platform-independent interface of the RegExp functionality. We edit it in the following way:

diff --git a/sdk/lib/core/regexp.dart b/sdk/lib/core/regexp.dart
index 7cac56f34ed..59fcda21f48 100644
--- a/sdk/lib/core/regexp.dart
+++ b/sdk/lib/core/regexp.dart
@@ -483,5 +483,31 @@ abstract interface class RegExpMatch implements Match {
   /// The names of the named capture groups of [pattern].
   Iterable<String> get groupNames;
 
+  /// The capture groups of this match.
+  ///
+  /// An unmodifiable list of slices for each capture group of this
+  /// regular expression which participated in the match.
+  ///
+  /// The list has length [groupCount] + 1, and has an entry for each
+  /// capture group of the regular expression, plus an entry for the
+  /// entire match, treated as capture group zero.
+  ///
+  /// The entry for a capture is `null` if the capture did not participate in
+  /// the entire match.
+  List<({int start, int end})?> get captures;
+
+  /// The named capture groups of this match.
+  ///
+  /// An unmodifiable map containing slices for each named capture group
+  /// of this regular expression which participated in the match.
+  ///
+  /// The keys are the names of the named capture groups,
+  /// and the values are the start and end positions of the
+  /// corresponding capture.
+  ///
+  /// A named capture group which did not participate in the
+  /// match has no entry in the map.
+  Map<String, ({int start, int end})> get namedCaptures;
+
   RegExp get pattern;
 }

Now we need to actually provide the platform-specific implementations.

Important

While making this change we should notice that RegExpMatch is an abstract interface class, which means it can be implemented outside of the dart:core library. This makes adding members to it a breaking change: perfectly valid Dart code which implements RegExpMatch will stop compiling because it does not implement the newly added members, or implements them with conflicting signatures. The Dart SDK does not outright disallow breaking changes, but it does require you to follow a special process described in docs/process/breaking-changes.md.

Changing VM implementation

VM implementation resides in sdk/lib/_internal/vm/lib/regexp_patch.dart.

Note

Teaching you how to navigate an unfamiliar code base is out of scope for this guide. The key is to read the code. For example, here you can notice that _RegExpMatch has two private methods, int _start(int groupIdx) and int _end(int groupIdx), which return exactly what you need to implement this feature. Similarly, you can look at the existing implementations of _RegExpMatch.namedGroup and _RegExpMatch.groupNames to figure out how to obtain the name-to-index mapping for named groups.
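
To make the idea concrete, here is a small JavaScript model of such accessors. It assumes, based on the _MATCH_PAIR = 2 constant in _RegExpMatch, that the match is stored as a flat list of start/end pairs, with -1 marking groups that did not participate. The names and the sample lists are illustrative, not the VM's actual code or data.

```javascript
// Illustrative model of a flat match list: [start0, end0, start1, end1, ...].
// Group 0 is the whole match; -1 marks a group that did not participate.
const MATCH_PAIR = 2;

function start(match, groupIdx) {
  return match[groupIdx * MATCH_PAIR];
}

function end(match, groupIdx) {
  return match[groupIdx * MATCH_PAIR + 1];
}

// Builds the same shape the new `captures` getter is meant to return.
function captures(match) {
  const groupCount = match.length / MATCH_PAIR - 1;
  const result = [];
  for (let i = 0; i <= groupCount; i++) {
    result.push(start(match, i) === -1
      ? null
      : { start: start(match, i), end: end(match, i) });
  }
  return Object.freeze(result); // Stands in for List.unmodifiable.
}

// Hypothetical flat list for the match in the test above.
console.log(captures([18, 29, 20, 27]));
// Hypothetical flat list where group 1 did not participate.
console.log(captures([0, 1, -1, -1, 0, 1]));
```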

We change regexp_patch.dart in the following way:

diff --git a/sdk/lib/_internal/vm/lib/regexp_patch.dart b/sdk/lib/_internal/vm/lib/regexp_patch.dart
index 5a80d108e86..e67a4399491 100644
--- a/sdk/lib/_internal/vm/lib/regexp_patch.dart
+++ b/sdk/lib/_internal/vm/lib/regexp_patch.dart
@@ -135,7 +135,35 @@ class _RegExpMatch implements RegExpMatch {
     return _regexp._groupNames;
   }
 
-  final RegExp _regexp;
+  List<({int start, int end})?> get captures {
+    final result = List<({int start, int end})?>.filled(groupCount + 1, null);
+    for (var i = 0; i <= groupCount; i++) {
+      if (_start(i) != -1) {
+        result[i] = (start: _start(i), end: _end(i));
+      }
+    }
+    return List.unmodifiable(result);
+  }
+
+  Map<String, ({int start, int end})> get namedCaptures {
+    final nameList = _regexp._groupNameList;
+    final result = <String, ({int start, int end})>{};
+    if (nameList != null) {
+      for (int i = 0; i < nameList.length; i += 2) {
+        final groupName = nameList[i] as String;
+        final groupIdx = nameList[i] as int;
+
+        final groupStart = _start(groupIdx);
+        final groupEnd = _end(groupIdx);
+        if (groupStart != -1) {
+          result[groupName] = (start: groupStart, end: groupEnd);
+        }
+      }
+    }
+    return Map.unmodifiable(result);
+  }
+
+  final _RegExp _regexp;
   final String input;
   final List<int> _match;
   static const int _MATCH_PAIR = 2;

Once we have applied this change we are ready to test:

Tip

When working on core libraries you can get faster compilation cycles by setting the DART_GN_ARGS=precompile_tools=true environment variable.

$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures
...
FAILED: dartk-vm release_arm64 corelib/regexp/regexp_captures_test
Expected: Pass
Actual: RuntimeError

--- Command "vm" (took 803ms):
DART_CONFIGURATION=ReleaseARM64 xcodebuild/ReleaseARM64/dart --sound-null-safety -Dtest_runner.configuration=custom-configuration-1 --ignore-unrecognized-flags --packages=/Users/vegorov/src/dart/sdk/.dart_tool/package_config.json /Users/vegorov/src/dart/sdk/tests/corelib/regexp/regexp_captures_test.dart

exit code:
255

stderr:
Unhandled exception:
type 'String' is not a subtype of type 'int' in type cast
#0      _RegExpMatch.namedCaptures (dart:core-patch/regexp_patch.dart:154:38)
#1      main (file:///Users/vegorov/src/dart/sdk/tests/corelib/regexp/regexp_captures_test.dart:12:45)
#2      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:315:19)
#3      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:194:12)

I intentionally made a typo in my code. Instead of:

final groupName = nameList[i] as String;
final groupIdx = nameList[i] as int;

it should have been:

final groupName = nameList[i] as String;
final groupIdx = nameList[i + 1] as int;
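
The fix follows from the layout of the name list: judging by the nameList[i] / nameList[i + 1] accesses and the i += 2 step, each group name is immediately followed by its group index. Below is a JavaScript model of the corrected loop; the flat name list and the per-group slices are hypothetical sample data.

```javascript
// Illustrative: names and indices interleaved as [name0, idx0, name1, idx1, ...].
// `slices[idx]` is {start, end} for a group, or null if it did not participate.
function namedCaptures(nameList, slices) {
  const result = {};
  for (let i = 0; i < nameList.length; i += 2) {
    const groupName = nameList[i];
    const groupIdx = nameList[i + 1]; // Not nameList[i]: that is the name.
    const slice = slices[groupIdx];
    if (slice !== null) result[groupName] = slice; // Skip non-participating groups.
  }
  return Object.freeze(result); // Stands in for Map.unmodifiable.
}

const slices = [{ start: 18, end: 29 }, { start: 20, end: 27 }, null];
console.log(namedCaptures(['name', 1, 'other', 2], slices));
```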

Now let us try again:

$ tools/build.py -m release dart
$ tools/test.py -m release corelib/regexp_captures
...
=== All 1 test passed ===

The test passes! Hooray! Ready to ship...

Not really: we still have to implement it in dart2js, DDC and dart2wasm.

Changing dart2js implementation

This implementation resides in sdk/lib/_internal/js_runtime/lib/regexp_helper.dart.

Note

The VM ships its own RegExp engine, but the dart2js, DDC and Wasm implementations all fall back to the JS RegExp. That's why you will see a lot of JS interop code directly accessing properties of the underlying JS RegExp and match objects.

The implementation of these methods looks like this:

diff --git a/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart b/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
index de9100b6378..c35bb4e7b8b 100644
--- a/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
+++ b/sdk/lib/_internal/js_runtime/lib/regexp_helper.dart
@@ -96,8 +96,9 @@ class JSSyntaxRegExp implements RegExp {
             } catch (e) {
               return e;
             }
-          })(#, # + # + # + # + #)''',
+          })(#, # + # + # + # + # + #)''',
         source,
+        'd',  // Always request indices.
         m,
         i,
         u,
@@ -229,6 +230,46 @@ class _MatchImplementation implements RegExpMatch {
     }
     return Iterable.empty();
   }
+
+  List<({int start, int end})?> get captures {
+    var result = List<({int start, int end})?>.filled(_match.length, null);
+    JSExtendableArray indices = JS('JSExtendableArray', '#.indices', _match);
+    for (var i = 0; i <= groupCount; i++) {
+      JSExtendableArray? slice = JS('JSExtendableArray|Null', '#', indices[i]);
+      if (slice != null) {
+        result[i] = (
+          start: JS('int', '#', slice[0]),
+          end: JS('int', '#', slice[1]),
+        );
+      }
+    }
+    return List.unmodifiable(result);
+  }
+
+  Map<String, ({int start, int end})> get namedCaptures {
+    var result = <String, ({int start, int end})>{};
+    var groups = JS('=Object|Null', '#.indices.groups', _match);
+    if (groups != null) {
+      var names = JSArray<String>.markGrowable(
+        JS('returns:JSExtendableArray;new:true', 'Object.keys(#)', groups),
+      );
+      for (var i = 0; i < names.length; i++) {
+        JSExtendableArray? value = JS(
+          'JSExtendableArray|Null',
+          '#[#]',
+          groups,
+          names[i],
+        );
+        if (value != null) {
+          result[names[i]] = (
+            start: JS('int', '#', value[0]),
+            end: JS('int', '#', value[1]),
+          );
+        }
+      }
+    }
+    return Map.unmodifiable(result);
+  }
 }
 
 class _AllMatchesIterable extends Iterable<RegExpMatch> {

Note that, in addition to defining the captures and namedCaptures getters, we had to change JSSyntaxRegExp.makeNative to include the d flag when constructing the RegExp. This flag tells the JS engine to compute match indices and add them to the match object as the indices property.
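
Here is what the d flag changes on the JS side, using the regexp from our test. This assumes an engine with ES2022 support, such as Node.js 16 or newer; the outputs in the comments are as printed by Node.

```javascript
const input = 'A captured word I {{capture}}';

// Without `d`, match objects carry no `indices` property.
const plain = /(?:(?<=[^\\])|^){{(?<name>\w*)}}/;
console.log(plain.exec(input).indices); // undefined

// With `d`, `indices[i]` holds [start, end] for group i (0 = whole match)
// and `indices.groups` maps each group name to its [start, end].
const withD = /(?:(?<=[^\\])|^){{(?<name>\w*)}}/d;
const m = withD.exec(input);
console.log(m.indices[0]); // [ 18, 29 ]
console.log(m.indices[1]); // [ 20, 27 ]
console.log(m.indices.groups.name); // [ 20, 27 ]

// Named groups that did not participate are still listed, with value
// undefined, which is why the implementations above check each value.
const alt = /(?<a>x)|(?<b>y)/d.exec('y');
console.log(Object.keys(alt.indices.groups)); // [ 'a', 'b' ]
console.log(alt.indices.groups.a); // undefined
```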

$ tools/build.py -m release dart2js_platform.dill
$ tools/test.py -c dart2js corelib/regexp_captures
...
=== All 1 test passed ===

Benchmarking the cost of the d modifier

We have changed the implementation to unconditionally pass d to the RegExp constructor, so we have to ask ourselves whether this adds any cost. To answer this question we need to benchmark it a bit, e.g.:

// Build inputs
let K = 3;
let prefix = Array(K).join('-');
let inputs = [];
for (var i = 0; i < 1000; i++) {
  inputs.push(`${prefix} {{v${i}}} ---`);
  inputs.push(`${prefix} --------- ---`);
}

// Run benchmark applying re to each input.
function benchmark(name, re, N) {
  let start = performance.now();
  for (let i = 0; i < N; i++) {
    let sum = 0;
    for (var j = 0; j < inputs.length; j++) {
      sum += inputs[j].match(re) ? 0 : 1;
    }
    if (sum != 1000) throw `Unexpected: ${sum} at ${i}?`;
  }
  let end = performance.now();
  print(`${name} took ${(end - start) * 1000 * 1000 / (inputs.length * N)} ns per match`);
}

benchmark('without d', /(?:(?<=[^\\])|^){{(\w*)}}/, 10000);
benchmark('with d', /(?:(?<=[^\\])|^){{(\w*)}}/d, 10000);

This yields:

$ ~/.jsvu/bin/v8 /tmp/benchmark.js
without d took 30.658349999999995 ns per match
with d took 70.65830000000001 ns per match
$ ~/.jsvu/bin/jsc /tmp/benchmark.js
without d took 289.93300000000005 ns per match
with d took 301.391 ns per match
$ ~/.jsvu/bin/sm /tmp/benchmark.js
without d took 33.14544677734375 ns per match
with d took 74.343359375 ns per match

Note

SpiderMonkey and V8 use the same regexp engine, so it is no surprise that their numbers are so close.

Varying the value of K in the benchmark shows that the cost of the d modifier does not depend on the length of the string and is around 40 ns for this particular regexp (in V8 and SpiderMonkey; JSC shows a much smaller difference).

Similarly, you can change the number of capture groups in the regexp (e.g. (\w*), ((\w*)), ((((\w*)))), etc.) and estimate that the cost grows fairly linearly with the number of groups: 40 ns for one group, 60 ns for two, 80 ns for four and 120 ns for eight.
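
Rather than editing the regexp by hand for each measurement, the variants can be generated. A small sketch: nestedGroupsSource is a hypothetical helper, and the sample input mirrors the benchmark inputs with K = 3.

```javascript
// Hypothetical helper: wraps \w* in `depth` nested capture groups, e.g.
// depth 2 gives the source (?:(?<=[^\\])|^){{((\w*))}}.
function nestedGroupsSource(depth) {
  return '(?:(?<=[^\\\\])|^){{' + '('.repeat(depth) + '\\w*' +
      ')'.repeat(depth) + '}}';
}

for (const depth of [1, 2, 4, 8]) {
  const withoutD = new RegExp(nestedGroupsSource(depth));
  const withD = new RegExp(nestedGroupsSource(depth), 'd');
  // Each pair can be passed to the benchmark() function above, e.g.
  // benchmark(`with d, ${depth} groups`, withD, 10000);
  const m = withD.exec('-- {{v1}} ---');
  console.log(withoutD.source, m.indices.length - 1, 'capture groups');
}
```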

These measurements should be included in the review to help the reviewer assess the cost of the change.

Changing DDC implementation

The DDC implementation resides in sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart.

We make the following change:

diff --git a/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart b/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
index b2e16aca7a4..0215af59aeb 100644
--- a/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
+++ b/sdk/lib/_internal/js_dev_runtime/private/regexp_helper.dart
@@ -122,12 +122,13 @@ class JSSyntaxRegExp implements RegExp {
       '',
       '(function() {'
           'try {'
-          'return new RegExp(#, # + # + # + # + #);'
+          'return new RegExp(#, # + # + # + # + # + #);'
           '} catch (e) {'
           'return e;'
           '}'
           '})()',
       source,
+      'd',  // Always request indices.
       m,
       i,
       u,
@@ -256,6 +257,48 @@ class _MatchImplementation implements RegExpMatch {
     }
     return Iterable.empty();
   }
+
+  List<({int start, int end})?> get captures {
+    var result = List<({int start, int end})?>.filled(_match.length, null);
+    List indices = JS('JSExtendableArray', '#.indices', _match);
+    for (var i = 0; i <= groupCount; i++) {
+      JSExtendableArray? slice = JS('JSExtendableArray|Null', '#', indices[i]);
+      if (slice != null) {
+        result[i] = (
+          start: JS('int', '#', slice[0]),
+          end: JS('int', '#', slice[1]),
+        );
+      }
+    }
+    return List.unmodifiable(result);
+  }
+
+  Map<String, ({int start, int end})> get namedCaptures {
+    var result = <String, ({int start, int end})>{};
+    var groups = JS('=Object|Null', '#.indices.groups', _match);
+    if (groups != null) {
+      JSExtendableArray names = JS(
+        'JSExtendableArray',
+        'Object.keys(#)',
+        groups,
+      );
+      for (var i = 0; i < names.length; i++) {
+        JSExtendableArray? value = JS(
+          'JSExtendableArray|Null',
+          '#[#]',
+          groups,
+          names[i],
+        );
+        if (value != null) {
+          result[names[i]] = (
+            start: JS('int', '#', value[0]),
+            end: JS('int', '#', value[1]),
+          );
+        }
+      }
+    }
+    return Map.unmodifiable(result);
+  }
 }
 
 class _AllMatchesIterable extends Iterable<RegExpMatch> {

After applying it, we rebuild and rerun the test:

$ tools/build.py -m release ddc_stable_test_local
$ tools/test.py -c ddc corelib/regexp_captures
...
=== All 1 test passed ===

Changing Wasm implementation

Wasm implementation resides in sdk/lib/_internal/wasm/lib/regexp_helper.dart.

We apply the following patch:

diff --git a/sdk/lib/_internal/wasm/lib/regexp_helper.dart b/sdk/lib/_internal/wasm/lib/regexp_helper.dart
index 29364fcf9f5..0fea9ebc657 100644
--- a/sdk/lib/_internal/wasm/lib/regexp_helper.dart
+++ b/sdk/lib/_internal/wasm/lib/regexp_helper.dart
@@ -33,6 +33,30 @@ extension type JSNativeMatch._(JSArray _) implements JSArray {
   external JSNumber get index;
   external JSObject? get groups;
   external JSAny? pop();
+  external JSIndices get indices;
+}
+
+extension type JSIndices._(JSArray _) implements JSArray {
+  external JSObject? get groups;
+}
+
+js_types.JSArrayImpl<JSAny> _namedGroupIndices(JSNativeMatch o) {
+  return js_types.JSArrayImpl<JSAny>(
+    JS<WasmExternRef>(r"""m => {
+    let result = [];
+    if (typeof m.indices !== 'undefined' &&
+        typeof m.indices.groups !== 'undefined') {
+      let groups = m.indices.groups;
+      for (let key of Object.keys(groups)) {
+        let indices = groups[key];
+        if (typeof indices !== 'undefined') {
+          result.push(key, indices[0], indices[1]);
+        }
+      }
+    }
+    return result;
+  }""", o.toExternRef),
+  );
 }
 
 extension type JSNativeRegExp._(JSObject _) implements JSObject {
@@ -117,7 +141,7 @@ class JSSyntaxRegExp implements RegExp {
     String u = unicode ? 'u' : '';
     String s = dotAll ? 's' : '';
     String g = global ? 'g' : '';
-    String modifiers = '$m$i$u$s$g';
+    String modifiers = 'd$m$i$u$s$g';
     // The call to create the regexp is wrapped in a try catch so we can
     // reformat the exception if need be.
     final result = JS<WasmExternRef?>(
@@ -238,6 +262,32 @@ class _MatchImplementation implements RegExpMatch {
     }
     return Iterable.empty();
   }
+
+  List<({int start, int end})?> get captures {
+    final result = List<({int start, int end})?>.filled(_match.length, null);
+    for (var i = 0; i <= groupCount; i++) {
+      final slice = _match.indices[i] as JSArray?;
+      if (slice != null) {
+        result[i] = (
+          start: (slice[0] as JSNumber).toDartInt,
+          end: (slice[1] as JSNumber).toDartInt,
+        );
+      }
+    }
+    return List.unmodifiable(result);
+  }
+
+  Map<String, ({int start, int end})> get namedCaptures {
+    final result = <String, ({int start, int end})>{};
+    final groups = _namedGroupIndices(_match);
+    for (var i = 0; i < groups.length; i += 3) {
+      result[(groups[i] as JSString).toDart] = (
+        start: (groups[i + 1] as JSNumber).toDartInt,
+        end: (groups[i + 2] as JSNumber).toDartInt,
+      );
+    }
+    return Map.unmodifiable(result);
+  }
 }
 
 class _AllMatchesIterable extends Iterable<RegExpMatch> {

After applying it, we rebuild and rerun the test:

$ tools/build.py -m release dart2wasm
$ tools/test.py -c dart2wasm corelib/regexp_captures
...
=== All 1 test passed ===
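
Unlike the dart2js and DDC versions, the Wasm patch walks indices.groups inside a single JS closure and hands Dart a flat array of [name, start, end, ...] triples, which the Dart side reads three entries at a time. To see what that closure produces, here is a standalone JavaScript copy of it, together with a hypothetical decode function mirroring the Dart loop; it assumes an ES2022 engine.

```javascript
// Same logic as the JS closure inside _namedGroupIndices: flattens
// m.indices.groups into [name, start, end, name, start, end, ...],
// skipping named groups that did not participate.
function namedGroupIndices(m) {
  const result = [];
  if (typeof m.indices !== 'undefined' &&
      typeof m.indices.groups !== 'undefined') {
    const groups = m.indices.groups;
    for (const key of Object.keys(groups)) {
      const indices = groups[key];
      if (typeof indices !== 'undefined') {
        result.push(key, indices[0], indices[1]);
      }
    }
  }
  return result;
}

// Hypothetical mirror of the Dart namedCaptures loop (step of 3).
function decode(flat) {
  const result = {};
  for (let i = 0; i < flat.length; i += 3) {
    result[flat[i]] = { start: flat[i + 1], end: flat[i + 2] };
  }
  return result;
}

const m = /(?<a>x)|(?<b>y)/d.exec('-y');
console.log(namedGroupIndices(m)); // [ 'b', 1, 2 ]
console.log(decode(namedGroupIndices(m))); // { b: { start: 1, end: 2 } }
```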

Step 4: Sending a change for review

The Dart SDK uses the Gerrit instance at https://dart-review.googlesource.com/ for code reviews. The basics of the Gerrit workflow are described here. Gerrit is our main and preferred way of accepting contributions, though we also support smaller contributions via GitHub PRs, which are automatically mirrored into Gerrit.

To work with Gerrit (e.g. to reply to review comments) you'll need a Google account. To sign in to Gerrit, go to https://dart-review.googlesource.com/ and press Sign in in the top right corner. Once signed in, navigate to Obtain password, which generates a cookie to place in your .gitcookies file.

Once these steps are done you are ready to send a change for review:

$ git new-branch regexp-captures
$ git add ...
$ git commit
$ git cl upload

Note

By default git-cl squashes the history of your branch and uploads a single CL, so the history of your local branch does not really matter.

Important

It is a good idea to follow Commit Message Best Practices when writing the CL description, because the CL description becomes the commit message once your CL is reviewed and landed.

CLs start in the WIP (Work in Progress) state. You will need to send your CL to an appropriate reviewer (you can use the OWNERS files in the code base to find one). Core library changes should be sent to Lasse Nielsen, who owns the core libraries.

Before sending, you can optionally walk through the code and leave comments to guide reviewers. Once you are ready, press REPLY, update the Reviewer field and then press SEND. This sends the reviewer a link to the CL along with all associated comments.

Note

If you don't know an appropriate reviewer, or are struggling with Gerrit for some other reason, you can also just share a link to your Gerrit CL on the issue tracker and ask for a review.

And now you wait for a review.
