Package regexp implements regular expression search.
The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by RE2 and described at https://golang.org/s/re2syntax, except for \C. For an overview of the syntax, run
go doc regexp/syntax
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.) For more information about this property, see
[https://swtch.com/~rsc/regexp/regexp1.html](https://swtch.com/~rsc/regexp/regexp1.html)
or any book about automata theory.
All characters are UTF-8-encoded code points.
There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
If 'All' is present, the routine matches successive non-overlapping matches of the entire expression. Empty matches abutting a preceding match are ignored. The return value is a slice containing the successive return values of the corresponding non-'All' routine. These routines take an extra integer argument, n. If n >= 0, the function returns at most n matches/submatches; otherwise, it returns all of them.
If 'String' is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.
If 'Submatch' is present, the return value is a slice identifying the successive submatches of the expression. Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression, numbered from left to right in order of opening parenthesis. Submatch 0 is the match of the entire expression, submatch 1 the match of the first parenthesized subexpression, and so on.
If 'Index' is present, matches and submatches are identified by byte index pairs within the input string: result[2n:2n+1] identifies the indexes of the nth submatch. The pair for n==0 identifies the match of the entire expression. If 'Index' is not present, the match is identified by the text of the match/submatch. If an index is negative or text is nil, it means that subexpression did not match any string in the input. For 'String' versions an empty string means either no match or an empty match.
There is also a subset of the methods that can be applied to text read from a RuneReader:
MatchReader, FindReaderIndex, FindReaderSubmatchIndex
This set may grow. Note that regular expression matches may need to examine text beyond the text returned by a match, so the methods that match text from a RuneReader may read arbitrarily far into the input before returning.
(There are a few other methods that do not match this pattern.)
- Constants
- Variables
- var bitStatePool
- var t
- var flag
- var onePassPool
- var flag
- var arrayNoInts
- var buf
- var noRune
- var noNext
- var lx
- var rx
- var anyRuneNotNL
- var anyRune
- var instQueue
- var visitQueue
- var check
- var onePassRunes
- var matchSize
- var matchPool
- var lnext
- var buf
- var endPos
- var dstCap
- var width
- var specialBytes
- var i
- var end
- var width
- var dstCap
- var dstCap
- var dstCap
- var dstCap
- var result
- var result
- var result
- var result
- var result
- var result
- var result
- var result
- var goodRe
- var badRe
- var replaceTests
- var replaceLiteralTests
- var replaceFuncTests
- var metaTests
- var literalPrefixTests
- var emptySubexpIndices
- var subexpCases
- var splitTests
- var sink
- var compileBenchData
- var minInputLenTests
- var txt
- var str
- var input
- var inStrings
- var re
- var refull
- var nfail
- var ncase
- var text
- var run
- var match
- var notab
- var x
- var end
- var v
- var err
- var text
- var benchData
- var benchSizes
- var findTests
- var runeMergeTests
- var onePassTests
- var p
- var re
- var err
- var onePassTests1
- Types
- type job struct
- type bitState struct
- type queue struct
- type entry struct
- type thread struct
- type machine struct
- func (m *machine) init(ncap int)
- func (m *machine) alloc(i *syntax.Inst) *thread
- func (m *machine) match(i input, pos int) bool
- func (m *machine) clear(q *queue)
- func (m *machine) step(runq, nextq *queue, pos, nextPos int, c rune, nextCond *lazyFlag)
- func (m *machine) add(q *queue, pc uint32, pos int, cap []int, cond *lazyFlag, t *thread) *thread
- type inputs struct
- type lazyFlag uint64
- type onePassMachine struct
- type onePassProg struct
- type onePassInst struct
- type queueOnePass struct
- type runeSlice []rune
- type Regexp struct
- func Compile(expr string) (*Regexp, error)
- func CompilePOSIX(expr string) (*Regexp, error)
- func compile(expr string, mode syntax.Flags, longest bool) (*Regexp, error)
- func MustCompile(str string) *Regexp
- func MustCompilePOSIX(str string) *Regexp
- func compileTest(t *testing.T, expr string, error string) *Regexp
- func tryCompile(s string) (re *Regexp, err error)
- func (re *Regexp) tryBacktrack(b *bitState, i input, pc uint32, pos int) bool
- func (re *Regexp) backtrack(ib []byte, is string, pos int, ncap int, dstCap []int) []int
- func (re *Regexp) doOnePass(ir io.RuneReader, ib []byte, is string, pos, ncap int, dstCap []int) []int
- func (re *Regexp) doMatch(r io.RuneReader, b []byte, s string) bool
- func (re *Regexp) doExecute(r io.RuneReader, b []byte, s string, pos int, ncap int, dstCap []int) []int
- func (re *Regexp) String() string
- func (re *Regexp) Copy() *Regexp
- func (re *Regexp) Longest()
- func (re *Regexp) get() *machine
- func (re *Regexp) put(m *machine)
- func (re *Regexp) NumSubexp() int
- func (re *Regexp) SubexpNames() []string
- func (re *Regexp) SubexpIndex(name string) int
- func (re *Regexp) LiteralPrefix() (prefix string, complete bool)
- func (re *Regexp) MatchReader(r io.RuneReader) bool
- func (re *Regexp) MatchString(s string) bool
- func (re *Regexp) Match(b []byte) bool
- func (re *Regexp) ReplaceAllString(src, repl string) string
- func (re *Regexp) ReplaceAllLiteralString(src, repl string) string
- func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string
- func (re *Regexp) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []byte
- func (re *Regexp) ReplaceAll(src, repl []byte) []byte
- func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte
- func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte
- func (re *Regexp) pad(a []int) []int
- func (re *Regexp) allMatches(s string, b []byte, n int, deliver func([]int))
- func (re *Regexp) Find(b []byte) []byte
- func (re *Regexp) FindIndex(b []byte) (loc []int)
- func (re *Regexp) FindString(s string) string
- func (re *Regexp) FindStringIndex(s string) (loc []int)
- func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)
- func (re *Regexp) FindSubmatch(b []byte) [][]byte
- func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte
- func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte
- func (re *Regexp) expand(dst []byte, template string, bsrc []byte, src string, match []int) []byte
- func (re *Regexp) FindSubmatchIndex(b []byte) []int
- func (re *Regexp) FindStringSubmatch(s string) []string
- func (re *Regexp) FindStringSubmatchIndex(s string) []int
- func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int
- func (re *Regexp) FindAll(b []byte, n int) [][]byte
- func (re *Regexp) FindAllIndex(b []byte, n int) [][]int
- func (re *Regexp) FindAllString(s string, n int) []string
- func (re *Regexp) FindAllStringIndex(s string, n int) [][]int
- func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte
- func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int
- func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string
- func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int
- func (re *Regexp) Split(s string, n int) []string
- type input interface
- type inputString struct
- type inputBytes struct
- type inputReader struct
- type stringError struct
- type ReplaceTest struct
- type ReplaceFuncTest struct
- type MetaTest struct
- type subexpIndex struct
- type subexpCase struct
- type FindTest struct
- Functions
- func freeBitState(b *bitState)
- func maxBitStateLen(prog *syntax.Prog) int
- func shouldBacktrack(prog *syntax.Prog) bool
- func freeOnePassMachine(m *onePassMachine)
- func onePassPrefix(p *syntax.Prog) (prefix string, complete bool, pc uint32)
- func onePassNext(i *onePassInst, r rune) uint32
- func iop(i *syntax.Inst) syntax.InstOp
- func mergeRuneSets(leftRunes, rightRunes *[]rune, leftPC, rightPC uint32) ([]rune, []uint32)
- func cleanupOnePass(prog *onePassProg, original *syntax.Prog)
- func minInputLen(re *syntax.Regexp) int
- func quote(s string) string
- func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)
- func MatchString(pattern string, s string) (matched bool, err error)
- func Match(pattern string, b []byte) (matched bool, err error)
- func special(b byte) bool
- func init()
- func QuoteMeta(s string) string
- func extract(str string) (name string, num int, rest string, ok bool)
- func TestGoodCompile(t *testing.T)
- func TestBadCompile(t *testing.T)
- func matchTest(t *testing.T, test *FindTest)
- func TestMatch(t *testing.T)
- func matchFunctionTest(t *testing.T, test *FindTest)
- func TestMatchFunction(t *testing.T)
- func copyMatchTest(t *testing.T, test *FindTest)
- func TestCopyMatch(t *testing.T)
- func TestReplaceAll(t *testing.T)
- func TestReplaceAllLiteral(t *testing.T)
- func TestReplaceAllFunc(t *testing.T)
- func TestQuoteMeta(t *testing.T)
- func TestLiteralPrefix(t *testing.T)
- func TestSubexp(t *testing.T)
- func TestSplit(t *testing.T)
- func TestParseAndCompile(t *testing.T)
- func TestOnePassCutoff(t *testing.T)
- func TestSwitchBacktrack(t *testing.T)
- func BenchmarkFind(b *testing.B)
- func BenchmarkFindAllNoMatches(b *testing.B)
- func BenchmarkFindString(b *testing.B)
- func BenchmarkFindSubmatch(b *testing.B)
- func BenchmarkFindStringSubmatch(b *testing.B)
- func BenchmarkLiteral(b *testing.B)
- func BenchmarkNotLiteral(b *testing.B)
- func BenchmarkMatchClass(b *testing.B)
- func BenchmarkMatchClass_InRange(b *testing.B)
- func BenchmarkReplaceAll(b *testing.B)
- func BenchmarkAnchoredLiteralShortNonMatch(b *testing.B)
- func BenchmarkAnchoredLiteralLongNonMatch(b *testing.B)
- func BenchmarkAnchoredShortMatch(b *testing.B)
- func BenchmarkAnchoredLongMatch(b *testing.B)
- func BenchmarkOnePassShortA(b *testing.B)
- func BenchmarkNotOnePassShortA(b *testing.B)
- func BenchmarkOnePassShortB(b *testing.B)
- func BenchmarkNotOnePassShortB(b *testing.B)
- func BenchmarkOnePassLongPrefix(b *testing.B)
- func BenchmarkOnePassLongNotPrefix(b *testing.B)
- func BenchmarkMatchParallelShared(b *testing.B)
- func BenchmarkMatchParallelCopied(b *testing.B)
- func BenchmarkQuoteMetaAll(b *testing.B)
- func BenchmarkQuoteMetaNone(b *testing.B)
- func BenchmarkCompile(b *testing.B)
- func TestDeepEqual(t *testing.T)
- func TestMinInputLen(t *testing.T)
- func TestRE2Exhaustive(t *testing.T)
- func TestRE2Search(t *testing.T)
- func testRE2(t *testing.T, file string)
- func runFull(re, refull *Regexp, text string) ([]int, string)
- func runPartial(re, refull *Regexp, text string) ([]int, string)
- func runFullLongest(re, refull *Regexp, text string) ([]int, string)
- func runPartialLongest(re, refull *Regexp, text string) ([]int, string)
- func matchFull(re, refull *Regexp, text string) (bool, string)
- func matchPartial(re, refull *Regexp, text string) (bool, string)
- func matchFullLongest(re, refull *Regexp, text string) (bool, string)
- func matchPartialLongest(re, refull *Regexp, text string) (bool, string)
- func isSingleBytes(s string) bool
- func parseResult(t *testing.T, file string, lineno int, res string) []int
- func same(x, y []int) bool
- func TestFowler(t *testing.T)
- func testFowler(t *testing.T, file string)
- func parseFowlerResult(s string) (ok, compiled, matched bool, pos []int)
- func makeText(n int) []byte
- func BenchmarkMatch(b *testing.B)
- func BenchmarkMatch_onepass_regex(b *testing.B)
- func TestLongest(t *testing.T)
- func TestProgramTooLongForBacktrack(t *testing.T)
- func build(n int, x ...int) [][]int
- func TestFind(t *testing.T)
- func TestFindString(t *testing.T)
- func testFindIndex(test *FindTest, result []int, t *testing.T)
- func TestFindIndex(t *testing.T)
- func TestFindStringIndex(t *testing.T)
- func TestFindReaderIndex(t *testing.T)
- func TestFindAll(t *testing.T)
- func TestFindAllString(t *testing.T)
- func testFindAllIndex(test *FindTest, result [][]int, t *testing.T)
- func TestFindAllIndex(t *testing.T)
- func TestFindAllStringIndex(t *testing.T)
- func testSubmatchBytes(test *FindTest, n int, submatches []int, result [][]byte, t *testing.T)
- func TestFindSubmatch(t *testing.T)
- func testSubmatchString(test *FindTest, n int, submatches []int, result []string, t *testing.T)
- func TestFindStringSubmatch(t *testing.T)
- func testSubmatchIndices(test *FindTest, n int, expect, result []int, t *testing.T)
- func testFindSubmatchIndex(test *FindTest, result []int, t *testing.T)
- func TestFindSubmatchIndex(t *testing.T)
- func TestFindStringSubmatchIndex(t *testing.T)
- func TestFindReaderSubmatchIndex(t *testing.T)
- func TestFindAllSubmatch(t *testing.T)
- func TestFindAllStringSubmatch(t *testing.T)
- func testFindAllSubmatchIndex(test *FindTest, result [][]int, t *testing.T)
- func TestFindAllSubmatchIndex(t *testing.T)
- func TestFindAllStringSubmatchIndex(t *testing.T)
- func TestMergeRuneSet(t *testing.T)
- func TestCompileOnePass(t *testing.T)
- func TestRunOnePass(t *testing.T)
const visitedBits = 32const maxBacktrackProg = 500 // len(prog.Inst) <= maxconst maxBacktrackVector = 256 * 1024 // bit vector size <= max (bits)const mergeFailed = uint32(0xffffffff)mergeRuneSets merges two non-intersecting runesets, and returns the merged result, and a NextIp array. The idea is that if a rune matches the OnePassRunes at index i, NextIp[i/2] is the target. If the input sets intersect, an empty runeset and a NextIp array with the single element mergeFailed is returned. The code assumes that both inputs contain ordered and non-intersecting rune pairs.
const endOfText rune = -1const startSize = 10 // The size at which to start a slice in the 'All' routines.var bitStatePool sync.Poolvar t *threadvar flag lazyFlagvar onePassPool sync.Poolvar flag lazyFlagvar arrayNoInts [0]intarrayNoInts is returned by doExecute match if nil dstCap is passed to it with ncap=0.
var buf strings.BuilderHave prefix; gather characters.
var noRune = ...var noNext = ...var lx, rx intvar lx, rx intvar anyRuneNotNL = ...var anyRune = ...var instQueue = newQueue(len(p.Inst))var visitQueue = newQueue(len(p.Inst))var check func(uint32, []bool) boolvar onePassRunes = make([][]rune, len(p.Inst))var matchSize = ...Pools of machine for use during (Regexp).doExecute, split up by the size of the execution queues. matchPool[i] machines have queue size matchSize[i]. On a 64-bit system each queue entry is 16 bytes, so matchPool[0] has 162128 = 4kB queues, etc. The final matchPool is a catch-all for very large queues.
var matchPool [len(matchSize)]sync.PoolPools of machine for use during (Regexp).doExecute, split up by the size of the execution queues. matchPool[i] machines have queue size matchSize[i]. On a 64-bit system each queue entry is 16 bytes, so matchPool[0] has 162128 = 4kB queues, etc. The final matchPool is a catch-all for very large queues.
var lnext intvar buf []bytevar endPos intvar dstCap [2]intvar width intAdvance past this match; always advance at least one character.
var specialBytes [16]byteBitmap used by func special to check whether a character needs to be escaped.
var i intA byte loop is correct because all metacharacters are ASCII.
var end intvar width intvar dstCap [2]intvar dstCap [2]intvar dstCap [4]intvar dstCap [4]intvar result [][]bytevar result [][]intvar result []stringvar result [][]intvar result [][][]bytevar result [][]intvar result [][]stringvar result [][]intvar goodRe = ...var badRe = ...var replaceTests = ...var replaceLiteralTests = ...var replaceFuncTests = ...var metaTests = ...var literalPrefixTests = ...var emptySubexpIndices = ...var subexpCases = ...var splitTests = ...var sink stringvar compileBenchData = ...var minInputLenTests = ...var txt io.Readervar str []stringvar input []stringvar inStrings boolvar re *Regexpvar refull *Regexpvar nfail intvar ncase intvar text stringvar run = ...var match = ...var notab = MustCompilePOSIX(`[^\t]+`)var x []intvar end byte = ')'var v = -1var err errorvar text []bytevar benchData = ...var benchSizes = ...var findTests = ...var runeMergeTests = ...var onePassTests = ...var p *syntax.Progvar re *syntax.Regexpvar err errorvar onePassTests1 = ...TODO(cespare): Unify with onePassTests and rationalize one-pass test cases.
type job struct {
pc uint32
arg bool
pos int
}A job is an entry on the backtracker's job stack. It holds the instruction pc and the position in the input.
type bitState struct {
end int
cap []int
matchcap []int
jobs []job
visited []uint32
inputs inputs
}bitState holds state for the backtracker.
func newBitState() *bitStatefunc (b *bitState) reset(prog *syntax.Prog, end int, ncap int)reset resets the state of the backtracker. end is the end position in the input. ncap is the number of captures.
func (b *bitState) shouldVisit(pc uint32, pos int) boolshouldVisit reports whether the combination of (pc, pos) has not been visited yet.
func (b *bitState) push(re *Regexp, pc uint32, pos int, arg bool)push pushes (pc, pos, arg) onto the job stack if it should be visited.
type queue struct {
sparse []uint32
dense []entry
}A queue is a 'sparse array' holding pending threads of execution. See https://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html
type entry struct {
pc uint32
t *thread
}An entry is an entry on a queue. It holds both the instruction pc and the actual thread. Some queue entries are just place holders so that the machine knows it has considered that pc. Such entries have t == nil.
type thread struct {
inst *syntax.Inst
cap []int
}A thread is the state of a single path through the machine: an instruction and a corresponding capture array. See https://swtch.com/~rsc/regexp/regexp2.html
type machine struct {
re *Regexp // corresponding Regexp
p *syntax.Prog // compiled program
q0, q1 queue // two queues for runq, nextq
pool []*thread // pool of available threads
matched bool // whether a match was found
matchcap []int // capture information for the match
inputs inputs
}A machine holds all the state during an NFA simulation for p.
func (m *machine) init(ncap int)func (m *machine) alloc(i *syntax.Inst) *threadalloc allocates a new thread with the given instruction. It uses the free pool if possible.
func (m *machine) match(i input, pos int) boolmatch runs the machine over the input starting at pos. It reports whether a match was found. If so, m.matchcap holds the submatch information.
func (m *machine) clear(q *queue)clear frees all threads on the thread queue.
func (m *machine) step(runq, nextq *queue, pos, nextPos int, c rune, nextCond *lazyFlag)step executes one step of the machine, running each of the threads on runq and appending new threads to nextq. The step processes the rune c (which may be endOfText), which starts at position pos and ends at nextPos. nextCond gives the setting for the empty-width flags after c.
func (m *machine) add(q *queue, pc uint32, pos int, cap []int, cond *lazyFlag, t *thread) *threadadd adds an entry to q for pc, unless the q already has such an entry. It also recursively adds an entry for all instructions reachable from pc by following empty-width conditions satisfied by cond. pos gives the current position in the input.
type inputs struct {
// cached inputs, to avoid allocation
bytes inputBytes
string inputString
reader inputReader
}func (i *inputs) newBytes(b []byte) inputfunc (i *inputs) newString(s string) inputfunc (i *inputs) newReader(r io.RuneReader) inputfunc (i *inputs) clear()func (i *inputs) init(r io.RuneReader, b []byte, s string) (input, int)type lazyFlag uint64A lazyFlag is a lazily-evaluated syntax.EmptyOp, for checking zero-width flags like ^ $ \A \z \B \b. It records the pair of relevant runes and does not determine the implied flags until absolutely necessary (most of the time, that means never).
func newLazyFlag(r1, r2 rune) lazyFlagfunc (f lazyFlag) match(op syntax.EmptyOp) booltype onePassMachine struct {
inputs inputs
matchcap []int
}func newOnePassMachine() *onePassMachinetype onePassProg struct {
Inst []onePassInst
Start int // index of start instruction
NumCap int // number of InstCapture insts in re
}A onePassProg is a compiled one-pass regular expression program. It is the same as syntax.Prog except for the use of onePassInst.
func onePassCopy(prog *syntax.Prog) *onePassProgonePassCopy creates a copy of the original Prog, as we'll be modifying it
func makeOnePass(p *onePassProg) *onePassProgmakeOnePass creates a onepass Prog, if possible. It is possible if at any alt, the match engine can always tell which branch to take. The routine may modify p if it is turned into a onepass Prog. If it isn't possible for this to be a onepass Prog, the Prog nil is returned. makeOnePass is recursive to the size of the Prog.
func compileOnePass(prog *syntax.Prog) (p *onePassProg)compileOnePass returns a new *syntax.Prog suitable for onePass execution if the original Prog can be recharacterized as a one-pass regexp program, or syntax.nil if the Prog cannot be converted. For a one pass prog, the fundamental condition that must be true is: at any InstAlt, there must be no ambiguity about what branch to take.
type onePassInst struct {
syntax.Inst
Next []uint32
}A onePassInst is a single instruction in a one-pass regular expression program. It is the same as syntax.Inst except for the new 'Next' field.
type queueOnePass struct {
sparse []uint32
dense []uint32
size, nextIndex uint32
}Sparse Array implementation is used as a queueOnePass.
func newQueue(size int) (q *queueOnePass)func (q *queueOnePass) empty() boolfunc (q *queueOnePass) next() (n uint32)func (q *queueOnePass) clear()func (q *queueOnePass) contains(u uint32) boolfunc (q *queueOnePass) insert(u uint32)func (q *queueOnePass) insertNew(u uint32)type runeSlice []runeruneSlice exists to permit sorting the case-folded rune sets.
func (p runeSlice) Len() intfunc (p runeSlice) Less(i, j int) boolfunc (p runeSlice) Swap(i, j int)type Regexp struct {
expr string // as passed to Compile
prog *syntax.Prog // compiled program
onepass *onePassProg // onepass program or nil
numSubexp int
maxBitStateLen int
subexpNames []string
prefix string // required prefix in unanchored matches
prefixBytes []byte // prefix, as a []byte
prefixRune rune // first rune in prefix
prefixEnd uint32 // pc for last rune in prefix
mpool int // pool for machines
matchcap int // size of recorded match lengths
prefixComplete bool // prefix is the entire regexp
cond syntax.EmptyOp // empty-width conditions required at start of match
minInputLen int // minimum length of the input in bytes
// This field can be modified by the Longest method,
// but it is otherwise read-only.
longest bool // whether regexp prefers leftmost-longest match
}Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines, except for configuration methods, such as Longest.
func Compile(expr string) (*Regexp, error)Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.
When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see CompilePOSIX.
func CompilePOSIX(expr string) (*Regexp, error)CompilePOSIX is like Compile but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.
That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.
However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See https://swtch.com/~rsc/regexp/regexp2.html#posix for details.
func compile(expr string, mode syntax.Flags, longest bool) (*Regexp, error)func MustCompile(str string) *RegexpMustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
func MustCompilePOSIX(str string) *RegexpMustCompilePOSIX is like CompilePOSIX but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
func compileTest(t *testing.T, expr string, error string) *Regexpfunc tryCompile(s string) (re *Regexp, err error)func (re *Regexp) tryBacktrack(b *bitState, i input, pc uint32, pos int) booltryBacktrack runs a backtracking search starting at pos.
func (re *Regexp) backtrack(ib []byte, is string, pos int, ncap int, dstCap []int) []intbacktrack runs a backtracking search of prog on the input starting at pos.
func (re *Regexp) doOnePass(ir io.RuneReader, ib []byte, is string, pos, ncap int, dstCap []int) []int
func (re *Regexp) doOnePass(ir io.RuneReader, ib []byte, is string, pos, ncap int, dstCap []int) []intdoOnePass implements r.doExecute using the one-pass execution engine.
func (re *Regexp) doMatch(r io.RuneReader, b []byte, s string) booldoMatch reports whether either r, b or s match the regexp.
func (re *Regexp) doExecute(r io.RuneReader, b []byte, s string, pos int, ncap int, dstCap []int) []int
func (re *Regexp) doExecute(r io.RuneReader, b []byte, s string, pos int, ncap int, dstCap []int) []intdoExecute finds the leftmost match in the input, appends the position of its subexpressions to dstCap and returns dstCap.
nil is returned if no matches are found and non-nil if matches are found.
func (re *Regexp) String() stringString returns the source text used to compile the regular expression.
func (re *Regexp) Copy() *RegexpCopy returns a new Regexp object copied from re. Calling Longest on one copy does not affect another.
Deprecated: In earlier releases, when using a Regexp in multiple goroutines, giving each goroutine its own copy helped to avoid lock contention. As of Go 1.12, using Copy is no longer necessary to avoid lock contention. Copy may still be appropriate if the reason for its use is to make two copies with different Longest settings.
func (re *Regexp) Longest()Longest makes future searches prefer the leftmost-longest match. That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This method modifies the Regexp and may not be called concurrently with any other methods.
func (re *Regexp) get() *machineget returns a machine to use for matching re. It uses the re's machine cache if possible, to avoid unnecessary allocation.
func (re *Regexp) put(m *machine)put returns a machine to the correct machine pool.
func (re *Regexp) NumSubexp() intNumSubexp returns the number of parenthesized subexpressions in this Regexp.
func (re *Regexp) SubexpNames() []stringSubexpNames returns the names of the parenthesized subexpressions in this Regexp. The name for the first sub-expression is names[1], so that if m is a match slice, the name for m[i] is SubexpNames()[i]. Since the Regexp as a whole cannot be named, names[0] is always the empty string. The slice should not be modified.
func (re *Regexp) SubexpIndex(name string) intSubexpIndex returns the index of the first subexpression with the given name, or -1 if there is no subexpression with that name.
Note that multiple subexpressions can be written using the same name, as in (?Pa+)(?Pb+), which declares two subexpressions named "bob". In this case, SubexpIndex returns the index of the leftmost such subexpression in the regular expression.
func (re *Regexp) LiteralPrefix() (prefix string, complete bool)LiteralPrefix returns a literal string that must begin any match of the regular expression re. It returns the boolean true if the literal string comprises the entire regular expression.
func (re *Regexp) MatchReader(r io.RuneReader) boolMatchReader reports whether the text returned by the RuneReader contains any match of the regular expression re.
func (re *Regexp) MatchString(s string) boolMatchString reports whether the string s contains any match of the regular expression re.
func (re *Regexp) Match(b []byte) boolMatch reports whether the byte slice b contains any match of the regular expression re.
func (re *Regexp) ReplaceAllString(src, repl string) stringReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.
func (re *Regexp) ReplaceAllLiteralString(src, repl string) stringReplaceAllLiteralString returns a copy of src, replacing matches of the Regexp with the replacement string repl. The replacement repl is substituted directly, without using Expand.
func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) stringReplaceAllStringFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched substring. The replacement returned by repl is substituted directly, without using Expand.
func (re *Regexp) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []byte
func (re *Regexp) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []bytefunc (re *Regexp) ReplaceAll(src, repl []byte) []byteReplaceAll returns a copy of src, replacing matches of the Regexp with the replacement text repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.
func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byteReplaceAllLiteral returns a copy of src, replacing matches of the Regexp with the replacement bytes repl. The replacement repl is substituted directly, without using Expand.
func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byteReplaceAllFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched byte slice. The replacement returned by repl is substituted directly, without using Expand.
func (re *Regexp) pad(a []int) []intThe number of capture values in the program may correspond to fewer capturing expressions than are in the regexp. For example, "(a){0}" turns into an empty program, so the maximum capture in the program is 0 but we need to return an expression for \1. Pad appends -1s to the slice a as needed.
func (re *Regexp) allMatches(s string, b []byte, n int, deliver func([]int))allMatches calls deliver at most n times with the location of successive matches in the input text. The input text is b if non-nil, otherwise s.
func (re *Regexp) Find(b []byte) []byteFind returns a slice holding the text of the leftmost match in b of the regular expression. A return value of nil indicates no match.
func (re *Regexp) FindIndex(b []byte) (loc []int)FindIndex returns a two-element slice of integers defining the location of the leftmost match in b of the regular expression. The match itself is at b[loc[0]:loc[1]]. A return value of nil indicates no match.
func (re *Regexp) FindString(s string) stringFindString returns a string holding the text of the leftmost match in s of the regular expression. If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use FindStringIndex or FindStringSubmatch if it is necessary to distinguish these cases.
func (re *Regexp) FindStringIndex(s string) (loc []int)FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s of the regular expression. The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)FindReaderIndex returns a two-element slice of integers defining the location of the leftmost match of the regular expression in text read from the RuneReader. The match text was found in the input stream at byte offset loc[0] through loc[1]-1. A return value of nil indicates no match.
func (re *Regexp) FindSubmatch(b []byte) [][]byteFindSubmatch returns a slice of slices holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the 'Submatch' descriptions in the package comment. A return value of nil indicates no match.
func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byteExpand appends template to dst and returns the result; during the append, Expand replaces variables in the template with corresponding matches drawn from src. The match slice should have been returned by FindSubmatchIndex.
In the template, a variable is denoted by a substring of the form
In the $name form, name is taken to be as long as possible:
To insert a literal $ in the output, use $$ in the template.
func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte (exported)
func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byteExpandString is like Expand but the template and source are strings. It appends to and returns a byte slice in order to give the calling code control over allocation.
func (re *Regexp) expand(dst []byte, template string, bsrc []byte, src string, match []int) []bytefunc (re *Regexp) FindSubmatchIndex(b []byte) []intFindSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindStringSubmatch(s string) []stringFindStringSubmatch returns a slice of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindStringSubmatchIndex(s string) []intFindStringSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []intFindReaderSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression of text read by the RuneReader, and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAll(b []byte, n int) [][]byteFindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllIndex(b []byte, n int) [][]intFindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllString(s string, n int) []stringFindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllStringIndex(s string, n int) [][]intFindAllStringIndex is the 'All' version of FindStringIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byteFindAllSubmatch is the 'All' version of FindSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]intFindAllSubmatchIndex is the 'All' version of FindSubmatchIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]stringFindAllStringSubmatch is the 'All' version of FindStringSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]intFindAllStringSubmatchIndex is the 'All' version of FindStringSubmatchIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.
func (re *Regexp) Split(s string, n int) []stringSplit slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.
The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.
Example:
s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]
The count determines the number of substrings to return:
n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
type input interface {
step(pos int) (r rune, width int) // advance one rune
canCheckPrefix() bool // can we look ahead without losing info?
hasPrefix(re *Regexp) bool
index(re *Regexp, pos int) int
context(pos int) lazyFlag
}input abstracts different representations of the input text. It provides one-character lookahead.
type inputString struct {
str string
}inputString scans a string.
func (i *inputString) step(pos int) (rune, int)func (i *inputString) canCheckPrefix() boolfunc (i *inputString) hasPrefix(re *Regexp) boolfunc (i *inputString) index(re *Regexp, pos int) intfunc (i *inputString) context(pos int) lazyFlagtype inputBytes struct {
str []byte
}inputBytes scans a byte slice.
func (i *inputBytes) step(pos int) (rune, int)func (i *inputBytes) canCheckPrefix() boolfunc (i *inputBytes) hasPrefix(re *Regexp) boolfunc (i *inputBytes) index(re *Regexp, pos int) intfunc (i *inputBytes) context(pos int) lazyFlagtype inputReader struct {
r io.RuneReader
atEOT bool
pos int
}inputReader scans a RuneReader.
func (i *inputReader) step(pos int) (rune, int)func (i *inputReader) canCheckPrefix() boolfunc (i *inputReader) hasPrefix(re *Regexp) boolfunc (i *inputReader) index(re *Regexp, pos int) intfunc (i *inputReader) context(pos int) lazyFlagtype stringError struct {
re string
err string
}type ReplaceTest struct {
pattern, replacement, input, output string
}type ReplaceFuncTest struct {
pattern string
replacement func(string) string
input, output string
}type MetaTest struct {
pattern, output, literal string
isLiteral bool
}type subexpIndex struct {
name string
index int
}type subexpCase struct {
input string
num int
names []string
indices []subexpIndex
}type FindTest struct {
pat string
text string
matches [][]int
}For each pattern/text pair, what is the expected output of each function? We can derive the textual results from the indexed results, the non-submatch results from the submatched results, the single results from the 'all' results, and the byte results from the string results. Therefore the table includes only the FindAllStringSubmatchIndex result.
func (t FindTest) String() stringfunc freeBitState(b *bitState)func maxBitStateLen(prog *syntax.Prog) intmaxBitStateLen returns the maximum length of a string to search with the backtracker using prog.
func shouldBacktrack(prog *syntax.Prog) boolshouldBacktrack reports whether the program is too long for the backtracker to run.
func freeOnePassMachine(m *onePassMachine)func onePassPrefix(p *syntax.Prog) (prefix string, complete bool, pc uint32)OnePassPrefix returns a literal string that all matches for the regexp must start with. Complete is true if the prefix is the entire match. Pc is the index of the last rune instruction in the string. The OnePassPrefix skips over the mandatory EmptyBeginText
func onePassNext(i *onePassInst, r rune) uint32OnePassNext selects the next actionable state of the prog, based on the input character. It should only be called when i.Op == InstAlt or InstAltMatch, and from the one-pass machine. One of the alternates may ultimately lead without input to end of line. If the instruction is InstAltMatch the path to the InstMatch is in i.Out, the normal node in i.Next.
func iop(i *syntax.Inst) syntax.InstOpfunc mergeRuneSets(leftRunes, rightRunes *[]rune, leftPC, rightPC uint32) ([]rune, []uint32)func cleanupOnePass(prog *onePassProg, original *syntax.Prog)cleanupOnePass drops working memory, and restores certain shortcut instructions.
func minInputLen(re *syntax.Regexp) intminInputLen walks the regexp to find the minimum length of any matchable input
func quote(s string) stringfunc MatchReader(pattern string, r io.RuneReader) (matched bool, err error)MatchReader reports whether the text returned by the RuneReader contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.
func MatchString(pattern string, s string) (matched bool, err error)MatchString reports whether the string s contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.
func Match(pattern string, b []byte) (matched bool, err error)Match reports whether the byte slice b contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.
func special(b byte) boolspecial reports whether byte b needs to be escaped by QuoteMeta.
func init()func QuoteMeta(s string) stringQuoteMeta returns a string that escapes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.
func extract(str string) (name string, num int, rest string, ok bool)extract returns the name from a leading "$name" or "${name}" in str. If it is a number, extract returns num set to that number; otherwise num = -1.
func TestGoodCompile(t *testing.T)func TestBadCompile(t *testing.T)func matchTest(t *testing.T, test *FindTest)func TestMatch(t *testing.T)func matchFunctionTest(t *testing.T, test *FindTest)func TestMatchFunction(t *testing.T)func copyMatchTest(t *testing.T, test *FindTest)func TestCopyMatch(t *testing.T)func TestReplaceAll(t *testing.T)func TestReplaceAllLiteral(t *testing.T)func TestReplaceAllFunc(t *testing.T)func TestQuoteMeta(t *testing.T)func TestLiteralPrefix(t *testing.T)func TestSubexp(t *testing.T)func TestSplit(t *testing.T)func TestParseAndCompile(t *testing.T)The following sequence of Match calls used to panic. See issue #12980.
func TestOnePassCutoff(t *testing.T)Check that one-pass cutoff does trigger.
func TestSwitchBacktrack(t *testing.T)Check that the same machine can be used with the standard matcher and then the backtracker when there are no captures.
func BenchmarkFind(b *testing.B)func BenchmarkFindAllNoMatches(b *testing.B)func BenchmarkFindString(b *testing.B)func BenchmarkFindSubmatch(b *testing.B)func BenchmarkFindStringSubmatch(b *testing.B)func BenchmarkLiteral(b *testing.B)func BenchmarkNotLiteral(b *testing.B)func BenchmarkMatchClass(b *testing.B)func BenchmarkMatchClass_InRange(b *testing.B)func BenchmarkReplaceAll(b *testing.B)func BenchmarkAnchoredLiteralShortNonMatch(b *testing.B)func BenchmarkAnchoredLiteralLongNonMatch(b *testing.B)func BenchmarkAnchoredShortMatch(b *testing.B)func BenchmarkAnchoredLongMatch(b *testing.B)func BenchmarkOnePassShortA(b *testing.B)func BenchmarkNotOnePassShortA(b *testing.B)func BenchmarkOnePassShortB(b *testing.B)func BenchmarkNotOnePassShortB(b *testing.B)func BenchmarkOnePassLongPrefix(b *testing.B)func BenchmarkOnePassLongNotPrefix(b *testing.B)func BenchmarkMatchParallelShared(b *testing.B)func BenchmarkMatchParallelCopied(b *testing.B)func BenchmarkQuoteMetaAll(b *testing.B)func BenchmarkQuoteMetaNone(b *testing.B)func BenchmarkCompile(b *testing.B)func TestDeepEqual(t *testing.T)func TestMinInputLen(t *testing.T)func TestRE2Exhaustive(t *testing.T)This test is excluded when running under the race detector because it is a very expensive test and takes too long.
func TestRE2Search(t *testing.T)TestRE2 tests this package's regexp API against test cases considered during RE2's exhaustive tests, which run all possible regexps over a given set of atoms and operators, up to a given complexity, over all possible strings over a given alphabet, up to a given size. Rather than try to link with RE2, we read a log file containing the test cases and the expected matches. The log file, re2-exhaustive.txt, is generated by running 'make log' in the open source RE2 distribution https://github.com/google/re2/.
The test file format is a sequence of stanzas like:
strings
"abc"
"123x"
regexps
"[a-z]+"
0-3;0-3
-;-
"([0-9])([0-9])([0-9])"
-;-
-;0-3 0-1 1-2 2-3
The stanza begins by defining a set of strings, quoted using Go double-quote syntax, one per line. Then the regexps section gives a sequence of regexps to run on the strings. In the block that follows a regexp, each line gives the semicolon-separated match results of running the regexp on the corresponding string. Each match result is either a single -, meaning no match, or a space-separated sequence of pairs giving the match and submatch indices. An unmatched subexpression formats its pair as a single - (not illustrated above). For now each regexp run produces two match results, one for a full match' that restricts the regexp to matching the entire string or nothing, and one for a partial match' that gives the leftmost first match found in the string.
Lines beginning with # are comments. Lines beginning with a capital letter are test names printed during RE2's test suite and are echoed into t but otherwise ignored.
At time of writing, re2-exhaustive.txt is 59 MB but compresses to 385 kB, so we store re2-exhaustive.txt.bz2 in the repository and decompress it on the fly.
func testRE2(t *testing.T, file string)func runFull(re, refull *Regexp, text string) ([]int, string)func runPartial(re, refull *Regexp, text string) ([]int, string)func runFullLongest(re, refull *Regexp, text string) ([]int, string)func runPartialLongest(re, refull *Regexp, text string) ([]int, string)func matchFull(re, refull *Regexp, text string) (bool, string)func matchPartial(re, refull *Regexp, text string) (bool, string)func matchFullLongest(re, refull *Regexp, text string) (bool, string)func matchPartialLongest(re, refull *Regexp, text string) (bool, string)func isSingleBytes(s string) boolfunc parseResult(t *testing.T, file string, lineno int, res string) []intfunc same(x, y []int) boolfunc TestFowler(t *testing.T)TestFowler runs this package's regexp API against the POSIX regular expression tests collected by Glenn Fowler at http://www2.research.att.com/~astopen/testregex/testregex.html.
func testFowler(t *testing.T, file string)func parseFowlerResult(s string) (ok, compiled, matched bool, pos []int)func makeText(n int) []bytefunc BenchmarkMatch(b *testing.B)func BenchmarkMatch_onepass_regex(b *testing.B)func TestLongest(t *testing.T)func TestProgramTooLongForBacktrack(t *testing.T)TestProgramTooLongForBacktrack tests that a regex which is too long for the backtracker still executes properly.
func build(n int, x ...int) [][]intbuild is a helper to construct a [][]int by extracting n sequences from x. This represents n matches with len(x)/n submatches each.
func TestFind(t *testing.T)func TestFindString(t *testing.T)func testFindIndex(test *FindTest, result []int, t *testing.T)func TestFindIndex(t *testing.T)func TestFindStringIndex(t *testing.T)func TestFindReaderIndex(t *testing.T)func TestFindAll(t *testing.T)func TestFindAllString(t *testing.T)func testFindAllIndex(test *FindTest, result [][]int, t *testing.T)func TestFindAllIndex(t *testing.T)func TestFindAllStringIndex(t *testing.T)func testSubmatchBytes(test *FindTest, n int, submatches []int, result [][]byte, t *testing.T)func TestFindSubmatch(t *testing.T)func testSubmatchString(test *FindTest, n int, submatches []int, result []string, t *testing.T)func TestFindStringSubmatch(t *testing.T)func testSubmatchIndices(test *FindTest, n int, expect, result []int, t *testing.T)func testFindSubmatchIndex(test *FindTest, result []int, t *testing.T)func TestFindSubmatchIndex(t *testing.T)func TestFindStringSubmatchIndex(t *testing.T)func TestFindReaderSubmatchIndex(t *testing.T)func TestFindAllSubmatch(t *testing.T)func TestFindAllStringSubmatch(t *testing.T)func testFindAllSubmatchIndex(test *FindTest, result [][]int, t *testing.T)func TestFindAllSubmatchIndex(t *testing.T)func TestFindAllStringSubmatchIndex(t *testing.T)func TestMergeRuneSet(t *testing.T)func TestCompileOnePass(t *testing.T)func TestRunOnePass(t *testing.T)Package syntax parses regular expressions into parse trees and compiles parse trees into programs. Most clients of regular expressions will use the facilities of package regexp (such as Compile and Match) instead of this package.
The regular expression syntax understood by this package when parsing with the Perl flag is as follows. Parts of the syntax can be disabled by passing alternate flags to Parse.
Single characters:
. any character, possibly including newline (flag s=true)
[xyz] character class
[^xyz] negated character class
\d Perl character class
\D negated Perl character class
[[:alpha:]] ASCII character class
[[:^alpha:]] negated ASCII character class
\pN Unicode character class (one-letter name)
\p{Greek} Unicode character class
\PN negated Unicode character class (one-letter name)
\P{Greek} negated Unicode character class
Composites:
xy x followed by y
x|y x or y (prefer x)
Repetitions:
x* zero or more x, prefer more
x+ one or more x, prefer more
x? zero or one x, prefer one
x{n,m} n or n+1 or ... or m x, prefer more
x{n,} n or more x, prefer more
x{n} exactly n x
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
x{n,m}? n or n+1 or ... or m x, prefer fewer
x{n,}? n or more x, prefer fewer
x{n}? exactly n x
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
Grouping:
(re) numbered capturing group (submatch)
(?P<name>re) named & numbered capturing group (submatch)
(?:re) non-capturing group
(?flags) set flags within current group; non-capturing
(?flags:re) set flags during re; non-capturing
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:
i case-insensitive (default false)
m multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s let . match \n (default false)
U ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)
Empty strings:
^ at beginning of text or line (flag m=true)
$ at end of text (like \z not \Z) or line (flag m=true)
\A at beginning of text
\b at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B not at ASCII word boundary
\z at end of text
Escape sequences:
\a bell (== \007)
\f form feed (== \014)
\t horizontal tab (== \011)
\n newline (== \012)
\r carriage return (== \015)
\v vertical tab character (== \013)
\* literal *, for any punctuation character *
\123 octal character code (up to three digits)
\x7F hex character code (exactly two digits)
\x{10FFFF} hex character code
\Q...\E literal text ... even if ... has punctuation
Character class elements:
x single character
A-Z character range (inclusive)
\d Perl character class
[:foo:] ASCII character class foo
\p{Foo} Unicode character class Foo
\pF Unicode character class F (one-letter name)
Named character classes as character class elements:
[\d] digits (== \d)
[^\d] not digits (== \D)
[\D] not digits (== \D)
[^\D] not not digits (== \d)
[[:name:]] named ASCII class inside character class (== [:name:])
[^[:name:]] named ASCII class inside negated character class (== [:^name:])
[\p{Name}] named Unicode property inside character class (== \p{Name})
[^\p{Name}] named Unicode property inside negated character class (== \P{Name})
Perl character classes (all ASCII-only):
\d digits (== [0-9])
\D not digits (== [^0-9])
\s whitespace (== [\t\n\f\r ])
\S not whitespace (== [^\t\n\f\r ])
\w word characters (== [0-9A-Za-z_])
\W not word characters (== [^0-9A-Za-z_])
ASCII character classes:
[[:alnum:]] alphanumeric (== [0-9A-Za-z])
[[:alpha:]] alphabetic (== [A-Za-z])
[[:ascii:]] ASCII (== [\x00-\x7F])
[[:blank:]] blank (== [\t ])
[[:cntrl:]] control (== [\x00-\x1F\x7F])
[[:digit:]] digits (== [0-9])
[[:graph:]] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]] lower case (== [a-z])
[[:print:]] printable (== [ -~] == [ [:graph:]])
[[:punct:]] punctuation (== [!-/:-@[-`{-~])
[[:space:]] whitespace (== [\t\n\v\f\r ])
[[:upper:]] upper case (== [A-Z])
[[:word:]] word characters (== [0-9A-Za-z_])
[[:xdigit:]] hex digit (== [0-9A-Fa-f])
Unicode character classes are those in unicode.Categories and unicode.Scripts.
- Constants
- const _Op_name_0
- const _Op_name_1
- const ErrInternalError
- const ErrInvalidCharClass
- const ErrInvalidCharRange
- const ErrInvalidEscape
- const ErrInvalidNamedCapture
- const ErrInvalidPerlOp
- const ErrInvalidRepeatOp
- const ErrInvalidRepeatSize
- const ErrInvalidUTF8
- const ErrMissingBracket
- const ErrMissingParen
- const ErrMissingRepeatArgument
- const ErrTrailingBackslash
- const ErrUnexpectedParen
- const FoldCase
- const Literal
- const ClassNL
- const DotNL
- const OneLine
- const NonGreedy
- const PerlX
- const UnicodeGroups
- const WasDollar
- const Simple
- const MatchNL
- const Perl
- const POSIX
- const opLeftParen
- const opVerticalBar
- const minFold
- const maxFold
- const InstAlt
- const InstAltMatch
- const InstCapture
- const InstEmptyWidth
- const InstMatch
- const InstFail
- const InstNop
- const InstRune
- const InstRune1
- const InstRuneAny
- const InstRuneAnyNotNL
- const EmptyBeginLine
- const EmptyEndLine
- const EmptyBeginText
- const EmptyEndText
- const EmptyWordBoundary
- const EmptyNoWordBoundary
- const noMatch
- const OpNoMatch
- const OpEmptyMatch
- const OpLiteral
- const OpCharClass
- const OpAnyCharNotNL
- const OpAnyChar
- const OpBeginLine
- const OpEndLine
- const OpBeginText
- const OpEndText
- const OpWordBoundary
- const OpNoWordBoundary
- const OpCapture
- const OpStar
- const OpPlus
- const OpQuest
- const OpRepeat
- const OpConcat
- const OpAlternate
- const opPseudo
- const meta
- const testFlags
- Variables
- var c
- var anyRuneNotNL
- var anyRune
- var f
- var f
- var f
- var _Op_index_0
- var str
- var strflags
- var istr
- var iflags
- var first
- var ifirst
- var p
- var err
- var c
- var op
- var lastRepeat
- var lit
- var ok1
- var c
- var anyTable
- var seq
- var name
- var lo
- var hi
- var code1
- var code2
- var code3
- var perlGroup
- var code4
- var code5
- var code6
- var code7
- var code8
- var code9
- var code10
- var code11
- var code12
- var code13
- var code14
- var code15
- var code16
- var code17
- var posixGroup
- var instOpNames
- var op
- var boundary
- var b
- var buf
- var flag
- var b
- var b
- var prefix
- var parseTests
- var foldcaseTests
- var literalTests
- var matchnlTests
- var nomatchnlTests
- var b
- var opNames
- var r
- var invalidRegexps
- var onlyPerl
- var onlyPOSIX
- var compileTests
- var r1
- var simplifyTests
- Types
- type patchList struct
- type frag struct
- type compiler struct
- func (c *compiler) init()
- func (c *compiler) compile(re *Regexp) frag
- func (c *compiler) inst(op InstOp) frag
- func (c *compiler) nop() frag
- func (c *compiler) fail() frag
- func (c *compiler) cap(arg uint32) frag
- func (c *compiler) cat(f1, f2 frag) frag
- func (c *compiler) alt(f1, f2 frag) frag
- func (c *compiler) quest(f1 frag, nongreedy bool) frag
- func (c *compiler) star(f1 frag, nongreedy bool) frag
- func (c *compiler) plus(f1 frag, nongreedy bool) frag
- func (c *compiler) empty(op EmptyOp) frag
- func (c *compiler) rune(r []rune, flags Flags) frag
- type Error struct
- type ErrorCode string
- type Flags uint16
- type parser struct
- func (p *parser) newRegexp(op Op) *Regexp
- func (p *parser) reuse(re *Regexp)
- func (p *parser) push(re *Regexp) *Regexp
- func (p *parser) maybeConcat(r rune, flags Flags) bool
- func (p *parser) literal(r rune)
- func (p *parser) op(op Op) *Regexp
- func (p *parser) repeat(op Op, min, max int, before, after, lastRepeat string) (string, error)
- func (p *parser) concat() *Regexp
- func (p *parser) alternate() *Regexp
- func (p *parser) collapse(subs []*Regexp, op Op) *Regexp
- func (p *parser) factor(sub []*Regexp) []*Regexp
- func (p *parser) leadingString(re *Regexp) ([]rune, Flags)
- func (p *parser) removeLeadingString(re *Regexp, n int) *Regexp
- func (p *parser) leadingRegexp(re *Regexp) *Regexp
- func (p *parser) removeLeadingRegexp(re *Regexp, reuse bool) *Regexp
- func (p *parser) parseRepeat(s string) (min, max int, rest string, ok bool)
- func (p *parser) parsePerlFlags(s string) (rest string, err error)
- func (p *parser) parseInt(s string) (n int, rest string, ok bool)
- func (p *parser) parseVerticalBar() error
- func (p *parser) swapVerticalBar() bool
- func (p *parser) parseRightParen() error
- func (p *parser) parseEscape(s string) (r rune, rest string, err error)
- func (p *parser) parseClassChar(s, wholeClass string) (r rune, rest string, err error)
- func (p *parser) parsePerlClassEscape(s string, r []rune) (out []rune, rest string)
- func (p *parser) parseNamedClass(s string, r []rune) (out []rune, rest string, err error)
- func (p *parser) appendGroup(r []rune, g charGroup) []rune
- func (p *parser) parseUnicodeClass(s string, r []rune) (out []rune, rest string, err error)
- func (p *parser) parseClass(s string) (rest string, err error)
- type charGroup struct
- type ranges struct
- type Prog struct
- type InstOp uint8
- type EmptyOp uint8
- type Inst struct
- type Regexp struct
- func literalRegexp(s string, flags Flags) *Regexp
- func Parse(s string, flags Flags) (*Regexp, error)
- func simplify1(op Op, flags Flags, sub, re *Regexp) *Regexp
- func (x *Regexp) Equal(y *Regexp) bool
- func (re *Regexp) String() string
- func (re *Regexp) MaxCap() int
- func (re *Regexp) CapNames() []string
- func (re *Regexp) capNames(names []string)
- func (re *Regexp) Simplify() *Regexp
- type Op uint8
- type parseTest struct
- Functions
- func minFoldRune(r rune) rune
- func repeatIsValid(re *Regexp, n int) bool
- func cleanAlt(re *Regexp)
- func isValidCaptureName(name string) bool
- func isCharClass(re *Regexp) bool
- func matchRune(re *Regexp, r rune) bool
- func mergeCharClass(dst, src *Regexp)
- func unicodeTable(name string) (*unicode.RangeTable, *unicode.RangeTable)
- func cleanClass(rp *[]rune) []rune
- func appendLiteral(r []rune, x rune, flags Flags) []rune
- func appendRange(r []rune, lo, hi rune) []rune
- func appendFoldedRange(r []rune, lo, hi rune) []rune
- func appendClass(r []rune, x []rune) []rune
- func appendFoldedClass(r []rune, x []rune) []rune
- func appendNegatedClass(r []rune, x []rune) []rune
- func appendTable(r []rune, x *unicode.RangeTable) []rune
- func appendNegatedTable(r []rune, x *unicode.RangeTable) []rune
- func negateClass(r []rune) []rune
- func checkUTF8(s string) error
- func nextRune(s string) (c rune, t string, err error)
- func isalnum(c rune) bool
- func unhex(c rune) rune
- func IsWordChar(r rune) bool
- func bw(b *strings.Builder, args ...string)
- func dumpProg(b *strings.Builder, p *Prog)
- func u32(i uint32) string
- func dumpInst(b *strings.Builder, i *Inst)
- func writeRegexp(b *strings.Builder, re *Regexp)
- func escape(b *strings.Builder, r rune, force bool)
- func TestParseSimple(t *testing.T)
- func TestParseFoldCase(t *testing.T)
- func TestParseLiteral(t *testing.T)
- func TestParseMatchNL(t *testing.T)
- func TestParseNoMatchNL(t *testing.T)
- func testParseDump(t *testing.T, tests []parseTest, flags Flags)
- func dump(re *Regexp) string
- func dumpRegexp(b *strings.Builder, re *Regexp)
- func mkCharClass(f func(rune) bool) string
- func isUpperFold(r rune) bool
- func TestFoldConstants(t *testing.T)
- func TestAppendRangeCollapse(t *testing.T)
- func TestParseInvalidRegexps(t *testing.T)
- func TestToStringEquivalentParse(t *testing.T)
- func TestCompile(t *testing.T)
- func BenchmarkEmptyOpContext(b *testing.B)
- func TestSimplify(t *testing.T)
const _Op_name_0 = "NoMatchEmptyMatchLiteralCharClassAnyCharNotNLAnyCharBeginLineEndLineBeginTextEndTextWordBoundaryNoWordBoundaryCaptureStarPlusQuestRepeatConcatAlternate"const _Op_name_1 = "opPseudo"const ErrInternalError ErrorCode = "regexp/syntax: internal error"Unexpected error
const ErrInvalidCharClass ErrorCode = "invalid character class"Parse errors
const ErrInvalidCharRange ErrorCode = "invalid character class range"const ErrInvalidEscape ErrorCode = "invalid escape sequence"const ErrInvalidNamedCapture ErrorCode = "invalid named capture"const ErrInvalidPerlOp ErrorCode = "invalid or unsupported Perl syntax"const ErrInvalidRepeatOp ErrorCode = "invalid nested repetition operator"const ErrInvalidRepeatSize ErrorCode = "invalid repeat count"const ErrInvalidUTF8 ErrorCode = "invalid UTF-8"const ErrMissingBracket ErrorCode = "missing closing ]"const ErrMissingParen ErrorCode = "missing closing )"const ErrMissingRepeatArgument ErrorCode = "missing argument to repetition operator"const ErrTrailingBackslash ErrorCode = "trailing backslash at end of expression"const ErrUnexpectedParen ErrorCode = "unexpected )"const FoldCase Flags = 1 << iota // case-insensitive matchconst Literal // treat pattern as literal stringconst ClassNL // allow character classes like [^a-z] and [[:space:]] to match newlineconst DotNL // allow . to match newlineconst OneLine // treat ^ and $ as only matching at beginning and end of textconst NonGreedy // make repetition operators default to non-greedyconst PerlX // allow Perl extensionsconst UnicodeGroups // allow \p{Han}, \P{Han} for Unicode group and negationconst WasDollar // regexp OpEndText was $, not \zconst Simple // regexp contains no counted repetitionconst MatchNL = ClassNL | DotNLconst Perl = ClassNL | OneLine | PerlX | UnicodeGroups // as close to Perl as possibleconst POSIX Flags = 0 // POSIX syntaxconst opLeftParen = opPseudo + iotaPseudo-ops for parsing stack.
const opVerticalBarPseudo-ops for parsing stack.
const minFold = 0x0041minimum and maximum runes involved in folding. checked during test.
const maxFold = 0x1e943const InstAlt InstOp = iotaconst InstAltMatchconst InstCaptureconst InstEmptyWidthconst InstMatchconst InstFailconst InstNopconst InstRuneconst InstRune1const InstRuneAnyconst InstRuneAnyNotNLconst EmptyBeginLine EmptyOp = 1 << iotaconst EmptyEndLineconst EmptyBeginTextconst EmptyEndTextconst EmptyWordBoundaryconst EmptyNoWordBoundaryconst noMatch = -1const OpNoMatch Op = 1 + iota // matches no stringsconst OpEmptyMatch // matches empty stringconst OpLiteral // matches Runes sequenceconst OpCharClass // matches Runes interpreted as range pair listconst OpAnyCharNotNL // matches any character except newlineconst OpAnyChar // matches any characterconst OpBeginLine // matches empty string at beginning of lineconst OpEndLine // matches empty string at end of lineconst OpBeginText // matches empty string at beginning of textconst OpEndText // matches empty string at end of textconst OpWordBoundary // matches word boundary `\b`const OpNoWordBoundary // matches word non-boundary `\B`const OpCapture // capturing subexpression with index Cap, optional name Nameconst OpStar // matches Sub[0] zero or more timesconst OpPlus // matches Sub[0] one or more timesconst OpQuest // matches Sub[0] zero or one timesconst OpRepeat // matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)const OpConcat // matches concatenation of Subsconst OpAlternate // matches alternation of Subsconst opPseudo Op = 128 // where pseudo-ops startconst meta = `\.+*?()|[]{}^$`const testFlags = MatchNL | PerlX | UnicodeGroupsvar c compilervar anyRuneNotNL = ...var anyRune = ...var f fragvar f fragvar f fragvar _Op_index_0 = ...var str []runeRound 1: Factor out common literal prefixes.
var strflags Flagsvar istr []runeInvariant: the Regexps that were in sub[0:start] have been used or marked for reuse, and the slice space has been reused for out (len(out) <= start).
Invariant: sub[start:i] consists of regexps that all begin with str as modified by strflags.
var iflags Flagsvar first *Regexpvar ifirst *RegexpInvariant: the Regexps that were in sub[0:start] have been used or marked for reuse, and the slice space has been reused for out (len(out) <= start).
Invariant: sub[start:i] consists of regexps that all begin with ifirst.
var p parserOtherwise, must do real work.
var err errorOtherwise, must do real work.
var c runeOtherwise, must do real work.
var op OpOtherwise, must do real work.
var lastRepeat stringOtherwise, must do real work.
var lit string\Q ... \E: the ... is always literals
var ok1 boolvar c runeNon-capturing group. Might also twiddle Perl flags.
var anyTable = ...var seq, name stringvar seq, name stringvar lo, hi runevar lo, hi runevar code1 = ...var code2 = ...var code3 = ...var perlGroup = ...var code4 = ...var code5 = ...var code6 = ...var code7 = ...var code8 = ...var code9 = ...var code10 = ...var code11 = ...var code12 = ...var code13 = ...var code14 = ...var code15 = ...var code16 = ...var code17 = ...var posixGroup = ...var instOpNames = ...var op EmptyOp = EmptyNoWordBoundaryvar boundary bytevar b strings.Buildervar buf strings.BuilderHave prefix; gather characters.
var flag EmptyOpvar b strings.Buildervar b strings.Buildervar prefix *RegexpBuild leading prefix: xx.
var parseTests = ...var foldcaseTests = ...var literalTests = ...var matchnlTests = ...var nomatchnlTests = ...var b strings.Buildervar opNames = ...var r []runeAppendRange should collapse each of the new ranges into the earlier ones (it looks back two ranges), so that the slice never grows very large. Note that we are not calling cleanClass.
var invalidRegexps = ...var onlyPerl = ...var onlyPOSIX = ...var compileTests = ...var r1 rune = -1var simplifyTests = ...type patchList struct {
head, tail uint32
}A patchList is a list of instruction pointers that need to be filled in (patched). Because the pointers haven't been filled in yet, we can reuse their storage to hold the list. It's kind of sleazy, but works well in practice. See https://swtch.com/~rsc/regexp/regexp1.html for inspiration.
These aren't really pointers: they're integers, so we can reinterpret them this way without using package unsafe. A value l.head denotes p.inst[l.head>>1].Out (l.head&1==0) or .Arg (l.head&1==1). head == 0 denotes the empty list, okay because we start every program with a fail instruction, so we'll never want to point at its output link.
func makePatchList(n uint32) patchListfunc (l patchList) patch(p *Prog, val uint32)func (l1 patchList) append(p *Prog, l2 patchList) patchListtype frag struct {
i uint32 // index of first instruction
out patchList // where to record end instruction
}A frag represents a compiled program fragment.
type compiler struct {
p *Prog
}func (c *compiler) init()func (c *compiler) compile(re *Regexp) fragfunc (c *compiler) inst(op InstOp) fragfunc (c *compiler) nop() fragfunc (c *compiler) fail() fragfunc (c *compiler) cap(arg uint32) fragfunc (c *compiler) cat(f1, f2 frag) fragfunc (c *compiler) alt(f1, f2 frag) fragfunc (c *compiler) quest(f1 frag, nongreedy bool) fragfunc (c *compiler) star(f1 frag, nongreedy bool) fragfunc (c *compiler) plus(f1 frag, nongreedy bool) fragfunc (c *compiler) empty(op EmptyOp) fragfunc (c *compiler) rune(r []rune, flags Flags) fragtype Error struct {
Code ErrorCode
Expr string
}An Error describes a failure to parse a regular expression and gives the offending expression.
func (e *Error) Error() stringtype ErrorCode stringAn ErrorCode describes a failure to parse a regular expression.
func (e ErrorCode) String() stringtype Flags uint16Flags control the behavior of the parser and record information about regexp context.
type parser struct {
flags Flags // parse mode flags
stack []*Regexp // stack of parsed expressions
free *Regexp
numCap int // number of capturing groups seen
wholeRegexp string
tmpClass []rune // temporary char class work space
}func (p *parser) newRegexp(op Op) *Regexpfunc (p *parser) reuse(re *Regexp)func (p *parser) push(re *Regexp) *Regexppush pushes the regexp re onto the parse stack and returns the regexp.
func (p *parser) maybeConcat(r rune, flags Flags) boolmaybeConcat implements incremental concatenation of literal runes into string nodes. The parser calls this before each push, so only the top fragment of the stack might need processing. Since this is called before a push, the topmost literal is no longer subject to operators like * (Otherwise ab* would turn into (ab)*.) If r >= 0 and there's a node left over, maybeConcat uses it to push r with the given flags. maybeConcat reports whether r was pushed.
func (p *parser) literal(r rune)literal pushes a literal regexp for the rune r on the stack.
func (p *parser) op(op Op) *Regexpop pushes a regexp with the given op onto the stack and returns that regexp.
func (p *parser) repeat(op Op, min, max int, before, after, lastRepeat string) (string, error)repeat replaces the top stack element with itself repeated according to op, min, max. before is the regexp suffix starting at the repetition operator. after is the regexp suffix following after the repetition operator. repeat returns an updated 'after' and an error, if any.
func (p *parser) concat() *Regexpconcat replaces the top of the stack (above the topmost '|' or '(') with its concatenation.
func (p *parser) alternate() *Regexpalternate replaces the top of the stack (above the topmost '(') with its alternation.
func (p *parser) collapse(subs []*Regexp, op Op) *Regexpcollapse returns the result of applying op to sub. If sub contains op nodes, they all get hoisted up so that there is never a concat of a concat or an alternate of an alternate.
func (p *parser) factor(sub []*Regexp) []*Regexpfactor factors common prefixes from the alternation list sub. It returns a replacement list that reuses the same storage and frees (passes to p.reuse) any removed *Regexps.
For example,
ABC|ABD|AEF|BCX|BCY
simplifies by literal prefix extraction to
A(B(C|D)|EF)|BC(X|Y)
which simplifies by character class introduction to
A(B[CD]|EF)|BC[XY]
func (p *parser) leadingString(re *Regexp) ([]rune, Flags)leadingString returns the leading literal string that re begins with. The string refers to storage in re or its children.
func (p *parser) removeLeadingString(re *Regexp, n int) *RegexpremoveLeadingString removes the first n leading runes from the beginning of re. It returns the replacement for re.
func (p *parser) leadingRegexp(re *Regexp) *RegexpleadingRegexp returns the leading regexp that re begins with. The regexp refers to storage in re or its children.
func (p *parser) removeLeadingRegexp(re *Regexp, reuse bool) *RegexpremoveLeadingRegexp removes the leading regexp in re. It returns the replacement for re. If reuse is true, it passes the removed regexp (if no longer needed) to p.reuse.
func (p *parser) parseRepeat(s string) (min, max int, rest string, ok bool)parseRepeat parses {min} (max=min) or {min,} (max=-1) or {min,max}. If s is not of that form, it returns ok == false. If s has the right form but the values are too big, it returns min == -1, ok == true.
func (p *parser) parsePerlFlags(s string) (rest string, err error)parsePerlFlags parses a Perl flag setting or non-capturing group or both, like (?i) or (?: or (?i:. It removes the prefix from s and updates the parse state. The caller must have ensured that s begins with "(?".
func (p *parser) parseInt(s string) (n int, rest string, ok bool)parseInt parses a decimal integer.
func (p *parser) parseVerticalBar() errorparseVerticalBar handles a | in the input.
func (p *parser) swapVerticalBar() boolIf the top of the stack is an element followed by an opVerticalBar swapVerticalBar swaps the two and returns true. Otherwise it returns false.
func (p *parser) parseRightParen() errorparseRightParen handles a ) in the input.
func (p *parser) parseEscape(s string) (r rune, rest string, err error)parseEscape parses an escape sequence at the beginning of s and returns the rune.
func (p *parser) parseClassChar(s, wholeClass string) (r rune, rest string, err error)parseClassChar parses a character class character at the beginning of s and returns it.
func (p *parser) parsePerlClassEscape(s string, r []rune) (out []rune, rest string)parsePerlClassEscape parses a leading Perl character class escape like \d from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.
func (p *parser) parseNamedClass(s string, r []rune) (out []rune, rest string, err error)parseNamedClass parses a leading POSIX named character class like [:alnum:] from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.
func (p *parser) appendGroup(r []rune, g charGroup) []runefunc (p *parser) parseUnicodeClass(s string, r []rune) (out []rune, rest string, err error)parseUnicodeClass parses a leading Unicode character class like \p{Han} from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.
func (p *parser) parseClass(s string) (rest string, err error)parseClass parses a character class at the beginning of s and pushes it onto the parse stack.
type charGroup struct {
sign int
class []rune
}type ranges struct {
p *[]rune
}ranges implements sort.Interface on a []rune. The choice of receiver type definition is strange but avoids an allocation since we already have a *[]rune.
func (ra ranges) Less(i, j int) boolfunc (ra ranges) Len() intfunc (ra ranges) Swap(i, j int)type Prog struct {
Inst []Inst
Start int // index of start instruction
NumCap int // number of InstCapture insts in re
}A Prog is a compiled regular expression program.
func Compile(re *Regexp) (*Prog, error)Compile compiles the regexp into a program to be executed. The regexp should have been simplified already (returned from re.Simplify).
func (p *Prog) String() stringfunc (p *Prog) skipNop(pc uint32) *InstskipNop follows any no-op or capturing instructions.
func (p *Prog) Prefix() (prefix string, complete bool)Prefix returns a literal string that all matches for the regexp must start with. Complete is true if the prefix is the entire match.
func (p *Prog) StartCond() EmptyOpStartCond returns the leading empty-width conditions that must be true in any match. It returns ^EmptyOp(0) if no matches are possible.
type InstOp uint8An InstOp is an instruction opcode.
func (i InstOp) String() stringtype EmptyOp uint8An EmptyOp specifies a kind or mixture of zero-width assertions.
func EmptyOpContext(r1, r2 rune) EmptyOpEmptyOpContext returns the zero-width assertions satisfied at the position between the runes r1 and r2. Passing r1 == -1 indicates that the position is at the beginning of the text. Passing r2 == -1 indicates that the position is at the end of the text.
type Inst struct {
Op InstOp
Out uint32 // all but InstMatch, InstFail
Arg uint32 // InstAlt, InstAltMatch, InstCapture, InstEmptyWidth
Rune []rune
}An Inst is a single instruction in a regular expression program.
func (i *Inst) op() InstOpop returns i.Op but merges all the Rune special cases into InstRune
func (i *Inst) MatchRune(r rune) boolMatchRune reports whether the instruction matches (and consumes) r. It should only be called when i.Op == InstRune.
func (i *Inst) MatchRunePos(r rune) intMatchRunePos checks whether the instruction matches (and consumes) r. If so, MatchRunePos returns the index of the matching rune pair (or, when len(i.Rune) == 1, rune singleton). If not, MatchRunePos returns -1. MatchRunePos should only be called when i.Op == InstRune.
func (i *Inst) MatchEmptyWidth(before rune, after rune) boolMatchEmptyWidth reports whether the instruction matches an empty string between the runes before and after. It should only be called when i.Op == InstEmptyWidth.
func (i *Inst) String() stringtype Regexp struct {
Op Op // operator
Flags Flags
Sub []*Regexp // subexpressions, if any
Sub0 [1]*Regexp // storage for short Sub
Rune []rune // matched runes, for OpLiteral, OpCharClass
Rune0 [2]rune // storage for short Rune
Min, Max int // min, max for OpRepeat
Cap int // capturing index, for OpCapture
Name string // capturing name, for OpCapture
}A Regexp is a node in a regular expression syntax tree.
func literalRegexp(s string, flags Flags) *Regexpfunc Parse(s string, flags Flags) (*Regexp, error)Parse parses a regular expression string s, controlled by the specified Flags, and returns a regular expression parse tree. The syntax is described in the top-level comment.
func simplify1(op Op, flags Flags, sub, re *Regexp) *Regexpsimplify1 implements Simplify for the unary OpStar, OpPlus, and OpQuest operators. It returns the simple regexp equivalent to
Regexp{Op: op, Flags: flags, Sub: {sub}}
under the assumption that sub is already simple, and without first allocating that structure. If the regexp to be returned turns out to be equivalent to re, simplify1 returns re instead.
simplify1 is factored out of Simplify because the implementation for other operators generates these unary expressions. Letting them call simplify1 makes sure the expressions they generate are simple.
func (x *Regexp) Equal(y *Regexp) boolEqual reports whether x and y have identical structure.
func (re *Regexp) String() stringfunc (re *Regexp) MaxCap() intMaxCap walks the regexp to find the maximum capture index.
func (re *Regexp) CapNames() []stringCapNames walks the regexp to find the names of capturing groups.
func (re *Regexp) capNames(names []string)func (re *Regexp) Simplify() *RegexpSimplify returns a regexp equivalent to re but without counted repetitions and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/. The resulting regexp will execute correctly but its string representation will not produce the same parse tree, because capturing parentheses may have been duplicated or removed. For example, the simplified form for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1. The returned regexp may share structure with or be the original.
type Op uint8An Op is a single regular expression operator.
func (i Op) String() stringtype parseTest struct {
Regexp string
Dump string
}func minFoldRune(r rune) runeminFoldRune returns the minimum rune fold-equivalent to r.
func repeatIsValid(re *Regexp, n int) boolrepeatIsValid reports whether the repetition re is valid. Valid means that the combination of the top-level repetition and any inner repetitions does not exceed n copies of the innermost thing. This function rewalks the regexp tree and is called for every repetition, so we have to worry about inducing quadratic behavior in the parser. We avoid this by only calling repeatIsValid when min or max >= 2. In that case the depth of any >= 2 nesting can only get to 9 without triggering a parse error, so each subtree can only be rewalked 9 times.
func cleanAlt(re *Regexp)cleanAlt cleans re for eventual inclusion in an alternation.
func isValidCaptureName(name string) boolisValidCaptureName reports whether name is a valid capture name: [A-Za-z0-9_]+. PCRE limits names to 32 bytes. Python rejects names starting with digits. We don't enforce either of those.
func isCharClass(re *Regexp) boolcan this be represented as a character class? single-rune literal string, char class, ., and .|\n.
func matchRune(re *Regexp, r rune) booldoes re match r?
func mergeCharClass(dst, src *Regexp)mergeCharClass makes dst = dst|src. The caller must ensure that dst.Op >= src.Op, to reduce the amount of copying.
func unicodeTable(name string) (*unicode.RangeTable, *unicode.RangeTable)unicodeTable returns the unicode.RangeTable identified by name and the table of additional fold-equivalent code points.
func cleanClass(rp *[]rune) []runecleanClass sorts the ranges (pairs of elements of r), merges them, and eliminates duplicates.
func appendLiteral(r []rune, x rune, flags Flags) []runeappendLiteral returns the result of appending the literal x to the class r.
func appendRange(r []rune, lo, hi rune) []runeappendRange returns the result of appending the range lo-hi to the class r.
func appendFoldedRange(r []rune, lo, hi rune) []runeappendFoldedRange returns the result of appending the range lo-hi and its case folding-equivalent runes to the class r.
func appendClass(r []rune, x []rune) []runeappendClass returns the result of appending the class x to the class r. It assume x is clean.
func appendFoldedClass(r []rune, x []rune) []runeappendFolded returns the result of appending the case folding of the class x to the class r.
func appendNegatedClass(r []rune, x []rune) []runeappendNegatedClass returns the result of appending the negation of the class x to the class r. It assumes x is clean.
func appendTable(r []rune, x *unicode.RangeTable) []runeappendTable returns the result of appending x to the class r.
func appendNegatedTable(r []rune, x *unicode.RangeTable) []runeappendNegatedTable returns the result of appending the negation of x to the class r.
func negateClass(r []rune) []runenegateClass overwrites r and returns r's negation. It assumes the class r is already clean.
func checkUTF8(s string) errorfunc nextRune(s string) (c rune, t string, err error)func isalnum(c rune) boolfunc unhex(c rune) runefunc IsWordChar(r rune) boolIsWordChar reports whether r is consider a `word character' during the evaluation of the \b and \B zero-width assertions. These assertions are ASCII-only: the word characters are [A-Za-z0-9_].
func bw(b *strings.Builder, args ...string)func dumpProg(b *strings.Builder, p *Prog)func u32(i uint32) stringfunc dumpInst(b *strings.Builder, i *Inst)func writeRegexp(b *strings.Builder, re *Regexp)writeRegexp writes the Perl syntax for the regular expression re to b.
func escape(b *strings.Builder, r rune, force bool)func TestParseSimple(t *testing.T)func TestParseFoldCase(t *testing.T)func TestParseLiteral(t *testing.T)func TestParseMatchNL(t *testing.T)func TestParseNoMatchNL(t *testing.T)func testParseDump(t *testing.T, tests []parseTest, flags Flags)Test Parse -> Dump.
func dump(re *Regexp) stringdump prints a string representation of the regexp showing the structure explicitly.
func dumpRegexp(b *strings.Builder, re *Regexp)dumpRegexp writes an encoding of the syntax tree for the regexp re to b. It is used during testing to distinguish between parses that might print the same using re's String method.
func mkCharClass(f func(rune) bool) stringfunc isUpperFold(r rune) boolfunc TestFoldConstants(t *testing.T)func TestAppendRangeCollapse(t *testing.T)func TestParseInvalidRegexps(t *testing.T)func TestToStringEquivalentParse(t *testing.T)func TestCompile(t *testing.T)func BenchmarkEmptyOpContext(b *testing.B)func TestSimplify(t *testing.T)- Variables
- Functions
- func Example()
- func ExampleMatch()
- func ExampleMatchString()
- func ExampleQuoteMeta()
- func ExampleRegexp_Find()
- func ExampleRegexp_FindAll()
- func ExampleRegexp_FindAllSubmatch()
- func ExampleRegexp_FindSubmatch()
- func ExampleRegexp_Match()
- func ExampleRegexp_FindString()
- func ExampleRegexp_FindStringIndex()
- func ExampleRegexp_FindStringSubmatch()
- func ExampleRegexp_FindAllString()
- func ExampleRegexp_FindAllStringSubmatch()
- func ExampleRegexp_FindAllStringSubmatchIndex()
- func ExampleRegexp_FindSubmatchIndex()
- func ExampleRegexp_Longest()
- func ExampleRegexp_MatchString()
- func ExampleRegexp_NumSubexp()
- func ExampleRegexp_ReplaceAll()
- func ExampleRegexp_ReplaceAllLiteralString()
- func ExampleRegexp_ReplaceAllString()
- func ExampleRegexp_ReplaceAllStringFunc()
- func ExampleRegexp_SubexpNames()
- func ExampleRegexp_SubexpIndex()
- func ExampleRegexp_Split()
- func ExampleRegexp_Expand()
- func ExampleRegexp_ExpandString()
- func ExampleRegexp_FindIndex()
- func ExampleRegexp_FindAllSubmatchIndex()
- func ExampleRegexp_FindAllIndex()
var validID = regexp.MustCompile(`^[a-z]+\[[0-9]+\]$`)Compile the expression once, usually at init time. Use raw strings to avoid having to quote the backslashes.
func Example()func ExampleMatch()func ExampleMatchString()func ExampleQuoteMeta()func ExampleRegexp_Find()func ExampleRegexp_FindAll()func ExampleRegexp_FindAllSubmatch()func ExampleRegexp_FindSubmatch()func ExampleRegexp_Match()func ExampleRegexp_FindString()func ExampleRegexp_FindStringIndex()func ExampleRegexp_FindStringSubmatch()func ExampleRegexp_FindAllString()func ExampleRegexp_FindAllStringSubmatch()func ExampleRegexp_FindAllStringSubmatchIndex()func ExampleRegexp_FindSubmatchIndex()func ExampleRegexp_Longest()func ExampleRegexp_MatchString()func ExampleRegexp_NumSubexp()func ExampleRegexp_ReplaceAll()func ExampleRegexp_ReplaceAllLiteralString()func ExampleRegexp_ReplaceAllString()func ExampleRegexp_ReplaceAllStringFunc()func ExampleRegexp_SubexpNames()func ExampleRegexp_SubexpIndex()func ExampleRegexp_Split()func ExampleRegexp_Expand()func ExampleRegexp_ExpandString()func ExampleRegexp_FindIndex()func ExampleRegexp_FindAllSubmatchIndex()func ExampleRegexp_FindAllIndex()