Index (new page,exported)

Index

Package regexp
Package syntax
Package regexp_test

Package regexp (new page,exported)

Package regexp implements regular expression search.

The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by RE2 and described at https://golang.org/s/re2syntax, except for \C. For an overview of the syntax, run

go doc regexp/syntax

The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.) For more information about this property, see

[https://swtch.com/~rsc/regexp/regexp1.html](https://swtch.com/~rsc/regexp/regexp1.html)

or any book about automata theory.

All characters are UTF-8-encoded code points.

There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:

Find(All)?(String)?(Submatch)?(Index)?

If 'All' is present, the routine matches successive non-overlapping matches of the entire expression. Empty matches abutting a preceding match are ignored. The return value is a slice containing the successive return values of the corresponding non-'All' routine. These routines take an extra integer argument, n. If n >= 0, the function returns at most n matches/submatches; otherwise, it returns all of them.

If 'String' is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.

If 'Submatch' is present, the return value is a slice identifying the successive submatches of the expression. Submatches are matches of parenthesized subexpressions (also known as capturing groups) within the regular expression, numbered from left to right in order of opening parenthesis. Submatch 0 is the match of the entire expression, submatch 1 the match of the first parenthesized subexpression, and so on.

If 'Index' is present, matches and submatches are identified by byte index pairs within the input string: result[2n:2n+1] identifies the indexes of the nth submatch. The pair for n==0 identifies the match of the entire expression. If 'Index' is not present, the match is identified by the text of the match/submatch. If an index is negative or text is nil, it means that subexpression did not match any string in the input. For 'String' versions an empty string means either no match or an empty match.

There is also a subset of the methods that can be applied to text read from a RuneReader:

MatchReader, FindReaderIndex, FindReaderSubmatchIndex

This set may grow. Note that regular expression matches may need to examine text beyond the text returned by a match, so the methods that match text from a RuneReader may read arbitrarily far into the input before returning.

(There are a few other methods that do not match this pattern.)

Index

Constants
Variables
- var bitStatePool
- var t
- var flag
- var onePassPool
- var flag
- var arrayNoInts
- var buf
- var noRune
- var noNext
- var lx
- var rx
- var anyRuneNotNL
- var anyRune
- var instQueue
- var visitQueue
- var check
- var onePassRunes
- var matchSize
- var matchPool
- var lnext
- var buf
- var endPos
- var dstCap
- var width
- var specialBytes
- var i
- var end
- var width
- var dstCap
- var dstCap
- var dstCap
- var dstCap
- var result
- var result
- var result
- var result
- var result
- var result
- var result
- var result
- var goodRe
- var badRe
- var replaceTests
- var replaceLiteralTests
- var replaceFuncTests
- var metaTests
- var literalPrefixTests
- var emptySubexpIndices
- var subexpCases
- var splitTests
- var sink
- var compileBenchData
- var minInputLenTests
- var txt
- var str
- var input
- var inStrings
- var re
- var refull
- var nfail
- var ncase
- var text
- var run
- var match
- var notab
- var x
- var end
- var v
- var err
- var text
- var benchData
- var benchSizes
- var findTests
- var runeMergeTests
- var onePassTests
- var p
- var re
- var err
- var onePassTests1
Types
Functions

Constants (exported)

const visitedBits

const visitedBits = 32

const maxBacktrackProg

const maxBacktrackProg = 500 // len(prog.Inst) <= max

const maxBacktrackVector

const maxBacktrackVector = 256 * 1024 // bit vector size <= max (bits)

const mergeFailed

const mergeFailed = uint32(0xffffffff)

mergeRuneSets merges two non-intersecting runesets, and returns the merged result, and a NextIp array. The idea is that if a rune matches the OnePassRunes at index i, NextIp[i/2] is the target. If the input sets intersect, an empty runeset and a NextIp array with the single element mergeFailed is returned. The code assumes that both inputs contain ordered and non-intersecting rune pairs.

const endOfText

const endOfText rune = -1

const startSize

const startSize = 10 // The size at which to start a slice in the 'All' routines.

Variables (exported)

var bitStatePool

var bitStatePool sync.Pool

var t

var t *thread

var flag

var flag lazyFlag

var onePassPool

var onePassPool sync.Pool

var flag

var flag lazyFlag

var arrayNoInts

var arrayNoInts [0]int

arrayNoInts is returned by doExecute match if nil dstCap is passed to it with ncap=0.

var buf

var buf strings.Builder

Have prefix; gather characters.

var noRune

var noRune = ...

var noNext

var noNext = ...

var lx

var lx, rx int

var rx

var lx, rx int

var anyRuneNotNL

var anyRuneNotNL = ...

var anyRune

var anyRune = ...

var instQueue

var instQueue = newQueue(len(p.Inst))

var visitQueue

var visitQueue = newQueue(len(p.Inst))

var check

var check func(uint32, []bool) bool

var onePassRunes

var onePassRunes = make([][]rune, len(p.Inst))

var matchSize

var matchSize = ...

Pools of machine for use during (Regexp).doExecute, split up by the size of the execution queues. matchPool[i] machines have queue size matchSize[i]. On a 64-bit system each queue entry is 16 bytes, so matchPool[0] has 162128 = 4kB queues, etc. The final matchPool is a catch-all for very large queues.

var matchPool

var matchPool [len(matchSize)]sync.Pool

var lnext

var lnext int

var buf

var buf []byte

var endPos

var endPos int

var dstCap

var dstCap [2]int

var width

var width int

Advance past this match; always advance at least one character.

var specialBytes

var specialBytes [16]byte

Bitmap used by func special to check whether a character needs to be escaped.

var i

var i int

A byte loop is correct because all metacharacters are ASCII.

var end

var end int

var width

var width int

var dstCap

var dstCap [2]int

var dstCap

var dstCap [2]int

var dstCap

var dstCap [4]int

var dstCap

var dstCap [4]int

var result

var result [][]byte

var result

var result [][]int

var result

var result []string

var result

var result [][]int

var result

var result [][][]byte

var result

var result [][]int

var result

var result [][]string

var result

var result [][]int

var goodRe

var goodRe = ...

var badRe

var badRe = ...

var replaceTests

var replaceTests = ...

var replaceLiteralTests

var replaceLiteralTests = ...

var replaceFuncTests

var replaceFuncTests = ...

var metaTests

var metaTests = ...

var literalPrefixTests

var literalPrefixTests = ...

var emptySubexpIndices

var emptySubexpIndices = ...

var subexpCases

var subexpCases = ...

var splitTests

var splitTests = ...

var sink

var sink string

var compileBenchData

var compileBenchData = ...

var minInputLenTests

var minInputLenTests = ...

var txt

var txt io.Reader

var str

var str []string

var input

var input []string

var inStrings

var inStrings bool

var re

var re *Regexp

var refull

var refull *Regexp

var nfail

var nfail int

var ncase

var ncase int

var text

var text string

var run

var run = ...

var match

var match = ...

var notab

var notab = MustCompilePOSIX(`[^\t]+`)

var x

var x []int

var end

var end byte = ')'

var v

var v = -1

var err

var err error

var text

var text []byte

var benchData

var benchData = ...

var benchSizes

var benchSizes = ...

var findTests

var findTests = ...

var runeMergeTests

var runeMergeTests = ...

var onePassTests

var onePassTests = ...

var p

var p *syntax.Prog

var re

var re *syntax.Regexp

var err

var err error

var onePassTests1

var onePassTests1 = ...

TODO(cespare): Unify with onePassTests and rationalize one-pass test cases.

Types (exported)

type job struct

type job struct {
	pc  uint32
	arg bool
	pos int
}

A job is an entry on the backtracker's job stack. It holds the instruction pc and the position in the input.

type bitState struct

type bitState struct {
	end      int
	cap      []int
	matchcap []int
	jobs     []job
	visited  []uint32

	inputs inputs
}

bitState holds state for the backtracker.

func newBitState() *bitState

func newBitState() *bitState

func (b bitState) reset(prog syntax.Prog, end int, ncap int)

func (b *bitState) reset(prog *syntax.Prog, end int, ncap int)

reset resets the state of the backtracker. end is the end position in the input. ncap is the number of captures.

func (b *bitState) shouldVisit(pc uint32, pos int) bool

func (b *bitState) shouldVisit(pc uint32, pos int) bool

shouldVisit reports whether the combination of (pc, pos) has not been visited yet.

func (b bitState) push(re Regexp, pc uint32, pos int, arg bool)

func (b *bitState) push(re *Regexp, pc uint32, pos int, arg bool)

push pushes (pc, pos, arg) onto the job stack if it should be visited.

type queue struct

type queue struct {
	sparse []uint32
	dense  []entry
}

A queue is a 'sparse array' holding pending threads of execution. See https://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html

type entry struct

type entry struct {
	pc uint32
	t  *thread
}

An entry is an entry on a queue. It holds both the instruction pc and the actual thread. Some queue entries are just place holders so that the machine knows it has considered that pc. Such entries have t == nil.

type thread struct

type thread struct {
	inst *syntax.Inst
	cap  []int
}

A thread is the state of a single path through the machine: an instruction and a corresponding capture array. See https://swtch.com/~rsc/regexp/regexp2.html

type machine struct

type machine struct {
	re       *Regexp      // corresponding Regexp
	p        *syntax.Prog // compiled program
	q0, q1   queue        // two queues for runq, nextq
	pool     []*thread    // pool of available threads
	matched  bool         // whether a match was found
	matchcap []int        // capture information for the match

	inputs inputs
}

A machine holds all the state during an NFA simulation for p.

func (m *machine) init(ncap int)

func (m *machine) init(ncap int)

func (m machine) alloc(i syntax.Inst) *thread

func (m *machine) alloc(i *syntax.Inst) *thread

alloc allocates a new thread with the given instruction. It uses the free pool if possible.

func (m *machine) match(i input, pos int) bool

func (m *machine) match(i input, pos int) bool

match runs the machine over the input starting at pos. It reports whether a match was found. If so, m.matchcap holds the submatch information.

func (m machine) clear(q queue)

func (m *machine) clear(q *queue)

clear frees all threads on the thread queue.

func (m machine) step(runq, nextq queue, pos, nextPos int, c rune, nextCond *lazyFlag)

func (m *machine) step(runq, nextq *queue, pos, nextPos int, c rune, nextCond *lazyFlag)

step executes one step of the machine, running each of the threads on runq and appending new threads to nextq. The step processes the rune c (which may be endOfText), which starts at position pos and ends at nextPos. nextCond gives the setting for the empty-width flags after c.

func (m machine) add(q queue, pc uint32, pos int, cap []int, cond lazyFlag, t thread) *thread

func (m *machine) add(q *queue, pc uint32, pos int, cap []int, cond *lazyFlag, t *thread) *thread

add adds an entry to q for pc, unless the q already has such an entry. It also recursively adds an entry for all instructions reachable from pc by following empty-width conditions satisfied by cond. pos gives the current position in the input.

type inputs struct

type inputs struct {
	// cached inputs, to avoid allocation
	bytes  inputBytes
	string inputString
	reader inputReader
}

func (i *inputs) newBytes(b []byte) input

func (i *inputs) newBytes(b []byte) input

func (i *inputs) newString(s string) input

func (i *inputs) newString(s string) input

func (i *inputs) newReader(r io.RuneReader) input

func (i *inputs) newReader(r io.RuneReader) input

func (i *inputs) clear()

func (i *inputs) clear()

func (i *inputs) init(r io.RuneReader, b []byte, s string) (input, int)

func (i *inputs) init(r io.RuneReader, b []byte, s string) (input, int)

type lazyFlag uint64

type lazyFlag uint64

A lazyFlag is a lazily-evaluated syntax.EmptyOp, for checking zero-width flags like ^ $ \A \z \B \b. It records the pair of relevant runes and does not determine the implied flags until absolutely necessary (most of the time, that means never).

func newLazyFlag(r1, r2 rune) lazyFlag

func newLazyFlag(r1, r2 rune) lazyFlag

func (f lazyFlag) match(op syntax.EmptyOp) bool

func (f lazyFlag) match(op syntax.EmptyOp) bool

type onePassMachine struct

type onePassMachine struct {
	inputs   inputs
	matchcap []int
}

func newOnePassMachine() *onePassMachine

func newOnePassMachine() *onePassMachine

type onePassProg struct

type onePassProg struct {
	Inst   []onePassInst
	Start  int // index of start instruction
	NumCap int // number of InstCapture insts in re
}

A onePassProg is a compiled one-pass regular expression program. It is the same as syntax.Prog except for the use of onePassInst.

func onePassCopy(prog syntax.Prog) onePassProg

func onePassCopy(prog *syntax.Prog) *onePassProg

onePassCopy creates a copy of the original Prog, as we'll be modifying it

func makeOnePass(p onePassProg) onePassProg

func makeOnePass(p *onePassProg) *onePassProg

makeOnePass creates a onepass Prog, if possible. It is possible if at any alt, the match engine can always tell which branch to take. The routine may modify p if it is turned into a onepass Prog. If it isn't possible for this to be a onepass Prog, the Prog nil is returned. makeOnePass is recursive to the size of the Prog.

func compileOnePass(prog syntax.Prog) (p onePassProg)

func compileOnePass(prog *syntax.Prog) (p *onePassProg)

compileOnePass returns a new *syntax.Prog suitable for onePass execution if the original Prog can be recharacterized as a one-pass regexp program, or syntax.nil if the Prog cannot be converted. For a one pass prog, the fundamental condition that must be true is: at any InstAlt, there must be no ambiguity about what branch to take.

type onePassInst struct

type onePassInst struct {
	syntax.Inst
	Next []uint32
}

A onePassInst is a single instruction in a one-pass regular expression program. It is the same as syntax.Inst except for the new 'Next' field.

type queueOnePass struct

type queueOnePass struct {
	sparse          []uint32
	dense           []uint32
	size, nextIndex uint32
}

Sparse Array implementation is used as a queueOnePass.

func newQueue(size int) (q *queueOnePass)

func newQueue(size int) (q *queueOnePass)

func (q *queueOnePass) empty() bool

func (q *queueOnePass) empty() bool

func (q *queueOnePass) next() (n uint32)

func (q *queueOnePass) next() (n uint32)

func (q *queueOnePass) clear()

func (q *queueOnePass) clear()

func (q *queueOnePass) contains(u uint32) bool

func (q *queueOnePass) contains(u uint32) bool

func (q *queueOnePass) insert(u uint32)

func (q *queueOnePass) insert(u uint32)

func (q *queueOnePass) insertNew(u uint32)

func (q *queueOnePass) insertNew(u uint32)

type runeSlice []rune

type runeSlice []rune

runeSlice exists to permit sorting the case-folded rune sets.

func (p runeSlice) Len() int

func (p runeSlice) Len() int

func (p runeSlice) Less(i, j int) bool

func (p runeSlice) Less(i, j int) bool

func (p runeSlice) Swap(i, j int)

func (p runeSlice) Swap(i, j int)

type Regexp struct (exported)

type Regexp struct {
	expr           string       // as passed to Compile
	prog           *syntax.Prog // compiled program
	onepass        *onePassProg // onepass program or nil
	numSubexp      int
	maxBitStateLen int
	subexpNames    []string
	prefix         string         // required prefix in unanchored matches
	prefixBytes    []byte         // prefix, as a []byte
	prefixRune     rune           // first rune in prefix
	prefixEnd      uint32         // pc for last rune in prefix
	mpool          int            // pool for machines
	matchcap       int            // size of recorded match lengths
	prefixComplete bool           // prefix is the entire regexp
	cond           syntax.EmptyOp // empty-width conditions required at start of match
	minInputLen    int            // minimum length of the input in bytes

	// This field can be modified by the Longest method,
	// but it is otherwise read-only.
	longest bool // whether regexp prefers leftmost-longest match
}

Regexp is the representation of a compiled regular expression. A Regexp is safe for concurrent use by multiple goroutines, except for configuration methods, such as Longest.

func Compile(expr string) (*Regexp, error) (exported)

func Compile(expr string) (*Regexp, error)

Compile parses a regular expression and returns, if successful, a Regexp object that can be used to match against text.

When matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses the one that a backtracking search would have found first. This so-called leftmost-first matching is the same semantics that Perl, Python, and other implementations use, although this package implements it without the expense of backtracking. For POSIX leftmost-longest matching, see CompilePOSIX.

func CompilePOSIX(expr string) (*Regexp, error) (exported)

func CompilePOSIX(expr string) (*Regexp, error)

CompilePOSIX is like Compile but restricts the regular expression to POSIX ERE (egrep) syntax and changes the match semantics to leftmost-longest.

That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This so-called leftmost-longest matching is the same semantics that early regular expression implementations used and that POSIX specifies.

However, there can be multiple leftmost-longest matches, with different submatch choices, and here this package diverges from POSIX. Among the possible leftmost-longest matches, this package chooses the one that a backtracking search would have found first, while POSIX specifies that the match be chosen to maximize the length of the first subexpression, then the second, and so on from left to right. The POSIX rule is computationally prohibitive and not even well-defined. See https://swtch.com/~rsc/regexp/regexp2.html#posix for details.

func compile(expr string, mode syntax.Flags, longest bool) (*Regexp, error)

func compile(expr string, mode syntax.Flags, longest bool) (*Regexp, error)

func MustCompile(str string) *Regexp (exported)

func MustCompile(str string) *Regexp

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

func MustCompilePOSIX(str string) *Regexp (exported)

func MustCompilePOSIX(str string) *Regexp

MustCompilePOSIX is like CompilePOSIX but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

func compileTest(t testing.T, expr string, error string) Regexp

func compileTest(t *testing.T, expr string, error string) *Regexp

func tryCompile(s string) (re *Regexp, err error)

func tryCompile(s string) (re *Regexp, err error)

func (re Regexp) tryBacktrack(b bitState, i input, pc uint32, pos int) bool

func (re *Regexp) tryBacktrack(b *bitState, i input, pc uint32, pos int) bool

tryBacktrack runs a backtracking search starting at pos.

func (re *Regexp) backtrack(ib []byte, is string, pos int, ncap int, dstCap []int) []int

func (re *Regexp) backtrack(ib []byte, is string, pos int, ncap int, dstCap []int) []int

backtrack runs a backtracking search of prog on the input starting at pos.

func (re *Regexp) doOnePass(ir io.RuneReader, ib []byte, is string, pos, ncap int, dstCap []int) []int

func (re *Regexp) doOnePass(ir io.RuneReader, ib []byte, is string, pos, ncap int, dstCap []int) []int

doOnePass implements r.doExecute using the one-pass execution engine.

func (re *Regexp) doMatch(r io.RuneReader, b []byte, s string) bool

func (re *Regexp) doMatch(r io.RuneReader, b []byte, s string) bool

doMatch reports whether either r, b or s match the regexp.

func (re *Regexp) doExecute(r io.RuneReader, b []byte, s string, pos int, ncap int, dstCap []int) []int

func (re *Regexp) doExecute(r io.RuneReader, b []byte, s string, pos int, ncap int, dstCap []int) []int

doExecute finds the leftmost match in the input, appends the position of its subexpressions to dstCap and returns dstCap.

nil is returned if no matches are found and non-nil if matches are found.

func (re *Regexp) String() string (exported)

func (re *Regexp) String() string

String returns the source text used to compile the regular expression.

func (re Regexp) Copy() Regexp (exported,deprecated)

func (re *Regexp) Copy() *Regexp

Copy returns a new Regexp object copied from re. Calling Longest on one copy does not affect another.

Deprecated: In earlier releases, when using a Regexp in multiple goroutines, giving each goroutine its own copy helped to avoid lock contention. As of Go 1.12, using Copy is no longer necessary to avoid lock contention. Copy may still be appropriate if the reason for its use is to make two copies with different Longest settings.

func (re *Regexp) Longest() (exported)

func (re *Regexp) Longest()

Longest makes future searches prefer the leftmost-longest match. That is, when matching against text, the regexp returns a match that begins as early as possible in the input (leftmost), and among those it chooses a match that is as long as possible. This method modifies the Regexp and may not be called concurrently with any other methods.

func (re Regexp) get() machine

func (re *Regexp) get() *machine

get returns a machine to use for matching re. It uses the re's machine cache if possible, to avoid unnecessary allocation.

func (re Regexp) put(m machine)

func (re *Regexp) put(m *machine)

put returns a machine to the correct machine pool.

func (re *Regexp) NumSubexp() int (exported)

func (re *Regexp) NumSubexp() int

NumSubexp returns the number of parenthesized subexpressions in this Regexp.

func (re *Regexp) SubexpNames() []string (exported)

func (re *Regexp) SubexpNames() []string

SubexpNames returns the names of the parenthesized subexpressions in this Regexp. The name for the first sub-expression is names[1], so that if m is a match slice, the name for m[i] is SubexpNames()[i]. Since the Regexp as a whole cannot be named, names[0] is always the empty string. The slice should not be modified.

func (re *Regexp) SubexpIndex(name string) int (exported)

func (re *Regexp) SubexpIndex(name string) int

SubexpIndex returns the index of the first subexpression with the given name, or -1 if there is no subexpression with that name.

Note that multiple subexpressions can be written using the same name, as in (?Pa+)(?Pb+), which declares two subexpressions named "bob". In this case, SubexpIndex returns the index of the leftmost such subexpression in the regular expression.

func (re *Regexp) LiteralPrefix() (prefix string, complete bool) (exported)

func (re *Regexp) LiteralPrefix() (prefix string, complete bool)

LiteralPrefix returns a literal string that must begin any match of the regular expression re. It returns the boolean true if the literal string comprises the entire regular expression.

func (re *Regexp) MatchReader(r io.RuneReader) bool (exported)

func (re *Regexp) MatchReader(r io.RuneReader) bool

MatchReader reports whether the text returned by the RuneReader contains any match of the regular expression re.

func (re *Regexp) MatchString(s string) bool (exported)

func (re *Regexp) MatchString(s string) bool

MatchString reports whether the string s contains any match of the regular expression re.

func (re *Regexp) Match(b []byte) bool (exported)

func (re *Regexp) Match(b []byte) bool

Match reports whether the byte slice b contains any match of the regular expression re.

func (re *Regexp) ReplaceAllString(src, repl string) string (exported)

func (re *Regexp) ReplaceAllString(src, repl string) string

ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.

func (re *Regexp) ReplaceAllLiteralString(src, repl string) string (exported)

func (re *Regexp) ReplaceAllLiteralString(src, repl string) string

ReplaceAllLiteralString returns a copy of src, replacing matches of the Regexp with the replacement string repl. The replacement repl is substituted directly, without using Expand.

func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string (exported)

func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string

ReplaceAllStringFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched substring. The replacement returned by repl is substituted directly, without using Expand.

func (re *Regexp) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []byte

func (re *Regexp) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []byte

func (re *Regexp) ReplaceAll(src, repl []byte) []byte (exported)

func (re *Regexp) ReplaceAll(src, repl []byte) []byte

ReplaceAll returns a copy of src, replacing matches of the Regexp with the replacement text repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.

func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte (exported)

func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte

ReplaceAllLiteral returns a copy of src, replacing matches of the Regexp with the replacement bytes repl. The replacement repl is substituted directly, without using Expand.

func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte (exported)

func (re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte

ReplaceAllFunc returns a copy of src in which all matches of the Regexp have been replaced by the return value of function repl applied to the matched byte slice. The replacement returned by repl is substituted directly, without using Expand.

func (re *Regexp) pad(a []int) []int

func (re *Regexp) pad(a []int) []int

The number of capture values in the program may correspond to fewer capturing expressions than are in the regexp. For example, "(a){0}" turns into an empty program, so the maximum capture in the program is 0 but we need to return an expression for \1. Pad appends -1s to the slice a as needed.

func (re *Regexp) allMatches(s string, b []byte, n int, deliver func([]int))

func (re *Regexp) allMatches(s string, b []byte, n int, deliver func([]int))

allMatches calls deliver at most n times with the location of successive matches in the input text. The input text is b if non-nil, otherwise s.

func (re *Regexp) Find(b []byte) []byte (exported)

func (re *Regexp) Find(b []byte) []byte

Find returns a slice holding the text of the leftmost match in b of the regular expression. A return value of nil indicates no match.

func (re *Regexp) FindIndex(b []byte) (loc []int) (exported)

func (re *Regexp) FindIndex(b []byte) (loc []int)

FindIndex returns a two-element slice of integers defining the location of the leftmost match in b of the regular expression. The match itself is at b[loc[0]:loc[1]]. A return value of nil indicates no match.

func (re *Regexp) FindString(s string) string (exported)

func (re *Regexp) FindString(s string) string

FindString returns a string holding the text of the leftmost match in s of the regular expression. If there is no match, the return value is an empty string, but it will also be empty if the regular expression successfully matches an empty string. Use FindStringIndex or FindStringSubmatch if it is necessary to distinguish these cases.

func (re *Regexp) FindStringIndex(s string) (loc []int) (exported)

func (re *Regexp) FindStringIndex(s string) (loc []int)

FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s of the regular expression. The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.

func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int) (exported)

func (re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)

FindReaderIndex returns a two-element slice of integers defining the location of the leftmost match of the regular expression in text read from the RuneReader. The match text was found in the input stream at byte offset loc[0] through loc[1]-1. A return value of nil indicates no match.

func (re *Regexp) FindSubmatch(b []byte) [][]byte (exported)

func (re *Regexp) FindSubmatch(b []byte) [][]byte

FindSubmatch returns a slice of slices holding the text of the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the 'Submatch' descriptions in the package comment. A return value of nil indicates no match.

func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte (exported)

func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte

Expand appends template to dst and returns the result; during the append, Expand replaces variables in the template with corresponding matches drawn from src. The match slice should have been returned by FindSubmatchIndex.

In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores. A purely numeric name like $1 refers to the submatch with the corresponding index; other names refer to capturing parentheses named with the (?P...) syntax. A reference to an out of range or unmatched index or a name that is not present in the regular expression is replaced with an empty slice.

In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.

To insert a literal $ in the output, use $$ in the template.

func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte (exported)

func (re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte

ExpandString is like Expand but the template and source are strings. It appends to and returns a byte slice in order to give the calling code control over allocation.

func (re *Regexp) expand(dst []byte, template string, bsrc []byte, src string, match []int) []byte

func (re *Regexp) expand(dst []byte, template string, bsrc []byte, src string, match []int) []byte

func (re *Regexp) FindSubmatchIndex(b []byte) []int (exported)

func (re *Regexp) FindSubmatchIndex(b []byte) []int

FindSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression in b and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindStringSubmatch(s string) []string (exported)

func (re *Regexp) FindStringSubmatch(s string) []string

FindStringSubmatch returns a slice of strings holding the text of the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindStringSubmatchIndex(s string) []int (exported)

func (re *Regexp) FindStringSubmatchIndex(s string) []int

FindStringSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression in s and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int (exported)

func (re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int

FindReaderSubmatchIndex returns a slice holding the index pairs identifying the leftmost match of the regular expression of text read by the RuneReader, and the matches, if any, of its subexpressions, as defined by the 'Submatch' and 'Index' descriptions in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAll(b []byte, n int) [][]byte (exported)

func (re *Regexp) FindAll(b []byte, n int) [][]byte

FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllIndex(b []byte, n int) [][]int (exported)

func (re *Regexp) FindAllIndex(b []byte, n int) [][]int

FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllString(s string, n int) []string (exported)

func (re *Regexp) FindAllString(s string, n int) []string

FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int (exported)

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int

FindAllStringIndex is the 'All' version of FindStringIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte (exported)

func (re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte

FindAllSubmatch is the 'All' version of FindSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int (exported)

func (re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int

FindAllSubmatchIndex is the 'All' version of FindSubmatchIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string (exported)

func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string

FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int (exported)

func (re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int

FindAllStringSubmatchIndex is the 'All' version of FindStringSubmatchIndex; it returns a slice of all successive matches of the expression, as defined by the 'All' description in the package comment. A return value of nil indicates no match.

func (re *Regexp) Split(s string, n int) []string (exported)

func (re *Regexp) Split(s string, n int) []string

Split slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.

The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.

Example:

s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]

The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings

type input interface

type input interface {
	step(pos int) (r rune, width int) // advance one rune
	canCheckPrefix() bool             // can we look ahead without losing info?
	hasPrefix(re *Regexp) bool
	index(re *Regexp, pos int) int
	context(pos int) lazyFlag
}

input abstracts different representations of the input text. It provides one-character lookahead.

type inputString struct

type inputString struct {
	str string
}

inputString scans a string.

func (i *inputString) step(pos int) (rune, int)

func (i *inputString) step(pos int) (rune, int)

func (i *inputString) canCheckPrefix() bool

func (i *inputString) canCheckPrefix() bool

func (i inputString) hasPrefix(re Regexp) bool

func (i *inputString) hasPrefix(re *Regexp) bool

func (i inputString) index(re Regexp, pos int) int

func (i *inputString) index(re *Regexp, pos int) int

func (i *inputString) context(pos int) lazyFlag

func (i *inputString) context(pos int) lazyFlag

type inputBytes struct

type inputBytes struct {
	str []byte
}

inputBytes scans a byte slice.

func (i *inputBytes) step(pos int) (rune, int)

func (i *inputBytes) step(pos int) (rune, int)

func (i *inputBytes) canCheckPrefix() bool

func (i *inputBytes) canCheckPrefix() bool

func (i inputBytes) hasPrefix(re Regexp) bool

func (i *inputBytes) hasPrefix(re *Regexp) bool

func (i inputBytes) index(re Regexp, pos int) int

func (i *inputBytes) index(re *Regexp, pos int) int

func (i *inputBytes) context(pos int) lazyFlag

func (i *inputBytes) context(pos int) lazyFlag

type inputReader struct

type inputReader struct {
	r     io.RuneReader
	atEOT bool
	pos   int
}

inputReader scans a RuneReader.

func (i *inputReader) step(pos int) (rune, int)

func (i *inputReader) step(pos int) (rune, int)

func (i *inputReader) canCheckPrefix() bool

func (i *inputReader) canCheckPrefix() bool

func (i inputReader) hasPrefix(re Regexp) bool

func (i *inputReader) hasPrefix(re *Regexp) bool

func (i inputReader) index(re Regexp, pos int) int

func (i *inputReader) index(re *Regexp, pos int) int

func (i *inputReader) context(pos int) lazyFlag

func (i *inputReader) context(pos int) lazyFlag

type stringError struct

type stringError struct {
	re  string
	err string
}

type ReplaceTest struct

type ReplaceTest struct {
	pattern, replacement, input, output string
}

type ReplaceFuncTest struct

type ReplaceFuncTest struct {
	pattern       string
	replacement   func(string) string
	input, output string
}

type MetaTest struct

type MetaTest struct {
	pattern, output, literal string
	isLiteral                bool
}

type subexpIndex struct

type subexpIndex struct {
	name  string
	index int
}

type subexpCase struct

type subexpCase struct {
	input   string
	num     int
	names   []string
	indices []subexpIndex
}

type FindTest struct

type FindTest struct {
	pat     string
	text    string
	matches [][]int
}

For each pattern/text pair, what is the expected output of each function? We can derive the textual results from the indexed results, the non-submatch results from the submatched results, the single results from the 'all' results, and the byte results from the string results. Therefore the table includes only the FindAllStringSubmatchIndex result.

func (t FindTest) String() string

func (t FindTest) String() string

Functions (exported)

func freeBitState(b *bitState)

func freeBitState(b *bitState)

func maxBitStateLen(prog *syntax.Prog) int

func maxBitStateLen(prog *syntax.Prog) int

maxBitStateLen returns the maximum length of a string to search with the backtracker using prog.

func shouldBacktrack(prog *syntax.Prog) bool

func shouldBacktrack(prog *syntax.Prog) bool

shouldBacktrack reports whether the program is too long for the backtracker to run.

func freeOnePassMachine(m *onePassMachine)

func freeOnePassMachine(m *onePassMachine)

func onePassPrefix(p *syntax.Prog) (prefix string, complete bool, pc uint32)

func onePassPrefix(p *syntax.Prog) (prefix string, complete bool, pc uint32)

OnePassPrefix returns a literal string that all matches for the regexp must start with. Complete is true if the prefix is the entire match. Pc is the index of the last rune instruction in the string. The OnePassPrefix skips over the mandatory EmptyBeginText

func onePassNext(i *onePassInst, r rune) uint32

func onePassNext(i *onePassInst, r rune) uint32

OnePassNext selects the next actionable state of the prog, based on the input character. It should only be called when i.Op == InstAlt or InstAltMatch, and from the one-pass machine. One of the alternates may ultimately lead without input to end of line. If the instruction is InstAltMatch the path to the InstMatch is in i.Out, the normal node in i.Next.

func iop(i *syntax.Inst) syntax.InstOp

func iop(i *syntax.Inst) syntax.InstOp

func mergeRuneSets(leftRunes, rightRunes *[]rune, leftPC, rightPC uint32) ([]rune, []uint32)

func mergeRuneSets(leftRunes, rightRunes *[]rune, leftPC, rightPC uint32) ([]rune, []uint32)

func cleanupOnePass(prog onePassProg, original syntax.Prog)

func cleanupOnePass(prog *onePassProg, original *syntax.Prog)

cleanupOnePass drops working memory, and restores certain shortcut instructions.

func minInputLen(re *syntax.Regexp) int

func minInputLen(re *syntax.Regexp) int

minInputLen walks the regexp to find the minimum length of any matchable input

func quote(s string) string

func quote(s string) string

func MatchReader(pattern string, r io.RuneReader) (matched bool, err error) (exported)

func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)

MatchReader reports whether the text returned by the RuneReader contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.

func MatchString(pattern string, s string) (matched bool, err error) (exported)

func MatchString(pattern string, s string) (matched bool, err error)

MatchString reports whether the string s contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.

func Match(pattern string, b []byte) (matched bool, err error) (exported)

func Match(pattern string, b []byte) (matched bool, err error)

Match reports whether the byte slice b contains any match of the regular expression pattern. More complicated queries need to use Compile and the full Regexp interface.

func special(b byte) bool

func special(b byte) bool

special reports whether byte b needs to be escaped by QuoteMeta.

func init()

func init()

func QuoteMeta(s string) string (exported)

func QuoteMeta(s string) string

QuoteMeta returns a string that escapes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.

func extract(str string) (name string, num int, rest string, ok bool)

func extract(str string) (name string, num int, rest string, ok bool)

extract returns the name from a leading "$name" or "${name}" in str. If it is a number, extract returns num set to that number; otherwise num = -1.

func TestGoodCompile(t *testing.T)

func TestGoodCompile(t *testing.T)

func TestBadCompile(t *testing.T)

func TestBadCompile(t *testing.T)

func matchTest(t testing.T, test FindTest)

func matchTest(t *testing.T, test *FindTest)

func TestMatch(t *testing.T)

func TestMatch(t *testing.T)

func matchFunctionTest(t testing.T, test FindTest)

func matchFunctionTest(t *testing.T, test *FindTest)

func TestMatchFunction(t *testing.T)

func TestMatchFunction(t *testing.T)

func copyMatchTest(t testing.T, test FindTest)

func copyMatchTest(t *testing.T, test *FindTest)

func TestCopyMatch(t *testing.T)

func TestCopyMatch(t *testing.T)

func TestReplaceAll(t *testing.T)

func TestReplaceAll(t *testing.T)

func TestReplaceAllLiteral(t *testing.T)

func TestReplaceAllLiteral(t *testing.T)

func TestReplaceAllFunc(t *testing.T)

func TestReplaceAllFunc(t *testing.T)

func TestQuoteMeta(t *testing.T)

func TestQuoteMeta(t *testing.T)

func TestLiteralPrefix(t *testing.T)

func TestLiteralPrefix(t *testing.T)

func TestSubexp(t *testing.T)

func TestSubexp(t *testing.T)

func TestSplit(t *testing.T)

func TestSplit(t *testing.T)

func TestParseAndCompile(t *testing.T)

func TestParseAndCompile(t *testing.T)

The following sequence of Match calls used to panic. See issue #12980.

func TestOnePassCutoff(t *testing.T)

func TestOnePassCutoff(t *testing.T)

Check that one-pass cutoff does trigger.

func TestSwitchBacktrack(t *testing.T)

func TestSwitchBacktrack(t *testing.T)

Check that the same machine can be used with the standard matcher and then the backtracker when there are no captures.

func BenchmarkFind(b *testing.B)

func BenchmarkFind(b *testing.B)

func BenchmarkFindAllNoMatches(b *testing.B)

func BenchmarkFindAllNoMatches(b *testing.B)

func BenchmarkFindString(b *testing.B)

func BenchmarkFindString(b *testing.B)

func BenchmarkFindSubmatch(b *testing.B)

func BenchmarkFindSubmatch(b *testing.B)

func BenchmarkFindStringSubmatch(b *testing.B)

func BenchmarkFindStringSubmatch(b *testing.B)

func BenchmarkLiteral(b *testing.B)

func BenchmarkLiteral(b *testing.B)

func BenchmarkNotLiteral(b *testing.B)

func BenchmarkNotLiteral(b *testing.B)

func BenchmarkMatchClass(b *testing.B)

func BenchmarkMatchClass(b *testing.B)

func BenchmarkMatchClass_InRange(b *testing.B)

func BenchmarkMatchClass_InRange(b *testing.B)

func BenchmarkReplaceAll(b *testing.B)

func BenchmarkReplaceAll(b *testing.B)

func BenchmarkAnchoredLiteralShortNonMatch(b *testing.B)

func BenchmarkAnchoredLiteralShortNonMatch(b *testing.B)

func BenchmarkAnchoredLiteralLongNonMatch(b *testing.B)

func BenchmarkAnchoredLiteralLongNonMatch(b *testing.B)

func BenchmarkAnchoredShortMatch(b *testing.B)

func BenchmarkAnchoredShortMatch(b *testing.B)

func BenchmarkAnchoredLongMatch(b *testing.B)

func BenchmarkAnchoredLongMatch(b *testing.B)

func BenchmarkOnePassShortA(b *testing.B)

func BenchmarkOnePassShortA(b *testing.B)

func BenchmarkNotOnePassShortA(b *testing.B)

func BenchmarkNotOnePassShortA(b *testing.B)

func BenchmarkOnePassShortB(b *testing.B)

func BenchmarkOnePassShortB(b *testing.B)

func BenchmarkNotOnePassShortB(b *testing.B)

func BenchmarkNotOnePassShortB(b *testing.B)

func BenchmarkOnePassLongPrefix(b *testing.B)

func BenchmarkOnePassLongPrefix(b *testing.B)

func BenchmarkOnePassLongNotPrefix(b *testing.B)

func BenchmarkOnePassLongNotPrefix(b *testing.B)

func BenchmarkMatchParallelShared(b *testing.B)

func BenchmarkMatchParallelShared(b *testing.B)

func BenchmarkMatchParallelCopied(b *testing.B)

func BenchmarkMatchParallelCopied(b *testing.B)

func BenchmarkQuoteMetaAll(b *testing.B)

func BenchmarkQuoteMetaAll(b *testing.B)

func BenchmarkQuoteMetaNone(b *testing.B)

func BenchmarkQuoteMetaNone(b *testing.B)

func BenchmarkCompile(b *testing.B)

func BenchmarkCompile(b *testing.B)

func TestDeepEqual(t *testing.T)

func TestDeepEqual(t *testing.T)

func TestMinInputLen(t *testing.T)

func TestMinInputLen(t *testing.T)

func TestRE2Exhaustive(t *testing.T)

func TestRE2Exhaustive(t *testing.T)

This test is excluded when running under the race detector because it is a very expensive test and takes too long.

func TestRE2Search(t *testing.T)

func TestRE2Search(t *testing.T)

TestRE2 tests this package's regexp API against test cases considered during RE2's exhaustive tests, which run all possible regexps over a given set of atoms and operators, up to a given complexity, over all possible strings over a given alphabet, up to a given size. Rather than try to link with RE2, we read a log file containing the test cases and the expected matches. The log file, re2-exhaustive.txt, is generated by running 'make log' in the open source RE2 distribution https://github.com/google/re2/.

The test file format is a sequence of stanzas like:

strings
"abc"
"123x"
regexps
"[a-z]+"
0-3;0-3
-;-
"([0-9])([0-9])([0-9])"
-;-
-;0-3 0-1 1-2 2-3

The stanza begins by defining a set of strings, quoted using Go double-quote syntax, one per line. Then the regexps section gives a sequence of regexps to run on the strings. In the block that follows a regexp, each line gives the semicolon-separated match results of running the regexp on the corresponding string. Each match result is either a single -, meaning no match, or a space-separated sequence of pairs giving the match and submatch indices. An unmatched subexpression formats its pair as a single - (not illustrated above). For now each regexp run produces two match results, one for a full match' that restricts the regexp to matching the entire string or nothing, and one for a partial match' that gives the leftmost first match found in the string.

Lines beginning with # are comments. Lines beginning with a capital letter are test names printed during RE2's test suite and are echoed into t but otherwise ignored.

At time of writing, re2-exhaustive.txt is 59 MB but compresses to 385 kB, so we store re2-exhaustive.txt.bz2 in the repository and decompress it on the fly.

func testRE2(t *testing.T, file string)

func testRE2(t *testing.T, file string)

func runFull(re, refull *Regexp, text string) ([]int, string)

func runFull(re, refull *Regexp, text string) ([]int, string)

func runPartial(re, refull *Regexp, text string) ([]int, string)

func runPartial(re, refull *Regexp, text string) ([]int, string)

func runFullLongest(re, refull *Regexp, text string) ([]int, string)

func runFullLongest(re, refull *Regexp, text string) ([]int, string)

func runPartialLongest(re, refull *Regexp, text string) ([]int, string)

func runPartialLongest(re, refull *Regexp, text string) ([]int, string)

func matchFull(re, refull *Regexp, text string) (bool, string)

func matchFull(re, refull *Regexp, text string) (bool, string)

func matchPartial(re, refull *Regexp, text string) (bool, string)

func matchPartial(re, refull *Regexp, text string) (bool, string)

func matchFullLongest(re, refull *Regexp, text string) (bool, string)

func matchFullLongest(re, refull *Regexp, text string) (bool, string)

func matchPartialLongest(re, refull *Regexp, text string) (bool, string)

func matchPartialLongest(re, refull *Regexp, text string) (bool, string)

func isSingleBytes(s string) bool

func isSingleBytes(s string) bool

func parseResult(t *testing.T, file string, lineno int, res string) []int

func parseResult(t *testing.T, file string, lineno int, res string) []int

func same(x, y []int) bool

func same(x, y []int) bool

func TestFowler(t *testing.T)

func TestFowler(t *testing.T)

TestFowler runs this package's regexp API against the POSIX regular expression tests collected by Glenn Fowler at http://www2.research.att.com/~astopen/testregex/testregex.html.

func testFowler(t *testing.T, file string)

func testFowler(t *testing.T, file string)

func parseFowlerResult(s string) (ok, compiled, matched bool, pos []int)

func parseFowlerResult(s string) (ok, compiled, matched bool, pos []int)

func makeText(n int) []byte

func makeText(n int) []byte

func BenchmarkMatch(b *testing.B)

func BenchmarkMatch(b *testing.B)

func BenchmarkMatch_onepass_regex(b *testing.B)

func BenchmarkMatch_onepass_regex(b *testing.B)

func TestLongest(t *testing.T)

func TestLongest(t *testing.T)

func TestProgramTooLongForBacktrack(t *testing.T)

func TestProgramTooLongForBacktrack(t *testing.T)

TestProgramTooLongForBacktrack tests that a regex which is too long for the backtracker still executes properly.

func build(n int, x ...int) [][]int

func build(n int, x ...int) [][]int

build is a helper to construct a [][]int by extracting n sequences from x. This represents n matches with len(x)/n submatches each.

func TestFind(t *testing.T)

func TestFind(t *testing.T)

func TestFindString(t *testing.T)

func TestFindString(t *testing.T)

func testFindIndex(test FindTest, result []int, t testing.T)

func testFindIndex(test *FindTest, result []int, t *testing.T)

func TestFindIndex(t *testing.T)

func TestFindIndex(t *testing.T)

func TestFindStringIndex(t *testing.T)

func TestFindStringIndex(t *testing.T)

func TestFindReaderIndex(t *testing.T)

func TestFindReaderIndex(t *testing.T)

func TestFindAll(t *testing.T)

func TestFindAll(t *testing.T)

func TestFindAllString(t *testing.T)

func TestFindAllString(t *testing.T)

func testFindAllIndex(test FindTest, result [][]int, t testing.T)

func testFindAllIndex(test *FindTest, result [][]int, t *testing.T)

func TestFindAllIndex(t *testing.T)

func TestFindAllIndex(t *testing.T)

func TestFindAllStringIndex(t *testing.T)

func TestFindAllStringIndex(t *testing.T)

func testSubmatchBytes(test FindTest, n int, submatches []int, result [][]byte, t testing.T)

func testSubmatchBytes(test *FindTest, n int, submatches []int, result [][]byte, t *testing.T)

func TestFindSubmatch(t *testing.T)

func TestFindSubmatch(t *testing.T)

func testSubmatchString(test FindTest, n int, submatches []int, result []string, t testing.T)

func testSubmatchString(test *FindTest, n int, submatches []int, result []string, t *testing.T)

func TestFindStringSubmatch(t *testing.T)

func TestFindStringSubmatch(t *testing.T)

func testSubmatchIndices(test FindTest, n int, expect, result []int, t testing.T)

func testSubmatchIndices(test *FindTest, n int, expect, result []int, t *testing.T)

func testFindSubmatchIndex(test FindTest, result []int, t testing.T)

func testFindSubmatchIndex(test *FindTest, result []int, t *testing.T)

func TestFindSubmatchIndex(t *testing.T)

func TestFindSubmatchIndex(t *testing.T)

func TestFindStringSubmatchIndex(t *testing.T)

func TestFindStringSubmatchIndex(t *testing.T)

func TestFindReaderSubmatchIndex(t *testing.T)

func TestFindReaderSubmatchIndex(t *testing.T)

func TestFindAllSubmatch(t *testing.T)

func TestFindAllSubmatch(t *testing.T)

func TestFindAllStringSubmatch(t *testing.T)

func TestFindAllStringSubmatch(t *testing.T)

func testFindAllSubmatchIndex(test FindTest, result [][]int, t testing.T)

func testFindAllSubmatchIndex(test *FindTest, result [][]int, t *testing.T)

func TestFindAllSubmatchIndex(t *testing.T)

func TestFindAllSubmatchIndex(t *testing.T)

func TestFindAllStringSubmatchIndex(t *testing.T)

func TestFindAllStringSubmatchIndex(t *testing.T)

func TestMergeRuneSet(t *testing.T)

func TestMergeRuneSet(t *testing.T)

func TestCompileOnePass(t *testing.T)

func TestCompileOnePass(t *testing.T)

func TestRunOnePass(t *testing.T)

func TestRunOnePass(t *testing.T)

Package syntax (new page,exported)

Package syntax parses regular expressions into parse trees and compiles parse trees into programs. Most clients of regular expressions will use the facilities of package regexp (such as Compile and Match) instead of this package.

hdr-SyntaxSyntax

The regular expression syntax understood by this package when parsing with the Perl flag is as follows. Parts of the syntax can be disabled by passing alternate flags to Parse.

Single characters:

.              any character, possibly including newline (flag s=true)
[xyz]          character class
[^xyz]         negated character class
\d             Perl character class
\D             negated Perl character class
[[:alpha:]]    ASCII character class
[[:^alpha:]]   negated ASCII character class
\pN            Unicode character class (one-letter name)
\p{Greek}      Unicode character class
\PN            negated Unicode character class (one-letter name)
\P{Greek}      negated Unicode character class

Composites:

xy             x followed by y
x|y            x or y (prefer x)

Repetitions:

x*             zero or more x, prefer more
x+             one or more x, prefer more
x?             zero or one x, prefer one
x{n,m}         n or n+1 or ... or m x, prefer more
x{n,}          n or more x, prefer more
x{n}           exactly n x
x*?            zero or more x, prefer fewer
x+?            one or more x, prefer fewer
x??            zero or one x, prefer zero
x{n,m}?        n or n+1 or ... or m x, prefer fewer
x{n,}?         n or more x, prefer fewer
x{n}?          exactly n x

Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.

Grouping:

(re)           numbered capturing group (submatch)
(?P<name>re)   named & numbered capturing group (submatch)
(?:re)         non-capturing group
(?flags)       set flags within current group; non-capturing
(?flags:re)    set flags during re; non-capturing

Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:

i              case-insensitive (default false)
m              multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s              let . match \n (default false)
U              ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)

Empty strings:

^              at beginning of text or line (flag m=true)
$              at end of text (like \z not \Z) or line (flag m=true)
\A             at beginning of text
\b             at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B             not at ASCII word boundary
\z             at end of text

Escape sequences:

\a             bell (== \007)
\f             form feed (== \014)
\t             horizontal tab (== \011)
\n             newline (== \012)
\r             carriage return (== \015)
\v             vertical tab character (== \013)
\*             literal *, for any punctuation character *
\123           octal character code (up to three digits)
\x7F           hex character code (exactly two digits)
\x{10FFFF}     hex character code
\Q...\E        literal text ... even if ... has punctuation

Character class elements:

x              single character
A-Z            character range (inclusive)
\d             Perl character class
[:foo:]        ASCII character class foo
\p{Foo}        Unicode character class Foo
\pF            Unicode character class F (one-letter name)

Named character classes as character class elements:

[\d]           digits (== \d)
[^\d]          not digits (== \D)
[\D]           not digits (== \D)
[^\D]          not not digits (== \d)
[[:name:]]     named ASCII class inside character class (== [:name:])
[^[:name:]]    named ASCII class inside negated character class (== [:^name:])
[\p{Name}]     named Unicode property inside character class (== \p{Name})
[^\p{Name}]    named Unicode property inside negated character class (== \P{Name})

Perl character classes (all ASCII-only):

\d             digits (== [0-9])
\D             not digits (== [^0-9])
\s             whitespace (== [\t\n\f\r ])
\S             not whitespace (== [^\t\n\f\r ])
\w             word characters (== [0-9A-Za-z_])
\W             not word characters (== [^0-9A-Za-z_])

ASCII character classes:

[[:alnum:]]    alphanumeric (== [0-9A-Za-z])
[[:alpha:]]    alphabetic (== [A-Za-z])
[[:ascii:]]    ASCII (== [\x00-\x7F])
[[:blank:]]    blank (== [\t ])
[[:cntrl:]]    control (== [\x00-\x1F\x7F])
[[:digit:]]    digits (== [0-9])
[[:graph:]]    graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]    lower case (== [a-z])
[[:print:]]    printable (== [ -~] == [ [:graph:]])
[[:punct:]]    punctuation (== [!-/:-@[-`{-~])
[[:space:]]    whitespace (== [\t\n\v\f\r ])
[[:upper:]]    upper case (== [A-Z])
[[:word:]]     word characters (== [0-9A-Za-z_])
[[:xdigit:]]   hex digit (== [0-9A-Fa-f])

Unicode character classes are those in unicode.Categories and unicode.Scripts.

Index

Constants
Variables
- var c
- var anyRuneNotNL
- var anyRune
- var f
- var f
- var f
- var _Op_index_0
- var str
- var strflags
- var istr
- var iflags
- var first
- var ifirst
- var p
- var err
- var c
- var op
- var lastRepeat
- var lit
- var ok1
- var c
- var anyTable
- var seq
- var name
- var lo
- var hi
- var code1
- var code2
- var code3
- var perlGroup
- var code4
- var code5
- var code6
- var code7
- var code8
- var code9
- var code10
- var code11
- var code12
- var code13
- var code14
- var code15
- var code16
- var code17
- var posixGroup
- var instOpNames
- var op
- var boundary
- var b
- var buf
- var flag
- var b
- var b
- var prefix
- var parseTests
- var foldcaseTests
- var literalTests
- var matchnlTests
- var nomatchnlTests
- var b
- var opNames
- var r
- var invalidRegexps
- var onlyPerl
- var onlyPOSIX
- var compileTests
- var r1
- var simplifyTests
Types
Functions

Constants (exported)

const _Op_name_0

const _Op_name_0 = "NoMatchEmptyMatchLiteralCharClassAnyCharNotNLAnyCharBeginLineEndLineBeginTextEndTextWordBoundaryNoWordBoundaryCaptureStarPlusQuestRepeatConcatAlternate"

const _Op_name_1

const _Op_name_1 = "opPseudo"

const ErrInternalError (exported)

const ErrInternalError ErrorCode = "regexp/syntax: internal error"

Unexpected error

const ErrInvalidCharClass (exported)

const ErrInvalidCharClass ErrorCode = "invalid character class"

Parse errors

const ErrInvalidCharRange (exported)

const ErrInvalidCharRange ErrorCode = "invalid character class range"

const ErrInvalidEscape (exported)

const ErrInvalidEscape ErrorCode = "invalid escape sequence"

const ErrInvalidNamedCapture (exported)

const ErrInvalidNamedCapture ErrorCode = "invalid named capture"

const ErrInvalidPerlOp (exported)

const ErrInvalidPerlOp ErrorCode = "invalid or unsupported Perl syntax"

const ErrInvalidRepeatOp (exported)

const ErrInvalidRepeatOp ErrorCode = "invalid nested repetition operator"

const ErrInvalidRepeatSize (exported)

const ErrInvalidRepeatSize ErrorCode = "invalid repeat count"

const ErrInvalidUTF8 (exported)

const ErrInvalidUTF8 ErrorCode = "invalid UTF-8"

const ErrMissingBracket (exported)

const ErrMissingBracket ErrorCode = "missing closing ]"

const ErrMissingParen (exported)

const ErrMissingParen ErrorCode = "missing closing )"

const ErrMissingRepeatArgument (exported)

const ErrMissingRepeatArgument ErrorCode = "missing argument to repetition operator"

const ErrTrailingBackslash (exported)

const ErrTrailingBackslash ErrorCode = "trailing backslash at end of expression"

const ErrUnexpectedParen (exported)

const ErrUnexpectedParen ErrorCode = "unexpected )"

const FoldCase (exported)

const FoldCase Flags = 1 << iota // case-insensitive match

const Literal (exported)

const Literal // treat pattern as literal string

const ClassNL (exported)

const ClassNL // allow character classes like [^a-z] and [[:space:]] to match newline

const DotNL (exported)

const DotNL // allow . to match newline

const OneLine (exported)

const OneLine // treat ^ and $ as only matching at beginning and end of text

const NonGreedy (exported)

const NonGreedy // make repetition operators default to non-greedy

const PerlX (exported)

const PerlX // allow Perl extensions

const UnicodeGroups (exported)

const UnicodeGroups // allow \p{Han}, \P{Han} for Unicode group and negation

const WasDollar (exported)

const WasDollar // regexp OpEndText was $, not \z

const Simple (exported)

const Simple // regexp contains no counted repetition

const MatchNL (exported)

const MatchNL = ClassNL | DotNL

const Perl (exported)

const Perl = ClassNL | OneLine | PerlX | UnicodeGroups // as close to Perl as possible

const POSIX (exported)

const POSIX Flags = 0 // POSIX syntax

const opLeftParen

const opLeftParen = opPseudo + iota

Pseudo-ops for parsing stack.

const opVerticalBar

const opVerticalBar

Pseudo-ops for parsing stack.

const minFold

const minFold = 0x0041

minimum and maximum runes involved in folding. checked during test.

const maxFold

const maxFold = 0x1e943

const InstAlt (exported)

const InstAlt InstOp = iota

const InstAltMatch (exported)

const InstAltMatch

const InstCapture (exported)

const InstCapture

const InstEmptyWidth (exported)

const InstEmptyWidth

const InstMatch (exported)

const InstMatch

const InstFail (exported)

const InstFail

const InstNop (exported)

const InstNop

const InstRune (exported)

const InstRune

const InstRune1 (exported)

const InstRune1

const InstRuneAny (exported)

const InstRuneAny

const InstRuneAnyNotNL (exported)

const InstRuneAnyNotNL

const EmptyBeginLine (exported)

const EmptyBeginLine EmptyOp = 1 << iota

const EmptyEndLine (exported)

const EmptyEndLine

const EmptyBeginText (exported)

const EmptyBeginText

const EmptyEndText (exported)

const EmptyEndText

const EmptyWordBoundary (exported)

const EmptyWordBoundary

const EmptyNoWordBoundary (exported)

const EmptyNoWordBoundary

const noMatch

const noMatch = -1

const OpNoMatch (exported)

const OpNoMatch Op = 1 + iota // matches no strings

const OpEmptyMatch (exported)

const OpEmptyMatch // matches empty string

const OpLiteral (exported)

const OpLiteral // matches Runes sequence

const OpCharClass (exported)

const OpCharClass // matches Runes interpreted as range pair list

const OpAnyCharNotNL (exported)

const OpAnyCharNotNL // matches any character except newline

const OpAnyChar (exported)

const OpAnyChar // matches any character

const OpBeginLine (exported)

const OpBeginLine // matches empty string at beginning of line

const OpEndLine (exported)

const OpEndLine // matches empty string at end of line

const OpBeginText (exported)

const OpBeginText // matches empty string at beginning of text

const OpEndText (exported)

const OpEndText // matches empty string at end of text

const OpWordBoundary (exported)

const OpWordBoundary // matches word boundary `\b`

const OpNoWordBoundary (exported)

const OpNoWordBoundary // matches word non-boundary `\B`

const OpCapture (exported)

const OpCapture // capturing subexpression with index Cap, optional name Name

const OpStar (exported)

const OpStar // matches Sub[0] zero or more times

const OpPlus (exported)

const OpPlus // matches Sub[0] one or more times

const OpQuest (exported)

const OpQuest // matches Sub[0] zero or one times

const OpRepeat (exported)

const OpRepeat // matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)

const OpConcat (exported)

const OpConcat // matches concatenation of Subs

const OpAlternate (exported)

const OpAlternate // matches alternation of Subs

const opPseudo

const opPseudo Op = 128 // where pseudo-ops start

const meta

const meta = `\.+*?()|[]{}^$`

const testFlags

const testFlags = MatchNL | PerlX | UnicodeGroups

Variables (exported)

var c

var c compiler

var anyRuneNotNL

var anyRuneNotNL = ...

var anyRune

var anyRune = ...

var f

var f frag

var f

var f frag

var f

var f frag

var _Op_index_0

var _Op_index_0 = ...

var str

var str []rune

Round 1: Factor out common literal prefixes.

var strflags

var strflags Flags

var istr

var istr []rune

Invariant: the Regexps that were in sub[0:start] have been used or marked for reuse, and the slice space has been reused for out (len(out) <= start).

Invariant: sub[start:i] consists of regexps that all begin with str as modified by strflags.

var iflags

var iflags Flags

var first

var first *Regexp

var ifirst

var ifirst *Regexp

Invariant: the Regexps that were in sub[0:start] have been used or marked for reuse, and the slice space has been reused for out (len(out) <= start).

Invariant: sub[start:i] consists of regexps that all begin with ifirst.

var p

var p parser

Otherwise, must do real work.

var err

var err error

Otherwise, must do real work.

var c

var c rune

Otherwise, must do real work.

var op

var op Op

Otherwise, must do real work.

var lastRepeat

var lastRepeat string

Otherwise, must do real work.

var lit

var lit string

\Q ... \E: the ... is always literals

var ok1

var ok1 bool

var c

var c rune

Non-capturing group. Might also twiddle Perl flags.

var anyTable

var anyTable = ...

var seq

var seq, name string

var name

var seq, name string

var lo

var lo, hi rune

var hi

var lo, hi rune

var code1

var code1 = ...

var code2

var code2 = ...

var code3

var code3 = ...

var perlGroup

var perlGroup = ...

var code4

var code4 = ...

var code5

var code5 = ...

var code6

var code6 = ...

var code7

var code7 = ...

var code8

var code8 = ...

var code9

var code9 = ...

var code10

var code10 = ...

var code11

var code11 = ...

var code12

var code12 = ...

var code13

var code13 = ...

var code14

var code14 = ...

var code15

var code15 = ...

var code16

var code16 = ...

var code17

var code17 = ...

var posixGroup

var posixGroup = ...

var instOpNames

var instOpNames = ...

var op

var op EmptyOp = EmptyNoWordBoundary

var boundary

var boundary byte

var b

var b strings.Builder

var buf

var buf strings.Builder

Have prefix; gather characters.

var flag

var flag EmptyOp

var b

var b strings.Builder

var b

var b strings.Builder

var prefix

var prefix *Regexp

Build leading prefix: xx.

var parseTests

var parseTests = ...

var foldcaseTests

var foldcaseTests = ...

var literalTests

var literalTests = ...

var matchnlTests

var matchnlTests = ...

var nomatchnlTests

var nomatchnlTests = ...

var b

var b strings.Builder

var opNames

var opNames = ...

var r

var r []rune

AppendRange should collapse each of the new ranges into the earlier ones (it looks back two ranges), so that the slice never grows very large. Note that we are not calling cleanClass.

var invalidRegexps

var invalidRegexps = ...

var onlyPerl

var onlyPerl = ...

var onlyPOSIX

var onlyPOSIX = ...

var compileTests

var compileTests = ...

var r1

var r1 rune = -1

var simplifyTests

var simplifyTests = ...

Types (exported)

type patchList struct

type patchList struct {
	head, tail uint32
}

A patchList is a list of instruction pointers that need to be filled in (patched). Because the pointers haven't been filled in yet, we can reuse their storage to hold the list. It's kind of sleazy, but works well in practice. See https://swtch.com/~rsc/regexp/regexp1.html for inspiration.

These aren't really pointers: they're integers, so we can reinterpret them this way without using package unsafe. A value l.head denotes p.inst[l.head>>1].Out (l.head&1==0) or .Arg (l.head&1==1). head == 0 denotes the empty list, okay because we start every program with a fail instruction, so we'll never want to point at its output link.

func makePatchList(n uint32) patchList

func makePatchList(n uint32) patchList

func (l patchList) patch(p *Prog, val uint32)

func (l patchList) patch(p *Prog, val uint32)

func (l1 patchList) append(p *Prog, l2 patchList) patchList

func (l1 patchList) append(p *Prog, l2 patchList) patchList

type frag struct

type frag struct {
	i   uint32    // index of first instruction
	out patchList // where to record end instruction
}

A frag represents a compiled program fragment.

type compiler struct

type compiler struct {
	p *Prog
}

func (c *compiler) init()

func (c *compiler) init()

func (c compiler) compile(re Regexp) frag

func (c *compiler) compile(re *Regexp) frag

func (c *compiler) inst(op InstOp) frag

func (c *compiler) inst(op InstOp) frag

func (c *compiler) nop() frag

func (c *compiler) nop() frag

func (c *compiler) fail() frag

func (c *compiler) fail() frag

func (c *compiler) cap(arg uint32) frag

func (c *compiler) cap(arg uint32) frag

func (c *compiler) cat(f1, f2 frag) frag

func (c *compiler) cat(f1, f2 frag) frag

func (c *compiler) alt(f1, f2 frag) frag

func (c *compiler) alt(f1, f2 frag) frag

func (c *compiler) quest(f1 frag, nongreedy bool) frag

func (c *compiler) quest(f1 frag, nongreedy bool) frag

func (c *compiler) star(f1 frag, nongreedy bool) frag

func (c *compiler) star(f1 frag, nongreedy bool) frag

func (c *compiler) plus(f1 frag, nongreedy bool) frag

func (c *compiler) plus(f1 frag, nongreedy bool) frag

func (c *compiler) empty(op EmptyOp) frag

func (c *compiler) empty(op EmptyOp) frag

func (c *compiler) rune(r []rune, flags Flags) frag

func (c *compiler) rune(r []rune, flags Flags) frag

type Error struct (exported)

type Error struct {
	Code ErrorCode
	Expr string
}

An Error describes a failure to parse a regular expression and gives the offending expression.

func (e *Error) Error() string (exported)

func (e *Error) Error() string

type ErrorCode string (exported)

type ErrorCode string

An ErrorCode describes a failure to parse a regular expression.

func (e ErrorCode) String() string (exported)

func (e ErrorCode) String() string

type Flags uint16 (exported)

type Flags uint16

Flags control the behavior of the parser and record information about regexp context.

type parser struct

type parser struct {
	flags       Flags     // parse mode flags
	stack       []*Regexp // stack of parsed expressions
	free        *Regexp
	numCap      int // number of capturing groups seen
	wholeRegexp string
	tmpClass    []rune // temporary char class work space
}

func (p parser) newRegexp(op Op) Regexp

func (p *parser) newRegexp(op Op) *Regexp

func (p parser) reuse(re Regexp)

func (p *parser) reuse(re *Regexp)

func (p parser) push(re Regexp) *Regexp

func (p *parser) push(re *Regexp) *Regexp

push pushes the regexp re onto the parse stack and returns the regexp.

func (p *parser) maybeConcat(r rune, flags Flags) bool

func (p *parser) maybeConcat(r rune, flags Flags) bool

maybeConcat implements incremental concatenation of literal runes into string nodes. The parser calls this before each push, so only the top fragment of the stack might need processing. Since this is called before a push, the topmost literal is no longer subject to operators like * (Otherwise ab* would turn into (ab)*.) If r >= 0 and there's a node left over, maybeConcat uses it to push r with the given flags. maybeConcat reports whether r was pushed.

func (p *parser) literal(r rune)

func (p *parser) literal(r rune)

literal pushes a literal regexp for the rune r on the stack.

func (p parser) op(op Op) Regexp

func (p *parser) op(op Op) *Regexp

op pushes a regexp with the given op onto the stack and returns that regexp.

func (p *parser) repeat(op Op, min, max int, before, after, lastRepeat string) (string, error)

func (p *parser) repeat(op Op, min, max int, before, after, lastRepeat string) (string, error)

repeat replaces the top stack element with itself repeated according to op, min, max. before is the regexp suffix starting at the repetition operator. after is the regexp suffix following after the repetition operator. repeat returns an updated 'after' and an error, if any.

func (p parser) concat() Regexp

func (p *parser) concat() *Regexp

concat replaces the top of the stack (above the topmost '|' or '(') with its concatenation.

func (p parser) alternate() Regexp

func (p *parser) alternate() *Regexp

alternate replaces the top of the stack (above the topmost '(') with its alternation.

func (p parser) collapse(subs []Regexp, op Op) *Regexp

func (p *parser) collapse(subs []*Regexp, op Op) *Regexp

collapse returns the result of applying op to sub. If sub contains op nodes, they all get hoisted up so that there is never a concat of a concat or an alternate of an alternate.

func (p parser) factor(sub []Regexp) []*Regexp

func (p *parser) factor(sub []*Regexp) []*Regexp

factor factors common prefixes from the alternation list sub. It returns a replacement list that reuses the same storage and frees (passes to p.reuse) any removed *Regexps.

For example,

ABC|ABD|AEF|BCX|BCY

simplifies by literal prefix extraction to

A(B(C|D)|EF)|BC(X|Y)

which simplifies by character class introduction to

A(B[CD]|EF)|BC[XY]

func (p parser) leadingString(re Regexp) ([]rune, Flags)

func (p *parser) leadingString(re *Regexp) ([]rune, Flags)

leadingString returns the leading literal string that re begins with. The string refers to storage in re or its children.

func (p parser) removeLeadingString(re Regexp, n int) *Regexp

func (p *parser) removeLeadingString(re *Regexp, n int) *Regexp

removeLeadingString removes the first n leading runes from the beginning of re. It returns the replacement for re.

func (p parser) leadingRegexp(re Regexp) *Regexp

func (p *parser) leadingRegexp(re *Regexp) *Regexp

leadingRegexp returns the leading regexp that re begins with. The regexp refers to storage in re or its children.

func (p parser) removeLeadingRegexp(re Regexp, reuse bool) *Regexp

func (p *parser) removeLeadingRegexp(re *Regexp, reuse bool) *Regexp

removeLeadingRegexp removes the leading regexp in re. It returns the replacement for re. If reuse is true, it passes the removed regexp (if no longer needed) to p.reuse.

func (p *parser) parseRepeat(s string) (min, max int, rest string, ok bool)

func (p *parser) parseRepeat(s string) (min, max int, rest string, ok bool)

parseRepeat parses {min} (max=min) or {min,} (max=-1) or {min,max}. If s is not of that form, it returns ok == false. If s has the right form but the values are too big, it returns min == -1, ok == true.

func (p *parser) parsePerlFlags(s string) (rest string, err error)

func (p *parser) parsePerlFlags(s string) (rest string, err error)

parsePerlFlags parses a Perl flag setting or non-capturing group or both, like (?i) or (?: or (?i:. It removes the prefix from s and updates the parse state. The caller must have ensured that s begins with "(?".

func (p *parser) parseInt(s string) (n int, rest string, ok bool)

func (p *parser) parseInt(s string) (n int, rest string, ok bool)

parseInt parses a decimal integer.

func (p *parser) parseVerticalBar() error

func (p *parser) parseVerticalBar() error

parseVerticalBar handles a | in the input.

func (p *parser) swapVerticalBar() bool

func (p *parser) swapVerticalBar() bool

If the top of the stack is an element followed by an opVerticalBar swapVerticalBar swaps the two and returns true. Otherwise it returns false.

func (p *parser) parseRightParen() error

func (p *parser) parseRightParen() error

parseRightParen handles a ) in the input.

func (p *parser) parseEscape(s string) (r rune, rest string, err error)

func (p *parser) parseEscape(s string) (r rune, rest string, err error)

parseEscape parses an escape sequence at the beginning of s and returns the rune.

func (p *parser) parseClassChar(s, wholeClass string) (r rune, rest string, err error)

func (p *parser) parseClassChar(s, wholeClass string) (r rune, rest string, err error)

parseClassChar parses a character class character at the beginning of s and returns it.

func (p *parser) parsePerlClassEscape(s string, r []rune) (out []rune, rest string)

func (p *parser) parsePerlClassEscape(s string, r []rune) (out []rune, rest string)

parsePerlClassEscape parses a leading Perl character class escape like \d from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.

func (p *parser) parseNamedClass(s string, r []rune) (out []rune, rest string, err error)

func (p *parser) parseNamedClass(s string, r []rune) (out []rune, rest string, err error)

parseNamedClass parses a leading POSIX named character class like [:alnum:] from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.

func (p *parser) appendGroup(r []rune, g charGroup) []rune

func (p *parser) appendGroup(r []rune, g charGroup) []rune

func (p *parser) parseUnicodeClass(s string, r []rune) (out []rune, rest string, err error)

func (p *parser) parseUnicodeClass(s string, r []rune) (out []rune, rest string, err error)

parseUnicodeClass parses a leading Unicode character class like \p{Han} from the beginning of s. If one is present, it appends the characters to r and returns the new slice r and the remainder of the string.

func (p *parser) parseClass(s string) (rest string, err error)

func (p *parser) parseClass(s string) (rest string, err error)

parseClass parses a character class at the beginning of s and pushes it onto the parse stack.

type charGroup struct

type charGroup struct {
	sign  int
	class []rune
}

type ranges struct

type ranges struct {
	p *[]rune
}

ranges implements sort.Interface on a []rune. The choice of receiver type definition is strange but avoids an allocation since we already have a *[]rune.

func (ra ranges) Less(i, j int) bool

func (ra ranges) Less(i, j int) bool

func (ra ranges) Len() int

func (ra ranges) Len() int

func (ra ranges) Swap(i, j int)

func (ra ranges) Swap(i, j int)

type Prog struct (exported)

type Prog struct {
	Inst   []Inst
	Start  int // index of start instruction
	NumCap int // number of InstCapture insts in re
}

A Prog is a compiled regular expression program.

func Compile(re Regexp) (Prog, error) (exported)

func Compile(re *Regexp) (*Prog, error)

Compile compiles the regexp into a program to be executed. The regexp should have been simplified already (returned from re.Simplify).

func (p *Prog) String() string (exported)

func (p *Prog) String() string

func (p Prog) skipNop(pc uint32) Inst

func (p *Prog) skipNop(pc uint32) *Inst

skipNop follows any no-op or capturing instructions.

func (p *Prog) Prefix() (prefix string, complete bool) (exported)

func (p *Prog) Prefix() (prefix string, complete bool)

Prefix returns a literal string that all matches for the regexp must start with. Complete is true if the prefix is the entire match.

func (p *Prog) StartCond() EmptyOp (exported)

func (p *Prog) StartCond() EmptyOp

StartCond returns the leading empty-width conditions that must be true in any match. It returns ^EmptyOp(0) if no matches are possible.

type InstOp uint8 (exported)

type InstOp uint8

An InstOp is an instruction opcode.

func (i InstOp) String() string (exported)

func (i InstOp) String() string

type EmptyOp uint8 (exported)

type EmptyOp uint8

An EmptyOp specifies a kind or mixture of zero-width assertions.

func EmptyOpContext(r1, r2 rune) EmptyOp (exported)

func EmptyOpContext(r1, r2 rune) EmptyOp

EmptyOpContext returns the zero-width assertions satisfied at the position between the runes r1 and r2. Passing r1 == -1 indicates that the position is at the beginning of the text. Passing r2 == -1 indicates that the position is at the end of the text.

type Inst struct (exported)

type Inst struct {
	Op   InstOp
	Out  uint32 // all but InstMatch, InstFail
	Arg  uint32 // InstAlt, InstAltMatch, InstCapture, InstEmptyWidth
	Rune []rune
}

An Inst is a single instruction in a regular expression program.

func (i *Inst) op() InstOp

func (i *Inst) op() InstOp

op returns i.Op but merges all the Rune special cases into InstRune

func (i *Inst) MatchRune(r rune) bool (exported)

func (i *Inst) MatchRune(r rune) bool

MatchRune reports whether the instruction matches (and consumes) r. It should only be called when i.Op == InstRune.

func (i *Inst) MatchRunePos(r rune) int (exported)

func (i *Inst) MatchRunePos(r rune) int

MatchRunePos checks whether the instruction matches (and consumes) r. If so, MatchRunePos returns the index of the matching rune pair (or, when len(i.Rune) == 1, rune singleton). If not, MatchRunePos returns -1. MatchRunePos should only be called when i.Op == InstRune.

func (i *Inst) MatchEmptyWidth(before rune, after rune) bool (exported)

func (i *Inst) MatchEmptyWidth(before rune, after rune) bool

MatchEmptyWidth reports whether the instruction matches an empty string between the runes before and after. It should only be called when i.Op == InstEmptyWidth.

func (i *Inst) String() string (exported)

func (i *Inst) String() string

type Regexp struct (exported)

type Regexp struct {
	Op       Op // operator
	Flags    Flags
	Sub      []*Regexp  // subexpressions, if any
	Sub0     [1]*Regexp // storage for short Sub
	Rune     []rune     // matched runes, for OpLiteral, OpCharClass
	Rune0    [2]rune    // storage for short Rune
	Min, Max int        // min, max for OpRepeat
	Cap      int        // capturing index, for OpCapture
	Name     string     // capturing name, for OpCapture
}

A Regexp is a node in a regular expression syntax tree.

func literalRegexp(s string, flags Flags) *Regexp

func literalRegexp(s string, flags Flags) *Regexp

func Parse(s string, flags Flags) (*Regexp, error) (exported)

func Parse(s string, flags Flags) (*Regexp, error)

Parse parses a regular expression string s, controlled by the specified Flags, and returns a regular expression parse tree. The syntax is described in the top-level comment.

func simplify1(op Op, flags Flags, sub, re Regexp) Regexp

func simplify1(op Op, flags Flags, sub, re *Regexp) *Regexp

simplify1 implements Simplify for the unary OpStar, OpPlus, and OpQuest operators. It returns the simple regexp equivalent to

Regexp{Op: op, Flags: flags, Sub: {sub}}

under the assumption that sub is already simple, and without first allocating that structure. If the regexp to be returned turns out to be equivalent to re, simplify1 returns re instead.

simplify1 is factored out of Simplify because the implementation for other operators generates these unary expressions. Letting them call simplify1 makes sure the expressions they generate are simple.

func (x Regexp) Equal(y Regexp) bool (exported)

func (x *Regexp) Equal(y *Regexp) bool

Equal reports whether x and y have identical structure.

func (re *Regexp) String() string (exported)

func (re *Regexp) String() string

func (re *Regexp) MaxCap() int (exported)

func (re *Regexp) MaxCap() int

MaxCap walks the regexp to find the maximum capture index.

func (re *Regexp) CapNames() []string (exported)

func (re *Regexp) CapNames() []string

CapNames walks the regexp to find the names of capturing groups.

func (re *Regexp) capNames(names []string)

func (re *Regexp) capNames(names []string)

func (re Regexp) Simplify() Regexp (exported)

func (re *Regexp) Simplify() *Regexp

Simplify returns a regexp equivalent to re but without counted repetitions and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/. The resulting regexp will execute correctly but its string representation will not produce the same parse tree, because capturing parentheses may have been duplicated or removed. For example, the simplified form for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1. The returned regexp may share structure with or be the original.

type Op uint8 (exported)

type Op uint8

An Op is a single regular expression operator.

func (i Op) String() string (exported)

func (i Op) String() string

type parseTest struct

type parseTest struct {
	Regexp string
	Dump   string
}

Functions (exported)

func minFoldRune(r rune) rune

func minFoldRune(r rune) rune

minFoldRune returns the minimum rune fold-equivalent to r.

func repeatIsValid(re *Regexp, n int) bool

func repeatIsValid(re *Regexp, n int) bool

repeatIsValid reports whether the repetition re is valid. Valid means that the combination of the top-level repetition and any inner repetitions does not exceed n copies of the innermost thing. This function rewalks the regexp tree and is called for every repetition, so we have to worry about inducing quadratic behavior in the parser. We avoid this by only calling repeatIsValid when min or max >= 2. In that case the depth of any >= 2 nesting can only get to 9 without triggering a parse error, so each subtree can only be rewalked 9 times.

func cleanAlt(re *Regexp)

func cleanAlt(re *Regexp)

cleanAlt cleans re for eventual inclusion in an alternation.

func isValidCaptureName(name string) bool

func isValidCaptureName(name string) bool

isValidCaptureName reports whether name is a valid capture name: [A-Za-z0-9_]+. PCRE limits names to 32 bytes. Python rejects names starting with digits. We don't enforce either of those.

func isCharClass(re *Regexp) bool

func isCharClass(re *Regexp) bool

can this be represented as a character class? single-rune literal string, char class, ., and .|\n.

func matchRune(re *Regexp, r rune) bool

func matchRune(re *Regexp, r rune) bool

does re match r?

func mergeCharClass(dst, src *Regexp)

func mergeCharClass(dst, src *Regexp)

mergeCharClass makes dst = dst|src. The caller must ensure that dst.Op >= src.Op, to reduce the amount of copying.

func unicodeTable(name string) (unicode.RangeTable, unicode.RangeTable)

func unicodeTable(name string) (*unicode.RangeTable, *unicode.RangeTable)

unicodeTable returns the unicode.RangeTable identified by name and the table of additional fold-equivalent code points.

func cleanClass(rp *[]rune) []rune

func cleanClass(rp *[]rune) []rune

cleanClass sorts the ranges (pairs of elements of r), merges them, and eliminates duplicates.

func appendLiteral(r []rune, x rune, flags Flags) []rune

func appendLiteral(r []rune, x rune, flags Flags) []rune

appendLiteral returns the result of appending the literal x to the class r.

func appendRange(r []rune, lo, hi rune) []rune

func appendRange(r []rune, lo, hi rune) []rune

appendRange returns the result of appending the range lo-hi to the class r.

func appendFoldedRange(r []rune, lo, hi rune) []rune

func appendFoldedRange(r []rune, lo, hi rune) []rune

appendFoldedRange returns the result of appending the range lo-hi and its case folding-equivalent runes to the class r.

func appendClass(r []rune, x []rune) []rune

func appendClass(r []rune, x []rune) []rune

appendClass returns the result of appending the class x to the class r. It assume x is clean.

func appendFoldedClass(r []rune, x []rune) []rune

func appendFoldedClass(r []rune, x []rune) []rune

appendFolded returns the result of appending the case folding of the class x to the class r.

func appendNegatedClass(r []rune, x []rune) []rune

func appendNegatedClass(r []rune, x []rune) []rune

appendNegatedClass returns the result of appending the negation of the class x to the class r. It assumes x is clean.

func appendTable(r []rune, x *unicode.RangeTable) []rune

func appendTable(r []rune, x *unicode.RangeTable) []rune

appendTable returns the result of appending x to the class r.

func appendNegatedTable(r []rune, x *unicode.RangeTable) []rune

func appendNegatedTable(r []rune, x *unicode.RangeTable) []rune

appendNegatedTable returns the result of appending the negation of x to the class r.

func negateClass(r []rune) []rune

func negateClass(r []rune) []rune

negateClass overwrites r and returns r's negation. It assumes the class r is already clean.

func checkUTF8(s string) error

func checkUTF8(s string) error

func nextRune(s string) (c rune, t string, err error)

func nextRune(s string) (c rune, t string, err error)

func isalnum(c rune) bool

func isalnum(c rune) bool

func unhex(c rune) rune

func unhex(c rune) rune

func IsWordChar(r rune) bool (exported)

func IsWordChar(r rune) bool

IsWordChar reports whether r is consider a `word character' during the evaluation of the \b and \B zero-width assertions. These assertions are ASCII-only: the word characters are [A-Za-z0-9_].

func bw(b *strings.Builder, args ...string)

func bw(b *strings.Builder, args ...string)

func dumpProg(b strings.Builder, p Prog)

func dumpProg(b *strings.Builder, p *Prog)

func u32(i uint32) string

func u32(i uint32) string

func dumpInst(b strings.Builder, i Inst)

func dumpInst(b *strings.Builder, i *Inst)

func writeRegexp(b strings.Builder, re Regexp)

func writeRegexp(b *strings.Builder, re *Regexp)

writeRegexp writes the Perl syntax for the regular expression re to b.

func escape(b *strings.Builder, r rune, force bool)

func escape(b *strings.Builder, r rune, force bool)

func TestParseSimple(t *testing.T)

func TestParseSimple(t *testing.T)

func TestParseFoldCase(t *testing.T)

func TestParseFoldCase(t *testing.T)

func TestParseLiteral(t *testing.T)

func TestParseLiteral(t *testing.T)

func TestParseMatchNL(t *testing.T)

func TestParseMatchNL(t *testing.T)

func TestParseNoMatchNL(t *testing.T)

func TestParseNoMatchNL(t *testing.T)

func testParseDump(t *testing.T, tests []parseTest, flags Flags)

func testParseDump(t *testing.T, tests []parseTest, flags Flags)

Test Parse -> Dump.

func dump(re *Regexp) string

func dump(re *Regexp) string

dump prints a string representation of the regexp showing the structure explicitly.

func dumpRegexp(b strings.Builder, re Regexp)

func dumpRegexp(b *strings.Builder, re *Regexp)

dumpRegexp writes an encoding of the syntax tree for the regexp re to b. It is used during testing to distinguish between parses that might print the same using re's String method.

func mkCharClass(f func(rune) bool) string

func mkCharClass(f func(rune) bool) string

func isUpperFold(r rune) bool

func isUpperFold(r rune) bool

func TestFoldConstants(t *testing.T)

func TestFoldConstants(t *testing.T)

func TestAppendRangeCollapse(t *testing.T)

func TestAppendRangeCollapse(t *testing.T)

func TestParseInvalidRegexps(t *testing.T)

func TestParseInvalidRegexps(t *testing.T)

func TestToStringEquivalentParse(t *testing.T)

func TestToStringEquivalentParse(t *testing.T)

func TestCompile(t *testing.T)

func TestCompile(t *testing.T)

func BenchmarkEmptyOpContext(b *testing.B)

func BenchmarkEmptyOpContext(b *testing.B)

func TestSimplify(t *testing.T)

func TestSimplify(t *testing.T)

Package regexp_test (new page)

Variables

var validID

var validID = regexp.MustCompile(`^[a-z]+\[[0-9]+\]$`)

Compile the expression once, usually at init time. Use raw strings to avoid having to quote the backslashes.

Functions

func Example()

func Example()

func ExampleMatch()

func ExampleMatch()

func ExampleMatchString()

func ExampleMatchString()

func ExampleQuoteMeta()

func ExampleQuoteMeta()

func ExampleRegexp_Find()

func ExampleRegexp_Find()

func ExampleRegexp_FindAll()

func ExampleRegexp_FindAll()

func ExampleRegexp_FindAllSubmatch()

func ExampleRegexp_FindAllSubmatch()

func ExampleRegexp_FindSubmatch()

func ExampleRegexp_FindSubmatch()

func ExampleRegexp_Match()

func ExampleRegexp_Match()

func ExampleRegexp_FindString()

func ExampleRegexp_FindString()

func ExampleRegexp_FindStringIndex()

func ExampleRegexp_FindStringIndex()

func ExampleRegexp_FindStringSubmatch()

func ExampleRegexp_FindStringSubmatch()

func ExampleRegexp_FindAllString()

func ExampleRegexp_FindAllString()

func ExampleRegexp_FindAllStringSubmatch()

func ExampleRegexp_FindAllStringSubmatch()

func ExampleRegexp_FindAllStringSubmatchIndex()

func ExampleRegexp_FindAllStringSubmatchIndex()

func ExampleRegexp_FindSubmatchIndex()

func ExampleRegexp_FindSubmatchIndex()

func ExampleRegexp_Longest()

func ExampleRegexp_Longest()

func ExampleRegexp_MatchString()

func ExampleRegexp_MatchString()

func ExampleRegexp_NumSubexp()

func ExampleRegexp_NumSubexp()

func ExampleRegexp_ReplaceAll()

func ExampleRegexp_ReplaceAll()

func ExampleRegexp_ReplaceAllLiteralString()

func ExampleRegexp_ReplaceAllLiteralString()

func ExampleRegexp_ReplaceAllString()

func ExampleRegexp_ReplaceAllString()

func ExampleRegexp_ReplaceAllStringFunc()

func ExampleRegexp_ReplaceAllStringFunc()

func ExampleRegexp_SubexpNames()

func ExampleRegexp_SubexpNames()

func ExampleRegexp_SubexpIndex()

func ExampleRegexp_SubexpIndex()

func ExampleRegexp_Split()

func ExampleRegexp_Split()

func ExampleRegexp_Expand()

func ExampleRegexp_Expand()

func ExampleRegexp_ExpandString()

func ExampleRegexp_ExpandString()

func ExampleRegexp_FindIndex()

func ExampleRegexp_FindIndex()

func ExampleRegexp_FindAllSubmatchIndex()

func ExampleRegexp_FindAllSubmatchIndex()

func ExampleRegexp_FindAllIndex()

func ExampleRegexp_FindAllIndex()

emidoots/regexp_*.md

Index

Index

emidoots/**regexp_*.md**