@spr2-dev
Last active January 22, 2025 12:15
A quick beginner guide on Lua pattern matching

Lua Pattern Matching Cheat Sheet

Table Of Contents

  1. What's a pattern?
  2. Positional characters
  3. Common matching characters
  4. Ranges, character sets and logic
  5. Quantifiers
  6. Capture groups
  7. Examples

1. What's a pattern?

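A pattern is a string in which certain characters have a special meaning to Lua's pattern-matching functions (string.find, string.match, string.gmatch and string.gsub). Lua patterns resemble regular expressions, but they are a simpler, separate syntax. Patterns are used to match, i.e. to check whether a "subject string" complies with the pattern.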
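As a minimal sketch of what "matching" looks like in practice, the snippet below passes a subject string and a pattern to two standard library functions. The subject string is made up for illustration.

```lua
local subject = "I like apples"

-- string.find returns the start and end indices of the first match (or nil).
local first, last = string.find(subject, "apples")
print(first, last)                       --> 8   13

-- string.match returns the matched text itself (or nil when nothing matches).
print(string.match(subject, "apples"))   --> apples
print(string.match(subject, "pears"))    --> nil
```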

2. Positional characters

In patterns, there are two positional characters: ^ and $. A ^ at the very start of a pattern anchors the match to the start of the subject string. A $ at the very end of a pattern anchors the match to the end of the subject string.
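A short sketch of both anchors follows. The helper name `matches` and the sample strings are assumptions made for this example, not part of the original guide.

```lua
-- Hypothetical helper: true when the pattern matches somewhere in the subject.
local function matches(subject, pattern)
  return string.find(subject, pattern) ~= nil
end

print(matches("apples and pears", "^apples"))  --> true
print(matches("pears and apples", "^apples"))  --> false
print(matches("pears and apples", "apples$"))  --> true
-- Using both anchors forces the pattern to cover the whole subject string.
print(matches("apples", "^apples$"))           --> true
print(matches("green apples", "^apples$"))     --> false
```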

3. Common matching characters

  • . : Any character
  • %w : Any character that is alphanumeric (a letter, upper or lower case, or a digit from 0 to 9)
  • %W : Any character that is NOT alphanumeric
  • %d : Any character that is a digit (from 0 to 9)
  • %D : Any character that is NOT a digit
  • %s : Any character that is whitespace (" ", tab, line breaks)
  • %S : Any character that is NOT whitespace
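One way to see what each class matches is to replace every match with a marker using string.gsub. This is only an illustrative sketch; the subject string is invented, and the extra parentheses discard gsub's second return value (the replacement count).

```lua
local subject = "Room 42, floor 3"

print((string.gsub(subject, "%d", "#")))   --> Room ##, floor #
print((string.gsub(subject, "%s", "_")))   --> Room_42,_floor_3
print((string.gsub(subject, "%W", "")))    --> Room42floor3
print((string.gsub(subject, "%D", "")))    --> 423
```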

4. Ranges, character sets and logic

Inside a pattern, you can put a character set, which may contain ranges, in square brackets.

  • [ab] will search for a character that is either "a" OR "b"
  • [^ab] will search for a character that is neither "a" nor "b"
  • [a-i] will search for a character that is between "a" and "i" in the alphabet
  • [6-8] will search for a character that is between "6" and "8"
  • [a-zA-Z0-9] will search for a character that is between "a" and "z", OR between "A" and "Z", OR between "0" and "9"

The previous pattern is the same as %w (in the default C locale).

If you want to match the [ or ] char literally, put a "%" before it.
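The sets above can be tried out with string.match, which returns the first piece of text that fits. The subject strings are made up for this sketch.

```lua
print(string.match("good luck", "[dl]uck"))   --> luck
print(string.match("abc", "[^ab]"))           --> c
print(string.match("x7y", "[6-8]"))           --> 7
print(string.match("?!k9", "[a-zA-Z0-9]"))    --> k
-- Escaped brackets are matched literally.
print(string.match("list[1]", "%[%d%]"))      --> [1]
```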

5. Quantifiers

Quantifiers are special characters that go after a single char, a common matching char, or a charset (including ranges).

  • * will match the previous item 0 or more times.
  • + will match the previous item 1 or more times.
  • ? will match the previous item 0 or 1 times.
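A brief sketch of the three quantifiers, using invented subject strings. Note that * can succeed with an empty match, which is why the second call returns an empty string rather than nil.

```lua
print(string.match("aaab", "a*"))          --> aaa
-- "a*" matches zero "a"s at position 1, so the result is the empty string.
print(string.match("baaa", "a*") == "")    --> true
print(string.match("baaa", "a+"))          --> aaa
print(string.match("colour", "colou?r"))   --> colour
print(string.match("color", "colou?r"))    --> color
print(string.match("id: 1234", "%d+"))     --> 1234
```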

6. Capture groups

Without capture groups, the whole matched part of the subject string is returned. Sometimes only certain parts are required. Capture groups define what to return. With multiple capture groups, multiple values are returned, one per group, in order. To make a capture group, enclose what you want to return in parentheses. If you want to match the ( or ) char literally, put a "%" before it.
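The difference between no capture, one capture and several captures can be sketched with string.match. The sentences and variable names below are assumptions chosen for the example; captured values always come back as strings.

```lua
-- No capture group: the whole match is returned.
print(string.match("price: 15 USD", "%d+ %w+"))     --> 15 USD

-- One capture group: only the captured part is returned.
print(string.match("price: 15 USD", "(%d+) %w+"))   --> 15

-- Two capture groups: two values are returned, in order.
local name, age = string.match(
  "My name is Ann and I'm 31 years old",
  "My name is (%w+) and I'm (%d+) years old")
print(name, age)                                    --> Ann   31

-- Escaped parentheses are literal; the inner pair is the capture.
print(string.match("f(x)", "%((%w)%)"))             --> x
```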

7. Examples

Match a string that...

  • contains "apples" anywhere in it : apples
  • contains "apples" at the start of the string : ^apples
  • contains "apples" at the end of the string : apples$
  • has any 3 characters between "app" and "les" : app...les
  • contains a letter or a digit at any position : %w
  • contains a digit at the end of the string : %d$
  • contains either "duck" or "luck" at any position : [dl]uck
  • starts with a character that is not a digit between 0 and 4 : ^[^0-4]
  • contains "]" at the end of the string : %]$
  • is either "apple" or "apples" : ^apples?$
  • is "a", or "aa", or "aaa", or "aaaa"... : ^a+$
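A rough way to check patterns like these is a small table of test cases run through string.find. The helper name `matches` and every subject string are made up for this sketch. The last four cases anchor the pattern with both ^ and $ so that the whole string has to match, which is what "is" requires.

```lua
-- Hypothetical helper: true when the pattern matches somewhere in the subject.
local function matches(subject, pattern)
  return string.find(subject, pattern) ~= nil
end

-- Each entry: subject, pattern, expected result.
local checks = {
  { "green apples", "apples",     true  },
  { "green apples", "^apples",    false },
  { "green apples", "apples$",    true  },
  { "appXYZles",    "app...les",  true  },
  { "room 4",       "%d$",        true  },
  { "lucky",        "[dl]uck",    true  },
  { "7 dwarfs",     "^[^0-4]",    true  },
  { "3 pigs",       "^[^0-4]",    false },
  { "array[1]",     "%]$",        true  },
  { "apples",       "^apples?$",  true  },
  { "applesauce",   "^apples?$",  false },
  { "aaaa",         "^a+$",       true  },
  { "aab",          "^a+$",       false },
}

for _, check in ipairs(checks) do
  local subject, pattern, expected = check[1], check[2], check[3]
  local found = matches(subject, pattern)
  assert(found == expected, "unexpected result for " .. pattern)
  print(pattern, subject, found)
end
```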
@itsvishal7

Is there a way to find out how many capturing groups are present in any pattern?

I am generating the pattern for a given string on the fly, and it may contain any number of capture groups.

For example:

sample1=[[Hello (.*)
My name is (.*)
Nice to chat with (.*)
]]

sample2=[[(.*) is great]]

function findpattern(contents, pattern)
  -- for each match in `string.gmatch(contents, pattern)` do
  --   print(each captured group in this match)
end

findpattern(contents, sample1)
findpattern(contents, sample2)


@itsvishal7

string.gmatch() returns an iterator function yielding individual matches;
I needed to collect these unpacked matches into a table.

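local function get_matches(contents, regex)
    -- gmatch returns an iterator; each call yields the captures of the next match
    local cal = string.gmatch(contents, regex)

    local results = {}
    local match = { cal() }
    -- an empty table means the iterator returned nil: no more matches
    while #match > 0 do
        table.insert(results, match)
        match = { cal() }
    end

    return results
end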

@spr2-dev
Author

At first I misinterpreted your question: I thought you were asking how to find out, given just a pattern, how many capture groups it has. For example, "My name is (%w+) and I'm (%d+) years old" passed into the magic function would return 2.

But yes, if you want to iterate through all the matches to get their count (and optionally the results themselves), you'd use a loop like the one you have now.
A simplification would be to use a for ... in loop, which takes an iterator (like what pairs and gmatch return) and automatically continues the loop until the iterator returns nil. The iteration variable is populated for you and you can use it inside the loop:

local results = {}
-- m receives the first capture of each match (or the whole match if the pattern has no captures)
for m in string.gmatch(subject, pattern) do
    table.insert(results, m)
end
return results

That way you don't have to call the iterator yourself and check whether there are any more items to iterate over.
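When the number of capture groups is known in advance, the for ... in loop can bind one variable per capture. The sketch below assumes a made-up "key=value" subject and a two-capture pattern. When the number of captures is not known ahead of time, packing each iterator call into a table, as in get_matches above, remains the more general approach.

```lua
local subject = "a=1, b=22, c=333"
local found = {}

-- One loop variable per capture group, filled in order for every match.
for key, value in string.gmatch(subject, "(%w+)=(%w+)") do
  found[#found + 1] = { key, value }
  print(key, value)
end
--> a   1
--> b   22
--> c   333
```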
