REGEX IN TEXTASTIC

and Workflow iOS apps

ver. 160719_c

NOTE: this is a backup of my own notes.

switch on "Regular Expressions"
now you can go back and use regular expressions
start by finding a number

`([0-9])`

and replacing with

`Number $1`

you'll of course need numerals in your document for this to work
once you've performed the search you'll see a list of found items which you can replace individually or all at the same time, and the found items will also be highlighted in your document
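
A quick way to sanity-check this find/replace outside the app, sketched in Python. Assumptions: Python's `re` module stands in for the app's regex engine (they are not the same engine), and the sample text is made up. Python writes the back-reference as `\1` where the apps use `$1`.

```python
import re

sample = "Room 4, floor 12"  # made-up sample text

# Same search as above: each single digit is one match.
result = re.sub(r"([0-9])", r"Number \1", sample)
print(result)
```

Because `([0-9])` matches one digit at a time, the 12 gets replaced twice. Searching for `([0-9]+)` would treat it as one number.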

Above: the initial info at the top is credited to a sketchyTech post dated Nov 16, 2012.

Backup of Notes - to Remind myself of stuff

Purpose: only to Remind myself of stuff. No Guarantees!! Regular Expressions can radically change your files. Great when it works as expected. Back up your stuff.

No Guarantees!

NON-GREEDY CAPTURE `*? +?`

Use *? or +? for non-greedy capture.
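
A small sketch of the difference between greedy and non-greedy, in Python. The sample string is made up, and Python's `re` is only a stand-in for the app engines, but `*?` behaves the same way in both as far as I know.

```python
import re

sample = "<b>one</b> and <b>two</b>"  # made-up sample text

greedy = re.findall(r"<b>.*</b>", sample)   # grabs as much as possible
lazy = re.findall(r"<b>.*?</b>", sample)    # stops at the first closing tag

print(greedy)
print(lazy)
```

The greedy search returns one long match covering both tags; the non-greedy search returns each tag pair separately.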

General observations - regex in Textastic, Workflow iOS apps

Regex [ ] ( ) { } are NOT escaped! You are lucky.

For multi-line search use [\s\S]* for the wildcard, in Textastic at least.

Some REGEX EXAMPLES

Keep in mind, in iOS apps with REGEX like Textastic or Workflow:

*? or +? are non-greedy.
Tokens [ ] ( ) { } are not escaped.
any escaped brackets below mean actual text brackets.
use [\s\S]* for crude multi-line search.
Sometimes the caret ^ does not catch the line start in Workflow; use Unicode [\u000A] instead. Capture it with () so the replace can keep the new line.

(if viewing this as plain text, please ignore the markdown formatting, hash, tic, and star for ex.)

Get a string with maximum 111 chars but ending on a word.

In Workflow, Match text

`\b.{1,111}\b`

and take the first of the match list
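
The same idea sketched in Python, as a rough check of what the first match looks like. The filler text is made up, and `re.findall` is my stand-in for Workflow's Match Text list.

```python
import re

text = "word " * 40  # 200 characters of made-up filler

matches = re.findall(r"\b.{1,111}\b", text)
shortened = matches[0]  # first item of the match list, as in Workflow

print(len(shortened))
print(repr(shortened[-10:]))
```

The first match is never longer than 111 characters and is cut at a word boundary, though it can end with a trailing space.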

Trim string or line (workflow example)

Find:

`^\s*(.*?)\s*$`

Replace:

`$1`
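
A Python sketch of the trim, assuming the intended search is `^\s*(.*?)\s*$` (optional leading whitespace, a lazy capture, optional trailing whitespace). The sample line is made up; `\1` is Python's spelling of `$1`.

```python
import re

line = "   hello world  "  # made-up sample with spaces on both sides

# The lazy (.*?) leaves the trailing whitespace for \s* to swallow.
trimmed = re.sub(r"^\s*(.*?)\s*$", r"\1", line)
print(repr(trimmed))
```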

Regex Lookahead (Workflow app)

get 130 chars of some text, but only enough to end in a space or the end of a string, thus not ending in the middle of a word.

MATCH TEXT
`^.{0,130}(?= |$)`

Match Text returns a list of matches, you want the 1st match, being the shortened subset of text.
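
A Python sketch of the lookahead, assuming the intent is "up to 130 characters, followed by a space or the end of the string", written as `^.{0,130}(?= |$)`. The filler text is made up.

```python
import re

text = "word " * 40  # 200 characters of made-up filler

# Take at most 130 characters, then back off until the next
# character is a space (or the string ends).
match = re.search(r"^.{0,130}(?= |$)", text)
shortened = match.group(0)

print(len(shortened))
print(repr(shortened[-10:]))
```

The lookahead is not part of the match, so the result ends on the last whole word rather than on the space.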

NOTE: in Workflow this only works with plain Text, not Rich text. Some plain text copied to the clipboard reports its type as Rich text even though it is plain. There are essentially 2 types of text: "Text" is plain, and "Rich text" typically has formatting, but not always, as mentioned. The type names share the word "text" and differ only in case, Text vs text. To capture either, use a case-insensitive Get Type check for lower-case text and then a "Get text from input", or use Match Text [Tt]ext and then Get Object of Class NSString, to ensure the input is plain Text before the lookahead regex. This works for either type of text.

By the way, not relevant here, but if a Match Text list of results is zero, that is obviously no matches.

Trim tab from start of line in Workflow

Sometimes the caret ^ does not work in Workflow.

Find:

`(\u000A)\t`

Replace:

`$1`

regex Exif timestamp to pretty (workflow example)

From 2015:09:10 09:50:59 to 2015-09-10 Unnecessary but cool.

`^(\d{4}):(\d\d):(\d\d)\s(.*)$`

Replace:

`$1-$2-$3`
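
The same replace sketched in Python as a desktop check (`\1` is Python's `$1`). The time part is captured by the fourth group and simply not used in the replacement, so it is dropped.

```python
import re

stamp = "2015:09:10 09:50:59"  # Exif-style timestamp from the example above

pretty = re.sub(r"^(\d{4}):(\d\d):(\d\d)\s(.*)$", r"\1-\2-\3", stamp)
print(pretty)
```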

I extracted text from a PDF file, the footnote numbers should be in brackets.
Find a footnote in NHL text, example:
find text [28]

`\[([0-9])([0-9])\]`

FIND a footnote adjacent to a period, example text
.[28]
then (REPLACE Action) add a space between them, ex.
. [28]

Note the text period is escaped, but regex dot would not be.

Search:

`\.\[([0-9])([0-9])\]`

Replace:

`. \[$1$2\]`
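
A Python sketch of the same footnote fix. The sentence is made up. In Python's replacement string the brackets do not need escaping, which is a difference from the replace shown above.

```python
import re

sample = "the season ended.[28] The next one began."  # made-up text

# Escaped \. and \[ \] in the search are literal characters.
fixed = re.sub(r"\.\[([0-9])([0-9])\]", r". [\1\2]", sample)
print(fixed)
```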

Find a comma adjacent to a letter, ex ,x then insert a space between them, like , x

NOTE: careful, this ALSO CATCHES NUMBERS, WATCH FOR LEGIT COMMAS IN NUMBERS 7,000

Search:

`,(\w)`

Replace: ?????

`, $1`
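
A Python sketch showing the number problem, plus one possible way around it. The stricter class `[^\W\d]` (a word character that is not a digit) is my own suggestion, not something tested in Textastic or Workflow.

```python
import re

sample = "red,green and 7,000"  # made-up text with a legit comma in a number

loose = re.sub(r",(\w)", r", \1", sample)         # also splits 7,000
strict = re.sub(r",([^\W\d])", r", \1", sample)   # skips commas before digits

print(loose)
print(strict)
```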

Find unbracketed footnote AT START OF LINE, ex 28 (we know it needs brackets).
In other words, find two digits at the start of a line. I manually did the replace, these are the search:

`^([0-9][0-9]) (\w\w)`

`^([0-9][0-9])`

A Multiline search `[\s\S]*` example

Search:

`(Thread 0 name)[\s\S]*(^Thread 2 name)`

Replace:

`$2`

You just wiped out a lot of text leaving only the last part.
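
A Python sketch of the same wipe, on a made-up crash-log fragment. One difference to note: Python needs the `re.MULTILINE` flag for `^` to match at line starts inside the text.

```python
import re

log = (
    "Thread 0 name: main\n"
    "frame a\n"
    "Thread 1 name: worker\n"
    "frame b\n"
    "Thread 2 name: net\n"
    "frame c\n"
)  # made-up sample log

# [\s\S]* crosses line breaks; only the second capture is kept.
result = re.sub(
    r"(Thread 0 name)[\s\S]*(^Thread 2 name)", r"\2", log, flags=re.MULTILINE
)
print(result)
```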

Blank the rest of a massive file maybe tip.

Example, you get a massive html file, you do not need it all.

Put an obvious word phrase like __crap__
at a location to use as a very easy start for multiline pattern to blank the rest of file.

Search:

`(__crap__)[\s\S]*`

Replace:

`$1`

WARNING THAT WILL BLANK ALL TO THE END OF FILE

find literally "`\n`" (without the quotes) and replace them with actual new lines.

yes, Search for text that is a backslash n: \n

Note: below i did use the symbol \n in replace new line feed as opposed to \r simply because \n did make new lines. Maybe \r would have made new lines too, I don't know. (If it's an issue i can normalize line breaks later).

FIND

`(\\n)`

REPLACE WITH

`\n`
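
The same find/replace sketched in Python. The raw string `r"..."` keeps the backslash and the n as two literal characters, which is the situation described above.

```python
import re

raw = r"line one\nline two"  # contains a literal backslash followed by n

# \\n in the pattern means "a real backslash, then the letter n".
fixed = re.sub(r"\\n", "\n", raw)
print(fixed)
```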

ADD MORE EXAMPLES

REGEX1
USING TEXTASTIC

SEARCH: a line start followed by 4 spaces, there are four spaces in the empty paren:

`(^)(    )`

REPLACE: change that to a line start followed by TAB

`$1\t`

Next task: how do you replace each occurrence of `(    )` (four spaces) with an equal number of tabs, in one single regex expression? I DO NOT KNOW.
So eventually I manually searched for up to 14 groups of 4 spaces, replacing with $1 all the way up to $14, and worked backward.
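
Outside the app, one way to do the whole conversion in a single pass is a replacement function instead of a replacement string. This is a Python sketch only; the function name `spaces_to_tabs` and the `width` parameter are made up for illustration, and nothing like this callback is available in Textastic's search/replace as far as I know.

```python
import re

def spaces_to_tabs(text, width=4):
    """Turn each full group of `width` leading spaces into one tab."""
    def convert(match):
        spaces = match.group(0)
        tabs, rest = divmod(len(spaces), width)
        # Leftover spaces that do not fill a whole group are kept as-is.
        return "\t" * tabs + " " * rest

    return re.sub(r"^ +", convert, text, flags=re.MULTILINE)

sample = "        <p>\n    <b>\n <i>"  # made-up lines with 8, 4 and 1 spaces
print(repr(spaces_to_tabs(sample)))
```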

After that, some stray spaces were left over before the first non-space character.
SEARCH

`(^)(\t)( )+(\S)`

REPLACE

`$1$2$4`

However...

Then the text part of the page was indented 2 tabs too much.
SEARCH

`(^)(\t{4})(\t{2})`

REPLACE

`$1$2`

(oh how i wish for desktop "reformat code" functions, like gVim, etc. in this iOS app).
Stuff in head of html file needs at least 1 tab (formatting), pressed replace (in search replace popup) for each individual match until i got to body tag.

`(^)(\S)`

REPLACE

`$1\t$2`

Then inside text part, another tag pushed it out excessively.
SEARCH

`(^)(\t{6})(\t+)(rel)`

REPLACE

`$1$2$4`

Textastic, put an empty line before a script tag where none exist.

2 SCENARIOS:

SCENARIO 1

find a starting script tag that is not at the start of a line. Make it the start of a line with an extra empty line above it.

example find this text:
</ul><script type="text/javascript">

`([^\n])(<script)`

REPLACE

`$1\n\n$2`

Example result text after replace action:
</ul>

<script type="text/javascript">

SCENARIO 2

This is for if script tag is at start of line but there is not an empty line above it.

`([^\n])(\n)(<script)`

REPLACE

`$1$2$2$3`

place a line break after a closing script tag.
look for closing script tag followed by a non-line break, example find text,
</script></div>

`(</script>)([^\n])`

replace with 2 line breaks, so you get an empty line after the script.
REPLACE

`$1\n\n$2`

Result example:
</script>

</div>

NOTE: if you run the regex that places a blank line before a script tag, THEN run this regex, you get what you want (a blank line before and after the script block), as a kludge.

This search only catches a closing script tag with no line break after it. It does not catch a closing tag that has a line break but is missing the desired blank line after it. In other words:
</script>
<script>

You would want a blank line there after the closing tag.

But apparently in Textastic we can only check for the </script></div> case in one run, and the
</script>
<script>
case in another separate search.

What we really want is to check how many line breaks are there (none or 1), so we know how many to add to reach the 2 line breaks (\n) we want. That does not seem to work in Textastic in a single search.
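
One possible single-pass approach, sketched in Python: swallow however many line breaks follow the closing tag and always write exactly two back. The function name `blank_line_after_script` is made up, and whether the same search behaves this way in Textastic is untested here.

```python
import re

def blank_line_after_script(html):
    # \n* matches zero, one or more line breaks after the tag;
    # the replacement then writes exactly two.
    return re.sub(r"(</script>)\n*", r"\1\n\n", html)

print(repr(blank_line_after_script("</script></div>")))
print(repr(blank_line_after_script("</script>\n<script>")))
```

Text that already has the blank line comes out unchanged, so the replace can be run more than once.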

Put a blank line between these lines

(Note: much of the whitespace is tabs.)

`(\t\t\t\w.*\n)(\w)`

THIS SEARCH WORKED TOO

`(\t\w.*\n)(\w)`

REPLACE

`$1\n$2`

SAMPLE:
Start text. You have to see this in plain text (or GitHub markdown triple-tick style is OK) to get the point.

go			https://duckduckgo.com/?q=%21%20%s														Open first result (DuckDuckGo)
b			https://duckduckgo.com/?q=%21%s															Bang search (DuckDuckGo)
grep			https://www.cueup.com/?q=%s&fq=1														Greplin

End up with:

go			https://duckduckgo.com/?q=%21%20%s														Open first result (DuckDuckGo)

b			https://duckduckgo.com/?q=%21%s															Bang search (DuckDuckGo)

grep			https://www.cueup.com/?q=%s&fq=1														Greplin

Put some markup asterisks around the name of the book

Note Editorial can do this automatically for you

From this

~ Rainbows End by Vernor Vinge
Glasshouse by Charles Stross

To this

*~ Rainbows End* by Vernor Vinge
Glasshouse by Charles Stross

`\* ([A-Z ~']*) by`

REPLACE

`* *$1* by`

Add a tab for code indentation purposes

`^\t([^\t])`

REPLACE

`\t\t$1`

Example: ( tabs do not show always in MD examples)

`	$someVar `

to   
`		$someVar`

IN WORKFLOW APP: Simplify timestamp into a simpler text.

The clipboard item has a name (Get Name reports it as text) that looks like a timestamp. Sometimes a real extension is included; if so, keep that extension in the replace. Example starting text: Clipboard Apr 16, 2015, 12.12 PM or Clipboard Apr 16, 2015, 1.08 PM, and sometimes with a useful extension: Clipboard Apr 16, 2015, 2.00 PM.txt.

example find text like this:
Clipboard Apr 16, 2015, 12.12 PM
Clipboard Apr 16, 2015, 1.08 PM
**AND sometimes like**
Clipboard Apr 16, 2015, 2.00 PM.txt

`(^Clip.*\.\d\d+\s[AP]M)([.].{2,4})?`

REPLACE

`Clipboard$2`

Example result text after replace action:
Clipboard
Clipboard.txt
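
A Python sketch of the same replace. The helper name `simplify` is made up. The optional second group is empty when there is no extension, so the sketch falls back to an empty string explicitly instead of relying on how the engine treats an unmatched `$2`.

```python
import re

PATTERN = re.compile(r"(^Clip.*\.\d\d+\s[AP]M)([.].{2,4})?")

def simplify(name):
    # Group 2 is None when the name has no extension.
    return PATTERN.sub(lambda m: "Clipboard" + (m.group(2) or ""), name)

for name in [
    "Clipboard Apr 16, 2015, 12.12 PM",
    "Clipboard Apr 16, 2015, 1.08 PM",
    "Clipboard Apr 16, 2015, 2.00 PM.txt",
]:
    print(simplify(name))
```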

Workflow app. Basic url pattern matches.

Basic match text for a URL pattern:

`(^http[s]?[:/]{3})`

FYI, note you CAN use a colon : unescaped
^https?:[/]{2}

`(^https?:[/]{2})`

Match text for a .jpg URL pattern:

`(^http[s]?[:/]{3}.*[.]jp[e]?g)`
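
A Python sketch comparing the three patterns on made-up URLs. It also shows that `[:/]{3}` is looser than it looks: it accepts any three characters drawn from colon and slash, in any order.

```python
import re

loose = re.compile(r"(^http[s]?[:/]{3})")
strict = re.compile(r"(^https?:[/]{2})")
jpg = re.compile(r"(^http[s]?[:/]{3}.*[.]jp[e]?g)")

# Made-up test strings; the last one is deliberately malformed.
urls = ["https://example.com/a.jpeg", "http://example.com/b.png", "http/:/odd"]

for url in urls:
    print(url, bool(loose.search(url)), bool(strict.search(url)), bool(jpg.search(url)))
```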

Workflow app, add line-breaks between paragraphs.

Do it at the end of each paragraph, using the closing punctuation (or last word character) as the place to add the break.

`([?!.,":'\w)”])(\n)`

REPLACE

`$1$2$2`

WorkFlow, to recognize beginning of line, use Unicode instead of ^ Regex.

`[\u000A]`

In practice, if you do a replace and want to keep the line as its own line, you must capture the line feed with ([\u000A]) and put $1 first thing in your replace.

`([\u000A])`

Remove one tab from beginning of line, Regex:

SEARCH:

`([\u000A])\t`

REPLACE:

`$1 whatever`

The Workflow app file-size formatting notes below are obsolete.

IN WORKFLOW APP. quick format size.

From an un-rounded, unformatted number then use replace text to format, can roughly format all numbers in a text box this way.

Example, first via workflow do Raw image size: 1462321 , divide by 1000, then:

example starting text:
1462.321

example ending text (you can format 5 unrounded numbers at one time like this):
1,462

`(\d)(\d{3})(\.[0-9].*)`

REPLACE

`$1,$2`

REGEX SUBSTITUTE B in workflow app

Displaying item SIZE as kilobytes in a workflow. Can take as many as 4 separate steps. Reduce it to 2 steps:

Here we use 2 steps.

1st: FORMAT raw size To 1 digit. example below.

2nd: use Replace Text:
SEARCH:

`(,\d\d\d\.\d)$`

REPLACE:

`k`

2nd: better use Replace Text:
SEARCH:

`(,\d{3}\.\d)(\s|$)`

REPLACE:

`k$2`

Example start text:
5,618,706.0
Desired finish text:
5,618k

This is variant B. Typically, item size is the only number you want divided by 1,000 prior to display, to show it in k. The two steps above replace these four Workflow steps:

to divide by 1000
and round to zero digits simply to chomp the decimals
and format number
and add a k

It is not trivial to do this perfectly with one workflow regex, given an unknown number of digits AND wanting to place the letter k simultaneously. Here, Format step takes care of all commas, and ensuring a known trailing pattern which we replace with the letter k.
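
A Python sketch of the second step, assuming the Format step has already produced comma grouping and exactly one decimal digit, and with the decimal point escaped as `\.`. The helper name `to_k` is made up.

```python
import re

def to_k(formatted):
    # Drop the last comma group and the decimal, then append k.
    # \2 keeps whatever followed the number (a space, or nothing at the end).
    return re.sub(r"(,\d{3}\.\d)(\s|$)", r"k\2", formatted)

print(to_k("5,618,706.0"))
print(to_k("size 5,618,706.0 bytes"))
```

A number below 1,000 has no comma group, so it is left untouched.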

REGEX SUBSTITUTE A in workflow app

Strip last 3 digits off a series of digits. Use Replace Text (the replace is blank):

SEARCH:

`(\d\d\d)$`

REPLACE:

Example start text:
5618706
Desired end text:
5618

Saves one step as it substitutes for these two workflow steps:

divide by 1000
and round to zero digits to chomp the decimals.

This is variant A, perhaps useful for formatting the size of an item as kilobytes in Workflow using fewer steps.

Replace only the size item in existing pic data, for example if you made a jpg and the size is now smaller.

    START
    976k 1,887 x 677
    REGEX
    ^([.\d,]+[k]\b)
    REPLACE:
    With new size

Textastic, MOVE THE CLOSING SPAN TO AFTER THE PRICE, CATCHING PRICES THAT MAY INCLUDE CENTS

`</span>.? .? .? \s+(\d+)(\.?)(\d?\d?)</p>`

REPLACE

`    $1$2$3</span></p>`

incomplete Example regex i tried for this

EXAMPLE 1 FROM, TO

`<span class="style_1">Insalata di Cesare</span>    8</p>`

`<span class="style_1">Insalata di Cesare    8</span></p>`

This next price had cents

EXAMPLE 2 FROM

`<span class="style_1">Garlic Bread</span>    5.50</p>`

EXAMPLE 2 TO

`<span class="style_1">Garlic Bread    5.50</span></p>`
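
A Python sketch of one pattern that handles both examples above. This is my own variant, not the regex tried in the notes: it captures the whitespace and the price (with optional two-digit cents) and moves the closing span behind them. The helper name `move_span` is made up.

```python
import re

def move_span(html):
    # \1 = the spaces before the price, \2 = the price, with or without cents.
    return re.sub(r"</span>(\s+)(\d+(?:\.\d\d)?)</p>", r"\1\2</span></p>", html)

print(move_span('<span class="style_1">Insalata di Cesare</span>    8</p>'))
print(move_span('<span class="style_1">Garlic Bread</span>    5.50</p>'))
```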

add better examples
