DV8FromTheWorld · December 22, 2023 17:16
diff --git a/italics-regex-explanation.js b/italics-regex-explanation.js
 // Background:
 //   Word boundaries are 0 width assertions that are between \w (word characters)
 //   and \W (non-word characters) or start/end of the string.
 //   So, in a scenario like "-abc" there are 2 word boundaries:
 //     1. between the '-' and the 'a'
 //     2. between the 'c' and the end of the string
 //
 // Regex explanation:
 // ^\\b_            -> Start matching by ensuring that we are at the beginning of the match, and that the
 //                       underscore we are using as the sentinel to start the italics boundary is preceded
 //                       by a word boundary to ensure we aren't consuming an underscore from a previous word.
 // (...)            -> A capture group to capture the fully matched content that should be italicized
 //   (?: ...)+?     -> A non-capturing group. This match is what captures all the content between
 //                       the 2 underscores that denote the italics boundaries.
 //
 //                       Given the trailing +?, this non-capturing group must match
 //                       AT LEAST one time, but it can match multiple times. This allows us to properly
 //                       consume as many underscores as we can. The trailing ? means that we will only
 //                       consume as much as we need while still allowing the closing underscore to be found
 //                       by the later _ match.
 //
 //                       Within the non-capturing group, we have multiple matchable patterns that
 //                       can match content.
 // -----------------------[ (START) non-capturing group matching pattern branches ]------------------------
 //   _[_(]          -> Pattern which allows matching of __ and _(
 //                       The __ is needed so that the underline rule can work (__content__)
 //                       The _( is needed because we are using word boundaries (\b) to know when we should
 //                         and shouldn't capture the last _. The ( character creates a word boundary after _,
 //                         but it is valid in URLs, which is where _ is also used, so we don't want to treat
 //                         _( as a valid break case. Example: https://en.wikipedia.org/wiki/Endemic_(epidemiology)
 //   \\\\[\\s\\S]   -> Pattern which allows matching of '\{anyCharacter}'.
 //                       This allows for the markdown escape rule to work.
 //                       This also powers '\_' acting as an escaped underscore.
 //   (?<!_)\\B_\\B  -> Pattern which ensures that we do not prematurely break _within_ a word that contains
 //                       underscores while the word is in italics.
 //                       For example, without this rule, this does not render correctly: _my_cool_thing_
 //
 //                       So, to break this pattern down:
 //                         (?<!_)  -> A negative look behind pattern to ensure that the PREVIOUS character
 //                                      was NOT an underscore. This check exists to make sure that the
 //                                      underline rule will work.
 //                                      Without it, the following examples would fail:
 //                                        - __yo__
 //                                        - ~~_**__Google__**_~~
 //                         \\B     -> Assert that the following underscore is part of an existing word
 //                                      by ensuring that the previous character and the next underscore do
 //                                      not share a word boundary (i.e., they are part of one continuous word).
 //                         _       -> Match the _ character literally
 //                         \\B     -> Assert that the item after the underscore is not a word boundary. This ensures that
 //                                      the character that follows the underscore is a word character and thus
 //                                      we are continuing a word
 //    [^\\\\_]      -> This pattern captures ALL CHARACTERS that are not '\' or '_'
 //                       This is the pattern that captures all of the content between the 2 underscore
 //                       sentinels that act as boundaries for the italics.
 //
 //                       This pattern is designed to NOT capture '\' or '_' characters because those
 //                       are important characters that could define the end of the boundary or be used
 //                       in an escape.
 //
 //                       As such, the patterns that come BEFORE this pattern are specializations
 //                       that include '\' or '_' in specialized ways to ensure those characters can be
 //                       captured in the italics when necessary as this pattern purposely doesn't capture them.
 // -----------------------[ (END) non-capturing group matching pattern branches ]------------------------
 // _                -> Match the _ character literally. This is the ending boundary for the italics.
 // (?! ...)         -> A negative lookahead. Make sure that the content immediately following the
 //                       literal _ character (that is acting as the ending boundary) does not match
 //                       the provided pattern
 //   [(]            -> A pattern to check if the immediate next character is the '(' character.
 //                       In conjunction with the negative lookahead, we are making sure that the
 //                       character immediately following the ending sentinel _ is NOT a '('.
 //
 //                       We need this to ensure that the '_[_(]' pattern from the above non-capturing group
 //                       can properly detect the _(. Without this, the outer pattern will match on
 //                       the _( before the non-capturing group can because the non-capturing group is
 //                       defined using the non-greedy '?' (which it needs to be to avoid capturing too much).
 // \\b              -> Lastly, similarly to the starting sentinel, make sure that the underscore we
 //                       captured as our ending sentinel is divided from other following content by
 //                       a word boundary so that we are not consuming part of the way into a word that
 //                       contains an underscore.
 "^\\b_((?:_[_(]|\\\\[\\s\\S]|(?<!_)\\B_\\B|[^\\\\_])+?)_(?![(])\\b" +