I am pretty sure that I have found the solution (16 hours later), and it is significantly more complicated than I could ever have imagined. But, to hopefully help someone in the future, I'm going to explain how I got there...
Problem #1: How do you represent an "escaped character" in a Regular Expression?
Solution #1: Match any character, use a positive lookbehind assertion, (?<=\\), to require that the character just matched was a single backslash, and then match the character it escapes.
Example #1:
>>> import re
>>> text=r'\f\o\o'
>>> regex = r'(.(?:(?<=\\).))'
>>> i = re.finditer(regex, text)
>>> for m in i: print(m.groups())
...
('\\f',)
('\\o',)
('\\o',)
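As a side note, the lookbehind is not strictly needed for this case. The sketch below (Python 3; the `plain` pattern is my own alternative, not part of the solution above) compares the Example #1 pattern with a plain "literal backslash followed by any character" pattern on the same text:

```python
import re

text = r'\f\o\o'

# Pattern from Example #1: consume any character, assert with a lookbehind
# that the consumed character was a backslash, then take the next character.
lookbehind = r'(.(?:(?<=\\).))'

# Alternative (not from the post): a literal backslash plus any character.
plain = r'(\\.)'

found_lookbehind = [m.group(1) for m in re.finditer(lookbehind, text)]
found_plain = [m.group(1) for m in re.finditer(plain, text)]
print(found_lookbehind)  # ['\\f', '\\o', '\\o']
print(found_plain)       # ['\\f', '\\o', '\\o']
```

Both forms describe "a backslash followed by one non-newline character", so they should agree in general; the lookbehind version just spells it out the long way.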
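Problem #2: A lookbehind cannot go inside square brackets (a character class only lists single characters), so how do you build a "set" that accepts both escaped characters and non-terminators?
Solution #2: Put the escaped-character expression and a negated character class into one repeated group, and make each of them optional with "?": (?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+. The backslash itself is not in the excluded set, so the class consumes it, and the lookbehind branch then accepts the character it escapes, even a terminator such as "/" or ".".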
Example #2:
>>> text=r'{foo:bar}{a:\/foo\.bar}'
>>> regex = r'(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})'
>>> i = re.finditer(regex, text)
>>> for m in i: print(m.groups())
...
('{foo:bar}',)
('{a:\\/foo\\.bar}',)
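A more common way to say "escaped character or non-terminator" is alternation inside the repeated group instead of two optional pieces. Here is a sketch of that idiom (Python 3; the `brace` pattern is my assumption, not the pattern from Example #2). On this sample text it finds the same two matches:

```python
import re

text = r'{foo:bar}{a:\/foo\.bar}'

# Alternative sketch: inside the braces, accept either an escaped character
# (backslash plus any character) or any character that is not a terminator
# ({ } [ ] / .) and not a bare backslash.
brace = r'(\{(?:\\.|[^{}\[\]/.\\])+\})'

matches = [m.group(1) for m in re.finditer(brace, text)]
print(matches)  # ['{foo:bar}', '{a:\\/foo\\.bar}']
```

It is not guaranteed to agree with the original on every input. For instance, the original pattern accepts `{a\}`, while this version reads `\}` as an escaped brace and then finds no closing one.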
Problem #3: How do you keep multiple occurrences from matching?
Solution #3: Anchor the match explicitly. "^" and "$" force the pattern to cover the entire string, so a string containing two brace expressions no longer matches at all.
Example #3 (note that I now have to remove the second brace expression from the text, because the anchored pattern rejects the combined string):
>>> text=r'{a:\/foo\.bar}'
>>> regex = r'^(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})$'
>>> i = re.finditer(regex, text)
>>> for m in i: print(m.groups())
...
('{a:\\/foo\\.bar}',)
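On Python 3.4 and later, `re.fullmatch` does the same job as wrapping the pattern in "^" and "$". This sketch (Python 3; the variable names are my own) checks the Example #3 pattern both ways and shows one subtlety: "$" also matches just before a single trailing newline, while `fullmatch` does not.

```python
import re

# Brace-expression pattern from Example #3, without the anchors.
ax = r'(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})'

single = r'{a:\/foo\.bar}'
double = r'{foo:bar}{a:\/foo\.bar}'

# fullmatch() succeeds only if the pattern consumes the entire string.
single_ok = re.fullmatch(ax, single) is not None
double_ok = re.fullmatch(ax, double) is not None
print(single_ok, double_ok)  # True False

# "$" (without re.MULTILINE) may also match just before a final newline,
# so the anchored search still succeeds here while fullmatch() does not.
anchored_nl = re.search('^' + ax + '$', single + '\n') is not None
full_nl = re.fullmatch(ax, single + '\n') is not None
print(anchored_nl, full_nl)  # True False
```

If trailing newlines must be rejected, `fullmatch` (or "\Z" in place of "$") is the stricter choice.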
Problem #4: How do you match practically any character that can occur between two slashes?
Solution #4: Exactly as before, except that now the only character that can't appear in the middle is an unescaped slash.
Example #4:
>>> text=r'/g[@#$%\/{foo:bar}\/^&*()]/{a:\/foo\.bar}'
>>> regex = r'(\/(?:(?:(?<=\\).)?[^\/]?)+\/)'
>>> i = re.finditer(regex, text)
>>> for m in i: print(m.groups())
...
('/g[@#$%\\/{foo:bar}\\/^&*()]/',)
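Two small observations, offered as a sketch rather than a replacement: Python's `re` module gives "/" no special meaning, so the `\/` escapes are optional, and the same alternation idiom works for slash expressions. On the text from Example #4, both patterns below find the same single match (the `simpler` pattern is my own alternative):

```python
import re

text = r'/g[@#$%\/{foo:bar}\/^&*()]/{a:\/foo\.bar}'

# Pattern from Example #4.
original = r'(\/(?:(?:(?<=\\).)?[^\/]?)+\/)'

# Alternative sketch: "/" needs no escaping in Python's re module. Between the
# slashes, accept an escaped character or anything except "/" and "\".
simpler = r'(/(?:\\.|[^/\\])+/)'

found_original = [m.group(1) for m in re.finditer(original, text)]
found_simpler = [m.group(1) for m in re.finditer(simpler, text)]
print(found_original)  # ['/g[@#$%\\/{foo:bar}\\/^&*()]/']
print(found_simpler)   # ['/g[@#$%\\/{foo:bar}\\/^&*()]/']
```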
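Problem #5: How do we avoid matching when valid pieces are arranged in an unexpected way, such as a brace expression splitting two slash expressions in "/foo/{a:b}/bar/"?
Solution #5: Explicitly spell out each allowed combination of the subpatterns (brace then slash, or slash then brace) and anchor each combination with "^" and "$".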
Example #5:
>>> text1=r'{a:\/foo\.bar}/foo\/bar/'
>>> text2=r'/foo\/bar/{a:\/foo\.bar}'
>>> regex=r'^(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})(\/(?:(?:(?<=\\).)?[^\/]?)+\/)$'
>>> i = re.finditer(regex, text1)
>>> for m in i: print(m.groups())
...
('{a:\\/foo\\.bar}', '/foo\\/bar/')
>>> regex=r'^(\/(?:(?:(?<=\\).)?[^\/]?)+\/)(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})$'
>>> i = re.finditer(regex, text2)
>>> for m in i: print(m.groups())
...
('/foo\\/bar/', '{a:\\/foo\\.bar}')
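To see why the anchors matter here, this sketch (Python 3; `split_text` is just the example string from Problem #5) runs the slash-then-brace pattern on "/foo/{a:b}/bar/" with and without "^" and "$":

```python
import re

# Subpatterns from the post: slash expression and brace expression.
sx = r'(\/(?:(?:(?<=\\).)?[^\/]?)+\/)'
ax = r'(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})'

split_text = '/foo/{a:b}/bar/'

# Without anchors, the pattern happily matches a prefix of the split string...
loose = re.search(sx + ax, split_text)
print(loose.groups())  # ('/foo/', '{a:b}')

# ...but anchored, it must cover the whole string, and "/bar/" is left over.
strict = re.search('^' + sx + ax + '$', split_text)
print(strict)  # None
```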
Add a couple of loops to check every combination against every text... and voilà!
>>> sx = r'(\/(?:(?:(?<=\\).)?[^\/]?)+\/)'
>>> ax = r'(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})'
>>> patterns = ["^"+ax+sx+"$", "^"+sx+ax+"$"]
>>> for regex in patterns:
...     for text in [text1, text2]:
...         i = re.finditer(regex, text)
...         for m in i: print(m.groups())
...
('{a:\\/foo\\.bar}', '/foo\\/bar/')
('/foo\\/bar/', '{a:\\/foo\\.bar}')
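If you need this check in more than one place, the loops can be wrapped in a function. Below is a minimal sketch under a few assumptions: Python 3.4+ (for `Pattern.fullmatch`), and a hypothetical helper, `match_combination`, that returns the captured pieces of the first allowed combination covering the whole string, or None.

```python
import re

# Subpatterns copied from the post: slash expression and brace expression.
sx = r'(\/(?:(?:(?<=\\).)?[^\/]?)+\/)'
ax = r'(\{(?:(?:(?<=\\).)?[^\{\}\[\]\/\.]?)+\})'

# Each allowed order, compiled once. fullmatch() requires the whole string
# to match, so the "^" and "$" anchors are not needed here.
COMBINATIONS = [re.compile(ax + sx), re.compile(sx + ax)]


def match_combination(text):
    """Hypothetical helper: return the groups of the first allowed
    combination that covers all of text, or None if none does."""
    for pattern in COMBINATIONS:
        m = pattern.fullmatch(text)
        if m:
            return m.groups()
    return None


print(match_combination(r'{a:\/foo\.bar}/foo\/bar/'))  # ('{a:\\/foo\\.bar}', '/foo\\/bar/')
print(match_combination(r'/foo\/bar/{a:\/foo\.bar}'))  # ('/foo\\/bar/', '{a:\\/foo\\.bar}')
print(match_combination('/foo/{a:b}/bar/'))            # None
```

Compiling the combinations once avoids rebuilding them on every call, and returning the groups tells you which order matched.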