Python - Matching everything after a series of hyphens
I'm trying to capture the remaining text in a file after three hyphens at the start of a line (---).
Example:
Anything above the first set of hyphens should not be captured.
---
This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.
Everything after the first set of three hyphens should be captured. The closest I've gotten is the regex [^(---)]+$, which works only partly. It captures everything after the hyphens, but if the user places any hyphens after that point, it instead captures whatever follows the last hyphen the user placed.
I am using this in combination with Python to capture the text.
If anyone can help me sort out this regex problem, I'd appreciate it.
pat = re.compile(r'(?ms)^---(.*)\Z')
The (?ms) adds the MULTILINE and DOTALL flags.
The MULTILINE flag makes ^ match the beginning of lines (not just the beginning of the string). We need this because the --- occurs at the beginning of a line, not necessarily at the beginning of the string.
The DOTALL flag makes . match any character, including newlines. We need this so that (.*) can match more than one line.
\Z matches the end of the string (as opposed to the end of a line).
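To see what each flag contributes, here is a minimal sketch that spells the inline (?ms) out as re.MULTILINE | re.DOTALL and then drops one flag at a time. The sample string is made up for illustration, and the pattern is written with \Z, the end-of-string anchor in Python's re module.

```python
import re

# Made-up sample: the marker sits on the second line, not at the start.
sample = "intro\n--- first\nsecond line\n"

# Same pattern as (?ms)^---(.*)\Z, with the flags passed explicitly.
pat = re.compile(r'^---(.*)\Z', re.MULTILINE | re.DOTALL)
captured = pat.search(sample).group(1)
print(repr(captured))  # ' first\nsecond line\n'

# Without MULTILINE, ^ only matches at the very start of the string,
# so a --- on a later line is never found.
no_multiline = re.search(r'^---(.*)\Z', sample, re.DOTALL)
print(no_multiline)  # None

# Without DOTALL, . stops at the first newline, so (.*) cannot
# stretch all the way to the end of the string that \Z requires.
no_dotall = re.search(r'^---(.*)\Z', sample, re.MULTILINE)
print(no_dotall)  # None
```

Both flags are needed together here: one to find the marker on a later line, the other to let the capture run across line breaks.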
For example,
import re

text = '''\
Anything above the first set of hyphens should not be captured.
---
This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.
'''

pat = re.compile(r'(?ms)^---(.*)\Z')
print(re.search(pat, text).group(1))
prints
This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.
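One thing the example above does not cover is input with no --- line at all, in which case re.search returns None and .group(1) raises an AttributeError. A small sketch of one way to guard against that follows; the helper name text_after_first_marker and the sample strings are hypothetical, not part of the original answer.

```python
import re

# Pattern from the answer, written with \Z (end of string).
PAT = re.compile(r'(?ms)^---(.*)\Z')

def text_after_first_marker(text):
    """Return everything after the first line-initial '---', or None if absent."""
    match = PAT.search(text)
    return match.group(1) if match else None

# Made-up input with two marker lines; only the first one splits the text.
doc = "header\n---\nbody line\n---\nmore body\n"
print(repr(text_after_first_marker(doc)))  # '\nbody line\n---\nmore body\n'

# No marker at the start of any line: the helper returns None.
print(text_after_first_marker("no marker here"))  # None
```

Note that the captured text starts with the newline that follows the marker; calling .lstrip('\n') on the result is one option if that is unwanted.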
Note that when you define a regex character class with brackets, [...], the stuff inside the brackets is (in general, except for hyphenated ranges like a-z) interpreted as single characters, not as patterns. So [---] is no different from [-]. In fact, [---] is the range of characters from - to -, inclusive.
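The parentheses inside the character class are interpreted as literal parentheses too, not as grouping delimiters. Strictly speaking, [(---)] is read as the range from ( to -, followed by a literal - and a literal ), so it is close to [-()]: it matches the hyphen and the left and right parentheses, plus the three characters that sit between ( and - in ASCII (*, + and the comma).
Thus [^(---)]+ matches a run of characters that contains no hyphens or parentheses (nor *, + or commas):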
In [23]: re.search('[^(---)]+', 'foo - bar').group()
Out[23]: 'foo '
In [24]: re.search('[^(---)]+', 'foo ( bar').group()
Out[24]: 'foo '
You can see where this is going, and why it does not work for your problem.
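The character-class behavior can be checked directly. The sketch below lists which printable ASCII characters each class accepts and then runs the question's pattern on a made-up string. As far as I can tell, CPython reads the (-- inside [(---)] as a range from ( to -, so that class should also admit *, + and the comma. Recent Python versions may also emit a FutureWarning ("possible set difference") for these patterns, which the sketch silences.

```python
import re
import warnings

# Patterns containing "--" inside a class can trigger a FutureWarning on
# newer Pythons; silence it so the output stays readable.
warnings.simplefilter("ignore", FutureWarning)

def accepted(char_class):
    """Return the printable ASCII characters that the given class matches."""
    return ''.join(c for c in map(chr, range(32, 127))
                   if re.fullmatch(char_class, c))

only_hyphen = accepted(r'[---]')      # a range from - to -, i.e. just the hyphen
with_parens = accepted(r'[(---)]')    # range ( to -, then a literal - and )
print(only_hyphen)
print(with_parens)

# The question's pattern: a run of characters outside that class, anchored
# at the end of the string, so it can only start after the LAST hyphen.
text = "above\n---\ncontent\n---\ntail"
tail = re.search(r'[^(---)]+$', text).group()
print(repr(tail))
```

The last line shows the behavior described in the question: the match begins after the final hyphen in the text, not after the first --- line.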