python - Matching everything after a series of hyphens
I'm trying to capture the remaining text in a file after three hyphens at the start of a line (---).
Example:

    Anything above the first set of hyphens should not be captured.
    ---
    This is content. It should be captured.
    Any sets of 3 hyphens beyond this point should be ignored.

Everything after the first set of three hyphens should be captured. The closest I've gotten is the regex [^(---)]+$, which works only partly. It captures everything after the hyphens, but if the user places more hyphens after that point, it instead captures only what follows the last hyphen the user placed.
I am using this in combination with Python to capture the text.
If anyone can help me sort out this regex problem, I'd appreciate it.

    pat = re.compile(r'(?ms)^---(.*)\Z')

The (?ms) adds the MULTILINE and DOTALL flags.
The MULTILINE flag makes ^ match at the beginning of each line (not just the beginning of the string). You need this because --- occurs at the beginning of a line, not necessarily at the beginning of the string.
The DOTALL flag makes . match any character, including newlines. You need this so that (.*) can match more than one line.
\Z matches the end of the string (as opposed to the end of a line).
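To see what each flag contributes, here is a minimal sketch that spells the flags out as re.MULTILINE and re.DOTALL instead of the inline (?ms). The sample text and the variable names are made up for illustration; they are not from the question.

```python
import re

# Hypothetical sample: one line before the separator, two separators in total.
text = "intro line\n---\nfirst\n---\nsecond\n"

# Same pattern, with the flags passed explicitly instead of inline (?ms).
pat = re.compile(r'^---(.*)\Z', re.MULTILINE | re.DOTALL)
captured = pat.search(text).group(1)

# Without MULTILINE, ^ only matches at the very start of the string,
# and the string does not start with ---, so there is no match.
no_multiline = re.search(r'^---(.*)\Z', text, re.DOTALL)

# Without DOTALL, . cannot cross a newline, so (.*) can never reach
# the end of the string from either --- line.
no_dotall = re.search(r'^---(.*)\Z', text, re.MULTILINE)

print(repr(captured))
print(no_multiline, no_dotall)
```

Note that the captured group begins with the newline that follows the first ---. If that is unwanted, calling .lstrip('\n') on the result is one way to drop it.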
For example,

    import re

    text = '''\
    Anything above the first set of hyphens should not be captured.
    ---
    This is content. It should be captured.
    Any sets of 3 hyphens beyond this point should be ignored.
    '''

    # (?ms) turns on MULTILINE and DOTALL; \Z anchors at the end of the string
    pat = re.compile(r'(?ms)^---(.*)\Z')
    print(re.search(pat, text).group(1))

prints

    This is content. It should be captured.
    Any sets of 3 hyphens beyond this point should be ignored.

Note that when you define a regex character class with brackets, [...], the stuff inside the brackets is (in general, except for hyphenated ranges like a-z) interpreted as single characters, not patterns. So [---] is no different from [-]. In fact, [---] is the range of characters from - to -, inclusive.
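The parentheses inside the character class are interpreted as literal parentheses too, not as grouping delimiters. So [(---)] is roughly equivalent to [-()], a character class including the hyphen and the left and right parentheses. (Strictly speaking, Python reads the leading (-- as the range from ( to -, so the class also admits *, + and the comma.)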
Thus the character class [^(---)]+ matches any run of characters other than hyphens or parentheses:

    In [23]: re.search('[^(---)]+', 'foo - bar').group()
    Out[23]: 'foo '

    In [24]: re.search('[^(---)]+', 'foo ( bar').group()
    Out[24]: 'foo '

You can see where this is going, and why it does not work for your problem.
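The symptom described in the question (capturing only what follows the last hyphen) can be reproduced with a short sketch. The sample text and variable names below are assumptions made up for illustration. Recent Python versions may emit a FutureWarning for -- inside a character class, so the sketch silences it.

```python
import re
import warnings

# Hypothetical sample with two separator lines.
text = "before\n---\nkeep this\n---\nand this\n"

# "--" inside a character class can trigger a FutureWarning on newer
# Python versions; it is ignored here for the demonstration only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    attempt = re.search(r'[^(---)]+$', text)

# The class can never match a hyphen, so a match that reaches the end
# of the string can only begin after the last hyphen in the text.
tail_after_last_hyphen = attempt.group()

# Matching the literal string --- at the start of a line instead
# captures everything from the first separator onward.
fixed = re.search(r'(?ms)^---(.*)\Z', text).group(1)

print(repr(tail_after_last_hyphen))
print(repr(fixed))
```

The first result contains only the text after the second separator, while the second keeps the later --- line as part of the captured content.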