x          match the character 'x' literally
C          match the character 'C' literally
.          match any single character except a line break
\.         match the character '.' literally
\\         match the character '\' literally
/          match the character '/' literally
.*         repeat any character except '\n', zero or more times
?          optional match: the preceding token may occur once or not at all, e.g. ab?c matches abc and ac
[0-5]      match any character in the range 0 to 5 (bracket expression)
[a-g]      match any character in {a,b,c,d,e,f,g}
[^0-2]     match any character except 0, 1 and 2 (here ^ inverts the set)
$          end of line
+          quantifier: match one or more of the preceding token
[0-9]+     repeat any character in 0 to 9, one or more times
^          start of a line; as the first element inside [ ] it inverts the set (see above)
\<         the beginning of a word ("words" are separated by whitespace)
\>         the end of a word

In the exercises below, the solutions must be POSIX extended regular expressions (ERE), the flavor that Bash's =~ operator uses; see e.g. https://www.regular-expressions.info/posix.html.
Note: neither \d nor \w is part of POSIX regular expressions. Use POSIX bracket expressions such as [[:digit:]] instead (see https://www.regular-expressions.info/posixbrackets.html).
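Here is a minimal sketch for trying some of the table entries, assuming Bash as in the exercises below. The =~ operator treats its right-hand side as a POSIX ERE and stores the matched text in BASH_REMATCH[0]. The helper first_match is hypothetical and exists only for this illustration.

```bash
#!/usr/bin/env bash
# first_match (hypothetical helper): print the part of string $2 matched by
# the POSIX ERE in $1, or print nothing if there is no match.
# The pattern is kept in a variable and expanded unquoted on the right of =~,
# so Bash treats it as a regex instead of a literal string.
first_match() {
  local re=$1 text=$2
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"
  fi
}

first_match 'C'            'ABCD'       # prints C
first_match '\.'           'v1.2'       # prints .
first_match '[0-5]'        'x7y3'       # prints 3
first_match '[^0-2]'       '0129'       # prints 9
first_match '[0-9]+'       'room 404!'  # prints 404
first_match '[[:digit:]]+' 'room 404!'  # prints 404 (POSIX class instead of \d)
first_match 'ab?c'         'xacx'       # prints ac
```

POSIX regex engines return the leftmost match and make it as long as possible. That is why '[0-9]+' yields the whole "404" rather than just "4".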
To avoid "false positives" (matches that shouldn't match), a regular expression should be as specific as possible.
# don't modify this cell!
should_match(){
regEx=$1
for string in "${@:2}"
do
if [[ ! ($string =~ $regEx) ]]; then
echo "Error: '$string' don't match to the regex '$regEx', but should match!"
else
echo "OK: '$string' match to the regex."
fi
done
}
# don't modify this cell!
should_not_match(){
regEx=$1
for string in "${@:2}"
do
if [[ ($string =~ $regEx) ]]; then
echo "Error: '$string' match to the regex '$regEx', but should not match!"
else
echo "OK: '$string' don't match to the regex."
fi
done
}
regex='YOUR_REG_EX_HERE' # replace the regex string
should_match "$regex" xyz xyzab abxyzcd
should_not_match "$regex" cab abc xcd yz xz
OK: 'xyz' match to the regex.
OK: 'xyzab' match to the regex.
OK: 'abxyzcd' match to the regex.
OK: 'cab' don't match to the regex.
OK: 'abc' don't match to the regex.
OK: 'xcd' don't match to the regex.
OK: 'yz' don't match to the regex.
OK: 'xz' don't match to the regex.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
should_match "$regex" 8 "var 43" "z=7" 123 62 11 2i 7 34 5z a73 09 r7a25r 342
should_not_match "$regex" abcd ztd xyz one
OK: '8' match to the regex.
OK: 'var 43' match to the regex.
OK: 'z=7' match to the regex.
OK: '123' match to the regex.
OK: '62' match to the regex.
OK: '11' match to the regex.
OK: '2i' match to the regex.
OK: '7' match to the regex.
OK: '34' match to the regex.
OK: '5z' match to the regex.
OK: 'a73' match to the regex.
OK: '09' match to the regex.
OK: 'r7a25r' match to the regex.
OK: '342' match to the regex.
OK: 'abcd' don't match to the regex.
OK: 'ztd' don't match to the regex.
OK: 'xyz' don't match to the regex.
OK: 'one' don't match to the regex.
The dot wildcard metacharacter . matches any single character (letter, digit, whitespace, everything). Because of this special meaning, a plain . no longer matches just the period character. To match a period, you need to escape the dot with a backslash: \. . In general, to match characters that have a special meaning literally, escape them with a \.
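The short sketch below compares the unescaped dot with the escaped dot, assuming Bash's =~ operator as used throughout this notebook. match_or_not is a hypothetical helper written only for this illustration.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): print "match" if the POSIX ERE in $1
# matches somewhere in string $2, otherwise print "no match".
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

match_or_not 'a.c'  'abc'   # match     (. matches any character)
match_or_not 'a.c'  'a.c'   # match
match_or_not 'a\.c' 'abc'   # no match  (\. matches only a period)
match_or_not 'a\.c' 'a.c'   # match
```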
Write a RegEx that matches only if there is a period . in the text.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
# contains a period?
should_match "$regex" "xyz." "*=-." "179." "17.9." "17.9"
should_not_match "$regex" "xyz1" "*=-:" "1797"
OK: 'xyz.' match to the regex.
OK: '*=-.' match to the regex.
OK: '179.' match to the regex.
OK: '17.9.' match to the regex.
OK: '17.9' match to the regex.
OK: 'xyz1' don't match to the regex.
OK: '*=-:' don't match to the regex.
OK: '1797' don't match to the regex.
Write a RegEx that matches only if there is an "a" followed by an arbitrary character and then a "c", i.e., exactly one character between "a" and "c":
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
# exactly one char between "a" and "c"
should_match "$regex" "axc" 'a!c' "aqc" "aßc" "paoc" "paoci" "opa3cip"
should_not_match "$regex" "ac" "avvc" "agsec" "agsec" "agsec" "wagsec" "cxa"
OK: 'axc' match to the regex.
OK: 'a!c' match to the regex.
OK: 'aqc' match to the regex.
OK: 'aßc' match to the regex.
OK: 'paoc' match to the regex.
OK: 'paoci' match to the regex.
OK: 'opa3cip' match to the regex.
OK: 'ac' don't match to the regex.
OK: 'avvc' don't match to the regex.
OK: 'agsec' don't match to the regex.
OK: 'agsec' don't match to the regex.
OK: 'agsec' don't match to the regex.
OK: 'wagsec' don't match to the regex.
OK: 'cxa' don't match to the regex.
Write a RegEx that matches the literal text "question?" (including the question mark), but not similar text such as "questions".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "question?" "any question?"
should_not_match "$regex" "questions" "any questions?"
OK: 'question?' match to the regex.
OK: 'any question?' match to the regex.
OK: 'questions' don't match to the regex.
OK: 'any questions?' don't match to the regex.
Write a RegEx that matches the literal text "abc\.xyz" (including the backslash), but not similar text such as "abc.xyz".
regex='YOUR_REG_EX_HERE' # replace the regex string
should_match "$regex" 'abc\.xyz' 'abc\.xyz is a strange thing'
should_not_match "$regex" 'abc.xyz' 'abcd.xyz' 'abc..xyz'
OK: 'abc\.xyz' match to the regex.
OK: 'abc\.xyz is a strange thing' match to the regex.
OK: 'abc.xyz' don't match to the regex.
OK: 'abcd.xyz' don't match to the regex.
OK: 'abc..xyz' don't match to the regex.
The pattern '[xyz]' will only match a single x, y, or z letter and nothing else.
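The sketch below shows that a bracket expression stands for exactly one character. It assumes Bash, and match_or_not is a hypothetical helper. It tests gr[ae]y against a few words made up for this illustration.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

# [ae] stands for exactly one character, either a or e.
for word in gray grey groy graey; do
  printf '%-5s %s\n' "$word" "$(match_or_not 'gr[ae]y' "$word")"
done
# gray  match
# grey  match
# groy  no match
# graey no match   (two characters between r and y)
```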
Write a RegEx that matches "can", "man" and "fan", but not similar words such as "han", "san" or "ban".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
should_match "$regex" can man fan "Who can?" woman
should_not_match "$regex" han san ban cax
OK: 'can' match to the regex.
OK: 'man' match to the regex.
OK: 'fan' match to the regex.
OK: 'Who can?' match to the regex.
OK: 'woman' match to the regex.
OK: 'han' don't match to the regex.
OK: 'san' don't match to the regex.
OK: 'ban' don't match to the regex.
OK: 'cax' don't match to the regex.
Square brackets [ ] with the hat ^ as the first element inside them exclude characters. Examples:
[^ac]      matches any character except a and c.
[^A-Z]     matches any character except capital letters.
Write a RegEx that matches an arbitrary character followed by "an", but not "han", "san" or "ban", and not "an" at the beginning of the string.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string using square brackets and the hat
should_match "$regex" can man fan lan dan 6an tanxx # etc.
should_not_match "$regex" han san ban an hangover # not an "h","s" or "b" before "an"
echo
# to check your solution: you should use square brackets and the hat!
should_match '\[\^.*\]' "$regex"
OK: 'can' match to the regex.
OK: 'man' match to the regex.
OK: 'fan' match to the regex.
OK: 'lan' match to the regex.
OK: 'dan' match to the regex.
OK: '6an' match to the regex.
OK: 'tanxx' match to the regex.
OK: 'han' don't match to the regex.
OK: 'san' don't match to the regex.
OK: 'ban' don't match to the regex.
OK: 'an' don't match to the regex.
OK: 'hangover' don't match to the regex.
OK: '[^hsb]an' match to the regex.
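Here is a small sketch of the excluding bracket expressions described above, assuming Bash's =~ operator. first_match is a hypothetical helper that prints the matched text. The sketch uses digit sets instead of [^A-Z] because the meaning of letter ranges can depend on the locale.

```bash
#!/usr/bin/env bash
# first_match (hypothetical helper): print the text matched by ERE $1 in $2.
first_match() {
  if [[ $2 =~ $1 ]]; then printf '%s\n' "${BASH_REMATCH[0]}"; fi
}

first_match '[^ac]'   'aacab'   # prints b  (first char that is neither a nor c)
first_match '[^0-9]+' '12ab3'   # prints ab (longest run without digits)
first_match 'x[^0-9]' 'x1xy'    # prints xy (the x followed by a digit is skipped)
```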
For the following exercises, try to write a specific pattern that matches the strings that should match and rejects the strings that should not.
[x-z]      matches x, y or z (a character range).
Write a regex that matches one of A, B, ..., E, then an arbitrary character, followed again by one of A, B, ..., E.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use char-range two times!
should_match "$regex" AnA BoB CpC AxC BwA BDC BxA EdE DwE "blaDwE "
should_not_match "$regex" aax bby ccC Aay Cpy bob Bob anA AaF FaA
# to check your solution: use "[ .. ]" two times
should_match '\[.*\].\[.*\]' "$regex"
OK: 'AnA' match to the regex.
OK: 'BoB' match to the regex.
OK: 'CpC' match to the regex.
OK: 'AxC' match to the regex.
OK: 'BwA' match to the regex.
OK: 'BDC' match to the regex.
OK: 'BxA' match to the regex.
OK: 'EdE' match to the regex.
OK: 'DwE' match to the regex.
OK: 'blaDwE ' match to the regex.
OK: 'aax' don't match to the regex.
OK: 'bby' don't match to the regex.
OK: 'ccC' don't match to the regex.
OK: 'Aay' don't match to the regex.
OK: 'Cpy' don't match to the regex.
OK: 'bob' don't match to the regex.
OK: 'Bob' don't match to the regex.
OK: 'anA' don't match to the regex.
OK: 'AaF' don't match to the regex.
OK: 'FaA' don't match to the regex.
OK: '[A-E].[A-E]' match to the regex.
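Here is a sketch of character ranges, assuming Bash. in_set is a hypothetical helper that checks whether a single character falls into a bracket expression such as [x-z] or [0-5].

```bash
#!/usr/bin/env bash
# in_set (hypothetical helper): succeed if the single character $2 belongs to
# the bracket expression built from $1, e.g. in_set 'x-z' y.
in_set() {
  local re="^[$1]\$"   # ^ and $ force the whole string to be one character
  [[ $2 =~ $re ]]
}

for ch in w x y z a 5; do
  if in_set 'x-z' "$ch"; then echo "$ch: in [x-z]"; else echo "$ch: not in [x-z]"; fi
done
# w: not in [x-z]
# x: in [x-z]
# y: in [x-z]
# z: in [x-z]
# a: not in [x-z]
# 5: not in [x-z]
```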
Examples:
e{4,}      at least four 'e'.
[ab]{3,4}  matches e.g. 'abb', 'baa', 'abab', 'bbbb', but not 'ab' or 'abxa'.
Write a regex that matches "wa" followed by 3 or 4 "z" followed by "up".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use curly braces notation.
should_match "$regex" wazzzzup wazzzup
should_not_match "$regex" wazzup wazup wazzzzzup blablub
echo
# to check your solution: it has to use curly brackets
should_match '\{.,.\}' "$regex"
OK: 'wazzzzup' match to the regex.
OK: 'wazzzup' match to the regex.
OK: 'wazzup' don't match to the regex.
OK: 'wazup' don't match to the regex.
OK: 'wazzzzzup' don't match to the regex.
OK: 'blablub' don't match to the regex.
OK: 'waz{3,4}up' match to the regex.
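The curly-brace examples from the explanation above ([ab]{3,4} and e{4,}) can be checked with this sketch, which assumes Bash. match_or_not is a hypothetical helper. These patterns are unanchored, so a longer string such as 'ababab' also matches [ab]{3,4}, because it contains a run of three or four a/b characters.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

for s in abb baa abab bbbb ab abxa; do
  printf '%-5s %s\n' "$s" "$(match_or_not '[ab]{3,4}' "$s")"
done
# abb, baa, abab, bbbb -> match; ab, abxa -> no match

match_or_not 'e{4,}' 'eeee'   # match
match_or_not 'e{4,}' 'eee'    # no match
```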
Write a regex that matches "wa" followed by 0, 1, 2 or 3 "z" followed by "up".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use curly braces notation.
# at most three z
should_match "$regex" wazzup wazup waup wazzzup
should_not_match "$regex" wazzzzup wazzzzzup wazzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzup blablub
# to check your solution: use curly brackets
should_match '\{,.\}' "$regex"
OK: 'wazzup' match to the regex.
OK: 'wazup' match to the regex.
OK: 'waup' match to the regex.
OK: 'wazzzup' match to the regex.
OK: 'wazzzzup' don't match to the regex.
OK: 'wazzzzzup' don't match to the regex.
OK: 'wazzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzup' don't match to the regex.
OK: 'blablub' don't match to the regex.
OK: 'waz{,3}up' match to the regex.
*          zero or more, e.g. ab*c matches 'abc', 'abbbc' and 'ac'.
+          one or more, e.g. ab+c matches 'abc', 'abbbc', but not 'ac'.
Write a regex for the following test strings:
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use Kleene Star and Kleene Plus
# Don't use curly braces notation!
should_match "$regex" aaaabcc aabbbc aaaacccc blaacdc
should_not_match "$regex" a ac aaaabb aaabb
OK: 'aaaabcc' match to the regex.
OK: 'aabbbc' match to the regex.
OK: 'aaaacccc' match to the regex.
OK: 'blaacdc' match to the regex.
OK: 'a' don't match to the regex.
OK: 'ac' don't match to the regex.
OK: 'aaaabb' don't match to the regex.
OK: 'aaabb' don't match to the regex.
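This sketch contrasts the Kleene star and the Kleene plus on the examples ab*c and ab+c from the explanation above. It assumes Bash, and match_or_not is a hypothetical helper.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

for s in abc abbbc ac; do
  printf '%-6s ab*c: %-9s ab+c: %s\n' "$s" \
    "$(match_or_not 'ab*c' "$s")" "$(match_or_not 'ab+c' "$s")"
done
# abc    ab*c: match     ab+c: match
# abbbc  ab*c: match     ab+c: match
# ac     ab*c: match     ab+c: no match
```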
?          makes the preceding character optional, e.g. ab?c matches 'ac' and 'abc'.
Write a regex that matches a number followed by " file found?" or " files found?".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "1 file found?" "3 files found?" "message: 24 files found?"
should_not_match "$regex" "No files found." "message: no file found?" "3 files found"
OK: '1 file found?' match to the regex.
OK: '3 files found?' match to the regex.
OK: 'message: 24 files found?' match to the regex.
OK: 'No files found.' don't match to the regex.
OK: 'message: no file found?' don't match to the regex.
OK: '3 files found' don't match to the regex.
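Here is a sketch of the optional quantifier ?, assuming Bash. match_or_not is a hypothetical helper, and colou?r is an illustrative pattern chosen for this example.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

# u? allows zero or one "u", but not two.
for word in color colour colouur; do
  printf '%-8s %s\n' "$word" "$(match_or_not 'colou?r' "$word")"
done
# color    match
# colour   match
# colouur  no match
```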
Whitespace characters are " " (space), \t (tab) and \n (new line).
[:space:]  POSIX character class for whitespace; use it inside a bracket expression, e.g. [[:space:]].
\s is not part of POSIX regular expressions.
Write a regex for: at least one digit, followed by a dot "." and at least one whitespace character, followed by "xyz".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "1. xyz" "2. xyz" "3. xyz"
should_not_match "$regex" "1.xyz" ". xyz" "3. xz"
OK: '1. xyz' match to the regex.
OK: '2. xyz' match to the regex.
OK: '3. xyz' match to the regex.
OK: '1.xyz' don't match to the regex.
OK: '. xyz' don't match to the regex.
OK: '3. xz' don't match to the regex.
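This sketch shows that [[:space:]] covers space, tab and newline. It assumes Bash: the $'...' quoting is Bash syntax for strings that contain \t and \n. match_or_not is a hypothetical helper.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

re='a[[:space:]]b'   # [:space:] is only valid inside a bracket expression
echo "space:   $(match_or_not "$re" 'a b')"     # match
echo "tab:     $(match_or_not "$re" $'a\tb')"   # match
echo "newline: $(match_or_not "$re" $'a\nb')"   # match
echo "none:    $(match_or_not "$re" 'ab')"      # no match
```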
Write a regex for: a whitespace character in the middle of the string (not only at its beginning or end):
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "d xyz" "we xyz " "dgg xz"
should_not_match "$regex" "xyz " "daADahga" " adaagXyz"
OK: 'd xyz' match to the regex.
OK: 'we xyz ' match to the regex.
OK: 'dgg xz' match to the regex.
OK: 'xyz ' don't match to the regex.
OK: 'daADahga' don't match to the regex.
OK: ' adaagXyz' don't match to the regex.
^ (hat): e.g. ^bla matches "blase" but not "nabla-operator".
$ (dollar sign): e.g. bla$ matches "nabla" but not "blase".
Note that this is different from the hat inside a bracket expression [^...], which excludes characters.
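The anchor examples above (^bla, bla$) can be compared with the excluding hat inside brackets in a short sketch. It assumes Bash, and match_or_not is a hypothetical helper.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

match_or_not '^bla'   'blase'            # match     (bla at the start)
match_or_not '^bla'   'nabla-operator'   # no match
match_or_not 'bla$'   'nabla'            # match     (bla at the end)
match_or_not 'bla$'   'blase'            # no match
match_or_not '[^b]la' 'bla'              # no match  (inside [ ], ^ excludes b)
```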
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "Mission: successful"
should_not_match "$regex" "Last Mission: successful" "Mission: successful upon capture of target"
OK: 'Mission: successful' match to the regex.
OK: 'Last Mission: successful' don't match to the regex.
OK: 'Mission: successful upon capture of target' don't match to the regex.
Write a regex for: no whitespace in the middle of the string:
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "xyz " "daADahga" " adaagXyz"
should_not_match "$regex" "d xyz" "we xyz " "dgg xz" " bla bla"
OK: 'xyz ' match to the regex.
OK: 'daADahga' match to the regex.
OK: ' adaagXyz' match to the regex.
OK: 'd xyz' don't match to the regex.
OK: 'we xyz ' don't match to the regex.
OK: 'dgg xz' don't match to the regex.
OK: ' bla bla' don't match to the regex.
\<         matches the beginning of a word.
\>         matches the end of a word.
Write a regex that matches "man" at the beginning of a word.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "A man was here." "... that the mankind survive."
should_not_match "$regex" "A woman was here." "... calling for humans .."
# to check your solution, uncomment the next line
#should_match '\\<' "$regex"
OK: 'A man was here.' match to the regex.
OK: '... that the mankind survive.' match to the regex.
OK: 'A woman was here.' don't match to the regex.
OK: '... calling for humans ..' don't match to the regex.
Write a regex that matches "man" at the end of a word.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "A man was here." "A woman was here."
should_not_match "$regex" "... calling for humans .." "... that the mankind survive."
# to check your solution
should_match '\\>' "$regex"
OK: 'A man was here.' match to the regex.
OK: 'A woman was here.' match to the regex.
OK: '... calling for humans ..' don't match to the regex.
OK: '... that the mankind survive.' don't match to the regex.
OK: 'man\>' match to the regex.
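This sketch applies the word-boundary operators to made-up words. It assumes Bash on a glibc-based system (for example Linux), where \< and \> are available as GNU extensions; other regex libraries may not accept them. match_or_not is a hypothetical helper.

```bash
#!/usr/bin/env bash
# match_or_not (hypothetical helper): "match" if ERE $1 matches string $2.
match_or_not() {
  if [[ $2 =~ $1 ]]; then echo match; else echo 'no match'; fi
}

match_or_not '\<cat' 'cats'     # match     (cat at the beginning of a word)
match_or_not '\<cat' 'bobcat'   # no match
match_or_not 'cat\>' 'bobcat'   # match     (cat at the end of a word)
match_or_not 'cat\>' 'cats'     # no match
```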
Everything inside parentheses ( .. ) is captured as a group.
# don't modify this
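groups_should_match(){
regEx=$1
# $2 carries only the first element of the caller's array; assigning it to the
# global array "groups" keeps the other elements, so the loop below sees all of them.
groups=$2
for string in "${@:3}"; do
if [[ ! ($string =~ $regEx) ]]; then
echo "Error: '$string' don't match to '$regEx', but should match!"
else
# we need to store BASH_REMATCH in another variable, because
# we do a nested regex-match which overrides the content of BASH_REMATCH
match=("${BASH_REMATCH[@]}") # copy of an array
i=1
# quote the expansion so that the patterns are neither word-split nor globbed
for group_match in "${groups[@]}"; do
if [[ ! ${match[$i]} =~ $group_match ]]; then
echo "Error: Group $i don't match to '${match[$i]}'."
fi
# print the captured group itself (not the nested match) before advancing i
echo "extracted: ${match[$i]}"
((i=i+1))
done
echo
fi
done
}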
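Before the group exercises, here is a minimal sketch of how Bash exposes capture groups. After a successful =~ match, BASH_REMATCH[0] holds the whole match and BASH_REMATCH[n] holds group n. The function date_parts and the date format are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# date_parts (hypothetical helper): print year, month and day captured
# by three groups from the first YYYY-MM-DD date found in $1.
date_parts() {
  local re='([0-9]{4})-([0-9]{2})-([0-9]{2})'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[0] would be the whole date "2024-05-17"
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
  fi
}

date_parts 'released on 2024-05-17'   # prints 2024 05 17
```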
Your solution should match all "pdf"-files (files with suffix ".pdf") that begin with "file". Extract the filename without the suffix .pdf inside the first group.
regex='YOUR_REG_EX_HERE' # replace the regex string.
groups=('^file.*')
groups_should_match "$regex" "$groups" "file_record_transcript.pdf" "file_07241999.pdf"
should_not_match "$regex" "file_fake.pdf.tmp" "starts_not_with_file.pdf"
extracted: file_record_transcript
extracted: file_07241999
OK: 'file_fake.pdf.tmp' don't match to the regex.
OK: 'starts_not_with_file.pdf' don't match to the regex.
Your solution should match all "txt"-files (files with suffix .txt) that begin with file. After file there should be arbitrary characters followed by an _ and a number. Capture the part before the number (ending with _) in the first group, the number in the second group, and .txt in the third group. E.g. file_record_transcript_66.txt should give the following groups:
file_record_transcript_
66
.txt
regex='YOUR_REG_EX_HERE' # replace the regex string.
groups=('^file.*' '^[[:digit:]]+$' '^\.txt$')
groups_should_match "$regex" "$groups" "file_record_transcript_66.txt" "file_a_7_and_more_07241999.txt"
should_not_match "$regex" "file_fake.txt.tmp" "starts_not_with_file.txt"
extracted: file_record_transcript_
extracted: 66
extracted: .txt
extracted: file_a_7_and_more_
extracted: 07241999
extracted: .txt
OK: 'file_fake.txt.tmp' don't match to the regex.
OK: 'starts_not_with_file.txt' don't match to the regex.
(cats|dogs) can be used to match 'cats' or 'dogs'.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" 'I love cats' 'I love bats' 'I love dogs' 'I love hogs'
should_not_match "$regex" 'I love rats' 'I love rogs' 'I love vogs' 'I love mats'
OK: 'I love cats' match to the regex.
OK: 'I love bats' match to the regex.
OK: 'I love dogs' match to the regex.
OK: 'I love hogs' match to the regex.
OK: 'I love rats' don't match to the regex.
OK: 'I love rogs' don't match to the regex.
OK: 'I love vogs' don't match to the regex.
OK: 'I love mats' don't match to the regex.
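Here is a sketch of alternation combined with a capture group, assuming Bash. which_pet is a hypothetical helper that reports which alternative of (cats|dogs) matched. The parentheses both group the alternatives and capture the matched word in BASH_REMATCH[1].

```bash
#!/usr/bin/env bash
# which_pet (hypothetical helper): print the alternative of (cats|dogs)
# found in $1, or "none" if neither word occurs.
which_pet() {
  local re='(cats|dogs)'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo none
  fi
}

which_pet 'two dogs'   # prints dogs
which_pet 'two cows'   # prints none
```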