x : match character 'x' literally
C : match character 'C' literally
. : any character (except invisible line breaks)
\. : match '.' literally
\\ : match character '\' literally
/ : match character '/' literally
.* : repeat any character except '\n', zero or more times
? : a question mark marks an optional match, e.g., ab?c matches abc and ac
[0-5] : match any char in 0 to 5 (bracket expression)
[a-g] : match any char in {a,b,c,d,e,f,g}
[^0-2] : match any char except 0, 1, 2 (here ^ is the inverse)
$ : end of line
+ : quantifier, matches one or more of the preceding token
[0-9]+ : repeat any char in 0 to 9, one or more times
^ : start of a line, but also the inverse in combination with [ ] (see above)
\< : the beginning of a word. "Words" are separated by whitespace.
\> : the end of a word.
Here, the solution should be an extended regular expression, see e.g. https://www.regular-expressions.info/posix.html.
Note: neither \d nor \w works in POSIX regular expressions (see https://www.regular-expressions.info/posixbrackets.html for POSIX bracket expressions).
To avoid "false positives" (matches that shouldn't match), a regular expression should be as specific as possible.
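As a small illustration of why specificity matters, the sketch below compares a loose pattern with an anchored one, both built from the POSIX bracket expression [[:digit:]]. The variable names and sample strings are made up for this illustration and are not part of the exercises.

```bash
loose='[[:digit:]]+'        # a run of digits anywhere in the string
specific='^[[:digit:]]+$'   # the whole string must consist of digits

for string in "123" "abc123"; do
    for re in "$loose" "$specific"; do
        if [[ $string =~ $re ]]; then
            echo "'$string' matches '$re'"
        else
            echo "'$string' does not match '$re'"
        fi
    done
done
```

If the intent is to accept plain numbers only, the loose pattern reports a false positive for "abc123", while the anchored pattern rejects it.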
# don't modify this cell!
should_match(){
    regEx=$1
    for string in "${@:2}"
    do
        if [[ ! ($string =~ $regEx) ]]; then
            echo "Error: '$string' doesn't match the regex '$regEx', but should match!"
        else
            echo "OK: '$string' matches the regex."
        fi
    done
}
# don't modify this cell!
should_not_match(){
    regEx=$1
    for string in "${@:2}"
    do
        if [[ ($string =~ $regEx) ]]; then
            echo "Error: '$string' matches the regex '$regEx', but should not match!"
        else
            echo "OK: '$string' doesn't match the regex."
        fi
    done
}
regex='YOUR_REG_EX_HERE' # replace the regex string
should_match "$regex" xyz xyzab abxyzcd
should_not_match "$regex" cab abc xcd yz xz
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
should_match "$regex" 8 "var 43" "z=7" 123 62 11 2i 7 34 5z a73 09 r7a25r 342
should_not_match "$regex" abcd ztd xyz one
The dot metacharacter . is a wildcard: it matches any single character (letter, digit, whitespace, anything). Note that ?, the single-character wildcard of shell globbing, is not a wildcard in regular expressions. Because of this special meaning, an unescaped . does not match only the period character. To match a literal period, escape the dot with a backslash: \. In general, to match characters that have a special meaning literally, escape them with a \.
Write a RegEx that matches only if there is a period . in the text.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
# contains a period?
should_match "$regex" "xyz." "*=-." "179." "17.9." "17.9"
should_not_match "$regex" "xyz1" "*=-:" "1797"
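The difference between the wildcard dot and the escaped dot can be seen in a minimal sketch like the following. The patterns and sample strings are illustrative assumptions, chosen so that they differ from the exercise above.

```bash
wildcard='1.5'    # unescaped dot: any single character between 1 and 5
literal='1\.5'    # escaped dot: only a real period between 1 and 5

for string in "1.5" "1x5"; do
    if [[ $string =~ $literal ]]; then
        echo "'$string' contains the literal text 1.5"
    elif [[ $string =~ $wildcard ]]; then
        echo "'$string' matches only because the unescaped dot is a wildcard"
    fi
done
```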
Write a RegEx that matches only if there is an "a" followed by an arbitrary character and then a "c", i.e., exactly one char between "a" and "c":
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
# exactly one char between "a" and "c"
should_match "$regex" "axc" 'a!c' "aqc" "aßc" "paoc" "paoci" "opa3cip"
should_not_match "$regex" "ac" "avvc" "agsec" "agsec" "agsec" "wagsec" "cxa"
Write a RegEx that matches only the string "question?" and nothing else.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "question?"
should_not_match "$regex" "questions"
Write a RegEx that matches only the string "abc\.xyz" and nothing else.
regex='YOUR_REG_EX_HERE' # replace the regex string
should_match "$regex" 'abc\.xyz'
should_not_match "$regex" 'abc.xyz' 'abcd.xyz' 'abc..xyz'
The pattern '[xyz]' will only match a single x, y, or z letter and nothing else.
Write a RegEx that matches only the strings "can" "man" "fan" and nothing else.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string
should_match "$regex" can man fan
should_not_match "$regex" han san ban cax
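A bracket expression such as '[xyz]' stands for exactly one character. The following sketch, with a made-up pattern and made-up sample words, shows that neither a character outside the set nor two characters from the set fit into that single position.

```bash
re='^gr[ae]y$'   # "gr", then exactly one of a or e, then "y"

for string in gray grey groy graey; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches"
    else
        echo "'$string' does not match"
    fi
done
```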
Square brackets [ ] with the hat ^ as the first element inside them exclude characters. Examples:
[^ac] : matches any char except a and c.
[^A-Z] : matches any char except capital letters.
Write a RegEx that matches an arbitrary char followed by "an", but does not match "han", "san" or "ban".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string using square brackets and the hat
should_match "$regex" can man fan lan dan 6an tanxx # etc.
should_not_match "$regex" han san ban an # not an "h","s" or "b" before "an"
# to check your solution: you should use square brackets and the hat!
should_match '\[\^.*\]' "$regex"
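One detail of negated bracket expressions is easy to miss: [^...] still has to match one character, so it fails when there is nothing left to match. A minimal sketch with an illustrative pattern (not the exercise solution):

```bash
re='q[^u]'   # a "q" followed by one character that is not "u"

for string in qatar queen Iraq; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches"
    else
        echo "'$string' does not match"
    fi
done
```

"Iraq" does not match because no character follows the "q", which is the same reason the bare string "an" is rejected in the exercise above.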
For the following exercises, try to write a specific pattern that matches the example strings that should match and does not match the others.
[x-z] : matches x, y or z.
Write a regex that first matches one of A, B, ..., E, then an arbitrary char, followed again by one of A, B, ..., E.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use char-range two times!
should_match "$regex" AnA BoB CpC AxC BwA BDC BxA EdE DwE
should_not_match "$regex" aax bby ccC Aay Cpy bob Bob anA AaF FaA
# to check your solution: use "[ .. ]" two times
should_match '\[.*\].\[.*\]' "$regex"
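Several ranges can be combined inside one bracket expression. The sketch below uses a hypothetical two-digit lowercase hexadecimal pattern to show this; the pattern and the sample strings are assumptions for illustration only.

```bash
re='^[0-9a-f][0-9a-f]$'   # two characters, each a digit or a letter from a to f

for string in 7f a0 g1 7; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches"
    else
        echo "'$string' does not match"
    fi
done
```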
Examples:
e{4,} : at least four 'e'.
[ab]{3,4} : matches e.g. 'abb', 'baa', 'abab', 'bbbb', but not 'ab' or 'abxa'.
Write a regex that first matches "wa", followed by 3 or 4 "z", followed by "up".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use curly braces notation.
should_match "$regex" wazzzzup wazzzup
should_not_match "$regex" wazzup wazup wazzzzzup
# to check your solution: it has to use curly brackets
should_match '\{.,.\}' "$regex"
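The two example patterns from the text above, e{4,} and [ab]{3,4}, can be tried out directly. This is only a sketch; the variable names and the e-strings are assumptions made for the illustration.

```bash
at_least_four='e{4,}'       # four or more consecutive "e"
three_or_four='[ab]{3,4}'   # a run of three or four chars, each a or b

for string in eeeee eee; do
    if [[ $string =~ $at_least_four ]]; then
        echo "'$string' matches '$at_least_four'"
    else
        echo "'$string' does not match '$at_least_four'"
    fi
done

for string in abb abab ab abxa; do
    if [[ $string =~ $three_or_four ]]; then
        echo "'$string' matches '$three_or_four'"
    else
        echo "'$string' does not match '$three_or_four'"
    fi
done
```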
Write a regex that first matches "wa", followed by 0, 1, 2 or 3 "z", followed by "up".
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use curly braces notation.
# at most three z
should_match "$regex" wazzup wazup waup wazzzup
should_not_match "$regex" wazzzzup wazzzzzup wazzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzup
# to check your solution: use curly brackets
should_match '\{,.\}' "$regex"
* : zero or more, e.g. ab*c matches 'abc', 'abbbc' and 'ac'.
+ : one or more, e.g. ab+c matches 'abc', 'abbbc', but not 'ac'.
Write a regex for:
regex='YOUR_REG_EX_HERE' # replace the reg-ex string. Use Kleene Star and Kleene Plus
# Don't use curly braces notation!
should_match "$regex" aaaabcc aabbbc aaaacccc
should_not_match "$regex" a ac aaaabb aaabb
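The only difference between * and + is whether zero repetitions are allowed. A minimal sketch using the patterns ab*c and ab+c from the text above (the variable names are illustrative):

```bash
star='ab*c'   # "a", zero or more "b", then "c"
plus='ab+c'   # "a", one or more "b", then "c"

for string in ac abc abbbc; do
    for re in "$star" "$plus"; do
        if [[ $string =~ $re ]]; then
            echo "'$string' matches '$re'"
        else
            echo "'$string' does not match '$re'"
        fi
    done
done
```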
? : marks an optional char, e.g. ab?c matches 'ac' and 'abc'.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "1 file found?" "3 files found?" "24 files found?"
should_not_match "$regex" "No files found." "no file found?" "3 files found"
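The question mark makes only the token directly before it optional. The sketch below shows this with an illustrative spelling-variant pattern that is not taken from the exercises.

```bash
re='^colou?r$'   # the "u" may appear once or not at all

for string in color colour colouur colr; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches"
    else
        echo "'$string' does not match"
    fi
done
```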
Whitespace characters: " " (space), \t (tab) or \n (new line).
[:space:] : POSIX character class for whitespace, used inside a bracket expression, e.g. [[:space:]].
\s doesn't work in POSIX regular expressions.
A number (at least one digit), followed by a dot "." and at least one space, followed by "xyz":
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "1. xyz" "2. xyz" "3. xyz"
should_not_match "$regex" "1.xyz" ". xyz" "3. xz"
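That [[:space:]] covers more than the plain space character can be checked with a sketch like this one. The pattern, the variable names and the strings are assumptions for illustration; the tab is written with bash's $'...' quoting.

```bash
re='^a[[:space:]]+b$'   # "a", one or more whitespace chars, then "b"
tab_string=$'a\tb'      # "a", a tab character, "b"

for string in "a b" "$tab_string" "a   b" "ab"; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches"
    else
        echo "'$string' does not match"
    fi
done
```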
A whitespace in the middle of the string:
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "d xyz" "we xyz " "dgg xz"
should_not_match "$regex" "xyz " "daADahga" " adaagXyz"
^ (hat) anchors the match at the start of the line, e.g., ^bla matches "blase" but not "nabla-operator".
$ (dollar sign) anchors the match at the end of the line, e.g., bla$ matches "nabla" but not "blase".
Note that this use of the hat is different from the hat inside a bracket expression [^...], which excludes characters.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "Mission: successful"
should_not_match "$regex" "Last Mission: successful" "Mission: successful upon capture of target"
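The anchor examples ^bla and bla$ from the text above can be verified with a short sketch; the variable names are illustrative.

```bash
starts='^bla'   # "bla" at the start of the string
ends='bla$'     # "bla" at the end of the string

for string in blase nabla nabla-operator; do
    for re in "$starts" "$ends"; do
        if [[ $string =~ $re ]]; then
            echo "'$string' matches '$re'"
        else
            echo "'$string' does not match '$re'"
        fi
    done
done
```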
No whitespace in the middle of the string:
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "xyz " "daADahga" " adaagXyz"
should_not_match "$regex" "d xyz" "we xyz " "dgg xz" " bla bla"
\< : matches the beginning of a word.
\> : matches the end of a word.
Write a regex with "man" at the beginning of a word.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "A man was here." "... that the mankind survive."
should_not_match "$regex" "A woman was here." "... calling for humans .."
# to check your solution, uncomment the next line
#should_match '\\<' "$regex"
Write a regex with "man" at the end of a word.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" "A man was here." "A woman was here."
should_not_match "$regex" "... calling for humans .." "... that the mankind survive."
# to check your solution
should_match '\\>' "$regex"
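The following sketch tries \< and \> on a different word. It assumes that the regex library used by bash supports these two operators, as glibc on GNU/Linux does; they are extensions rather than part of the POSIX standard. The patterns and sample strings are made up for this illustration.

```bash
word_begin='\<cat'   # "cat" at the beginning of a word
word_end='cat\>'     # "cat" at the end of a word

for string in "a cat sat" category bobcat concatenate; do
    for re in "$word_begin" "$word_end"; do
        if [[ $string =~ $re ]]; then
            echo "'$string' matches '$re'"
        else
            echo "'$string' does not match '$re'"
        fi
    done
done
```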
( .. ) : the enclosed part will be captured as a group.
# don't modify this
groups_should_match(){
    regEx=$1
    # note: "groups" is also the caller's global array; this assignment only
    # overwrites its first element, the other patterns are read from the array
    groups=$2
    for string in "${@:3}"; do
        if [[ ! ($string =~ $regEx) ]]; then
            echo "Error: '$string' doesn't match '$regEx', but should match!"
        else
            # we need to store BASH_REMATCH in another variable, because
            # the nested regex match below overwrites the content of BASH_REMATCH
            match=("${BASH_REMATCH[@]}") # copy of the array
            i=1
            for group_match in "${groups[@]}"; do
                if [[ ! ${match[$i]} =~ $group_match ]]; then
                    echo "Error: Group $i ('${match[$i]}') doesn't match '$group_match'."
                fi
                echo "extracted: ${match[$i]}"
                ((i=i+1))
            done
            echo
        fi
    done
}
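How bash exposes captured groups can be seen in a minimal sketch. After a successful match with =~, BASH_REMATCH[0] holds the whole match and BASH_REMATCH[1], BASH_REMATCH[2], ... hold the groups in the order of their opening parentheses. The date pattern and the sample string are assumptions for illustration and unrelated to the exercises.

```bash
re='^([0-9]+)-([0-9]+)-([0-9]+)$'   # three numbers separated by "-"
string="2024-05-17"

if [[ $string =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"
    echo "group 1:     ${BASH_REMATCH[1]}"
    echo "group 2:     ${BASH_REMATCH[2]}"
    echo "group 3:     ${BASH_REMATCH[3]}"
else
    echo "'$string' does not match"
fi
```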
Your solution should match all "pdf"-files (files with suffix ".pdf") that begin with "file". Capture the filename without the suffix .pdf in the first group.
regex='YOUR_REG_EX_HERE' # replace the regex string.
groups=('^file.*')
groups_should_match "$regex" "$groups" "file_record_transcript.pdf" "file_07241999.pdf"
should_not_match "$regex" "file_fake.pdf.tmp" "starts_not_with_file.pdf"
Your solution should match all "txt"-files (files with suffix .txt) that begin with file. After file there should be arbitrary chars followed by an _ and a number. Capture the part that begins with file in the first group, the number in the second group and the suffix .txt in the third group.
E.g. file_record_transcript_66.txt should give the following groups:
file_record_
66
.txt
regex='YOUR_REG_EX_HERE' # replace the regex string.
groups=('^file.*' '^[[:digit:]]+$' '^\.txt$')
groups_should_match "$regex" "$groups" "file_record_transcript_66.txt" "file_a_7_and_more_07241999.txt"
should_not_match "$regex" "file_fake.txt.tmp" "starts_not_with_file.txt"
(cats|dogs) can be used to match 'cats' or 'dogs'.
regex='YOUR_REG_EX_HERE' # replace the reg-ex string.
should_match "$regex" 'I love cats' 'I love bats' 'I love dogs' 'I love hogs'
should_not_match "$regex" 'I love rats' 'I love rogs' 'I love vogs' 'I love mats'
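Alternation combines naturally with grouping and anchors. The sketch below uses a hypothetical file-suffix pattern, not the exercise solution, and prints the alternative that was actually matched via BASH_REMATCH[1].

```bash
re='\.(jpg|png)$'   # a literal dot, then "jpg" or "png", at the end of the string

for string in photo.jpg logo.png notes.txt jpg; do
    if [[ $string =~ $re ]]; then
        echo "'$string' matches, suffix: ${BASH_REMATCH[1]}"
    else
        echo "'$string' does not match"
    fi
done
```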