Pages

Friday, December 7, 2012

Regular expreesion in Linux Bash and KSH Shells

Regular expreesion in Linux Bash and KSH Shells


Many people thinks that RegExp is alian to Bash/KSH Scripting and depends on GREP or SED to use regexp extensively.  But from Version 3 of Bash we can use regular expression with out using grep or sed. This will save us lot of time and reduce number of lines of script we write.

Lets take some example, I want to ask a user to enter "Yes" to do some activity depending on his input. He may end up with entering YES or Yes or YEs or some other form of YES or even just Y or y.

Earlier versions of BASH and KSH we have to depend on grep to put a check on user's different inputs as shown below code.


read option1

echo $option1 | grep -E '[Yy][Ee][Ss]|[Yy]'

if [[ "$?" -eq 0 ]]

then

echo "Do some thing"

else

echo "Do other thing"

fi

output:
Do Some thing

If you see above code we are using 3 commands(echo, grep and if statement) to accomplish our task.

Which will use system resources a bit more when we can accomplish with a just BASH/KSH alone.


From BASH version 3 there is an inbuilt regex operator( =~ ) which will help us to solve this problem



If you are new to regular expressions please click here

Below are the types of regular expressions we have and we will go with each and every regexp with an example.

Basic Regular expressions


^ --Caret symbol, Match beginning of the line/Variable.

$ --Match End of the line/variable.

. --Match Any single character.

* -- Match 0 or more occurrence of previous character(This is bit tricky).

[] – Match Range of characters, just single occurrence.

[a-z] –Match small letters

[A-Z] –Match cap letters


[0-9] – Match numerical.

[^] – Match Negate a sequence of char followed by ^.

\ -- Match Escape character, for matching special chars like ., * etc.


Extended Regular expressions


{n} --Match number of occurrence of a character

{n,m} --Match a character which is repeated n to m times.

{n,} --Match a repeated character which is repeated n or more times.

+ --Match one or more occurrences of previous character.

? – Match 0 or 1 occurrence of previous character.

-- Match Either character

() –match a group of characters


Let us start with an example

I have a variable var1 whose value is "abbbcd acdef 123 acd cda2". And we are going to use same variable for below examples.

Example1: Check var1 start with "a" or not

if [[ "$var1" =~ ^a ]]
then
echo "Yes, it start with a"
else
echo "It does not start with a"
fi

Output:

Yes, it start with a

Note: Dont give quotes around regexp, Your bash/ksh will take care of that.

Example2: Check if var1 ends with 2 or not


if [[ "$var1" =~ 2$ ]]
then
echo "Yes, its end with 2"
else
echo "It does not end with 2"
fi

Output:

Yes, it end with 2

Example3: Check if var1 have string which start with a and ends with d and should contain only 1 letter between them.

if [[ "$var1" =~ a.d ]]
then
echo "Yes, its present"
else
echo "It not there"
fi

Output:

Yes, its present

Example4: Check if var1 have a string which have may contain's 0 or more times of b

if [[ "$var1" =~ b* ]]
then
echo "Yes, its true"
else
echo "It not there"
fi

Output:

Yes, its true.

Note:The output of this script is always true because it we can have var1 with b's and with out b's. We have to depend on + for a meaning full checking like at-least 1 b occurrence.

Example5: Check if var1 contain contains either abc or adc word in it.

if [[ "$var1" =~ a[bd]c ]]
then
echo "Yes, its present"
else
echo "It not there"
fi

Output:
Yes, its present

This will match either abc or adc and gives the output as it present.

Example6: Check if var1 contains abc or bbc or cbc or dbc to zbc

if [[ "$var1" =~ [a-z]bc ]]
then
echo "Yes, its present"
else
echo "It not there"
fi

Output:

Yes, its present

I will leave [A-Z] and [0-9] option to reader it self. Explore your self on this.

Example7: Check if var1 do not contain z in it


if [[ "$var1" =~ [^z] ]]
then
echo "Its not there"
else
echo "z is present"
fi

Output:

Its not there

Example8: Check if a string "123 acd" is present in var1 or not

if [[ "$var1" =~ 123\ acd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

Note: If you observe regexp, We did not used quotes, your bash/ksh will take care of it.

Output:
Yes, its present.

Example9: Check if var1 contain a string which have 2 b's in between a and cd.

if [[ "$var1" =~ ab{2}cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

the output will be "Its not present" because our var1 variable(abbbcd acdef 123 acd cda2) have 3 b's between a nd cd.


Example10: Check if var1 contain a string which have 2 or 3 b's between a and cd.

if [[ "$var1" =~ ab{2,3}cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

The output will be "Yes, Its present", because it try to match abbbcd in the string, if one matches.

Example11: Check if var1 contain a string which have 2 or more b's between a and cd.

if [[ "$var1" =~ ab{2,}cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

Output:

Yes, its present

Example12: Check if var1 contain 1 or more b's between a and cd

if [[ "$var1" =~ ab+cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi


Output:

Yes, its present


Example13: Check if var1 contain 0 or 1 b between a and cd

if [[ "$var1" =~ ab?cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi


Output:

Yes, its present


Example14: Check if var1 contain either acdef or acd in it?


if [[ "$var1" =~ acdef|acd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

If either one of acdef or acd is there, if loop will be true.

Example15: Check if var1 contain abbbcd or acd in it. We can combain the letters which are common in both the words with () and

if [[ "$var1" =~ a(bbbc|c)d ]]
then
echo "Yes, its present"
else
echo "its not present"
fi


Output:

Yes, its present


if you want to make this more simple you can write the above regexp as below.

if [[ "$var1" =~ a(bbb|)cd ]]
then
echo "Yes, its present"
else
echo "its not present"
fi


Now come back to our old code on getting user input of Yes with out using echo and grep commands..

echo $result | grep -E '[Yy][Ee][Ss]|[Yy]'
if [[ "$?" -eq 0 ]]
then
echo "Do some thing"
else
echo "Do other thing"
fi

The above code can be written as

if [[ "$var1" =~ ([Yy]|([Ee][Ss])) ]]
then
echo "Yes, its present"
else
echo "its not present"
fi

So that user can type Y or y or yes or Yes or YEs or YES or YeS or yEs or yES or yeS. All these things will be matched.

Keep visiting us to get more update on Shell scripting.

1 comment:

  1. if [[ "$var1" =~ ([Yy]|([Ee][Ss])) ]] will match "es" as well. Great Tutorial!

    ReplyDelete