Basic Linux Shell Scripting Part 2

This article will focus on an important part of Shell Scripting – Redirection and Piping. Simply put, redirection allows you to use the output of one command as the input for another. This allows you to perform tasks such as keyword searching a directory listing, or making decisions based on the content of a file. There are a few basic things you must understand to perform redirection.

Standard Input

Standard Input or [stdin] refers to the place where input for a command usually comes, you keyboard. Unless you specify otherwise, input always comes from standard input. Standard input is redirected using the < symbol.

Standard Output

Line standard input, standard output is the place where, unless modified, the output of commands is sent. This is by default your console, tty, or vty session. To send the output of a command elsewhere, you can use the > or >> symbols. Note that the >> symbol will append data, while the > symbol will overwrite what is already there.

Standard Error

Standard error is the place where errors are displayed for the user to review. This also defaults to your current display session. To record the errors of a command, you can use the 2> symbol. This can be useful if you want to run a command and keep a log of all errors the command generates.

Consider this example of redirection. The first command generates a list of all files on the computer, redirecting the contents to the file dirlist.out and the errors to the file dirlist.err. It is also set to run as a background process, allowing us to continue to work. During it’s execution I ran [ps] twice. The first time you can see the job running, the second one indicates its completion. Then I made a typo. Finally is a list of the two files used in the first command.

Piping

Piping allows you to take the output of one command and use it as the input to another. This can be used to “glue” commands together to perform combined operations. You have already done this to some extent when you have run the [ls | more] command to page through longer directory listings. The pipe operator is the [|] symbol, usually located above backslash “\”. This symbol performs a dual redirection, redirecting the output of the command on the left side of the symbol to the input for the command on the right side. Consider this example.

Regular Expressions

Regular expressions were once the primary way that searches were performed in programming languages. They are common in languages such as C and PERL. Regular expressions are becoming increasingly common in modern programming languages such as Java and Visual Basic.NET. Modern Regular Expressions are even standardized in ISO/IEC 9945-2:1993.

Regular Expressions: Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale.

Mastering regular expressions is a worthwhile goal if you plan on working extensively with UNIX, or programming in any modern language. There have been many books published on this topic, with focus books on the various regular expression packages such as those that come with Java, Perl (perlre), Visual Basic.NET and C#.

The GREP utility is produced by GNU and it’s primary function is to search through input for defined expressions. When using the GREP utility, and expression is encased within two forward slashes, //. The values between these symbols represent the search argument. In it’s simplest form, a regular expression could be /findme/. This would locate the word “findme”. It’s not until we begin adding wildcard symbols and logic expression that the regular expression begins gain power.

Consider the following characters.
1. . (period) – means any single character in this position
2. * – 0 or more occurrences of the preceding character
3. [ ] – a range of acceptable characters
4. [^ ] – a range of unacceptable characters
5. ^ – characters must be at the beginning of the line
6. $ – characters must be at the end of the line
7. \< \> – indicated that the characters are treated as a word

Consider the following examples.
1. d.g will find dog but not doug.
2. B* will find all words with the letter B.
3. [df].g will find dog and fog, but not bog.
4. [^df].g will find log and bog, but not dog and fog.
5. ^ef will find effort but not referee.
6. $in will find bin but not inside
7. [^\] will not find concatenate.

Using GREP

One of the most powerful commands you can have in your Linux command arsenal is GREP. GREP supports a number of input parameters, each of which can be used to alter the method of search. The man page and the –help argument can be used to obtain detailed description of each option. GREP is most often used as the target of a pipe from another command. Here is an example of GREP being used to parse through a file to locate a particular line. This example locates all commands the history commands that contain [ls].

Here are some common questions and answers about grep usage.**

1. How can I list just the names of matching files?

grep -l ‘main’ *.c

lists the names of all C files in the current directory whose contents mention `main’.

2. How do I search directories recursively?

grep -r ‘hello’ /home/gigi

searches for `hello’ in all files under the directory `/home/gigi’. For more control of which files are searched, use find, grep and xargs. For example, the following command searches only C files:

find /home/gigi -name ‘*.c’ -print | xargs grep ‘hello’ /dev/null

This differs from the command:

grep -r ‘hello’ *.c

which merely looks for `hello’ in all files in the current directory whose names end in `.c’. Here the `-r’ is probably unnecessary, as recursion occurs only in the unlikely event that one of `.c’ files is a directory.

3. What if a pattern has a leading `-‘?

grep -e ‘–cut here–‘ *

searches for all lines matching `–cut here–‘. Without `-e’, grep would attempt to parse `–cut here–‘ as a list of options.

4. Suppose I want to search for a whole word, not a part of a word?

grep -w ‘hello’ *

searches only for instances of `hello’ that are entire words; it does not match `Othello’. For more control, use `\<' and `\>‘ to match the start and end of words. For example:

grep ‘hello\>’ *

searches only for words ending in `hello’, so it matches the word `Othello’.

5. How do I output context around the matching lines?

grep -C 2 ‘hello’ *

prints two lines of context around each matching line.

6. How do I force grep to print the name of the file?

Append `/dev/null’:

grep ‘eli’ /etc/passwd /dev/null

gets you:

/etc/passwd:eli:DNGUTF58.IMe.:98:11:Eli Smith:/home/do/eli:/bin/bash

7. Why do people use strange regular expressions on ps output?

ps -ef | grep ‘[c]ron’

If the pattern had been written without the square brackets, it would have matched not only the ps output line for cron, but also the ps output line for grep. Note that some platforms ps limit the ouput to the width of the screen, grep does not have any limit on the length of a line except the available memory.

8. Why does grep report “Binary file matches”?

If grep listed all matching “lines” from a binary file, it would probably generate output that is not useful, and it might even muck up your display. So GNU grep suppresses output from files that appear to be binary files. To force GNU grep to output lines even from files that appear to be binary, use the `-a’ or `–binary-files=text’ option. To eliminate the “Binary file matches” messages, use the `-I’ or `–binary-files=without-match’ option.

9. Why doesn’t `grep -lv’ print nonmatching file names?

`grep -lv’ lists the names of all files containing one or more lines that do not match. To list the names of all files that contain no matching lines, use the `-L’ or `–files-without-match’ option.

10. I can do OR with `|’, but what about AND?

grep ‘paul’ /etc/motd | grep ‘franc,ois’

finds all lines that contain both `paul’ and `franc,ois’.

11. How can I search in both standard input and in files?

Use the special file name `-‘:

cat /etc/passwd | grep ‘alain’ – /etc/motd

12. How to express palindromes in a regular expression?

It can be done by using the back referecences, for example a palindrome of 4 chararcters can be written in BRE.

grep -w -e ‘\(.\)\(.\).\2\1’ file

It matches the word “radar” or “civic”.

Guglielmo Bondioni proposed a single RE that finds all the palindromes up to 19 characters long.

egrep -e ‘^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$’ file

Note this is done by using GNU ERE extensions, it might not be portable on other greps.

13. Why are my expressions whith the vertical bar fail?

/bin/echo “ba” | egrep ‘(a)\1|(b)\1’

The first alternate branch fails then the first group was not in the match this will make the second alternate branch fails. For example, “aaba” will match, the first group participate in the match and can be reuse in the second branch.

14. What do grep, fgrep, egrep stand for ?

grep comes from the way line editing was done on Unix. For example, ed uses this syntax to print a list of matching lines on the screen.

global/regular expression/print

g/re/p

fgrep stands for Fixed grep, egrep Extended grep.

** Source: http://www.gnu.org/software/grep/doc/grep_13.html#SEC1

Conclusion

So as you can probably imagine, what you can do with tools such as GREP, Input and Output redirection and Piping is only limited by your creativity and knowledge of the system you are working on. In my next article we will take our basic shell scripts, as well as these commands and begin creating some functional administrative scripts.

** Source: http://www.gnu.org/software/grep/doc/grep_13.html#SEC1