cut, join, comm, fmt, grep, egrep, sed, awk
Any of these commands (and many others) can be used within your shell scripts to manipulate data.
Some of these are programming languages themselves. Sed is fairly complex, and AWK is actually its own mini-programming language. So I'll just skim over some basic hints and tricks.
john_s John Smith
in one file, and
john_s 1234 marlebone rd
will be joined to make a single line,
john_s John Smith 1234 marlebone rd
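Here is a minimal sketch of that join, in case you want to try it yourself. The file names (names.txt, addrs.txt) and the second user are made up for illustration. Both files are sorted on the first field, which join needs.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf 'john_s John Smith\nmary_j Mary Jones\n' > "$tmpdir/names.txt"
printf 'john_s 1234 marlebone rd\nmary_j 56 high st\n' > "$tmpdir/addrs.txt"

# join matches lines on the first field and prints that field once,
# followed by the rest of each line from the first and second file.
joined=$(join "$tmpdir/names.txt" "$tmpdir/addrs.txt")
printf '%s\n' "$joined"
# john_s John Smith 1234 marlebone rd
# mary_j Mary Jones 56 high st

rm -rf "$tmpdir"
```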
If the files do not already have a common field, you could either use the paste utility to join the two files, or give each file line numbers before joining them, with
cat -n file1 >file1.numbered
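As a rough sketch of the paste route: paste glues line 1 of one file to line 1 of the other, line 2 to line 2, and so on, with a TAB in between. The file contents here are invented sample data.

```shell
# Hypothetical sample data: two files with no common field,
# but with matching line order.
tmpdir=$(mktemp -d)
printf 'John Smith\nMary Jones\n' > "$tmpdir/file1"
printf '1234 marlebone rd\n56 high st\n' > "$tmpdir/file2"

# paste pairs the files up line by line, separated by a TAB.
pasted=$(paste "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$pasted"

rm -rf "$tmpdir"
```

This only makes sense if line N of each file really does describe the same thing.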
For example
comm -1 file1 file2
means "Do not show me lines ONLY in file1." In other words, "Show me lines that are ONLY in file2, plus lines that are in BOTH file1 and file2." (comm expects both files to be sorted.)
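A small sketch of how the column-suppressing flags behave, using two made-up, already sorted files. The -12 form is added here as a common companion: it hides both "only in" columns, leaving just the shared lines.

```shell
# Hypothetical sample data; comm needs both files sorted.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1"
printf 'banana\ncherry\ndate\n' > "$tmpdir/file2"

# -1 hides column 1 (lines only in file1). What remains is
# lines only in file2, and lines in both (indented by a TAB).
only2_and_both=$(comm -1 "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$only2_and_both"

# -12 hides columns 1 and 2, leaving only lines common to both files.
common=$(comm -12 "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$common"
# banana
# cherry

rm -rf "$tmpdir"
```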
pr is similarly useful. But where fmt is more oriented towards paragraphs, pr is geared more specifically toward page-by-page formatting.
grep joeuser /etc/passwd
will give you the line in the passwd file that is about account 'joeuser'. If you are suitably paranoid, you would actually use
grep '^joeuser:' /etc/passwd
to make sure it did not accidentally pick up information about 'joeuser2' as well.
(Note: this is just an example: often, awk is more suitable than grep, for /etc/passwd fiddling)
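To see the difference without touching the real /etc/passwd, here is a sketch run against two invented passwd-style lines. The awk line at the end is one possible way to do the exact-match version, comparing the whole first field instead of using a pattern.

```shell
# Hypothetical passwd-style sample data.
sample='joeuser:x:1001:1001:Joe User:/home/joeuser:/bin/sh
joeuser2:x:1002:1002:Joe Two:/home/joeuser2:/bin/sh'

# Unanchored: matches both accounts, because "joeuser" is a substring of "joeuser2".
loose=$(printf '%s\n' "$sample" | grep -c joeuser)

# Anchored to start of line and the ':' delimiter: matches only joeuser.
strict=$(printf '%s\n' "$sample" | grep -c '^joeuser:')

# awk alternative: compare field 1 exactly, then print the full-name field.
fullname=$(printf '%s\n' "$sample" | awk -F: '$1 == "joeuser" {print $5}')

echo "unanchored matches: $loose"    # 2
echo "anchored matches:   $strict"   # 1
echo "full name:          $fullname" # Joe User
```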
sed 's/oldstring/newstring/'
This will look at every line of input, and change the FIRST instance of "oldstring" on each line to "newstring".
If you want it to change EVERY instance on a line, you must use the 'global' modifier at the end:
sed 's/oldstring/newstring/g'
If you want to substitute either an oldstring or a newstring that has slashes in it, you can use a different separator character:
sed 's:/old/path:/new/path:'
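A quick sketch of the three variations side by side, run on a made-up input line that contains the old path twice:

```shell
# Hypothetical input line with two occurrences of the old path.
line='/old/path and /old/path again'

# Without 'g': only the first occurrence on the line changes.
first=$(printf '%s\n' "$line" | sed 's:/old/path:/new/path:')

# With 'g': every occurrence on the line changes.
every=$(printf '%s\n' "$line" | sed 's:/old/path:/new/path:g')

# Same as the 'g' version, but keeping '/' as the separator,
# which forces you to backslash-escape every slash in the strings.
escaped=$(printf '%s\n' "$line" | sed 's/\/old\/path/\/new\/path/g')

printf '%s\n' "$first"    # /new/path and /old/path again
printf '%s\n' "$every"    # /new/path and /new/path again
printf '%s\n' "$escaped"  # /new/path and /new/path again
```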
But if you don't have time to look through it, the most common use for AWK is to print out specific columns of a file. You can specify what character separates columns. The default is 'whitespace' (space, or TAB). But the canonical example is "How do I print out the first and fifth columns/fields of the password file?"
awk -F: '{print $1,$5}' /etc/passwd
"-F:" defines the "field separator" to be ':'
The bit between single-quotes is a mini-program that awk interprets. You can give awk one or more filenames after the program, OR you can use it in a pipe.
You must use single-quotes for the mini-program, to avoid $1 being expanded by the shell itself. In this case, you want awk to literally see '$1'
"$x" means the 'x'th column
The comma is a quick way to say "put a space here". (Strictly, it inserts awk's output field separator, which is a single space by default.)
If you instead did
awk -F: '{print $1 $5}' /etc/passwd
awk would not put any space between the columns!
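Here is a sketch of both forms, fed through a pipe from one invented passwd-style line rather than the real /etc/passwd:

```shell
# Hypothetical passwd-style entry.
entry='joeuser:x:1001:1001:Joe User:/home/joeuser:/bin/sh'

# With a comma: fields are separated by a space on output.
with_comma=$(printf '%s\n' "$entry" | awk -F: '{print $1,$5}')

# Without a comma: the two fields are simply concatenated.
no_comma=$(printf '%s\n' "$entry" | awk -F: '{print $1 $5}')

printf '%s\n' "$with_comma"  # joeuser Joe User
printf '%s\n' "$no_comma"    # joeuserJoe User
```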
If you are interested in learning more about AWK, read my AWK tutorial