Friday, November 28, 2008

Your Little Black Book - Using a Text File and grep

This tutorial uses grep, nano, and awk to search a text file of contacts, turning it into a command-line address book (a simple PIM), and it demonstrates the power of the CLI well. You do not need to know these commands to follow the tutorial, but some familiarity with them will help you understand the concepts being applied.

This address book and contact manager will have the following characteristics:

  1. You can search and view records whenever you need them, from the command line or a text editor.
  2. It is fully usable over a slow, text-only network connection.
  3. You can cut and paste addresses and contact information from the Web, email, and any other application.
  4. It has no concept of "required fields" -- each record can contain as much, or as little, information as you need.
  5. It can be taken with you and used on any Linux or Unix machine.

Creating the Addresses file

The first step is to make a new file (call it something like "addresses"). You can do so by typing:

nano addresses
Now, you can begin adding records to it.

Records have to be delimited somehow; I would suggest putting ### on a line by itself between each set of contact information. Format the records themselves however you like, with name and address and whatever information you have. I used to keep a completely free-form contacts file, where each record held unformatted data in whatever form I happened to get it, but this practice quickly shows its limitations -- I've found that it helps immensely to label certain fields, such as telephone number and email address. I use this format:

Phone: phone number
Fax: fax number
Email: email address
Comments: additional information
As an example, a few records might look like this:


Acme Industries, Inc.
4211 E Broadway
New York, NY 10026
Phone: (212) 555-1032
Fax: (212) 555-1038
###
capri pizza
Phone: 555-8250
###
jane smith
Phone: 555-3104
Comments: friend of susan's
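As a quick way to experiment, the sketch below writes the three sample records to a scratch file and counts them. It assumes the ### separator suggested above sits on a line by itself between records; the scratch path and the variable names book and seps are placeholders for this example, not part of the original tutorial.

```shell
# Scratch copy of the sample records (the path is an assumption for this
# sketch; the tutorial itself uses a file named "addresses").
book="${TMPDIR:-/tmp}/addresses.sample.$$"
cat > "$book" <<'EOF'
Acme Industries, Inc.
4211 E Broadway
New York, NY 10026
Phone: (212) 555-1032
Fax: (212) 555-1038
###
capri pizza
Phone: 555-8250
###
jane smith
Phone: 555-3104
Comments: friend of susan's
EOF

# Count separator lines; with one separator between each pair of records,
# the number of records is the number of separators plus one.
seps=$(grep -c '^###$' "$book")
echo "records: $((seps + 1))"
```

Working on a scratch copy like this lets you try the search commands without touching your real contacts file.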

Searching, browsing, and exporting records

Create the following script to make searching for a contact easy.

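#!/bin/sh
awk -v pat="$*" 'BEGIN { RS = "###" } $0 ~ pat' ~/addresses
Name the script "contact" and make it executable (chmod +x contact). Passing the search words to awk with -v keeps multi-word searches such as "contact jane smith" working. The script reads ~/addresses, so keep the addresses file in your home directory, and either place the script somewhere in your path or run it as ./contact from the directory that holds it.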

At the command line simply type:
contact jane

or, if the script is not in your path:
./contact jane

This should bring up Jane Smith's contact information as found in the text file. You can search using any of the information in the text file. Here are some examples:

will give you all contacts with a '' email address.

contact 90028
will give you all the contacts living in that ZIP code. You can search on any text that appears in a record.
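The search above is case-sensitive, so "contact Jane" would miss a record typed as "jane smith". A minimal sketch of a case-insensitive variant follows. The function name contact_ci, the variable names, and the demo file are hypothetical, and it assumes ###-separated records and an awk such as gawk or mawk that accepts a multi-character RS.

```shell
# contact_ci FILE WORDS...
# Print every record of FILE that contains WORDS, ignoring case.
# WORDS is treated as a fixed string, not as a regular expression.
contact_ci() {
    file=$1
    shift
    awk -v needle="$*" '
        BEGIN { RS = "###"; needle = tolower(needle) }
        index(tolower($0), needle) { print }
    ' "$file"
}

# Demo on a throwaway two-record file.
demo="${TMPDIR:-/tmp}/contact_ci.demo.$$"
printf '%s\n' 'Acme Industries, Inc.' 'Phone: (212) 555-1032' '###' \
    'jane smith' 'Phone: 555-3104' > "$demo"
contact_ci "$demo" JANE SMITH
```

Using index() rather than a regular expression means characters such as parentheses in a phone number are matched literally.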

More Advanced Searching using grep and awk

fgrep prints the individual lines of a file that contain a string you give, which is handy when you just want to check whether you have such-and-such a record. (On current systems, grep -F is the equivalent, non-deprecated spelling.) Use the -i option to do a case-insensitive search -- for instance, here's how to see if you have contact information for Acme:

fgrep -i 'acme industries' addresses
Should display:

Acme Industries, Inc.
The output gave the name -- but you want the phone number too. Output the search with a few lines after the match with the -A option:

fgrep -i -A5 'acme industries' addresses
which will display the line containing 'acme industries' and up to 5 lines following it, like this:

Acme Industries, Inc.
4211 E Broadway
New York, NY 10026
Phone: (212) 555-1032
Fax: (212) 555-1038
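Because -A prints a fixed number of trailing lines, the context can run past the end of a short record and into the next one. One way to trim it, sketched here under the assumption that records are separated by ### lines and that only one record matches, is to pipe the output through sed and drop everything from the first separator onward. The scratch file and the name book are placeholders.

```shell
book="${TMPDIR:-/tmp}/grep_context.demo.$$"
printf '%s\n' 'capri pizza' 'Phone: 555-8250' '###' \
    'jane smith' 'Phone: 555-3104' > "$book"

# -F: fixed string, -i: ignore case, -A5: up to five lines after the match.
# The sed command deletes from the first "###" line to the end of the output.
grep -F -i -A5 'capri pizza' "$book" | sed '/^###$/,$d'
```

Without the sed stage, the same grep command would also print the separator and the start of the next record.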
And here's where using labels really pays off. When you need, say, all the email addresses that have "" in them, you can find them with a plain grep command:

$ grep '^Email:' addresses | fgrep ''
Then you can simply copy and paste the listed emails into your email client.

Harvesting the actual addresses themselves is also a trivial matter:

$ grep '^Email:' addresses|egrep -o '[^ ]+$'
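The same harvest can be done in a single awk process. This is only a sketch: the demo file and the example.com and example.org addresses are made up for illustration, and it assumes each address follows the Email: label on the same line with no spaces inside the address.

```shell
book="${TMPDIR:-/tmp}/emails.demo.$$"
printf '%s\n' 'jane smith' 'Email: jane@example.com' '###' \
    'capri pizza' 'Phone: 555-8250' '###' \
    'bob jones' 'Email: bob@example.org' > "$book"

# Select lines that start with the Email: label and print their second
# whitespace-separated field, which is the address itself.
awk '/^Email:/ { print $2 }' "$book"
```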
You can use awk to output entire records containing a particular match. The simplest way is to change the awk record separator, RS, to ### and then enclose the pattern to match in slashes. The match is case-sensitive, so spell the pattern the way it appears in the file. For example, here's how to export all records containing the string "Acme" somewhere in the record:

awk 'BEGIN { RS = "###" } /Acme/' addresses
and you will get:

Acme Industries, Inc.
4211 E Broadway
New York, NY 10026
Phone: (212) 555-1032
Fax: (212) 555-1038
The contact script we first used in this tutorial is based on this awk command.
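A multi-character RS works in gawk and mawk, but POSIX leaves its behavior unspecified, so other awk implementations may split records differently. The sketch below shows one portable alternative that collects lines until it sees a separator line. The function name records_matching and the demo file are hypothetical, and it assumes ### always appears on a line by itself.

```shell
# records_matching FILE REGEX
# Print each "###"-separated record of FILE whose text matches REGEX.
records_matching() {
    awk -v re="$2" '
        function emit() {
            if (rec ~ re) printf "%s\n", rec
            rec = ""
        }
        /^###$/ { emit(); next }   # a separator line ends the current record
        { rec = rec $0 "\n" }      # otherwise keep collecting lines
        END { emit() }             # the last record has no separator after it
    ' "$1"
}

demo="${TMPDIR:-/tmp}/records_matching.demo.$$"
printf '%s\n' 'Acme Industries, Inc.' 'Fax: (212) 555-1038' '###' \
    'capri pizza' 'Phone: 555-8250' > "$demo"
records_matching "$demo" 'Acme'
```

Each matching record is followed by a blank line, which keeps the output readable when several records match.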

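Because the file has labels, you can limit your search to them. For example, you can search for "smith" in email addresses and output the entire records. With FS set to "Email: ", $2 holds everything from the address to the end of the record (or to the next Email: label), so any labeled lines that follow the address are searched too: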

awk 'BEGIN { RS = "###"; FS = "Email: " } ($2 ~ "smith") { print $0 }' addresses
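If the search should look only at the Email: line itself, one option is to test that line while the record is being collected. This is a sketch rather than the tutorial's method: the function name records_with_email and the demo data (including the example.com and example.org addresses) are made up, and it assumes ### on a line by itself and labels at the start of a line.

```shell
# records_with_email FILE TEXT
# Print whole records whose Email: line contains TEXT (a fixed string).
records_with_email() {
    awk -v text="$2" '
        function emit() {
            if (hit) printf "%s\n", rec
            rec = ""; hit = 0
        }
        /^###$/ { emit(); next }
        # substr($0, 7) skips the six characters of the "Email:" label.
        /^Email:/ && index(substr($0, 7), text) { hit = 1 }
        { rec = rec $0 "\n" }
        END { emit() }
    ' "$1"
}

demo="${TMPDIR:-/tmp}/records_with_email.demo.$$"
printf '%s\n' 'jane smith' 'Email: jane@example.com' \
    'Comments: friend of bob smith' '###' \
    'bob jones' 'Email: bob.smith@example.org' '###' \
    'capri pizza' 'Comments: ask for mr smith' > "$demo"
records_with_email "$demo" smith
```

In the demo, only the second record is printed: the other two mention "smith" in the name or comment lines but not in an email address.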
You can use the same awk pattern to do any number of things. For instance, in conjunction with the grep examples above you can output all the email addresses in records that have "friend" in the comment field:

$ awk 'BEGIN { RS = "###"; FS = "Comments: " } ($2 ~ "friend") { print }' addresses | grep '^Email:' | egrep -o '[^ ]+$'
Adding and importing records

Adding records to your contacts file is easy. The file doesn't need to be sorted, so append new records by either editing the file in nano or using redirection on the command line:

cat >> addresses
then type in your new contact, starting with the ### separator, like this:

###
new name
Cell: 801-123-4567

Press Ctrl-D on an empty line to finish. This will add 'new name' to the bottom of your addresses file.
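For adding records from a script instead of typing them, a small helper can write the separator for you. The sketch below is one possible shape, not part of the original tutorial: the function name add_contact and the demo path are hypothetical, and it assumes one ### line between records.

```shell
# add_contact FILE LINE...
# Append a record to FILE, one line per remaining argument, preceded by a
# "###" separator unless the file is missing or empty.
add_contact() {
    file=$1
    shift
    {
        [ -s "$file" ] && printf '###\n'
        printf '%s\n' "$@"
    } >> "$file"
}

book="${TMPDIR:-/tmp}/add_contact.demo.$$"
rm -f "$book"
add_contact "$book" 'jane smith' 'Phone: 555-3104'
add_contact "$book" 'new name' 'Cell: 801-123-4567'
cat "$book"
```

Skipping the separator for an empty file avoids a stray ### at the top of a new address book.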

Rarely do I actually type out any new contact information myself -- that only happens when I'm transcribing something from paper, or when I'm getting a number from someone on the phone. Nine times out of 10 I'm just cutting and pasting the text from the Web or email into an editor window that has the contacts file open. It's painless and fast -- there are no forms to fill out for each part of the record. But you have to keep two things in mind: separate the records with a line of hash marks (###), and insert labels for numbers, email, and comment fields, if you want to use them.

If you already have a set of address records formatted some other way, awk can import it so that it's in the right format.

Let's say you have a file named address.txt where all the records are kept one to a line, with the fields separated by commas.

Here's an awk one-liner to take that input and spit it out into the bottom of the addresses file, in just the right format:

$ awk 'BEGIN { FS = "," } { print "\n###\n\n" $2, $1, "\n" $3 "\n" $4 ", " $5, $6, "\nPhone: " $7, "\nEmail: " $8 }' address.txt >> addresses
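To make the conversion concrete, here is a sketch run on a single made-up row. The field order (last name, first name, street, city, state, ZIP, phone, email) is an assumption inferred from how the one-liner arranges $1 through $8, and the function name csv_to_records, the sample row, and the scratch path are hypothetical. It uses printf instead of print so that no stray spaces end up at the ends of lines, and it does not handle quoted fields that contain commas.

```shell
# csv_to_records FILE
# Convert comma-separated rows into "###"-separated, labeled records.
csv_to_records() {
    awk 'BEGIN { FS = "," }
    {
        printf "###\n%s %s\n%s\n%s, %s %s\nPhone: %s\nEmail: %s\n",
            $2, $1, $3, $4, $5, $6, $7, $8
    }' "$1"
}

src="${TMPDIR:-/tmp}/address.demo.$$.txt"
printf '%s\n' \
    'Smith,Jane,12 Elm St,Springfield,IL,62701,555-3104,jane@example.com' > "$src"
csv_to_records "$src"
```

Redirecting the output with >> addresses, as in the one-liner above, would append the converted records to the contacts file.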

This tutorial is a re-publication of an article from a few years back, with some slight modifications from me. I'm unable to find the original article to link to. If anyone knows who originally published it, please let me know and I'll add the necessary reference.


Surly Teabag said...

The original article on this approach was here. It has some potentially helpful comments.


Jared said...

Thank you Surly for that link. I want to give credit where credit is due.