May 16, 2011
Every couple of weeks I see blog posts (yes, like this one) where some asshat tries to promote themselves by spouting off on their knowledge of Linux commands. They usually end up on the front page of reddit with titles like “Stupid Linux Tricks,” “15 Advanced Shell Commands Every Sysadmin Should Know,” and my new favorite “Top 30 UNIX command Interview Questions asked in Investment Banks”.
Besides being obvious attempts at padding resumes (and oh god, do I hate padded resumes), they rarely show off anything new, and never go into detail about why the commands they’re listing do what they claim. Often, they don’t even fact-check themselves and end up posting commands which plainly don’t work:
6. There is a file Unix_Test.txt which contains words Unix, how will you replace all Unix to UNIX? You can answer this Unix Command Interview question by using SED command in UNIX for example you can execute sed s\Unix\UNIX\g fileName.
Let’s test that theory:
[hephaestus@fhtagn ~]$ cat lala
unix unix UNIX Unix unix Unix unix
[hephaestus@fhtagn ~]$ sed s\Unix\UNIX\g lala
sed: -e expression #1, char 10: unterminated `s' command
The command doesn’t work, and there are four major reasons for it.
Firstly, in bash (it’s usually safe to assume a bash audience when pandering to sysadmins), the backslash is the escape character. That means the argument s\Unix\UNIX\g is interpreted by bash and handed to sed as sUnixUNIXg (an unquoted backslash means ‘take the next character literally’ and is then removed; see the Bash Beginners Guide for more information). In sed, the first character following the command (the ‘s’ part) is used as the delimiter, so sUnixUNIXg is essentially the equivalent of s/nix/NIXg, except using ‘U’ as the delimiter. We get our unterminated ‘s’ command error because sed needs a final delimiter to know the replacement is finished: sUnixUNIXUg will parse correctly, because we added a closing delimiter, but it only replaces ‘nix’ with ‘NIX’ because we’ve used ‘U’ as the delimiter. Oops.
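If you want to see the mangling for yourself, here is a minimal sketch, assuming bash and any ordinary sed. The variable names are mine and purely illustrative; printf stands in for sed so you can see the argument exactly as the shell delivers it.

```shell
# What sed actually receives when the backslashes are left unquoted:
# bash removes each backslash and keeps the character after it.
unquoted=$(printf '%s' s\Unix\UNIX\g)
printf '%s\n' "$unquoted"    # prints: sUnixUNIXg

# Single quotes stop bash from touching the backslashes at all.
quoted=$(printf '%s' 's\Unix\UNIX\g')
printf '%s\n' "$quoted"      # prints: s\Unix\UNIX\g

# With a closing 'U' the expression parses, but 'U' is the delimiter,
# so the pattern is only "nix" and the replacement only "NIX".
mangled=$(printf 'unix\n' | sed sUnixUNIXUg)
printf '%s\n' "$mangled"     # prints: uNIX
```

Note that the last command fed “Unix” would print “UNIX” and look like it worked, which is exactly the kind of accident that gets a broken one-liner onto the front page.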
Let’s look at the command if we fix the escaping problem (by using forward slashes, which aren’t interpreted by the shell) but forget to quote the command:
[hephaestus@fhtagn ~]$ sed s/Unix/UNIX/ig lala
UNIX UNIX UNIX UNIX UNIX UNIX UNIX
[hephaestus@fhtagn ~]$
It worked! But what if we only want to uppercase every time we see two “unix”es in a row?
[hephaestus@fhtagn ~]$ sed s/unix unix/UNIX UNIX/g lala
sed: -e expression #1, char 6: unterminated `s' command
[hephaestus@fhtagn ~]$
Same error as before, but with a different cause. Most programs don’t parse their own command lines. They receive a list of arguments from their parent process (in this case, bash), and bash builds that list by splitting the line on unquoted whitespace. That means for the command “sed s/unix unix/UNIX UNIX/g lala”, the shell passes the following arguments:
$0 = 'sed'
$1 = 's/unix'
$2 = 'unix/UNIX'
$3 = 'UNIX/g'
$4 = 'lala'
Since sed treats its first argument as the script, and ‘s/unix’ in and of itself isn’t a valid sed command, we get an error. This would be fixed by quoting the sed expression so that the shell passes it as a single argument: “sed 's/unix unix/UNIX UNIX/g' lala”, or by escaping the spaces (remember the backslash, literal interpretation): “sed s/unix\ unix/UNIX\ UNIX/g lala”. Both result in:
$0 = 'sed'
$1 = 's/unix unix/UNIX UNIX/g'
$2 = 'lala'
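You don’t have to take my word for how the shell splits things up. The sketch below uses a throwaway function (show_args is a hypothetical helper of mine, not a standard command) that prints each argument it was given, so you can compare the unquoted, quoted, and escaped forms side by side.

```shell
# Hypothetical helper: print each argument it receives, numbered from 1.
show_args() {
    n=1
    for arg in "$@"; do
        printf '$%d = %s\n' "$n" "$arg"
        n=$((n + 1))
    done
}

# Unquoted: the shell splits on every space, giving four arguments.
show_args s/unix unix/UNIX UNIX/g lala

# Single-quoted: the expression arrives as one argument, two in total.
show_args 's/unix unix/UNIX UNIX/g' lala

# Escaped spaces: same two arguments as the quoted form.
show_args s/unix\ unix/UNIX\ UNIX/g lala
```

The same trick works for checking any command line you aren’t sure about before you hand it to something destructive.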
3. Wrong Goddamn Delimiter
Never use a backslash unless you mean it. The standard non-string, non-escaped, delimiter character for sed or perl or anything else that uses regular expressions is the forward slash (‘/’). To teach people otherwise is to invite failure down the road. As I pointed out, you can get around the problems introduced by using the backslash by single-quoting the argument, or understanding how escaping is interpreted from the shell, but you wouldn’t have to if you just learned to use forward slashes from the start.
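For comparison, here is a small sketch of delimiters that behave themselves when the expression is single-quoted. The sample strings and variable names are made up for illustration; the point is only that the forward slash is the default, and the one good reason to pick something else is a pattern that contains slashes itself.

```shell
# Same substitution, three delimiters. All are safe inside single quotes.
slash=$(printf 'Unix\n' | sed 's/Unix/UNIX/g')
comma=$(printf 'Unix\n' | sed 's,Unix,UNIX,g')
pipe=$(printf 'Unix\n' | sed 's|Unix|UNIX|g')
printf '%s %s %s\n' "$slash" "$comma" "$pipe"    # prints: UNIX UNIX UNIX

# A different delimiter saves you from escaping every slash in a path.
path=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')
printf '%s\n' "$path"                            # prints: /opt/bin
```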
4. Okay, you printed to STDOUT. Now what?
The question was about replacing the contents of a file. Even if you fix the escaping, quoting, and delimiter problems, you’re still not replacing anything inside the file, just printing the result to STDOUT. If you want a sed command to operate on a file in-place, use the ‘-i’ flag: “sed -i 's/Unix/UNIX/g' lala” (that’s the GNU sed spelling; BSD sed wants a backup suffix argument after -i, even an empty one). If you want to write the edited output to a new file, use a shell redirect: “sed 's/Unix/UNIX/g' lala > newfile”.
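Here is a rough sketch of the three behaviours on a scratch file, so nothing real gets clobbered. The file contents and variable names are invented for the example, and I’ve used -i.bak (in-place with a backup copy) on the assumption that you want something both GNU and BSD sed will accept.

```shell
# Work in a temporary directory; the file name mirrors the interview question.
workdir=$(mktemp -d)
file="$workdir/Unix_Test.txt"
printf 'Unix is not unix\nI like Unix\n' > "$file"

# 1. Plain sed prints the result and leaves the file exactly as it was.
sed 's/Unix/UNIX/g' "$file" > /dev/null
before=$(cat "$file")

# 2. A redirect writes the edited text to a new file.
sed 's/Unix/UNIX/g' "$file" > "$workdir/newfile"
copy=$(cat "$workdir/newfile")

# 3. -i edits the file itself; the .bak suffix keeps the original alongside it.
sed -i.bak 's/Unix/UNIX/g' "$file"
after=$(cat "$file")
backup=$(cat "$file.bak")

printf 'before: %s\n' "$before"
printf 'after:  %s\n' "$after"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Only the third command changes Unix_Test.txt, which is the thing the interview question actually asked for.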
As I said before, I hate padded resumes. If I asked this interview question and you answered “sed backslash Unix backslash UNIX backslash g filename”, you wouldn’t get a job unless you at least mentioned “in quotes” somewhere in there. And then I’d ask you what kind of quotes. And then I’d ask why you didn’t actually edit the file.