Beginner’s Guide To Piping?

They don’t.

The output of one becomes the input of the other.

When you’re using them on a command line, this output/input is usually text, and people tend to stick to a few utilities and a couple of flag combinations they pass to those utilities. They may not remember what each flag does - that’s ok, it’s easy to look these things up in the man pages.

Sometimes you have “records” (if you want to call them that - it’s not really an official concept), but usually they’re lines containing multiple related pieces of information, usually space separated.

So the utilities you’re invoking are basically just for loops over lines of text. Sometimes you treat a line as a whole, sometimes you split it into fields on spaces.
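
For example, awk does that per-line loop and whitespace splitting for you, and a plain while/read loop in the shell is the same idea done by hand (data.txt is just a placeholder file):

awk '{ print $1, $NF }' data.txt                               # print the first and last field of every line
while read -r first rest; do echo "$first"; done < data.txt    # same loop written out in shell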

If you want each line to be a record with fields that themselves contain spaces, it’s common to switch the field separator (or delimiter) to a tab or a comma (e.g. like .csv) - most command line utilities let you pick a separator with a flag.
Similarly, if your data can contain newline characters, some utilities (e.g. find and xargs) let you split records on a NUL byte, aka \0 aka \x00.
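
A couple of sketches of that, with made-up file names:

cut -d',' -f2 people.csv                          # pick field 2, fields separated by commas
awk -F'\t' '{ print $3 }' report.tsv              # same idea with a tab as the separator
find . -name '*.log' -print0 | xargs -0 gzip      # NUL-separated names survive spaces and newlines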


I’d recommend learning regex basics and looking into grep/awk/sed; those three will get you pretty far.
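
Roughly what each one is for (the file names are placeholders):

grep -E 'error|timeout' app.log            # keep only the lines matching a regex
sed 's/foo/bar/g' config.txt               # stream editing: replace foo with bar everywhere
awk '$3 > 100 { print $1 }' stats.txt      # work on fields: print column 1 wherever column 3 > 100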

There’s also head, tail, cut, sort, split, uniq and comm, which are pretty useful; you’ll pick those up as you go, they’re simple.
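
They chain together nicely - for instance, a classic “top ten” pipeline over a hypothetical space-separated log:

cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -n 10    # most frequent first field, with counts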

You can typically insert a ...| tee /dev/stderr | ... into the middle of a pipeline while you’re experimenting with or debugging it. It dumps a copy of the data to stderr, where you can see it, while still forwarding the data to the next command in your pipeline.
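
For example (grep and sort here are just stand-ins for whatever you’re actually running):

grep ERROR app.log | sort | tee /dev/stderr | uniq -c    # watch sort's output on the terminal while uniq still receives it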


Usually folks manipulate short lists of stuff in text form (short == less than a million lines, typically). If the data is in XML or JSON, there’s some cursing and some a-ha moments. For JSON there’s the jq utility, which lets you parse JSON and process it in a pipeline; I avoid XML, but I know there are similar tools for it.
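
A small jq sketch, assuming the input is a JSON array of objects with a name field (the URL and file names are made up):

curl -s https://api.example.com/users | jq -r '.[].name'    # print the name of each element
jq '.items | length' data.json                              # or: how many entries are in a local file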


Sometimes, people will manipulate binary data, for example:

tar -c /my/project | gzip | openssl ... | ssh host "cat > my.secret.backup"

dd if=/dev/sda bs=1M | gzip | ssh other "cat > binary.disk.backup"
head -c 65536 > /dev/sda # bad idea: clobbers the start of the disk

... ffmpeg .. | x264 ... | mkvtool...


Internally, a pipe is a pair of file descriptors, one end per process or command; each side just reads or writes it as if it were a file. You can’t seek or truncate it - I don’t think anything other than read/write/close works - but you can do plenty with just those.

So, in that vein, you can create a more permanent, named pipe: mkfifo my.fifo.pipe will create a persistent entry in the filesystem that programs can open, read, write and close. This is useful if you’re scripting.
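
A small sketch (the file names are placeholders):

mkfifo my.fifo.pipe
gzip -c bigfile > my.fifo.pipe &     # writer: blocks until something opens the other end
wc -c < my.fifo.pipe                 # reader: counts the compressed bytes as they arrive
rm my.fifo.pipe                      # it's just a filesystem entry, remove it when done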


Speaking of scripting, you can have one script talk to another script, or have multiple functions in e.g. a bash script talk to each other. You can also use the bash built-in exec to do funky stuff with file descriptors - basically have as many pipes as you want, which leads to strange-looking pipelines in shell where there’s more than one reader or writer, and where you can pass file descriptor numbers over a pipe so the recipient can switch where it reads from or writes to.
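
A small taste of the exec/file-descriptor juggling, as a sketch (this relies on Linux letting you open a FIFO read-write without blocking):

mkfifo ctrl.pipe
exec 3<>ctrl.pipe      # open the pipe read/write as file descriptor 3
echo "hello" >&3       # any part of the script can write to fd 3...
read -r -u 3 line      # ...and any part can read from it
echo "got: $line"
exec 3>&-              # close fd 3 when you're done
rm ctrl.pipe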

This kind of thing gets complicated. Shell scripting can in theory do complicated stuff, but this is typically the point where one starts to curse in frustration for not having done it in Python, for example. (Then years pass, and folks curse again and replace the Python with e.g. C++ or Go.)


Whoa, now that’s some comprehensive information guys, thanks for your help - I have a point of reference to work from now. Not going to lie, still slightly confused, but I will get there :stuck_out_tongue:
