Unix Pipes

From OSDev Wiki

Pipes, socketpairs and FIFOs are techniques that allow two processes to exchange data through a stream of bytes. Unlike files, however, pipes and friends do not consume disk space but instead use a (usually circular) buffer within the kernel. If the buffer fills up (e.g. because the consumer is too slow), the system blocks the producer(s) until space becomes available again.

Pipes are usually one-way streams with a producer side and a consumer side. Note that it is perfectly possible to have multiple producers and/or multiple consumers (though multiple consumers tend to make a pipe hard to use). There is a thread in the forum on this subject: Stream-oriented programming.

Usage

Besides the usual shell-scripting use of pipes (where programs such as grep and less are linked together so that grep's output becomes less's input, as in grep kernel | less), several Unix programs use such techniques to pass data between separate, pre-built processes, one of the most notable examples being qmail. When running GCC, for example, the -pipe flag makes the compiler use pipes instead of temporary files for intermediate compilation results, which can speed up compilation.

Pipe-like channels also connect a terminal to the shell. A terminal emulator typically talks to the shell through a pseudo-terminal, which behaves much like a bidirectional pipe, while on a text console the shell's standard input and output refer directly to the teletype device (for example /dev/tty1). When a command is executed, the child process inherits the shell's file descriptors, so unless its input or output is redirected, the child's standard input and output are connected to the same teletype device. For example, if you issue a cat command in a terminal, that is how the file content from cat's stdout gets to /dev/tty1, where the kernel's tty subsystem renders the appropriate characters on the console.

Implementation

There are two calls for creating pipes on a Unix system: pipe() and mkfifo(). Both act in similar ways: the first creates an unnamed (anonymous) pipe, the second creates a named pipe. Named pipes appear in the filesystem and exist until explicitly deleted; unnamed pipes do not, and only exist while at least one file descriptor is open on them.

The standard way to create an (unnamed) pipe on Unix is to call pipe(), which "returns" two file descriptors by storing them in a two-element array supplied by the caller. One is used for reading from the pipe, the other for writing to the pipe.
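As a minimal sketch of the call described above, the following C function creates a pipe, writes a short message into the write end and reads it back from the read end within a single process. The helper name pipe_roundtrip is hypothetical, and the sketch assumes the message is small enough to fit in the pipe buffer (otherwise the write() would block, since nobody is reading yet).

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): sends msg through a
 * fresh pipe and reads it back into out (at most cap - 1 bytes, always
 * NUL-terminated). Returns the number of bytes read, or -1 on error.
 * Assumes msg fits in the pipe buffer. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    if (cap == 0 || pipe(fds) == -1)
        return -1;

    size_t len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                    /* no more writers: reader will see EOF */

    ssize_t n = read(fds[0], out, cap - 1);
    close(fds[0]);                    /* both ends closed: pipe is destroyed */
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

A caller would pass a buffer, for example char buf[32]; pipe_roundtrip("hello", buf, sizeof buf);, and then find the message in buf.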

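Internally the pipe is (normally) a circular buffer/queue and two or more semaphores for locking. On Linux, the capacity was one page (4 kilobytes on i386) before kernel 2.6.11; since then the default is 16 pages (64 kilobytes with 4-kilobyte pages).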
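The buffer size can be measured rather than assumed. The sketch below (the function name pipe_capacity is hypothetical) switches the write end to non-blocking mode and writes small chunks until the kernel refuses more data, which is the point where a blocking writer would have been put to sleep. The result depends on the operating system, kernel version and per-user limits, so no particular value should be relied upon.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns how many bytes a new pipe accepts before
 * a write would block, or -1 on error. */
long pipe_capacity(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Make the write end non-blocking so a full buffer yields EAGAIN
     * instead of suspending the caller. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[512] = {0};            /* 512 is the POSIX minimum for PIPE_BUF */
    long total = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        break;                        /* EAGAIN: the buffer is full */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

On Linux the capacity of an individual pipe can also be queried and changed with fcntl() using F_GETPIPE_SZ and F_SETPIPE_SZ; the loop above is used here only because it does not depend on Linux-specific operations.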

Such a pipe exists as long as either end of the pipe is open. It is cleaned up after you close() both ends. There are a couple of ways to move file descriptors from one process to another on a typical Unix system; the most common and 'correct' way is to let a child inherit them from its parent across a fork() call.
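Passing a pipe to a child over fork() might look like the following sketch. The function name read_from_child and the message text are made up for illustration. Each process closes the end it does not use; in particular the parent has to close its copy of the write end, otherwise it would never see end of file on the read end.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes a fixed message into a
 * pipe, and collects it in the parent. Stores at most cap - 1 bytes in
 * out (NUL-terminated) and returns the byte count, or -1 on error. */
ssize_t read_from_child(char *out, size_t cap)
{
    static const char msg[] = "hello from child";
    int fds[2];
    if (cap == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                   /* child: inherited both descriptors */
        close(fds[0]);                /* the child only writes */
        ssize_t w = write(fds[1], msg, sizeof msg - 1);
        _exit(w == (ssize_t)(sizeof msg - 1) ? 0 : 1);
    }

    close(fds[1]);                    /* parent only reads; needed to get EOF */
    size_t total = 0;
    for (;;) {
        if (total == cap - 1)
            break;                    /* output buffer is full */
        ssize_t n = read(fds[0], out + total, cap - 1 - total);
        if (n <= 0)
            break;                    /* 0: EOF (child closed its end), -1: error */
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);            /* reap the child */
    return (ssize_t)total;
}
```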

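If one end of the pipe is closed and the other end is still used, the result depends on the direction. Reading from a pipe whose write end has been closed returns the data still in the buffer and then 0 (end of file). Writing to a pipe whose read end has been closed sends SIGPIPE to the writer; if that signal is ignored or handled, the write() fails with an errno of EPIPE (Broken pipe).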
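The two half-closed cases can be observed directly. In the sketch below (both function names are hypothetical), a read() on a pipe with no remaining writer returns 0 for end of file, while a write() on a pipe with no remaining reader fails with EPIPE. SIGPIPE is ignored around the write because its default action would terminate the process before write() could return.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: result of read() on a pipe whose write end has
 * already been closed (-1 if the pipe could not be created). */
ssize_t read_after_writer_closed(void)
{
    int fds[2];
    char c;
    if (pipe(fds) == -1)
        return -1;
    close(fds[1]);                    /* no writer left */
    ssize_t n = read(fds[0], &c, 1);  /* empty pipe, no writer: end of file */
    close(fds[0]);
    return n;
}

/* Hypothetical helper: errno reported by write() on a pipe whose read
 * end has already been closed, 0 if the write unexpectedly succeeded,
 * or -1 if the pipe could not be created. */
int errno_after_reader_closed(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Ignore SIGPIPE so the failed write returns instead of killing us. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                    /* no reader left */
    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;
    close(fds[1]);

    if (old != SIG_ERR)
        signal(SIGPIPE, old);         /* restore the previous disposition */
    return err;
}
```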

When you write a pipe command line, it is the shell that calls pipe(). When it then fork()s the processes to do the work, it does some file descriptor renumbering (see dup() and dup2()) to get the right descriptors in the right places before it finally exec()s the programs. Remember that fork()ed children inherit the parent's open file descriptors by default.
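The sequence a shell goes through for a two-stage pipeline can be sketched roughly as follows. This is an illustration under simplifying assumptions, not how any particular shell is implemented: the function name run_pipeline is hypothetical, it handles exactly two commands, and it uses dup2(), a variant of dup() that lets the caller pick the target descriptor number.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "producer | consumer", where each argument
 * is a NULL-terminated argv array. Returns the consumer's exit status,
 * or -1 on error. */
int run_pipeline(char *const producer[], char *const consumer[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t left = fork();
    if (left == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (left == 0) {                  /* first child: the producer */
        dup2(fds[1], STDOUT_FILENO);  /* stdout now refers to the write end */
        close(fds[0]);
        close(fds[1]);                /* the originals are no longer needed */
        execvp(producer[0], producer);
        _exit(127);                   /* only reached if exec failed */
    }

    pid_t right = fork();
    if (right == -1) {
        close(fds[0]);
        close(fds[1]);
        waitpid(left, NULL, 0);
        return -1;
    }
    if (right == 0) {                 /* second child: the consumer */
        dup2(fds[0], STDIN_FILENO);   /* stdin now refers to the read end */
        close(fds[0]);
        close(fds[1]);
        execvp(consumer[0], consumer);
        _exit(127);
    }

    /* The parent must close both ends, or the consumer never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(left, NULL, 0);
    if (waitpid(right, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For instance, calling it with the argv arrays {"echo", "kernel panic", NULL} and {"grep", "-q", "kernel", NULL} corresponds to the command line echo "kernel panic" | grep -q kernel, assuming both programs are found on the PATH.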

Named pipes are similar, except they sit somewhere in the filesystem, and if you open the named pipe repeatedly, you get the same pipe every time. Once created, the named pipe remains in the filesystem until deleted like any other file.
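A named pipe can be exercised with the following sketch, in which a child process opens the FIFO for writing while the parent opens the same path for reading. The function name fifo_roundtrip and any path passed to it are hypothetical. Note that by default open() on a FIFO blocks until the other end has been opened too, which is why the two open() calls happen in different processes, and that the FIFO has to be removed with unlink() once it is no longer needed.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: creates a FIFO at path, lets a child write msg
 * into it, reads it back in the parent, then deletes the FIFO. Stores
 * at most cap - 1 bytes in out and returns the count, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *out, size_t cap)
{
    if (cap == 0 || mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                       /* child: the producer */
        int wfd = open(path, O_WRONLY);   /* blocks until a reader opens */
        if (wfd == -1)
            _exit(1);
        size_t len = strlen(msg);
        ssize_t w = write(wfd, msg, len);
        close(wfd);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    ssize_t total = -1;
    int rfd = open(path, O_RDONLY);       /* blocks until the writer opens */
    if (rfd == -1) {
        kill(pid, SIGTERM);               /* child may be stuck in open() */
    } else {
        total = 0;
        while ((size_t)total < cap - 1) {
            ssize_t n = read(rfd, out + total, cap - 1 - (size_t)total);
            if (n <= 0)
                break;                    /* 0: writer closed, -1: error */
            total += n;
        }
        out[total] = '\0';
        close(rfd);
    }

    waitpid(pid, NULL, 0);
    unlink(path);                         /* named pipes persist until deleted */
    return total;
}
```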

