Socat address chains

Introduction

Socat version 2 can concatenate multiple modules and transfer data between them bidirectionally.

Example 1: OpenSSL via HTTP proxy

socat - "OPENSSL,verify=0 | PROXY:secure.domain.com:443 | TCP:proxy.domain.com:8080"

This command does the following: socat connects to proxy.domain.com on port 8080 and sends a proxy CONNECT request for secure.domain.com port 443; this is similar to the proxy address available in version 1. Once the proxy server acknowledges successful connection to the target (SSL) server, socat starts SSL negotiation and then transfers data between its stdio and the SSL server.
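The CONNECT exchange that the PROXY inter address performs before any data is transferred can be sketched in Python. This is a minimal illustration, not socat's code. The helper names build_connect_request and proxy_accepted are hypothetical, and the exact request line and headers that socat sends (HTTP version, authentication headers) may differ.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Build a minimal HTTP CONNECT request for host:port (hypothetical helper)."""
    return f"CONNECT {host}:{port} HTTP/1.0\r\n\r\n".encode("ascii")


def proxy_accepted(response_head: bytes) -> bool:
    """Return True if the proxy's status line reports success (any 2xx code)."""
    status_line = response_head.split(b"\r\n", 1)[0]
    # e.g. [b"HTTP/1.0", b"200", b"Connection established"]
    fields = status_line.split(b" ", 2)
    return (
        len(fields) >= 2
        and fields[0].startswith(b"HTTP/")
        and fields[1].isdigit()
        and fields[1].startswith(b"2")
    )


request = build_connect_request("secure.domain.com", 443)
# request == b"CONNECT secure.domain.com:443 HTTP/1.0\r\n\r\n"
```

Only after the proxy reports success does the chain start the SSL negotiation over the same connection, as described above.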

Address chain basics

socat version 1 was able to open two addresses and transfer data between them. "Addresses" could be just sockets or other file descriptors, or something a little more complex, like a proxy client or an OpenSSL server or client. Although desirable, it was practically impossible to combine complex address types, or to use socket types other than the predefined ones (usually TCP) with complex addresses.

socat version 2 has been designed to overcome these limitations. First, the complex address types are now separated from the underlying file descriptor types. Second, complex addresses, now called inter addresses, can be concatenated into an address chain; however, an endpoint address, which just provides file descriptors, must be the last component of an address chain.

The socat invocation takes two address chains, opens them, and transfers data between them.

An address chain consists of zero or more inter addresses and one endpoint address, all separated by the pipe character '|'. When socat is started from the command line, these characters and any optional spaces must be protected from the shell; it is recommended to put each address chain in double quotes.
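The chain syntax can be illustrated with a simplified Python parser that splits a chain at '|' and separates each component's keyword from its parameters. The function parse_chain and the Component type are hypothetical; socat's own parser also handles quoting and escaping, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Component:
    keyword: str  # address keyword, upper-cased, e.g. "PROXY"
    params: str   # remainder after the keyword, e.g. ":secure.domain.com:443"


def parse_chain(chain: str) -> list[Component]:
    """Split an address chain at '|' (simplified: no quoting or escaping)."""
    components = []
    for raw in chain.split("|"):
        text = raw.strip()  # spaces around '|' are optional
        if not text:
            raise ValueError("empty component in address chain")
        # The keyword ends at the first ':' (parameter) or ',' (option).
        cuts = [i for i in (text.find(":"), text.find(",")) if i >= 0]
        end = min(cuts) if cuts else len(text)
        components.append(Component(text[:end].upper(), text[end:]))
    return components


chain = parse_chain("OPENSSL,verify=0 | PROXY:secure.domain.com:443 | TCP:proxy.domain.com:8080")
# The last component is expected to be the endpoint address.
endpoint = chain[-1]
```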

The (bidirectional) inter addresses that are available with a socat implementation can be listed with the following command:

socat -h |grep -E 'b ..b groups='

A full socat 2.0.0-b3 program provides the following inter addresses:

name	description
NOP	transfers data unmodified
OPENSSL-CLIENT	performs OpenSSL client negotiation, then encrypts/decrypts data
OPENSSL-SERVER	performs OpenSSL server negotiation, then encrypts/decrypts data
PROXY	performs proxy CONNECT client negotiation, then transfers data unmodified
SOCKS4	performs SOCKS4 client negotiation, then transfers data unmodified
SOCKS4A	performs SOCKS4a client negotiation, then transfers data unmodified
SOCKS5	performs SOCKS5 TCP client negotiation, then transfers data unmodified
TEST	appends > to forward-transferred blocks and < to reverse-transferred blocks
EXEC	invokes a program (see socat-exec.html), then transfers data unmodified
SYSTEM	invokes the shell (see socat-exec.html), then transfers data unmodified

Reverse address use

Inter addresses have two interfaces. In most cases one of these can be seen as a data interface, where arbitrary data traffic may occur, and the other as a protocol interface, where the transferred data has to follow certain rules, such as the SOCKS or HTTP protocol, or valid encryption.

Bidirectional inter addresses are usually implemented such that their data interface is on the "left" side, and the protocol interface on the "right" side.

It may be convenient to build an address chain where one or more inter addresses work in the reverse direction, so that their protocol side is connected to the left neighbor in the chain and their data side is connected to the right neighbor for raw data transfer. socat allows inter addresses to be used in the reverse direction by prefixing their keyword with ^.

Example 2:

Endpoint addresses that fork should usually form the first socat address chain, without inter addresses. The following command line creates an SSL to TCP gateway that handles multiple connections:

socat TCP-LISTEN:443,reuseaddr,fork "^OPENSSL-SERVER,cert=server.pem | TCP:somehost:80"

Without the reverse use of the SSL server address, socat would "speak" clear text with the clients connected to its left address, and SSL to somehost.
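The effect of the ^ prefix on the two interfaces can be modelled in a few lines of Python. InterAddress and its methods are hypothetical names used only to restate the rule above: in forward use the data interface is on the left and the protocol interface on the right, and ^ swaps them.

```python
from dataclasses import dataclass


@dataclass
class InterAddress:
    keyword: str
    reverse: bool = False

    @classmethod
    def from_spec(cls, spec: str) -> "InterAddress":
        """Build from a chain component such as "^OPENSSL-SERVER,cert=server.pem"."""
        spec = spec.strip()
        reverse = spec.startswith("^")
        if reverse:
            spec = spec[1:]
        keyword = spec.split(",", 1)[0].split(":", 1)[0]
        return cls(keyword.upper(), reverse)

    def side_roles(self) -> tuple[str, str]:
        """Return the roles of the (left, right) interfaces."""
        # Forward use: data on the left, protocol on the right; ^ swaps them.
        return ("protocol", "data") if self.reverse else ("data", "protocol")


ssl_server = InterAddress.from_spec("^OPENSSL-SERVER,cert=server.pem")
print(ssl_server.side_roles())  # ('protocol', 'data')
```

In example 2 this means the SSL (protocol) side faces the TCP-LISTEN clients, while clear text (data) goes on to somehost.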

Unidirectional data transfer

Like in socat version 1, it is possible to specify unidirectional transfers with version 2. Use socat options -u or -U.

Unidirectional transfer must be supported by the involved inter addresses; e.g., SSL requires a bidirectional channel for negotiation of encryption parameters etc.

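It is possible to mix uni- and bidirectional transfers within one address chain. Think of a simple file transfer over SSL: the file data flows in one direction only, while the SSL layer still needs a bidirectional channel.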

The socat help function can tell us which address types support which kinds of transfer:

socat -h |grep -E 'openssl-server'

gives the following output:

      openssl-server                      rwb   b groups=CHILD,RETRY,OPENSSL
      openssl-server:<port>               rwb     groups=FD,SOCKET,LISTEN,CHILD,RETRY,RANGE,IP4,IP6,TCP,OPENSSL

The flags rwb and b in the first line mean that this address type can handle read-only, write-only, and bidirectional transfers on its left (data) side, but only bidirectional transfers on its right (protocol) side.

The second line describes the (version 1) endpoint form: no right-side transfer types are listed because this address type establishes its protocol communication itself.
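The help output can also be processed programmatically. This sketch assumes the column layout of the two sample lines above (name, left-side flags, optional right-side flags, groups=...); parse_help_line and is_bidirectional_inter are hypothetical helpers, and other socat builds may format this output differently. The second helper applies roughly the same criterion as the 'b ..b groups=' pattern used earlier: b on both sides.

```python
def parse_help_line(line: str) -> dict:
    """Parse one address line of socat -h output (layout assumed from the sample)."""
    tokens = line.split()
    name, left = tokens[0], tokens[1]
    # Endpoint forms have no right-side column, so the next token is already "groups=...".
    right = "" if tokens[2].startswith("groups=") else tokens[2]
    groups = next(t for t in tokens if t.startswith("groups="))[len("groups="):]
    return {
        "name": name,
        "left": set(left),    # e.g. {'r', 'w', 'b'}
        "right": set(right),  # empty set for endpoint forms
        "groups": groups.split(","),
    }


def is_bidirectional_inter(entry: dict) -> bool:
    """True if the entry supports bidirectional transfer on both sides."""
    return "b" in entry["left"] and "b" in entry["right"]


inter = parse_help_line(
    "      openssl-server                      rwb   b groups=CHILD,RETRY,OPENSSL")
endpoint = parse_help_line(
    "      openssl-server:<port>               rwb     groups=FD,SOCKET,LISTEN,CHILD,RETRY,RANGE,IP4,IP6,TCP,OPENSSL")
```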

Dual inter addresses

In socat version 1 it was already possible to combine two unidirectional addresses into one bidirectional address. This idea has been extended in version 2: two unidirectional inter addresses can be combined into one bidirectional transfer unit.

Note: in version 1, the dual specification was like righttoleft!!lefttoright. In version 2, it is: lefttoright%righttoleft. This is the only major incompatibility between versions 1 and 2.
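Because the order of the two halves is swapped, migrating a version 1 dual specification is a purely textual change. This naive Python sketch (dual_v1_to_v2 is a hypothetical helper) assumes that "!!" appears exactly once and not inside the address parameters.

```python
def dual_v1_to_v2(spec: str) -> str:
    """Convert righttoleft!!lefttoright (version 1) to lefttoright%righttoleft (version 2)."""
    if spec.count("!!") != 1:
        raise ValueError("expected exactly one '!!' separator")
    right_to_left, left_to_right = spec.split("!!")
    return f"{left_to_right}%{right_to_left}"


print(dual_v1_to_v2("A!!B"))  # B%A
```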

With the few inter address types available so far, this feature has no practical use except with EXEC and SYSTEM addresses. The general function is therefore described here using the hypothetical inter address types gzip and gunzip.

Let us design these inter address types: gzip is a module that reads arbitrary data on its left ("data") side, compresses it, and writes the compressed data to its right (protocol side) neighbor.

gunzip reads gzip compressed data on its left side and writes the raw uncompressed data on its right side.

socat can combine these to provide a bidirectional compress/decompress function:
gzip%gunzip

Data coming from the left is passed through gzip and sent to the right; data coming from the right is passed through gunzip and sent to the left.

When the reverse functionality is desired this arrangement does the job:
gunzip%gzip
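The two arrangements can be modelled with Python's gzip module. Dual is a hypothetical class standing for "lefttoright%righttoleft", and gzip/gunzip remain hypothetical inter addresses; the sketch also treats the whole stream as one block, whereas a real module would have to compress a stream incrementally.

```python
import gzip
from typing import Callable

Transform = Callable[[bytes], bytes]


class Dual:
    """Model of a dual inter address "lefttoright%righttoleft"."""

    def __init__(self, left_to_right: Transform, right_to_left: Transform) -> None:
        self.left_to_right = left_to_right  # applied to data flowing rightwards
        self.right_to_left = right_to_left  # applied to data flowing leftwards


# gzip%gunzip: compress rightwards, decompress leftwards
compressor = Dual(gzip.compress, gzip.decompress)
# gunzip%gzip: the reverse arrangement
decompressor = Dual(gzip.decompress, gzip.compress)

payload = b"socat " * 100
wire = compressor.left_to_right(payload)     # compressed data on the right side
restored = decompressor.left_to_right(wire)  # back to the original bytes
```

Feeding the right side of gzip%gunzip into the left side of gunzip%gzip thus restores the original data, and the same holds for the opposite direction.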

fork

socat provides the fork address option for uses like network servers, where multiple clients can connect and are handled in parallel in different socat subprocesses.

When the subprocesses should work independently (sharing no socat file descriptors), the fork option must be applied to the last component of the first address chain. For better readability it is advisable to put only the "left" endpoint address into the left chain and all intermediate addresses into the right chain.

Understanding chain implementation

The idea of concatenated modules in socat is not new, but a few attempts to completely rewrite and enhance the socat transfer engine were never completed. Finally, an approach was chosen that requires only moderate changes to socat's transfer engine and the existing address types.

Think of several socat version 1 processes combined with an abstract operator ||:

socat - openssl || socat - proxy:secure.domain.com:443 || socat - tcp:proxy.domain.com:8080

The solution was to put all these into one process but have each socat engine run in its own thread. The transfer between the engines goes over socket pairs, so the engines see file descriptors as usual. The main work then was to implement the functionality for opening address chains which includes parsing, creating socket pairs and threads, combining the addresses, taking care of unidirectional, dual, and reverse addresses etc.
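The thread-plus-socket-pair idea can be sketched in Python. The engine function below is a hypothetical, much simplified stand-in for a socat transfer engine: it handles only the left-to-right direction, and its module is a TEST-like transformation that appends '>' to each block. The point is that the engine thread only ever sees ordinary sockets.

```python
import socket
import threading


def engine(src: socket.socket, dst: socket.socket, module) -> None:
    """One direction of a transfer engine: read from src, apply module, write to dst."""
    while True:
        block = src.recv(4096)
        if not block:                     # EOF from the left neighbour
            dst.shutdown(socket.SHUT_WR)  # pass the end of data on to the right
            return
        dst.sendall(module(block))


# Socket pairs connect the engine thread to its neighbours.
left_outer, left_inner = socket.socketpair()
right_inner, right_outer = socket.socketpair()

worker = threading.Thread(target=engine, args=(left_inner, right_inner, lambda b: b + b">"))
worker.start()

left_outer.sendall(b"hello")
left_outer.shutdown(socket.SHUT_WR)  # signal end of data

received = b""
while chunk := right_outer.recv(4096):
    received += chunk
worker.join()

for s in (left_outer, left_inner, right_inner, right_outer):
    s.close()
```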

Here is the socat version 2 command line of example 1:
socat - "OPENSSL,verify=0 | PROXY:secure.domain.com:443 | TCP:proxy.domain.com:8080"

A schematic representation of how this is realized in socat:
STDIO - engine[thread 0] - OPENSSL - socket pair - (FD) - engine[thread 1] - PROXY - socket pair - (FD) - engine[thread 2] - TCP

where FD means a trivial address similar to the FD (file descriptor) address type.

For debugging address chains it proved useful to write down two lines and to note the actual file descriptor numbers:

 STDIO ^ OPENSSL |    ^ PROXY |    ^ TCP
 0,1   ^       6 | 7  ^     4 | 5  ^   3

In this schema, the symbol ^ denotes a socat transfer engine, and | separates the two ends of a socket pair.

Now the implementation of the reverse address feature should be easier to understand: while a forward address is placed on the right side of its engine, a reverse address is simply placed on the left side. Example 2 can be explained as follows:

Example 2 command line:
socat TCP-LISTEN:443,reuseaddr,fork "^OPENSSL-SERVER,cert=server.pem | TCP:somehost:80"

Schematic representation:
TCP-LISTEN - engine[thread 0] - (FD) - socket pair - OPENSSL-SERVER - engine[thread 1] - TCP

Debug schema:

 TCP-L ^    | SSL-SERV ^ TCP
 3     ^  5 | 6        ^   4
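The placement rule can be written as a small function that reproduces both schematic representations above. schematic is a hypothetical helper for illustration only; it assumes that the left chain is a single endpoint address and that only inter addresses, never the final endpoint, carry the ^ prefix.

```python
def schematic(left_endpoint: str, right_chain: list[str]) -> str:
    """Build a schematic representation of a chain (hypothetical helper)."""
    engine = 0
    parts = [left_endpoint, f"engine[thread {engine}]"]
    for index, component in enumerate(right_chain):
        is_last = index == len(right_chain) - 1
        if component.startswith("^"):
            # Reverse address: this engine ends in a trivial (FD) address,
            # and the module sits on the LEFT side of the next engine.
            engine += 1
            parts += ["(FD)", "socket pair", component[1:], f"engine[thread {engine}]"]
        else:
            # Forward address: the module sits on the RIGHT side of the current engine.
            parts.append(component)
            if not is_last:
                # It writes into a socket pair that the next engine reads via (FD).
                engine += 1
                parts += ["socket pair", "(FD)", f"engine[thread {engine}]"]
    return " - ".join(parts)


print(schematic("TCP-LISTEN", ["^OPENSSL-SERVER", "TCP"]))
```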

Communication types

For communication between the address modules of consecutive transfer engines, socat provides pairs (or quadruples) of file descriptors. You may think of these as two normal UNIX pipes (FIFOs), one for left-to-right and the other for right-to-left data transfer.

These file descriptors should fulfill a few requirements; however, the requirements differ depending on the libraries used by the inter address modules (e.g. libopenssl) or on the external programs involved (see socat-exec.html).

The factors to consider for these file descriptors are:

Half close: when a module terminates communication on its write channel, its read channel should still stay open.
Half close method: a module might half close a connection using either close() or shutdown().
Buffering: the output buffering behaviour of some modules can be influenced by the type of file descriptor.
INET: some external programs require a TCP/IPv4 file descriptor.
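The half-close requirement and the shutdown() method can be demonstrated with a single socket pair in Python: after one side calls shutdown() for writing, its peer sees end-of-file, yet data can still flow in the other direction. This is a generic illustration of the socket calls, not socat code.

```python
import socket

a, b = socket.socketpair()

a.sendall(b"request")
a.shutdown(socket.SHUT_WR)    # half close: a stops writing but can still read

data = b""
while chunk := b.recv(1024):  # b reads until it sees EOF from a
    data += chunk

b.sendall(b"reply")           # the b -> a direction is still open
b.close()

reply = b""
while chunk := a.recv(1024):
    reply += chunk
a.close()
```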

This table lists the available communication types and their properties:

comm. type	half close with close()	allows shutdown()	avoids buffering	TCP/IPv4
socketpairs	yes	yes	no	no
socketpair	no	yes	no	no
pipes	yes	no	no	no
ptys	yes	no	yes	no
tcp	no	yes	no	yes

The default is socketpairs.
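Why close() can act as a half close with the default type becomes clear if, as its name and the table suggest, socketpairs uses one socket pair per transfer direction; that layout is an assumption in this Python sketch. Closing the writing end of one pair ends only that direction, while the other pair keeps working.

```python
import socket

# Assumed layout of "socketpairs": one socket pair per transfer direction.
l2r_send, l2r_recv = socket.socketpair()  # left-to-right channel
r2l_send, r2l_recv = socket.socketpair()  # right-to-left channel

l2r_send.sendall(b"data")
l2r_send.close()              # close() ends only the left-to-right direction

received = b""
while chunk := l2r_recv.recv(1024):
    received += chunk         # loop ends at EOF

r2l_send.sendall(b"answer")   # the right-to-left channel is unaffected
r2l_send.close()

answer = b""
while chunk := r2l_recv.recv(1024):
    answer += chunk

for s in (l2r_recv, r2l_recv):
    s.close()
```

With a single socketpair, by contrast, close() would end both directions at once, which is why that type relies on shutdown() for half closing.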

The overall communication type can be chosen with the socat option -c. With socat 2.0.0-b3 it is not possible to use different communication types in one process (exception: the right side of EXEC/SYSTEM modules).