@domfarolino
Last active May 22, 2023 13:12
Sending file descriptors over UNIX sockets

When a process creates another process with fork(), the child receives copies of all of the parent's open file descriptors, and those descriptors stay open across the exec() family of functions unless they are marked close-on-exec. This means that if a parent has two file descriptors that are entangled (i.e., connected to each other, such as the read end and the write end of a pipe), the parent process can simply write to the write end, and the child process can read from the read end.
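As a minimal sketch of that inheritance, assuming a plain pipe() as the entangled pair and a hypothetical helper name (`child_reads_inherited_pipe` is not part of this gist), the child below uses the very same descriptor numbers it got from fork():

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that inherits both ends of a pipe. The
// parent writes `text` to the write end; the child reads it from the read end
// and reports the result through its exit status.
bool child_reads_inherited_pipe(const std::string& text) {
  int fds[2];  // fds[0] is the read end, fds[1] is the write end.
  if (pipe(fds) != 0) return false;

  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    // Child: same descriptor numbers, same underlying pipe.
    close(fds[1]);
    std::string received(text.size(), '\0');
    size_t total = 0;
    while (total < received.size()) {
      ssize_t n = read(fds[0], &received[total], received.size() - total);
      if (n <= 0) break;
      total += static_cast<size_t>(n);
    }
    _exit(total == text.size() && received == text ? 0 : 1);
  }

  // Parent: keep only the write end and send the text.
  close(fds[0]);
  ssize_t written = write(fds[1], text.data(), text.size());
  close(fds[1]);

  int status = 0;
  waitpid(pid, &status, 0);
  return written == static_cast<ssize_t>(text.size()) &&
         WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling `child_reads_inherited_pipe("hello")` should return true when the child sees the same bytes the parent wrote.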

                   ┌────────────────┐
                   │   Process A    │
                   │                │
                   │ fd1=1          │
                   │ fd2=2          │
                   └───┬────────────┘
                       │
┌────────────────┐     │
│   Process B    │◄────┘
│                │
│ fd1=1          │
│ fd2=2          │
└────────────────┘

In this example, A can write to fd1, and B can read from fd2, establishing basic interprocess communication (IPC). But in large, long-running production software, processes are often spun up dynamically and need to communicate with other processes that they didn't inherit any I/O primitives from. They need to learn about, e.g., file descriptors connected to peer processes on the fly.

                    ┌────────────────┐
                    │   Process A    │
                    │                │
                    │ fd1=1    fd3=3 │
                    │ fd2=2    fd4=4 │
                    └───┬────────┬───┘
                        │        │
 ┌────────────────┐     │        │     ┌────────────────┐
 │   Process B    │◄────┘        └────►│   Process C    │
 │                │                    │                │
 │ fd1=1          │                    │ fd1=1          │
 │ fd2=2          │                    │ fd2=2          │
 └────────────────┘                    └────────────────┘

In this example, C needs to communicate with B, but has no I/O primitives connected to B to facilitate this. But A can teach C about B by sending it a file descriptor that's connected to B. Specifically, A might create a brand new pair of entangled file descriptors, and send each end to B and C respectively, establishing a direct line of communication between them.

                    ┌────────────────┐
                    │   Process A    │
                    │                │
                    │ fd1=1    fd3=3 │
                    │ fd2=2    fd4=4 │
                    │                │
                    │     fd5=5──────┼─────────┐
         ┌──────────┼─────fd6=6      │         │
         │          └───┬────────┬───┘         │
         │              │        │             │
 ┌───────▼────────┐     │        │     ┌───────▼────────┐
 │   Process B    │◄────┘        └────►│   Process C    │
 │                │                    │                │
 │ fd1=1          │                    │ fd1=1          │
 │ fd2=2          │                    │ fd2=2          │
 └────────────────┘                    └────────────────┘

...and finally


                    ┌────────────────┐
                    │   Process A    │
                    │                │
                    │ fd1=1    fd3=3 │
                    │ fd2=2    fd4=4 │
                    └───┬────────┬───┘
 ┌────────────────┐     │        │     ┌────────────────┐
 │   Process B    │◄────┘        └────►│   Process C    │
 │                │                    │                │
 │ fd1=1    fd3=3 │                    │ fd1=1    fd3=3 │
 │ fd2=2      ▲   │                    │ fd2=2      │   │
 └────────────┼───┘                    └────────────┼───┘
              └─────────────────────────────────────┘

But as you know, file descriptors are literally just ints, so it's not enough for A to send C the number 5 (the value of the A:fd5 descriptor) in an ordinary message. That value means nothing in the receiving process, because a descriptor number is only an index into the per-process descriptor table. An integer file descriptor isn't powerful on its own; it just represents an underlying kernel resource, so we need to somehow tell the kernel to send this resource, the underlying socket identified by its descriptor, to another process.
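To see why the bare number is useless, here is a small sketch, assuming a hypothetical helper name (`raw_fd_number_is_useless_in_child`) and a pipe as the resource. The parent opens a new descriptor after fork() and ships only its integer value to the child, which then finds that the number names nothing in its own descriptor table:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if a descriptor number created in the
// parent *after* fork() is reported as invalid (EBADF) in the child.
bool raw_fd_number_is_useless_in_child() {
  int channel[2];  // Inherited pipe, used only to ship the bare integer.
  if (pipe(channel) != 0) return false;

  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    close(channel[1]);
    int number = -1;
    ssize_t n = read(channel[0], &number, sizeof(number));
    if (n != static_cast<ssize_t>(sizeof(number))) _exit(2);
    // The integer arrived intact, but it is not an open descriptor here.
    errno = 0;
    int flags = fcntl(number, F_GETFD);
    _exit(flags == -1 && errno == EBADF ? 0 : 1);
  }

  // Parent: create a new pipe only now, so the child never inherited it.
  int fresh[2];
  if (pipe(fresh) != 0) {
    close(channel[0]);
    close(channel[1]);
    waitpid(pid, nullptr, 0);
    return false;
  }
  ssize_t written = write(channel[1], &fresh[1], sizeof(fresh[1]));

  int status = 0;
  waitpid(pid, &status, 0);
  close(channel[0]);
  close(channel[1]);
  close(fresh[0]);
  close(fresh[1]);
  return written == static_cast<ssize_t>(sizeof(fresh[1])) &&
         WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

So the integer alone is not enough; the kernel has to be asked to transfer the underlying resource.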

We do this by sending a "control" message from e.g., A to C. Instead of just sending normal bytes to C, a "control" message contains information in a format that the kernel understands, directing it to send powerful capabilities in a message, and recover them on the other end (in C). To pass file descriptors, control messages must be sent over Unix domain sockets via sendmsg() and recovered via recvmsg().

All message data is encoded in the msghdr struct, including:

  • The raw bytes of the message data (i.e., some text perhaps), in the msghdr.msg_iov member (an array of iovec structs)
  • The control data in msghdr.msg_control

The control data is called "ancillary data", and shouldn't be accessed directly via the msghdr struct. It should only be accessed by the CMSG_FIRSTHDR(struct msghdr*) and CMSG_NXTHDR(...) macros, which both return pointers to cmsghdr structs "inside" the outer msghdr. The control cmsghdr structs are where control data is written to and read from.
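A rough sketch of walking ancillary data with those macros follows. The function names `collect_fds` and `demo_walk` are made up for illustration, and the demo fills a control buffer by hand with two plain integers (7 and 9) rather than receiving real descriptors from the kernel:

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <vector>

// Hypothetical helper: collects every int carried in SCM_RIGHTS control
// messages of `msg`, visiting the headers only through the CMSG_* macros.
std::vector<int> collect_fds(struct msghdr* msg) {
  std::vector<int> fds;
  for (struct cmsghdr* cmsg = CMSG_FIRSTHDR(msg); cmsg != nullptr;
       cmsg = CMSG_NXTHDR(msg, cmsg)) {
    if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {
      continue;
    }
    // cmsg_len covers the header plus the payload; CMSG_LEN(0) is the
    // header-only length, so the difference is the payload size in bytes.
    size_t payload = cmsg->cmsg_len - CMSG_LEN(0);
    for (size_t i = 0; i < payload / sizeof(int); ++i) {
      int fd;
      memcpy(&fd, CMSG_DATA(cmsg) + i * sizeof(int), sizeof(fd));
      fds.push_back(fd);
    }
  }
  return fds;
}

// Builds a msghdr whose control buffer holds one SCM_RIGHTS header with two
// ints, then walks it. No socket is involved, so the values are just numbers.
std::vector<int> demo_walk() {
  int values[2] = {7, 9};
  std::vector<char> control(CMSG_SPACE(sizeof(values)));  // Zero-filled.

  struct msghdr msg = {};
  msg.msg_control = control.data();
  msg.msg_controllen = control.size();

  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(values));
  memcpy(CMSG_DATA(cmsg), values, sizeof(values));

  return collect_fds(&msg);
}
```

The same loop would apply to a msghdr filled in by recvmsg(), where the kernel writes the headers and the ints are real descriptors in the receiving process.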

The cmsghdr struct definition can be found here. To send a file descriptor to another process you need to do three things:

  1. Set cmsg_level = SOL_SOCKET
  2. Set cmsg_type = SCM_RIGHTS
  3. Set the actual data to the file descriptor value, via *reinterpret_cast<int*>(CMSG_DATA(cmsg)) = fd_to_send

The CMSG_DATA() macro returns a pointer to the data component of the cmsghdr, which is where you write your file descriptor values into the message. The level and type describe to the kernel exactly what data you're sending. A cmsg_type of SCM_RIGHTS allows you to send "a set of open file descriptors"; you can see the other message types here (see footnote 1).

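Both msghdr.msg_controllen and cmsghdr->cmsg_len must be set accordingly: msg_controllen is the size of the whole control buffer (CMSG_SPACE(sizeof(int)) for a single descriptor, which includes alignment padding), while cmsg_len is CMSG_LEN(sizeof(int)), the header plus the payload without trailing padding. See documentation in the code examples for more info.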
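Putting the three steps and the two length fields together, a pair of helpers might look roughly like this. The names `send_fd`, `recv_fd` and `round_trip` are hypothetical, and the sketch uses memcpy() where the text above uses reinterpret_cast, to avoid assuming anything about alignment. The round trip stays inside one process, which is enough to show the kernel handing back a working copy of the descriptor:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <vector>

// Sends `fd_to_send` over the Unix domain socket `socket_fd`, together with
// one byte of ordinary data. Returns true on success.
bool send_fd(int socket_fd, int fd_to_send) {
  char byte = 'x';
  struct iovec iov = {&byte, sizeof(byte)};
  std::vector<char> control(CMSG_SPACE(sizeof(int)));

  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control.data();
  msg.msg_controllen = control.size();  // Whole buffer, padding included.

  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;           // Step 1.
  cmsg->cmsg_type = SCM_RIGHTS;            // Step 2.
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));  // Header plus payload only.
  memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));  // Step 3.

  return sendmsg(socket_fd, &msg, 0) == 1;
}

// Receives one descriptor from `socket_fd`. Returns -1 on failure.
int recv_fd(int socket_fd) {
  char byte = 0;
  struct iovec iov = {&byte, sizeof(byte)};
  std::vector<char> control(CMSG_SPACE(sizeof(int)));

  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control.data();
  msg.msg_controllen = control.size();

  if (recvmsg(socket_fd, &msg, 0) != 1) return -1;
  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS ||
      cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
    return -1;
  }
  int fd = -1;
  memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
  return fd;
}

// Passes the write end of a pipe across a socket pair, writes `text` through
// the received copy, and returns what comes out of the pipe's read end.
std::string round_trip(const std::string& text) {
  int sock[2];
  int pipe_fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock) != 0) return "";
  if (pipe(pipe_fds) != 0) return "";

  std::string out;
  int received = send_fd(sock[0], pipe_fds[1]) ? recv_fd(sock[1]) : -1;
  if (received >= 0) {
    // `received` is a new descriptor number for the same pipe write end.
    std::string buffer(text.size(), '\0');
    ssize_t size = static_cast<ssize_t>(text.size());
    if (write(received, text.data(), text.size()) == size &&
        read(pipe_fds[0], &buffer[0], buffer.size()) == size) {
      out = buffer;
    }
    close(received);
  }
  close(sock[0]);
  close(sock[1]);
  close(pipe_fds[0]);
  close(pipe_fds[1]);
  return out;
}
```

For example, `round_trip("hi")` should return "hi". The received descriptor generally has a different integer value than the one that was sent, which is the whole point: the kernel installs the resource under a fresh number in the receiver's table.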

Example

The send_fd.cc and receive_fd.cc files in this collection represent two binaries. The sender is the parent process which runs the child process, and in this example, the flow of events is:

  • Parent creates a UNIX domain socket pair
  • Parent runs child binary
  • Child binary runs with the inherited socket pair, acting as a primordial IPC connection
  • Parent creates another socket pair after the child has started, so the child can't inherit it
  • Parent sends a message over the original socket with:
    • Some friendly text in iovec
    • Control ancillary data in msg_control, sending one end of the recently-created socket pair to the child
  • Child reads the message text from the parent, and recovers the new socket descriptor
  • Child reads another simple message from the new socket that it just recovered

To run the example, do:

$ g++ send_fd.cc -std=c++0x -o send_fd
$ g++ receive_fd.cc -std=c++0x -o receive_fd
$ ./send_fd

This is a minimal toy example that simply demonstrates sending a file descriptor to another process. As a toy demo, it isn't that useful since the process receiving the descriptor already has a direct connection to the one sending it. A more interesting example would be a concrete implementation of the diagrams drawn above. Try and implement it yourself!

Footnotes

  1. Similar documentation can be found here, but it seems lacking as it does not include the SCM_SECURITY entry.

// receive_fd.cc
#include <assert.h>
#include <stdio.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

static const size_t MESSAGE_SIZE = 2;

int main(int argc, char** argv) {
  // The parent passes the number of the inherited socket as the only argument.
  assert(argc == 2);
  int inherited_fd = std::stoi(argv[1]);

  // The buffer that the raw message contents (sent to us over `inherited_fd`)
  // will be recovered to.
  std::vector<char> message_contents(MESSAGE_SIZE);
  struct iovec iov = {message_contents.data(), message_contents.size()};

  // The buffer that the control data in the message will be recovered to.
  // This won't be accessed directly; we're just making space for it here for
  // the OS to write to. We retrieve the control data by calling
  // `CMSG_FIRSTHDR()` and `CMSG_DATA()` below.
  std::vector<char> control_buffer(CMSG_SPACE(sizeof(int)));

  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control_buffer.data();
  msg.msg_controllen = control_buffer.size();

  // recvmsg() returns -1 on error, so the result is kept signed.
  ssize_t rv = recvmsg(inherited_fd, &msg, MSG_WAITALL);
  assert(rv == static_cast<ssize_t>(MESSAGE_SIZE));

  printf("Child process reading a message from inherited_fd:\n");
  for (char c : message_contents) {
    printf("%c", c);
  }
  printf("\n");

  printf("Child recovering a file descriptor from message\n");
  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  assert(cmsg != nullptr);
  assert(cmsg->cmsg_level == SOL_SOCKET);
  assert(cmsg->cmsg_type == SCM_RIGHTS);
  // `cmsg_len` is the header plus the payload (CMSG_LEN), without the trailing
  // padding that CMSG_SPACE() adds when sizing the buffer.
  assert(cmsg->cmsg_len == CMSG_LEN(sizeof(int)));
  int dynamic_fd = *reinterpret_cast<int*>(CMSG_DATA(cmsg));
  printf("Recovered dynamic_fd = %d\n", dynamic_fd);

  std::vector<char> second_message_buffer(MESSAGE_SIZE);
  printf("Child process reading a message from dynamic_fd:\n");
  rv = read(dynamic_fd, second_message_buffer.data(),
            second_message_buffer.size());
  assert(rv == static_cast<ssize_t>(MESSAGE_SIZE));
  for (char c : second_message_buffer) {
    printf("%c", c);
  }
  printf("\n");

  printf("Child process exiting\n");
  close(inherited_fd);
  close(dynamic_fd);
  return 0;
}

// send_fd.cc
#include <assert.h>
#include <stdio.h>
#include <string>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

static const size_t MESSAGE_SIZE = 2;

int main() {
  // The socket pair that the child inherits. Both sockets stay in blocking
  // mode so that the child's recvmsg() and read() wait for data rather than
  // failing with EAGAIN if they run before the parent has sent anything.
  int fds[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);

  // Run the child binary.
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    // Child process. argv[0] is the program name and argv[1] is the number of
    // the inherited socket descriptor.
    execl("./receive_fd", "receive_fd", std::to_string(fds[1]).c_str(),
          static_cast<char*>(nullptr));
    // execl() only returns on failure; don't fall through into parent code.
    perror("execl");
    _exit(1);
  }

  // Create new file descriptors that only exist in the parent process.
  int dynamic_fds[2];
  rc = socketpair(AF_UNIX, SOCK_STREAM, 0, dynamic_fds);
  assert(rc == 0);

  // The message text contents in a single iovec.
  char message_contents[] = "ab";
  struct iovec io = {message_contents, MESSAGE_SIZE};

  // The control buffer where we'll store bytes for the control data (file
  // descriptors). The size of this buffer is related to the number of file
  // descriptors we're sending, but is not the exact `sizeof` the file
  // descriptors, due to CMSG_SPACE().
  std::vector<char> control_buffer(CMSG_SPACE(sizeof(dynamic_fds[1])));

  struct msghdr msg = {};
  msg.msg_iov = &io;
  msg.msg_iovlen = 1;
  msg.msg_control = control_buffer.data();
  msg.msg_controllen = control_buffer.size();

  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  assert(cmsg != nullptr);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(dynamic_fds[1]));
  *reinterpret_cast<int*>(CMSG_DATA(cmsg)) = dynamic_fds[1];

  // sendmsg() and write() return -1 on error, so the result is kept signed.
  ssize_t rv = sendmsg(fds[0], &msg, 0);
  assert(rv == static_cast<ssize_t>(MESSAGE_SIZE));
  rv = write(dynamic_fds[0], "cd", MESSAGE_SIZE);
  assert(rv == static_cast<ssize_t>(MESSAGE_SIZE));

  // Wait for the child to finish reading from the dynamic file descriptor
  // before closing anything. Closing and exiting immediately was observed to
  // make the child read 0 bytes.
  waitpid(pid, nullptr, 0);

  close(fds[0]);
  close(fds[1]);
  close(dynamic_fds[0]);
  close(dynamic_fds[1]);
  return 0;
}