mhttpd

Robbert Haarman

2010-12-11

Introduction

mhttpd (Multi-user HTTP Daemon) is a webserver that keeps individual sites isolated from one another by running each one under its own user and group. This means that scripts on one site cannot access files on another site, which solves many of the security problems typical of shared hosting. A secondary goal in the design of mhttpd was performance: the server avoids forking and even function calls, it does not implement some parts of the CGI standard, and various features can be disabled at compile time. Unfortunately, the code is horribly ugly, so ugly that I thought it wiser to start over. mhttpd is now deprecated in favor of muhttpd, which focuses on cleanliness and simplicity. mhttpd is still offered for download to those who wish to experiment with its unique architecture.

The Shared Hosting Security Problem

Low-cost web hosting services generally employ a technique known as shared (or virtual) hosting. This means that many websites are hosted on the same webserver (both the physical computer and the server software). The leading Apache server, as well as every other server I am aware of, runs all sites under the same user id and group id [1]. This means that scripts running on one site have access to the data of all other sites. If your site can read the database password from a configuration file, then so can every other site on the same server. This is obviously an undesirable situation.

The obvious solution to sites being able to access each other's data is to separate the sites in some way. Since the sites are typically administered by different users, it makes sense to draw the borders along the same lines. The problem is that websites are expected to be accessible through port 80 (the standard port for the HTTP protocol), and only one process can bind to a given port [2]. That one process therefore has to be able to serve all sites. Of course, if it can serve all the sites, it can also access files from site A while serving site B.

A straightforward way to have one process serve every site is to make that process switch user ids depending on the site being served. The problem is that typical systems allow only processes with user id 0 (the superuser, or root) to switch user ids. Besides switching user ids, this user is allowed to do anything else, including deleting all files on the system and powering down the machine. Running the server as uid 0 is an obvious security risk.

Nevertheless, there are a number of solutions based on this theme. One is to run the main process with uid 0 and then, as each request comes in, fork a child process with the right user id and group id to handle it. This is better than running the whole server as root, but it leaves the most vulnerable part (parsing the request) with superuser privileges, which is a fairly high security risk. Another way is to run the server under a non-privileged user id and have an external program perform the switching when a script is run. This method is implemented by Apache's suexec wrapper, and it has a number of drawbacks. Running such a wrapper for each request is slow; for this reason, Apache only uses suexec for CGI scripts. Unfortunately, this means that the popular setups running PHP or Perl as server modules remain unprotected. Also, each site's web directory is still accessible from other sites, meaning that webmasters still have to take extra care to protect their data.

The approach taken by mhttpd is to start a server for each site, running as the user and group of that site, and then have a main process delegate incoming requests to the designated server for that site. This avoids the performance hit of invoking a suexec-like wrapper for each script, and allows sites to be completely isolated from one another. Since the servers are started only once, when mhttpd starts up, and are reused for every subsequent request, mhttpd avoids both the cost of forking at runtime and the security risk of keeping a process with uid 0 around.
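To make this process layout concrete, here is a minimal C sketch of how a main process might start one long-lived subserver per site and keep a UNIX socket to each. It is not mhttpd's actual code: `struct site`, `spawn_subserver()` and the callback type `subserver_fn` are hypothetical names, and the callback is where a real subserver would switch to the site's user and group before handling anything.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-site record kept by the main process. */
struct site {
    const char *name;   /* site (virtual host) name */
    uid_t uid;          /* user the subserver should run as */
    gid_t gid;          /* group the subserver should run as */
    int sock;           /* main process's end of the UNIX socket */
    pid_t pid;          /* process id of the subserver */
};

/* Body of a subserver. A real one would switch to site->uid and
 * site->gid first, then wait for requests on sock. */
typedef void (*subserver_fn)(const struct site *site, int sock);

/* Fork one long-lived subserver for a site, connected to the main
 * process by a UNIX socket pair. Returns 0 in the parent (with
 * site->sock and site->pid filled in) or -1 on error. The child runs
 * the callback and exits; it never returns from this function. */
int spawn_subserver(struct site *site, subserver_fn run)
{
    int sv[2];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {             /* child: becomes the site's subserver */
        close(sv[0]);
        run(site, sv[1]);
        _exit(0);
    }
    close(sv[1]);               /* parent keeps only its own end */
    site->sock = sv[0];
    site->pid = pid;
    return 0;
}
```

The main process would call such a function once per configured site at startup, so no fork() is needed while ordinary requests are being served.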

Architecture of mhttpd

A running mhttpd consists of one central process that accepts requests and transfers data, and one process per site that handles the requests. Forking is only performed to start CGI scripts. Multiple clients can be served simultaneously through the use of asynchronous I/O.
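Serving many clients from one process without forking can be sketched with poll(). This is an illustration under assumptions rather than mhttpd's implementation: `struct transfer`, `relay_all()` and `MAX_TRANSFERS` are hypothetical, and writes to the client block for brevity.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_TRANSFERS 64   /* arbitrary limit for this sketch */

/* Hypothetical pairing of a data source (a file, or a pipe from a CGI
 * script) with the client connection its contents are sent to. */
struct transfer {
    int src;    /* data source */
    int dst;    /* client connection */
    int done;   /* set once src has reached end of file */
};

/* Forward data from every source to its client using a single poll()
 * loop instead of one process or thread per client. Both descriptors of
 * a transfer are closed when its source is exhausted (no keep-alive).
 * Returns 0 when all transfers are finished, -1 on error. */
int relay_all(struct transfer *t, size_t n)
{
    struct pollfd pfd[MAX_TRANSFERS];
    size_t which[MAX_TRANSFERS];
    char buf[4096];

    if (n > MAX_TRANSFERS)
        return -1;
    for (;;) {
        nfds_t count = 0;

        for (size_t i = 0; i < n; i++) {
            if (!t[i].done) {
                pfd[count].fd = t[i].src;
                pfd[count].events = POLLIN;
                pfd[count].revents = 0;
                which[count++] = i;
            }
        }
        if (count == 0)
            return 0;
        if (poll(pfd, count, -1) < 0)
            return -1;

        for (nfds_t k = 0; k < count; k++) {
            struct transfer *tr = &t[which[k]];
            ssize_t got;

            if (!(pfd[k].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)))
                continue;
            got = read(tr->src, buf, sizeof buf);
            if (got <= 0) {     /* end of data (or read error) */
                close(tr->src);
                close(tr->dst);
                tr->done = 1;
                continue;
            }
            /* Simplification: write the chunk out in full, blocking if
             * necessary. A fully asynchronous server would buffer it and
             * wait for POLLOUT on the client instead. */
            for (ssize_t off = 0; off < got; ) {
                ssize_t w = write(tr->dst, buf + off, (size_t)(got - off));
                if (w < 0)
                    return -1;
                off += w;
            }
        }
    }
}
```

A slow client can stall this loop during the blocking write; avoiding that requires per-transfer buffers and polling the client sockets for writability as well.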

When the server is started, the main process reads the configuration file, spawns the subservers, binds to the requested port, and switches to the requested user and group. Only then does it start accepting requests. The subservers each process their own configuration files, switch to the requested user and group, and optionally chroot() to their specific webroot.
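A minimal sketch of this ordering, assuming a POSIX system: bind the listening socket while still privileged (ports below 1024 normally require root), and only then give up root for good. `open_listener()` and `drop_privileges()` are hypothetical helper names; the optional chroot() corresponds to what the subservers do.

```c
#define _GNU_SOURCE            /* expose setgroups() and chroot() on glibc */
#include <arpa/inet.h>
#include <grp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket listening on the given port on all IPv4 addresses.
 * Ports below 1024, such as 80, normally require root, so this has to
 * happen before privileges are dropped. Returns the socket or -1. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allow quick restarts while old connections are in TIME_WAIT. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Permanently switch to uid/gid, optionally chroot()ing to root_dir
 * first (chroot() itself needs root). Groups are changed before the
 * user id, because after setuid() we would no longer be allowed to. */
int drop_privileges(uid_t uid, gid_t gid, const char *root_dir)
{
    if (root_dir != NULL && (chroot(root_dir) < 0 || chdir("/") < 0))
        return -1;
    /* Only root may replace the supplementary group list. */
    if (geteuid() == 0 && setgroups(1, &gid) < 0)
        return -1;
    if (setgid(gid) < 0 || setuid(uid) < 0)
        return -1;
    /* Sanity check: root must not be recoverable afterwards. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

The main process would call open_listener() and then drop_privileges() with a NULL root_dir; a subserver would call only drop_privileges(), passing its webroot if chroot() is enabled.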

Clients send their requests to the main process. The request has to be contained in a single packet (this limitation could be overcome by implementing a gathering mechanism that collects the entire request from multiple packets). The request is then delegated to one of the subservers for actual processing, by passing the file descriptor of the connection through a UNIX socket. The subserver handles the request and returns a data source (a file descriptor, or a pipe from a CGI script) to the main process. The main process polls each data source for available data and sends it on to the clients asynchronously.
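Passing an open descriptor between processes over a UNIX domain socket is done with SCM_RIGHTS ancillary data. The sketch below assumes that standard mechanism; `send_fd()` and `recv_fd()` are hypothetical helpers, not functions taken from mhttpd.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send descriptor fd over the UNIX domain socket sock. One byte of
 * ordinary data accompanies it, because descriptors sent over a stream
 * socket must travel with at least one byte of real data (see unix(7)
 * on Linux). Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                     /* correctly aligned control buffer */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd(). The result is a new
 * descriptor in this process referring to the same open file or
 * connection. Returns it, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The same pair of functions could carry a data source (an open file or a CGI pipe) back from a subserver to the main process.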

Limitations

mhttpd suffers from a number of limitations. Most of these are deficiencies in the current implementation, and could be fixed in later releases. Some are fundamental issues with the architecture of mhttpd, and fixing them would require architectural changes.

One issue is that requests are not assembled from multiple packets. All request headers have to be contained entirely in the first packet received from the client. This has not led to problems in actual use so far, but could lead to issues with certain client software. This issue could easily be fixed by deferring the handling of a request until all headers have been received.
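Deferring a request until its headers are complete could look like the following sketch, which buffers data per connection and checks for the empty line (CRLF CRLF) that ends the HTTP header block. `struct pending`, `pending_add()`, `headers_complete()` and the `REQ_MAX` limit are hypothetical; the buffer is rescanned from the start on every call, which is simple but not the most efficient choice.

```c
#include <stddef.h>
#include <string.h>

#define REQ_MAX 8192   /* hypothetical limit on the size of request headers */

/* Hypothetical per-connection buffer for a request still being received. */
struct pending {
    char buf[REQ_MAX];
    size_t len;
};

/* Return the length of the header block (up to and including the empty
 * line that ends it) if buf holds complete headers, else 0. */
size_t headers_complete(const char *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Add newly received bytes to the connection's buffer. Returns 1 once the
 * headers are complete and the request can be delegated, 0 if more data
 * must be awaited, or -1 if the headers would exceed REQ_MAX. */
int pending_add(struct pending *p, const char *data, size_t n)
{
    if (n > sizeof p->buf - p->len)
        return -1;
    memcpy(p->buf + p->len, data, n);
    p->len += n;
    return headers_complete(p->buf, p->len) != 0;
}
```

Only when pending_add() returns 1 would the main process hand the connection to a subserver; until then the connection simply stays in the poll set.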

Another issue is that mhttpd does not implement the full CGI standard. Again, this has not led to practical problems, but may break certain scripts. There does not seem to be any obstacle to implementing the missing parts of the specification, but so far the need has not arisen.

Conditional requests, range requests, etc. are not implemented. This should not lead to any problems, but it means that mhttpd lacks important mechanisms for reducing network traffic. These could be implemented in the future.

One final limitation is that mhttpd does not support keep-alive connections. This leads to extra overhead on both the client and the server, as a new TCP/IP connection has to be set up for every object requested. Unfortunately, the architecture of mhttpd makes keep-alive connections difficult to implement for CGI scripts. When the script terminates, the file descriptor used for the connection will be closed, rendering it unusable.

[1] Apache 2 has a perchild MPM (multi-processing module) that allows sites to run under different user and group ids. However, long after mhttpd was written, the perchild MPM remained so unstable as to be completely unusable.

[2] Strictly speaking, one process can bind to each address:port combination. This means that if the machine is assigned an IP address for each site, each site can be served by a different process. However, IP (version 4) addresses are scarce (and thus expensive). IPv6 addresses are widely available, but nowhere near universally accessible yet, making this solution problematic.