Proxygen source code review

8 minute read

Published:

Source code review of Proxygen, a C++ HTTP library.

Proxygen source code review

Interfaces and how to use them

Proxygen is Facebook’s C++ HTTP libraries. The basic components are HTTPServer, RequestHandlerFactory, and RequestHandler under the /proxygen/httpserver directory.

The official version has incorporated sample usage, which is under /proxygen/httpserver/samples/ and it includes echo, and static http server samples.

I would like to take static server as an example. The main() function is in StaticServer.cpp. It first initializes folly and IP configs. Then you should define the options structure. handlerFactories is important in options.

options.handlerFactories = RequestHandlerChain()
      .addThen<StaticHandlerFactory>()
      .build();

The StaticHandlerFactory class is implemented by user in StaticServer.cpp, and it returns a new static handler. Every specific handler factory should inherit RequestHandlerFactory, which is a virtual class that defines some interfaces.

class StaticHandlerFactory : public RequestHandlerFactory {
 public:
  void onServerStart(folly::EventBase* /*evb*/) noexcept override {}

  void onServerStop() noexcept override {}

  RequestHandler* onRequest(RequestHandler*, HTTPMessage*) noexcept override {
    return new StaticHandler;
  }
};

The role of handler in the whole system is that, it is attached with each request, and interacts with operation system, or upstream servers requested by the clients.

Then options and IPconfigs will be passed to the interfaces. Your simple static http server is completed!

HTTPServer server(std::move(options));
  server.bind(IPs);

  // Start HTTPServer mainloop in a separate thread
  std::thread t([&] () {
    server.start();
  });

  t.join();
  return 0;

Introductions of each component

HTTPServer

HTTPServer.cpp abstracts actions of server like start and stop. By using bind(), codecFactory, When starting a server, it first gets an event base to manage events in a thread pool. Most thread io, and event management in Proxygen are based on the folly library developed by Facebook.

After getting an event base, it does following things in startTcpServer():

  1. Register HandlerCallbacks as the observer of thread pool. HandlerCallbacks implements interfaces like threadStarted and threadStopped.
  2. AcceptorFactory new a HTTPServerAcceptor, and pass it into bootstrap_.
  3. bootstrap_ sets childHandler, and thread pools for accepting connections and IO handling.

Then the server will start the main event loop.

RequestHandler

RequestHandler is an abstract class and its interfaces are implemented by specific handlers like staticHandler to handle static requests.

ResponseHandler acts as clients for RequestHandler subclasses and provides methods to send back the responses.

Methods of RequestHandler are registed as callback functions in HTTPSession module, to finish tasks in different stages. Some important methods include:

onRequest()
onBody()
onUpgrade()
onEOM()
onError()

RequestHandlerAdaptor

This class acts as an adaptor that converts HTTPTransactionHandler to RequestHandler.

public:
  explicit RequestHandlerAdaptor(RequestHandler* requestHandler);

It implements two abstract classes: HTTPTransactionHandler and ResponseHandler. It can send response directly back to clients, or interacts with Transport.

Overview of Proxygen architecture

The architecture of Proxygen is as this picture in its official README shows. A complete diagram of how it processes a http request is in HTTPTransaction.h:

/**
 * An HTTPTransaction represents a single request/response pair
 * for some HTTP-like protocol.  It works with a Transport that
 * performs the network processing and wire-protocol formatting
 * and a Handler that implements some sort of application logic.
 *
 * The typical sequence of events for a simple application is:
 *
 *   * The application accepts a connection and creates a Transport.
 *   * The Transport reads from the connection, parses whatever
 *     protocol the client is speaking, and creates a Transaction
 *     to represent the first request.
 *   * Once the Transport has received the full request headers,
 *     it creates a Handler, plugs the handler into the Transaction,
 *     and calls the Transaction's onIngressHeadersComplete() method.
 *   * The Transaction calls the Handler's onHeadersComplete() method
 *     and the Handler begins processing the request.
 *   * If there is a request body, the Transport streams it through
 *     the Transaction to the Handler.                               
 *   * When the Handler is ready to produce a response, it streams
 *     the response through the Transaction to the Transport.
 *   * When the Transaction has seen the end of both the request
 *     and the response, it detaches itself from the Handler and
 *     Transport and deletes itself.
 *   * The Handler deletes itself at some point after the Transaction
 *     has detached from it.
 *   * The Transport may, depending on the protocol, process other
 *     requests after -- or even in parallel with -- that first
 *     request.  Each request gets its own Transaction and Handler.
 *
 * For some applications, like proxying, a Handler implementation
 * may obtain one or more upstream connections, each represented
 * by another Transport, and create outgoing requests on the upstream
 * connection(s), with each request represented as a new Transaction.
 *
 * With a multiplexing protocol like SPDY on both sides of a proxy,
 * the cardinality relationship can be:
 *
 *                 +-----------+     +-----------+     +-------+
 *   (Client-side) | Transport |1---*|Transaction|1---1|Handler|
 *                 +-----------+     +-----------+     +-------+
 *                                                         1
 *                                                         |
 *                                                         |
 *                                                         1
 *                                   +---------+     +-----------+
 *                (Server-side)      |Transport|1---*|Transaction|
 *                                   +---------+     +-----------+
 *
 * A key design goal of HTTPTransaction is to serve as a protocol-
 * independent abstraction that insulates Handlers from the semantics
 * different of HTTP-like protocols.
 */

From starting listening to new connections coming in

In last section, I introduced several components to build a simple http server using Proxygen. I would like to explain what is happening behind those interfaces from starting listening to accepting new connections.

In the main function, we can configure our options and pass it into server. options has a list of HandlerFactory that can handle different events.

options.handlerFactories = RequestHandlerChain()
      .addThen<StaticHandlerFactory>()
      .build();

AcceptorFactory accepts HandlerFactory as the parameter of its constructor.

// AcceptorFactory
auto factory = std::make_shared<AcceptorFactory>(
          options_, // include HandlerFactory
          codecFactory,
          accConfig,
          sessionInfoCb_);

Then a bootstrap is started. bootstrap is in the wangle library. What bootstrap does here are mainly completed by three functions:

bootstrap_[i].childHandler(factory);
bootstrap_[i].group(accExe, exe); 
bootstrap_[i].bind(addresses_[i].address);

bootstrap.group() sets the thread pool for acception and IO. In group(), AcceptorFactory will new an acceptor and binds it with a new thread pool through newAcceptor(eventBase) in wangle::threadStarted().

In bind(), new asynServerSocket is built by newSocket() of socketFactory. Asynchronous callback function for accepting new connections is binded with asynServerSocket, and thread is set with it:

socketFactory -> addAcceptCB(socket, worker, worker -> getEventBase());

After a new connection coming in, acceptCB() will be called and several functions in wangle will be executed. The last function is onNewConnection() in HTTPServerAcceptor.

In Proxygen, HTTPServerAcceptor::onNewConnection() will call HTTPSessionAcceptor::onNewConnection(). In this function, new session is started:

session -> setSessionStats(downStreamSessionStats_);
Acceptor::addConnection(session);
session -> StartNow();

Handle a HTTP request

When a new HTTP request is read in, Proxygen uses an external HTTP parser library to parse it, and save it as a HTTPMessage object. The HTTP parser it uses is based on the parser used by Nginx.

HTTPCodec implements the callback functions defined in the http parser library. onIngress() is the first function of HTTPCodec to be called. It starts parsing the message by

size_t bytesParsed = http_parser_execute(&parser_,
                                             getParserSettings(),
                                             (const char*)buf.data(),
                                             buf.length());

Callback functions are set in getParserSettings(), and those functions are called when http_parser_execute() is called. Please be sure that all of the callback functions should return 0 when successed and return 1 when failed. Unexpected errors might happen if the return values are misdefined.

const http_parser_settings* HTTP1xCodec::getParserSettings() {
  static http_parser_settings parserSettings = [] {
    http_parser_settings st;
    st.on_message_begin = HTTP1xCodec::onMessageBeginCB;
    st.on_url = HTTP1xCodec::onUrlCB;
    st.on_header_field = HTTP1xCodec::onHeaderFieldCB;
    st.on_header_value = HTTP1xCodec::onHeaderValueCB;
    st.on_headers_complete = HTTP1xCodec::onHeadersCompleteCB;
    st.on_body = HTTP1xCodec::onBodyCB;
    st.on_message_complete = HTTP1xCodec::onMessageCompleteCB;
    st.on_reason = HTTP1xCodec::onReasonCB;
    st.on_chunk_header = HTTP1xCodec::onChunkHeaderCB;
    st.on_chunk_complete = HTTP1xCodec::onChunkCompleteCB;
    return st;
  }();
  return &parserSettings;
}

The functions on...() in HTTPSession will also be called in the functions of HTTPCodec. The diagram is as followed:

HTTP1xCodec::onHeadersComplete() -> HTTPSession::onHeadersComplete() -> HTTPDownstreamSession::setupOnHeadersComplete()

In HTTPSession, a new controller is allocated by calling HTTPSessionBase::getController(). In HTTPDownstreamSession, a new handler is created by the controller, by calling acceptor_ -> newHandler(), and is attached to a HTTPTransaction by calling

handler = SimpleController::getRequestHandler();
Transaction -> setHandler(handler);

In newHandler() of HTTPServerAcceptor, each HandlerFactory’s onRequest() will be called, and a new RequestHandlerAdaptor is returned;

HTTPTransactionHandler* HTTPServerAcceptor::newHandler(
    HTTPTransaction& txn,
    HTTPMessage* msg) noexcept {

  SocketAddress clientAddr, vipAddr;
  txn.getPeerAddress(clientAddr);
  txn.getLocalAddress(vipAddr);
  msg->setClientAddress(clientAddr);
  msg->setDstAddress(vipAddr);

  // Create filters chain
  RequestHandler* h = nullptr;
  for (auto& factory: handlerFactories_) {
    h = factory->onRequest(h, msg);
  }

  return new RequestHandlerAdaptor(h);
}

RequestHandlerAdaptor will interact with ResponseHandler and send responses to clients.