Tag Archives: reverse proxy

Persistent Connections and F5 iRules

At Rackspace Cloud Office we rely on the amazing power of F5’s BigIP network devices for most of our application routing. In case you don’t know, BigIP (usually referred to simply as ‘F5’) is an application gateway device that performs a variety of functions bridging the gap between your network and the application. In our case, we use the F5 to route traffic to a variety of web applications, but really it can be any type of network application. A user’s HTTP request is routed into our data center and terminates at the F5. The F5 owns the resolvable public IP for our URLs, and it also holds the SSL cert for the domain. The F5 then “reverse proxies” the traffic to an internal server. This may be a single server, but more likely it’s one of multiple servers in a pool (i.e. a web farm), and the F5 will round-robin traffic to those servers.

The F5 can also execute user-created scripts, called iRules, for each request, allowing you to make intelligent decisions based on data in the request, e.g. the User-Agent header or the path in the URL. One of our applications uses an iRule to select a destination based on the path in the URL: one destination is a pool of servers running an ASP.NET application, the other is a pool of servers that simply serve static files. The iRule is executed for every HTTP request and if, for example, the path starts with `/static/`, then the static-content pool is selected for the request.
Here is an example of an iRule that works as described:

when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] starts_with "/static/" } {
        pool pool-static
    }
}

If the iRule doesn’t select a pool, then a configured default pool is used, which for this example would be the application-server pool. Once a pool is selected, the F5 picks a server from that pool, forwards the HTTP request to it, and sends the server’s response back to the requester. Simple enough.

But what about persistent connections (a.k.a. Keep-Alive)? Think about that for a second and you can see how persistent connections could throw a few wrenches into the works. Just to review: the HTTP/1.1 protocol declares that the underlying TCP connection should stay open until the client or server closes it. This saves a lot of time, especially since a client typically makes multiple calls to the server to fetch sub-resources for a page. In this day & age of SSL-everywhere, persistent connections are a godsend. (Side-note: HTTP/2 goes further and multiplexes *multiple* concurrent streams over a single persistent connection!) Thankfully the F5 handles HTTP persistent connections quite well, but in order to stay out of trouble you need to know how it works under the hood. More on that in a minute.
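To make the keep-alive behavior concrete, here’s a small, self-contained Python sketch (not F5-specific, just plain HTTP/1.1): a throwaway local server is started, and a client sends two requests over the same TCP connection, verifying that the socket really is reused. The paths and addresses are made up for illustration.

```python
# Demonstrate HTTP/1.1 persistent connections: two requests, one TCP socket.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # HTTP/1.1 keeps the connection open by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # required for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/page")                 # first request opens the TCP connection
first = conn.getresponse().read()
sock_after_first = conn.sock                 # remember which socket was used
conn.request("GET", "/static/app.css")       # second request over the same connection
second = conn.getresponse().read()
same_tcp_connection = conn.sock is sock_after_first
conn.close()
server.shutdown()
print(first, second, same_tcp_connection)
```

If the server had answered with `Connection: close` (or spoken HTTP/1.0 without keep-alive), the client would have had to open a fresh TCP connection, and SSL handshakes would repeat on every request.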

Quick digression: the interesting part about how application gateways work is that they sometimes act as a true network device, providing a NAT into or out of a network. This is what a consumer wifi router does to provide Internet access to your home network, and quite often this exact same functionality is used to provide Internet access to an isolated application network. However, there are other times when an application gateway is simulating the job of a network device. This is the case as outlined above: incoming HTTP requests from the client are terminated at the F5, including any encryption via SSL. Then, once a pool and destination server are selected, a totally separate TCP connection is created on a different (internal) network from the F5 to the destination server. Once the second connection is up, a new HTTP request is created that is basically a copy of the incoming request, and sent to the destination server. On the return trip, the response from the server is copied to create an HTTP response back to the client to fulfill the original request. This sounds like a lot of work, and it is! (F5s manage to do a surprising amount of this work at the hardware level.) The net effect is that it appears as though the F5 is routing the HTTP request at the network layer, but routing is a layer 3 function, and here the F5 is operating way up at layers 5 through 7. A more correct way to think about it is that it’s proxying the request, but even that, as we will see, isn’t entirely accurate.
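The two-connection dance is easier to see in a toy example. Below is a deliberately minimal Python reverse proxy: one TCP connection from the client terminates at the “proxy,” and a second, completely separate TCP connection carries a copy of the request to the internal server. All addresses are local and hypothetical; a real F5 does vastly more (SSL termination, pools, iRules, hardware offload).

```python
# Toy reverse proxy: the client's TCP connection ends at the proxy, and a
# separate internal TCP connection carries the copied request to the backend.
import socket
import threading

def run_backend(listener):
    conn, _ = listener.accept()
    conn.recv(65536)                     # read the proxied request (ignored here)
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
    conn.close()

def run_proxy(listener, backend_addr):
    client, _ = listener.accept()
    request = client.recv(65536)         # external connection terminates here
    upstream = socket.create_connection(backend_addr)   # distinct internal connection
    upstream.sendall(request)            # copy the request to the internal server
    response = b""
    while chunk := upstream.recv(65536): # read until the backend closes
        response += chunk
    upstream.close()
    client.sendall(response)             # relay the copied response to the client
    client.close()

backend = socket.create_server(("127.0.0.1", 0))
proxy = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=run_backend, args=(backend,), daemon=True).start()
threading.Thread(target=run_proxy, args=(proxy, backend.getsockname()), daemon=True).start()

with socket.create_connection(proxy.getsockname()) as c:
    c.sendall(b"GET / HTTP/1.1\r\nHost: example\r\nConnection: close\r\n\r\n")
    reply = b""
    while chunk := c.recv(65536):
        reply += chunk
print(reply)
```

Note that the client never sees the internal address at all; from its point of view, the proxy *is* the server, which is exactly the illusion the F5 maintains.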

So what happens to our simulated request routing when we have persistent connections? At this point in the discussion it’s not too complicated: as long as the TCP connection from the client stays open, the F5 will keep the second TCP connection to the internal server open. Any subsequent HTTP request that comes in over the existing external connection will be proxied over the existing internal connection. The F5 keeps a map of which external connections correspond to which internal connections, and internal connections are never shared between clients (i.e. no connection pooling), for obvious security reasons. If the client closes the external connection, then the corresponding internal connection is also closed. I’m almost positive the opposite is true as well: if the internal server closes its connection, the external connection to the client is closed too. (I haven’t had time to verify this, but I can think of some serious security concerns if it weren’t the case.)
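As a mental model (and only that; this is my reading of the behavior, not the F5’s actual implementation), the map can be pictured as a dictionary from external connection to the most recent internal connection, with both sides torn down together. The identifiers below are made up.

```python
# Sketch of the external -> internal connection map described above.
connection_map = {}                     # external conn id -> internal conn id

def proxy_request(external_id, internal_id):
    # The internal connection used for the most recent request becomes
    # the mapped "persistent connection" for this external connection.
    connection_map[external_id] = internal_id

def close_external(external_id):
    # When the client closes its connection, the mapped internal
    # connection is closed along with it.
    return connection_map.pop(external_id, None)

proxy_request("client-1", "app-server-A")
proxy_request("client-1", "static-server-B")   # remapped to the newest internal conn
closed = close_external("client-1")            # the mapped internal conn goes too
print(connection_map, closed)
```

The key point the sketch captures: there is only ever *one* mapped internal connection per external connection, and a later request can silently replace which server it points at.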

Now let’s take it a step further: what if we have an iRule that splits incoming requests between two pools of servers? What if, for every request, an iRule has to determine whether we are proxying to an application-server pool or a static-content pool? What happens to the persistent connection? This is where the F5 has to behave in a way that is transparent to the client, but may have an impact on how the request is routed to an internal pool.

Here’s where I have to put a great big disclaimer up: I’m not an expert at F5 routing, nor have I had time to exhaustively research this. What I’m about to state is based on circumstantial evidence from troubleshooting this issue for the past couple of days. I’ll update accordingly if an expert tells me I’m wrong, which is likely. 🙂

Let’s take it step-by-step:

  1. A client creates a connection to the F5 and makes an HTTP request.
  2. The F5 runs an iRule which explicitly selects the application pool; a server in that pool fulfills the request. The internal connection is left open and is mapped as the “persistent connection” for the external connection.
  3. Using the existing external connection, the client makes another HTTP request for static content.
  4. The F5 runs the same iRule as before, which this time explicitly selects the static-content pool; a server in that pool fulfills the request. This new internal connection to the static-content server is separate and distinct from the internal connection created in step 2. Because it is the internal connection used for the most recent request on this external connection, the F5 now maps it as the “persistent connection” for the external connection. The internal connection to the application server is no longer mapped.
  5. The client makes a third request, again over the existing external connection.
  6. Again, the F5 runs the same iRule, but this time no pool is explicitly selected. Typically that’s not a big deal, because the endpoint should always have a default pool, and the default pool is used whenever the iRule doesn’t explicitly select one. However, this scenario isn’t typical: there’s an existing mapped internal connection. When such a mapping exists, the default pool is not consulted; the F5 will reuse the existing mapped internal connection unless the iRule explicitly selects a different pool.
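The steps above can be replayed in a few lines of Python. This is a sketch of my *interpretation* of the selection logic (hypothetical URIs and pool names, and it models the original buggy iRule, which only ever explicitly selects the static pool):

```python
# Model of the pool-selection behavior described in the steps above.
def select_pool(uri, mapped_pool, default_pool="pool-application"):
    """Return the pool used for one request on a persistent connection."""
    if uri.lower().startswith("/static/"):
        return "pool-static"        # the iRule explicitly selects a pool
    if mapped_pool is not None:
        return mapped_pool          # an existing mapping beats the default pool
    return default_pool             # no mapping yet: fall back to the default

# Three requests over one persistent external connection:
pool1 = select_pool("/app/home", None)           # no mapping yet: default pool
pool2 = select_pool("/static/site.css", pool1)   # iRule explicitly picks static
pool3 = select_pool("/app/data", pool2)          # stale static mapping is reused!
print(pool1, pool2, pool3)
```

The third request is the surprise: even though its URI looks like an application request, it follows the mapped connection to the static-content server, which is exactly the failure mode described below.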

This is why I said that proxying isn’t exactly an accurate description. Application gateways have to be far more intelligent about how they handle things. It’s not just forwarding bits and it’s not just a store-and-forward algorithm. It has to track state and make some unique choices about how it implements the HTTP standard, but do it in a way that is compatible with the usability and flexibility offered by iRules and other configuration.

I had to figure this out when the web application in our dev environment started returning random 404s. After some investigation we determined that the requests returning 404 had been routed to the static-content server. This was very odd, because the URLs for these requests didn’t match the criteria in the iRule for that pool. After quite a bit of digging (i.e. Wireshark) we found that the requests were going to the static-content server because a mapped connection to that server already existed. The iRule wasn’t explicitly selecting the application-server pool in its fall-through case; we didn’t think we needed to, since the application-server pool was the default pool for this endpoint. However, the default pool wasn’t being used because there was already an existing mapped connection. When we changed the iRule to explicitly select the default pool at the end of the script, the random 404s stopped occurring. Example:

when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] starts_with "/static/" } {
        pool pool-static
    } else {
        pool pool-application
    }
}

TL;DR flowchart:

[Image: iRule pool-selection flowchart]