Thursday, April 6, 2017

On The Merits of QUIC for HTTP


I am often asked why the Internet HTTP community is working on an IETF based QUIC when HTTP/2 (RFC 7540) is less than 2 years old. There are good answers! This work is essentially part two of a planned evolution to improve speed, reliability, security, responsiveness, and architecture. These needs were understood when HTTP/2 was standardized but the community decided to defer larger architectural changes to the next version. That next version is arriving now in the form of QUIC.

Flashback to HTTP/2 Development


HTTP/2 development consciously constrained its own work in two ways. One was to preserve HTTP/1 semantics, and the other was to limit the work to what could be done within a traditional TLS/TCP stack.

The choice to use TLS/TCP in HTTP/2 was not a foregone conclusion. During the pre-HTTP/2 bakeoff phase there was quite a bit of side channel chatter about whether anyone would propose a different approach, such as the SCTP/UDP/[D]TLS stack that RTC DataChannels were contemporaneously considering for standardization.

In the end, folks felt that experience and running code were top tier properties for the next revision of HTTP. That argued for TCP/TLS. Using the traditional stack lowered the risk of boiling the ocean, improved the latency of getting something deployed, and captured the meaningful low-hanging fruit of multiplexing and priority that HTTP/2 was focused on.

Issues that could not be addressed with TCP/TLS 1.2 were deemed out of scope for that round of development. The mechanisms of ALPN and Alternative Services were created to facilitate successor protocols.

I think our operational experience has shown HTTP/2's choices to be good decisions. It is a big win for a lot of cases, it is now the majority HTTPS protocol, and I don't think we could have done much better within the constraints applied.

The successor protocol turns out to be QUIC. It is a good sign for the vibrancy of HTTP/2 that the QUIC charter explicitly seeks to map HTTP/2 into its new ecosystem. QUIC is taking on exactly the items that were foreseen when scoping HTTP/2.

This post aims to highlight the benefits of QUIC as it applies to the HTTP ecosystem. I hope it is useful even for those that already understand the protocol mechanics. It, however, does not attempt to fully explain QUIC. The IETF editor's drafts or Jana's recent tutorial might be good references for that.

Fixing the TCP In-Order Penalty


The chief performance frustration with HTTP/2 happens during higher than normal packet loss. The in-order property of TCP spans multiplexed HTTP messages. A single packet loss in one message prevents subsequent unrelated messages from being delivered until the loss is repaired. This is because TCP delays received data in order to provide in-order delivery of the whole stream.

For a simple example, imagine images A and B each delivered in two parts in this order: A1, A2, B1, and B2. If only A1 were to suffer a packet loss under TCP, that would also delay A2, B1, and B2. While image A is unavoidably delayed by this loss, image B is also impacted even though all of its data was successfully transferred the first time.
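To make the difference concrete, here is a toy Python model (not real TCP or QUIC code) of which parts of the A1, A2, B1, B2 example an application can read before the lost A1 is retransmitted. The function name and the tuple layout are made up for illustration.

```python
# Each segment is (name, stream, arrived_on_first_try). Toy model only.

def delivered_before_repair(segments, per_stream_ordering):
    """Return the segments an application can read before the lost one is repaired."""
    delivered = []
    blocked_streams = set()
    connection_blocked = False
    for name, stream, arrived in segments:
        if not arrived:
            # A hole: TCP blocks everything after it, QUIC blocks only this stream.
            connection_blocked = True
            blocked_streams.add(stream)
            continue
        if per_stream_ordering:
            if stream not in blocked_streams:
                delivered.append(name)
        elif not connection_blocked:
            delivered.append(name)
    return delivered


segments = [("A1", "A", False), ("A2", "A", True), ("B1", "B", True), ("B2", "B", True)]

print(delivered_before_repair(segments, per_stream_ordering=False))  # TCP: []
print(delivered_before_repair(segments, per_stream_ordering=True))   # QUIC-style: ['B1', 'B2']
```

With one ordering context for the whole connection nothing is readable until A1 is repaired; with one context per stream, all of image B is usable immediately.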

This is something that was understood during the development of RFC 7540 and was correctly identified as a tradeoff favoring connections with lower loss rates. Our community has seen some good data on how this has played out "in the wild" recently from both Akamai and Fastly. For most HTTP/2 connections this strategy has been beneficial, but there is a tail population that actually regresses in performance compared to HTTP/1 under high levels of loss because of this.

QUIC fixes this problem through multistreaming onto one connection in a way very familiar from RFC 7540, but with an added twist: it also gives each stream its own ordering context analogous to a TCP sequence number. These streams can be delivered independently to the application because in QUIC the in-order property applies to each stream instead of to the whole connection.

I believe fixing this issue is the highest impact feature of QUIC.

Starting Faster


HTTP/2 starts much faster than HTTP/1 due to its ability to send multiple requests in the first round trip, and its superior connection management results in fewer connection establishments. However, new connections using the TCP/TLS stack still incur 2 or 3 round trips of delay before any HTTP/2 data can be sent.

In order to address this, QUIC eschews layers in favor of a component architecture that allows sending encrypted application data immediately. QUIC still uses TLS for security establishment and has a transport connection concept, but these components are not forced into layers that require their own round trips to be initialized. Instead, the transport session, the HTTP requests, and the TLS context are all combined into the first packet flight when doing session resumption (i.e. you are returning to a server you have seen before). A key part of this is integration with TLS 1.3, and in particular the 0-RTT (aka Early Data) handshake feature.
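A back-of-the-envelope tally of handshake round trips, as a Python sketch. It assumes idealized handshakes with no loss and no TCP Fast Open; the per-version TLS counts are the usual textbook figures rather than measurements, and the helper names are hypothetical.

```python
# Idealized round trips spent on handshakes before the first HTTP request can
# be sent. Assumes no packet loss and no TCP Fast Open.
TCP_HANDSHAKE_RTTS = 1

TLS_HANDSHAKE_RTTS = {
    ("TLS 1.2", "full"): 2,
    ("TLS 1.2", "resumed"): 1,
    ("TLS 1.3", "full"): 1,
    ("TLS 1.3", "0-RTT resumption"): 0,
}


def tcp_tls_rtts(version, mode):
    # Layered stack: TCP must finish its handshake before TLS can start.
    return TCP_HANDSHAKE_RTTS + TLS_HANDSHAKE_RTTS[(version, mode)]


def quic_rtts(resumed):
    # Transport and crypto setup share one handshake; resumption can use 0-RTT.
    return 0 if resumed else 1


for version, mode in TLS_HANDSHAKE_RTTS:
    print(f"TCP + {version} ({mode}): {tcp_tls_rtts(version, mode)} RTT")
print(f"QUIC first visit: {quic_rtts(False)} RTT, QUIC resumed: {quic_rtts(True)} RTT")
```

The 2 and 3 round trip figures quoted above correspond to the TLS 1.2 rows. Even TLS 1.3 Early Data over TCP still pays for the TCP handshake unless Fast Open is usable.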

The HTTP/2 world, given enough time, will be able to capture some of the same benefits using both TLS 1.3 and TCP Fast Open. Some of that work is made impractical by dependencies on Operating System configurations and the occasional interference from middleboxes unhappy with TCP extensions.

However, even at full deployment of TLS 1.3 and TCP Fast Open, that approach will lag QUIC performance because QUIC can utilize the full flight of data in the first round trip while Fast Open limits the amount of data that can be carried to the roughly 1460 bytes available in a single TCP SYN packet. That packet also needs to include the TLS Client Hello and HTTP SETTINGS information along with any HTTP requests. That single packet runs out of room quickly if you need to encode more than one request or any message body. Any excess needs to wait a round trip.
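A rough Python sketch of that packing problem: the first flight is treated as one ordered byte stream cut at the roughly 1460 byte SYN budget. The item sizes below are hypothetical, and TLS record and TCP option overhead are ignored.

```python
SYN_PAYLOAD_BUDGET = 1460  # approximate room for data in one TCP Fast Open SYN


def split_first_flight(items, budget=SYN_PAYLOAD_BUDGET):
    """Split an ordered byte stream of (name, size) items at the SYN budget.

    Returns the items that fit completely in the SYN, the items that must
    wait at least one round trip (including any item cut by the boundary),
    and the number of bytes left over.
    """
    complete, waiting, offset = [], [], 0
    for name, size in items:
        offset += size
        if offset <= budget:
            complete.append(name)
        else:
            waiting.append(name)
    return complete, waiting, max(0, offset - budget)


# Hypothetical sizes in bytes, chosen only to illustrate the effect.
first_flight = [
    ("TLS ClientHello", 500),
    ("HTTP/2 preface + SETTINGS", 50),
    ("GET request", 300),
    ("second GET request", 300),
    ("POST body", 1000),
]
complete, waiting, overflow = split_first_flight(first_flight)
print(complete)           # ['TLS ClientHello', 'HTTP/2 preface + SETTINGS', 'GET request', 'second GET request']
print(waiting, overflow)  # ['POST body'] 690
```

Calling it with a larger budget models a first flight that is allowed to span several packets, which is the freedom QUIC has.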

Harmonizing with TLS


When used with HTTP/1 and HTTP/2, TLS generally operates as a simple pipe. During encryption, a cleartext stream of bytes goes in one side and a stream of encrypted bytes comes out the other, which is then fed to TCP. The reverse happens when decrypting. Unfortunately, the TLS layer operates internally on multi-byte records instead of a byte stream, and the mismatch creates a significant performance problem.

The records can be up to 64KB and a wide variety of sizes are used in practice. In order to enforce data integrity, one of the fundamental security properties of TLS, the entire record must be received before it can be decoded. When the record spans multiple packets a problem similar to the "TCP in-order penalty" discussed earlier appears.

A loss to any packet in the record delays decoding and delivery of the other correctly delivered packets while the loss is repaired. In this case the problem is actually a bit worse, as any loss impacts the whole record, not just the portion of the stream following the loss. Further, because application delivery of the first byte of the record always depends on the receipt of the last byte of the record, simple serialization delays or common TCP congestion-control stalls add latency to application delivery even with 0% packet loss.
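A small Python model of the record effect, assuming a record can be handed to the application only after every packet carrying it has arrived. The timeline and function name are hypothetical.

```python
def record_delivery_times(packet_arrivals, record_spans):
    """packet_arrivals[i] is when packet i arrives (ms). record_spans holds
    inclusive (first_packet, last_packet) ranges. A record can only be
    authenticated, and so delivered, once every packet carrying it is in."""
    return [max(packet_arrivals[first:last + 1]) for first, last in record_spans]


# Hypothetical timeline: packets sent 10 ms apart; packet 0 is lost and its
# retransmission arrives at 110 ms.
arrivals = [110, 20, 30, 40]

one_big_record = [(0, 3)]
one_record_per_packet = [(0, 0), (1, 1), (2, 2), (3, 3)]

print(record_delivery_times(arrivals, one_big_record))         # [110]
print(record_delivery_times(arrivals, one_record_per_packet))  # [110, 20, 30, 40]
```

With no loss at all (arrivals of 10, 20, 30, 40 ms) the single large record still waits until 40 ms, which is the zero-loss serialization delay described above. The one-record-per-packet layout only pays off if the transport also delivers packets independently; over TCP, packets 1 to 3 would still wait behind the hole at packet 0.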

The 'obvious fix' of placing an independent record in each packet turns out to work much better with QUIC than with TCP. This is because TCP's API is a simple byte stream. Applications, including TLS, have no sense of where packets begin or end and have no reliable control over it. Furthermore, TCP proxies or even HTTP proxies commonly rearrange TCP packets while leaving the byte stream intact (a proof of the practical value of end to end integrity protection!).

Even the reductio ad absurdum solution of 1-byte records does not work, because the record overhead creates multi-byte sequences that will still span packet boundaries. Such a naive approach would also drown in its own overhead.
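The arithmetic behind that, as a Python sketch. It assumes TLS 1.2 with an AES-GCM cipher suite (5-byte record header, 8-byte explicit nonce, 16-byte tag) and full-sized 1460 byte TCP segments; other cipher suites and TLS 1.3 framing differ slightly.

```python
# Per-record overhead assuming TLS 1.2 with an AES-GCM cipher suite.
RECORD_HEADER = 5    # content type (1) + protocol version (2) + length (2)
EXPLICIT_NONCE = 8   # per-record explicit nonce carried by AES-GCM in TLS 1.2
AEAD_TAG = 16        # AES-GCM authentication tag


def wire_size(payload_len):
    return RECORD_HEADER + EXPLICIT_NONCE + payload_len + AEAD_TAG


def overhead_ratio(payload_len):
    return wire_size(payload_len) / payload_len


def records_spanning_segments(payload_len, record_count, segment_size=1460):
    """Count back-to-back records whose bytes straddle a segment boundary,
    assuming every TCP segment is full-sized."""
    size = wire_size(payload_len)
    spanning = 0
    for i in range(record_count):
        first_byte = i * size
        last_byte = first_byte + size - 1
        if first_byte // segment_size != last_byte // segment_size:
            spanning += 1
    return spanning


print(wire_size(1), overhead_ratio(1))    # 30 30.0
print(records_spanning_segments(1, 100))  # 2
```

Each byte of application data costs 30 bytes on the wire, and some records are still cut in half by segment boundaries.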

QUIC shines here by using its component architecture rather than the traditional layers. The QUIC transport layer receives plaintext from the application and consults its own transport information regarding packet numbers, PMTU, and the TLS keying information. It combines all of this to form the encrypted packets that can be decrypted atomically with the equivalent of one record per UDP packet. Intermediaries are unable to mess with the framing because even the transport layer is integrity protected in QUIC during this same process - a significant security bonus! Any loss events will only impact delivery of the lost packet.
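A minimal Python sketch of per-packet protection. QUIC actually uses AEAD encryption with keys from the TLS handshake; this stand-in uses HMAC-SHA256 from the standard library, so it shows per-packet integrity only and provides no confidentiality. The key and function names are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical-demo-key"  # stand-in for keys derived from the TLS handshake
TAG_LEN = 32                    # HMAC-SHA256 output size


def seal(packet_number, payload):
    # Authenticate the header (here just the packet number) together with the
    # payload, so an intermediary cannot renumber, split, or merge packets.
    header = packet_number.to_bytes(8, "big")
    tag = hmac.new(KEY, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def open_packet(packet):
    header, payload, tag = packet[:8], packet[8:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(KEY, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return int.from_bytes(header, "big"), payload


packets = [seal(n, f"data-{n}".encode()) for n in range(4)]
received = [p for i, p in enumerate(packets) if i != 1]  # packet 1 is lost
print([open_packet(p) for p in received])
# [(0, b'data-0'), (2, b'data-2'), (3, b'data-3')]
```

Every surviving packet verifies on its own, so losing packet 1 costs only packet 1.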

No More TCP RST Data Loss


As many HTTP developers will tell you, TCP RST is one of the most painful parts of the existing ecosystem. Its pain comes in many forms, but the data loss is the worst.

The circumstances for an operating system generating a RST and how they respond to them can vary by implementation. One common scenario is a server close()ing a connection that has received another request that the HTTP server has not yet read and is unaware of. This is a routine case for HTTP/1 and HTTP/2 applications. Most kernels will react to the closing of a socket with unconsumed data by sending a RST to the peer.

That RST will be processed out of order when received. In practice this means that if the original client does a recv() to consume ordinary data that was sent by the server before the server invoked close(), the client will incur an unrecoverable failure if the RST has also already arrived, and that data can never be read. This is true even if the kernel has sent a TCP ack for it! The problem gets worse when combined with larger TLS record sizes, as often the last bit of data is what is needed to decode the whole record, and substantial data loss of up to 64KB occurs.

The QUIC RST equivalent is not part of the orderly shutdown of application streams and it is not expected to ever force the loss of already acknowledged data.

Better Responsiveness through Buffer Management


The primary goal of HTTP/2 was the introduction of multiplexing into a single connection and it was understood that you cannot have meaningful multiplexing without also introducing a priority scheme. HTTP/1 illustrates the problem well - it multiplexes the path through unprioritized TCP parallelism which routinely gives poor results. The final RFC contained both multiplexing and priority mechanisms which for the most part work well.

However, successful prioritization requires you to buffer before serializing the byte stream into TLS and TCP, because once sent to TCP those messages cannot be reordered when higher priority data presents itself. Unfortunately, high latency TCP requires a significant amount of buffering at the socket layer in order to run as fast as possible. These two competing interests make it difficult to judge how much buffering an HTTP/2 sender should use. While there are some Operating System specific oracles that give some clues, TCP itself does not provide any useful guidance to the application for reasonably sizing its socket buffers.
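The buffering TCP needs to keep a path busy is roughly the bandwidth-delay product. This Python sketch computes it for a few illustrative paths (the numbers are examples, not measurements).

```python
def bandwidth_delay_product(bits_per_second, rtt_seconds):
    """Bytes that must be in flight, and so sit in the socket buffer, to keep the path full."""
    return bits_per_second * rtt_seconds / 8


# Illustrative paths, not measurements.
paths = [
    ("LAN, 1 Gbit/s, 1 ms", 1e9, 0.001),
    ("broadband, 100 Mbit/s, 50 ms", 100e6, 0.050),
    ("intercontinental, 100 Mbit/s, 200 ms", 100e6, 0.200),
]
for label, bps, rtt in paths:
    print(f"{label}: about {bandwidth_delay_product(bps, rtt) / 1e6:.3f} MB")
```

Every byte in that buffer is already committed to the byte stream, so on the long path about 2.5 MB of data is beyond the reach of any later priority decision.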

This combination has made it challenging for applications to determine the appropriate level of socket buffering and in turn they sometimes have overbuffered in order to make TCP run at line rate. This results in poor responsiveness to the priority schedule and the inability for a server to recognize individual streams being canceled (which happens more than you may think) because they have already been buffered.

The blending of transport and application components creates the opportunity for QUIC implementations to do a better job on priority. They do this by buffering application data with its priority information outside of the transmission layer. This allows the late binding of the packet transmission to the data that is highest priority at that moment.
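A minimal Python sketch of that late binding. It assumes a single priority number per chunk (lower is more urgent) rather than the full HTTP/2 dependency tree; the class and method names are hypothetical.

```python
import heapq
import itertools


class LateBindingSender:
    """Hypothetical sketch: application data waits in a priority queue until the
    moment a packet is built, so urgent data that shows up later can jump ahead
    and canceled streams never reach the wire."""

    def __init__(self):
        self._queue = []               # heap of (priority, seq, stream_id, chunk)
        self._seq = itertools.count()  # keeps FIFO order within a priority
        self._canceled = set()

    def enqueue(self, stream_id, priority, chunk):
        # Lower number means more urgent.
        heapq.heappush(self._queue, (priority, next(self._seq), stream_id, chunk))

    def cancel(self, stream_id):
        self._canceled.add(stream_id)

    def next_packet_payload(self):
        # Called only when congestion control allows another packet to be sent.
        while self._queue:
            _, _, stream_id, chunk = heapq.heappop(self._queue)
            if stream_id not in self._canceled:
                return stream_id, chunk
        return None


sender = LateBindingSender()
sender.enqueue(stream_id=4, priority=5, chunk=b"image bytes")
sender.enqueue(stream_id=8, priority=5, chunk=b"other image bytes")
sender.cancel(8)                                       # e.g. the user navigated away
sender.enqueue(stream_id=0, priority=1, chunk=b"css")  # urgent data arrives late
print(sender.next_packet_payload())  # (0, b'css')
print(sender.next_packet_payload())  # (4, b'image bytes')
print(sender.next_packet_payload())  # None
```

Because nothing is committed until a packet is actually built, the late css jumps the queue and the canceled image is never sent.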

Relatedly, whenever a retransmission is required QUIC retransmits the original data in one or more new packets (with new packet identifiers) instead of retransmitting a copy of the lost packet as TCP does. This creates an opportunity to reprioritize, or even drop canceled streams, during retransmission. This compares favorably to TCP which is sentenced to retransmitting the oldest (and perhaps now irrelevant) data first due to its single sequence number and in-order properties.
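A Python sketch of frame-level retransmission under simplified assumptions: packet numbers are never reused, and a lost packet's frames are re-queued into a new packet while frames for canceled streams are dropped. The class is hypothetical and skips loss detection entirely.

```python
import itertools


class FrameRetransmitter:
    """Hypothetical sketch: packet numbers are never reused. When a packet is
    declared lost, its frames are re-queued and sent in a new packet, and
    frames for streams canceled in the meantime are simply dropped."""

    def __init__(self):
        self._next_pn = itertools.count()
        self.in_flight = {}    # packet number -> list of (stream_id, data)
        self.canceled = set()

    def send(self, frames):
        pn = next(self._next_pn)
        self.in_flight[pn] = list(frames)
        return pn

    def cancel(self, stream_id):
        self.canceled.add(stream_id)

    def on_loss(self, pn):
        lost_frames = self.in_flight.pop(pn)
        survivors = [f for f in lost_frames if f[0] not in self.canceled]
        # A real sender could also reorder survivors by current priority here.
        return self.send(survivors) if survivors else None


sender = FrameRetransmitter()
first = sender.send([(1, b"css"), (2, b"stale image")])
sender.cancel(2)
retry = sender.on_loss(first)
print(first, retry, sender.in_flight)  # 0 1 {1: [(1, b'css')]}
```

TCP, by contrast, would have to resend the exact byte range of the lost segment, stale image bytes included.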

UDP means Universal DePloyment


QUIC is not inherently either a user space or kernel space protocol - it is quite possible to deploy it in either configuration. However, UDP based applications are often deployed in userspace configurations and do not require special configurations or permissions to run there. It is fair to expect a number of user space based QUIC implementations.

Time will tell exactly what that looks like, but I anticipate it will be a combination of self-updating evergreen applications such as web servers and browsers and also a small set of well maintained libraries akin to the role openssl plays in distributions.

Decoupling functionality traditionally performed by TCP from the operating system creates an opportunity for deploying software faster, updating it more regularly, and iterating on its algorithms in a tight loop. The long replacement and maintenance schedules of operating systems, sometimes measured in decades, create barriers to deploying new approaches to networking.

This new freedom applies both to QUIC itself and to some of its embedded techniques that have equivalents in the TCP universe but have traditionally been difficult to deploy. Thanks to user space distribution, packet pacing, fast open, and loss discovery improvements like RACK will see greater deployment than ever before.

Userspace will mean faster evolution in networking and greater distribution of the resulting best practices.
 
---

The IETF QUIC effort is informed by Google's efforts on its own preceding protocol. While it is not the same effort it does owe a debt to a number of Google folk. I'm not privy to all of the internal machinations at G but, at the inevitable risk of omitting an important contribution, it is worth calling out Jim Roskind, Jana Iyengar, Ian Swett, Adam Langley, and Ryan Hamilton both for their work and their willingness to evangelize and discuss it with me. Thanks! We're making a better Internet together.

This post was originally drafted as a post to the IETF HTTP Working Group by Patrick McManus.

Wednesday, May 11, 2016

Cache-Control: immutable

About one year ago our friends at Facebook brought an interesting issue to the IETF HTTP Working Group - a lot (20%!) of their transactions for long lived resources (e.g. css, js) were resulting in 304 Not Modified. These are documents that have long explicit cache lifetimes of a year or more and are being revalidated well before they had even existed for that long. The problem remains unchanged.

After investigation this was attributed to people often hitting the reload button. That makes sense on a social networking platform - show me my status updates! Unfortunately, when transferring updates for the dynamic objects, browsers were also revalidating hundreds of completely static resources on the same page. While these do generate 304s instead of larger 200s, this adds up to a lot of time and significant bandwidth. It turns out it significantly delays the delivery of the minority content that did change.

Facebook, like many sites, uses versioned URLs - these URLs are never updated to have different content and instead the site changes the subresource URL itself when the content changes. This is a common design pattern, but existing caching mechanisms don't express the idea and therefore when a user clicks reload we check to see if anything has been updated.

IETF standards activity is probably premature without data or running code - so called hope based standardization is generally best avoided. Fortunately, HTTP already provides a mechanism for deploying experiments: Cache-Control extensions.

I put together a test build of Firefox using a new extended attribute - immutable. immutable indicates that the response body will not change over time. It is complementary to the lifetime cacheability expressed by max-age and friends.

Cache-Control: max-age=365000000, immutable

When a client supporting immutable sees this attribute it should assume that the resource, if unexpired, is unchanged on the server and therefore should not send a conditional revalidation for it (e.g. If-None-Match or If-Modified-Since) to check for updates. Correcting possible corruption (e.g. shift reload in Firefox) never uses conditional revalidation and still makes sense to do with immutable objects if you're concerned they are corrupted.
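A minimal Python sketch of the client-side decision described above. The function names, the string results, and the is_https flag (reflecting that Firefox only honors immutable on https://) are assumptions for illustration, not Firefox's code.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def reload_action(cache_control, age_seconds, is_https, force_reload=False):
    """Hypothetical client policy for a reload that honors `immutable`."""
    if force_reload:
        # Shift-reload style: fetch unconditionally, never a conditional request.
        return "unconditional fetch"
    directives = parse_cache_control(cache_control)
    max_age = int(directives.get("max-age") or 0)
    fresh = age_seconds < max_age
    if fresh and "immutable" in directives and is_https:
        # Skip If-None-Match / If-Modified-Since entirely.
        return "use cached copy"
    return "conditional revalidation"


print(reload_action("max-age=365000000, immutable", 3600, is_https=True))  # use cached copy
print(reload_action("max-age=365000000", 3600, is_https=True))             # conditional revalidation
```

Without the extension, a plain reload of a fresh resource still produces a conditional request, which is exactly the stream of 304s the experiment is trying to eliminate.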

This Makes a Big Difference

The initial anecdotal results are encouraging enough to deploy the experiment. This is purely performance, there is no web viewable semantic here, so it can be withdrawn at any time if that is the appropriate thing to do.

For the reload case, immutable saves hundreds of HTTP transactions and improves the load time of the dynamic HTML by hundreds of milliseconds because that no longer competes with the multitude of 304 responses.

Facebook reload without immutable

Facebook reload with immutable

Next Steps

I will land immutable support in Firefox 49 (track the bug). I expect Facebook to be part of the project as we move forward, and any content provider can join the party by adding the appropriate cache-control extension to the response headers of their immutable objects. If you do implement it on the server side drop me a note at mcmanus@ducksong.com with your experience. Clients that aren't aware of extensions must ignore them by HTTP specification and in practice they do - this should be safe to add to your responses. Immutable in Firefox is only honored on https:// transactions.
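On the server side, a content provider using versioned URLs might add the extension only for fingerprinted assets. This Python sketch assumes a hypothetical naming convention (a hex content hash in the file name); adapt the pattern to however your site versions its URLs.

```python
import re

# Hypothetical convention: versioned assets carry a content hash in their file
# name (e.g. /static/app.3f9a1c2b.js) and are never changed in place; new
# content always gets a new URL.
VERSIONED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(?:css|js|png|jpg|woff2)$")


def cache_headers(path):
    """Response headers to add for a given request path."""
    if VERSIONED_ASSET.search(path):
        # A year of freshness plus the immutable hint. Clients that don't know
        # the extension ignore it, as HTTP requires.
        return {"Cache-Control": "max-age=31536000, immutable"}
    # Leave everything else (e.g. the HTML that links to the assets) alone.
    return {}


print(cache_headers("/static/app.3f9a1c2b.js"))
print(cache_headers("/index.html"))
```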

If the idea pans out I will develop an Internet Draft and bring it back in the standards process - this time with some running code and data behind it.

Friday, September 25, 2015

Thanks Google for Open Source TCP Fix!

The Google transport networking crew (QUIC, TCP, etc.) deserve a shout out for identifying and fixing a nearly decade old Linux kernel TCP bug that I think will have an outsized impact on performance and efficiency for the Internet.

Their patch addresses a problem with cubic congestion control, which is the default algorithm on many Linux distributions. The problem can be roughly summarized as the controller mistakenly characterizing the lack of congestion reports over a quiescent period as positive evidence that the network is not congested and therefore it should send at a faster rate when sending resumes. When put like this, it's obvious that an endpoint that is not moving any traffic cannot use the lack of errors as information in its feedback loop.
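A simplified Python model of the effect, using the CUBIC window function with the constants from RFC 8312. It is not the kernel code: it assumes the fix can be modeled as not counting idle time toward growth along the cubic curve, and the window size and timings are made up.

```python
C = 0.4     # CUBIC scaling constant (RFC 8312)
BETA = 0.7  # CUBIC multiplicative decrease factor (RFC 8312)


def cubic_window(elapsed, w_max):
    """CUBIC target window (in packets) `elapsed` seconds into an epoch that
    began with a loss when the window was `w_max`."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)
    return C * (elapsed - k) ** 3 + w_max


W_MAX = 100
active_time = 2.0   # seconds actually spent sending since the epoch began
idle_time = 10.0    # seconds the connection then sat quiescent

buggy = cubic_window(active_time + idle_time, W_MAX)  # idle time counted as growth
fixed = cubic_window(active_time, W_MAX)              # idle time excluded
print(f"window on resume: counting idle {buggy:.0f}, excluding idle {fixed:.0f}")
```

Counting the quiet ten seconds roughly triples the window at restart even though the sender learned nothing about the path while idle.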

The end result is that applications that oscillate between transmitting lots of data and then laying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state. The consequence of this is self-induced packet loss along with retransmissions, wasted bandwidth, out of order packet delivery, and application level stalls.

Unfortunately a number of common web use cases are clear triggers here. Any use of persistent connections, where the burst of web data on two separate pages is interspersed with time for the user to interpret the data is an obvious trigger. A far more dangerous class of triggers is likely to be the various HTTP based adaptive streaming media formats where a series of chunks of media are transferred over time on the same HTTP channel. And of course, Linux is a very popular platform for serving media.

As with many bugs, it all seems so obvious afterwards - but tracking this stuff down is the work of quality non-glamorous engineering. Remember that TCP is robust enough that it seems to work anyhow - even at the cost of reduced network throughput in this case. Kudos to the Google team for figuring it out, fixing it up, and especially for open sourcing the result. The whole web, including Firefox users, will benefit.

Thanks!

Tuesday, September 22, 2015

Brotli Content-Encoding for Firefox 44

The best way to make data appear to move faster over the Web is to move less of it, and lossless compression has always been a core tenet of good web design.

Sometimes that is done via over the top gzip of text resources (html, js, css), but other times it is accomplished via the compression inherent in the file format of media elements. Modern sites apply gzip to all of their text as a best practice.

Time marches on, and it turns out we can often do a better job than the venerable gzip. Until recently, new formats struggled with matching the decoding rates of gzip, but lately a new contender named brotli has shown impressive results. It has been able to improve on gzip anywhere from 20% to 40% in terms of compression ratios while keeping up on the decoding rate. Have a look at the author's recent comparative results.

The deployed WOFF2 font file format already uses brotli internally.

If all goes well in testing, Firefox 44 (ETA January 2016) will negotiate brotli as a content-encoding for https resources. The negotiation will be done in the usual way via the Accept-Encoding request header and the token "br". Servers that wish to encode a response with brotli can do so by adding "br" to the Content-Encoding response header. Firefox won't decode brotli outside of https - so make sure to use the HTTP content negotiation framework instead of doing user agent sniffing.
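A minimal server-side sketch of that negotiation in Python: the response coding is chosen from Accept-Encoding alone, with no user agent checks. The function names and the server's preference order are assumptions, and the parser skips corner cases such as `*`.

```python
def parse_accept_encoding(header):
    """Map each content-coding token in Accept-Encoding to its q-value."""
    codings = {}
    for item in header.split(","):
        token, *params = [piece.strip() for piece in item.split(";")]
        if not token:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        codings[token.lower()] = q
    return codings


def choose_content_encoding(accept_encoding, server_preference=("br", "gzip")):
    """Pick the first coding the server prefers that the client accepts
    (q > 0). Sketch only: ignores '*' and ties between q-values."""
    offered = parse_accept_encoding(accept_encoding)
    for coding in server_preference:
        if offered.get(coding, 0) > 0:
            return coding
    return "identity"


print(choose_content_encoding("gzip, deflate, br"))  # br
print(choose_content_encoding("gzip, deflate"))      # gzip
```

Since Firefox only advertises "br" on https, the same code automatically falls back to gzip for http:// requests. Responses chosen this way should also carry Vary: Accept-Encoding so shared caches keep the variants apart.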

[edit note - around Oct 6 2015 the token was changed to br from brotli. The token brotli was only ever deployed on nightly builds of firefox 44.]

We expect Chrome will deploy something compatible in the near future.

The brotli format is defined by this document working its way through the IETF process. We will work with the authors to make sure the IANA registry for content codings is updated to reference it.

You can get tools to create brotli compressed content here, and there is a Windows executable I can't vouch for linked here.

Friday, March 27, 2015

Opportunistic Encryption For Firefox

Firefox 37 brings more encryption to the web through opportunistic encryption of some http:// based resources. It will be released the week of March 31st.

OE provides unauthenticated encryption over TLS for data that would otherwise be carried via clear text. This creates some confidentiality in the face of passive eavesdropping, and also provides you much better integrity protection for your data than raw TCP does when dealing with random network noise. The server setup for it is trivial.

These are indeed nice bonuses for http:// - but it still isn't as nice as https://. If you can run https you should - full stop. Don't make me repeat it :) Only https protects you from active man in the middle attackers.

But if you have a long tail of legacy content that you cannot yet get migrated to https, commonly due to mixed-content rules and interactions with third parties, OE provides a mechanism for an encrypted transport of http:// data. That's a strict improvement over the cleartext alternative.

Two simple steps to configure a server for OE
  1. Install a TLS based h2 or spdy server on a separate port. 443 is a good choice :). You can use a self-signed certificate if you like because OE is not authenticated.
  2. Add a response header Alt-Svc: h2=":443" (or spdy/3.1=":443" if you are using a SPDY-enabled server like nginx).
When the browser consumes that response header it will start to verify the fact that there is a HTTP/2 service on port 443. When a session with that port is established it will start routing the requests it would normally send in cleartext to port 80 onto port 443 with encryption instead. There will be no delay in responsiveness because the new connection is fully established in the background before being used. If the alternative service (port 443) becomes unavailable or cannot be verified Firefox will automatically return to using cleartext on port 80. Clients that don't speak the right protocols just ignore the header and continue to use port 80.

This mapping is saved and used in the future. It is important to understand that while the transaction is being routed to a different port, the origin of the resource hasn't changed (i.e. if the cleartext origin was http://www.example.com:80 then the origin, including the http scheme and the port 80, is unchanged even if it is routed to port 443 over TLS). OE is not available with HTTP/1 servers because that protocol does not carry the scheme as part of each transaction, which is a necessary ingredient for the Alt-Svc approach.
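A Python sketch of the client-side bookkeeping: parse a simple Alt-Svc value, remember the route per origin, and fall back to cleartext port 80 when the entry expires or is forgotten. It does not model the background verification of the new connection. The names are hypothetical, and the 24 hour default lifetime follows the Alt-Svc specification's `ma` default.

```python
import time

DEFAULT_MAX_AGE = 86400  # Alt-Svc entries default to 24 hours of freshness


def parse_alt_svc(value):
    """Parse a simple Alt-Svc value such as 'h2=":443"; ma=3600' into
    (protocol_id, host, port, max_age) tuples. Sketch only: it ignores the
    'clear' value, commas inside quoted strings, and other parameters."""
    entries = []
    for alternative in value.split(","):
        parts = [p.strip() for p in alternative.split(";")]
        protocol_id, _, authority = parts[0].partition("=")
        host, _, port = authority.strip('"').rpartition(":")
        max_age = DEFAULT_MAX_AGE
        for param in parts[1:]:
            name, _, arg = param.partition("=")
            if name.strip() == "ma":
                max_age = int(arg)
        entries.append((protocol_id.strip(), host, int(port), max_age))
    return entries


class AltSvcCache:
    """Hypothetical per-origin route cache. The origin key, including the http
    scheme and port 80, is never rewritten; only the connection target changes."""

    def __init__(self):
        self._routes = {}  # (scheme, host, port) -> (host, port, protocol, expires_at)

    def remember(self, origin, header_value, now=None):
        now = time.time() if now is None else now
        for protocol, host, port, max_age in parse_alt_svc(header_value):
            if protocol in ("h2", "spdy/3.1"):
                # An empty host in the alt-authority means "same host as the origin".
                self._routes[origin] = (host or origin[1], port, protocol, now + max_age)
                return

    def forget(self, origin):
        # E.g. the alternative became unreachable or failed verification.
        self._routes.pop(origin, None)

    def route(self, origin, now=None):
        now = time.time() if now is None else now
        entry = self._routes.get(origin)
        if entry and entry[3] > now:
            return entry[:3]
        return (origin[1], origin[2], "cleartext")


cache = AltSvcCache()
origin = ("http", "www.example.com", 80)
cache.remember(origin, 'h2=":443"; ma=3600', now=0)
print(cache.route(origin, now=10))    # ('www.example.com', 443, 'h2')
print(cache.route(origin, now=7200))  # ('www.example.com', 80, 'cleartext')
```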

You can control how long the Alt-Svc mappings last, along with some other details. The Internet-Draft is helpful as a reference. As the technology matures we will be tracking it; the recent HTTP working group meeting in Dallas decided this was ready to proceed to last call status in the working group.

Wednesday, February 18, 2015

HTTP/2 is Live in Firefox

The Internet is chirping loudly today with news that draft-17 of the HTTP/2 specification has been anointed proposed standard. huzzah! Some reports talk about it as the future of the web - but the truth is that future is already here today in Firefox.

9% of all Firefox release channel HTTP transactions are already happening over HTTP/2.  There are actually more HTTP/2 connections made than SPDY ones. This is well exercised technology.
  • Firefox 35, in current release, uses a draft ID of h2-14 and you will negotiate it with google.com today.
  • Firefox 36, in Beta to be released NEXT WEEK, supports the official final "h2" protocol for negotiation. I expect lots of new server side work to come on board rapidly now that the specification has stabilized. Firefox 36 also supports draft IDs -14 and -15. You will negotiate -15 with twitter as well as google using this channel.
  • Firefox 37 and 38 have the same levels of support - adding draft-16 to the mix. The important part is that the final h2 ALPN token remains fixed. These releases also implement the relevant IETF drafts for opportunistic security over h2 via the Alternative Services mechanism. A blog post on that will follow as we get closer to release - but feel free to reach out and experiment with it on these early channels.
Sometime in the near future I will remove support for the various draft levels of HTTP/2 that have been used to get us to this point and we'll just offer the "h2" of the proposed standard.

For both SPDY and HTTP/2 the killer feature is arbitrary multiplexing on a single well congestion controlled channel. It amazes me how important this is and how well it works. One great metric around that which I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1 74% of our active connections carry just a single transaction - persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction. Let's build the web around that.

Wednesday, January 7, 2015

HTTP/2 Dependency Priorities in Firefox 37

Next week Firefox 35 will be in general release, and Firefox 37 will be promoted to the Developer Edition channel (aka Firefox Aurora).

HTTP/2 support will be enabled for the first time by default on a release channel in Firefox 35. Use it in good health on sites like https://twitter.com. It's awesome.

This post is about a feature that landed in Firefox 37 - the use of HTTP/2 priority dependencies and groupings. In earlier releases prioritization was done strictly through relative weightings - similar to SPDY or UNIX nice values. H2 lets us take that a step further and say that some resources are dependent on other resources in addition to using relative weights between transactions that are dependent on the same thing.

There are some simple cases where you really want a strict dependency relationship - for example two frames of the same video should be serialized on the wire rather than sharing bandwidth. More complicated relationships can be expressed through the use of grouping streams. Grouping streams are H2 streams that are never actually opened with a HEADERS frame - they exist simply to be nodes in the dependency tree that other streams depend on.

The canonical use case involves better prioritization for loading html pages that include js, css, and lots of images. When doing so over H1 the browser will actually defer sending the request for the images while the js and css load - the reason is that the transfer of the js/css blocks any rendering of the page and the high byte count of the images slows down the higher priority js/css if done in parallel. The workaround, not requesting the images at all while the js/css is loading, has some downsides - it incurs at least one extra round trip delay and it doesn't utilize the available bandwidth effectively in some cases.

The weighting mechanisms of H2 (and SPDY) can help here - and they are what are used prior to Firefox 37. Full parallelism is restored, but some unwanted bandwidth sharing still goes on.

I've implemented a scheme for H2 using 5 fixed dependency groups (known informally as leader, follower, unblocked, background, and speculative). They are created with the PRIORITY frame upon session establishment and every new stream depends on one of them.
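A Python sketch of what "created with the PRIORITY frame upon session establishment" can look like on the wire. The frame layout follows RFC 7540 section 6.3; the stream IDs and weights for the five groups are assumptions for illustration, not necessarily the values Firefox uses.

```python
import struct

PRIORITY = 0x2  # HTTP/2 frame type code for PRIORITY


def priority_frame(stream_id, depends_on, weight, exclusive=False):
    """Encode an HTTP/2 PRIORITY frame (RFC 7540, section 6.3)."""
    if not 1 <= weight <= 256:
        raise ValueError("weight must be between 1 and 256")
    dependency = (depends_on & 0x7FFFFFFF) | (0x80000000 if exclusive else 0)
    payload = struct.pack("!IB", dependency, weight - 1)  # weight is encoded minus one
    # 9-byte frame header: 24-bit length, type, flags (none), 31-bit stream id.
    header = struct.pack("!I", len(payload))[1:]
    header += struct.pack("!BBI", PRIORITY, 0, stream_id & 0x7FFFFFFF)
    return header + payload


# Hypothetical stream IDs and weights for the five group nodes; these streams
# are never opened with HEADERS, they only anchor the dependency tree.
GROUPS = [
    # (name, stream_id, depends_on, weight)
    ("leader", 3, 0, 201),
    ("unblocked", 5, 0, 101),
    ("background", 7, 0, 1),
    ("speculative", 9, 7, 1),
    ("follower", 11, 3, 1),
]
session_preamble = b"".join(priority_frame(sid, dep, w) for _, sid, dep, w in GROUPS)
print(len(session_preamble))  # 70 bytes: five 14-byte frames
```

A new request stream then just names one of these group IDs as its dependency in its HEADERS frame.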



Streams for things like js and css are dependent on the leader group and images are dependent on the follower group. The use of group nodes, rather than being dependent on the js/css directly, greatly simplifies dependency management when some streams complete and new streams of the same class are created - no reprioritization needs to be done when the group nodes are used.

This is experimental - the tree organization and its weights will evolve over time. Various types of resource loads will still have to be better classified into the right groups within Firefox and that too will evolve over time. There is an obvious implication for prioritization of tabs as well, and that will also follow over time. Nonetheless it's a start - and I'm excited about it.

One last note to H2 server implementors, if you should be reading this: there is a very strong implication here that you need to pay attention to the dependency information and not simply implement the individual resource weightings. Given the tree above, think about what would happen if there was a stream dependent on leaders with a weight of 1 and a stream dependent on speculative with a weight of 255. Because the entire leader group exists at the same level of the tree as background (and by inclusion speculative), the leader descendants should dominate the bandwidth allocation due to the relative weights of those group nodes - but looking only at the weights of the individual streams gives the incorrect conclusion.
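A Python sketch of the arithmetic in that example. The group weights (201 for leader, 1 for background, and so on) are assumed for illustration, and the allocation rule is a simplified reading of RFC 7540 section 5.3: siblings with something to send split their parent's share by weight, and group nodes never send data themselves.

```python
# Hypothetical tree shaped like the groups described above; weights are illustrative.
NODES = {
    # name: (parent, weight)
    "leader": ("root", 201),
    "unblocked": ("root", 101),
    "background": ("root", 1),
    "speculative": ("background", 1),
    "follower": ("leader", 1),
    "js": ("leader", 1),               # real stream: depends on leader, weight 1
    "prefetch": ("speculative", 255),  # real stream: depends on speculative, weight 255
}


def children(parent):
    return [name for name, (p, _) in NODES.items() if p == parent]


def has_active(node, active):
    return node in active or any(has_active(c, active) for c in children(node))


def tree_shares(active, node="root", share=1.0):
    """Split bandwidth down the tree: at each level, siblings with something
    to send divide their parent's share in proportion to their weights."""
    if node in active:
        return {node: share}
    live = [c for c in children(node) if has_active(c, active)]
    total = sum(NODES[c][1] for c in live)
    result = {}
    for c in live:
        result.update(tree_shares(active, c, share * NODES[c][1] / total))
    return result


def weights_only_shares(active):
    # The mistaken view: compare the streams' own weights and ignore the tree.
    total = sum(NODES[s][1] for s in active)
    return {s: NODES[s][1] / total for s in active}


active = {"js", "prefetch"}
print(tree_shares(active))          # js gets 201/202 of the bandwidth
print(weights_only_shares(active))  # prefetch appears to get 255/256
```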

Monday, December 1, 2014

Firefox gecko API for HTTP/2 Push

HTTP/2 provides a mechanism for a server to push both requests and responses to connected clients. Up to this point we've used that as a browser cache seeding mechanism. That's pretty neat, it gives you the performance benefits of inlining with better cache granularity and, more importantly, improved priority handling and it does it all transparently.

However, as part of gecko 36 we added a new gecko (i.e. internal firefox and add-on) API called nsIHttpPushListener that allows direct consumption of pushes without waiting for a cache hit. This opens up programming models other than browsing.

A single HTTP/2 stream, likely formed as a long lasting transaction from an XHR, can receive multiple pushed events correlated to it without having to form individual hanging polls for each event. Each event is both a HTTP request and HTTP response and is as arbitrarily expressive as those things can be.

It seems likely that any implementation of a new Web-based push notification protocol would be built around HTTP/2 pushes, and this interface would provide the basis for subscribing to and consuming those events.

nsIHttpPushListener is only implemented for HTTP/2. SPDY has a compatible feature set, but we've begun transitioning to the open standard and will likely not evolve SPDY's feature set any further at this point.

There is no WebIDL DOM access to the feature set yet; that is something that should be standardized across browsers before being made available.

Friday, November 21, 2014

Proxy Connections over TLS - Firefox 33

There have been a bunch of interesting developments over the past few months in Mozilla Platform Networking that will be news to some folks. I've been remiss in not noting them here. I'll start with the proxying over TLS feature. It landed as part of Firefox 33, which is the current release.

This feature is from bug 378637 and is sometimes known as HTTPS proxying. I find that naming a bit ambiguous: the feature is about connecting to your proxy server over HTTPS, but it supports proxying for both http:// and https:// resources (as well as ftp://, ws://, and wss:// for that matter). https:// transactions are carried over end to end TLS, tunneled through the proxy with the CONNECT method, while the connection to the proxy itself is made over a separate TLS session. For https:// and wss:// that means you actually have end to end TLS wrapped inside a second TLS connection between the client and the proxy.
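The layering can be sketched from the client side with Node.js's standard https and tls modules: an https request with method CONNECT opens the outer TLS hop to the proxy, and tls.connect with the socket option runs a second, end to end handshake with the origin inside it. This is a minimal sketch under stated assumptions: host names and ports are caller-supplied placeholders, any 2xx reply is treated as success, and error handling is deliberately thin. It is not how Firefox implements the feature.

```javascript
const https = require("https");
const tls = require("tls");

// Open an end to end TLS session with an origin, tunneled through a proxy
// that is itself reached over TLS (TLS inside TLS). Minimal error handling.
function openTunnel(proxy, origin, onReady, onError) {
  // Outer hop: a TLS connection to the proxy carrying a CONNECT request.
  const req = https.request({
    host: proxy.host,
    port: proxy.port,
    method: "CONNECT",
    path: `${origin.host}:${origin.port}`, // CONNECT target is host:port
  });

  req.on("connect", (res, proxySocket) => {
    if (res.statusCode < 200 || res.statusCode > 299) {
      proxySocket.destroy();
      onError(new Error(`proxy refused CONNECT with status ${res.statusCode}`));
      return;
    }
    // Inner hop: a second TLS handshake, this time with the origin itself,
    // running over the already encrypted client-to-proxy connection.
    const inner = tls.connect(
      { socket: proxySocket, host: origin.host, servername: origin.host },
      () => onReady(inner)
    );
    inner.on("error", onError);
  });

  req.on("error", onError);
  req.end();
}
```

A call might look like openTunnel({ host: "proxy.mydomain.net", port: 443 }, { host: "example.com", port: 443 }, handleSocket, console.error), where handleSocket is a hypothetical callback that receives the inner TLS socket.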

There are obvious and non-obvious advantages here; proxying over TLS is strictly better than traditional plaintext proxying. One obvious reason is that it provides authentication of your proxy choice: if you have defined a proxy, then you're placing an extreme amount of trust in that intermediary. It's nice to know via TLS authentication that you're really talking to the right device.

Of course, the communication between you and the proxy is also kept confidential, which helps your privacy with respect to observers of the link between client and proxy, though this is not end to end if you're not accessing an https:// resource. Proxying over TLS also keeps any proxy-specific credentials strictly confidential. There is an advantage even when accessing https:// resources through a proxy tunnel: encrypting the client-to-proxy hop conceals some information (at least for that hop) that https:// normally leaks, such as the hostname in SNI and the server IP address.

Somewhat less obviously, HTTPS proxying is a prerequisite to proxying via SPDY or HTTP/2. These multiplexed protocols are extremely well suited for connecting to a proxy because a large fraction (often 100%) of a client's transactions are funneled through the same proxy, and therefore only one TCP session is required when using a prioritized multiplexing protocol. When using HTTP/1, a large number of connections are required to avoid head of line blocking, and it is difficult to meaningfully manage them to reflect prioritization. When connecting to remote proxies (i.e. those with high latency, such as those in the cloud) this becomes an even more important advantage, as the handshakes that are avoided are especially slow in that environment.
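A back-of-the-envelope model shows why the handshake savings matter more as proxy latency grows. This toy sketch assumes a 1-RTT TCP handshake plus a 2-RTT full TLS 1.2 handshake, all connections opened in parallel, one request per round trip on each HTTP/1 connection (no pipelining), and no bandwidth limits. The 6-connection limit and the 100 ms RTT are illustrative numbers, not Firefox settings.

```javascript
// Toy model. Assumptions: TCP handshake = 1 RTT, full TLS 1.2 handshake =
// 2 RTTs, connections open in parallel, each HTTP/1 connection carries one
// request per round trip (no pipelining), and bandwidth is ignored.
const HANDSHAKE_RTTS = 3;

function http1Setup(requests, maxConnections, rttMs) {
  if (requests === 0) return { handshakes: 0, ms: 0 };
  const connections = Math.min(requests, maxConnections);
  const rounds = Math.ceil(requests / connections); // requests queue per connection
  return { handshakes: connections, ms: (HANDSHAKE_RTTS + rounds) * rttMs };
}

function multiplexedSetup(rttMs) {
  // One session: one handshake, then every request in flight at once.
  return { handshakes: 1, ms: (HANDSHAKE_RTTS + 1) * rttMs };
}

const rtt = 100; // ms, e.g. a distant proxy in the cloud
const h1 = http1Setup(30, 6, rtt);
const h2 = multiplexedSetup(rtt);
console.log(h1.ms, h1.handshakes); // 800 6
console.log(h2.ms, h2.handshakes); // 400 1
```

Every term scales with the RTT, so doubling the distance to the proxy doubles the gap while the multiplexed session still pays for only one handshake.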

This multiplexing can really warp the old noodle after a while, especially if you have multiple SPDY/h2 sessions tunneled inside a SPDY/h2 connection to the proxy. The top-level session can then multiplex several streams carrying http:// transactions served by the proxy, alongside CONNECT streams to multiple origins that each contain their own end to end SPDY session carrying multiple https:// transactions.

To use HTTPS proxying, just return the HTTPS proxy type from your FindProxyForURL() PAC function (instead of the traditional PROXY type, which speaks plaintext HTTP to the proxy). This is compatible with Google Chrome, which has a similar feature.

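function FindProxyForURL(url, host) {
  // Send plain http:// requests to the proxy over TLS.
  if (url.substring(0, 7) === "http://") {
    return "HTTPS proxy.mydomain.net:443";
  }
  // Everything else, including https://, connects directly.
  return "DIRECT";
}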
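A variant sketch, assuming you also want https:// traffic to go through the TLS proxy (it is then tunneled end to end with CONNECT) and want a fallback. PAC results are a semicolon-separated list tried in order, so the DIRECT entry is used only if the proxy cannot be reached. proxy.mydomain.net remains a placeholder host.

```javascript
// Variant: route both http:// and https:// through the TLS proxy, falling
// back to a direct connection if the proxy is unreachable.
// proxy.mydomain.net is a placeholder host.
function FindProxyForURL(url, host) {
  if (url.startsWith("http://") || url.startsWith("https://")) {
    return "HTTPS proxy.mydomain.net:443; DIRECT";
  }
  return "DIRECT";
}
```

Whether to include the DIRECT fallback is a policy choice: it keeps browsing working when the proxy is down, but it silently drops the proxy's authentication and confidentiality benefits for those requests.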


Squid supports HTTP/1 HTTPS proxying. SPDY proxying can be done via Ilya's Node.js-based spdy-proxy. nghttp can be used for building HTTP/2 proxying solutions (H2 is not yet enabled by default on Firefox release channels; see the about:config prefs network.http.spdy.enabled.http2 and network.http.spdy.enabled.http2draft to enable some version of it early). There are no doubt other proxies with appropriate support too.

If you need to add a TOFU (trust on first use) certificate exception for your proxy, it cannot be done in proxy mode. Disable proxying, connect to the proxy host and port directly from the location bar, and add the exception. Then re-enable proxying and the certificate exception will be honored. Obviously, your authentication guarantee will be better if you use a normal WebPKI-validated certificate.

Tuesday, March 4, 2014

On the Application of STRINT to HTTP/2

I participated for two days last week in the joint W3C/IETF (IAB) workshop on Strengthening the Internet against Pervasive Monitoring (aka STRINT). Now that the IETF has declared pervasive monitoring of the Internet to be a technical attack, the goal of the workshop was to establish the next steps to take in reaction to the problem. About 100 members of the Internet engineering and policy communities participated. HTTP/2 standardization is an important test case of whether we're serious about following through.

I'm pleased that we were able to come to some rough conclusions and actions. First, a word of caution: there is no official report yet, I'm certainly not the workshop secretary, and this post only covers transport security, which was a subset of the areas discussed. Still, I promise I'm faithfully reporting the events as I experienced them.

Internet protocols need to make better use of communications security and more encryption: even imperfect unauthenticated crypto is better than trivially snoopable cleartext. It isn't perfect, but it raises the bar for the attacker. New protocol designs should use strongly authenticated mechanisms, falling back to weaker measures only as absolutely necessary, and updates to older protocols should be expected to add encryption, potentially with disabling switches if compatibility strictly requires it. A logical outcome of that discussion is the addition of these properties (probably by reference, not directly through replacement) to BCP 72, which provides guidance for writing RFC security considerations.

At a bare minimum, I am acutely concerned with making sure HTTP/2 brings more encryption to the Web. There are certainly many exposures beyond the transport (data storage, data aggregation, federated services, etc.), but in 2014 transport-level encryption is a well-understood and easily achievable technique that should be as ubiquitously available as clean water and public infrastructure. In the face of known attacks it is a best engineering practice, and we shouldn't accept less while still demanding stronger privacy protections too. When you step back from the details and ask yourself whether it is really reasonable that a human's interaction with the Web is observable to many silent and undetectable observers, the current situation really seems absurd.

The immediate solution space on offer is complicated and incomplete. Potential mitigations are fraught with tradeoffs and unintended consequences. The focus here is on what happens to http://-schemed traffic; https:// is comparatively well taken care of. The common solution offered in this space carries http:// over an unauthenticated TLS channel for HTTP/2. The result is a very simple plug-and-play TLS-capable HTTP server that is not dependent on the PKI. This provides protection against passive eavesdroppers, but not against active attacks. The cost of attacking is raised in terms of CPU, money, political implications, and risk of being discovered. In my opinion, that's a win. Encryption simply becomes the new equivalent of cleartext: it doesn't promote http:// to https://, it does not produce a lock icon, and it does not grant you any new guarantees that cleartext http:// would not have. I support that approach.

The IETF HTTPbis working group will test this commitment to encryption on Wednesday at the London #IETF89 meeting, when http://-schemed URIs over TLS are on the agenda (again). In the past, it has not been able to garner consensus. If the group is unable to form consensus around a stronger privacy approach than HTTP/1.1's use of cleartext, I would hope the IESG would block the proposed RFC during last call for having insufficiently addressed the security implications of HTTP/2 on the Internet as we now know it.

#ietf89 #strint

Saturday, August 31, 2013

SSL Everywhere for HTTP/2 - A New Hope

Recently the IETF working group on HTTP met in Berlin, Germany, and discussed the concept of mandatory-to-offer TLS for HTTP/2, proposed by Mark Nottingham. The current approach to transport security means only 1/5 of web transactions are given the protections of TLS. All of those choices are made by the content owner via the scheme of the URL in the markup.

It is time that the Internet infrastructure simply gave users a secure by default transport environment - that doesn't seem like a radical statement to me. From FireSheep to the Google Sniff-Wifi-While-You-Map Car to PRISM there is ample evidence to suggest that secure transports are just necessary table stakes in today's Internet.

This movement in the IETF working group is welcome news and I'm going to do everything I can to help iron out the corner cases and build a robust solution.

My point of view for Firefox has always been that we would only implement HTTP/2 over TLS. That means https://, but it has been my hope to find a way to use it for http:// schemes on servers that support it too. This is just transport-level security; for web-level security the different schemes still represent different origins. If cleartext needed to be used, it would be done with HTTP/1, and someday in the distant future HTTP/1 would be put out to pasture. This roughly matches Google Chrome's public stance on the issue.

Mandatory to offer does not ban the practice of cleartext: if the client did not want TLS, a compliant cleartext channel could be established. This might be appropriate inside a data center, for instance, but Firefox would be unlikely to do so.

This approach also does not ban intermediaries completely. https:// URIs of course remain end to end and can only be tunneled (as is the case in HTTP/1), but http:// URIs could be proxied by appropriately configured clients, with the HTTP/2 stream terminated on the proxy. It would prevent "transparent proxies", which are fundamentally man-in-the-middle attacks anyhow.

Comments over here: https://plus.google.com/100166083286297802191/posts/XVwhcvTyh1R

Tuesday, August 13, 2013

Go Read "Reducing Web Latency - the Virtue of Gentle Aggression"

One of the more interesting networking papers I've come across lately is Reducing Web Latency: the Virtue of Gentle Aggression. It details how poorly TCP's packet loss recovery techniques map onto HTTP and proposes a few techniques (some more backwards compatible than others) for improving things. It's a large collaborative effort from Google and USC. Go read it.

The most important finding is that TCP flows that experienced a loss took 5 times longer to complete than flows that did not, and 10% of flows fell into the "some loss" bucket. For good analysis, go read the paper, but the shorthand headline is that this is because the short "mice" flows of HTTP/1 tend to consist mostly of tail packets, and traditionally tail loss is only recovered using horrible timeout-based strategies.
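A rough back-of-the-envelope reading of those two numbers, assuming (as a simplification, not a figure from the paper) that every loss-free flow takes the same time T and every lossy flow takes 5T, with 10% of flows lossy:

```latex
\begin{aligned}
\bar{t} &= 0.9\,T + 0.1 \cdot 5T = 1.4\,T \\
\frac{0.1 \cdot 5T}{\bar{t}} &= \frac{0.5\,T}{1.4\,T} \approx 0.36
\end{aligned}
```

Under that assumption, a tenth of the flows would raise mean completion time by about 40% and account for over a third of total flow time.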

This is the best summary of the problem that I've seen, but it's been understood for quite a while. It is one of the motivators of HTTP/2 and is a theme underlying several of the blog posts Google has made about QUIC.

The "aggression" concept used in the paper is essentially the inclusion of redundant data in the TCP flow under some circumstances. At the cost of extra bandwidth, this can bulk up mice flows so that their tail losses are more likely to be recovered with single-RTT mechanisms such as SACK-based fast recovery instead of timeouts. Conceivably this extra data could also be treated as an error-correcting code, which would allow some losses to be recovered independent of the RTT.
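The error-correction idea can be illustrated with the simplest such code: one XOR parity packet per group of packets. This is a generic technique chosen for illustration, not necessarily the scheme the paper evaluates, and the packet contents below are made up. It assumes packets in a group have equal length (shorter ones would need padding).

```javascript
// XOR parity over a group of equal-length packets. One parity packet lets
// the receiver rebuild any single lost packet without a retransmission.
function xorParity(packets) {
  const parity = new Uint8Array(packets[0].length);
  for (const packet of packets) {
    for (let i = 0; i < packet.length; i++) parity[i] ^= packet[i];
  }
  return parity;
}

// `received` holds the group in order, with null for a lost packet.
function recoverLost(received, parity) {
  const lost = received.filter((p) => p === null).length;
  if (lost === 0) return received;
  if (lost > 1) throw new Error("XOR parity can repair only one loss per group");
  const survivors = received.filter((p) => p !== null);
  // XOR of the survivors and the parity equals the missing packet.
  const rebuilt = xorParity([...survivors, parity]);
  return received.map((p) => (p === null ? rebuilt : p));
}

const group = [Uint8Array.of(1, 2, 3), Uint8Array.of(4, 5, 6), Uint8Array.of(7, 8, 9)];
const parity = xorParity(group);
const repaired = recoverLost([group[0], null, group[2]], parity);
console.log(Array.from(repaired[1])); // [ 4, 5, 6 ]
```

The trade-off is the same one the paper describes: one extra packet of bandwidth per group buys recovery from a single loss in that group without waiting on a retransmission timeout.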