Nov 15 2023

hyper v1

I’m excited to announce v1.0 of hyper, a protective and efficient HTTP library written in the Rust programming language. hyper provides asynchronous HTTP/1 and HTTP/2 server and client APIs, allowing you to bring your own IO and runtime.

It’s been exciting and humbling to watch users build awesome things. Cloudflare uses hyper within Oxy, its next-generation proxy framework, to handle traffic at considerable scale. After Discord’s 5x improvement to @mention response times a few years ago, they have moved most of their critical systems to depend on Rust and hyper. curl has a currently experimental HTTP backend built on hyper, with the goal of making the Internet safer.

Marc Brooker, a Distinguished Engineer at AWS, commented:

When building our new container-loading data plane for AWS Lambda, we expected to need a custom binary protocol. In production, we’ve found the overhead of hyper to be SO low that we are excited for it to continue powering our services.

Johan Andersson, CTO at Embark, said:

We have been using and relying on hyper for the last 5 years for our gRPC and REST services, tools, libraries, and embedded in our next game built in Rust. It has been rock solid across all of our usages, and it really is a foundational library for the Rust ecosystem. Congrats on 1.0!

The best way to get started is to check out the guide.
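
If you want a quick taste before the guide, here is roughly what a minimal hyper 1.0 “hello world” server looks like. This is a sketch, assuming Tokio as the runtime, the TokioIo adapter from hyper-util, and hyper’s http1 and server features enabled:

use bytes::Bytes;
use http_body_util::Full;
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{body::Incoming, Request, Response};
use hyper_util::rt::TokioIo;
use tokio::net::TcpListener;

// A service is just an async function from a Request to a Response.
async fn hello(_req: Request<Incoming>) -> Result<Response<Full<Bytes>>, hyper::Error> {
    Ok(Response::new(Full::new(Bytes::from("Hello, hyper 1.0!"))))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let listener = TcpListener::bind("127.0.0.1:3000").await?;
    loop {
        let (stream, _) = listener.accept().await?;
        // hyper 1.0 brings its own IO traits; TokioIo adapts a Tokio stream to them.
        let io = TokioIo::new(stream);
        tokio::spawn(async move {
            if let Err(err) = http1::Builder::new()
                .serve_connection(io, service_fn(hello))
                .await
            {
                eprintln!("connection error: {err}");
            }
        });
    }
}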

Stability here we come

Over the past 9 years, hyper has grown from a web developer’s side project into a solid library powering huge network applications. It’s time to grow up. After adding async/await support in v0.14, we focused on providing a set of basic APIs that would keep hyper safe, fast, and flexible. This meant removing some of the more opinionated “higher level” pieces. Those belong elsewhere, in crates like hyper-util, reqwest, and Axum.

This release signals some stability. Major versions, like 1.0, are stable for at least 3 years.1 We also keep an MSRV that is at least 6 months old.2 We’ll add new features, and we still have a couple of places to experiment: the hyper-util crate and hyper_unstable compiler flags.

Starting in v0.14.25, we added a backports feature that brings the new core APIs to you immediately. Combine that with the deprecated feature, and you’ll be guided in making your existing code ready for the upgrade to 1.0. Be sure to check out the upgrade guide!

Next

The most immediate next steps are to update the other core parts of the ecosystem that depend on hyper: reqwest, Axum, Tonic. But after that, there’s plenty more to do. You’re welcome to come join us!

HTTP/3

I would like this to be my next focus. We’ve been building up the h3 crate, and reqwest has unstable support now. I’d like to stabilize the feature in reqwest, and explore how we can make it available in hyper directly. Then we can have easy HTTP/3 servers, too!
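
As a sketch of what the unstable reqwest support looks like today (this assumes reqwest’s http3 crate feature and its unstable cfg flag, built with something like RUSTFLAGS="--cfg reqwest_unstable"; the exact surface may change before stabilization):

use http::Version;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Skip Alt-Svc discovery and speak HTTP/3 directly over QUIC.
    let client = reqwest::Client::builder()
        .http3_prior_knowledge()
        .build()?;

    let resp = client
        .get("https://example.com")
        .version(Version::HTTP_3)
        .send()
        .await?;

    println!("{} over {:?}", resp.status(), resp.version());
    Ok(())
}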

The trickiest question is how to make it available without tying hyper to a particular TLS/QUIC library. Then users could choose quinn, or s2n-quic, or msquic, or any other.

Stabilize in curl

The biggest parts of making hyper work in curl are done. Someone with experience in Rust and C could make a huge dent in Internet safety by helping to get it over the finish line.

Middleware

There’s some excellent middleware available already in tower and tower-http. But several of the important ones are just a little (or a lottle) too difficult to add to a stack. I’d also love for there to be some recommended stacks for servers and clients that bundle together the middleware most people need. To that end, I’ve mentioned before breaking reqwest open so that all of its features become middleware you can customize.
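
For a picture of what a bundled stack could look like, here is a small sketch composing a few tower-http layers onto an axum router with tower’s ServiceBuilder; the particular layers chosen are just an assumption of what a recommended bundle might include:

use std::time::Duration;

use axum::{routing::get, Router};
use tower::ServiceBuilder;
use tower_http::{
    compression::CompressionLayer,
    timeout::TimeoutLayer,
    trace::TraceLayer,
};

fn app() -> Router {
    Router::new()
        .route("/", get(|| async { "hello" }))
        .layer(
            // Requests pass through the layers top-to-bottom; responses come back bottom-to-top.
            ServiceBuilder::new()
                .layer(TraceLayer::new_for_http())
                .layer(TimeoutLayer::new(Duration::from_secs(10)))
                .layer(CompressionLayer::new()),
        )
}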

Tracing and Metrics

It’s currently possible to get a decent set of logs using tower_http::trace. It’d be better if you could get more fine-grained traces and metrics, probably with some stabilized integration with tracing directly in hyper. Maybe some sort of hyper-metrics, similar to tokio-metrics.
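
For reference, the logging that’s possible today looks roughly like this sketch: an axum server wrapped in TraceLayer, with tracing_subscriber printing the spans and events (the exact setup is an assumption about your stack):

use axum::{routing::get, Router};
use tower_http::trace::TraceLayer;

#[tokio::main]
async fn main() {
    // Print the spans and events that TraceLayer records for each request.
    tracing_subscriber::fmt::init();

    let app = Router::new()
        .route("/", get(|| async { "hello" }))
        .layer(TraceLayer::new_for_http());

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}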

io_uring

Part of the reason we gave hyper its own IO traits was to be able to adapt them for completion-based IO. I believe decent support and benchmarks could be had pretty soon by a motivated individual.

Thanks

A huge thank you to all our amazing contributors. You’ve made this project the success it is, and helped move hyper along the journey to 1.0. I’d like to follow up with a separate post specifically thanking you all.

Thanks to the companies who have sponsored the creation of hyper: AWS, Buoyant, Mozilla, Rust Foundation, Fly.io, Embark and others.

Your company could also become a sponsor or get support!

  1. Except for a correctness mistake that must be fixed ASAP.

  2. We realize that some users just cannot upgrade that fast, and we care about them. 

Oct 10 2023

hyper HTTP/2 Rapid Reset Attack: Unaffected

Today, the world has been made aware of a potential vulnerability affecting most HTTP/2 implementations: sending a rapid flood of streams and resets.

If you use hyper, or even just its h2 dependency, you are safe. hyper is not affected, especially if you have h2 v0.3.18 or newer. We manually verified that an example hyper server responds correctly. Big thanks to @Noah-Kennedy for all the help.

If you want to read more, check out CVE-2023-44487, or these other breakdowns.

That’s it!

You’re still here. You want to know the “why”?

Well, for two main reasons.

We added specific detection of this problem back in April. A related flaw was reported against hyper, with the added requirement of a consistently flooded network. We fixed that. It had a CVE and a RUSTSEC advisory, so you should have upgraded, right?

But even without that fix, the damage that could be done was local. The bigger concern with this newly announced vulnerability seems to be when receipt of the HEADERS frame triggers more work in the handlers that then needs to be canceled. The way hyper handles frames, it will cancel the stream before ever making it available to handlers, so the cost is local. Without the fix, and only if the attacker can flood the network, hyper could consume a lot of memory keeping track of all the suddenly reset streams. If they can’t flood the network, then no problem at all.

So if you’ve upgraded since April, you’re safe. By the way…

Handling security reports and working with coordinated disclosures like today’s are a significant part of maintaining hyper. If you appreciate that hyper is kept secure, consider sponsoring. Having more support during security disclosures is something you can set up with me privately.

Sep 28 2023

Was async fn a mistake?

This stabilization PR for async fn in traits made me think: was async fn in Rust a mistake?

I mean, I dunno. Maybe it wasn’t. But play along for a moment.

By the way, I don’t mean that async/await in Rust itself is a mistake. That’s a Big Deal. It allows companies to deploy some serious stuff to production. And async and await syntax is a huge save. I don’t want to lose that. Writing manual futures and poll functions is megasad.

I’m specifically talking about the async fn sugar. What if we didn’t have it, and instead just returned impl Futures, and used async blocks inside the functions?1
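
To make the comparison concrete, here’s a toy example (the Client type is made up for illustration) showing the sugared form next to the desugared impl Future + async block form you can write in today’s Rust:

use std::future::Future;

struct Client {
    base: String,
}

impl Client {
    // The sugar: the anonymous Future return type is hidden, and it implicitly
    // borrows `self` and `url` for as long as the returned future lives.
    async fn get_sugared(&self, url: &str) -> String {
        format!("GET {}{}", self.base, url)
    }

    // The desugared style: an explicit `impl Future` return type with an async
    // block as the body, and the borrows spelled out with a named lifetime.
    fn get_desugared<'a>(&'a self, url: &'a str) -> impl Future<Output = String> + 'a {
        async move { format!("GET {}{}", self.base, url) }
    }
}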

The current async fn is really nice, if you fit the expected usage. If none of the differences with impl Future ever cause you problems, then great! But I do run into them. Other people seem to also.

What’s so bad?

Some of these differences cause problems that don’t have decent solutions. (Do you know the differences?)2 If you have to deal with one of them, suddenly you need to use different syntax.

And now, people need to understand both. And keep the subtle differences in their head when they read. Does that make things better? Or worse?

It’s the only place that has a magic return type. It makes lifetimes weird, with suggestions to rein them in. It leads to all sorts of proposals about how to customize the return type: #[require_send], async(Send), Service::call(): Send, and I’m sure there are others.3 I’m also thinking about generators and streams, since they could also end up with magic return values.

So was it a mistake? I think it may have been. Don’t worry, I don’t want to take it away from you, if you disagree!4

What if the alternative was nicer?

But I did wonder about this. What if we had the following features ready:

  • Repurpose bare trait syntax to mean impl Trait. It’s been enough editions, right?
  • Ability to forgo naming an associated type.
  • Stealing the feature from Scala where functions can equal a single expression.

Then asynchronous functions could look like this:

fn call(&self, req: Request) -> Future<Response> = async {
    // ...
}

That’d be a nice improvement.

  1. Yea, I know, it’s a little more writing. But I am in the optimize-for-reading camp. We read much more than we write. So if I have to write a few more characters at a function definition, but it makes the reading experience more understandable, that’s a massive win. 

  2. I’ve been involved in async Rust since the beginning. I know how it used to be, I was part of the group making it better, and I pay close attention to all the new proposals. I still mean what I said: none of the solutions look nice. 

  3. Return Type Notation (RTN) syntax is probably the least gross. But it raises a bunch of questions. Does it work for all functions? If not, why not? If so, do I check I::Iter or I::into_iter()? And also to consider: Rust’s strangeness budget.

  4. I could see an argument that it’s sort of like for, while, and loop. A more convenient syntax when it works, and you can use the others when you need more control. That argument breaks down when async fn is part of a trait definition. But anyways, I really just want the less-sugared way to be a little nicer.

Jul 27 2023

I'm an independent open source maintainer

tl;dr - I’m independent, sponsor me!

I’m doing something new. I’m an independent open source maintainer! In the beginning of June, I left my position at AWS.1

I’m still focused on Rust, async, and HTTP stuff. Projects like hyper, reqwest, h3, tower, and any other new ideas that come along. I just won’t be doing so as an employee.

So, then how do I get paid? Let me just clear up a couple ways I’m not. I’m not making separate licenses. I’m not charging for features.2 I’m not selling prioritization on roadmaps. Rather, I plan to make maintenance work my primary focus.

Maintenance can feel like riding a square-wheeled unicycle while juggling water balloons. Some of those balloons are:3

  • Designing proposals, interviewing users, re-writing those proposals.
  • Coding, coding, coding.
  • Triaging a never-ending supply of issues.
  • Spelunking in ancient code paths to understand and fix weird bugs.
  • Following a proper security policy with responsible disclosure, collaborating privately, and preparing detailed reports.
  • Reviewing pull requests for quality and adherence to the vision, and hopefully teaching potential collaborators.
  • Writing articles and giving talks, as a form of marketing and teaching.
  • Pretending to be a project manager.

It’s a lot of work, so who would pay for all that?

Does your company depend on my work? Become a sponsor! Consider it a form of business risk mitigation. You can use GitHub Sponsors or Patreon. I can also work with an invoice system, for anyone requiring that.

I am also interested in deeper relationships with companies that want more. What exactly those relationships will look like will evolve; likely things like office hours, support, or private advice. If you want to explore that with me, reach out at sean@seanmonstar.com.

  1. I learned a lot from my 3 years at AWS. Many lessons, some anti-lessons. Overall, I’m very grateful for my time there. But I had been planning this change for a while. And it was quite refreshing taking off a few weeks before jumping back into it all. 

  2. A win of being independent is that no single company decides what features should be added.

  3. This would be a good subject for another article. There’s a lot more to it, and it’d probably be surprising to people how many hats are needed to maintain popular open source libraries, besides “just being a programmer”. At least, if you want to do it well. 

Apr 27 2023

Report on Surprise hyper CVE from 2023-04-11

Meta

This document is meant to help publicize the learnings from a recent emergency in hyper. Documents like these are common within various organizations. Some call them “postmortems”, others say “incident reports”. I quite like what Amazon calls them, since it aptly describes the purpose: Correction of Error. There was an error that caused an emergency, and we want to correct that error.

Summary

A surprise CVE publicly filed against hyper on April 11, 2023 caused an emergency situation for several collaborators, and sent out dependabot warnings with no actionable advice. By day’s end, we had identified a best guess at the cause of the low-severity vulnerability. By the next morning, a fix was available.

Whether the issue should have been a CVE at all is uncertain.

The bigger concern is the way the CVE was filed, bypassing the existing security policy. That is like finding a lighter in a school and pulling the fire alarm. This COE discusses both why it may have happened, and how we can try to reduce future occurrences.

The impact

The RustSec1 advisory explains the issue this way:

If an attacker is able to flood the network with pairs of HEADERS/RST_STREAM frames, such that the h2 application is not able to accept them faster than the bytes are received, the pending accept queue can grow in memory usage. Being able to do this consistently can result in excessive memory use, and eventually trigger Out Of Memory.

In reality, being able to consistently accomplish those conditions would be very difficult for an attacker, and so the likelihood of this affecting anyone is minimal. Certainly low severity.

But the bigger impact was not this particular issue; it was that a CVE caused a sudden panic for the maintainers and for users, as dependabot alerted people with nothing they could do.

The story

The original issue was filed on May 27, 2022. Trying to better understand, I asked some poorly worded follow-up questions. Another contributor filed a pull request trying to fix the underlying issue. Several collaborators reviewed that PR, but didn’t fully grasp what it was trying to fix. It then fell into the void.

On April 11, 2023, someone decided to file a public CVE for the described issue, without following the security policy. I commented on the issue that while the motivation for doing so was likely well-intentioned, it was the wrong way to go about it. GitHub imported the report, which started triggering dependabot warnings.2 This surprised us, and at least four people dropped everything to handle the fire alarm.3

The first step was trying to determine a reproducible example. We didn’t notice at the time it was filed, but the original issue did not include full reproduction instructions. We tried to create some unit tests to mimic the behavior described, but couldn’t trigger the issue.

Eventually, we noticed that a modified test that stopped “accepting” requests from the connection, but still polled it, would cause the accept queue to grow. But hyper makes sure to have a task that is always accepting requests, unless you specifically ask it to stop. Thus, the modified test seemed like user error, but it was just a guess.

It just seemed too convoluted. Then we arrived at a much better guess.

We finally found a way to grow the accept queue even when continuously accepting, by creating a test to blast thousands of requests in a loop. Since h2’s test suite uses in-memory IO streams, we were able to fill the read buffer nearly without limit. That’s when we settled on our best guess: if someone can fill the socket’s read buffer faster than the server can pop requests, then the accept queue could grow unbounded. While there is a setting to limit concurrent requests, because these streams are immediately reset, the limit would never be checked.

After 14 hours, we had a fix written and reviewed. We determined that the issue was low severity, as the likelihood of being able to consistently attack was extremely low. And since we were adding a new limit, there was a possibility of causing a new bug. So it was better not to push something right before going to sleep.

The following morning we published the fix, as h2 v0.3.17. To the surprise of no one who has ever rushed out new code, a new bug in it was indeed found. We then published v0.3.18.4

Five whys5

  • Why did someone file a CVE suddenly? We don’t know for sure, but we can guess.6 A related issue had been open for a year, not fixed, so perhaps the reporter thought this was the only way to move forward.
  • Why wasn’t the issue acted upon a year ago? When it was initially opened, the maintainers didn’t fully understand what the problem was. Follow-up questions were asked, but even our questions weren’t that clear. Eventually, we forgot about it.
  • Why was it forgotten? We didn’t have any recurring reason to check back and try to understand what the issue was. If it had been reported privately to the security address, it would have stayed high priority until it was solved or determined incorrect.
  • Why wasn’t the initial issue reported privately? Perhaps the original reporter didn’t know about the policy.

What we’re doing to prevent a next time

We can’t completely control someone randomly filing a new CVE and causing another fire drill. But there are other things we can improve to reduce the likelihood of one.

  • Schedule routine triage. This could be a synchronous meeting, such as in a text channel, or an audio channel. Or maybe over Twitch. But it can also just be a thing that triagers agree to do asynchronously, with a brief routine report to make sure we actually do it.
    • ⚠️ If you or your company uses hyper, this would be an especially useful way to help with maintenance. Have an engineer or two dedicate a few hours each month helping us triage.
  • Set up a bug report checklist. There is a triage guide for bug reports, which is a good thing. But that doesn’t mean everyone (me included!) always remembers all the steps. Checklists are famous in aviation and medicine for their effectiveness in saving lives. They can also help us make sure all issues are treated properly.
  • Update the issue templates to use forms instead. We do have an issue template in place, to try to get people to fill in more information initially. But it’s pretty easy to skip it. It’s possible that using GitHub’s new forms instead of a plain text template could guide people more reliably.

  1. RustSec and the CVE database are different. RustSec was much more helpful, coordinating with us by waiting until the emergency panic was over, and then discussing the best way to describe the advisory.

  2. I updated the advisory on GitHub’s end to only indicate h2, not hyper. I also indicated my disappointment in GitHub’s amplifying of the alarm and making the day much more stressful. Their reply: “We do that sometimes XD”. Cool.

  3. Meanwhile, a reddit thread took off, watching the action, commenting, and mostly criticizing the actions of all involved. Thankfully, I didn’t read comments like “I don’t have any sympathy for the maintainers” until after the fix was completed.

  4. “At least this made you fix it, right?” No. This attitude is toxic. Doing it this way burns out everyone around who could fix it. There is a reporting process for a reason. It helps the most people. Please use it.

  5. Not literally five questions, but an exercise to try to find the root cause, and to note any extra things that could be fixed along the way.

  6. Some people tried to infer bad motives, such as for clout or “another notch on a security researcher’s belt”. I see no reason to assume that with no evidence.