10k Concurrent Connections
Aug 5, 2016 · 4 minute read · Go, Performance
After my recent appearance on the Go Time podcast I thought it would be fun to write a post on a few performance bits in Rend, the software project I hack on at work.
This topic comes up frequently enough that I felt I should write about it. The c10k problem has many solutions already. This is just my take.
Fortunately for people in 2016, handling 10 thousand concurrent connections is not really all that difficult anymore. Servers have long been able to handle that many connections, so why is it still a topic? Partly because 10 thousand is a nice round number to aim for, but really I think it's because most servers never have to handle that many in the first place. You likely won't see a brand new Rails or Django app or a fresh WordPress install handle that kind of load, and that's perfectly fine. C10k is really about I/O-bound apps managing the traffic between point A and point B.
Now, with some of the philosophy out of the way, let’s look at how we can solve the c10k problem fairly easily using Go.
One Goroutine per Connection
A common pattern for a TCP-based server in Go is to have a connection accept loop spin off a separate goroutine per accepted connection. This works well for a variety of reasons:
Simplifies Code
As the author, you have less to worry about in any single function: it applies to the connection you're currently servicing and nothing else. Of course, there are some small exceptions.
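To make the pattern concrete, here's a minimal sketch of the accept loop handing each connection off to its own goroutine. The handler is a stand-in (a trivial line echo on a made-up port), not Rend's actual code:

```go
package main

import (
	"bufio"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Println("accept:", err)
			continue
		}
		// One goroutine per connection; the handler only ever sees its own conn.
		go handleConn(conn)
	}
}

func handleConn(conn net.Conn) {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return // connection closed or errored; just exit the goroutine
		}
		// Straightline request handling; blocking I/O here parks only this goroutine.
		if _, err := conn.Write([]byte(line)); err != nil {
			return
		}
	}
}
```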
I/O Events and Continuation are Transparent
When a goroutine performs a blocking action, it is placed into a waiting state while other goroutines that can run are run. Once the conditions are met for the goroutine to run again, it is placed back into its work queue. This is an essential part of the green thread / multiplexing model that Go uses. It also means you get some of the advantages of non-blocking I/O without descending into callback or asynchronous `.and_then` hell.
Dave Cheney wrote an article that references the c10k problem and explains why I/O polling is not an issue: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
Downside: Memory
Memory increases linearly with the number of connections. At 10k connections the overhead of buffers and the like should be in the low hundreds of megabytes. There is a nonzero cost to having buffers and I/O structs per connection, but in general this does not matter. Even for Rend, which runs on boxes that are, essentially, memory as a service, the overhead is not enough to matter.
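As a rough back-of-the-envelope (the exact figures depend on your buffer sizes, so treat these as assumptions): with something like a 4 KB read buffer, a 4 KB write buffer, and a few KB of goroutine stack per connection, 10k connections works out to roughly 10,000 × ~16 KB ≈ 160 MB, consistent with the low hundreds of megabytes mentioned above.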
Resources are per-connection
Expanding a bit on the point above: most of the code to handle a request can be straightline code that doesn't worry about any other connections. Even a panic only affects the one incoming connection and its related outgoing connections. In Rend, even the connections are isolated: one incoming external connection is tied to a specific set of outgoing connections to the backends. If there is a problem anywhere in Rend that panics, the grouped connections are closed and the rest of the connections live on.
To make things concrete, this is how Rend is structured (a rough type-level sketch follows the list):

- One main listener `accept` loop - OK because connections are long-lived and reconnects are rare
- Each new connection gets its own `server` - basically a REPL
- Each `server` owns an `orca`, the request orchestrator
- Each `orca` owns one or more `handler`s that are able to communicate to the backend
- Each `handler` owns the connection to its backend
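Here's that ownership chain sketched out, including the panic isolation described above. The type names and details are illustrative guesses, not Rend's actual code:

```go
package sketch

import (
	"log"
	"net"
)

// handler owns its own connection to a backend.
type handler struct {
	backend net.Conn
}

// orca is the request orchestrator; it fans work out to its handlers.
type orca struct {
	handlers []*handler
}

// server owns one external client connection (the "REPL") and one orca.
type server struct {
	client net.Conn
	orca   *orca
}

func (s *server) loop() {
	// If anything in this connection group panics, close the whole group
	// and let every other connection live on.
	defer func() {
		if r := recover(); r != nil {
			log.Println("connection group died:", r)
		}
		s.client.Close()
		for _, h := range s.orca.handlers {
			h.backend.Close()
		}
	}()
	// ... read requests from s.client, orchestrate via s.orca,
	// write responses back ...
}
```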
Prefer Atomic Operations for Simple Cases
a.k.a. minimize (or outright eliminate) critical sections guarded by locks
Now this comes with a HUGE caveat: `sync/atomic` can be hard to get right, and weird bugs can result if it is used improperly.

https://groups.google.com/d/msg/golang-nuts/AoO3aivfA_E/zFjhu8XvngMJ
This could be considered sacrilege, given that some commentary on golang-nuts points to a desire for `sync/atomic` to be used less. In this case I'm referring to the metrics package in Rend, where everything is either atomic or has a very short critical section:

- counters and gauges require no lock at all
- histograms have grouped data, so they require a `sync.RWMutex`, BUT that lock is only taken after the response has been sent

The metrics package is reactive only; the real aggregation work happens only when the metrics endpoint is hit.
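As a hedged sketch of what that looks like (illustrative types, not Rend's actual metrics API): counters and gauges use `sync/atomic` on the hot path, while the histogram's lock is taken off the critical path, after the response is gone.

```go
package metrics

import (
	"sync"
	"sync/atomic"
)

// Counter is incremented on the hot path with no lock.
type Counter struct {
	n uint64
}

func (c *Counter) Incr()         { atomic.AddUint64(&c.n, 1) }
func (c *Counter) Value() uint64 { return atomic.LoadUint64(&c.n) }

// Gauge is set on the hot path with no lock.
type Gauge struct {
	v uint64
}

func (g *Gauge) Set(v uint64)  { atomic.StoreUint64(&g.v, v) }
func (g *Gauge) Value() uint64 { return atomic.LoadUint64(&g.v) }

// Histogram has grouped data, so it needs a lock, but Observe is only
// called after the response has already been sent.
type Histogram struct {
	mu      sync.RWMutex
	buckets [64]uint64
}

// Observe records a sample; bucket is assumed to be in range.
func (h *Histogram) Observe(bucket int) {
	h.mu.Lock()
	h.buckets[bucket]++
	h.mu.Unlock()
}

// Snapshot copies the buckets out; only the metrics endpoint calls this.
func (h *Histogram) Snapshot() [64]uint64 {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.buckets
}
```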
Pools are OK
If you know you will need billions (or trillions) of a reusable item, then you might want to consider using a `sync.Pool` to pool objects. This may or may not be worth it to you depending on your performance requirements. Another detail that took me a while to understand is that every instance of `sync.Pool` is cleared at every garbage collection, meaning that your pool is only good between GCs (look for `func poolCleanup()` in `sync/pool.go`).
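Here is a minimal `sync.Pool` sketch for reusing scratch buffers between GCs (the names and buffer type are mine, not Rend's):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable byte buffers. Anything sitting in the pool
// is dropped at the next garbage collection.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func handleRequest() {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // pooled buffers may still hold old data
	defer bufPool.Put(buf) // make it available for reuse until the next GC

	buf.WriteString("response bytes would be staged here")
	fmt.Println(buf.Len(), "bytes staged")
}

func main() {
	handleRequest()
}
```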