Opened 11 years ago

Last modified 10 years ago

#339 new enhancement

Share SSL session cache between servers?

Reported by: andersk Owned by:
Priority: minor Milestone:
Component: web Keywords:
Cc:

Description

davidben points out that our servers can’t share SSL session caches with each other. Apparently you can do this with SSLSessionTicketKeyFile in Apache 2.4, or perhaps with distcache in 2.2?

Change History (1)

comment:1 Changed 10 years ago by geofft

We have SSLSessionTicketKeyFile now, but it's important that we don't botch perfect forward secrecy (PFS) by storing the key on disk or by never changing it. Twitter has a good overview of their solution that we can copy features from.

One problem with syncing keys is that distributed systems suck and changing keys for every server at once isn't feasible. The other problem is that we don't want to invalidate client sessions every time we roll the ticket key. So we need to keep at least a few ticket keys around.
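A minimal in-memory sketch of the "keep at least a few ticket keys around" idea. `TicketKeyRing`, its methods and the retention count are hypothetical. New tickets are sealed under the newest key, any retained key is accepted, and a ticket only stops working once its key ages out of the ring. The ticket layout loosely follows the format suggested in RFC 5077 (16-byte key name, protected state, HMAC-SHA256 tag), but this toy skips the AES encryption a real ticket also gets, so it illustrates only the lookup and rotation logic.

```python
import hashlib
import hmac
import os
from collections import OrderedDict
from typing import Optional

NAME_LEN = 16  # key-name prefix carried in each ticket (RFC 5077's suggested layout)
MAC_LEN = 32   # HMAC-SHA256 tag length


class TicketKeyRing:
    """Hypothetical helper: keep the `keep` newest keys, issue with the newest, accept any."""

    def __init__(self, keep: int = 3) -> None:
        self.keep = keep
        self.keys: "OrderedDict[bytes, bytes]" = OrderedDict()  # name -> HMAC secret, oldest first

    def rotate(self) -> bytes:
        """Add a fresh random key; drop the oldest once more than `keep` are held."""
        name, secret = os.urandom(NAME_LEN), os.urandom(32)
        self.keys[name] = secret
        while len(self.keys) > self.keep:
            self.keys.popitem(last=False)
        return name

    def issue(self, state: bytes) -> bytes:
        """Seal `state` under the newest key (authenticated only, NOT encrypted here)."""
        name = next(reversed(self.keys))
        body = name + state
        return body + hmac.new(self.keys[name], body, hashlib.sha256).digest()

    def accept(self, ticket: bytes) -> Optional[bytes]:
        """Return the session state if a retained key vouches for the ticket, else None."""
        if len(ticket) < NAME_LEN + MAC_LEN:
            return None
        body, tag = ticket[:-MAC_LEN], ticket[-MAC_LEN:]
        secret = self.keys.get(body[:NAME_LEN])
        if secret is None:
            return None  # key rotated out: the client must do a full handshake
        expected = hmac.new(secret, body, hashlib.sha256).digest()
        return body[NAME_LEN:] if hmac.compare_digest(tag, expected) else None
```

With `keep=3`, a ticket survives two rotations after it was issued and is rejected after the third, at which point the client falls back to a full handshake. The retention count therefore trades session reuse against how long an old key stays around.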

Summarizing some zephyr discussion, here's a possible route:

  • Add an Apache directive, named e.g. SSLSessionTicketKeyDirectory, that points at a directory of possible keys to try when decrypting client-provided tickets. The single key referenced by SSLSessionTicketKeyFile is still used for generating new tickets, and is expected to be one of the files in this directory. Send this patch upstream.
  • Create a directory in ramfs (like tmpfs, but unswappable; see Twitter's discussion of swap), configure Apache on scripts to point SSLSessionTicketKeyDirectory at it, and point SSLSessionTicketKeyFile at a symlink inside this directory.
  • Write a cronjob that runs on the primary only (probably via the existing cron_scripts infrastructure, except that it runs as the Apache user or something similar) to generate new ticket keys periodically and keep the symlink up to date. Since we need to make sure each key reaches every server before it is used, the symlink can't point at the newest key until some time has passed. Maybe just always point it at the second-most-recent key.
  • Write a cronjob that runs on each server other than the primary to rsync this directory from the scripts-primary hostname every few minutes. rsync runs over SSH, which provides PFS, so a recorded transfer can't be decrypted even if the SSH host keys are compromised later.
  • Make a Nagios check that alerts us if either of these cronjobs hasn't run in over one key-rotation interval. Transient failures (if the primary crashes, or there's a network partition for a few minutes) are not a problem, but keys not rotating or new keys not propagating are.
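To make the proposed SSLSessionTicketKeyDirectory concrete, here is a Python model of what it might do at load time; a real patch would be C inside mod_ssl, and the directive does not exist yet. The sketch assumes key files end in `.tkey` and reuses the 48-byte layout that, as far as we can tell from mod_ssl's source, SSLSessionTicketKeyFile already uses: a 16-byte key name, a 16-byte HMAC secret, then a 16-byte AES key. A ticket names its key in its first 16 bytes, so picking a decryption key becomes a dictionary lookup, and a miss just means a full handshake. `TicketKey`, `load_key_directory` and `key_for_ticket` are hypothetical names.

```python
import os
from typing import Dict, NamedTuple, Optional


class TicketKey(NamedTuple):
    name: bytes         # bytes 0-15: identifies the key inside each ticket
    hmac_secret: bytes  # bytes 16-31
    aes_key: bytes      # bytes 32-47


def load_key(path: str) -> TicketKey:
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != 48:
        raise ValueError("%s: expected 48 bytes, got %d" % (path, len(data)))
    return TicketKey(data[:16], data[16:32], data[32:48])


def load_key_directory(key_dir: str) -> Dict[bytes, TicketKey]:
    """Every *.tkey file in key_dir, indexed by key name: the decryption candidates."""
    keys: Dict[bytes, TicketKey] = {}
    for f in sorted(os.listdir(key_dir)):
        if f.endswith(".tkey"):
            key = load_key(os.path.join(key_dir, f))
            keys[key.name] = key  # a symlink to a key in the same dir just maps to itself again
    return keys


def key_for_ticket(keys: Dict[bytes, TicketKey], ticket: bytes) -> Optional[TicketKey]:
    """Pick the key named by the ticket's first 16 bytes, if it is still in the directory."""
    return keys.get(ticket[:16])
```

New tickets would still be encrypted only with the single key from SSLSessionTicketKeyFile; the directory only widens what is accepted.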
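A sketch of the primary's rotation cronjob, assuming a ramfs directory passed in as `key_dir`, timestamp-named `*.tkey` files, a symlink called `current.tkey`, and a retention count of four; none of these names or numbers come from the ticket. Each run writes the 48 random bytes mod_ssl expects in a ticket key file, repoints the symlink at the second-most-recent key through an atomic rename, and prunes the oldest keys.

```python
import os
import time

LINK_NAME = "current.tkey"  # hypothetical symlink that SSLSessionTicketKeyFile names
KEEP = 4                    # assumed number of keys retained for decryption


def key_files(key_dir):
    """Real key files (not the symlink), oldest first; zero-padded names sort by time."""
    return sorted(f for f in os.listdir(key_dir)
                  if f.endswith(".tkey") and f != LINK_NAME)


def rotate(key_dir, now=None):
    """One cron run: add a key, repoint the symlink, prune old keys. Returns the link target."""
    now = int(time.time()) if now is None else now
    # 1. 48 random bytes, the key-file size mod_ssl expects; O_EXCL never overwrites a key.
    path = os.path.join(key_dir, "%020d.tkey" % now)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(48))

    files = key_files(key_dir)
    # 2. Use the second-most-recent key for new tickets, giving the newest one a
    #    full interval to reach the other servers before anyone encrypts with it.
    target = files[-2] if len(files) >= 2 else files[-1]
    tmp = os.path.join(key_dir, LINK_NAME + ".tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)                            # relative link within key_dir
    os.replace(tmp, os.path.join(key_dir, LINK_NAME))  # atomic swap on POSIX

    # 3. Drop keys beyond the retention window (never the link target, since KEEP >= 2).
    for old in files[:-KEEP]:
        os.remove(os.path.join(key_dir, old))
    return target
```

Apache reads SSLSessionTicketKeyFile when it loads its configuration, so a real job would presumably also trigger a graceful reload after repointing the link (and the replicas would need the same after each rsync); that step is left out here.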
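One way the Nagios check could work, sketched as a plugin that looks at the age of the newest real key file and uses the standard Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). Because `rsync -a` preserves modification times, the newest key's age reflects when the primary generated it, so the same check would catch a stalled rotation job on the primary and a stalled sync job on a replica. The script name, its arguments, and the 10-minute `slack` allowance for the rsync delay are assumptions.

```python
import os
import sys
import time

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_freshness(key_dir, interval, slack=600, now=None):
    """Return (exit code, status line) from the age of the newest real *.tkey file."""
    now = time.time() if now is None else now
    try:
        paths = [os.path.join(key_dir, f) for f in os.listdir(key_dir) if f.endswith(".tkey")]
        mtimes = [os.path.getmtime(p) for p in paths if not os.path.islink(p)]
    except OSError as e:
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (key_dir, e)
    if not mtimes:
        return CRITICAL, "CRITICAL - no ticket keys in %s" % key_dir
    age = now - max(mtimes)
    if age > interval + slack:
        return CRITICAL, "CRITICAL - newest ticket key is %d s old" % age
    return OK, "OK - newest ticket key is %d s old" % age


if __name__ == "__main__":
    # e.g. check_ticket_keys.py /path/to/keys 3600   (hypothetical name and arguments)
    code, message = check_freshness(sys.argv[1], int(sys.argv[2]))
    print(message)
    sys.exit(code)
```

The `slack` term is what keeps transient failures quiet: a replica that misses one or two rsync runs stays OK, while a key that has not changed for more than a full interval plus the allowance raises an alert.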