Behind the Curtain: Scalability

As Paul noted in his post, we’ve been working on building up our backend to be more scalable, especially our Databases. As it currently stands, we have a single database which is loaded into memory on startup. This allows all of our queries to be super fast, event when traversing hundreds of thousands of rows.

There are a few downsides to our current deployment. First, since we’re preloading all the data into memory, the spool up time for this database is huge. Since the cost is so high, taking it down for maintenance is something we do very infrequently. While this is not a bad thing in and of itself, it does mean that we cannot make large changes to our database structure more than a couple of times a year.

Second, since all the data is on a single server, the only way to scale up is to throw more resources at the server. This is only feasible as long as our growth is at or below the pace which technological power increases. As we grow faster and faster, this is becoming less and less feasible, necessitating the need to more toward a more distributed datastore system.

Third, serving all of our data from a single datacenter means that connectivity issues in that area would adversely affect service nationwide.

Sharding solves all 3 of these issues.

First, since each shard contains just a fraction of the data, it takes much less time to warm up each server. It also allows us affect only a small set of our users if we need to take a server down for maintenance.

Second, sharding allows us to gradually bring up more servers as needed to serve the growing load. This is especially useful in allowing high volume areas to have their own dedicated cluster of shards.

Third, by splitting the data up across multiple shards, we can begin widely distributing our servers. We have long discussed and wanted to implement a way for us to host the servers away from the areas they service. For example, New York would be serviced mainly by servers hosted outside of the Northeast. Not only would this resolve the single point of failure issue that Paul outlined, it would also decrease the likelihood that our service would go down during a natural disaster, since the servers are outside of the affected area.

Of course, such a huge change will take time, and is not without its risks and potential downsides. We are making this change gradually, starting with the new PARS board feature that is coming soon. Check back often as we explore the wonder and mysteries of sharding.

One thought on “Behind the Curtain: Scalability

  1. It sounds like you are using physical, in-house servers or a relatively local data center where you are dealing with physical machines. Have you evaluated more cloud-based services? Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *