Long-time readers of this blog will assert that I have no problem presenting an unpopular opinion and/or sticking my foot in my mouth. Sometimes both at once! (“But wait… there’s more!”) So when N. Shah asks me how he should split his database (a tags table, an items table, and a users table), I say: The answer is in the question.
You have only one database
Let’s drop the pretense, folks. Let’s come back to the real world. This is the web 2.0 world. Data is growing at a seriously exponential rate. And desperate times call for desperate measures.
Joins are nice. They’re pretty. They’re convenient. They keep us from having to think very much. But they do NOT promote using commodity hardware for your databases. They just don’t. No, really, an in-database join chains you to an in-database solution. You *could* keep upgrading and upgrading… faster processors… larger disks… faster RAID… And then you move to buying SANs, and you’re talking about some serious cash for that cache. Or… you think about things differently. You put in a little work up front. And you break the mold. Because one database ties you to one server. And that, my friends, is the problem.
So, N, here’s my answer: Split your database once, and then split your databases once more.
DB:/users
DB:/items
DB:/tags
becomes
DBTags:/tags
DBUsers:/users
DBItems:/items
And then
DBUsers:/users
Pretty simple… users tend to be a small table, and keeping them in one place makes a lot of sense here. HOWEVER, depending on your architecture and usage you could easily split the users table the way we split the tags (not the items) below.
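If it helps to see that first split in code, here’s a minimal sketch (the map and helper are mine, purely illustrative): a table-to-database map the application consults instead of assuming everything lives on one server.

```python
# Minimal sketch of the first split: each table gets its own database.
# The names and helper here are illustrative, not a prescribed API.

TABLE_TO_DB = {
    "users": "DBUsers",
    "items": "DBItems",
    "tags":  "DBTags",
}

def db_for_table(table):
    """Return the database that owns a given table."""
    return TABLE_TO_DB[table]
```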
DBItems:/
- items_id_ending_in_0
- items_id_ending_in_1
- items_id_ending_in_2
- items_id_ending_in_3
- items_id_ending_in_4
- items_id_ending_in_5
- items_id_ending_in_6
- items_id_ending_in_7
- items_id_ending_in_8
- items_id_ending_in_9
Again, pretty simple. You have your run-of-the-mill integer item IDs; split them by the last digit of the item ID, and you reduce the footprint of any one table to 1/10th of the whole dataset size.
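In code, picking the right shard is a single modulo (a sketch; the table-name helper is hypothetical):

```python
def item_table(item_id):
    """Route an integer item ID to its table by the last digit."""
    return "items_id_ending_in_%d" % (item_id % 10)

# e.g. item 4217 -> "items_id_ending_in_7", so any one table holds
# at most roughly 1/10th of all items.
```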
DBTags:/
- tags_crc_ending_in_0
- tags_crc_ending_in_1
- tags_crc_ending_in_2
- tags_crc_ending_in_3
- tags_crc_ending_in_4
- tags_crc_ending_in_5
- tags_crc_ending_in_6
- tags_crc_ending_in_7
- tags_crc_ending_in_8
- tags_crc_ending_in_9
Now here is a little bit of voodoo. You have these tags, and tags are words. And I like numbers. Numbers make life easy. So by creating a CRC32 hash of the word and storing it with the tag {id|tag|crc32}, you can quickly reverse the tag to an ID and then go find the items associated with that tag ID, while still retaining the ability to split the DB by powers of 10.
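Here’s roughly what that looks like in Python (a sketch; Python’s zlib.crc32 stands in for whatever CRC32 your stack provides):

```python
import zlib

def tag_crc(tag):
    """CRC32 of the tag word, stored alongside it as {id|tag|crc32}."""
    return zlib.crc32(tag.encode("utf-8")) & 0xFFFFFFFF

def tag_table(tag):
    """Route a tag to its table by the last decimal digit of its CRC."""
    return "tags_crc_ending_in_%d" % (tag_crc(tag) % 10)

# The same word always hashes to the same CRC, so you can find a tag's
# row (and its integer ID) without searching all ten tables.
```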
You can still use your join tables items_to_users and tags_to_items; these tables, consisting of ints, take up almost _NO_ space whatsoever, and so can go wherever is convenient (if you query items for users more than users for items, then put the join table in the users DB), but you can’t actually perform in-server full joins any longer. Heck, you can even keep two copies of the join data: items_to_tags in the items DBs, and tags_to_items in the tags DBs.
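To make the “no in-server joins” point concrete, here’s a sketch of the tag-to-items lookup done in application code. Note that query() is a hypothetical helper that runs parameterized SQL against a named database; it and the column names are assumptions, not a real API.

```python
# Sketch of a cross-database "join" done by the application.
# query(db_name, sql, params) is a hypothetical DB-API-style helper.

def items_for_tag(tag):
    # 1. Reverse the tag word to its ID in the right tags shard.
    row = query("DBTags",
                "SELECT id FROM %s WHERE crc = %%s AND tag = %%s" % tag_table(tag),
                (tag_crc(tag), tag))
    tag_id = row["id"]

    # 2. Pull the matching item IDs from the tags_to_items join table.
    item_ids = query("DBTags",
                     "SELECT item_id FROM tags_to_items WHERE tag_id = %s",
                     (tag_id,))

    # 3. Fetch each item from whichever items shard its ID lands in.
    return [query("DBItems",
                  "SELECT * FROM %s WHERE id = %%s" % item_table(i),
                  (i,))
            for i in item_ids]
```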
So, like many things in life, going cheaper meant going a bit harder. But what did we gain? Well, let’s assume 10 EC2 instances…
EC2a
- users (w)
- items 0-1 (w)
- tags 0-1 (w)
EC2b
- items 2-3 (w)
- tags 2-3 (w)
EC2c
- items 4-5 (w)
- tags 4-5 (w)
EC2d
- items 6-7 (w)
- tags 6-7 (w)
EC2e
- items 8-9 (w)
- tags 8-9 (w)
EC2f
- items 0-1 (r)
- tags 0-1 (r)
EC2g
- users (r)
- items 2-3 (r)
- tags 2-3 (r)
EC2h
- items 4-5 (r)
- tags 4-5 (r)
EC2i
- items 6-7 (r)
- tags 6-7 (r)
EC2j
- items 8-9 (r)
- tags 8-9 (r)
So that’s a total of about… oh… 1.6 terabytes of space, 18 GB of RAM, 17 GHz of processor speed, and an inherently load-balanced set of database instances. And when you need to grow? Split by the last 2 digits (16 TB), 3 digits (160 TB), 4 digits (1,600 TB)…
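Stitching the layout above into a routing table might look like this (hostnames are placeholders for the instances above, and the users table is omitted for brevity):

```python
# Sketch of routing the item/tag shards across the instances above.
# Writes go to the master that owns a shard; reads go to its replica.

WRITE_HOSTS = {  # shard digit -> master instance
    0: "ec2a", 1: "ec2a",
    2: "ec2b", 3: "ec2b",
    4: "ec2c", 5: "ec2c",
    6: "ec2d", 7: "ec2d",
    8: "ec2e", 9: "ec2e",
}
READ_HOSTS = {   # shard digit -> read replica
    0: "ec2f", 1: "ec2f",
    2: "ec2g", 3: "ec2g",
    4: "ec2h", 5: "ec2h",
    6: "ec2i", 7: "ec2i",
    8: "ec2j", 9: "ec2j",
}

def host_for(shard_digit, write=False):
    """Pick the instance serving this shard for a read or a write."""
    return (WRITE_HOSTS if write else READ_HOSTS)[shard_digit]

# Growing later just means a wider map: key on the last 2 digits
# (00-99) instead of 1, and spread 100 shards over more instances.
```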
So, now that you’ve read to the bottom: it’s 1:00 AM, way past my bedtime. Remember that when designing a database you, above all, need to listen to your data. Nobody will come up with a solution that perfectly fits your problem (that’s why it’s called “your problem”), but techniques can be applied, and outlooks can be leveraged.
Disclaimer: some or all of this might be wrong, there may be better ways, don’t blame me. I’m sleep-typing 😉
Nice article.
I've been thinking along these lines for some time. Getting databases to work with EC2/S3 is too much work. A different approach has to be taken.
The thing I'm concerned about is deleting/updating entries in each bucket and the performance of retrieving data. Databases have the advantage that with decent SQL you can limit the quantity of data retrieved. Having buckets full of 'stuff' means that you will probably be returning more information than you need. With some thought maybe you can reduce this with decent hashing?
As you're probably aware, S3 doesn't allow modifications to objects in buckets, so if I understand you correctly, the whole bucket would have to be initially retrieved (if it wasn't in cache) and serialized back to S3 once it had been updated.
You would also have to do a linear search through the bucket to get to the exact place where the information you want to modify/delete is.
Have you given any thought of the best way to store this information?
Thanks again for these Amazon writeups… it's really changing the way I think about databases and scalability!