Archive for the ‘MySQL’ Category:
so-you-wanna-see-an-image
We’ve been asked how we manage serving files from Amazons very cool S3 service at WordPress.com… This is how. (covering a requested image already stored on S3, not the upload -> s3 process)
A request comes into pound for a file. Pound hashes the hostname (via a custom patch which we have not, but may, release) , to determine which of several backend servers the request should hit. Pound forwards the request to that server. This, of course, means that a given blog always serves from the same backend server. The only exception to the afore-mentioned rule is if that server is, for some reason, unavailable in which case it picks another server to serve that hostname from temporarily.
The request then comes into varnishd on the backend servers. The varnishd daemon checks its 300Gb worth of files cache and (for the sake of this example) finds nothing (hey, new images are uploaded all the time!) Varnishd then checks with the web server (running on the same machine, just bound to a different IP/Port#) and that request is handled by a custom script.
So, a http daemon on the same backend server runs the file request. The custom script checks the DB to gather information on the file (specifically which DC’s it is in, size, mod time, and whether its deleted or not) all this info is saved in memcached for 5 minutes. The script increments and checks the “hawtness” (term courtesy of Barry) of the file in memcached (if the file has been accessed over a certain # of times it is then deemed “hawt”, and a special header is sent with the response telling varnishd to put the file into its cache. When that happens the request would be served directly by varnishd in the previous paragraph and never hit the httpd or this script again (or at least not until the cache entry expires.)) At this point, assuming the file should exist (deleted = 0 in the files db) we fetch the file from a backend source.
Which backend source depends on where it is available. The order of preference is as follows: Always fetch from Amazon S3 if the file lives there (no matter what, the following preferences only ever occur if, for some reason, s3 = 0 in the files db), and if that fails fetch from the one files server we still have (which has larger slower disks, and is used for archiving purposes and fault tolerance only)
After fetching the file from the back end… the custom script hands the data and programatically generated headers to the http daemon, which hands the data to varnishd, varnishd hands the data to pound, pound hands the data to the requesting client, and the image appears in the web browser.
And there was much rejoicing (yay.)
For the visual people among us who like visuals and stuff… (I like visuals…) here goes…

I have to say
this is amazing: http://jan.kneschke.de/2007/10/7/wormhole-index-reads and I cant wait to try it somewhere!
PHP CLI Status Indicator
Most times when people write command line scripts they just let the output flow down the screen as a status indicator, or just figure “it’s done when it’s done” But sometimes it would be nice to have a simple clean status indicator, allowing you to monitor progress and gauge time-to-completion. This is actually very easy to accomplish. Simply use \r instead of \r\n in your output. Obviously the example below is very simplified, and this can be applied in a much more sophisticated fashion. But it works.
$row_count = get_total_rows_for_processing();
$limit=10000;
echo “\r\n[ 0%]“;
for ( $i=0; $i < = $row_count; $i = $i + $limit ) {
$query=”SELECT * FROM table LIMIT {$limit} OFFSET {$i}”;
// do whatever
$pct = round((($i+$offset)/$row_count)*100);
if ( $pct < 10 ) {
echo “\r[ $pct%]“;
} else {
if ( $pct < 100 ) {
echo “\r[ $pct%]“;
} else {
echo “\r[$pct%]“;
}
}
}
echo “\r[100%]\r\n”;
Any… good… php devs out there looking for some side work?
I know a group of guys looking to do some cool stuff who could use a few good contractors. Drop me an e-mail with maybe a sample or something cool you did in php and I’ll pass it on.
apokalyptik apokalyptik com — Subject: “PHP Consulting” (I’ll likely completely overlook your mail if you use some subject not starting with that string)
Cheers
DK
tags, items, users - loose the joins - gain the freedom.
Long time readers of this blog will assert that I have no problem presenting an unpopular opinion, and/or sticking my foot in my mouth. Some times both at once! (”But wait… there’s more!”) So when N. Shah asks me how he should split his database (a tags table, items table, and users table) I say: The answer is in the question.
You have only one database
Lets drop the pretense folks. Lets come back to the real world. This is the web 2.0 world. Data is growing at a seriously exponential. And desperate times call for desperate measures.
Joins are nice. They’re pretty. They’re convenient. They keep us from having to think very much. But they do NOT promote using commodity hardware for your databases. They just don’t. No, really, an in-database join chains you to an in-database solution. You *could* keep upgrading and upgrading… faster processors… larger disks… faster raid… And then you move to buying SAN’s and you’re talking about some serious cash for that cache. Or… You think about things differently. You put in a little work up front. And you break the mold. Because one database ties you to one server. And that, my friends, is the problem.
So, N, here’s my answer: Split your database once, and then your databases once.
DB:/users
DB:/items
DB:/tags
becomes
DBTags:/tags
DBUsers:/users
DBItems:/items
And then
DBUsers:/users
Pretty simple… users tend to be a small table, and keeping them in one place makes a lot of sense here. HOWEVER. depending on your architecture and uses you could easily split the users as we do the tags (not items) below.
DBItems:/
- items_id_ending_in_0
- items_id_ending_in_1
- items_id_ending_in_2
- items_id_ending_in_3
- items_id_ending_in_4
- items_id_ending_in_5
- items_id_ending_in_6
- items_id_ending_in_7
- items_id_ending_in_8
- items_id_ending_in_9
again, pretty simple. you have your run of the mill integer item id’s split them by the last digit of your item id, and you can reduce the footprint of any one table to 1/10th of the whole dataset size
DBTags:/
- tags_crc_ending_in_0
- tags_crc_ending_in_1
- tags_crc_ending_in_2
- tags_crc_ending_in_3
- tags_crc_ending_in_4
- tags_crc_ending_in_5
- tags_crc_ending_in_6
- tags_crc_ending_in_7
- tags_crc_ending_in_8
- tags_crc_ending_in_9
Now here is a little bit of voodoo. You have these tags, and tags are words. And I like numbers. Numbers make life easy. So by creating a CRC32 hash of the word, and storing it with the tag {id|tag|crc332} you can quickly reverse the tag to an id, and then go find items with that tag id associated, while still retaining the ability to split the db by powers of 10.
You can still use your join tables items_to_users, and tags_t_items, these tables consisting of ints take up almost _NO_ space whatsoever, and so can go where convenient (if you query items for users more than users for items, then put the join table in the users db) but you cant actually preform in-server full joins any longer. Heck you can even keep two copies of the join data, items_to_tags in the items dbs, and tags_to_items in the items dbs.
So, like many things in life, going cheaper meant going a bit harder. But what did we gain? Well lets assume 10 ec2 instances…
Ec2a
- users (w)
- items 0-1 (w)
- tags 0-1 (w)
Ec2b
- items 2-3 (w)
- tags 2-3 (w)
Ec2c
- items 4-5 (w)
- tags 4-5 (w)
Ec2d
- items 6-7 (w)
- tags 6-7 (w)
Ec2e
- items 8-9 (w)
- tags 8-9 (w)
Ec2f
- items 0-1 (r)
- tags 0-1 (r)
Ec2g
- users (r)
- items 2-3 (r)
- tags 2-3 (r)
Ec2h
- items 4-5 (r)
- tags 4-5 (r)
Ec2i
- items 6-7 (r)
- tags 6-7 (r)
Ec2j
- items 8-9 (r)
- tags 8-9 (r)
So thats a total of about… oh… 1.6 terrabytes of space… 18gb of RAM, 17Ghz of processor speed, and an inherently load balanced set of database instances. And when you need to grow? split by the last 2 (16TB) digits, 3(160Tb) digits, 4(1,600TB) digits…
So, now that you’ve read to the bottom. It’s 1:00am, way past my bed time. Remember that when designing a database you — above all — need to listen to your data. Nobody will come up with a solution that perfectly fits your problem (thats why its called “your problem”) but techniques can be applied, and outlooks can be leveraged.
Disclaimer: some or all of this might be wrong, there may be better ways, dont blame me. I’m sleep-typing ![]()
Openfound (cont)
If there’s one thing that the OpenFount guys have shown me is that they’re serious about the Infinidisk product. Mr. Donahue gave me a quick call this evening (seems my e-mail server and his e-mail server aren’t talking properly, so while I get his communications) he has not received mine (probably explaining the lack of response to my pre-sales inquiry)) to chat about his product. The particular bug that I noticed, he mentioned, was fixed a while ago in a later release than I’d tried. In my defense the page looks precisely like it did when I first got the product, and the release tar file has no version numbers on it… yet… so I did check for updates. I found out some good info, though. They’re working on putting up a trac page for some real bidirectional community participation soon. They’ll also be putting version numbers in their archives soon. Both of those things will help, I think, improve their visibility to people like me (who have very very little time.)
I’ll be re-testing the infinidisk product again, later, when next I customize an AMI.
Kudos to the openfount guys
I’m really very impressed with the speed at which the Openfount guys responded to my last post. I definitely give Kudos to Bill for being on top of things! I’m running out the door so I’ll keep this short and sweet.
He’s right, I did generalize databases into InnoDB, but thats because it’s what I use. So my apologies for that.
I definitely had no intention of badmouthing the Openfount guys (if thats what it sounded like I did, I apologize) Just reporting what I saw, and my impressions.
The Bill - I would have either used
- apokalyptik
- at
- apokalyptik
- dot
- com
or
- consult
- at
- apokalyptik
- dot
- com
or
- demitrious
- dot
- kelly
- at
- gmail
- dot
- com
Infinidisk Update
I mentioned a while back that I was going to be playing with the S3 Infinidisk Product. What I found in my testing was that this product is not prime time ready. There was a nasty bug which caused data to be lost if the mv command was used. The scripts themselves were unintuitive. They required fancy-pants nohupping or screening to use long term. Oh, and a database definitely will not work on top of this FS. It seems obvious in retrospect, but I wanted to be sure. InnoDB wont even build its initial files much less operated on the FS. To top it all off, My pre-sales support question was never even so much as acknowledged.
No, I think I’ll be leaving this product alone for now and sticking with clever uses of s3sync and s3cmd, thanks.
Google & Microsoft Working Towards the Perfect Datacenter
We all new that this would happen, google and microsoft going vying to build the biggest field of silicon trees. But what does this mean, and does it tie in with amazons latest service?! I think that undoubtedly it does.
There’s talk about a last man standing game when it comes to internet bandwidth. And I can imagine a time when we might see the internet behaving like the freeways in L.A. at rush hour. But this is more, I think.
I’ve mentioned before that the whole goal here is to “be the internet”. I don’t think that goal has changed recently. Google has sown the world two things: First that there’s a vast amount of power to be wielded by being “the internet” to the average Tom Dick and Harry, and Second that the title is *always* up for grabs. A while back Yahoo! was the internet, before that AOL was the internet, before that newsgroups were the internet. Need I say more? And each of those companies wielded an extreme sway over the comings and goings of the internet.
But now the internet means a lot more than it used to. Now the internet is sales, it’s revenue, it’s marketing, people are watching, people are reading, people are listening, and– most importantly — people are being influenced by this “new fangled internet thing”, “oh, you mean Google?”
So there’s now a lot more riding on who gets to “be the internet” these days. The one thing that ginormous corporate entities can’t seem to get a hold of is the fickle way in which the internet is backwards from real world businesses. In the real world it’s all too common for a newcomer to storm into a market, take hold of it with genuinely better product, and then let all that slip away into mediocrity and poor quality. And the kicker is that people will *still* pay for it if it’s crap… as long as its tangible. But the internet is fickle. It’s sort of tangible but more or less ethereal.
I think for the first time people outside the scientific communities are getting wind of a crazy idea: insubstantial value. That is something that didn’t have value a minute ago, wont have value a minute from now, but at the moment is extremely valuable. Which, inherently, means that this thing has the constant need to justify itself. I’m no economics guy, and I’m certainly not in touch with the “average Joe” (who would almost certainly not follow me through more than two or three blog posts) but I think the difference here is that there’s no physical reality to intimidate us.
We don’t have to grow particularly attached to anything on the internet because it’s not “in our lives” we’re in its life. It doesn’t take up space in our house, we take up space in its house. For once in our lives we find that we aren’t the ones who are at the mercy of demand, but are - in fact - in demand. It’s a feeling of empowerment that is slowly but surely changing the world. Mark my words children n classrooms 100 years from now will be studying the historical impact of all of the events which are happening before our eyes at this very moment, in this place that’s not a place.
I think I’ve become side tracked. Oh yes, consumers being in demand, corporations unable to handle the discrepancies of the actions of the same people online and off line, and… Ahh yes… The underdog.
Why, do you think, it is that in this virtual world so often it’s the couple of guys who met in college coding outside a cafe, or this dude in his moms basement, or a couple of people who tried to do one thing but failed fantastically into doing something else completely right? Because people of talent are, all of a sudden, relinquished of the necessity to offer anything physical… People with a talent for the ethereal, all of a sudden, have a place in which the ethereal acquires value.
And, as in any underdog story, these small (sometimes rare) meteoric rises to the top will carry others with them. And these are the kind of people who remember the hands that helped them up.
So, sure, bandwidth and all that. But the people who make it easiest for those suited to developing the intangible will have everything to gain in the long run. Amazon sees this, and is doing an amazing job with it. Their recent successes with S3, SQS, and EC2, are testimony to their understanding of this new ecosystem. But they ought not to think that Google and Microsoft haven’t noticed this and where the young blood is heading.
Make no mistake, amazon has made extremely agile, grassroots, moves to “be the internet” from the bottom up… But there will soon be a clash of services as G and M do the same from the “top down” and “sideways in” respectively.
I will say this: The first company to crack the database problem will have a distinct advantage in the struggles to come.
Disclaimer: Everything I just said is more than likely to be complete nonsense as I just kind of rambled it out “stream of consciousness” style .
The (theoretical) web services database
I’ve been kind of floating around this topic for a while… Well databases in general… And I see a lot of people who have rather high standards (which is not a bad thing.) I imagine the complication of offering a service like this comes from the fact that database people have very stringent standards.
Things like ACID transactions, Foreign keys, Table/Row/Column/Field read/write locking, always come up in these types of conversations. I suppose that this is so because it’s been the standard for so long… It’s just how people *think* about databases… Which means that its what databases should be, right? Right?
Well not long ago the people at Amazon rethought process communication, and rethought storage, and then rethought servers. Perhaps its about time they rethought the database as well. I have a hunch (as others have noted here before) that they already are!
I really think that a lot, and I mean a LOT, could be done with a very simple model.
- Tables are their own island (no foreign keys)
- simple auto Incrementing PK’s
- every column indexed
- only simple operators supported ( =, >, < , !=, is null, is not null )
Heresy! Ack! Foo! Bar! NO! THATS NOT A REAL DATABASE. Well, no, not as you mean by “real database” but it certainly is a database. And I expect it would be good enough for 85% of peoples wants, needs, and desires.
We’ve learned that delays in storage give us permanence. We’ve learned that the pipeline is a good (and global) thing, and we’ve learned that impermanence gives us expandability. Necessity being the mother of invention I expect that something like this will be out soon, and I expect that people will learn to be perfectly happy with it. It’s all about flexibility and agility here people!
It’ll come, people will complain, it’ll work, and as time goes on, I think it’ll get better and better.
Subscribe to the comments for this post