Danny de Wit wrote in with a request for collaboration on how to best use EC2 and S3 for his new Ruby on Rails CRM application. And I'm happy to oblige.
At this point I don't know much about what he's doing, so I hope to start rough, open a dialogue with him, and work through the exercise over a bit of time.
The story so far
We have a Rails front end, a database backend, EC2, and S3.
Well… that was a quick rundown…
Summary of what we will need to accomplish the task on S3 and EC2
First off, we will need to be able to think outside the traditional boxes, but I think Danny is open to that. Second, we will need to deal with the database itself. Third, we have to deal with the issue of dynamic IP addresses. Fourth, we have to deal with some interesting administrative glue (monitoring, alerting, responding). Fifth, we have to deal with backups. And finally, we have to deal with code distribution.
Now, where do we start?
First we should start with the database. I won't lie to you: most of the challenge in using these services will center on the database. We need to examine how it's laid out, how it's accessed, and what our expectations are when it comes to size. Specifically, we need to look for two main things: A) bottlenecks, and B) data partitioning strategies.
Bottlenecks. We have to examine where we may or may not have trouble as far as data replication goes, because if we are making hourly backups and have to bring up another server at the half-hour mark, we're going to need a strategy in place to bring the data store up to date. The layout of the database can make this particularly prohibitive, or it can make it very easy. And besides, having a bunch of servers doesn't help if they can't stay in sync.
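To make that concrete, here is a rough sketch of the kind of catch-up-friendly backup I have in mind: a consistent dump that records the MySQL binary log position, pushed to S3, so a freshly launched server can restore the dump and replay the binlog from that point forward. The gem, region, bucket, and database names below are illustrative assumptions, not anything Danny has told me:

```ruby
require 'aws-sdk-s3' # assumption: the aws-sdk-s3 gem is installed

# Hypothetical names; substitute your own bucket and database.
BUCKET = 'crm-db-backups'
DUMP   = '/tmp/crm_dump.sql.gz'

# --single-transaction takes a consistent snapshot of InnoDB tables;
# --master-data=2 writes the binlog file and position into the dump as
# a comment, which is exactly what a new server needs to catch up from.
system("mysqldump --single-transaction --master-data=2 crm_production " \
       "| gzip > #{DUMP}") or raise 'mysqldump failed'

# Store the dump under an hourly key so older snapshots stay retrievable.
key = "hourly/#{Time.now.utc.strftime('%Y-%m-%d-%H')}.sql.gz"
Aws::S3::Resource.new(region: 'us-east-1')
                 .bucket(BUCKET)
                 .object(key)
                 .upload_file(DUMP)
```

The point is less the tooling than the property: any backup scheme that can't tell a new server "replay from here" leaves that half-hour gap unrecoverable.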
Data partitioning. It's easy to say "later on we'll just distribute the data between multiple servers," but unless you've planned a layout which supports this, you might have a particularly difficult time doing so without major reworking of your application. Data partitioning can also be your friend in the speed department. If you're thoughtful about HOW you store your data, you can use the layout itself to your advantage. For example, a good schema might actively promote load balancing, where a bad schema will cause excessive load on particular segments. A good schema will actually act as an implied index for your data; a bad schema will require excessive sorting and indexing.
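As a sketch of what I mean by letting the layout do the work: if every table carries the client's account id, and rows are routed by a stable function of that key, whole clients can later be moved between servers without rewriting the queries that read and write them. Everything here (the shard names, the count of four) is a made-up illustration, not a prescription:

```ruby
# A minimal sketch of application-level partitioning by account.
# SHARD_COUNT and the crm_shard_N names are hypothetical placeholders.
SHARD_COUNT = 4

def shard_for(account_id)
  # A stable client-to-database mapping; because a good schema keys
  # every table by account_id, this one function decides where any
  # row lives, and it doubles as an implied index on the client.
  "crm_shard_#{account_id % SHARD_COUNT}"
end

shard_for(42) # => "crm_shard_2"
shard_for(7)  # => "crm_shard_3"
```

Plain modulo makes rebalancing painful if the shard count ever changes; consistent hashing or a lookup table softens that. But the schema requirement is the same either way: the partition key has to live in every table from day one.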
So what now?
So, Danny, the ball is in your court. You have my e-mail address. You have my blog address. Let's get together and talk database before we move forward into the glue.
Hey,
Thanks for taking up the case.
Readers beware
Technical-like information by a non-tech person. This might make you cringe.
Some more background.
First, the CRM/ERP app we're talking about is going to be a commercial product, so all demands and requirements should be seen in that light.
The system is designed to communicate with front-end websites through web services, so it has both 'inside' users and 'outside' users. (There must be a term for that, but I don't know what it is.)
The app will manage the entire 360 degree process of an organization's transactions, including fulfillment.
We had to resort to developing this solution ourselves because no other solution out there could meet our specific needs. So what is an entrepreneur to do in those situations?
The app's interface and functionality are designed by me, and the actual development, including the database design, is done by our development partner.
The company itself, Expo, revolves around entrepreneurship and is getting ready for launch. We have huge ambitions, but are determined to build our company by bootstrapping it from the first dollar on up. Or actually the first euro, since we have a European background.
So, on to the case.
Database Backend
Of course, how could I forget to mention in my first comment: We're intending to use MySQL.
And that's about all I personally can say about that.
With regard to these choices, we can still go any which way we want, so the question should perhaps be: how do we need to do it in order to reach the ideal goals of scaling up and down automatically on EC2 and using S3 as our persistent storage for all data?
Backup Strategy
The hourly backup was a suggestion, not a requirement.
What I want is to have the best possible data safety, within reason. So my traditional thoughts are something like a RAID solution, with added backups.
Data Partitioning
We are still free to make choices here.
I expect, and I repeat expect, a typical client to have a dataset in the size of 1 GB to 5 GB over time.
Is this enough to get you started?
Danny