So now we know that we need to look at things in terms of pools or “stacks” of resources. But how? That’s a good question! And from here on out it gets technical (even though we’re going to be talking in high-level generalities).
Now let’s take one step back and examine the tools that we have at our disposal:
- Amazon EC2 – scalable processing power, temporary storage
- Amazon S3 – scalable permanent storage, no processing power
- Amazon SQS – scalable queueing of “things”
First we need a control structure. We need a way to programmatically interface with this potential pool of machines, and we need to be able to bring the services we need up and down with as little hands-on work as possible. For our mechanism of communication we will use Amazon’s Simple Queue Service. According to Amazon: “Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. By using Amazon SQS, developers can simply move data between distributed application components performing different tasks, without losing messages or requiring each component to be always available”.
We’ll start large-scale and work our way down to the fine-grained detail, beginning with the queue structure. We’ll have one global queue, which the orchestrator uses to communicate with machines that have no assigned role yet, and then sub-queues, each of which relates to a specific server.
[ Global Queue --------------------------------------------------- ]
[ Web1 Q ] [ Web2 Q ] [ Db1 Q ] [ Db2 Q ] [ Sp1 Q] [ Sp2 Q ] [ O Q ]
The sub-queues will be used by the orchestrator for monitoring as well as for giving orders, and the [O]rchestrator queue will be for subordinate servers to send messages back to the orchestrator.
Oh, yeah, him! At this point we have one machine known as the orchestrator, and it will act as the brain of the operation. It will be the one that actually manages the servers; ideally they will require no manual intervention.
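To make the layout concrete, here is a rough sketch of that queue structure. It is an in-memory stand-in, not the SQS API: the `QueueHub` class and the queue names are made up for illustration, and a real setup would swap the deques for actual SQS queues.

```python
from collections import deque

GLOBAL_QUEUE = "global"              # orders for machines with no role yet
ORCHESTRATOR_QUEUE = "orchestrator"  # messages back up to the orchestrator


class QueueHub:
    """In-memory stand-in for a set of named queues (illustration only)."""

    def __init__(self):
        self.queues = {GLOBAL_QUEUE: deque(), ORCHESTRATOR_QUEUE: deque()}

    def create(self, name):
        # Creating a queue that already exists is a no-op.
        self.queues.setdefault(name, deque())

    def delete(self, name):
        self.queues.pop(name, None)

    def send(self, name, message):
        self.queues[name].append(message)

    def receive(self, name):
        # Return None when the queue is empty instead of blocking.
        queue = self.queues[name]
        return queue.popleft() if queue else None


hub = QueueHub()
for server in ("web1", "web2", "db1", "db2", "sp1", "sp2"):
    hub.create(server)  # one sub-queue per server, as in the diagram
```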
This orchestrator server will be configured to maintain a baseline number of each type of server at all times. It will monitor the vitals of the servers under its command, the most important of which will be server load and server responsiveness. If the average load of a given set of servers goes above a pre-configured threshold, it will throw a signal down into the global queue asking for a new server of that type. If the average load drops below a given threshold, it will send a message down a specific server’s queue asking that the server be decommissioned. And if a server is unresponsive, it will call for a new server to be commissioned and decommission the unresponsive one.
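The scaling decision itself could look something like the sketch below. The function name, the threshold values and the idea of passing in one load average per responsive server are all assumptions for the sake of the example; unresponsive servers are simply left out of `loads`, which makes the count fall and triggers a replacement.

```python
from statistics import mean


def plan_for_role(loads, baseline, high=4.0, low=1.0):
    """Return "commission", "decommission" or "hold" for one server role.

    loads:    recent load averages, one per responsive server of this role
    baseline: minimum number of servers to keep, whatever the load
    high/low: hypothetical thresholds; tune them for your own workload
    """
    if len(loads) < baseline:
        return "commission"      # below baseline (e.g. a server went quiet)
    if not loads:
        return "hold"            # nothing running and nothing required
    average = mean(loads)
    if average > high:
        return "commission"      # the set as a whole is overloaded
    if average < low and len(loads) > baseline:
        return "decommission"    # idle capacity above the baseline
    return "hold"
```

For example, `plan_for_role([5.0, 6.0], baseline=2)` asks for another server, while `plan_for_role([0.2, 0.3], baseline=2)` holds steady because the role is already at its baseline.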
The last thing that the orchestrator will be responsible for is keeping a set number of spare servers idling on the global queue. The reason for this is responsiveness.
For example: if it takes 10 minutes for an instance to start up, and it’s being started because the web servers’ load is so high that you’re seeing unacceptably slow response times, that’s 10 extra minutes that you have to wait. But if you have a server idling in “give me some orders” mode, the switch happens in just a minute.
So your orchestrator calls for a new web server. First it creates a new sub-queue for Web3. It then sends a request down the global queue pipe that a machine should assume a web server role and attach to the Web3 queue. This is the command to commission a new web server.
Decommissioning the web server is basically the same, only in reverse. The orchestrator sends a message down the Web3 pipe asking for the server to halt. The server responds with an OK once it has finished its processing and is getting ready to actually shut down.
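Keeping that idle pool topped up is simple arithmetic. A tiny sketch, with a made-up function name and the assumption that the orchestrator can tell idle instances apart from ones that are still booting:

```python
def spares_to_launch(target_spares, idle, booting):
    """How many new instances to start so the idle pool reaches its target.

    Instances that are still booting count toward the target, so the
    orchestrator does not over-provision while it waits for them.
    """
    return max(0, target_spares - idle - booting)
```

With a target of 2 spares, 0 idle and 1 booting, this asks for 1 more instance; once both are idling it asks for none.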
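Here is a minimal sketch of those two orders from the orchestrator’s side. The JSON message format, the field names and the plain deques standing in for SQS queues are all assumptions of mine, not anything SQS prescribes.

```python
import json
from collections import deque

# In-memory stand-ins for the real queues (illustration only).
queues = {"global": deque(), "orchestrator": deque()}


def commission(role, name):
    """Create the server's sub-queue, then ask any idle machine to take the role."""
    queues[name] = deque()
    order = {"command": "commission", "role": role, "queue": name}
    queues["global"].append(json.dumps(order))


def decommission(name):
    """Ask the server attached to `name` to finish its work and halt."""
    queues[name].append(json.dumps({"command": "halt"}))


commission("web", "web3")  # an idle machine will pick this up from the global queue
```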
The rest of the magic happens inside the instance!
Since a server is kept at idle by the orchestrator, it’s basically sitting there monitoring the global queue for commission commands. Once every X minutes (or seconds) it checks the queue for such a command. When it receives one, that’s when the magic actually happens. In this case it’s a request for a web server on the Web3 queue, so the queue monitor runs a script designed to configure the machine to be a web server. The script grabs a copy of the proper document root, the web server configuration and any necessary packages from an external source (possibly an S3 storage container, possibly a Subversion or CVS server, possibly just rsync’ing from a known source).
Once the local code and configuration have been updated, all the services required for running as a web server are started. Perhaps this is a) memcached, b) Apache. Perhaps this is a) Tomcat, b) Apache. Maybe it’s just starting Apache. But that’s not important. What’s important is that the server just changed itself, dynamically, into whatever was needed. It will then send a message up the Web3 queue that it is ready for action.
The orchestrator gets the message from Web3, perhaps registers it with a load balancer, or in DNS somewhere, and then goes about its business.
On an order to decommission, Web3 waits for connections to Apache to cease (the orchestrator removed it from load balancing before the decommission request was sent). It turns off Apache, then the supporting services, and then sends its log files out to some host somewhere (assuming you aren’t doing something more intelligent with standardized logging). Web3 puts a message in the global queue that it’s halting, and it halts.
The orchestrator gets the message, decommissions the Web3 queue, and doesn’t think about it again until the next time that the web server load increases.
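The idle instance’s side of the commissioning step might be sketched like this. Everything named here is hypothetical: `ROLE_SETUP`, `become_web_server` and the message fields are placeholders, the setup function only returns the steps it would run, and a real monitor would call `poll_once` in a loop with a sleep of X seconds between checks.

```python
import json
from collections import deque


def become_web_server():
    # Placeholder: a real script would fetch the docroot and config, then
    # start the services. Here we only report the steps we would take.
    return ["fetch docroot", "fetch config", "start apache"]


ROLE_SETUP = {"web": become_web_server}  # role name -> setup script


def poll_once(global_queue, sub_queues):
    """Check the global queue once; take on a role if an order is waiting.

    Returns the name of the sub-queue this machine attached to, or None.
    """
    if not global_queue:
        return None
    raw = global_queue.popleft()
    order = json.loads(raw)
    setup = ROLE_SETUP.get(order.get("role"))
    if order.get("command") != "commission" or setup is None:
        global_queue.append(raw)  # not something we can handle: put it back
        return None
    steps = setup()
    ready = json.dumps({"status": "ready", "steps": steps})
    sub_queues.setdefault(order["queue"], deque()).append(ready)
    return order["queue"]
```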
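The halt sequence on the instance could be sketched as below. Every callable passed in is a hypothetical hook (how you count connections, stop a service or ship logs depends entirely on your setup); the only thing the sketch pins down is the ordering.

```python
import time


def halt_gracefully(count_connections, stop_service, ship_logs, announce,
                    services, poll_seconds=1.0, sleep=time.sleep):
    """Run the halt sequence in order; all callables are hypothetical hooks.

    services is listed in start order (e.g. ["memcached", "apache"]) and is
    stopped in reverse, so the web server goes down before its helpers.
    """
    while count_connections() > 0:  # drain: wait for open connections to finish
        sleep(poll_seconds)
    for service in reversed(services):
        stop_service(service)
    ship_logs()                     # send log files off the box before halting
    announce("halting")             # tell the orchestrator we are going away
```

Passing `sleep` in as a parameter is just a convenience so the sequence can be exercised without actually waiting.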
There is a *LOT* of detail that goes into setting a process like this up. And those details will change from server role to server role, and from organization to organization. But the general principle… ought… to work.
Ought? Well, I haven’t tried it… yet.