Archive for the ‘Business’ Category:
Bash Coding Convention ../
We use dirname() a lot in PHP to make relative paths work from multiple locations, and the advantages are many. Like so:
require dirname( dirname( __FILE__ ) ) . '/required-file.php';
$data = file_get_contents( dirname( __FILE__ ) . '/data/info.dat' );
But in bash we often don’t do the same thing; we settle for the old standby “../”. That is a shame, because unless your directory structure is set up exactly right, and you have proper permissions, and you run the command from the right spot, it doesn’t work as planned. I think part of the reason is that it’s not obvious how to reliably get a full path to the script from inside itself. Another reason is that ../ is shorter to type and easier to remember. Finally, there are always one-time scripts for which this methodology is overkill. But if you’re planning to write a script which other people will (or might) be using, I think it’s good practice to do it right. Googling for the things you’d think to search for on this subject does not yield very informative results, only incomplete (or incorrect) methods… so… here’s how to do the above PHP in bash:
source "$(dirname "$(dirname "$(readlink -f "$0")")")/required-file.sh"
data=$(cat "$(dirname "$(readlink -f "$0")")/data/info.dat")
Hope this helps someone
As a side note, the OS X readlink binary functions differently. You’ll want to use a package manager to install GNU coreutils, and either use greadlink, or link greadlink to a higher-precedence location on your $PATH (I have /opt/local/bin:/opt/local/sbin: at the beginning of my $PATH).
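For scripts that have to run on both Linux and OS X, a small helper can pick whichever readlink understands -f. This is only a minimal sketch: the function name script_dir is my own invention, and it assumes GNU coreutils is installed as greadlink on systems where the stock readlink lacks -f.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the absolute directory of the given file,
# following symlinks. Prefers greadlink (GNU coreutils on OS X) if present,
# otherwise falls back to the system readlink.
script_dir() {
    local rl=readlink
    if command -v greadlink >/dev/null 2>&1; then
        rl=greadlink
    fi
    dirname "$("$rl" -f "$1")"
}

# Usage inside a script:
#   here=$(script_dir "$0")
#   source "$(dirname "$here")/required-file.sh"
#   data=$(cat "$here/data/info.dat")
```

Resolving the path once and reusing it also keeps the rest of the script shorter than repeating the nested readlink call everywhere.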
Simple TCP Daemon Example
Using some stuff I’ve covered in the past on my blog, here’s a simple way to put up a daytime server (well, to put any service onto a TCP port). I haven’t looked into its bi-directional capabilities yet; this was just sort of a proof of concept…
$ apt-get install ipsvd
$ wget http://blog.apokalyptik.com/files/daemonize/daemonize.c
$ cc daemonize.c -o daemonize
$ ./daemonize /var/run/daytime.pid /var/log/daytime.log 'tcpsvd 0 13 date'
Start/stop and/or monit scripts are an extremely short jump from there… and kind of trivial/menial… so I leave that as an exercise for you… if you care.
This deserves some link love
Andy blogged a piece of advice that I gave him, which I got from Barry… and if you want to know how to get the true absolute path to the real location of the current script from inside of it (like PHP’s realpath and __FILE__), I suggest you check it out.
As Close to A Real Daemon As Bash Scripts Get
I’ve written a little something which is gaining some traction internally, and I always intended to share it with the world. So… Here. daemon-functions.sh
What it does is allow you to write a bash function called “payload” like so:
function payload() {
    while true; do
        checkforterm
        date
        sleep 1
    done
}
source path/to/daemon-functions.sh
Once you’ve done that, it all just happens. daemon-functions gives you logging of stderr and stdout, a pid file, and start, stop, pause, resume, and more functions. When you start your daemon it detaches completely from your terminal and runs in the background. It works very simply with monit straight out of the box. You can have as many daemons as you wish in the same directory and they won’t clobber each other (the pid, control, and log files are all dynamically keyed off of the original script name). Inside the execution loop in your payload function, place a checkforterm call at any point where it makes sense for your script to be paused or stopped. It can detect stale pid files and run anyway if the process isn’t really running. As an added bonus, you don’t actually have to loop inside payload; you can put anything in there. Have a script that’s not a daemon but will take an hour, day, week, or month to finish? Stick it in, run it, and forget it.
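The stale pid file check can be approximated with kill -0, which tests whether a process exists without actually sending it a signal. This sketch is not taken from daemon-functions.sh; the function name pid_is_stale and the way it is called are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (return 0) when the pid file is missing,
# empty, or names a process that can no longer be signalled.
pid_is_stale() {
    local pidfile=$1 pid
    # No file, or an empty file: nothing is running under this pid file.
    [ -s "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 delivers nothing; it only reports whether we could signal
    # the pid. Note it also fails for a live process owned by another user.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Usage:
#   if pid_is_stale /var/run/mydaemon.pid; then echo "safe to start"; fi
```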
Daemonize Anything
I hacked together this little C program from this other little C program. It basically acts as an execution wrapper that lets you fork() and detach and run a command in the background, with a pidfile and a log file for program output. So far I haven’t had any problems with it… but then I’m not a true C guy, so any input is welcome.
It’s good for the server. It’s good for the soul.
ack (http://petdance.com/ack/), love it (thanks nikolay)
Throttle your Threads…
Let’s say you want to run some command, such as /bin/long-command, on a set of directories. And you have a lot of directories. You know it’ll take forever to complete serially, so you want to cook up a way to run these commands in parallel. You know the server CAN handle more than one command at once, but you have no idea how many it can handle without keeling over, and you have thousands of commands to run. Running them all at once in the background will kill the system for sure. You COULD try to stagger them and let the delay in overlap be a natural throttle, but sometimes the command completes in one minute and sometimes in 10, so that’s not a good idea either.
So you decide it would be best to set a process concurrency limit. But what if you set that limit too low? Too high? Restarting in the middle would be bad… you COULD make some sort of completed log and build into your script a skip for completed files, but why? That doesn’t seem so elegant. Your car is good at handling variable speed allowances… it goes fast when you say and slow when you say… maybe we can give a simple bash script a gas pedal? That just might work!
echo '5' > /tmp/threads
for i in fileroot/*; do
    # count the running long-command processes and wait while at the limit
    while [ "$(pgrep long-command | wc -l)" -ge "$(cat /tmp/threads)" ]; do
        sleep 1
    done
    ( /bin/long-command "$i" ) &
    sleep 1
done
Now you can speed it up and throttle it back by adjusting the integer value inside /tmp/threads.
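A variation on the same idea counts the loop’s own background jobs with the bash builtin jobs instead of matching process names with pgrep, so unrelated processes that happen to share the name are not counted. This is a sketch under the assumption that you are running bash; the function name run_throttled and its argument order are mine, not part of the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run "<cmd> <item>" for each item, keeping at most
# $(cat "$limit_file") jobs running at once. The limit file is re-read on
# every pass, so editing it speeds the loop up or slows it down.
run_throttled() {
    local limit_file=$1 cmd=$2
    shift 2
    local item
    for item in "$@"; do
        # jobs -pr lists the pids of this shell's running background jobs
        while [ "$(jobs -pr | wc -l)" -ge "$(cat "$limit_file")" ]; do
            sleep 1
        done
        "$cmd" "$item" &
    done
    wait    # block until the remaining background jobs finish
}

# Usage:
#   echo 5 > /tmp/threads
#   run_throttled /tmp/threads /bin/long-command fileroot/*
```

As with the original loop, a limit of 0 in the file pauses the launching of new commands until you raise it again.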
“It was the little old server from Pasadena…”
so-you-wanna-see-an-image
We’ve been asked how we manage serving files from Amazon’s very cool S3 service at WordPress.com… This is how. (This covers a requested image already stored on S3, not the upload -> S3 process.)
A request comes into pound for a file. Pound hashes the hostname (via a custom patch which we have not released, but may) to determine which of several backend servers the request should hit. Pound forwards the request to that server. This, of course, means that a given blog always serves from the same backend server. The only exception to that rule is if the server is, for some reason, unavailable, in which case pound picks another server to serve that hostname from temporarily.
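The hostname-to-backend mapping can be illustrated in a few lines of shell. This is only a sketch of the general idea: the actual pound patch is unreleased, so the hash (cksum’s CRC), the function name pick_backend, and the backend names are all assumptions, and the failover to another server is not shown.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a hostname onto one of the given backends.
# The same hostname always yields the same backend for a fixed backend list.
pick_backend() {
    local host=$1
    shift
    local backends=("$@")
    local crc
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$host" | cksum | cut -d ' ' -f 1)
    # Reduce the CRC to an index into the backend list.
    echo "${backends[crc % ${#backends[@]}]}"
}

# Usage:
#   pick_backend someblog.wordpress.com backend1 backend2 backend3
```

Keeping a hostname pinned to one backend is what lets that backend’s cache stay warm for the blogs it serves.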
The request then comes into varnishd on the backend server. The varnishd daemon checks its 300Gb worth of file cache and (for the sake of this example) finds nothing (hey, new images are uploaded all the time!). Varnishd then checks with the web server (running on the same machine, just bound to a different IP/Port#), and that request is handled by a custom script.
So, an HTTP daemon on the same backend server runs the file request. The custom script checks the DB to gather information on the file (specifically which DCs it is in, its size, its mod time, and whether it’s deleted or not); all this info is saved in memcached for 5 minutes. The script increments and checks the “hawtness” (term courtesy of Barry) of the file in memcached. If the file has been accessed over a certain number of times it is deemed “hawt”, and a special header is sent with the response telling varnishd to put the file into its cache. When that happens, later requests are served directly by varnishd in the previous paragraph and never hit the httpd or this script again (or at least not until the cache entry expires). At this point, assuming the file should exist (deleted = 0 in the files db), we fetch the file from a backend source.
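The hawtness check boils down to a counter with a threshold. The sketch below uses a bash associative array as a stand-in for memcached, so there is no expiry and nothing is shared between processes; the threshold value and the names hits, HAWT_THRESHOLD, and record_hit are assumptions, since the post gives neither the real number nor the real code.

```shell
#!/usr/bin/env bash

# Stand-in for memcached: an in-memory hit counter per file key (bash 4+).
declare -A hits
HAWT_THRESHOLD=3   # assumed value; the real threshold is not given

# Hypothetical helper: count one access to the file and succeed once it
# has been accessed more than HAWT_THRESHOLD times.
record_hit() {
    local key=$1
    hits[$key]=$(( ${hits[$key]:-0} + 1 ))
    [ "${hits[$key]}" -gt "$HAWT_THRESHOLD" ]
}

# Usage:
#   if record_hit "$file"; then
#       : # send the special header telling varnishd to cache the file
#   fi
```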
Which backend source depends on where it is available. The order of preference is as follows: Always fetch from Amazon S3 if the file lives there (no matter what, the following preferences only ever occur if, for some reason, s3 = 0 in the files db), and if that fails fetch from the one files server we still have (which has larger slower disks, and is used for archiving purposes and fault tolerance only)
After fetching the file from the back end… the custom script hands the data and programmatically generated headers to the HTTP daemon, which hands the data to varnishd; varnishd hands the data to pound, pound hands the data to the requesting client, and the image appears in the web browser.
And there was much rejoicing (yay.)
For the visual people among us who like visuals and stuff… (I like visuals…) here goes…

I have to say
this is amazing: http://jan.kneschke.de/2007/10/7/wormhole-index-reads and I can’t wait to try it somewhere!

