php debugging the really really hard way

If you’re ever in a situation where something is only happening intermittently, and only on a live server, and only while it’s under load… Lets say its not generating any error_log or stderr output, and you cant run it manually to reproduce… (we’ve all been in this situation) How do you get any debugging output at all?

Step 1: add this to the top of your entry point php file

if ( $_SERVER['REMOTE_ADDR'] == '127.0.0.1' ) {
       error_log( ' :: ' . getmypid() );
       sleep( 10 );
}

Step 2: use curl on the localhost to make the request

Step 3: (this assumes your error log is /tmp/php-error-output) run the following command in a second (root) terminal window

strace -p $(tail -n 1000 /tmp/php-error-output | grep ' :: ' | tail -n 1 | sed -r s/'^.+ :: '//g) -s 10240 2>&1

Good luck…

Using wait, $!, and () for threading in bash

This is a simplistic use of the pattern that I wrote about in my last post to wait on multiple commands in bash. In essence I have a script which runs a command (like uptime or restarting a daemon) on a whole bunch of servers (think pssh). Anyways… this is how I modified the script to run the command on multiple hosts in parallel. This is a bit simplistic as it runs, say, 10 parallel ssh commands and then waits for all 10 to complete. I’m very confident that someone could easily adapt this to run at a constant concurrency level of $threads… but I didn’t need it just then so I didn’t go that far… As a side note, this is possibly the first time I’ve ever *needed* an array in a bash script… hah…

# $1 is the commandto run on the remote hosts
# $2 is used for something not important for this script
# $3 is the (optional) number of concurrent connections to use

if [ ! "$3" == "" ]
then
    threads=$3
else
    threads=1
fi

cthreads=0;
stack=()
for s in $servers
  do
    if [ $cthreads -eq $threads ]; then
        for job in ${stack[@]}; do
              wait $job
        done
        stack=()
        cthreads=0
    fi
    (
        for i in $(ssh root@$s "$1" )
            do
                echo -e "$s:\t$i"
        done
    )& stack[$cthreads]=$!
    let cthreads=$cthreads+1
done
for job in ${stack[@]}; do
    wait $job
done

bash – collecting the return value of backgrounded processes

You know that you can run something in the background in a bash script with ( command )&, but a coworker recently wanted to run multiple commands, wait for all of them to complete, collect and decide what to do based on their return values… this proved much trickier. Luckily there is an answer

#!/bin/bash

(sleep 3; exit 1)& p1=$!
(sleep 2; exit 2)& p2=$!
(sleep 1; exit 3)& p3=$!

wait "$p1"; r1=$?
wait "$p2"; r2=$?
wait "$p3"; r3=$?

echo "$p1:$r1 $p2:$r2 $p3:$r3"

Command line arguments in bash scripts

This is something that has always annoyed me about bash scripts… The fact that it’s difficult to run

/path/to/script.sh --foo=bar -v -n 10 blah -one='last arg'

So I decided to write up a bash function that let me easily (once the function was complete) access this type of information. And because I like sharing, here it is:

#!/bin/bash
function getopt() {
  var=""
  wantarg=0
  for (( i=1; i< =$#; i+=1 )); do
    lastvar=$var
    var=${!i}
    if [ "$var" = "" ]; then 
        continue 
    fi
    echo \ $var | grep -q -- '='
    if [ $? -eq 0 ]; then
      ## -*param=value
      var=$(echo \ $var | sed -r s/'^[ ]*-*'/''/)
      myvar=${var%=*}
      myval=${var#*=}
      eval "${myvar}"="'$myval'"
    else
      echo \ $var | grep -E -q -- '^[ ]*-'
      if [ $? -eq 0 ]; then
        # -*param$
        var=$(echo \ $var | sed -r s/'^[ ]*-*'/''/)
        eval "${var}"=1
        wantarg=1
      else
        echo \ $var | grep -E -- '^[ ]*-'
        if [ $? -eq 0 ]; then
          # the current one has a dash, so cannot be
          # the argument to the last parameter
          wantarg=0
        fi
        if [ $wantarg -eq 1 ]; then
          # parameter argument
          val=$var
          var=$lastvar
          eval "${var}"="'${val}'"
          wantarg=0
        else
          # parameter
          if [ "${!var}" = "" ]; then
            eval "${var}"=1
          fi
          wantarg=0
        fi
      fi
    fi
  done
}

OIFS=$IFS; IFS=$(echo -e "
"); getopt $@; IFS=$OIFS

now at this point (assuming the above command line parameter and script) I should have access to the following variables: $foo ("bar") $v (1) $n (10) $blah (1) $one ("last arg"), like so:

OIFS=$IFS; IFS=$(echo -e "
"); getopt $@; IFS=$OIFS

echo -e "
foo:\t$foo
v:\t$v
n:\t$n
blah:\t$blah
one:\t$one
"

You might be curious about this line:

OIFS=$IFS; IFS=$(echo -e "
"); getopt $@; IFS=$OIFS

IFS is the variable that tells bash how strings are separated (and mastering its use will go a long way towards enhancing your bash scripting skills.) Anyhow, by default IFS=” ” which normally is OK, but in our case we dont want “last arg” to be two seperate strings, but one. I cannot put the IFS assignment inside the function because by that point bash has already split the variable, it needs to be done at a level of the script in which $@ has not been touched yet. So I store the current IFS variable in $OIFS (Old IFS) and set IFS to a newline character. After running the function we reassign IFS to what it was beforehand. This is because I dont know what you might be doing with your IFS. There are lots of reasons you might have already assigned it to something else, and I wouldnt want to break your flow. So we do the polite thing.

And in case the above gets munged for some reason you can see the plain text version here: bash-getopt/getopt.sh

Anyways, hope this helps someone out. If not it’s still here for me when *I* need it ;)

Bash Coding Convention ../

We use dirname() a lot in php to make relative paths work from multiple locations like so. The advantages are many:

require dirname( dirname( __FILE__ ) ) . '/required-file.php';
$data = file_get_contents( dirname(__FILE__).'/data/info.dat');

But in bash we often dont do the same thing, we settle for the old standby “../”. Which is a shame because unless your directory structure is set up exactly right, and you have proper permissions, and you run the command from the right spot, it doesnt work as planned. I think part of the reason is that its not obvious how to reliably get a full path to the script from inside itself. Another reason is that ../ is shorter to type and easier to remember. Finally there’s always one time scripts for which this methodology is overkill. But if you’re planning to write a script which other people will (or might) be using, I think it’s good practice to do it right. Googling for things you’d think to search for on this subject does not yeild very informative results, or incomplete (incorrect) methods… so… here’s how to do the above php in bash:

source $(dirname $(dirname $(readlink -f $0)))/required-file.sh 
data=$(cat $(dirname $(readlink -f $0))/data/info.dat)

Hope this helps someone :)

As a side note, the OSX readlink binary functions differently. You’ll want to use a package manager to install gnu coreutils, and iether use greadlink, or link greadlink to a higher precedence location on your $PATH (I have /opt/local/bin:/opt/local/sbin: at the beginning of my $PATH)

As Close to A Real Daemon As Bash Scripts Get

I’ve written a little something which is gaining some traction internally, and I always intended to share it with the world. So… Here. daemon-functions.sh

What it does is allow you to write a bash function called “payload” like so:

function payload() {
while [ true ]; do
checkforterm
date
sleep 1
done
}
source path/to/daemon-functions.sh

Once you’ve done that it all just happens.  daemon-functions gives you logging of stderr, stdout, a pid file, start, stop, pause, resume, and more functions.  when you start your daemon it detaches completely from your terminal and runs in the background.  Works very simply with monit straight out of the box.  you can have as many daemons as you wish in the same directory and they wont clobber each other (as the pid, control, and log files all are dynamically keyed off of the original script name.)  Furthermore inside your execution loop inside of the payload function place a checkforterm call at any place which it makes sense for your script to be paused, or stopped. it can detect stale pid files and run anyway if the process isnt really running.  As an added bonus you dont actually have to loop inside payload, you can put any thing in there, have a script thats not a daemon, but will take an hour, day, week, month to finish? stick it in, run it, and forget it.

Bash Tip: Closing File Descriptors

I recently found that you can close bash file descriptors fairly easily, it goes like this:

    exec 0>&- # close stdin
    exec 1>&- # close stdout
    exec 2>&- # close stderr

Which makes it easy to daemonize things using only bash (lets face it there are times when you JUST don’t need anything more than a simple bash script, you just need it backgrounded/daemonized). Take this example of a daemon that copies any new files created in a directory to another place on the filesystem

#!/bin/bash

##
## Shell Daemon For: Backup /root/
## (poorly coded, quick and dirty, example)
##

PIDFILE="/var/run/rootmirror.pid"
LOGFILE="/log/log/rootmirror-%Y-%m-%d.log"
NOHUP="/usr/bin/nohup"
CRONOLOG="/usr/bin/cronolog"
case $1 in
start)
    exec 0>&- # close stdin
    exec 1>&- # close stdout
    exec 2>&- # close stderr
    $NOHUP $0 run | $CRONOLOG $LOGFILE >> /dev/null &
    ;;
stop)
    /bin/kill $(cat $PIDFILE)
    ;;
run)
    pgrep -f "$0 $1" > $PIDFILE
    while [ true ]; do
        event=$(inotifywait -q -e close_write --format "%f" /root/)
        ( cp -v "/root/$event" "/var/lib/rootmirror/$event" )&
    done
    ;;
*)
    echo "$0 [ start | stop ]"
    exit 0
    ;;
esac

One especially nice detail here is that this wont hang while exiting your SSH session after you start it up (a big pet peeve of mine).

Scripting without killing system load

Let us pretend for a moment that you have a critical system which can *just* handle the strain that it’s under (I’m sure all of you have workloads well under your system capabilities, or capabilities well over your workload requirements, of course; still for the sake of argument…) And you have a job to do which will induce more load. The job has to be done. The system has to remain responsive. Your classic response to this problem is adding a delay, for example:

     #!/bin/bash
     cd /foo
     find ./ -type d -daystart -ctime +1 -maxdepth 1 | head -n 500 | xargs -- rm -rv
     while [ $? -eq 0 ]; do
          sleep 60
          find ./ -type d -daystart -ctime +1 -maxdepth 1 | head -n 500 | xargs -- rm -rv
     done

Of course this is a fairly simplistic example. Still it illustrates my point. The problem with this solution is that the machine you’re working on is likely to have a variable workload where its main use comes in surges. By defining a sleep time you have to iether sleep so long that the job takes forever to finish, or skirt with high loads and slow response times. Ideally you would be able to let her rip while the load is low and throttle her back while the load is high, right? Well we can! Like so:

     #!/bin/bash
     function waitonload() {
          loadAvg=$(cat /proc/loadavg | cut -f1 -d'.')
          while [ $loadAvg -gt $1 ]; do
               sleep 1
               echo -n .
               loadAvg=$(cat /proc/loadavg | cut -f1 -d'.')
               if [ $loadAvg -le $1 ]; then echo; fi
          done
     }

     waitonload 1
     find ./ -type d -daystart -ctime +1 -maxdepth 1 | head -n 500 | xargs -- rm -rv
     while [ $? -eq 0 ]; do
          waitonload 1
          find ./ -type d -daystart -ctime +1 -maxdepth 1 | head -n 500 | xargs -- rm -rv
     done

This modification will only run the desired commands when the system load is less than 2, it will wait for that condition to continue the loop. This can be very handy for very large jobs needing to be run on loaded systems. Especially jobs which can be subdivided into small tasks!

Backgrounding Chained Commands in Bash

Sometimes it’s desirable to have a chain of commands backgrounded so that a multi-step process can be run in parallel. And often times its not desirable to have yet another script made to do a simple task that doesn’t warrant the added complexity. An example of this would be running backups in parallel. The script sniplet below would allow up to 4 simultaneous tar backups to run at once — recording the start and stop times of each individually — and then wait for all the tar processes to finish before exiting

max_tar_count=4
for i in 1 3 5 7 2 4 6 8
 do
 cur_tar_count=$(ps wauxxx | grep -v grep | grep tar | wc -l)
 if [ $cur_tar_count -ge $max_tar_count ]
  then 
  while [ $cur_tar_count -ge $max_tar_count ]
   do
   sleep 60
   cur_tar_count=$(ps wauxxx | grep -v grep | grep tar | wc -l)
  done
 fi
 ( echo date > /backups/$i.start &&
     tar /backups/$i.tar  /data/$i &&
     echo date > /backups/$i.stop )&
done
cur_tar_count=$(ps wauxxx | grep -v grep | grep tar | wc -l)
while [ $cur_tar_count -gt 0 ]
 do
 sleep 60
 cur_tar_count=$(ps wauxxx | grep -v grep | grep tar | wc -l)
done

The real magick above is highlighted in red. You DO want that last loop in there to make the script wait until all the backups are really done before exiting.