Logging post data

Lets say you have a relatively complex php web application, like wordpress. You have it running under apache (which is common.) You have good control of your site via .htaccess (which is common — permalinks and all.) And something happens to your blog (e.g. someone is exploiting some unknown vulnerability to compromise your content), which you want to track down. So you want to log, for instance, HTTP POST data. Your first instinct might be to add some logging code to index.php (mine was) But there are a lot of possible places which might be directly accessed, especially in the admin. So The trick I use (and this is probably the only time I’ve ever condoned this) is to use PHPs auto_prepend_file functionality.

Make a /home/user/postlog/ directory, then a /home/user/postlog/logs/ directory (and chmod a+rw that one.) Next make a simple /home/user/postlog/postlog.php file with the following contents:

<?php
if ( isset($_POST) && is_array($_POST) && count($_POST) > 0 ) {
  $log_dir = dirname( __FILE__ ) . '/logs/';
  $log_name = "posts-" . $_SERVER['REMOTE_ADDR'] . “-” . date(”Y-m-d-H”) . “.log”;
  $log_entry = gmdate(’r') . “\t” . $_SERVER['REQUEST_URI'] . “\r\n” . serialize($_POST) . “\r\n\r\n”;
  $fp=fopen( $log_dir . $log_name, ‘a’ );
  fputs($fp, $log_entry);
  fclose($fp); }
?>

Finally add this line to the top of your .htaccess file:

php_value auto_prepend_file /home/user/postlog/postlog.php

If all went well this should start logging any request to any php file with any post data into the /home/user/postlog/logs/ direcory (with a unique log per ip per day)


Posted on : Mar 25 2008
Posted under blogging, php, web stuff |

so-you-wanna-see-an-image

We’ve been asked how we manage serving files from Amazons very cool S3 service at WordPress.com… This is how. (covering a requested image already stored on S3, not the upload -> s3 process)

A request comes into pound for a file. Pound hashes the hostname (via a custom patch which we have not, but may, release) , to determine which of several backend servers the request should hit. Pound forwards the request to that server. This, of course, means that a given blog always serves from the same backend server. The only exception to the afore-mentioned rule is if that server is, for some reason, unavailable in which case it picks another server to serve that hostname from temporarily.

The request then comes into varnishd on the backend servers. The varnishd daemon checks its 300Gb worth of files cache and (for the sake of this example) finds nothing (hey, new images are uploaded all the time!) Varnishd then checks with the web server (running on the same machine, just bound to a different IP/Port#) and that request is handled by a custom script.

So, a http daemon on the same backend server runs the file request. The custom script checks the DB to gather information on the file (specifically which DC’s it is in, size, mod time, and whether its deleted or not) all this info is saved in memcached for 5 minutes. The script increments and checks the “hawtness” (term courtesy of Barry) of the file in memcached (if the file has been accessed over a certain # of times it is then deemed “hawt”, and a special header is sent with the response telling varnishd to put the file into its cache. When that happens the request would be served directly by varnishd in the previous paragraph and never hit the httpd or this script again (or at least not until the cache entry expires.)) At this point, assuming the file should exist (deleted = 0 in the files db) we fetch the file from a backend source.

Which backend source depends on where it is available. The order of preference is as follows: Always fetch from Amazon S3 if the file lives there (no matter what, the following preferences only ever occur if, for some reason, s3 = 0 in the files db), and if that fails fetch from the one files server we still have (which has larger slower disks, and is used for archiving purposes and fault tolerance only)

After fetching the file from the back end… the custom script hands the data and programatically generated headers to the http daemon, which hands the data to varnishd, varnishd hands the data to pound, pound hands the data to the requesting client, and the image appears in the web browser.

And there was much rejoicing (yay.)

For the visual people among us who like visuals and stuff… (I like visuals…) here goes…


Posted on : Oct 10 2007
Posted under Amazon AWS, Business, MySQL, Software Development, blogging, linux, php, web stuff |

I have to say

this is amazing: http://jan.kneschke.de/2007/10/7/wormhole-index-reads and I cant wait to try it somewhere!


Posted on : Oct 06 2007
Posted under Business, MySQL, blogging, cli, linux |

Autumn Leaves Leaf #2: Feeder

This handy little bot keeps track of RSS feeds, and announces in the channel when one is updated. (note: be sure to edit the path to the datafiles) Each poller runs inside its own ruby thread, and can be run on its own independent schedule

require 'thread'
require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require 'fileutils'
require 'digest/md5'

class Feeder < AutumnLeaf

def watch_feed(url, title, sleepfor=300)
  message "Watching (#{title}) [#{url}] every #{sleepfor} seconds”
  feedid = Digest::MD5.hexdigest(title)
  Thread.new {
    while true
      begin
        content = “”
        open(url) { |s|
          content = s.read
        }
        rss = RSS::Parser.parse(content, false)
        rss.items.each { |entry|
          digest = Digest::MD5.hexdigest(entry.title)
          if !File.exist?(”/tmp/.rss.#{feedid}.#{digest}”)
            FileUtils.touch(”/tmp/.rss.#{feedid}.#{digest}”)
            message “#{entry.title} (#{title}) #{entry.link}”
          end
          sleep(2)
        }
      rescue
        sleep(2)
      end
      sleep(sleepfor)
    end
  }
  sleep(1)
end

def did_start_up
  watch_feed(”http://planet.wordpress.org/rss20.xml”, “planet”, 600)
  watch_feed(”http://wordpress.com/feed/”, “wpcom”, 300)
end

end

Posted on : Oct 02 2007
Posted under Business, Random Thoughts, Software Development, blogging, cli, linux, ruby on or off rails |

Autumn Leaves Leaf #1: Announcer

This bot is perfect for anything where you need to easily build IRC channel notifications into an existing process. It’s simple, clean, and agnostic. Quite simply you connect to a TCP port, give it one line, the port closes, the line given shows up in the channel. eg: echo ‘hello’ | nc -q 1 bothost 22122

require 'socket'
require 'thread'

class Announcer < AutumnLeaf

        def handle_incoming(sock)
                Thread.new {
                line = sock.gets
                message line
                sock.close
                }
        end

        def did_start_up
                Thread.new {
                        listener = TCPServer.new('',22122)
                        while (new_socket = listener.accept)
                                handle_incoming(new_socket)
                        end
                }
        end

end

wordpress.com squeezed out another baby last night

Congrats to http://claudiacanals.wordpress.com/ which happens to be our one millionth hosted blog.  This happened around 11:38pm PDT, and weighed 7lbs 6oz


Posted on : May 24 2007
Posted under Business, In The News, blogging, web stuff |

DivShare, Day 1 (raw commentary)

I began looking at divshare a few days ago as a way to stor, save, and share my personal photo collection.  The idea of auto-galleries, unlimited space, flash video, and possible FTP access was… enticing.  But it’s tough to tell how something like this is going to work on a large scale…

So… after messing around with a free divshare account for a while I decided it was more worth my while to pay 10 bucks for a pro account and get FTP access than to try and use mechanize (or something similar) to hack out my own makeshift API.  Now I have about… Oh… 8,000 files I want to upload… So… doing that 10 at a time was just _NOT_ going to happen…

After paying for a pro account I was *immediately* granted FTP access, no waiting. And for that I was grateful.  Since I take photos at 6MP, and thats WAY too large for most online uses I have a shell script which automagically creates 5%, 10%, and 25% or original sized thumbnails.  This meant that I had an expansive set of files I could upload and only take a couple of hours doing it (5% thumbs end up being less than 200Mb.)  This, I thought, would be an excellent test of their interfaces.

So an-ftping-i-a-go.  Upload all my files into a sub directory (005). Visit the dash. nothing. Visit the ftp-upload-page to recheck… maybe I did something wrong. AND WHAM! an 8,000 check box form to accept ftp uploaded files… ugh.  Thankfully they’re all checked by default.  I let it load (for a long while) and hit submit… and wait… and wait… and wait.  Then the server side connection times out a while later.  Fair enough. Check my dash… about 1500 of the 8,000 photos were imported… I’m going to have to do this 6 times. Annoying, but doable.  Hit the second submit, and pop open another browser to look at my dash.  And divshare did *nothing* with my folder name… that wasnt translated to a “virtual” folder at all. tsk tsk.

So I need to put about 1500 photo, manually, into an 005 folder… and then I realize… I have to do this 20 files at a time… with no way to just show files that are not currently in a folder.

… uh no …

Ok, so I open up one of the photos that I DID put into the 005 folder, and it did, in fact, make them into a “gallery” of sorts. It made a thumbnail , and displayed all 3,000 photos side by side in something similar to an iframe… no rows. just one row… 3,000 columns… and waiting as my browser requests each… and every… thumb… from divshare. Wonderful.  The gallery controls are simple enough an iframe with a scrollbar at the bottom, a next photo link, and a previous photo link.  And all 3 controls make you loose your place in the iframe when you use them…

Now dont get me wrong. You get what you pay for. But hey… I did pay this time ;)  The service is excellent for what it does. And my use case was a bit extreme. Still I hope that they address these issues that I’ve pointed out.  I’d really like to continue using them, and if they can make my pohoto process easier I’ll gladly keep paying them $10/mo

Thats

  1. Don’t ignore what pro users are telling you when they upload
  2. Process large-accepts in the background, let me know I need to come back later
  3. Negative searching (folder == nil)
  4. Mass file controls (Iether items/page, or all-items-in-view (folder == nil))
  5. Give me a gallery a non-broadband user can use (1500 thumbs in one sitting tastes bad, more filling)
  6. Don’t undo what I’ve done in the gallery every click.  Finding your place among 8,000 photos is tedious to do once

And I know I sound like I’m just complaining. And I am. But this is web 2.0 feedback baby. Ignore my grouchiness, and (If I’m lucky) take my suggestions and run with them asap.  The photo/files market is very very far from cornered!


Whoa, talk about neglecting your weblog! Bad Form!

I know, I know, I’ve been silent for quite some time. Well Let me assure you that I’m quite all right! Are you less worried about me now? Oh good. (Yes I’m a cynical bastage sometimes.)

So life has, as it tends to do, come at me pretty fast. I’ve left my previous employer, Ookles, and I wish them all the best in accomplishing everything that they’ve been working towards. So I’ve Joined up with the very smart, very cool guys at Automattic. I have to tell you I’m excited to be working with these guys, they’re truly a great group.

I guess that means I’m… kind of… like… obligated to keep up on my blog now, eh?  I’m also kind of, like, ehausted. Jumping feet first into large projects has a tendency to do that to a guy though.  And truth be told I would have it any other way…

:D

Cheers

DK


Posted on : Mar 12 2007
Posted under Business, Personal, Random Thoughts, Software Development, blogging, excuses, php, web stuff |

Openfound (cont)

If there’s one thing that the OpenFount guys have shown me is that they’re serious about the Infinidisk product.  Mr. Donahue gave me a quick call this evening (seems my e-mail server and his e-mail server aren’t talking properly, so while I get his communications) he has not received mine (probably explaining the lack of response to my pre-sales inquiry)) to chat about his product. The particular bug that I noticed, he mentioned, was fixed a while ago in a later release than I’d tried.  In my defense the page looks precisely like it did when I first got the product, and the release tar file has no version numbers on it… yet… so I did check for updates.  I found out some good info, though.  They’re working on  putting up a trac page for some real bidirectional community participation soon.  They’ll also be putting version numbers in their archives soon. Both of those things will help, I think, improve their visibility to people like me (who have very very little time.)
I’ll be re-testing the infinidisk product again, later, when next I customize an AMI.


Programmers don’t like to code?

Rentzsch.com says programmers don’t like to code… Yea… I’ll buy that.  I’ve described “what I do” to many many people as “solving tough problems.”  I’d much rather get into the problem itself than write all the scaffolding and crap that lets me get face to face with it.   So yea. Sign me up.

Cheers


Posted on : Feb 04 2007
Posted under Personal, Random Thoughts, Software Development, blogging, excuses, php, web stuff |