Welcome to WePay’s engineering blog. Over on the main WePay Blog there are a lot of fun posts about organizing ski trips, deadbeat roommates and other such gems. Here we are not going to talk about any of that. This will be dedicated to (hopefully) interesting solutions to technical problems. The first thing we are going to attack is deploying code, and specifically PHP code, although the tool we wrote could be used to deploy any type of code.
Deploying web apps doesn’t tend to be very difficult, at least not on the surface. You copy your new files to your document root and off you go. However, there are actually a lot of things potentially wrong with that approach.
1. Many ways of copying files are not atomic. That means that you risk a user seeing a partially copied file, or even worse, the partial file may be cached in one of the many caches between your code and the user.
2. File dependencies. Even if you solve #1 and use something like rsync, which creates files atomically, you will typically have interdependencies between files that will break your Web application if the new version of a file ends up including an older version.
3. Opcode caches, in-memory user caches and disk caches may need to be cleared or you end up with weird artifacts from the previous release.
4. If you swap out the underlying files in the middle of a user request, weird things can happen.
5. Restarting the web server is disruptive. In order to be able to deploy frequently, there is a need to be able to safely deploy new code without any downtime.
6. If something goes wrong, you don’t want to leave your system in a broken state, so some sort of rollback/cleanup is necessary.
7. And finally, if you have a team of developers and multiple servers, having a log and notification of deploys tends to be a good idea.
A common solution, employed by tools such as Capistrano, is to copy the new files to a directory and then have the document root be a symbolic link that is simply moved from the old release to the new release when the copy has completed. This nicely solves issues #1 and #2.
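To make the symlink swap concrete, here is a minimal sketch in PHP. It is not taken from Capistrano or from our tool; the helper name `swap_symlink()` and the scratch directory layout are made up for illustration. It assumes a POSIX filesystem, where `rename()` replaces the destination in a single step.

```php
<?php
// Hypothetical helper (not part of any real tool): point $link at $target
// without a window in which $link is missing.
function swap_symlink(string $target, string $link): void
{
    // Create the new link under a temporary name next to the real one...
    $tmp = $link . '.tmp.' . getmypid();
    if (!symlink($target, $tmp)) {
        throw new RuntimeException("could not create $tmp");
    }
    // ...then rename() it over the old link. rename() acts on the link
    // itself, so readers see either the old or the new target.
    if (!rename($tmp, $link)) {
        unlink($tmp);
        throw new RuntimeException("could not replace $link");
    }
}

// Demo in a scratch directory (paths are illustrative only).
$base = sys_get_temp_dir() . '/ploy_demo_' . uniqid();
mkdir("$base/releases/27", 0777, true);
mkdir("$base/releases/28", 0777, true);

swap_symlink("$base/releases/27", "$base/current");  // initial release
swap_symlink("$base/releases/28", "$base/current");  // deploy
$afterDeploy = readlink("$base/current");

swap_symlink("$base/releases/27", "$base/current");  // rollback
$afterRollback = readlink("$base/current");

echo $afterDeploy, "\n", $afterRollback, "\n";
```

Because the old release directory is left in place, rolling back is the same operation pointed at the previous directory.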
In order to solve issues #3 and #4 you have to go a bit beyond a simple symlink swap. You have to understand what sort of caching is going on. For a typical PHP application you are going to have two main caches you need to worry about and they sort of work together. There is the opcode cache and the realpath+stat cache. The opcode cache is a shared memory cache and there is just one of these for all your PHP processes. The realpath+stat cache is per-process. When using an opcode cache like APC the realpath is used to determine the filesystem device+inode for a file and this is used as the cache index.
If you ponder the previous paragraph a bit, you should be able to see a potential problem. If we cache realpath lookups and we then change the docroot symlink, our system is not going to see that symlink change. Even if we have apc.stat enabled, which checks to see if a file has changed, this check is going to continue to go to the previous version which hasn’t changed, so we will continue to serve up the old version. While this may seem like a problem, consider issue #4 above. By continuing to serve up the old version, we have not messed up any outstanding requests and we are in control over when requests will see the new release.
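The realpath caching described above can be poked at from a small script. This is only a sketch: the directory names are invented, and whether the middle lookup is actually stale depends on your realpath cache settings, so the script prints it rather than relying on it. What it does show reliably is that `clearstatcache(true)` makes the process see the new symlink target.

```php
<?php
// Scratch layout: two fake release directories and a "current" symlink.
$base = sys_get_temp_dir() . '/ploy_rp_' . uniqid();
mkdir("$base/27", 0777, true);
mkdir("$base/28", 0777, true);
symlink("$base/27", "$base/current");

// First lookup resolves to release 27 and may populate the realpath cache.
$before = basename(realpath("$base/current"));

// Repoint the link the way a deploy would.
symlink("$base/28", "$base/current.tmp");
rename("$base/current.tmp", "$base/current");

// This lookup may still be answered from the per-process cache.
$maybeStale = basename(realpath("$base/current"));

// Passing true also clears the realpath cache, not just the stat cache.
clearstatcache(true);
$after = basename(realpath("$base/current"));

echo "before=$before maybeStale=$maybeStale after=$after\n";
```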
What we need is a deploy mechanism that is smart enough to do the right thing. This could be Capistrano with some scripts. This is what we used at WePay for quite a while and it works ok, but most of our code is in PHP and it made sense to have our deploy system better integrated so I stole the parts of Capistrano I liked, dumped the rest and wrote a PHP-based deploy tool that we now use. I called it Ploy and the Web interface came to be known as WePloy. Dumb name, I know, but as they say, naming things, cache invalidation and off-by-one errors are the two hardest things in computer science.
An actual deploy looks like this. I blurred out (badly) the verbose output of the deploy, but it gives the general idea of what people see when they do a deploy at WePay:
The code is composed of three main parts: a configuration file in .ini format, a main PHP script that is directly callable from the command line (primarily for scheduled cron deploys), and a Web interface.
The configuration file should be mostly self-explanatory and looks like this:
; Ploy Configuration File
; Global configuration section that can be overridden inside each
; target section if necessary with #{var} replacement.
deploy_user = deploy
public_key_file = "~/.ssh/deploy_id_rsa.pub"
private_key_file = "~/.ssh/deploy_id_rsa"
scm = svn
scm.user = deploy
scm.passwd = SuperSecretPassword
repository = https://svn.example.com/wepay#{revision}
deploy_to = /var/www/#{application}
[target stage]
application = stage.wepay.com
revision = /branches/release/2010_12_1
hosts[] = 10.2.1.20
hosts[] = 10.2.1.21
[target dev]
application = dev.wepay.com
revision = /trunk
hosts[] = 10.2.1.22
[target rel]
application = wepay.com
revision = /branches/release/2010_12_1
hosts[] = 11.22.33.44
hosts[] = 11.22.33.45
hosts[] = 11.22.33.46
hosts[] = 11.22.33.47
Note the use of a public/private key pair here. We set up a deploy user and put the public key on each deploy target machine. The command-line version of the tool supports password-protected keys, but for unattended cron job deploys and web deploys you are better off without a password on your deploy key. Individual users can deploy from the command line from their own accounts if they have the deploy private key as well.
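To illustrate how the global settings, a target section and the #{var} replacement fit together, here is a rough sketch of resolving one target. It is not the tool's actual parser; the function name `resolve_target()` is hypothetical, the config is a trimmed copy of the one above, and the merge rule (target values override globals, then #{var} is expanded once) is an assumption based on the comments in the config file.

```php
<?php
$ini = <<<'INI'
; Trimmed copy of the configuration shown above
deploy_user = deploy
repository = https://svn.example.com/wepay#{revision}
deploy_to = /var/www/#{application}

[target dev]
application = dev.wepay.com
revision = /trunk
hosts[] = 10.2.1.22
INI;

// Hypothetical helper: merge globals with one [target ...] section and
// expand #{var} placeholders from the merged values.
function resolve_target(string $ini, string $target): array
{
    // INI_SCANNER_RAW leaves values such as "#{revision}" untouched.
    $parsed = parse_ini_string($ini, true, INI_SCANNER_RAW);
    $section = "target $target";
    if ($parsed === false || !isset($parsed[$section])) {
        throw new InvalidArgumentException("unknown target: $target");
    }
    // Everything that is not a [target ...] section is a global setting.
    $conf = [];
    foreach ($parsed as $key => $value) {
        if (strncmp((string) $key, 'target ', 7) !== 0) {
            $conf[$key] = $value;
        }
    }
    // Target values override globals.
    $conf = array_merge($conf, $parsed[$section]);
    // Expand #{var}; unknown or non-string variables are left as-is.
    foreach ($conf as $key => $value) {
        if (!is_string($value)) {
            continue;
        }
        $conf[$key] = preg_replace_callback('/#\{(\w+)\}/', function ($m) use ($conf) {
            $name = $m[1];
            return isset($conf[$name]) && is_string($conf[$name]) ? $conf[$name] : $m[0];
        }, $value);
    }
    return $conf;
}

$dev = resolve_target($ini, 'dev');
echo $dev['repository'], "\n", $dev['deploy_to'], "\n";
```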
The main script uses pecl/ssh2 to scp/sftp files to the remote hosts and to execute remote commands. The deploy() method in the Ploy class does most of the work. You can read the code for the details, but in broad strokes it checks out the given branch from source control, cleans it up, creates a tarball and sends that to each host and verifies a checksum to make sure it was not corrupted somehow. It then creates the new release directory and untars the files into it.
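The checksum step is easy to picture in isolation. The sketch below is an assumption-laden stand-in: a local `copy()` plays the role of the scp/sftp transfer, SHA-256 is picked arbitrarily (the real tool may use a different algorithm), and `checksum_matches()` is a made-up name. In a real deploy the second checksum would be computed on the remote host.

```php
<?php
// Hypothetical check: does the checksum reported for the transferred copy
// match the tarball we built locally?
function checksum_matches(string $localFile, string $remoteChecksum): bool
{
    $local = hash_file('sha256', $localFile);
    return $local !== false && hash_equals($local, $remoteChecksum);
}

// Fake "tarball" with some payload.
$tarball = tempnam(sys_get_temp_dir(), 'ploy');
file_put_contents($tarball, str_repeat("release payload\n", 1000));

// copy() stands in for the transfer to the remote host.
$copy = tempnam(sys_get_temp_dir(), 'ploy');
copy($tarball, $copy);
$ok = checksum_matches($tarball, hash_file('sha256', $copy));

// Simulate corruption in transit by appending a stray byte.
file_put_contents($copy, 'x', FILE_APPEND);
$bad = checksum_matches($tarball, hash_file('sha256', $copy));

var_dump($ok, $bad);
```

Only untarring into the new release directory after the check passes keeps a corrupted transfer from ever becoming a release.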
At this point, in order to address issues #3, #4 and #5 we have to do some gymnastics. Remember the quip about cache invalidation being hard? Well, it is. Caches are great for speeding things up and we should all cache heavily. But it does make things like code deploys a bit trickier. Here is what we have:

This shows the serving stack on each web server behind the load balancer. nginx dispatches FastCGI requests to our PHP-FPM (FastCGI Process Manager) processes. Each of these processes has a local cache where realpaths and stats are cached. Then there is the shared APC cache, which caches opcode arrays (the compiled PHP scripts) and user cache entries, which are application-level data. When we put our new revision of the code in place, say revision 28 replacing revision 27 in this case, and move the docroot symlink to point to revision 28, we still have the caches pointing to revision 27. So we need to clear the shared cache, which isn’t very hard, and we also need to clear the per-process caches, which is a little bit harder.
The way I chose to do this is to store the current revision both in the deployed script files and in shared memory. There is a little deploy script that is run on the server at deploy time that substitutes the current revision into a common file included on all requests. I also run curl to hit a script that sets the revision in shared memory. Then at the start of the next request in each php-fpm process it will see that the hardcoded revision (remember we are still serving up version 27 files so we have version 27 hardcoded) is different from the one in shared memory and it will flush its realpath+stat cache and start serving revision 28 files on this request instead.
The code to do this is simple enough. At the top of your front-controller, if you have one, put something like this:
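// Set on deploy by deploy script
define('DEPLOY_VERSION','nOtSeT');

// Shared check: if the revision stored in shared memory differs from the
// one hardcoded in this file, flush the opcode cache and the user cache.
if(isset($_SERVER['SERVER_NAME'])) {
    $key = $_SERVER['SERVER_NAME'] . '_deploy_version';
    if (($rev=apc_fetch($key)) != DEPLOY_VERSION) {
        apc_clear_cache();
        apc_clear_cache('user');
        if($rev < DEPLOY_VERSION)
            apc_store($key, DEPLOY_VERSION);
    }
}

// Per-process check: each php-fpm process records the revision it last saw
// under a pid-specific key and flushes its own realpath+stat cache on change.
$key = 'php.pid_'.getmypid();
if (($rev=apc_fetch($key)) != DEPLOY_VERSION) {
    if($rev < DEPLOY_VERSION)
        apc_store($key, DEPLOY_VERSION);
    clearstatcache(true);
}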
And I have a setrev.php script in the docroot that sets the revision in shared memory which is hit during the deploy after moving the symlink to the new release. Read through the deploy method in ploy.php to see the exact sequence of steps.
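The deploy-time substitution of the revision into the common include file, mentioned earlier, might be sketched like this. This is not the actual deploy script: `stamp_revision()` is a hypothetical name, and it assumes the placeholder line appears in the file exactly as in the snippet above.

```php
<?php
// Hypothetical deploy-time step: replace the 'nOtSeT' placeholder in the
// common include file with the revision being deployed.
function stamp_revision(string $file, string $revision): bool
{
    $src = file_get_contents($file);
    $needle = "define('DEPLOY_VERSION','nOtSeT');";
    if ($src === false || strpos($src, $needle) === false) {
        return false;   // placeholder missing: refuse to guess
    }
    // var_export() produces a safely quoted PHP string literal.
    $line = "define('DEPLOY_VERSION'," . var_export($revision, true) . ");";
    return file_put_contents($file, str_replace($needle, $line, $src)) !== false;
}

// Demo against a throwaway copy of the include file.
$file = tempnam(sys_get_temp_dir(), 'ploy');
file_put_contents(
    $file,
    "<?php\n// Set on deploy by deploy script\ndefine('DEPLOY_VERSION','nOtSeT');\n"
);

$first = stamp_revision($file, '28');
$second = stamp_revision($file, '29');   // placeholder is gone, so this fails
echo file_get_contents($file);
```

Stamping the file inside the new release directory before the symlink moves means the old release keeps its own hardcoded revision, which is what lets the running processes notice the change.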
I have put all the code on GitHub. Feel free to grab it and use it for your own deploys. Note that you can’t just drop it in. You are going to need to go through the deploy method in the Ploy class to make sure it performs all the steps you need. Also note that on major deploys where everything changes, including the DB schemas, you are likely going to need to take the site offline to do the deploy, since you aren’t going to be able to run the old revision next to the new revision during the upgrade the way this approach does.
In general, having a solid one-button deploy mechanism is essential for any fast-moving development team. If every production push is an adventure the extra friction will affect the product. You want production pushes to be something anyone in the company can do safely at any time so it is worth spending a bit of time thinking about how to do it well.