Automated Website Backup to Amazon S3

After reading Lifehacker’s article linking to Gina Trapani’s article about automatic website backups, I decided it would be a good idea to implement this for my own websites. Gina’s solution is great for one website, but I have multiple websites under one user. I am definitely not a bash-fu master by any stretch of the imagination, so the best I could have done with bash would have been to copy Gina’s script and tweak it to fit my needs. Instead, I decided to write my backup script in PHP 5.3 (the version is important!) using version 1.5.3 of the Amazon SDK for PHP. This gave me the ability to index an array by a string if I so chose and, in general, felt like a more comfortable environment for me to work in.

The first few requirements from Gina’s solution are exactly mine, so I quote them here:

  • You’re running a LAMP-based web site (Linux, Apache, MySQL and PHP/Perl/Python).
  • You have command line access to your web server via SSH.
  • You know how to make new folders and chmod permissions on files.
  • You’re comfortable with running scripts at the command line on your server and setting up cron jobs.
  • You know where all your web server’s files are stored, what databases you need to back up, what username and password you use to log into MySQL.

And the last one is the important change that I made to these requirements. My script will upload the backup file to Amazon’s S3 cloud storage instead of using rsync and ssh to upload it to another server.

  • You have an Amazon account and have activated S3. You also need to have a bucket set up to receive your backup files. Be aware that using this script will cost you money because it uses Amazon’s services, which are billed per GB-month of storage. Please review the pricing for S3 so you aren’t taken by surprise.

The Config Section


#!/usr/local/bin/php-5.3
<?php
// the first line above should be changed to the path of the php 5.3
// executable on your system

error_reporting(E_ALL);

################################################################################
### Config Section
################################################################################
$config = array(
    'user'               => 'user',
    'path_to_sites'      => '/path/to/sites',
    'local_backup_days'  => 5,
    'home_dir'           => '/path/to/home/directory',
    's3_key'             => 'OMGTHISISMYKEY',
    's3_secret'          => 'PLEASEDONTSHARETHISSECRETKEYWITHANYONE',
    'bucket'             => 'mr-bucket-rules',
    'chunk_size_in_MB'   => 10,
    'remote_backup_days' => 10
);

$sites = array(
    'example.com' => array(
        'has_db'  => false),
    'blog.example.com' => array(
        'has_db'  => true,
        'db_host' => 'mysql.example.com',
        'db_name' => 'my_blog_db',
        'db_user' => 'bloguser',
        'db_pass' => 'correct horse battery staple')
    );

The first line tells the shell to run this file using the php-5.3 executable at /usr/local/bin/php-5.3. Change it to whatever the path of the PHP executable is on your system, but remember that 5.3 is needed for the Amazon SDK to do its thing later on. This hash-bang line is needed if you want to just type ./backup_and_upload_to_s3.php on the command line (or the path without ./ in your crontab) to run this file. For that to work the file must also be executable, so running chmod +x backup_and_upload_to_s3.php is necessary as well. You could also skip these two steps and just type php-5.3 backup_and_upload_to_s3.php.

Next is the $config array for all the odds and ends that were specific to my setup.

  • user is the name of the user running this script. It is used for naming the files as backup_user_...
  • path_to_sites is the directory in which all of the website directories are placed.
  • local_backup_days is how long to keep backups on the local system. Backups older than this are deleted.
  • home_dir is used for Amazon’s SDK. It requires a HOME environment variable to be in place to work.
  • s3_key is the API key from Amazon.
  • s3_secret is the API shared secret you also get from Amazon.
  • bucket is the name of the Amazon bucket to upload to.
  • chunk_size_in_MB is the size of each chunk uploaded as a multipart upload. This allows the script to cancel an upload if an error occurs in the middle. I found 10 MB to be a good number.
  • remote_backup_days is the number of days to keep remote backups on Amazon. Older backups are deleted.

The $sites array holds all of the information about the websites that you want to back up. There are several assumptions made about these websites:

  1. Each website resides in its own directory named the same as the website. The string put here will be used as part of the path name after the path_to_sites config variable.
  2. Each website has a separate database. The database information is per website, so if you use one database with table prefixes for two sites, putting the same information in both entries will result in a second copy of the whole database.

If has_db is false, the rest of the information is not needed, so I left it out for sites that do not have a database. You can put as many different sites as you want in this array and all of them will be archived. I have about 12 sites configured, some with a ton of data and some with very little, and all of them are saved.
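To make the role of the two arrays concrete, here is a rough sketch, in Python rather than PHP, of how a per-site configuration like the one above could be turned into a list of dump and archive commands. This is not the code from the repository: the function name plan_backup, the sites/ and db/ subdirectories, and the output file names are all made up for illustration. Only standard tar, mysqldump, and gzip options are used.

```python
import shlex

# Python stand-in for the PHP $config / $sites arrays (placeholder values).
config = {"path_to_sites": "/path/to/sites"}
sites = {
    "example.com": {"has_db": False},
    "blog.example.com": {
        "has_db": True,
        "db_host": "mysql.example.com",
        "db_name": "my_blog_db",
        "db_user": "bloguser",
        "db_pass": "correct horse battery staple",
    },
}


def plan_backup(config, sites, work_dir):
    """Return the shell commands a backup run might issue (hypothetical layout)."""
    commands = []
    for name, site in sites.items():
        # Assumption 1 from the text: the directory is named after the site.
        site_dir = f"{config['path_to_sites']}/{name}"
        archive = f"{work_dir}/sites/{name}.tar.gz"
        commands.append(f"tar -czf {shlex.quote(archive)} {shlex.quote(site_dir)}")
        # Database settings are only read when has_db is true.
        if site.get("has_db"):
            dump = f"{work_dir}/db/{site['db_name']}.sql.gz"
            commands.append(
                "mysqldump"
                f" --host={shlex.quote(site['db_host'])}"
                f" --user={shlex.quote(site['db_user'])}"
                f" --password={shlex.quote(site['db_pass'])}"
                f" {shlex.quote(site['db_name'])}"
                f" | gzip > {shlex.quote(dump)}"
            )
    return commands


for command in plan_backup(config, sites, "/tmp/run"):
    print(command)
```

Note that two sites pointing at the same database would produce two identical dumps here, which is exactly the duplication the second assumption warns about. Passing the password on the command line is kept only for brevity in this sketch.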

The Backup Process

The script will make a backup of all necessary materials to get your site up and running again after a catastrophic event (or a host move, which can be a catastrophe in and of itself). Again, Gina says it best:

In order to back up your web site, your script has to back up two things: all the files that make up the site, and all the data in your database. In this scheme you’re not backing up the HTML pages that your PHP or PERL scripts generate; you’re backing up the PHP or PERL source code itself, which accesses the data in your database. This way if your site blows up, you can restore it on a new host and everything will work the way it does now.

Local Backup

At the end of this portion, there will be one big backup_username_date.bak.tar.gz on the local system that contains all the data for all the configured websites for that user. The script here is rather long, so it would be best to head over to the GitHub repo I have set up with the code in it. You could even fork it and improve upon it. If you do, I would appreciate a comment describing what you improved.

The script first creates a directory with the date and time in the name for the backup that is running. This will be the base temporary folder. All of the MySQL databases that are configured as part of a site will be dumped into this folder as gzip files. All of the websites will be tarred and gzipped as well, in a separate directory. After the two dumping / compressing phases, the whole folder is put into another tar archive and gzipped for good measure. The temporary folder is then deleted. This all happens within the directory in which the script resides.

The last step in the local backup portion is to delete older backups. The script is set up to hold backups for the configured number of days, the default being 5. After this limit, the backups are simply deleted. The script will output all of the information regarding what databases and directories are being backed up and which backups are being deleted.
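The retention rule described above boils down to a date comparison. A minimal sketch in Python, assuming the caller already has a list of (file name, creation time) pairs; the function name backups_to_delete and the sample file names are hypothetical, and the real script may decide age differently (for example from the date in the file name):

```python
from datetime import datetime, timedelta


def backups_to_delete(backups, keep_days, now):
    """Return the names of backups older than keep_days.

    backups: iterable of (file_name, created_at) pairs.
    A backup exactly keep_days old is still kept.
    """
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, created in backups if created < cutoff)


now = datetime(2012, 1, 10)
local = [
    ("backup_user_a.bak.tar.gz", datetime(2012, 1, 1)),
    ("backup_user_b.bak.tar.gz", datetime(2012, 1, 5)),
    ("backup_user_c.bak.tar.gz", datetime(2012, 1, 9)),
]
print(backups_to_delete(local, keep_days=5, now=now))
```

The same function would serve for the remote side by passing remote_backup_days and the object listing from the bucket instead.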

Remote Backup

After the local file has been created, it is uploaded to Amazon’s S3 service using the configuration values for the bucket, key, and secret key. The file is uploaded in chunks of the size that the user configures. The default is 10 MB, which I found to be a good balance between speed and quick failure. The chunks are uploaded one by one to Amazon, and once they are all finished, the upload is completed. Each chunk is verified so that network failures are found out quickly. I personally also like to have feedback during a long-running process, so chunks are good for me.
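The chunking itself is simple arithmetic. The sketch below, in Python, only computes which byte ranges each part of a multipart upload would cover; it does not talk to S3, and the name chunk_ranges is made up for illustration. One real constraint worth keeping in mind when choosing chunk_size_in_MB: S3 requires every part except the last to be at least 5 MB.

```python
def chunk_ranges(file_size, chunk_size_in_mb):
    """Yield (part_number, offset, length) for each chunk of a file.

    Part numbers start at 1; the final chunk may be shorter than the rest.
    """
    chunk = chunk_size_in_mb * 1024 * 1024
    if chunk <= 0:
        raise ValueError("chunk size must be positive")
    part = 1
    offset = 0
    while offset < file_size:
        length = min(chunk, file_size - offset)
        yield part, offset, length
        part += 1
        offset += length


# A 25 MB archive with the default 10 MB chunk size gives three parts.
for part, offset, length in chunk_ranges(25 * 1024 * 1024, 10):
    print(part, offset, length)
```

Because each part is a separate request, a failure shows up after at most one chunk's worth of transfer instead of at the end of the whole file.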

After uploading the most recent backup, the archives older than the configured number of days are deleted from Amazon’s servers. The number of days can be configured, so please make sure you pay attention to your budget when you select anything large. Each upload will most likely take a similar amount of space as the one before.
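Since every upload is a full copy, the steady-state storage on S3 is roughly the backup size times the number of backups retained. A back-of-the-envelope sketch in Python; the function names are hypothetical and the price passed in is a placeholder, not an actual S3 rate, so look up the current price before relying on the result:

```python
def steady_state_storage_gb(backup_size_gb, backups_per_day, remote_backup_days):
    """Approximate GB held on S3 once the retention window has filled up."""
    return backup_size_gb * backups_per_day * remote_backup_days


def monthly_storage_cost(storage_gb, price_per_gb_month):
    """Storage cost per month for a given price (request and transfer fees ignored)."""
    return storage_gb * price_per_gb_month


# Example: a 2 GB backup once a day, kept for 10 days.
stored = steady_state_storage_gb(2, 1, 10)
print(stored, "GB retained")
# 0.5 is a made-up placeholder price per GB-month.
print(monthly_storage_cost(stored, 0.5))
```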

Automation

To automate this script, you need to add an entry to your crontab. To do this, type

crontab -e

into your console to start the crontab editing application using the default editor. Once this is open, you need to add the script into the crontab using standard crontab syntax. The syntax is as follows:

* * * * * command to be executed
- - - - -
| | | | |
| | | | +----- day of week (0 - 6) (Sunday=0)
| | | +------- month (1 - 12)
| | +--------- day of month (1 - 31)
| +----------- hour (0 - 23)
+------------- min (0 - 59)

* in the value field above means all legal values for that column.
The value column can have a * or a list of elements separated by commas. An element is either a number in the ranges shown above or two numbers in the range separated by a hyphen (meaning an inclusive range).

(Borrowed from http://www.adminschoice.com/crontab-quick-reference. Where would I be without Google?)
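To check your reading of a crontab line before trusting it with your backups, the rules quoted above can be turned into a few lines of Python. This sketch handles only what the quoted reference describes (a *, single numbers, hyphenated ranges, and comma-separated lists); real cron implementations accept more, such as step values and names. The function names are made up for illustration.

```python
# (field name, lowest legal value, highest legal value), in crontab order
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def parse_field(field, low, high):
    """Expand one crontab field into the set of values it matches."""
    values = set()
    for element in field.split(","):
        if element == "*":
            values.update(range(low, high + 1))
        elif "-" in element:
            start, end = (int(part) for part in element.split("-"))
            values.update(range(start, end + 1))  # inclusive range
        else:
            values.add(int(element))
    if not values or min(values) < low or max(values) > high:
        raise ValueError(f"{field!r} is outside {low}-{high}")
    return values


def parse_entry(line):
    """Split a crontab line into a schedule dict and the command to run."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields and a command")
    schedule = {
        name: parse_field(text, low, high)
        for (name, low, high), text in zip(FIELDS, parts)
    }
    return schedule, parts[5]


schedule, command = parse_entry("30 2 * * 1-5 /path/to/some/script")
print(sorted(schedule["day_of_week"]), command)
```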

For my websites, I decided it would be good to have a daily backup performed at midnight. Thus my crontab is as follows:

0 0 * * * /path/to/backup/backup_and_upload_to_s3.php

This crontab entry assumes that I have the hash-bang line at the beginning of the file and have run chmod to make the file executable. Otherwise, the entry will look like

0 0 * * * php-5.3 /path/to/backup/backup_and_upload_to_s3.php

You can run the backup script as often as you’d like, but you should keep in mind that these are not incremental backups. Each backup is a full, independent copy of everything all of the configured sites contained at that point in time.

Conclusion

Website backups are often overlooked by people on shared hosting. Some people just assume the web host will have a backup and others simply don’t care. Once this solution is set up, you can cruise along without worrying about your websites at all. Every day a new backup is made and uploaded to a third party whose job is to provide reliable storage. If anything should happen to your web host, you can easily get back all of your information and be up and running within a couple of hours. Like I said above, if you have the itch to improve upon this script, please do so over on GitHub. If you use it, please drop me a line in the comments. Above all, be happy that you no longer have to worry about your websites now that you have an automated backup in place.

Function Calls, Word Alignment, and Interrupts on a TI DSP

Background

I was taking the microprocessors class last fall and started writing this post… now here it is! I had a couple of labs that were giving me problems. The particular DSP I was using for my lab (the Texas Instruments TMS320F28335) has a word size of 16 bits. When writing a ton of assembly code for a couple of the labs, I came across an error that cropped up concerning word alignment that was particularly obvious when using an interrupt routine set to go off every millisecond or so.

The Problem With Stack Misalignment (No Interrupts)

When a stack becomes misaligned, it creates problems when pushing or popping 32 bit values. Since the DSP I was using addresses memory by words, a 32 bit push will write to two places in memory. When writing to an address like 0x1000, the write will place the 32 bit value at addresses 0x1000 and 0x1001. Order is unimportant because I use a 32 bit pop as the reverse. Here is the meat of the problem: if the stack pointer is misaligned, that is, it points to an odd address in memory, the 32 bit push will overwrite the previously placed value. This is very bad when attempting to return one word from a function call or when an interrupt suspends the main “thread” (for lack of a better term) in between two 16 bit pushes. Note that this may not be the exact way the stack works on the DSP, but conceptually it is the same.

The first time I encountered this problem was when I had a keypad scanning function that returned a single word, the key code. When calling the function (at least in the initial lab) there were no problems. The return address of the function call was placed on the stack and the function ran. It is important to note that the address is 32 bits, or 2 words, long. The function would then scan the keypad for any depressed buttons and return a key code regardless. This key code was 16 bits, or 1 word, long. I removed the return address from the stack because it is required to be at the top of the stack when the RET instruction is executed; that is where the DSP takes the address from. In a mirror fashion, a 2 word, 32 bit read from memory at an odd location will read that location and the one below, not the ones expected. If a read is performed at 0x1001, I would expect 0x1001 and 0x1002 to be given back. Not so. The DSP returns 0x1000 and 0x1001. Effectively, a 32 bit read from 0x1000 is the same as a 32 bit read from 0x1001. The picture below illustrates what happens when this problem occurs.

 

As the picture shows, this situation eventually leads to the function call returning to an unknown location, which is generally random and contains uninitialized memory. This causes all sorts of fun things to happen, like flashing LEDs and a general dysfunctional state in which the board must be reset and reprogrammed.

The Solution (No Interrupts)

The solution to this problem was simple, in retrospect (hindsight is 20/20, after all). I simply pushed a second word with a dummy value, in this case 0, so the stack would stay aligned. This required two push instructions in a row but it made the program work. The stack pointer always gets incremented or decremented by 2 memory locations, ensuring that it stays on an even value. The graphic below shows the execution of the program after this fix is applied.
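Without the pictures, the overwrite and the dummy-word fix can still be followed with a small conceptual model, written here in Python. It is a sketch of the behavior described in this post, not of the exact hardware: it assumes the stack grows toward higher addresses, that the stack pointer points at the next free word, and that a 32 bit access ignores the lowest address bit. The class name WordStack and the sample values are made up for illustration.

```python
class WordStack:
    """Toy model of a word-addressed stack with aligned 32 bit accesses."""

    def __init__(self, sp=0x1000):
        self.mem = {}  # address -> 16 bit word; missing means uninitialized
        self.sp = sp   # next free word

    def push16(self, value):
        self.mem[self.sp] = value & 0xFFFF
        self.sp += 1

    def push32(self, value):
        base = self.sp & ~1  # the odd address bit is ignored
        self.mem[base] = value & 0xFFFF
        self.mem[base + 1] = (value >> 16) & 0xFFFF
        self.sp += 2


KEY_CODE = 0xABCD          # one word result
RETURN_ADDR = 0x00123456   # two word address

# Misaligned: one 16 bit push leaves SP odd, so the 32 bit push lands on top of it.
bad = WordStack()
bad.push16(KEY_CODE)
bad.push32(RETURN_ADDR)
print(hex(bad.mem[0x1000]))  # the key code has been replaced

# Fix: a dummy word keeps SP even before the 32 bit push.
good = WordStack()
good.push16(KEY_CODE)
good.push16(0)
good.push32(RETURN_ADDR)
print(hex(good.mem[0x1000]))  # the key code survives
```

In the misaligned case the stack pointer also ends up one word past the data that was actually written, which is why a later 32 bit pop can hand back garbage as a return address.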

The Problem With Stack Misalignment (With Interrupts)

The previous solution is just fine considering the circumstances it was in. There are no asynchronous tasks that could interrupt the execution of the main “thread.” This is unrealistic, however, because real systems will have interrupts, arriving at regular or irregular intervals, that can suspend code execution at any time. These essentially execute as a function call, pushing the return address on the stack (as a 2 word value) and going to the interrupt handler body. These pseudo-function-calls must be assumed to occur between any two instructions in the main program. This causes the same stack misalignment error as before in some situations.

The previous solution would place two words on to the stack, one word at a time. This means that two separate instructions were required, providing a chink in the armor through which an interrupt can wreak havoc. The program may even execute just fine for a while, but eventually an interrupt will fire between the two push instructions, creating the same situation as before. The graphic representation of this would be the same as above.

The Solution (With Interrupts)

The solution that I decided to use was to pass everything through a 32 bit register on its way to the stack. I would place a 16 bit dummy value side by side with the 16 bit value to be pushed on the stack and use one push instruction. The DSP may need to actually make two trips to memory, but at least it is guaranteed not to be interrupted between writing the two words. This solution would graphically look the same as the solution above. After fixing this, the program proceeded to run just fine, even with interrupts.
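The packing step is just a shift and a mask. A tiny sketch in Python, assuming the real value goes in the low word and the dummy in the high word; which half holds which is a choice made here for illustration, and the function names are hypothetical:

```python
def pack_words(value16, dummy16=0):
    """Combine a 16 bit value and a 16 bit dummy into one 32 bit value."""
    return ((dummy16 & 0xFFFF) << 16) | (value16 & 0xFFFF)


def unpack_words(value32):
    """Split a 32 bit value back into (low word, high word)."""
    return value32 & 0xFFFF, (value32 >> 16) & 0xFFFF


packed = pack_words(0xABCD)
print(hex(packed))
print([hex(word) for word in unpack_words(packed)])
```

Because the packed value goes onto the stack with a single 32 bit push, there is no point between the two words where an interrupt can find the stack pointer at an odd address.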

Conclusion

When using a DSP that operates on a stack, misalignment can occur. Misalignment places the stack pointer at an odd position in the stack. Upon writing a 32 bit value, or 2 words, to the stack, the data will overwrite the singular 16 bit value in the even position just below the stack pointer and the stack pointer will point to uninitialized data. When the function or interrupt returns, it may jump to uninitialized memory. The solution to both of these problems is to push and pop only 32 bit values on and off the stack, packing and unpacking data as necessary.

Windows 8 Developer Preview VMWare Install

I was very excited when I heard that a Developer Preview of Windows 8 was coming out. I downloaded it (even though I went over my data cap for the month >.<) so I could play with it in a virtual machine. I immediately went to VMWare Workstation 7.14 to attempt installing it, only to run into an error that said:

MONITOR PANIC: vcpu-0:NOT_IMPLEMENTED vmcore/vmm/intr/apic.c:1903

Well, crud. That didn’t work very well. I arrived at the loading screen (with a fancy new spinning wheel thingy) and got a popup error. After a few searches, I discovered (I believe my initial hint was from here) that the Windows 8 Developer Preview simply doesn’t work on VMWare Workstation 7.xx. Thus I acquired VMWare Workstation 8 so I could install the OS and have fun. I have screenshots from now on of what I have done.

First I clicked on the big “Create a New Virtual Machine” button in the Home tab. That brought up the new virtual machine wizard. This is pretty standard, but for completeness I will cover what I did to set up the virtual machine.

VM Setup

The first screen shown asks you which VM type you want to create. I chose a Typical install because that’s what this is. Throughout this process I generally treated Windows 8 as Windows 7.

 

Next was the method of installing the operating system. I had the .iso file on my desktop so I chose that.

 

Next it asked me what kind of OS I wanted. I chose Windows 7 x64 because I figured that was closest by compatibility standards.

 

Next it asks for the product key you want to use. I just left everything at the default values here, including leaving the product key blank. This gets me into trouble later.

 

Then I got this error. Just click yes and keep going. I will end up dealing with this later.

 

I named my virtual machine “Windows 8 x64 Developer Preview” and left it in the default location. You can change this now if you want to.

 

I gave my Windows 8 install 40 gigabytes of room to play with. I think it requires 16 to install, so I left plenty of room. I chose to split the disk into multiple files in case I want to move the VM around later. This splits the virtual hard disk into 2 GB files that grow with the VM. The virtual disk isn’t large, so it should be fine.

 

We’re ready to create! Wait, no. I want to customize my hardware a bit. The default amount of RAM is 1 GB (yuck) and the default processor is 1 physical with 1 core. I want to devote a bit more to the machine. Click on the “Customize Hardware…” button to change settings.

 

I increased the RAM to 4 GB (I have a few gigs to spare) and the processing power to 1 physical with 2 cores. I also deleted the floppy drive (who needs one of those?).

 

OK now we’re ready to boot! Woohoo it’s booting! And it has a new loading spinner.

 

First thing I encountered was another error on boot (grr….) that said:

Windows cannot read the <ProductKey> setting from the unattend[sic] answer file.

Uh oh. It’s broken already. It took me a while (and some Google searching) before I realized that the floppy drive had somehow come back. Eventually I noticed that the floppy drive contained an autoconfig file that Windows tries to read in order to do an unattended install. This file is either incompatible or just broken, and it holds the auto-install settings that don’t include the product key. Here you must go back into the machine settings and REMOVE the floppy drive or it won’t work. Even if you specify a product key (any key…) earlier, it fails to work.

The Install

This part is (most likely) the same for VMs or physical machines. Once you remove the Lazarus-like floppy drive (on the VM… you still have a real floppy drive?), you are greeted with an install screen that looks familiar. There’s just a couple of language and keyboard settings here.

 

This is pretty simple. Just press “Install now.” Currently there’s no system to repair.

 

The only way to go on is to agree to anything and everything they want. Just check the box and hit “Next.”

 

As before, there’s no current system so hit the “Custom (advanced)” option to install fresh.

 

Since I used only one drive on my system, I only have one choice. If you were fancy and have multiple drives at this point, choose the one you want to put the OS on.

 

Now it installs… This took absolutely forever to expand the files. In here it will reboot a couple of times. Just let it run and do its thing. I watched an episode of Hak5.

 

Once it’s done installing the files on to your system, it will start up into this screen and do some configuring for the hardware. This didn’t take very long.

 

You must agree to their license terms (again) to start the setup process. I found it a bit odd that they take such an informal tone on this screen.

 

Setup

The setup experience for Windows 8 was very different from what I am used to. There are very few screens and it focuses mainly on the connectivity and experience rather than technical details.

There is an initial screen that says “Let’s go through a few basics” just before this. It went by a bit fast so I didn’t have time to snip the window. Here the computer is given a name. I very creatively named it and moved on.

 

Microsoft has made it dead simple to get through the settings. They just set them all for you! I didn’t have any problems with their defaults so I just chose “Use express settings” and moved on. You may want to change some of the stuff.

 

This is where things start getting very different. They want you to link your OS with your Windows Live ID. I already had one, but if you don’t, you can create one here and use that. I entered my ID and moved on.

 

Since I already had an account, I was just taken to the login screen. I put in my credentials and continued. If you missed the link on the previous screen, you can create a Windows Live ID here too.

 

Here I set up a security question. The questions seemed more secure than the standard “Mother’s maiden name” type.

 

After the rest of the steps it sat on this screen for awhile. I also received an email to my account saying something about a new computer being authorized.

 

Afterwards it automagically logged me on (another screen too fast for me to snip). I was finally greeted by the brand new Metro UI.

Whew. That was a lot of screenshots. I will be playing with this for awhile and probably post some interesting screenshots. Have fun installing and using Windows 8!

Strange Errors, Makefiles, and Large Matrices

Strange Errors Abound

The first problems I encountered numbered in the hundreds. I was getting errors that all said something like this:

error: expected '=', ',', ';', 'asm' or '__attribute__' before cut...

After some frantic Google searching (yes, it worked this time) I came across a forum thread that contained some lists of error messages that seemed similar to mine. Someone in this thread suggested that the compiler was compiling the CUDA code as C code so the standard library of CUDA functions was not automatically being referenced. The solution to this problem (obviously enough) is to put the file extension “.cu” at the end of any file that contains CUDA library function calls or any CUDA specific syntax like so:

file_with_cuda_stuff.cu

Makefiles for CUDA

When compiling a CUDA program that has multiple files, it is best to create a makefile that can do this work for you instead of having to manually type in every command every time. There is a CUDA makefile format that includes all sorts of options, including:

EXECUTABLE
CUFILES
CUDEPS
CCFILES
LIBS
LDFLAGS
INCLUDES

These are supposedly all you need to create a makefile for a CUDA program, with the line

include /path/to/common.mk

at the end to include all of the standard options that come with the CUDA SDK.

I had a lot of problems using this kind of makefile; the LDFLAGS and LIBS options were failing to pass through and I had absolutely no idea why. What I decided to do next was use a regular makefile, as I had some experience with these while writing C and C++ programs in the past. This did the trick. Without all of the options and complexity (which I’m sure comes in handy to someone at some point) it is a much simpler and cleaner makefile. Example:

# Awesome Makefile
# -o names the output after the target; the recipe line must start with a tab
proj.out : proj.cu
	nvcc proj.cu -o proj.out -I/opt/cuda-sdk/C/common/inc -L/opt/cuda-sdk/lib

Large Matrix Multiplication

Background

I recently had to write a program (read: do a homework assignment) that multiplied two large matrices. One was 8192×32768 and the other was 32768×32768. Both contained 32 bit single precision floating point numbers.

The Basic Idea

When you can’t fit the operand matrices and the result matrix in memory in their entirety, you must split them up in a way that still enables tiling and maximizes occupancy of the multiprocessors inside the chip. The way you can split up the input and output matrices can be seen in this picture. You take a strip out of each matrix that, when multiplied by the other, gives you the section of the result matrix that you want. This division and joining happen on the host computer, not on the device. The device memory can only hold the partitions that you move on to the device at one time. When you need to multiply the next chunk, you overwrite the current strip that is on the device. Each pair of matrix strips is a separate kernel launch, each of which requires moving the operands and result to and from the device.
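Since the picture is not reproduced here, the strip scheme can be sketched on the host side in plain Python with small matrices. This is only an illustration of the partitioning under stated assumptions: row strips are taken from the first matrix and column strips from the second, the inner call to multiply stands in for one kernel launch on the device, and all function names and strip sizes are made up.

```python
def multiply(a, b):
    """Naive product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def column_strip(matrix, start, width):
    """Take `width` columns of `matrix` starting at column `start`."""
    return [row[start:start + width] for row in matrix]


def multiply_in_strips(a, b, strip_rows, strip_cols):
    """Compute a * b one (row strip, column strip) pair at a time."""
    n_rows, n_cols = len(a), len(b[0])
    c = [[0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, strip_rows):
        a_strip = a[r0:r0 + strip_rows]              # would be copied to the device
        for c0 in range(0, n_cols, strip_cols):
            b_strip = column_strip(b, c0, strip_cols)  # overwrites the previous strip
            block = multiply(a_strip, b_strip)        # stand-in for one kernel launch
            for i, block_row in enumerate(block):     # join the block into the result
                c[r0 + i][c0:c0 + len(block_row)] = block_row
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
b = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 0], [4, 1, 0, 1, 2]]
print(multiply_in_strips(a, b, strip_rows=2, strip_cols=2) == multiply(a, b))
```

Each strip pair produces one independent block of the result, so the only data that has to live on the device at once is one row strip, one column strip, and the block they produce.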

Cuda Timing Woes

I have come to realize that programming using Nvidia’s CUDA can be a major PITA when a problem crops up. Generally there are few resources besides the CUDA Programming Guide that are helpful. Most of the answers that turn up in searches are on the forums at Nvidia, and even then many searches turn up unanswered questions.

My particular problem was in the realm of timing the execution of a kernel on the device. In the example code that was provided to me in my class, they used several undocumented functions from the cutil.h (or, in my case, the cutil_inline.h) header. These functions included:

  • cutCreateTimer
  • cutStartTimer
  • cutStopTimer
  • cutGetTimerValue
  • cutDeleteTimer

None of these functions is officially supported by Nvidia. The exact error that I was getting was this:

ld: cannot find -lcutil

Basically, the linker could not find the static cutil library to link against. I used the command line

nvcc example.cu -Iinclude -lcutil -L/opt/cuda-sdk/lib

where the path

/opt/cuda-sdk/lib

contained the file libcutil.a, which is the static library version of cutil. I searched high and low and couldn’t find a libcutil.so (the shared library version) anywhere inside the SDK file structure. As it turns out, it doesn’t exist, at least not in the SDK version that I am using.

The way you are supposed to time something is by using events. I will get to events in a minute, but first I want to describe the journey I took to get there.

Google, thou hast failed me

I spent a total of probably 4 hours of Google searching, interspersed between other classes and homework, which got me basically nowhere. I found various sources all telling me something different. I tried many of these different approaches, including:

  • looking in cutil.h for clues on what the function declarations looked like to see if maybe I could add a function prototype to help out the linker (wrong direction… I think. No effect anyway)
  • adding the phrase 'extern "C"' before the prototype declaration (no effect)
  • decompiling the shared library and recompiling it into a static library (which didn’t work anyway)

The above are just a few of the many different things I tried while searching. Most of the suggestions I tried were from Nvidia forums. In the forums also were about 20 threads that I came across that said to just reference the library using -lcutil and -L/path/to/library, which didn’t work in the first place.

Nvidia to the rescue

In the end I gave up and started looking through the Nvidia CUDA Programming Guide (warning: large PDF. Source page) page by page. I came across an example in the middle (Page 39, Section 3.2.7.6) that focuses on events and how they can be used to time execution. Here is the basic structure for timing something in your program (in C):

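cudaEvent_t start, stop;
float elapsedTime;  // filled in below, in milliseconds

// Create the two events that bracket the timed region
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Record the start event in stream 0
cudaEventRecord(start, 0);

// Whatever needs to be timed goes here

// Record the stop event and wait until it has actually completed
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);

// Release the events once the measurement is done
cudaEventDestroy(start);
cudaEventDestroy(stop);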

So, there you have it

If you are trying to time something using any of the cutXxXxTimer functions, stop using those and start using events. The event functions are officially supported, aren’t any harder to use, and, best of all, they actually link correctly when you go to compile your program.