Ubuntu is knee-deep in excellent software that you can use to do various types of backups, ranging from command-line utilities that create backup files you can store on removable media or simply move to other systems, to complete networked backup applications. Below we discuss BackupPC, an excellent open source backup application for doing regular, networked backups of multiple systems to a central backup server.
Backups
Backups are spare copies of the files and directories found on a computer system, written to removable media that is preferably kept somewhere other than beside your computer. Doing backups is a time-consuming but absolutely mandatory task if you place any value at all on the files, e-mail, and other data that you have stored on your computer.
Backups are exactly like auto insurance policies. You rarely need them, and you hope that you never do. They are usually just time-consuming and expensive (your time has some value, right?). However, one rainy night when you discover that you’ve just accidentally deleted your home directory or when a user comes to you and says that they’ve accidentally deleted your company’s personnel records, payroll data, or the source code for your company’s products, you’d better have a good answer. The right answer, of course, is, “I’ll restore that from backups immediately.”
It’s hard to think of anything that so thoroughly combines the mundane and mandatory as backing up your data. It’s boring. It’s time-consuming. And, of course, it’s critical. This text is oriented toward you as a systems administrator, regardless of how many systems you’re responsible for. As system administrators, our responsibility is to provide secure, well-maintained, and rigorously backed up systems for the benefit of the users of the computer systems we’re responsible for. You should feel even more responsible if you’re only supporting a user community of one (yourself), because you won’t even have anyone else to blame if a catastrophe occurs. Even if you’re a community of one, I’m sure that you feel that whatever you do on your computer system is important. Backups keep it safe.
Here I explain a variety of solutions for creating backups on Ubuntu Linux systems, ranging from command-line solutions to some impressive graphical tools. I also cover the flip side of making backups: restoring files from them, which is what makes backups worthwhile in the first place.
Before discussing the different tools used to actually create backups, it’s useful to review some of the basic issues and approaches in backing up any kind of computer system. Though you may already be totally familiar with these concepts and occasionally mumble backup and restore commands in your sleep, a clear picture of what you’re trying to accomplish in doing backups and of how backup systems are usually designed provides a firm foundation for discussing the specific tools covered later. I discuss many topics that are overkill for a home computing environment but are mandatory in multisystem business or academic environments.
Why Do Backups?
In an ideal world, backups would not be necessary: computer hardware and software would always work correctly, and users would never make mistakes. Unfortunately, in the real world, things are different. Computer system administrators and other members of an MIS/IT department do backups for many reasons, all of which help protect you against the following types of problems:
- Natural disasters such as fires, floods, and earthquakes that destroy computer systems
- Hardware failures in disk drives or other storage media that make it impossible to access the data that they contain
- System software problems such as filesystem corruption that might cause files and directories to be deleted during filesystem consistency checks
- Software failures such as programs that crash and corrupt or delete the files that you’re working on
- Pilot error, AKA the accidental deletion of important files and directories
Many people tend to confuse RAID (Redundant Array of Independent Disks) arrays with backups. They are not the same thing at all. RAID arrays can be a valuable asset in keeping your existing data online and available in the face of disk failures, but they do not protect against any of the problems identified in the previous list. All of the drives in a RAID array will burn evenly in case of a fire or other natural disaster.
In addition to protecting continued access to the data that you and any other users of your systems require, there are a variety of procedural and business reasons to back up the data on your computer systems. Complete and accurate backups provide:
- A complete historical record of your personal, corporate, or organizational business and financial data. Sadly enough, this includes serving as a source of information that you, your company, or your organization may someday need to defend itself or to prove its case in a lawsuit or other legal proceedings.
- A source of historical information about research projects and software development.
- A way of preserving data that you do not need to make continuously available online, but which you may need to refer to someday. This includes things like projects that you’ve completed, the home directories of users who are no longer using your systems, and so on.
A final issue where backups are concerned is the need for off-site storage of all or specific sets of your backups. The history of personal, business, and academic computing is littered with horror stories about people who did backups religiously, but stored them in a box beside the computer. After a fire or natural disaster, all that the administrators of those systems were left with were poor excuses and unemployment benefits.
Off-site storage (which nowadays can be as simple as cloud storage) is critical to your ability to recover from a true physical catastrophe, but it also raises another issue — the need for appropriate security in the storage location you select.
For the same reasons that you wouldn’t leave the door to your house propped open and then go on vacation and wouldn’t put a system that didn’t use passwords on the Internet, you shouldn’t store your backups in an insecure location. This is especially important if you are in charge of computer systems that are being used for business.
Wherever you store your company’s current and historical backup media should have a level of security comparable to wherever your computers are in the first place. Though your local cat burglar might not actively target a stack of CDs, removable disks, or storage locker full of backup tapes, any competitors you have would probably be ecstatic to be able to read and analyze the complete contents of your company’s computer systems. Why not just save everybody time and mail them your source code and customer lists?
A Few Words About Backup Media
Backups take a significant amount of time and require a significant investment in both media and backup devices. Nowadays, even home computer systems store tens or hundreds of gigabytes of information, which means that you either need fast, high-capacity backup devices, or you must prepare yourself for a laborious day or two of loading CDs, DVDs, or tapes.
Other, more historical solutions such as Zip disks, Jaz disks, LS-120 disks, and so on, provide such a small amount of storage that they’re really only useful for backing up individual files, directories, or sets of system configuration files, and are therefore not discussed here.
If the mention of backup tapes causes flashbacks to mainframe computer days or old sci-fi movies, you may want to rethink that. Even for home use, today’s backup tape drives are fast, store large amounts of data, are relatively cheap, and use tapes that fit in the palm of your hand. Though disk-to-disk backups are becoming more and more common, especially in networked environments, backup tapes are still quite popular and cost-efficient.
CD-Rs and DVD-Rs are eminently suitable for backups of home computer systems because they are inexpensive and typically provide enough storage for backing up the selected files and directories that comprise most home backups.
For home use, I prefer CD-R and DVD-R media over their rewritable brethren because of the cost difference and the fact that rewritable CDs and DVDs are only good for a limited number of writes.
On the other hand, CD-Rs and DVD-Rs are rarely appropriate for enterprise backups because even DVD-Rs are not large enough to back up complete systems, it’s tricky to split backups across DVD-R media, and DVD-Rs are relatively slow to write to. They can be useful when restoring a system because of their portability: you can take them directly to the system you’re restoring without having to move a tape drive, do a network restore, and so on. However, I personally prefer removable hard drives or tapes in enterprise or academic environments.
Different Types of Backups
Now that I’ve discussed why to do backups and some of the basic issues related to storing them, let’s review the strategy behind actually doing backups. As mentioned previously, backups take time and have associated costs such as backup media, but there are a variety of ways to manage and minimize those costs. There are three basic types of backups:
- archive backups, which provide a complete snapshot of the contents of a filesystem at a given time
- incremental backups, which reflect the changes to the contents of a filesystem since a previous backup
- spot backups, which provide a snapshot of specific files or the contents of one or more important directories at a given time
Spot backups are the most common type of backups done by home computer users, because writing a copy of your current projects, mail folders, or even your entire home directory to a few CD-Rs or DVD-Rs is relatively fast and cheap. There isn’t all that much to say about this approach, because it can easily be done using drag and drop, so the rest of this section focuses on the classic backup models of archives and incremental backups. I’ll discuss some techniques for doing spot backups later.
Archive backups, often referred to as archives or full backups, are the ultimate source for restoring data, because they usually contain a copy of every file and directory on a specific filesystem or under a certain directory on your computer at the time that the backup was done. In an ideal world, it would be great to be able to do daily archive backups simply because this would guarantee that no one could ever lose more than a day’s work, regardless of the type of calamity that occurred to your computer system. Unfortunately, archive backups have some drawbacks:
- They take the maximum amount of time that backups could require because they make a copy of every file and directory on every filesystem on all of your computer systems.
- The volume of data that is preserved by an archive backup means that they use the maximum amount of space on your backup media.
- Producing the largest possible volume of backup media maximizes the amount of storage space required to store it, and makes your record keeping as complex (and as critical) as it possibly could be.
- Archives are best done when no one is working on a computer system. This reduces the amount of time that it takes to do the backups (because they’re not competing with anyone for computer time), and also guarantees the consistency of the files and directories that are being copied to your backup media, because nothing can be changing. This may not be a big point in a home computing environment, but in a business environment, making sure that no one is using a computer system so that you can do an archive backup is often impractical (as on systems that run 24×7 services such as Web servers, database systems, and so on) or, best case, reduces the availability of a computer system to the company and your customers.
Although the advantages of archive backups as a complete record of everything are significant, these kinds of issues keep archives from being a reasonable approach to daily backups for any home computer, business, or organization. You could always do them less often than daily, but reducing the frequency of your backups increases your exposure to losing a significant amount of data if your disks fail or your computer bursts into flames.
Enter incremental backups. As mentioned before, incremental backups contain a copy of all of the files and directories that have changed on a computer system since some previous backup was done. If a problem occurs and you need to restore files and directories from backups, you can restore an accurate picture of those files and directories by first restoring from an archive backup, and then restoring from some number of incremental backups, up through the most recent one, which brings things back to the date of your latest incremental backup. When combined with archives, incremental backups provide the following advantages:
- They help minimize the amount of disk space or other backup media required to do backups. Archives usually require large quantities of most types of backup media, while incrementals inherently require less because they aren’t preserving as much data.
- They can be done more quickly, because they are copying less data than an archive backup would.
- The backup media to which incremental backups are written requires less storage space than archive backups, because there’s less of it.
- Even in business and academic environments, incremental backups can be done while the computer systems and filesystems you’re backing up are available for use.
Another nice feature of incremental backups is that they record changes to the files and directories on your computer systems since some previous backups, which are not necessarily archives. In corporate environments, most systems administrators organize their backup media and associated procedures in a way similar to the following:
- Archives are done infrequently, perhaps every six months or so, or just before any point at which major changes to your filesystems or computer systems are being made.
- Monthly incremental backups are made of all changes since the previous archive. If your budget and backup storage capabilities are sufficient, you usually keep the monthly incremental backups around until you do another archive backup, at which point you can reuse them.
- Weekly incremental backups are made of all changes since the previous monthly backup. You can reuse these each month, after you do the new monthly backups.
- Daily backups are made of all changes since the previous weekly backup. You can reuse these each week, after you do the new weekly backups. Some installations even do dailies that capture changes since the previous daily, or since the daily done on some previous day of the week. A cron sketch of this sort of rotation follows this list.
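As a concrete illustration, here is a minimal sketch of such a rotation as root crontab entries. The /usr/local/sbin/do-backup script is hypothetical, standing in for whatever command actually performs your archive and incremental backups:
# Hypothetical rotation schedule in crontab syntax; do-backup is a
# site-specific script, not a standard Ubuntu command.
0 2 * * 1-6 /usr/local/sbin/do-backup daily    # dailies, Monday through Saturday
0 2 * * 0   /usr/local/sbin/do-backup weekly   # weeklies, every Sunday
0 3 1 * *   /usr/local/sbin/do-backup monthly  # monthlies, first of each month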
No backup system can make it possible to restore any version of any file on a computer system. Even if you were lucky or compulsive enough to be doing daily archives of all of your computer systems, files that exist for less than a day can’t be restored, and it isn’t possible to restore a version of a file that is less than a day old. Sorry. When designing a backup schedule and the relationships between archive and various incremental backups, you have to decide the granularity with which you might need to restore lost files. For example, the general schedule of archives, monthlies, weeklies, and dailies doesn’t guarantee that you can restore a version of a file that is newer than the previous archive. For example:
- If the file was deleted one day before the first set of monthly backups were done based on the archive, it would be present on the archive and on the weekly backups for a maximum of one month. At that point, the weekly tape containing that file would be overwritten, and the newest version of the file that could be restored would be the version from the archive.
- If the file was deleted one day after the first set of monthly backups were done based on the archive, it would be present on the archive and on the first monthly backup for a maximum of seven months — a new archive would be done at that point, and the monthly tape wouldn’t be overwritten until one month after the new archive. At that point, the monthly tape containing that file would be overwritten, and the newest version of the file that could be restored would be the version from the most recent archive.
Selecting a backup strategy is essentially a calculation of how long it will take someone to notice the absence of one or more files and request a restore, taking into account the level of service that you need to provide and the cost of various levels of service in terms of media, backup time, and storage/management overhead. Sometimes you will notice missing files immediately, such as when you accidentally delete the files you’re actively working on. Other problems, such as lost files because of gradual disk failures or filesystem corruption, may not surface for a while.
Almost all backup systems provide automatic support for doing incremental backups since a previous incremental or archive backup. The Linux dump program, which I’ll discuss in the next section, assigns different numbers to different backup “levels,” and keeps track of which levels of backups have been done based on the name of the device on which the filesystem is located.
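For example, a level 0 dump is a full backup, and a subsequent level 1 dump saves only files changed since the most recent lower-level dump. In a sketch like the following, the tape device and partition names are placeholders for your own, and the -u option is what records each run in /etc/dumpdates:
$ sudo dump -0u -f /dev/st0 /dev/sda3
$ sudo dump -1u -f /dev/st0 /dev/sda3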
A final issue to consider when doing backups and restoring files is when to do them, and what privileges are required. It’s generally fastest to do backups during off-peak hours when system usage is generally at a minimum, so that the backups can complete as quickly as possible, and when people are less likely to be modifying the files that you’re backing up. In an enterprise environment, this may mean that you’ll want to have a graveyard shift of operators. In this case, you’ll need to think about how to make sure that operators have the right set of privileges.
Being able to back up user files that may be heavily protected, or using a backup system that accesses the filesystem at the data-structure level, generally requires root privileges. Many people use programs such as sudo (which is already our friend on Ubuntu systems) or set s-bits (setuid bits) on privileged binaries such as backup and restore programs so that they don’t have to give the administrative password to the operators or part-time staff who generally do backups at off-peak hours.
Verifying and Testing Backups
Just doing backups isn’t a guarantee that you’re safe from problems, unless you’re also sure that the backups you’re making are readable and that files can easily be restored from them. Though it’s less common today, there’s always the chance that the heads in a tape drive may be out of alignment. This either means that you can only read the tapes back in on the same tape drive that you wrote them on, or that they can’t be read at all. You should always verify that you can read and restore files from backups using another device than the one on which they were made. You don’t have to check every tape every day, but random spot checks are important for peace of mind and for job security. Similarly, tapes can just stretch or wear out from use — be prepared to replace the media used to do various types of incremental backups after some set amount of time. Nobody appreciates WORN backup media — write once, read never — even though its storage capacity is apparently infinite.
One of the problems inherent to backups is that every type of computer media has a shelf life of some period of time, depending on the type of media, the environment in which it is stored, and how lucky you are. No backup media has an infinite shelf life. For example, backup tapes can last for years, but they can also be unreadable after a much shorter period of time. Long-lived media such as write-once CD-Rs and DVD-Rs are attractive because of their supposed longevity, but they have other problems, as mentioned earlier in the section entitled “A Few Words About Backup Media.” Media such as these may only be suited for certain types of backups, depending on whether your backup software writes to the backup device as a filesystem or as a raw storage device. Also, no one yet knows exactly how long those types of media will last, but they certainly take up less room than almost any kind of tape or stack of hard drives.
In addition to spot-checking the backup media that you are currently using, you should always make a point to spot-check old archives every few years to make sure that they’re still useful.
Aside from the fact that backups can be subject to the vagaries of the device on which they’re written, having those devices available when you need to restore backups is an important point to consider. It’s a well-known nerd fact that many government and military sites have huge collections of backup data written on devices that don’t exist anymore, such as super low-speed tape drives and 1” or 7-track tapes. Even if the devices exist, the data is often not recoverable, because it’s written in some ancient, twisted backup format, word size, and so on. When you retire a computer system, deciding if you’ll ever need to restore any of its archive data is an easily overlooked issue. If you’re lucky, you’ll be able to read in the old archives on your new system and write them back out to some newer backup media, using some newer backup format. If you’re not, you’ve just acquired a huge number of large, awkward paperweights that will remind you of this issue forever.
Deciding What to Back Up
Aside from cost-saving issues like using higher-density media such as CD-ROMs for archive purposes, another way to reduce the number of old backups that you have to keep around, as well as minimizing the time it takes to do them, is to treat different filesystems differently when you’re backing them up. For example, system software changes very infrequently, so you may only want to back up the partitions holding your operating system when you do an archive. Similarly, even locally developed application software changes relatively infrequently, so you may only want to back that up weekly. I can count on one hand, with one finger, the number of times that I’ve needed to restore an old version of an application. On the other hand, you may not be so lucky. Keeping backups of your operating system and its default applications is important, and is certainly critical to restoring or rebuilding an entire system should you ever need to do so (which is known in backup circles as a bare-metal restore).
In terms of backups (and thanks to the excellence of the Ubuntu Update Manager), you can usually just preserve your original installation media (or even re-retrieve it over the net) if it is ever necessary to completely restore the system software for your Ubuntu system. However, if your systems run a custom kernel or use special loadable kernel modules, you should always make sure that you have a backup of your current configuration and all of the configuration files in directories such as /etc that describe the state of your system. You’ll be glad you did if the disk on which your finely tuned and heavily tweaked version of an operating system bursts into flames late one night.
The issues in the first few sections of this text often give system administrators and system managers migraines. Losing critical data is just as painful if you’re only supporting yourself. Thinking about, designing, and implementing reasonable backup policies, schedules, and disaster recovery plans is an important task no matter how many people will be affected by a problem. Backups are like insurance policies — you hope that you never need to use them, but if you do, they had better be available.
Backup Software for Linux
Many backup utilities are available for Ubuntu systems. Most of these are traditional command-line utilities that can either create archive files or write to your backup media of choice in various formats, but some interesting open source graphical solutions are also beginning to appear.
The next few sections discuss the most common open source utilities that are used to do backups on Linux systems, grouping them into sections based on whether they create local backup files or are inherently network-aware. As discussed in the previous section, off-site storage of backups is an important requirement of a good backup strategy. In today’s networked environments, off-site storage can be achieved in two basic ways:
- either by writing to local backup media and then physically transporting that media to another location, or
- by using a network-aware backup mechanism to store backups on systems that are physically located elsewhere.
Local Backup and Restore Software for Linux
The roots of the core set of Linux utilities lie in Unix, so it’s not surprising that versions of all of the classic Unix backup utilities are available with all Linux distributions. Some of them are starting to show their age, but these utilities have been used for years and guarantee the portability of your backups from any Linux system to another.
The classic Linux/Unix backup utilities available in the Ubuntu distribution are the following, in alphabetical order:
- cpio: The cpio utility (copy input to output) was designed for doing backups, taking a list of the files to be archived from standard input and writing the archive to standard output or to a backup device using shell redirection. The cpio utility can be used with filesystems of any type, because it works at the file level and therefore needs no built-in understanding of filesystem data structures.
- dd: The original Unix backup utility is called dd, which stands for dump device, and it does exactly that, reading data from one device and writing it to another. The dd utility doesn’t know anything about filesystems, dump levels, or previous runs of the program — it’s simply reading data from one source and writing to another, though you can manipulate the data in between the two to do popular party tricks like converting ASCII to EBCDIC. The dd utility copies the complete contents of one device to another (a disk partition to a tape drive, for example) for backup purposes. It wasn’t really designed to do backups, though there are situations in which dd is the perfect tool: For example, dd is the tool for you if you want to copy one partition to another when a disk is failing, make on-disk copies of the partitions on a standard boot disk for easy cloning, or use an application that reads and writes directly to raw disk partitions, which you can only back up and restore as all or nothing. Because dd reads directly from devices and therefore doesn’t recognize the concept of a filesystem, individual file restores are impossible from a partition archive created with dd without restoring the entire partition and selecting the files that you want.
- dump/restore: The dump and restore utilities were designed as a pair of utilities for backup purposes, and have existed for Unix since Version 6. Although cpio and tar combine the ability to write archives with the ability to extract files and directories from them, and dd can’t extract anything except an entire backup, the dump program only creates backups and the restore program only extracts files and directories from them. Both dump and restore work at the filesystem data structure level, and therefore can only be used to back up and restore ext2 and ext3 filesystems (at the moment, at least). However, the dump/restore programs can accurately back up and restore any type of file that is found in ext2 and ext3 filesystems, including device-special files and sparse files (without exploding their contents and removing their “sparseness”). The dump/restore utilities can only be used to back up entire filesystems, though they have built-in support for doing incremental backups, keeping a record of which filesystems have been backed up and which level of backups has been performed for those filesystems. All of this information is tracked in an easily understood text file named /etc/dumpdates. Archives created with the dump utility can automatically span multiple tapes or other media if the devices support end-of-media detection, but can also span cartridge or magnetic tape media by using command-line options that tell dump the length or capacity of the tape. The most entertaining feature of the restore program is its interactive mode, in which it reads the information from the tape necessary to create a virtual directory hierarchy for the archived filesystem that it contains. You can then use standard commands such as cd to explore the list of the files on the tape and mark specific files and directories to be restored.
The dump/restore programs are not installed as part of a default Ubuntu distribution, but can easily be installed using apt-get (both are located in the dump package).
- tar: Probably the most widely used and well-known Unix backup utility, the tar command (tape archiver) takes a list of files and/or directories to be backed up and archives those files to an output device or to standard output. The GNU version of tar, once known as gtar to differentiate it from the version of tar that came with the Unix operating system (back when anyone cared), is yet another amazing piece of work from the Free Software Foundation. GNU tar provides capabilities far and above the abilities of classic Unix tar, including the built-in ability to read from compressed tar archives created with gzip, support for incremental backups, support for multivolume archives, and much more. The tar program is filesystem-independent and accesses files and directories without needing to know their low-level data structures. The tar program is far and away the most popular free archiving utility available for Linux, and is used to archive almost every free software package. The DEB archive format actually contains tar files that are compressed using the gzip utility (RPM packages use compressed cpio archives instead), and files with the .tgz or .tar.gz extensions (also gzipped tar files) are commonly used to distribute most Linux source code.
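To make these utilities a bit more concrete, here are minimal sketches of cpio, dd, and interactive restore in action; tar gets a full treatment later in this text. All of the device names and paths here are placeholders for your own, and the restore example assumes that the dump package is installed. The first command archives the current directory tree (cpio reads the list of files from standard input), the second extracts such an archive into the current directory, the third clones an entire partition to an image file, and the fourth browses a dump tape interactively so that you can mark files to restore:
$ find . -depth -print | cpio -o > /tmp/backup.cpio
$ cpio -idv < /tmp/backup.cpio
$ sudo dd if=/dev/sda1 of=/backups/sda1.img bs=64k
$ sudo restore -if /dev/st0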
The utilities discussed in this section all create local archive files or write their archives to local storage devices. Of course, when you’re using a network-aware operating system such as Ubuntu Linux, the term local storage devices actually includes anything that appears to be local to your system, which therefore includes network storage that is mounted on a directory of the system that you are using. Common examples of this are NFS-mounted directories or directories that are mounted on your Linux system via Samba.
Directories that are mounted over the network enable you to integrate remote storage with local backup commands in ways such as the following:
- Back up remote directories to local archives by mounting the remote directories on your local system and including them in the backups that you do.
- Write your backup files to remote storage by creating your backup archives in remote directories that are mounted on your system as local directories.
Both of these scenarios provide ways of satisfying the basic off-site requirement of backups through the use of network-mounted directories.
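As a sketch of the second scenario, assuming that a remote server exports a directory over NFS (the server name and paths here are invented for illustration), you could mount it and write a tar backup directly to it:
$ sudo mount backup-server:/export/backups /mnt/backups
$ sudo tar czvf /mnt/backups/home_dir_backup.tgz /home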
Network-Oriented Backup Software for Linux
The utilities discussed in the previous section all create local archive files or write their archives to local storage devices (or storage that appears to be local). The backup utilities discussed in this section are slightly different — they are inherently network-aware, and therefore enable you to create and manage local backups of the contents of remote systems.
Many commonly used, network-aware backup systems are available for Ubuntu; you can find them by starting the Synaptic Package Manager and doing a Description and Name search for the term backup. The following are my personal favorites:
- Amanda: The Advanced Maryland Automated Network Disk Archiver is an open source distributed backup system that was originally developed for Unix systems at the University of Maryland in the early 1990s. Amanda makes it quite easy to back up any number of client workstations to a central backup server, supports Microsoft Windows backups via Samba, and provides a complete backup management system for your Ubuntu system. Amanda supports multiple sets of backups with distinct configurations, supports disk and tape backups, tracks backup levels and dates on its client systems, produces detailed reports that are automatically delivered via e-mail, and keeps extensive logs that make it easy to diagnose and correct the reason(s) behind most problems. Communication between Amanda clients and servers is encrypted to heighten security. Amanda is not installed by default on Ubuntu systems, but is available in the Ubuntu repositories and can easily be installed using Synaptic, apt-get, or aptitude. Amanda consists of two packages, amanda-server and amanda-client. Take a glance at Amanda’s home Web site for more information.
- BackupPC: BackupPC is a nice backup system that provides a Web-based interface that enables you to back up remote systems using smb, tar, or rsync. BackupPC creates backups of your remote systems that are stored and managed on your BackupPC server, and also enables authorized users to restore their own files from these archives, removing the number one source of migraines for system administrators. Configuration data for each client system is stored on the BackupPC server, which enables you to back up different types of systems using different commands or protocols, and to easily identify which remote directories or filesystems you want to back up. One especially nice feature of BackupPC is that it uses standard Linux commands on the server to create backups, and therefore doesn’t require the installation of any software on client systems, though some client-side configuration may be necessary for certain backup commands. See BackupPC’s home page, and see “Installing and Using the backuppc Utility,” later in this section, for more information about installing, setting up, and using BackupPC.
- Bacula: Bacula is an extremely powerful set of programs that provide a scalable network backup and restore system that supports Linux, Unix, and Microsoft Windows systems. Its power and flexibility easily match those of Amanda, but it is more flexible in terms of how and where backups are stored. Bacula is not installed by default on Ubuntu systems, but is available in the Ubuntu repositories and can easily be installed using Synaptic, apt-get, or aptitude. Bacula is quite powerful, but can be complex — if you’re interested in exploring Bacula, you may want to start by installing the bacula-doc package and reading its documentation to determine if it is right for your environment. Bacula is primarily command-line oriented, but provides a graphical console as a wrapper around its command-line interface. Bacula’s home page is here.
- Rsync: Rsync (remote sync) is a command-line file and directory synchronization program that makes it easy to copy files and directories from one host to another. When both a local and remote copy of a file or directory hierarchy exist, rsync is able to leverage built-in features that help reduce the amount of data that needs to be transmitted to ensure that the local and remote copies of those files and directories are identical. The remote-update protocol used by the rsync utility enables rsync to transfer only the differences between two sets of files and directories. The rsync program is automatically installed as part of a default Ubuntu installation, but requires some configuration on the remote systems that you want to copy to your local host.
Backing Up Files to Local, Removable Media
The introductory section of this text introduced the basic concepts of backups, many of which may seem impractical for home use. Whether or not they are really impractical depends on the problems that you want to be able to solve using your backups.
- If you’re mostly interested in protecting yourself against disk failures or the accidental deletion of critical files that you’re working on, you may not need to worry about doing archive and incremental backups — doing spot backups of important files and directories to a CD-R or DVD-R may suffice.
- Similarly, if you don’t need to be able to restore any file from any point in time, but just need to have recent copies of your files, then spot backups of the directories that you want to back up may be sufficient, done with whatever frequency you’re comfortable with.
- If you’re not concerned about losing all of your data if your house or apartment is destroyed, then you don’t have to worry about things like storing backups off-site.
The bottom line is that I can’t tell you what you’re comfortable with — that’s up to you, and defines your backup strategy. The next few sections highlight how you can use some of the utilities mentioned earlier (and even the standard Linux cp command) to create backup copies of important files.
For home use, the most popular backup method is simply dragging and dropping directories to CD-R or DVD-R media to create spot backups of those directories. The second most popular way of backing up your system is to use hard drives that you can attach to your systems via USB or FireWire ports. On the plus side, unless you’re using a really small removable hard drive, this gives you a larger pool of available storage for backups than a CD or DVD, and enables you to either store more backups of important files and directories or create a single copy of each important directory on removable storage which you can then just update each time you do backups. On the minus side, a removable hard drive is much more expensive than CD-R or DVD-R disks and is more of a pain to store off-site and retrieve each time you do backups.
Archiving and Restoring Files Using tar
The tar program is one of the oldest and most classic Linux/Unix utilities. Though it can write to a backup device, such as a tape drive, the tar command is most commonly used to create archive files, such as source code archives, that can easily be shared with others. Archive files created using the tar command typically have the .tar file extension. The GNU tar command, which is the version of tar found on Ubuntu and all other Linux systems, provides built-in compression capabilities, and can automatically compress tar archives on the fly. Compressed tar archives typically have either the file extension .tgz, indicating that they are compressed (and can be uncompressed) using the gzip application, or the file extension .tar.bz2, indicating that they are compressed (and can be uncompressed) using the bzip2 application. Archive files produced using the tar utility are typically referred to as tarballs.
Because of its age, you have to be kind when passing arguments to the tar command, because in some cases they must be specified in a particular order.
Creating an archive file using tar is easy. For example, to create a tarball called home_dir_backup.tgz that contains all of the directories in /home, you could use commands like the following:
$ cd /home
$ sudo tar czvf /tmp/home_dir_backup.tgz *
Note that you want to write the backup file somewhere other than the directory that you are backing up. Creating a backup file in the directory that you’re working in would cause the tar command to back up the file that it was creating, which would both not work correctly and waste tremendous amounts of space.
The tar options in this command have the following meanings:
- c: Create a new archive file. If a file by the specified name already exists, it will be overwritten and its original contents will be lost.
- z: Compress the archive file using the same techniques used by the gzip application.
- v: Be verbose, displaying the name of every file added to the archive file as it is added.
- f: Write the output of the tar command to the file whose name appears as the next argument on the command-line. In this example, the output of the tar command would be written to the file /tmp/home_dir_backup.tgz.
After a significant amount of output, the file /tmp/home_dir_backup.tgz will be created, containing a complete recursive copy of all files and directories under /home. You can then copy this file to backup media such as a CD or DVD, or to a removable hard drive.
After you’ve created a tarball of a given set of directories, you can easily create another tarball that only contains files and directories that have changed since a specific date (such as the date on which the first tarball was created) using commands like the following:
$ cd /home
$ sudo tar czvf /tmp/home_dir_newer_backup.tgz --newer "2006-06-23" *
This command produces extremely verbose output, even if you drop the v option, which is puzzling at first. This is an artifact of the format used in tar files. Even when used with the --newer option, the tar file header must contain the complete directory structure in which it is looking for files newer than the specified date. This is necessary so that the tar command can create extracted files in the right directory location. In other words, if you use the tar command to extract the entire contents of a tarball created using the --newer option, it will create the complete directory hierarchy, but populate it only with files that are newer than the date that was specified when the tarball was created.
Creating tarballs isn’t much fun without being able to retrieve files from them. You can extract various things from a tarball:
- Its entire contents. For example, the following command would extract the entire contents of the tarball home_dir_backup.tgz, creating the necessary directory structure under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz
- One or more directories, which recursively extracts the complete contents of those directories. For example, the following command would extract the directory Ubuntu_Bible and all the subdirectories and files that it contains from the tarball home_dir_backup.tgz, creating the necessary directory structure under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz Ubuntu_Bible
- One or more specific files, which extracts only those files but creates all of the directories necessary to extract those files in their original location. For example, the following command would create the directory Ubuntu_Bible and extract the file chap22.txt from the tarball home_dir_backup.tgz, creating the Ubuntu_Bible directory under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz Ubuntu_Bible/chap22.txt
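Before extracting anything, it is often worth listing a tarball’s contents to confirm what it holds and where its files will land; tar’s t option displays the contents without writing any files:
$ tar tzvf home_dir_backup.tgz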
For more detailed information on the tar command, see its online reference information (man tar). As one of the oldest Linux/Unix commands, it has accumulated a huge number of command-line options over the years, many of which you will probably never use. However, command-line options are like bullets — you can never have too many.
Making an Up-to-Date Copy of a Local Directory Using cp
If you’re only backing up a few directories and are primarily concerned with keeping copies of the files that you are actively working on, it’s often simplest to just keep copies of those directories on removable media.
The traditional Linux/Unix cp command provides options that make it easy to create a copy of a specified directory, and then to subsequently update only files that have been updated or that do not already exist in the copy. For example, to back up all of the directories in /home to a removable drive mounted at /media/LACIE (LACIE is a popular manufacturer of prepackaged USB hard drives), you could use a command like the following:
$ sudo cp -dpRuvx /home /media/LACIE/home
The cp options in this command have the following meanings:
- d: Don’t de-reference symbolic links, i.e., copy them as symbolic links instead of copying what they point to.
- p: Preserve modes and ownership of the original files in the copies.
- R: Copy the specified directory recursively.
- u: Copy files only if the original file is newer than an existing copy, or if no copy exists.
- v: Display information about each file that is copied. (You may not want to use this option, but it’s interesting, at least the first few times you do this.)
- x: Don’t follow mount points to other filesystems.
After running this command, you will have a copy of every directory under /home on your system in the directory /media/LACIE/home. You can then detach your removable drive and store it somewhere safe (preferably off-site). Any time that you want to update your backup, retrieve the drive and simply rerun this command.
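To spot-check that the copy is complete and current (in the spirit of the earlier section on verifying backups), you can compare the source and the copy; the -q option limits the output to files that differ or are missing:
$ sudo diff -rq /home /media/LACIE/home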
Making an Up-to-Date Copy of a Remote Directory Using rsync
As mentioned earlier, rsync is a commonly used command-line utility that enables you to push or pull files to or from remote systems. The rsync program must be configured on the remote systems before you can push or pull files or directories to or from those systems.
To use rsync on an Ubuntu system, you must first enable it so that the system starts rsync as a background process, and then also modify the rsync configuration file to add entries for specific directories that you want to be able to read from and write to remotely. To enable rsync, edit the file /etc/default/rsync using your favorite text editor and a command like the following:
$ sudo emacs /etc/default/rsync
In the line that begins with RSYNC_ENABLE, change false to true, and then save the updated file.
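If you prefer a one-line command to opening an editor, sed can make the same change; this assumes the stock file, in which the line reads RSYNC_ENABLE=false:
$ sudo sed -i 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/' /etc/default/rsync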
Next, create the rsync configuration file before actually starting the rsync daemon.
Most Linux systems use an Internet service manager such as inetd or xinetd to manage incoming requests for on-demand services such as ftp, tftp, rsync, and vnc. These Internet service managers automatically start the appropriate daemon when an incoming request is received. Though these Internet service managers are available in the Ubuntu repositories, they are not installed by default. On Ubuntu systems, a specific system startup file that starts rsync in daemon mode is provided as /etc/init.d/rsync. If you subsequently install xinetd and want it to manage rsync requests, you will want to disable this startup file and create the file /etc/xinetd.d/rsync to make sure that the rsync service is enabled on your system.
The /etc/default/rsync file just determines whether rsync is enabled or not. The actual configuration information for rsync itself is stored in the file /etc/rsyncd.conf, which does not exist by default on an Ubuntu system. To create this file, use your favorite text editor and a command like the following:
$ sudo emacs /etc/rsyncd.conf
A minimal rsync configuration file that contains a definition for remotely synchronizing the directories under /home on your system would look something like the following:
uid = root
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log
hosts allow = 192.168.6.0/24
[homes]
path = /home
comment = Home Directories
auth users = wvh
secrets file = /etc/rsyncd.secrets
The first section of this file sets parameters for how the rsync daemon runs. In order, the rsync daemon runs as root (uid), logs all transfers (transfer logging), uses a specific log file format (log format) and log file (log file), and allows access from any host whose IP address is on the 192.168.6.x subnet (hosts allow).
The second section of this file identifies a synchronizable entity known as homes that maps to the directory /home on that system. Synchronization to or from this directory is done as the user wvh, whose password must be supplied in the file /etc/rsyncd.secrets.
After saving this file, use the sudo command and your favorite text editor to create the file /etc/rsyncd.secrets, with a command like the following:
$ sudo emacs /etc/rsyncd.secrets
This file should contain an entry for each auth users entry in the /etc/rsyncd.conf file, in this case wvh. Each entry in this file contains the name of a user, a colon, and the plain-text password for that user, as in the following example:
wvh:hellothere
Next, save this file and make sure that it is readable only by the root user on your system using a command like the following:
$ sudo chmod 600 /etc/rsyncd.secrets
You can now start the rsync daemon using the following command:
$ sudo /etc/init.d/rsync restart
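To confirm that the daemon is answering, you can ask it to list its modules from another host (replace ubuntu-system with your server’s name or IP address); the output should include the homes module and its comment:
$ rsync ubuntu-system::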
You can now create a local copy of the /home directory on your Ubuntu system using a command like the following, where ubuntu-system is the name or IP address of the system on which you just configured the rsync daemon:
$ rsync -Havz ubuntu-system::homes /media/LACIE/home
The arguments to the rsync command in this example have the following meaning:
- H: Preserve hard links if these exist in any directories that are being copied.
- a: Use archive mode, which preserves ownership, symbolic links, device files, and so on, and is essentially a shortcut that saves you specifying several other options.
- v: Be verbose, identifying each file that is copied or considered for copying. (You may not want to use this option, but it’s interesting, at least the first few times you run rsync.)
- z: Use compression when transferring files, which improves throughput.
If you have problems using rsync, you should check the /var/log/rsyncd.log file (on the system that you are trying to retrieve files from) for error messages and hints for resolving them. If you are not using the verbose option on the host where you are retrieving these files, you may want to use it to see if you can identify (and resolve) any other errors that the host that is trying to retrieve files is reporting.
The rsync configuration file created in this section is just a minimal example, and is not particularly secure. For details about all of the options available in an rsync configuration file and information about making rsync more secure, see the man page for the rsyncd.conf file (man rsyncd.conf).
Installing and Using the backuppc Utility
This section explains how to install, configure, and use the backuppc utility to back up a variety of hosts on your local network to a central Ubuntu server. Introduced earlier, backuppc is a great application that is both easy to use for a system administrator and empowering for any authorized user. Any authorized user can initiate backups of the machines that they have admin rights to, and can also restore files from existing backups of those machines, all using a convenient Web interface.
If you have more than one machine on your home network, or if you’re working in a multimachine enterprise or academic environment, the BackupPC software is well worth a look. Its Web-based interface is easy to set up and use; various types of supported backups are easy to configure, initiate, and monitor; it can back up your Linux, Unix, Windows, and Mac OS X systems; and the fact that it doesn’t require that you install any special software on the systems that you want to back up makes backuppc a great package.
The backuppc utility supports four different backup mechanisms (known in the BackupPC documentation as backup transports) to enable you to back up different types of systems. These are the following:
- rsync: Back up and restore via rsync over rsh or ssh. This is a good choice for backing up Linux, Unix, or Mac OS X systems, and you can also use it to back up Microsoft Windows systems that support rsync, such as those running the Cygwin Linux emulation environment.
- rsyncd: Back up and restore via rsync daemon on the client system. This is the best choice for Linux, Unix, and Mac OS X systems that are running an rsync daemon. You can also use this mechanism to back up Microsoft Windows systems that support rsyncd, such as those running the Cygwin Linux emulation environment.
- smb: Back up and restore using the smbclient and the SMB protocol on the backuppc server. This is the best (and easiest) choice to use when backing up Microsoft Windows systems using backuppc, and you can also use it to back up Mac OS X systems or Linux and Unix systems that are running a Samba server.
- tar: Back up and restore via tar, over ssh, rsh, or nfs. This is an option for Linux, Unix, and Mac OS X systems. You can also use this mechanism to back up Microsoft Windows systems that support tar, ssh, rsh, and/or nfs, such as those running the Cygwin Linux emulation environment.
A default backup transport value for all backups is set in the primary backuppc configuration file, /etc/backuppc/config.pl. The specific mechanism used to back up any particular host can be identified in that host’s configuration file, as discussed later in the sections entitled “Defining a Backup Using rsyncd” and “Defining a Backup Using SMB.”
Although backuppc does a great job of backing up systems running Microsoft Windows and Mac OS X, you should be aware of a few issues. First, backuppc is not suitable for backing up Windows systems so that you can do a bare-metal restore. Backuppc uses the smbclient application on your Ubuntu system to back up Windows disks, so it doesn’t back up Windows ACLs and can’t open files that are locked by a Windows client that is currently running (such as, most commonly, things like Outlook mailboxes). Similarly, backuppc doesn’t preserve Mac OS file attributes. See the BackupPC documentation for a list of current limitations in using backuppc. It’s a surprisingly short document!
Installing backuppc
Special-purpose backup solutions such as backuppc aren’t installed as part of a default Ubuntu installation because they’re probably overkill for most people. However, as with all software packages on Ubuntu, the Synaptic Package Manager makes it easy to install backuppc and the other software packages that it requires. To install backuppc, start the Synaptic Package Manager from the System ➪ Administration menu and supply your password to start Synaptic. Once the Synaptic application starts, click Search to display the search dialog. Make sure that Description and Name are the selected items to search through, enter backup as the string to search for, and click Search. After the search completes, scroll down in the search results until you see the backuppc package, right-click its name, and select Mark for Installation from the pop-up menu.
Depending on what software you have previously installed on your Ubuntu system and what you select in Synaptic, a dialog may display that lists other packages that must also be installed, and asks for confirmation. If you see this dialog, click Mark to accept these related (and required) packages. After you are finished making your selections, click Apply in the Synaptic toolbar to install backuppc and friends on your system.
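If you prefer the command line, you can perform the same installation with apt-get, which pulls in the required dependencies automatically:
$ sudo apt-get install backuppc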
Once the installation completes, the configuration phase starts. During this phase, Synaptic automatically runs a script that sets up the initial account that you will use to access backuppc via your Web server. This process displays a dialog, which tells you the initial password for the Web-based backuppc interface.
Once you see this dialog, write down the password for the backuppc interface and click Forward. Once the remainder of the installation and configuration process completes, you’re ready to back up the system you’re using and the other systems on your network.
Configuring backuppc
On Ubuntu systems, backuppc stores its configuration information in two locations.
- General backuppc configuration information and passwords are stored in files in the directory /etc/backuppc.
- Backup files themselves and host-specific backup configuration information is stored in subdirectories of /var/lib/backuppc.
Backups of a single system take a significant amount of space, which is only compounded when you begin to back up other hosts to a central backup server.
If you didn’t specify using logical volumes when you installed your Ubuntu system, you may want to add a new disk to your system before starting to use backuppc and format that disk as a logical volume.
You can then copy the default contents of /var/lib/backuppc to the new disk (preserving file permissions and ownership), and mount that disk on the directory /var/lib/backuppc on the system that you are using for backups. When you need more space to store backups in the future, this will enable you to add other disks to your system and add their space to the logical volume used to store backups. The backuppc utility also provides an archive capability that enables you to migrate old backups to other hosts for archival purposes, freeing up disk space on your primary backup server.
Though not discussed here, setting up archive hosts is covered in the BackupPC documentation — which is great, by the way!
The first thing that you should do is to change the backuppc password to something easier to remember than the random string generated during the backuppc installation process. You can do this by issuing the following command:
$ sudo htpasswd /etc/backuppc/htpasswd backuppc
This command uses sudo to run the
htpasswd command to change the password for the user
backuppc in the file
/etc/backuppc/htpasswd. When you are prompted for a new password, enter something easier to remember than “TLhCi25f,” which was the default password generated for my backuppc installation. You will be prompted to reenter the new password to make sure that you typed it correctly.
Identifying Hosts to Back Up
Each host that you want to back up must be identified in the file
/etc/backuppc/hosts. Like all backuppc configuration files, this file is easy to update. On any line in this file, characters that follow a hash mark (#) are comments; the comments in the default file explain the meaning of the various fields. A minimal backuppc configuration file looks like the following:
host dhcp user moreUsers
localhost 0 backuppc
The first non-comment line in /etc/backuppc/hosts defines the names of the various fields in each line, and should therefore not be modified (this is the line beginning with the word “host” in the example above). All other lines represent entries for hosts that will be backed up.
The first actual host entry, for localhost, is a special entry used for backing up system configuration information on the backuppc server, and should not be changed. The fields in each entry that define a host have the following meanings:
- The first field identifies a particular machine by hostname, IP address, or NetBIOS name.
- The second field should be set to 0 for any host whose name can be determined by DNS, the local hosts file, or an nmblookup broadcast. This field can be set to 1 to identify systems whose names must be discovered by probing a range of DHCP addresses, as is the case in some environments where DHCP and WINS are not fully integrated. Setting this field to 1 requires changes to the $Conf{DHCPAddressRanges} variable in the host-specific configuration file to define the base IP address and range of IP addresses that should be probed, as shown in the sketch after this list.
- The third field identifies the name of the person who is primarily responsible for backing up that host. This primary user will receive e-mail about the status of any backup that is attempted. I tend to leave this as the backuppc user, so that this user maintains an e-mail record of all backup attempts, but you can set this to a specific user if you wish.
- The fourth field (which is optional) consists of one or more users who also have administrative rights to initiate backups or restore files for this machine. The names of multiple users must be separated by commas.
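As mentioned in the description of the second field, a host whose dhcp field is set to 1 also needs a $Conf{DHCPAddressRanges} entry in its host-specific configuration file. The following is a minimal sketch; the base address and range are only illustrations and must match your own DHCP pool:
$Conf{DHCPAddressRanges} = [
    {
        ipAddrBase => '192.168.6',
        first => 100,
        last => 200,
    },
];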
As an example, the hosts file on one of my
backuppc servers looks like the following:
host dhcp user moreUsers
localhost 0 backuppc
192.168.6.64 0 backuppc wvh
64bit 0 backuppc wvh,djf
64x2 0 backuppc juser
win2k 0 backuppc wvh,djf
The backuppc program checks the timestamp on the /etc/backuppc/hosts file each time the backuppc process wakes up, and reloads the file automatically if it has been updated. For this reason, you should not save changes to the hosts file until you have created the host-specific configuration files, as described in the examples in the next two sections. If the backuppc process reloads the hosts file before you have created the host-specific configuration data, and another authorized user then initiates a backup of that system, you will either back up the wrong thing or the backup will fail. You can always make changes to the hosts file and leave them commented out (by putting a # as the first character of the line) until you have completed the host-specific configuration.
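For example, an entry for a host whose host-specific configuration you haven’t finished yet might look like the following until you are ready to activate it (the hostname here is just a placeholder):
#newhost 0 backuppc wvh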
Defining a Backup Using rsyncd
The section earlier in this chapter entitled “Making an Up-to-Date Copy of a Remote Directory Using rsync” explained how to set up
rsync in daemon mode on an Ubuntu system and how to define synchronization entries that can be remotely accessed via rsync. The sample rsync configuration file created in that section defined a synchronization entry called
homes that would enable an authorized user to synchronize the contents of all directories under
/home on a sample Ubuntu system. We’ll use that same configuration file in the example in this section.
The previous section showed how to define entries in the
/etc/backuppc/hosts file for the various hosts that you want to back up via backuppc. Each host that you back up also needs a directory that holds host-specific configuration data, logs, and so on. Throughout this section, I’ll use the sample host entry 64bit, which I defined in the section entitled “Identifying Hosts to Back Up,” as an example.
- The first step in host-specific configuration is to use the sudo command to create the directory /var/lib/backuppc/64bit, as in the following command:
$ sudo mkdir /var/lib/backuppc/64bit
- Next, use the sudo command and your favorite text editor to create a host-specific configuration file named config.pl in that directory, using a command like the following:
$ sudo emacs /var/lib/backuppc/64bit/config.pl
The contents of this file should be something like the following:
$Conf{XferMethod} = 'rsyncd';
$Conf{CompressLevel} = '3';
$Conf{RsyncShareName} = 'homes';
$Conf{RsyncdUserName} = 'wvh';
$Conf{RsyncdPasswd} = 'hellothere';
The first line identifies the backup mechanism used for this host as rsyncd, which overrides the default backup mechanism specified in the generic /etc/backuppc/config.pl file. The second line sets the compression level for this host’s backups to level 3, which provides a good tradeoff between the CPU load and time required to do compression and the amount of compression that you actually get. The last three entries in this file correspond to the synchronization entry in the sample rsyncd.conf and associated rsyncd.secrets file created in “Making an Up-to-Date Copy of a Remote Directory Using rsync” earlier.
When using backuppc to do automated backups, I like to create a separate authorized user to use rsync for backup purposes, so that the system logs show who actually requested a remote sync operation. To do this, you would add this user (I usually use backuppc) to the auth users entry in the remote host’s /etc/rsyncd.conf file and create an appropriate username/password pair in the remote host’s /etc/rsyncd.secrets file. You would then modify the host-specific backuppc configuration file to use this username and password. I didn’t do this here for simplicity’s sake, but doing this would provide more accurate log data on the client system.
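As a sketch of those changes, assuming the homes synchronization entry used throughout this section and a placeholder password, the remote host’s /etc/rsyncd.conf entry might look like the following:
[homes]
    path = /home
    auth users = wvh, backuppc
    secrets file = /etc/rsyncd.secrets
The remote host’s /etc/rsyncd.secrets file would then contain a matching username/password pair:
backuppc:SomeSecretString
Finally, the host-specific config.pl on the backup server would use that same pair in its $Conf{RsyncdUserName} and $Conf{RsyncdPasswd} entries.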
- If the remote system uses an rsync binary other than the default /usr/bin/rsync, or the rsync program is listening on a port other than the standard port (873), you should add correct definitions for these to the host-specific configuration file. The default settings for the associated configuration parameters are the following:
$Conf{RsyncdClientPort} = 873;
$Conf{RsyncClientPath} = '/usr/bin/rsync';
- Next, change the ownership and group of the /var/lib/backuppc/64bit directory to backuppc and change the protection of the configuration file /var/lib/backuppc/64bit/config.pl so that it is not publicly readable (because it contains password information) using the following commands:
$ sudo chown -Rv backuppc:backuppc /var/lib/backuppc/64bit
$ sudo chmod 600 /var/lib/backuppc/64bit/config.pl
- The last step in creating a host-specific backup definition for backuppc is to cause the backuppc process to reread its configuration data, which you can do by explicitly reloading the configuration file, explicitly restarting the backuppc process, or sending the associated process a hang-up (HUP) signal. You can force backuppc to reload the configuration file using the following command:
$ sudo /etc/init.d/backuppc reload
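If a reload doesn’t seem to pick up your changes, restarting the backuppc process outright should have the same effect:
$ sudo /etc/init.d/backuppc restart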
The definition for your backup host can now be selected via the backuppc Web interface.
- At this point, you can follow the instructions in the section entitled “Starting Backups in backuppc” to back up this host. The example in this section only backs up the home directories of users on the remote machine. To recursively back up other directories, you would simply create other synchronization entries for those directories in the remote host’s /etc/rsyncd.conf file, and then add those synchronization entries to the host-specific configuration file. For example, to back up synchronization entries named homes, /, and /boot, you would change the host-specific RsyncShareName entry to look like the following:
$Conf{RsyncShareName} = ['/', 'homes', '/boot'];
If you back up multiple filesystems or synchronization points, you may want to create a custom set of arguments to the rsync command in the host-specific configuration file. This enables you to add options such as --one-file-system, which causes backuppc to back up each filesystem separately, simplifying restores.
You can also add options to exclude certain directories from the backups, which you will certainly want to do if you are backing up a remote system’s root directory (/), as in the following example:
$Conf{RsyncArgs} = [
    # original arguments here
    '--one-file-system',
    '--exclude', '/dev',
    '--exclude', '/proc',
    '--exclude', '/media',
    '--exclude', '/mnt',
    '--exclude', '/lost+found',
];
These settings would prevent backups of
/dev, which contains
device nodes and is dynamically populated at boot time on modern Linux systems,
/proc, which is the mount point for an in-memory filesystem that contains transient data, directories such as
/media and
/mnt on which removable media is often temporarily mounted, and /lost+found, which is a directory used during filesystem consistency checking. You can also exclude directories from rsync backups using the
BackupFilesExclude directive, as in the following example:
$Conf{BackupFilesExclude} = ['/dev', '/proc', '/media', '/mnt', '/lost+found'];
The backuppc program reads the configuration settings in
/etc/backuppc/config.pl first, and then loads host-specific configuration settings, which enables the
/etc/backuppc/config.pl file to provide default settings for all backups. After you have used backuppc for a while and are comfortable with various settings, you may want to consider modifying the default settings in the
/etc/backuppc/config.pl file for configuration variables such as $Conf{RsyncArgs}, $Conf{BackupFilesExclude}, and $Conf{CompressLevel}, to minimize the number of entries that you have to create in each of your host-specific configuration files.
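For example, if most of your hosts use the same compression level and exclusion list, you could set something like the following once in /etc/backuppc/config.pl and omit those entries from each host-specific config.pl; the values shown are simply the ones used in this chapter’s examples:
$Conf{CompressLevel} = '3';
$Conf{BackupFilesExclude} = ['/dev', '/proc', '/media', '/mnt', '/lost+found'];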
Defining a Backup Using SMB
The section of this chapter entitled “Identifying Hosts to Back Up” showed how to define entries in the
/etc/backuppc/hosts file for the various hosts that you want to back up via backuppc. As in the preceding section, each host needs a directory that holds host-specific configuration data, logs, and so on. Throughout this section, I’ll use the sample host entry win2k from the sample hosts file as an example. As you might gather from its name, this is indeed a system running Microsoft Windows 2000. There’s no escaping from the Borg.
- The first step in host-specific configuration is to use the sudo command to create the directory /var/lib/backuppc/win2k, as in the following command:
$ sudo mkdir /var/lib/backuppc/win2k
- Next, use the sudo command and your favorite text editor to create a host-specific configuration file named config.pl in that directory, using a command like the following:
$ sudo emacs /var/lib/backuppc/win2k/config.pl
The contents of this file should be something like the following:
$Conf{XferMethod} = 'smb';
$Conf{CompressLevel} = '3';
$Conf{SmbShareName} = ['wvh', 'djf'];
$Conf{SmbShareUserName} = 'backuppc';
$Conf{SmbSharePasswd} = 'hellothere';
The first line identifies the backup mechanism used for this host as smb, which overrides the default backup mechanism specified in the generic /etc/backuppc/config.pl file. The second line sets the compression level for this host’s backups to level 3, which provides a good tradeoff between the CPU load and time required to do compression and the amount of compression that you actually get. The last three entries in this file define the Windows shares that you want to back up, the name of an authorized user who has access to these shares, and the password for that user.
When using backuppc to back up Microsoft Windows systems, you should create a Windows user that you will only use to do backups, and then add this user to the standard Windows Backup Operators group. This prevents you from having to put your Windows administrator password in the backuppc configuration files. Even though you’ll protect those files so that random users can’t read them, the fewer places where you write down a password, the better, especially one that holds the keys to your entire Windows kingdom.
- Next, change the ownership and group of the /var/lib/backuppc/win2k directory to backuppc and change the protection of the configuration file /var/lib/backuppc/win2k/config.pl so that it is not publicly readable (because it contains password information) using the following commands:
$ sudo chown -Rv backuppc:backuppc /var/lib/backuppc/win2k
$ sudo chmod 600 /var/lib/backuppc/win2k/config.pl
- The last step in creating a host-specific backup definition for backuppc is to cause the backuppc process to reread its configuration data, which you can do by explicitly reloading the configuration file, explicitly restarting the backuppc process, or sending the associated process a hang-up (HUP) signal. You can force backuppc to reload the configuration file using the following command:
$ sudo /etc/init.d/backuppc reload
The definition for your backup host can now be selected via the backuppc Web interface. At this point, you can follow the instructions in the section entitled “Starting Backups in backuppc” to back up this host.
The example in this section only backs up shares that correspond to the home directories of selected users on the remote machine. As mentioned earlier in this text, backuppc backups do not support bare-metal restores of Windows systems, and I therefore typically don’t back up shares such as C$, which is a default Windows share that represents your system’s boot drive. You may find it useful to do so to make sure that you have backup copies of drivers, the registry, and so on, but I find it simpler to start from scratch when reinstalling Windows.
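If you do decide to back up the boot drive as well, adding its share to the host-specific configuration file is a one-line change; the following is a sketch based on the earlier example:
$Conf{SmbShareName} = ['wvh', 'djf', 'C$'];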
Windows systems accumulate so much crap in their filesystems over time that doing a fresh installation from your distribution media often frees up a surprising amount of space.
If you have several identical systems, restoring partition images created with
Norton Ghost or the
Linux partimage or
g4u utilities is usually the fastest way to rebuild a Windows system, because it saves you from having to locate drivers for every device that you want to use with the rebuilt system and from reinstalling all of your favorite applications.
The backuppc program reads the configuration settings in
/etc/backuppc/config.pl first, and then loads host-specific configuration settings, which enables the
/etc/backuppc/config.pl file to provide default settings for all backups. After you have used backuppc for a while and are comfortable with various settings, you may want to consider modifying the default settings in the
/etc/backuppc/config.pl file for configuration variables, such as $Conf{CompressLevel}, to minimize the number of entries that you have to create in each of your host-specific configuration files.
Starting Backups in backuppc
Thanks to backuppc’s Web orientation, starting backups, viewing the status of those backups, and checking the backup history for any host is impressively easy.
- To start a backup in backuppc, connect to the backuppc Web interface using the URL http://hostname/backuppc, where hostname is the name of the host on which the backuppc server is running. A dialog displays in which you are prompted for the login and password of an authorized user. Once you enter the user/password combination for a user listed in the file /etc/backuppc/htpasswd, the backuppc server’s home page displays.
- Once this screen displays, click the Select a host... drop-down box and select one of the hosts from the list that displays.
- Selecting the name of any host takes you to a summary page for that host, which provides status information, lists authorized users who can back up and restore files to this host using backuppc, and displays the subject of the last e-mail that was sent about this host. E-mail is only sent occasionally, so seeing a historical problem report does not mean that this problem is still occurring.
- Once this page displays, you can scroll down on the page to see additional status information about available backups, any transfer errors that occurred during backups, and other tables that show the status of the pool where backup files are archived and the extent to which existing backups have been compressed to save disk space.
- To start a backup, click either Start Full Backup to start a full (archive) backup of the system, or Start Incr Backup to start an incremental backup containing files that have changed since the last full backup.
- A confirmation page then displays. Clicking Start Full Backup (or Start Incr Backup for an incremental backup) on that page queues the backup and displays a link that you can click to return to the main page for that host to monitor the state of the backup.
Restoring from Backups in backuppc
Thanks to backuppc’s Web orientation and the fact that backuppc backups are stored online on the backup server, restoring files from backuppc can be done online, by any authorized user whose name is associated with that host in the
/etc/backuppc/hosts file. Backuppc enables you to browse through online backups, interactively select the files and directories that you want to restore, and restore them in various ways.
- To begin restoring files or directories, click the name of the full or incremental backup in which they are located. A screen displays. The bottom of the screen displays a hierarchical listing of the files and directories that are contained in the full or incremental backup that you selected. If you selected an incremental backup, the contents of that incremental backup are overlaid on the contents of the previous full backup to give you an accurate snapshot of the contents of your system when the backup was done. You can drill down into the backup by selecting directories from the tree view at the left, or you can drill down into individual directories by selecting from the view of the current directory shown at the right of the main window.
- Once you have selected all of the files and directories that you want to restore, scroll to the bottom of the restore page and click restore selected files. A page that enables you to specify how you want to restore those files displays. You have three options when restoring files using the backuppc Web interface:
- Direct restore: Selecting this option restores files directly to the host from which they were backed up. When doing a direct restore, you have the option of restoring files in the locations from which they were originally backed up, or into a subdirectory that backuppc will create for you if it does not already exist. (The latter is almost always a good idea so that you don’t accidentally overwrite any files that you don’t actually mean to.) To select this option, enter the name of any subdirectory that you want to use (I usually specify one called tmp) and click Start restore.
- Download Zip archive: Selecting this option restores the selected files and directories into a zip-format archive that you can download to your desktop and manually extract the contents of. When selecting this option, you can optionally specify the compression level used in the zip file, which can be important if you are restoring large numbers of files. To select this option, click Download Zip file.
- Download Tar archive: Selecting this option restores the selected files and directories into a tar-format archive that you can download to your desktop and manually extract the contents of. To select this option, click Download Tar file.
If you selected the Direct restore option, backuppc displays a confirmation screen. This lists the files and directories that you selected for restoration and confirms the location to which they will be restored, including the name of any subdirectory that you specified.
- To proceed, click Restore. If you selected the Zip or Tar archive options, the backuppc application displays your Web browser’s standard file download dialog after the archive file has been created.
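Once a Zip or Tar archive has finished downloading, you can extract it wherever you like. As a sketch, assuming a downloaded file named restore.tar and an illustrative target directory of /tmp/restore:
$ mkdir /tmp/restore
$ tar -xvf restore.tar -C /tmp/restore
For a zip archive, the unzip command’s -d option provides the same sort of control over the target directory.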
As you can see from this section (and the preceding sections), backuppc provides a powerful, flexible interface for backing up and restoring files on many different systems to a single backuppc server. All you need are a few configuration files and sufficient disk space, and lost files (and the lost time that is usually associated with them) can be a thing of the past.