

07 April 2010

Ubuntu --- Setting Up an NFS Server



The Network File System (NFS) is the de facto standard for sharing directories between Unix-like systems over a network. NFS is simple, lightweight, and fast, and its implementation has been freely available for any Unix-like system since the early 1980s.


NFS

Sharing groups of files that multiple people need access to is standard operating procedure in business today and, thanks to home networking, is getting to be SOP for home use as well. Providing centralized access to a collection of audio materials that you’ve extracted from your CD collection or the vacation photos from your most recent trips is just as important to the home user as providing centralized access to your procedure manuals and software source repository is to the business user or SOHO developer. Luckily, Linux systems provide several ways of sharing directories over a network, some oriented primarily toward Linux and other Unix-like systems, including Apple’s Mac OS X, and others oriented more toward Microsoft Windows systems (which Linux and Mac OS X systems can also access, of course). This text discusses how to set up one of your Ubuntu Linux systems so that other systems can access its directories over the network using NFS (Network File System), which is popularly used on all Linux and Unix-like systems. (For information on setting up your Ubuntu system to share directories with Microsoft Windows systems, see “Setting Up a Samba Server.”)
    Sun Microsystems’ Network File System, better known simply as NFS, is the most common networked filesystem in use today, largely because it comes preinstalled, free of charge, with almost every Unix and Unix-like system. NFS clients and servers are also available for almost every type of modern computer system, including those running Microsoft Windows and Apple’s Mac OS X.
    Here I provide an overview of NFS, discuss different versions of NFS and their capabilities, and discuss the various applications associated with NFS. Beyond this background material, this text focuses on explaining how to set up your Ubuntu system to be an NFS file server; how to access NFS file servers from other systems is also discussed. I conclude by discussing NIS (Network Information System), a distributed authentication mechanism that is commonly used in conjunction with NFS.


Overview of the Network File System
NFS is a network filesystem that provides transparent access to files residing on remote disks. Network filesystems are commonly referred to as distributed filesystems, because the files and directories that they provide access to may be physically located on many different computer systems that are distributed throughout your home, academic environment, or business. Developed at Sun Microsystems in the early 1980s, the NFS protocol has been revised and enhanced several times between then and now, and is available on all Linux, Unix, and Unix-like systems and even for Windows systems from many third-party software vendors. The specifications for NFS have been publicly available since shortly after it was first released, making NFS a de facto standard for distributed filesystems.
    NFS is the most common distributed filesystem in use today, largely because it is free and available for almost every type of modern computer system. NFS enables file servers to export centralized sets of files and directories to multiple client systems. Good examples of files and directories that you may want to store in a centralized location but make simultaneously available to multiple computer systems are users’ home directories, site-wide sets of software development tools, and centralized data resources such as mail queues and the directories used to store Internet news bulletin boards. The following are some common usage scenarios for using NFS:
  • Sharing common sets of data files: Sharing files that everyone on your network wants to access, whether they are audio files, business data, or the source code for tomorrow’s killer app, is the most common use of any type of networked filesystem.
  • Explicitly sharing home directories: Suppose that the home directories for all of your users are stored in the directory /export on your NFS file server, which is automatically mounted on all of your computer systems at boot time. The password file for each of your systems would list your users’ home directories as /export/user-name. Users can then log in on any NFS client system and instantly see their home directory, which would be transparently made available to them over the network.
An alternative to the previous bullet is to automatically mount networked home directories using an exported NFS directory that is managed by an NFS automount daemon. Whenever access to a directory managed by an automount daemon is requested by a client, the daemon automatically mounts that directory on the client system. Automounting simplifies the contents of your server’s  /etc/exports file by enabling you to export only the parent directory of all home directories on the server, and letting the automounter manage that directory (and therefore its subdirectories) on each client.
 See the sidebar “Automounting NFS Home Directories” at the end of this text for general information on automounting, a complete discussion of which is outside the scope of this text.
  • Sharing specific sets of binaries across systems: Suppose that you want to make a specific set of GNU tools available on all of the systems in your computing environment, but also want to centralize them on an NFS server for ease of maintenance and updating. To ensure that configuration files are portable across all of your systems, you might want to make these binaries available in the directory /usr/gnu regardless of the type of system that you are using. You could simply build binaries for each type of system that you support, configuring them to be found as /usr/gnu but actually storing them in directories with names such as /export/gnu/ubuntu, /export/gnu/solaris8, and so on. You would then configure each client of a specified type to mount the appropriate exported directory for that system type as /usr/gnu. For example, /export/gnu/ubuntu would be mounted as /usr/gnu on Ubuntu systems, /export/gnu/solaris8 would be mounted as /usr/gnu on Solaris systems, and so on. You could then simply put /usr/gnu/bin in your path and the legendary “right thing” would happen regardless of the type of system that you logged in on.
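To make the /usr/gnu scheme concrete, each client could carry a single /etc/fstab line mapping its architecture-specific export onto the common mount point. These lines are a hypothetical sketch; the server name nfsserver and the mount options are assumptions, not taken from this text:

```
# On an Ubuntu client:
nfsserver:/export/gnu/ubuntu    /usr/gnu   nfs   ro   0   0

# On a Solaris 8 client (Solaris actually uses /etc/vfstab with a
# different field layout; the Linux form is shown for comparison):
nfsserver:/export/gnu/solaris8  /usr/gnu   nfs   ro   0   0
```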
As you’ll see shortly, NFS is easy to install, easy to configure, and provides a flexible networked filesystem that any Ubuntu, other Linux, Unix, or Unix-like system can quickly and easily access. In some cases, it’s easy to trip over a few administrative gotchas, but Ubuntu provides powerful and easy-to-use tools that simplify configuring NFS file servers to “do the right thing.”


Understanding how NFS Works
If you simply want to use NFS and aren’t too concerned about what’s going on under the hood, you can skip this section. However, this section provides the details of many internal NFS operations because some enquiring minds do indeed want to know and because, frankly, it’s just plain interesting to see some of the hoops that NFS clients and servers have to jump through to successfully communicate between different types of computer systems, often with different types of processors. So, if you’re interested, read on, McDuff!
    The underlying network communication method used by NFS is known as Remote Procedure Calls (RPCs), which can use either the lower-level User Datagram Protocol (UDP) as their network transport mechanism (the only transport available in NFS version 2) or TCP (added in NFS version 3). For this reason, both UDP and TCP entries for port 2049, the port used by the NFS daemon, are present in the Linux /etc/services file. UDP minimizes transmission delays because it does not attempt to do sequencing or flow control, and does not provide delivery guarantees; it simply sends packets to a specific port on a given host, where some other process is waiting for input.
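You can confirm those /etc/services entries on a typical Ubuntu system with a quick grep; the comment text at the end of each line may vary slightly between releases:

```
$ grep -w 2049 /etc/services
nfs             2049/tcp                        # Network File System
nfs             2049/udp                        # Network File System
```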
    The design and implementation of RPCs make NFS platform-independent, interoperable between different computer systems, and easily ported to many computing architectures and operating systems.
RPCs are a client/server communication method that involves issuing RPC calls with various parameters on client systems, which are actually executed on the server. 
The client doesn’t need to know whether the procedure call is being executed locally or remotely — it receives the results of an RPC in exactly the same way that it would receive the results of a local procedure call.
    The way in which RPCs are implemented is extremely clever. RPCs work by using a technique known as marshalling, which essentially means packaging up all of the arguments to the remote procedure call on the client into a mutually agreed-upon format. This mutually agreed-upon format is known as eXternal Data Representation (XDR), and provides a sort of computer Esperanto that enables systems with different architectures and byte-orders to safely exchange data with each other. The client’s RPC subsystem then ships the resulting, system-independent packet to the appropriate server. The server’s RPC subsystem receives the packet, and unmarshalls it to extract the arguments to the procedure call in its native format. The RPC subsystem executes the procedure call locally, marshalls the results into a return packet, and sends this packet back to the client. When this packet is received by the client, its RPC subsystem unmarshalls the packet and sends the results to the program that invoked the RPC, returning this data in exactly the same fashion as any local procedure call. Marshalling and unmarshalling, plus the use of the common XDR data representation, make it possible for different types of systems to transparently communicate and execute functions on each other. RPC communications are used for all NFS-related communications, including:
  1. communications related to the authentication services used by NFS (NIS or NIS+)
  2. managing file locks
  3. managing NFS mount requests
  4. providing status information, and 
  5. requests made to the NFS automount daemon
To enable applications to contact so many different services without requiring that each communicate through a specific, well-known port, NFS lets those services dynamically bind to any available port as long as they register with its central coordination service, the portmapper daemon. The portmapper always runs on port 111 of any host that supports RPC communications, and serves as an electronic version of directory assistance. Servers register RPC-related services with the portmapper, identifying the port that the service is actually listening on. Clients then contact the portmapper at its well-known port to determine the port that is actually being used by the service that they are looking for.
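To see this directory-assistance behavior in action, you can query the portmapper directly with the rpcinfo utility. The transcript below is a hypothetical illustration; the service list, version numbers, and (for dynamically bound services such as mountd) the port numbers will differ from system to system. Only the portmapper itself is pinned to port 111:

```
$ rpcinfo -p localhost
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    2   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100005    1   udp    866  mountd
```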
    Communication failures occur with any networked communication mechanism, and RPCs are no exception. As mentioned at the beginning of this section, UDP does not provide delivery guarantees or packet sequencing. Therefore, when a response to an RPC call is not received within a specific period of time, systems will resend RPC packets. This introduces the possibility that a remote system may execute a specific function twice, based on the same input data. Because this can happen, all NFS operations are idempotent, which means that they can be executed any number of times and still return the same result — an NFS operation cannot change any of the data that it depends upon.
    Even though NFS version 3 uses TCP as its network transport mechanism, the idea of idempotent requests is still part of the NFS protocol to guarantee compatibility with NFS version 2 implementations. As another way of dealing with potential communication and system failures, NFS servers are stateless, meaning that they do not retain information about their clients across system restarts.
If a server crashes while a client is attempting to make an RPC to it, the client continues to retry the RPC until the server comes back up or until the number of retries exceeds its configured limit, at which time the operation aborts. 
Stateless operation makes the NFS protocol much simpler, because it does not have to worry about maintaining consistency between client and server data. The client is always right, even after rebooting, because the server does not maintain any client data at that point.
    Although stateless operation simplifies things, it is also extremely noisy, inefficient, and slow. When data from a client is saved back to a server, the server must write it synchronously, not returning control to the client until all of the data has been saved to the server’s disk. As described in the next section, “Comparing Different Versions of NFS,” newer versions of NFS do some limited write caching on clients to return control to the client applications as quickly as possible. This caching is done by the client’s rpciod process (RPC IO Daemon), which stores pending writes to NFS servers in the hopes that it can bundle groups of them together and thus optimize the client’s use of the network. In the current standard version of NFS (NFS version 3), cached client writes are still essentially dangerous because they are only stored in memory, and will therefore be lost if the client crashes before the write completes.
    In a totally stateless environment, a server crash would make it difficult to save data that was being modified on a client back to the server once it is available again. The server would have no way of knowing what file the modified data belonged to because it had no persistent information about its clients. To resolve the problem, NFS clients obtain file handles from a server whenever they open a file. File handles are data structures that identify both the server and the file that they are associated with. If a server crashes, clients retry their write operations until the server is available again or their timeout periods are exceeded. If the server comes back up in time, it receives the modified data and the file handle from the client, and can use the file handle to figure out which file the modified data should be written to.
    The lack of client-side caching also has a long-term operational impact because it limits the type of dependencies that NFS clients can have on NFS servers. Because clients do not cache data from the server, they must re-retrieve any information that they need after any reboot. This can definitely slow the reboot process for any client that must execute binaries located on an NFS server as part of the reboot process. If the server is unavailable, the client cannot boot. For this reason, most NFS clients must contain a full set of system binaries, and typically only share user-oriented binaries and data via NFS.


Comparing Different Versions of NFS
NFS has been around almost since the beginning of Unix workstation time, appearing on early Sun Microsystems workstations in the early 1980s. This section provides an overview of the differences between the four versions of NFS, both for historical reasons and to illustrate that NFS is by no means a done deal. NFS 4 resolves the biggest limitations of NFS 3, most notably adding real client-side data caching that survives reboots. The most common version of NFS used on systems today is NFS version 3, which is the version that I focus on here. The following list identifies the four versions of NFS and highlights the primary features of each:
  • Version 1: The original NFS protocol specification was used only internally at Sun during the development of NFS, and I have never been able to find any documentation on the original specification. This would only be of historical interest.
  • Version 2: NFS version 2 was the first version of the NFS protocol that was released for public consumption. Version 2 used UDP exclusively as its transport mechanism, and defined the 18 basic RPCs that made up the original public NFS protocol. Version 2 was a 32-bit implementation of the protocol, and therefore imposed a maximum file size limitation of 2GB on files in NFS, and used a 32-byte file handle. NFS version 2 also limited data transfer sizes to 8KB.
  • Version 3: NFS version 3 addressed many of the shortcomings and ambiguities present in the NFS version 2 specification, and took advantage of many of the technological advances in the 10+ years between the version 2 and 3 specifications. Version 3 added TCP as a network transport mechanism, making it the default if both the client and server support it; increased the maximum data transfer size between client and server to 64KB; and was a full 64-bit implementation, thereby effectively removing file size limitations. All of these were made possible by improvements in networking technology and system architecture since NFS version 2 was released. Version 3 also added a few new RPCs to those in the original version 2 specification, and removed two that had never been used (or implemented in any NFS version that I’ve ever seen). To improve performance by decreasing network traffic, version 3 introduced the notion of bundling writes from the client to the server, and also automatically returned file attributes with each RPC call, rather than requiring a separate request for this information as version 2 NFS had done.
  • Version 4: Much of the NFS version 4 protocol is designed to position NFS for use in Internet and World Wide Web environments by increasing persistence, performance, and security. Version 4 adds persistent, client-side caching to aid in recovery from system reboots with minimal network traffic, and adds support for ACLs and extended file attributes in NFS filesystems. Version 4 also adds an improved, standard API for increased security through a general RPC security mechanism known as Remote Procedure Call Security - Generic Security Services (RPCSEC_GSS, specified in RFC 2203). This mandates the use of the Generic Security Services Application Programming Interface (GSS-API) to select between available security mechanisms provided by clients and servers.


Installing an NFS Server and Related Packages

To install the packages required to run and monitor an NFS server on your Ubuntu system, start the Synaptic Package Manager from the System ➪ Administration menu, and click Search to display the search dialog. Make sure that Names and Descriptions are the selected items to look in, enter nfs as the string to search for, and click Search. After the search completes, scroll down until you see the nfs-common and nfs-kernel-server packages, then right-click each of these packages and select Mark for Installation from the pop-up menu.
As you can see, the Ubuntu repositories provide two NFS servers: one that runs in the Linux kernel and another that runs in user space.
  1. The kernel-based NFS server is slightly faster and provides command-line utilities, such as exportfs, that you can use to explicitly share directories via NFS (known as exporting directories in NFS-speak) and to monitor the status of the directories that you share.
  2. The user-space NFS server is slightly easier to debug and control manually.
This text explains how to install and use the kernel-based NFS server; if you have problems sharing directories using NFS, you may want to subsequently install the user-space NFS server to help with debugging those problems.
    Depending on what software you have previously installed on your Ubuntu system and what you select in Synaptic, a dialog may display that lists other packages that must also be installed, and ask for confirmation. When you see this dialog, click Mark to accept these related (and required) packages. Next, click Apply in the Synaptic toolbar to install the kernel-space NFS server and friends on your system.
    Once the installation completes, you’re ready to share data on your system with any system that supports NFS.



Using the Shared Folder Tool to Export Directories
At this point it should come as no surprise that Ubuntu Linux provides an easy-to-use graphical tool (shares-admin) that simplifies the process of defining and configuring the directories that you want to export via NFS from your Ubuntu system. To start this tool, select System ➪ Administration ➪ Shared Folders. After supplying your password in the administrative authentication dialog that displays, the Shared Folder tool starts. To define a directory that you want to share via NFS, click Add to display the dialog.
    To export a directory using NFS, click the Share with item and select NFS as the sharing protocol that you are working with. This displays the settings that are relevant for NFS. As you can see, the default exported/shared directory that is initially selected when you start the Shared Folder admin tool is your home directory. In this example, I’m going to share the directory that contains my online audio collection. To specify another directory for sharing, click the Path item and select Other from the drop-down menu to display the directory selection dialog.
    To select a directory somewhere on your system, click root and navigate through the directory tree on your system to select the directory that you want to export, which in this example is my /opt2 directory. Click Open to select that directory (or whatever directory you want to export) and return to the dialog, which now displays the name of the newly selected directory in the Path field.
    Next, you’ll need to identify the hosts that you want to be able to access (i.e., mount) this directory over the network. To define these, click the Add host button to display the dialog.  This dialog provides several ways to identify the hosts that can mount and access the directory that you are sharing. The Allowed hosts drop-down menu provides four choices:
  • Hosts in the eth0 network: Enables anyone who can reach your machine via your system’s eth0 network interface to mount and access the shared directory.
  • Specify hostname: Enables you to identify the name of a specific host that can mount and access the shared directory. Selecting this item displays an additional field on the basic dialog in which you can enter the fully-qualified or local hostname of a machine that you want to be able to mount and access the shared directory.
  • Specify IP address: Enables you to identify the IP address of a specific host that can mount and access the shared directory. Selecting this item displays an additional field on the basic dialog in which you can enter the IP address of a machine that you want to be able to mount and access the shared directory.
  • Specify network: Enables you to identify the IP specification for a subnet that can mount and access the shared directory. All hosts with IP addresses that are on this subnet will be able to mount and access the shared directory. Selecting this item displays two additional fields on the basic dialog in which you can enter the subnet and netmask of the network whose hosts that you want to be able to mount and access the shared directory.
If you are identifying authorized hosts that can mount and access your shared directory by hostname, IP address, or subnet, you can always explicitly allow multiple hosts to mount and access the shared directory by clicking the Add host button multiple times to define a specific set of hosts.
    In this example, I’ll grant all hosts on the 192.168.0.0 subnet access to my shared directory. Note that this dialog enables you to grant read-only access to a shared directory by selecting the Read only checkbox. This provides a convenient way to give others access to shared data but prevents them from modifying anything in the shared directory. There is also slightly less overhead in exporting a directory to other systems as a read-only directory, so you may want to consider doing this if others need access to the shared data but you’re sure that they’ll never want to change anything there (or you don’t want them to change anything there).
    Clicking OK in the dialog returns you to the dialog shown previously, which is now updated to show the /opt2 directory that I am sharing in this example. To continue, click OK to redisplay the original dialog, which now contains the settings for our newly defined NFS shared directory.
    Almost done! To subsequently modify or update the settings for any shared directory, you can right-click its name in the Shared Folder tool and click Properties to display the specific settings for that shared folder. To begin sharing the folder, click OK to start the specified type of file sharing and close the Shared Folder tool.


Verifying NFS Operations
The kernel NFS server package includes a utility called exportfs that you can use to list the directories that an NFS server is currently exporting from your system and reexport any new directories that you have just added to your system’s NFS configuration, which is stored in the file /etc/exports. After you follow the instructions in the previous section, the contents of the /etc/exports file on your Ubuntu system are the following:
# /etc/exports: the access control list for filesystems which may
#                        be exported to NFS clients. See exports(5).
/opt2                    192.168.0.0/255.255.0.0(rw)
Any line in this file that does not begin with a hash mark is an entry that defines a directory that is being exported by NFS, and is commonly referred to as an export specification. To verify that the /opt2 directory is being exported from your system (and to reexport it if necessary), you can use the exportfs -av command, which exports all available directories in a verbose fashion as shown in the following example:
$ sudo exportfs -av
exportfs: /etc/exports [3]: No 'sync' or 'async' option specified \
for export "192.168.0.0/255.255.0.0:/opt2".
Assuming default behavior ('sync').
NOTE: this default has changed from previous versions
exporting 192.168.0.0/255.255.0.0:/opt2

This output demonstrates that the directory /opt2 is being exported to all hosts whose IP addresses fall within the 192.168.0.0/255.255.0.0 subnet.
                  NFS Users and Authentication
NFS uses the user ID (UID) and group ID (GID) of each user from a system’s password file (/etc/passwd) to determine who can write to and access exported files and directories, based on the UID and GID that owns those directories on the file server. This means that all of your users should have the same user ID and group ID on all systems to which NFS directories such as home directories are exported.
  • In small networks, it is often sufficient to make sure that you create the same users and groups on all of your systems, or to make sure that the password and group files on your file server contain the correct entries for all of the users and groups who will access any directory that it exports.
  • In larger networks, this is impractical, so you may want to consider network-oriented authentication mechanisms, such as the Network Information System (NIS), which was developed by Sun Microsystems specifically for use with NFS. Unfortunately, discussing NIS installation and setup is outside of the scope of this text, but you can find a variety of excellent information about it online in documents such as the NIS HOWTO, which is available in several languages.
You’ll note that exportfs also complains about a missing option in the export specification for the /opt2 directory. In the /etc/exports file shown earlier in this section, you’ll notice that the last entry in the /opt2 export specification ends with “(rw)”. This final section of an export specification specifies any options associated with a specific exported directory. In this case, the only option specified is rw, which means that the directory is being exported as read/write so that authorized users can write to that directory, as well as read from it. (See the sidebar “NFS Users and Authentication” in this section for more information about how NFS identifies users.)
    The warning message displayed by the exportfs command has to do with whether changes to files in a read/write directory are immediately written to the remote file server (sync, for synchronous operation), or are written lazily, whenever possible (async, for asynchronous operation). Synchronous operation is slower, because your system has to wait for writes to the remote file server to complete, but is safer because you know that your changes have been written to the file server (unless the network connection goes down, in which case all bets are off). Older versions of NFS simply assumed synchronous operation, but nowadays, NFS likes you to explicitly specify which option you want to use. To eliminate this warning message, you can edit the /etc/exports file directly to change rw to rw,async, which I generally recommend because it is faster than synchronous operation. After you make this change, the /etc/exports file looks like the following:
# /etc/exports: the access control list for filesystems which may be exported
#                        to NFS clients. See exports(5).
/opt2                    192.168.0.0/255.255.0.0(rw,async)
You can now reexport this directory for asynchronous updates, and the exportfs utility is much happier, as in the following example:
       $ sudo exportfs -av
       exporting 192.168.0.0/255.255.0.0:/opt2
The nfs-common package provides a utility called showmount, which you can also run on an NFS server to display the list of directories exported by that file server, but which will not reexport them or change them in any way. Using the showmount command with its -e option (to show the list of exported directories on the test system used here) provides output like the following:
$ sudo showmount -e
 Export list for ulaptop:
 /opt2 192.168.0.0/255.255.0.0
For complete information about the exportfs and showmount utilities, see their online reference information, which is available by typing man exportfs or man showmount from any Ubuntu command line, such as an xterm or the GNOME Terminal application.
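With the export verified, any NFS-capable client can mount the shared directory. The following is a sketch: the server name ulaptop and the export /opt2 come from this section's example, but the mount point and mount options are assumptions you would adjust for your own network:

```
# Mount the exported directory once, by hand (run on the client):
#   sudo mkdir -p /mnt/opt2
#   sudo mount -t nfs ulaptop:/opt2 /mnt/opt2
#
# Or mount it automatically at boot via an /etc/fstab entry like:
ulaptop:/opt2   /mnt/opt2   nfs   rw,hard,intr   0   0
```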


Manually Exporting Directories in /etc/exports
Although everyone loves graphical tools, it’s sometimes nice to simply edit the underlying files that these tools manipulate — it can be much faster, and can be done from any device on which you can start a text editor.
    As mentioned in the previous section, the file that contains exported directory information for NFS file servers is /etc/exports. Entries in this file have the following form:
full-path-name-of-exported-directory hosts(mount-options)
Each such entry in the /etc/exports file is referred to as an export specification. Hosts can be listed by IP address, hostname, or subnet to state that only those hosts can access a specific directory exported by NFS. Entries such as 192.168.6.61 would limit access to a specific NFS directory from that host, while entries such as 192.168.6.* or 192.168.6.0 would limit access to a specific NFS directory to hosts on that subnet. By default, all hosts that can reach an NFS server have access to all exported directories (which is represented by a * preceding the mount options).
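Because export specifications are just whitespace-separated text, you can sketch their anatomy with ordinary text processing. Nothing here talks to an NFS server, and the sample entries are hypothetical:

```shell
# Write a small sample exports file (same layout as /etc/exports):
cat <<'EOF' > /tmp/exports.sample
# /etc/exports: the access control list for filesystems which may
#               be exported to NFS clients. See exports(5).
/opt2            192.168.0.0/255.255.0.0(rw,async)
/export/home     192.168.6.*(rw)
EOF

# Print each exported directory, its allowed hosts, and its mount
# options, skipping comments and blank lines:
awk '!/^#/ && NF {
    split($2, parts, /[()]/)
    printf "dir=%s hosts=%s options=%s\n", $1, parts[1], parts[2]
}' /tmp/exports.sample
```

Running this prints one line per export specification, with the host list and the parenthesized options separated into their own fields.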
    As you’d expect, many mount options are available. Some of the more commonly used mount options are the following:
  • all_squash: Maps all NFS read or write requests to a specific user, usually “anonymous.” This option is often used for public resources such as directories of USENET news, public FTP and download areas, and so on. All files written to an NFS directory that is exported with the all_squash mount option will be assigned the UID and GID of the user anonymous or some other UID and GID specified using the anonuid and anongid mount options. The default is no_all_squash, which preserves all UIDs and GIDs.
  • insecure: Enables access to NFS directories by NFS clients that are running on non-standard NFS network ports. By default, this option is off, and NFS requests must originate from ports with port numbers less than 1024. The insecure option may be necessary to enable access from some PC and Macintosh NFS clients. If you need to use this option, you should limit the machines that use it to a home network or secure corporate intranet. You should not use this option on any machine that is accessible over the Internet, because it introduces potential security problems.
  • no_root_squash: Lets root users on client workstations have the same privileges as the root user on the NFS file server. This option is off by default.
  • ro: Exports the directory read-only, so that users cannot write to it. The default is rw, which enables read/write access.
  • sync: Forces writes to the NFS server to be done synchronously, where the client waits for the writes to complete before returning control to the user. This is the default — as explained in the previous section, you can also specify asynchronous operation (async), which is slightly faster.
See the man page for /etc/exports (by using the man 5 exports command) for complete information on the options that are available in this file. Once you have created an entry for a new exported directory in your /etc/exports file, you can export that directory by rerunning the exportfs command with the -r option, which tells the NFS server to reread the /etc/exports file and make any necessary changes to the list of directories that are exported by that NFS server.
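Putting this together, a hypothetical /etc/exports file for a home server might look like the following (the paths and addresses are examples only; adjust them for your own network):

```
# Read-only media share for every host on the local subnet
/srv/media   192.168.6.0/255.255.255.0(ro,sync)
# Read/write home directories for a single trusted workstation
/home        192.168.6.61(rw,sync,root_squash)
```

After saving the file, run sudo exportfs -r and verify the result with showmount -e.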

              Automounting NFS Home Directories
Automounting is the process of automatically mounting NFS filesystems in response to requests for access to those filesystems. Automounting is controlled by an automount daemon that runs on the client system.
    In addition to automatically mounting filesystems in response to requests for access to them, an automount daemon can also automatically unmount volumes once they have not been used for a specified period of time.
    Using an automount daemon prevents you from having to mount shared NFS directories that you are not actually using at the moment. Mounting all NFS directories on all clients at all times generates a considerable amount of network traffic, much of which is extraneous if those directories are not actually being used. Using the NFS automount daemon helps keep NFS-related network traffic to a minimum.
    At the moment, two different automount daemons are available for Linux.
  1. The amd automount daemon runs in user space on client workstations and works much like the original SunOS automounter. The amd automounter is configured through the file /etc/amd.conf. For more information about amd, see the home page for the Automount Utilities.
  2. The other automount daemon is called autofs and is implemented in Linux kernel versions 2.2 and greater. The kernel automounter starts one user-space automount process for each top-level automounted directory. The autofs automounter daemon is configured through the file /etc/auto.master or through NIS maps with the same name. Because the autofs daemon is part of the kernel, it is faster and the automounter of choice for many Linux distributions (such as Ubuntu).
Both of these packages are available in the Ubuntu repositories, but discussing all of the nuances of automounting is outside the scope of this text.
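As a sketch of how autofs is typically configured for NFS home directories (the server name nfsserver and the export path here are hypothetical), /etc/auto.master points a mount point at a map file, and the map file describes what to mount there:

```
# /etc/auto.master: automount home directories under /home,
# unmounting them after 60 seconds of inactivity
/home   /etc/auto.home   --timeout=60

# /etc/auto.home: mount whatever key is requested (*) from the
# NFS server, substituting the key for the & character
*   -fstype=nfs,rw,soft   nfsserver:/export/home/&
```

With this in place, the first access to /home/wvh would transparently mount nfsserver:/export/home/wvh.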


Getting More Information About NFS and Related Software
Not surprisingly, the Web provides an excellent source of additional information about NFS and NIS. For more information, consult any of the following:

    Ubuntu --- Setting Up a Samba Server

    Samba is probably the best example of the value that open 
    source software can bring to modern computing, 
    enabling seamless data sharing between Unix-
    like and Microsoft Windows systems thanks 
    to a fantastic combination of reverse 
    engineering and insightful
    development.

    CIFS/SMB


    Like it or not, the planet is populated with Windows machines. As you can see  from statements like that, I’m probably as guilty as anyone of propagating the “us vs. them” mentality when it comes to Windows vs. Linux. How I personally feel about Windows and Microsoft really doesn’t matter — the important thing here is to discuss the various ways in which software available on Linux systems makes it easy to integrate Linux and Windows filesystems in both directions, getting features such as automatic printer sharing as freebies along the way.
        You’d have to have been living in a cave for the last eight or so years not to have heard of Samba, arguably one of the most popular applications ever written for Linux and Unix-like systems.
    In a nutshell, Samba is a set of applications that was originally developed to provide support for Microsoft’s networking protocols on Linux systems, but which has been ported to just about every other network-aware operating system.
    A huge number of books are available that are dedicated to discussing Samba, explaining every nuance of its configuration files, installation, and use. My goal here is not to embed another one inside a text on Ubuntu, but rather to provide some interesting background information about Windows networking and Samba, and then to explain how to use Samba to share directories and printers on your Ubuntu system so that Microsoft Windows users in your home, academic, or business computing environment can access them.



    Overview of Microsoft Windows File Sharing

    Networking and related technologies such as routing are probably responsible for more acronyms than any other aspect of the computer industry. MS-DOS and Microsoft Windows networking have contributed their share, largely because of the ubiquity of these operating systems in modern computing environments.
        Because of the popularity of DOS systems (yesterday) and Windows systems (today), today’s Windows systems provide support for almost everyone’s networking protocols. Frankly, Windows does an admirable job of continuing to make forward progress while still maintaining backward compatibility with almost every ancient DOS application and networking protocol. Windows systems still support the Internet Packet Exchange (IPX) and Sequenced Packet Exchange (SPX) networking protocols used by Novell to provide the first PC file servers.
        However, more relevant for our discussion here are the networking protocols and attendant acronyms that were developed by Microsoft and used to provide file and resource sharing over PC networks without requiring the involvement of any third parties, thank you very much.
        The Basic Input and Output System used by PCs to interact with local devices is best known by its initials as the PC’s BIOS. As networks began to appear, Microsoft extended the capabilities of the BIOS to support accessing and sharing information over a network, naming the related protocols the network BIOS, or as it’s more popularly known, NetBIOS. Just as the BIOS provides the basic functions that support all system input and output, the NetBIOS provides the basic functions that let you use and administer network services.  
    NetBIOS commands and functions must be exchanged between networked systems and therefore require a lower-level network transport mechanism to move network packets from one host to another. 
    The lower-level transport protocols that are still in common use in PC networking today are:
    1. IPX (Internet Packet Exchange)
    2. NetBEUI (NetBIOS Extended User Interface), and 
    3. TCP/IP (Transmission Control Protocol/Internet Protocol).
    Interestingly, the word “Internet” in the full names of both IPX and TCP/IP refers to inter-network communications, not the Internet as we know it today.

    Modern Windows systems send their NetBIOS requests by using TCP/IP as a transport protocol (NetBT). On top of the NetBIOS level, Windows networking provides a higher-level interface for network services known as the Server Message Block (SMB) protocol, which is a networking protocol that can be easily used by applications.
        SMB is a connection-oriented protocol rather than a broadcast protocol, meaning that it depends on establishing connections to specific networked services provided by other networked hosts rather than simply broadcasting its availability. Once a connection is established, SMB provides four basic types of functions:
    • Session functions that negotiate and establish networked connections between machines (often referred to as virtual circuits), authenticate, and verify the access privileges that each party has with the other.
    • File functions that enable applications to open, close, read, and write remote files, shared directories, and so on.
    • Printer functions that enable applications to spool output to remote output devices.
    • Message functions that enable applications to send and receive control, status, and informational messages between different systems on the network.
    SMB became an Open Group standard for networking interoperability in the early 1990s.
    Samba takes its name from SMB, the addition of two vowels making it easily pronounced and somewhat softer than simply being YAA (Yet Another Acronym).
    In recent developments, an enhanced version of the SMB protocol called CIFS (Common Internet File System) was submitted by Microsoft to the IETF (Internet Engineering Task Force), an open association of people who are interested in the architecture and smooth operation of the Internet.
    Although the CIFS Internet-Draft was never ratified as a formal standard, CIFS extends the capabilities of SMB by expanding its focus to sharing resources using even more open, cross-platform standards such as HTTP URLs (HyperText Transfer Protocol Uniform Resource Locators) and DNS (the Domain Name System used to map hostnames to IP addresses and vice versa).

    Introducing Samba
    When you get right down to it, more data is probably stored on Windows systems than on any other type of computer system. All of those 1TB home and office systems add up to a tremendous number of Windows filesystems holding a staggering amount of data. Samba gives Linux users transparent access to Windows filesystems, but is more commonly used to give Windows users transparent access to Linux, Unix, and Unix-like systems. Samba does this by providing a network interface (SMB) that is compatible with the networked file and printer-sharing protocols used between Windows systems. To a Windows system, a Linux system running Samba looks just like any other Windows system that is sharing filesystems across the network.
        This enables Windows users to take advantage of the speed, power, and capacity of Linux systems without even realizing that they are accessing Linux filesystems. Samba is a free and impressive interface from Linux, Unix, and other types of systems to any networked device (including routers and gateways) that can communicate using the SMB protocol, most notably Windows systems that provide networked access to files, directories, and printers. Samba enables Windows users to access Linux filesystems and resources just like any other Windows shared filesystem or networked resource.
    For example, with Samba running on a Linux system on your network, Windows users can mount their Linux home directories (as networked Windows drives) and automatically print to Linux printers just like any other networked Windows printer. 
    Samba, originally authored by Andrew Tridgell, is one of the most impressive pieces of interoperability software ever developed. Because Microsoft’s specifications for SMB weren’t publicly available (big surprise), Tridgell created Samba through a massive feat of reverse engineering the protocol and how it worked. He has received numerous awards and accolades for it and still works on it today, though thousands of others have contributed to Samba and it is now a team effort. Today, thanks largely to Tridgell and Samba, Microsoft has opened up the SMB specification, which is now part of its larger CIFS (Common Internet File System) specification.
    Samba includes both client and server software — in other words, client software that enables users to communicate from Linux machines to SMB hosts on your network, and server software that provides an SMB interface for your Linux machine. 
    Using the Samba client software is discussed in the next section. After that section,  we focus on explaining how to install and set up a Samba server.

    As a preface for the Samba server section, we can say that a Samba server actually consists of two processes, both of which can be started from the command line or automatically by integrating them into your system’s startup procedure. These processes are:
    1. smbd, the Samba daemon that provides file sharing and print services to Windows clients
    2. nmbd, the NetBIOS name server that maps the NetBIOS names used by Windows SMB requests to the IP addresses used by Linux (and most other) systems.
    The Samba daemon is configured by modifying its configuration file, /etc/samba/smb.conf. On Ubuntu systems, you can either
    1. configure specific directories that you want to export via Samba using graphical tools, as explained in the section entitled “Using the Shared Folder Tool to Share Directories,” or 
    2. manually modify the Samba configuration file, as discussed later in the section entitled “Samba Server Configuration Essentials.”
    Interoperability between Linux and Windows systems involves much more than just Samba.
    The Linux kernel provides built-in support for the protocols used to access Windows filesystems, enabling Linux users to mount Windows filesystems via entries in /etc/fstab, just like any other filesystem resource.
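For example, a hypothetical /etc/fstab entry that mounts a Windows share at boot time might look like the following (the server, share, mount point, and credential values are placeholders):

```
# Mount the "docs" share from winserver using the kernel's CIFS support
//winserver/docs  /mnt/docs  cifs  username=wvh,password=secret,uid=wvh  0  0
```

In practice you would keep the username and password in a root-readable file and point to it with the credentials= option, rather than embedding them in the world-readable /etc/fstab.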


    Accessing Shares on Remote Windows Systems (SMB Client)
    Your Ubuntu Linux system can easily access shared directories on Microsoft Windows systems (commonly known as shares) thanks to one of the most popular and useful open source software packages ever created, Samba. Samba takes its name from the SMB (Server Message Block) protocol that is the original underlying protocol used for networked file sharing on Microsoft Windows systems. As mentioned, you can also use Samba to share files from a Linux system so that specified directories look like Windows shares and Windows users can access them. This requires that you set up a Samba server, as we will see shortly.

    This section focuses on accessing a shared Windows directory from the desktop of your Ubuntu Linux system and copying files to or from that Windows share. (A clean Ubuntu install contains the smbclient binary, which provides the SMB client side.)
        Ubuntu’s Places menu provides the Connect to Server command, which makes it easy to create an icon on your desktop for a Windows share and connect to that share using the Nautilus file manager. Selecting Places ➪ Connect to Server displays a dialog. To specify that you want to connect to a Windows SMB server that requires authentication, click the Service type menu at the top of this dialog and select the Windows share entry to display the dialog shown in Figure.

    You must at least enter the name or IP address of the Windows server (Server field in dialog window) that hosts the share that you want to connect to. If you want to provide your own name for the desktop icon that this command will create, you can enter that in the Bookmark name field, which can be handy because the default name of the icon is the name of the Windows server, and you may want to connect to multiple shares on the same server and avoid confusion.
    You can also identify the name of the user that you want to connect as (User field), the name of the Windows domain or workgroup that you want to connect as (Domain name field), and the share that you want to connect to (Share field), but all of this is optional — you’ll be prompted for anything that you don’t specify here when you attempt to connect. I prefer to put as much information as possible here, but that’s up to you. 
    Click Connect to create a desktop icon for the Windows share that you have defined.
        If you look really closely, you’ll see that the icon created for a Windows share by the Connect to Server dialog displays a little network connection beneath it, and displays a small red flag with the letters SMB inside it to its right (all of which can look a bit different in newer Ubuntu versions). If you end up creating lots of icons on your desktop and have really good eyesight, this makes it easy to identify the protocol used to connect to various directories. This can come in handy, especially if you also use the Connect to Server mechanism to connect to FTP, WebDAV, or SSH servers.
    Simply creating a desktop icon for a Windows share doesn’t actually establish the connection. To do that,  double-click on the appropriate desktop icon to open it. You will see a dialog that prompts you for any required information that you probably have not specified yet.
    Once you enter any remaining information (password etc) about the share that you want to connect to and the server on which it resides, a Nautilus file manager window appears which displays the contents of the default directory associated with the user that you are logged in as.
        Once the Nautilus file manager displays, copying files in either direction uses the standard Nautilus and GNOME desktop conventions. Once you are done accessing the Windows share, close the Nautilus file manager window.
    To discard the desktop icon, right-click on the icon for the Windows share and select Unmount Volume from the pop-up menu to sever the connection and discard the icon. If you want to reuse the share definition, you can simply leave the icon on your desktop when you log out — the next time that you log in, the icon will still be present, but you will have to re-authenticate to access the Windows share again (all that can be a little bit different in newer Ubuntu versions).
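If you prefer the command line to the desktop icons described above, the same share can be reached directly with smbclient, which offers an FTP-like interactive session (the server, share, user, and file names here are hypothetical):

```
$ smbclient //winserver/docs -U wvh
Password:
smb: \> get report.doc
smb: \> put notes.txt
smb: \> quit
```

Within the session, get copies a file from the share to your current Linux directory and put copies one in the other direction.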


    Installing the Samba Server and Friends
    To install the packages required to run and monitor a Samba server on your Ubuntu system:

    Start the Synaptic Package Manager from the System ➪ Administration menu, and click Search to display the search dialog. Make sure that Names and Descriptions are the selected items to look in, enter samba as the string to search for, and click Search.
        After the search completes, scroll down until you see the samba and samba-common packages; right-click each of these packages and select Mark for Installation from the pop-up menu. You may also want to select the samba-doc and samba-doc-pdf packages, which respectively provide HTML and PDF versions of all of the official Samba project documentation, plus an online copy of a book entitled Samba 3 By Example by one of the leaders of the Samba project, John Terpstra.
        Depending on what software you have previously installed on your Ubuntu system and what you select in Synaptic, a dialog may display that lists other packages that must also be installed, and ask for confirmation. When you see this dialog, click Mark to accept these related (and required) packages. Next, click Apply in the Synaptic toolbar to install the Samba server and friends on your system. Once the installation completes, you’re ready to share data with any system that supports SMB.



    Samba Server Configuration Essentials

    At the moment, the near absence of graphical tools for setting up and configuring Samba on Ubuntu systems (only a few exist, a consequence of the shell orientation of Unix-like systems) is a rather glaring omission from the standard user-friendliness that Ubuntu users have come to expect. I’m not the only person to have noticed this, and there are active discussions on various Ubuntu lists and forums about developing such tools.
        However, for the time being, you must do your initial Samba configuration in the aging but tried-and-true Linux way — by editing configuration files using a text editor. Samba’s configuration file is /etc/samba/smb.conf. The Samba configuration file contains many helpful comments, which are lines beginning with a hash mark. It also contains many sample, inactive configuration commands, which are lines beginning with a semicolon; these indicate configuration commands that you may want to activate by removing the leading semicolon. (Commented-out default settings, shown only as a reminder, begin with a hash mark instead.)


    Identifying Your Workgroup or Domain
    The Samba configuration file is divided into several sections, each identified by a section name enclosed within square brackets. The entries related to the network identity of your Samba server and how you authenticate to it are located in the [global] section of the Samba configuration file /etc/samba/smb.conf.
    The key entry that you must set to define how your Ubuntu server interacts with Windows systems is the workgroup entry, which identifies the Windows workgroup or Windows domain to which your Samba server belongs.
    On a sample system of mine, this entry looks like the following:

    [global]
       workgroup = WVH

    This entry identifies my machine as belonging to the Windows workgroup or domain named WVH. In this SOHO example, it’s a workgroup, but that is essentially transparent — the key thing here is that your Samba server is either a member of the workgroup/domain that you are already using at your site, or a primary domain controller that defines the domain that you want it to host.
    If you want your Samba server to function as the primary domain controller (PDC) for a Windows domain, you must also make sure that the domain master setting in the Samba configuration file is not commented out and is set to yes or auto. 
    In this case, the entry would look something like the following:

    domain master = auto

    This means that the Samba server will serve as a primary domain controller for your domain if no other domain controller can be located, which will be the case if your Samba server is hosting the domain.

    Here I explain how to use Samba as part of a workgroup, which is the typical way that Samba is used on a home or SOHO network.
    For information about setting up a complete Windows domain, make sure that you installed the samba-doc package as suggested in the section of this chapter entitled “Installing the Samba Server and Friends,” and consult the online copy of Samba 3 By Example that is provided in the directory /usr/share/doc/samba-doc/htmldocs/Samba3-ByExample.
    Because this example shows a small workgroup, you will probably also want to activate the entries that tell Samba not to function as a WINS server (the Windows NetBIOS name service), but to instead use your system’s Domain Name Server to look up Windows hostnames. These entries should look like the following:

    wins support = no
    dns proxy = yes
    name resolve order = lmhosts host wins bcast


    The last entry tells Samba to resolve NetBIOS names in the following order:
    1. first look in the local lmhosts file for name information
    2. then check the information in the file /etc/hosts via the system resolver
    3. then check WINS (which, in this case, is proxied to the local DNS server), and 
    4. finally use a broadcast to search for the right host
    This combination covers the hostname lookup bases for every Samba server configuration that I’ve ever set up or used.
        This completes the core modifications to the Samba configuration file that are necessary for Samba to be able to share files with the specified workgroup or domain.


    Configuring Samba Authentication
    In the workgroup configuration that we are using as an example here, Samba comes preconfigured to perform the type of authentication required by workgroup members. However, to be able to access shared resources, you must be able to authenticate to the Samba server. Samba maintains its own authentication information — to add login and password information about a user, you must use the smbpasswd command.
    When adding information about a user, you must use the -a (add) option, followed by the name of the user that you want to add. 
    The smbpasswd command will prompt you for a password and will then prompt you for that password again to ensure that you have typed the password correctly. A sample transcript of adding the user wvh is the following:

    $ sudo smbpasswd -a wvh
    Password:
    New SMB password:
    Retype new SMB password:

    The first password prompt is from the sudo application, asking for my Ubuntu password so that I can perform this privileged operation. The next two are for entering and verifying the Samba password for the user wvh.
    The default security level in the Samba configuration file, security = user, requires that Samba users also exist in the password file (/etc/passwd) on the Ubuntu system; otherwise, you will receive an error message and the smbpasswd command will fail. For information about other security models and their implications, see the file /usr/share/doc/samba-doc/htmldocs/Samba3-HOWTO/ServerType.html from the samba-doc package.
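Because of this requirement, a Samba-only account still needs a matching Ubuntu account. A hypothetical transcript for an example user named jane might look like the following:

```
$ sudo adduser jane          # create the Ubuntu account that security = user requires
$ sudo smbpasswd -a jane     # then add the matching Samba password entry
New SMB password:
Retype new SMB password:
```

If jane should only ever connect over Samba, you can disable her shell login (for example, with sudo usermod -s /usr/sbin/nologin jane) while keeping the account available for Samba authentication.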

    Sharing Printers and Home Directories Using Samba
    After setting up the workgroup and creating a user, the next thing to consider is the general resources on the Ubuntu system that you want to make available to all users of Windows systems that connect to your Samba server. The most common examples of these are
    1. printers and 
    2. users’ home directories
    The entries in the [global] section of a Samba configuration file that are relevant to printers and printing on Ubuntu are the following, which you should make sure are not commented out (i.e., preceded by a semicolon):

    load printers = yes
    printing = cups
    printcap name = cups 

    Later in the Samba configuration file, the [printers] and [print$] sections provide information about how your Windows system will interact with the Samba server. You won’t need to change any of these, but for your reference, these entries are the following:

    [printers]
       comment = All Printers
       browseable = no
       path = /tmp
       printable = yes
       public = no
       writable = no
       create mode = 0700
    [print$]
       comment = Printer Drivers
       path = /var/lib/samba/printers
       browseable = yes
       read only = yes
       guest ok = no
    

    1. The [printers] section identifies how the Samba server will handle requests for printer identification and incoming print requests from Windows clients, 
    2. while the [print$] section maps the traditional Windows print$ share name to a directory on your Linux system where you can put the print drivers for specific printers so that your Windows clients can locate and load them if they are not already available on a client system.

    Aside from printers, the most natural resource for users to want to access from their Windows systems is their Ubuntu home directories. This provides an easy way to use your home directory on a Samba server as a centralized place to store files, but also enables you to automatically back up files from your Windows system to the Samba server and enables you to leverage any standard backups that you are doing on the Samba server. (See here  for more information about backing up files on your Ubuntu system)
        Happily, Samba’s configuration file is already set up to support sharing home directories by default, as long as your Linux and Windows user names match. The [homes] section of the Samba configuration file that supports this is the following:

    [homes]
        comment = Home Directories
        browseable = no
        writable = no
        create mask = 0700
        directory mask = 0700

    As you can see from this configuration file excerpt, home directories are shared by default but are not  writable from Windows clients. This is pretty inconvenient, so you will want to change the writable entry in  this section of the configuration file to yes, as in the following example:
             writable = yes
    This enables users to both read and write files in their home directories. If your Windows and Ubuntu users happen to have different logins, you can associate the two by creating appropriate entries in the file /etc/samba/smbusers. Entries in this file have the form: 

    UnixLogin = WindowsLogin 
    For example, to map the Ubuntu user wvh to the Windows user bill.vonhagen, I would create the following entry in the /etc/samba/smbusers file:
     wvh = bill.vonhagen
    If your Windows login names contain one or more spaces, you must enclose them within quotation marks in this file. I suggest keeping logins the same across Ubuntu and Samba systems, but this file can be used if you have different naming conventions for the users of different types of systems.
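Note that, depending on your distribution's default configuration, Samba only consults the smbusers file if the username map parameter in the [global] section of smb.conf points to it, so you may also need to add (or uncomment) an entry like the following:

```
[global]
   username map = /etc/samba/smbusers
```

After changing this, restart the Samba server so that the mapping takes effect.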


    Verifying the Samba Configuration File
    After making any changes to your Samba configuration file, you should verify that you haven’t accidentally violated the syntax of the file. If a Samba configuration file contains any invalid entries, Samba displays an error message and will not load. This could be extremely discouraging, were it not for the fact that the Samba server package also provides a utility that tests the validity of a Samba configuration file and identifies the exact location of any errors that it finds. This utility is testparm, which you can run with no arguments to test the default Samba configuration file /etc/samba/smb.conf. The output from a run of the testparm utility looks like the following:

    $ testparm
    Load smb config files from /etc/samba/smb.conf
    Processing section “[homes]”
    Processing section “[printers]”
    Processing section “[print$]”
    Loaded services file OK.
    WARNING: passdb expand explicit = yes is deprecated
    Server role: ROLE_STANDALONE
    Press enter to see a dump of your service definitions
    

    As shown in the preceding example, after displaying some general information, the testparm utility prompts you to press Enter on your keyboard to see detailed information about the services that your Samba server has been configured to provide. An example of this detailed information is the following:

    [global]
    workgroup = WVH
    server string = %h server (Samba, Ubuntu)
    obey pam restrictions = Yes
    passdb backend = tdbsam, guest
    syslog = 0
    log file = /var/log/samba/log.%m
    max log size = 1000
    name resolve order = lmhosts host wins bcast
    printcap name = cups
    panic action = /usr/share/samba/panic-action %d
    invalid users = root
    printing = cups
    print command =
    lpq command = %p
    lprm command =
    
    [homes]
    comment = Home Directories
    read only = No
    create mask = 0700
    directory mask = 0700
    browseable = No
    
    [printers]
    comment = All Printers
    path = /tmp
    create mask = 0700
    printable = Yes
    browseable = No
    
    [print$]
    comment = Printer Drivers
    path = /var/lib/samba/printers

    Depending on the version of Samba installed on your Ubuntu system and the changes that you have made to your Samba configuration file, your output may differ slightly.
        The testparm utility only checks the Samba configuration file; it does not consult the running Samba service for information about how it is configured and the services that it provides. To do this, you can use the smbclient utility, as described in the next section.



    Testing Samba Availability and Services
    The Samba applications suite includes an application called smbclient, which is a command-line client for contacting and browsing Samba servers. You can also use this utility to test if your Samba server is up and running, and to list the services that it is providing.
        If you have made any changes to your Samba configuration file, you should restart your Samba server before attempting to verify the services that it provides. To do this, execute the following command:

    sudo /etc/init.d/samba restart

    This command shuts down any instance of the Samba server that is currently running, and then starts the Samba server again, which forces it to read your updated configuration file.
    Once you have done this, you can query your Samba server using the smbclient application with the -L option (to list available resources), the name of the host that you want to contact, and the -U% option (to show what a default, unauthenticated user would see). An example of this command and its output is the following:

    $ smbclient -L ulaptop -U%
    Domain=[WVH] OS=[Unix] Server=[Samba 3.0.22]
              Sharename      Type      Comment
             ---------      ----      -------
             ADMIN$         IPC       IPC Service (ulaptop server (Samba,Ubuntu))
             IPC$           IPC       IPC Service (ulaptop server (Samba,Ubuntu))
             print$         Disk      Printer Drivers
    Domain=[WVH] OS=[Unix] Server=[Samba 3.0.22]
             Server              Comment
             ---------           -------
             ULAPTOP             ulaptop server (Samba, Ubuntu)
             Workgroup           Master
             ---------           -------
             WVH                 ULAPTOP
    The first portion of this output lists the shares available to the current user, which here is an unauthenticated user. The second portion provides the name of the Samba server that you have contacted and the workgroup or domain that it is a member of.
    Looking good! The next step is to see what an authenticated user would see. I'll use the sample user wvh, which I created earlier. This time, the command and its output look like the following:

    wvh@ulaptop:~$ smbclient -L ulaptop -Uwvh
    Password:
    Domain=[ULAPTOP] OS=[Unix] Server=[Samba 3.0.22]
             Sharename      Type      Comment
             ---------      ----      -------
             ADMIN$         IPC       IPC Service (ulaptop server (Samba,Ubuntu))
             IPC$           IPC       IPC Service (ulaptop server (Samba,Ubuntu))
             print$         Disk      Printer Drivers
             wvh            Disk      Home Directories
    Domain=[ULAPTOP] OS=[Unix] Server=[Samba 3.0.22]
             Server              Comment
             ---------           -------
             Workgroup           Master
             ---------           -------
             WVH                 ULAPTOP

    Note that because I specified an actual username, the smbclient utility first prompted me for my password on the Samba server, and then displayed slightly different information. The only real difference is that an authenticated user has access to his or her home directory on the Samba server, as you can see in the Home Directories entry in the Sharename section of the first portion of the smbclient output.
        Congratulations! Now that it’s clear that the Samba server is working and is correctly configured to allow users to access user-specific resources, it’s time to add some system-wide resources that all authenticated users will be able to take advantage of. Luckily, as explained in the next section, you can even do this with a graphical tool.


    Using the Shared Folder Tool to Share Directories
    Though Ubuntu Linux doesn't provide a graphical tool by default for setting up and configuring Samba itself, it does provide an easy-to-use graphical tool (nautilus-share) that simplifies identifying and configuring specific directories that you want to export via Samba from your Ubuntu system. If you have already exported shared directories using NFS, this tool should already be familiar to you — the Ubuntu and GNOME folks were clever enough to use the same tool for identifying and configuring both Samba shares and NFS exports. Nautilus Share allows you to quickly share a folder from the GNOME Nautilus file manager without requiring root access.
    1. To start this tool, select Places ➪ Network (in older Ubuntu releases, System ➪ Administration ➪ Shared Folders). After supplying your password in the administrative authentication dialog that displays, the Shared Folder tool starts.
    2. This dialog shows any shares that you have already defined, which in this case includes the /opt2 directory that I am sharing with NFS users. To define a directory that you want to share via Samba, click Add to display a dialog.
    3. As you can see this dialog is already set up with the basic information required for defining an SMB share. The default exported/shared directory that is initially selected when you start the Shared Folder tool is your home directory. This isn’t all that exciting because Samba shares these automatically, so in this example, I’m going to share the directory that contains my online audio collection.
    4. To specify another directory for sharing, click the Path item and select Other from the drop-down menu to display the directory selection dialog. To select a directory somewhere on your system, click root and navigate through the directory tree on your system to select the directory that you want to export, which in this example is my /opt2 directory. Click Open to select that directory (or whatever directory you want to export) and return to the dialog previously shown, which now displays the name of the newly selected directory in the Path field. 
    5. Next, you’ll need to identify the sharing settings and name resolution mechanism to access (i.e., mount) this directory over the network. 
    6. To verify these, click the General Windows sharing settings button to display the dialog. This dialog shows that this directory will be shared from your Samba server using the WVH workgroup, and that WINS need not be used to locate the server — hosts file or DNS information will suffice.
    7. Click OK to return to the dialog previously shown. You may want to enter some general information about the share that you are defining in the Name and Comment fields, for future reference.
    8. This dialog also enables you to use the “Read only” checkbox to specify that you want the directory to be shared in read-only fashion. If this box is not selected, properly authenticated users can create and modify files in directories that they have access to. 
    9. Similarly, you can select the Allow browsing folder checkbox to enable unauthenticated users to browse the shared directory without being able to examine any of the files that it contains. To continue, click OK to redisplay the originally shown dialog, which now contains the settings for your newly defined SMB shared directory.
    10. Almost done! To subsequently modify or update the settings for any shared directory, you can highlight its name in Nautilus and click Properties to display the specific settings for that shared folder. To begin sharing the folder, click OK to start the specified type of file sharing and close the tool.
    As you might suspect, defining SMB shares in this tool actually just creates the correct entries for those shares in your Samba configuration file, just as creating NFS exports in this tool adds the correct entries to the /etc/exports file used by NFS. The entries that were added to the /etc/samba/smb.conf file for the share that you just defined are the following:

    [opt2]
       path = /opt2
       comment = Music
       available = yes
       browseable = no
       public = yes
       writable = yes

    Adding these entries to the standard Samba configuration file ensures that the newly defined shares will always be available as soon as the Samba daemon (smbd) is started on the Ubuntu system that is acting as a Samba server.
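    If you prefer to skip the graphical tool, you can define shares by editing /etc/samba/smb.conf directly and restarting Samba. As a hedged illustration, a read-only share for authenticated users might look like the following; the share name and path are assumptions, not values from this system, and running testparm after editing will catch any syntax mistakes:

```
[manuals]
   comment = Procedure manuals (read-only example)
   path = /srv/manuals
   read only = yes
   browseable = yes
   guest ok = no
```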


    Getting More Information About Samba
    Not surprisingly, the Web provides an excellent source of additional information about Samba. Somewhat surprisingly, the Samba project itself provides some excellent documentation, as well as an online version of a great book about the current version of Samba, which is Samba version 3. As suggested earlier, you can install your own copies of the Samba documentation by installing the samba-doc and samba-doc-pdf packages when you install the Samba server. For more information, consult any of the following:
    • samba.org: The main site for the Samba project, at which you can find the latest tips, tricks, and source code.
    • /usr/share/doc/samba-doc/htmldocs/Samba3-HOWTO: A directory containing a HOWTO file in HTML format that explains how to install and configure Samba version 3, and answers many common questions. (You’ll want to open the file index.html in this directory from your Web browser). The HTML version of this document is provided in the samba-doc package.
    •  /usr/share/doc/samba-doc/htmldocs/Samba3-ByExample: A directory containing an HTML version of an excellent, hands-on book about Samba 3. This book was written by John Terpstra, one of the leaders of the Samba project, and explains how to install and configure Samba 3 for a complete spectrum of networked environments, from smaller SOHO environments (as discussed here) to enterprise environments with thousands of users. (You'll want to open the file index.html in this directory from your Web browser.) The HTML version of this document is provided in the samba-doc package. You'll probably also want to buy a paper copy of John's book — it's definitive and complete.
    • The old Samba mini-HOWTO (PDF)

    06 April 2010

    Ubuntu --- Backing Up and Restoring Files



    Ubuntu is knee-deep in excellent software that you can use to do 
    various types of backups, ranging from backing up to files 
    that you can store on removable media or simply move 
    to other systems. Below we discuss BackupPC, an 
    excellent open source backup application for 
    doing regular, networked backups of 
    multiple systems to a central 
    backup server.

    Backups

    Backups are spare copies of the files and directories that are found on a computer system, written to and stored on removable media that is preferably stored somewhere other than beside your computer. Doing backups is a time-consuming, but absolutely mandatory task if you place any value at all on the files, e-mail, and other data that you have stored on your computer.
        Backups are exactly like auto insurance policies. You rarely need them, and you hope that you never do. They are usually just time-consuming and expensive (your time has some value, right?). However, one rainy night when you discover that you’ve just accidentally deleted your home directory or when a user comes to you and says that they’ve accidentally deleted your company’s personnel records, payroll data, or the source code for your company’s products, you’d better have a good answer. The right answer, of course, is, “I’ll restore that from backups immediately.”
        It’s hard to think of anything that so thoroughly combines the mundane and mandatory as backing up your data. It’s boring. It’s time-consuming. And, of course, it’s critical. This text is oriented toward you as a systems administrator, regardless of how many systems you’re responsible for. As system administrators, our responsibility is to provide secure, well-maintained, and rigorously backed up systems for the benefit of the users of the computer systems we’re responsible for. You should feel even more responsible if you’re only supporting a user community of one (yourself), because you won’t even have anyone else to blame if a catastrophe occurs. Even if you’re a community of one, I’m sure that you feel that whatever you do on your computer system is important. Backups keep it safe.
    Here I explain a variety of solutions for creating backups on Ubuntu Linux systems, ranging from command-line solutions to some impressive graphical tools. This text also covers the flip side of making backups, restoring files from them, which is what makes backups worthwhile in the first place.

    Before discussing the different tools used to actually create backups, it’s useful to review some of the basic issues and approaches in backing up any kind of computer system. Though you may already be totally  familiar with these concepts and occasionally mumble backup and restore commands in your sleep, providing a clear picture of what you’re trying to accomplish in doing backups and how backup systems are usually designed provides a firm foundation for discussing the various tools discussed later. I discuss many topics that are overkill for a home computing environment but are mandatory in multisystem business or academic environments.


    Why Do Backups?
    In an ideal world, backups would not be necessary: computer hardware and software would always work correctly, and users would never make mistakes. Unfortunately, in the real world, things are different. Computer system administrators and other members of an MIS/IT department do backups for many reasons; backups help protect you against the following types of problems:
    • Natural disasters such as fires, floods, and earthquakes that destroy computer systems 
    • Hardware failures in disk drives or other storage media that make it impossible to access the data that they contain 
    • System software problems such as filesystem corruption that might cause files and directories to be deleted during filesystem consistency checks
    • Software failures such as programs that crash and corrupt or delete the files that you’re working on
    • Pilot error, AKA the accidental deletion of important files and directories
    Many people tend to confuse RAID (Redundant Array of Independent Disks) arrays with backups. They are not the same thing at all. RAID arrays can be a valuable asset in keeping your existing data online and available in the face of disk failures, but they do not protect against any of the problems identified in the previous list. All of the drives in a RAID array will burn evenly in case of a fire or other natural disaster.
        In addition to protecting you against these sorts of problems in accessing the data that you and any other users of your systems require, there are a variety of procedural and business reasons to back up the data on your computer systems. Complete and accurate backups provide:
    • A complete historical record of your personal, corporate, or organizational business and financial data. Sadly enough, this includes serving as a source of information that you, your company, or your organization may someday need to defend itself or to prove its case in a lawsuit or other legal proceedings.
    • A source of historical information about research projects and software development.
    • A way of preserving data that you do not need to make continuously available online, but which you may need to refer to someday. This includes things like projects that you’ve completed, the home directories of users who are no longer using your systems, and so on.
     A final issue where backups are concerned is the need for off-site storage of all or specific sets of your backups. The history of personal, business, and academic computing is littered with horror stories about people who did backups religiously, but stored them in a box beside the computer. After a fire or natural disaster, all that the administrators of those systems were left with were poor excuses and unemployment benefits.
        Off-site storage (cloud computing) is critical to your ability to recover from a true physical catastrophe, but it also raises another issue — the need for appropriate security in the storage location you select.
    For the same reasons that you wouldn’t leave the door to your house propped open and then go on vacation and wouldn’t put a system that didn’t use passwords on the Internet, you shouldn’t store your backups in an insecure location. This is especially important if you are in charge of computer systems that are being used for business. 
    Wherever you store your company’s current and historical backup media should have a level of security comparable to wherever your computers are in the first place. Though your local cat burglar might not actively target a stack of CDs, removable disks, or storage locker full of backup tapes, any competitors you have would probably be ecstatic to be able to read and analyze the complete contents of your company’s computer systems. Why not just save everybody time and mail them your source code and customer lists?
    A Few Words About Backup Media
    Backups take a significant amount of time and require a significant investment in both media and backup devices. Nowadays, even home computer systems store tens or hundreds of gigabytes of information, which means that you either need fast, high-capacity backup devices, or you must prepare yourself for a laborious day or two of loading CDs, DVDs, or tapes.
        Other, more historical solutions such as Zip disks, Jazz disks, LS-120 disks, and so on, provide such a small amount of storage that they’re really only useful for backing up individual files, directories, or sets of system configuration files, and are therefore not discussed here.
        If the mention of backup tapes causes flashbacks to mainframe computer days or old sci-fi movies, you may want to rethink that. Even for home use, today’s backup tape drives are fast, store large amounts of data, are relatively cheap, and use tapes that fit in the palm of your hand. Though disk-to-disk backups are becoming more and more common, especially in networked environments, backup tapes are still quite popular and cost-efficient.
        CD-Rs and DVD-Rs are eminently suitable for backups of home computer systems because they are inexpensive and typically provide enough storage for backing up the selected files and directories that comprise most home backups.
        For home use, I prefer CD-R and DVD-R media over their rewritable brethren because of the cost difference and the fact that rewritable CDs and DVDs are only good for a limited number of writes. 
    On the other hand, CD-Rs and DVD-Rs are rarely appropriate for enterprise backups because even DVD-Rs are not large enough to back up complete systems, it's tricky to split backups across DVD-R media, and DVD-Rs are relatively slow to write to. They can be useful when restoring a system because of their portability, because you can take them directly to the system you're restoring without having to move a tape drive, do a network restore, and so on. However, I personally prefer removable hard drives or tapes in enterprise or academic environments.

    Different Types of Backups
    Now that I’ve discussed why to do backups and some of the basic issues related to storing them, let’s review the strategy behind actually doing backups. As mentioned previously, backups take time and have associated costs such as backup media, but there are a variety of ways to manage and minimize those costs.  There are three basic types of backups:
    1. archive backups, which provide a complete snapshot of the contents of a filesystem at a given time
    2. incremental backups, which reflect the changes to the contents of a filesystem since a previous backup
    3. spot backups, which provide a snapshot of specific files or the contents of one or more important directories at a given time
    Spot backups are the most common type of backups done by home computer users, because writing a copy of your current projects, mail folders, or even your entire home directory to a few CD-Rs or DVD-Rs is relatively fast and cheap. There isn’t all that much to say about this approach, because it can easily be done using drag and drop, so the rest of this section focuses on the classic backup models of archives and incremental backups. I’ll discuss some techniques for doing spot backups later.
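    As a quick preview of the command-line approach, a spot backup can be as small as one GNU tar invocation. The function below is my own sketch (the function name and layout are illustrative, not a standard tool); it archives a single directory into a timestamped, compressed tarball:

```shell
# A minimal spot backup: archive one directory into a timestamped
# tarball in a destination directory. GNU tar is assumed.
spot_backup() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    # -C archives paths relative to the parent of "$src", so the
    # tarball can be restored anywhere.
    tar czf "$dest/$(basename "$src")-$stamp.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# Example (hypothetical paths):
# spot_backup /home/wvh/projects /media/usbdrive
```

    You could then burn the resulting tarball to a CD-R or DVD-R, or simply copy it to another machine.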
     Archive backups, often referred to as archives or full backups, are the ultimate source for restoring data,  because they usually contain a copy of every file and directory on a specific filesystem or under a certain directory on your computer at the time that the backup was done. In an ideal world, it would be great to be able to do daily archive backups simply because this would guarantee that no one could ever lose more than a day’s work, regardless of the type of calamity that occurred to your computer system. Unfortunately, archive backups have some drawbacks:
    • They take the maximum amount of time that backups could require because they make a copy of every file and directory on every filesystem on all of your computer systems.
    • The volume of data that is preserved by an archive backup means that they use the maximum amount of space on your backup media.
    • Producing the largest possible volume of backup media maximizes the amount of storage space required to store it, and makes your record keeping as complex (and as critical) as it possibly could be.
    • Archives are best done when no one is working on a computer system. This reduces the amount of time that it takes to do the backups (because they’re not competing with anyone for computer time), and also guarantees the consistency of the files and directories that are being copied to your backup media, because nothing can be changing. This may not be a big point in a home computing environment, but in a business environment, making sure that no one is using a computer system so that you can do an archive backup is often impractical (as on systems that run 24×7 services such as Web servers, database systems, and so on) or, best case, reduces the availability of a computer system to the company and your customers.
     Although the advantages of archive backups as a complete record of everything are significant, these kinds of issues keep archives from being a reasonable approach to daily backups for any home computer, business, or organization. You could always do them less often than daily, but reducing the frequency of your backups increases your exposure to losing a significant amount of data if your disks fail or your computer bursts into flames.

    Enter incremental backups. As mentioned before, incremental backups are backups that contain a copy of  all of the files and directories that have changed on a computer system since some previous backup was done. If a problem occurs and you need to restore files and directories from backups, you can restore an accurate picture of those files and directories by first restoring from an archive backup, followed by restoring from some number of incremental backups up through your most recent ones, which should restore whatever you’ve backed up to the date of your most recent incremental backups. When combined with
    archives, incremental backups provide the following advantages:
    • They help minimize the amount of disk space or other backup media required to do backups. Archives usually require large quantities of most types of backup media, while incrementals inherently require less because they aren’t preserving as much data.
    • They can be done more quickly, because they are copying less data than an archive backup would.
    • The backup media to which incremental backups are written requires less storage space than archive backups, because there’s less of it.
    • Even in business and academic environments, incremental backups can be done while the computer systems and filesystems you’re backing up are available for use.
    Another nice feature of incremental backups is that they record changes to the files and directories on your computer systems since some previous backups, which are not necessarily archives. In corporate environments, most systems administrators organize their backup media and associated procedures in a way similar to the following:
    • Archives are done infrequently, perhaps every six months or so, or just before any point at which major changes to your filesystems or computer systems are being made.
    • Monthly incremental backups are made of all changes since the previous archive. If your budget and backup storage capabilities are sufficient, you usually keep the monthly incremental backups around until you do another archive backup, at which point you can reuse them.
    • Weekly incremental backups are made of all changes since the previous monthly backup. You can reuse these each month, after you do the new monthly backups.
    • Daily backups are made of all changes since the previous weekly backup. You can reuse these each week, after you do the new weekly backups. Some installations even just do dailies since a previous daily or the daily done on some previous day of the week.
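    GNU tar can implement this kind of schedule directly: its --listed-incremental option keeps a snapshot file recording what has already been saved, so each subsequent run archives only what has changed. The function names and file layout below are an illustrative sketch, not a finished backup script:

```shell
# Level-0 (archive) backup: removing the snapshot file first forces
# tar to save everything and start a fresh change record.
full_backup() {
    rm -f "$2/snapshot"
    tar czf "$2/full.tar.gz" --listed-incremental="$2/snapshot" -C "$1" .
}

# Incremental backup: reusing the snapshot file makes tar archive
# only the files changed since the previous run.
incremental_backup() {
    tar czf "$2/incr-$(date +%Y%m%d%H%M%S).tar.gz" \
        --listed-incremental="$2/snapshot" -C "$1" .
}
```

    Note that the snapshot file is updated on every run, so each incremental is relative to the previous backup; to make weeklies relative to the monthly rather than to the last daily, you would keep a copy of the snapshot file from each level and restore it before the run.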
    No backup system can make it possible to restore any version of any file on a computer system. Even if you were lucky or compulsive enough to be doing daily archives of all of your computer systems, files that exist for less than a day can’t be restored, and it isn’t possible to restore a version of a file that is less than a day old. Sorry. When designing a backup schedule and the relationships between archive and various incremental backups, you have to decide the granularity with which you might need to restore lost files. For example, the general schedule of archives, monthlies, weeklies, and dailies doesn’t guarantee that you can restore a version of a file that is newer than the previous archive. For example:
    • If the file was deleted one day before the first set of monthly backups were done based on the archive, it would be present on the archive and on the weekly backups for a maximum of one month. At that point, the weekly tape containing that file would be overwritten and the newest version of the file that could be restored was the version from the archive.
    • If the file was deleted one day after the first set of monthly backups were done based on the archive, it would be present on the archive and on the first monthly backup for a maximum of seven months — a new archive would be done at that point, and the monthly tape wouldn’t be overwritten until one month after the new archive. At that point, the monthly tape containing that file would be overwritten and the newest version of the file that could be restored was the version from the most recent archive.
    Selecting a backup strategy is essentially a calculation of how long it will take someone to notice the absence of one or more files and request a restore, taking into account the level of service that you need to provide and the cost of various levels of service in terms of media, backup time, and storage/management overhead. Sometimes you will notice missing files immediately, such as when you accidentally delete the task you’re actively working on. Other problems, such as lost files because of gradual disk failures or filesystem corruption, may not surface for a while.
        Almost all backup systems generally provide automatic support for doing incremental backups since a previous incremental or archive backup. The Linux dump program, which I’ll discuss in the next section, assigns different numbers to different backup “levels,” and keeps track of which levels of backups have been
    done based on the name of the device on which the filesystem is located.
         A final issue to consider when doing backups and restoring files is when to do them, and what privileges are required. It’s generally fastest to do backups during off-peak hours when system usage is generally at a minimum, so that the backups can complete as quickly as possible, and when people are less likely to be modifying the files that you’re backing up. In an enterprise environment, this may mean that you’ll want to have a graveyard shift of operators. In this case, you’ll need to think about how to make sure that operators have the right set of privileges.
    Being able to back up user files that may be heavily protected, or using a backup system that accesses the disk at the filesystem level, generally requires root privileges. Many people use programs such as sudo (which is already our friend on Ubuntu systems) or set s-bits on privileged binaries such as backup and restore programs so that they don't have to give the administrative password to the operators or part-time staff who generally do backups at off-peak hours.
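    For example, a dedicated sudoers rule can grant a backup group exactly the commands it needs and nothing more. The group name and command paths below are assumptions for illustration; always edit sudoers files with visudo:

```
# /etc/sudoers.d/backup-operators (hypothetical file)
# Members of the "backup" group may run dump and restore as root,
# and no other commands.
%backup ALL = (root) /usr/sbin/dump, /usr/sbin/restore
```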


    Verifying and Testing Backups
    Just doing backups isn’t a guarantee that you’re safe from problems, unless you’re also sure that the backups you’re making are readable and that files can easily be restored from them. Though it’s less common today, there’s always the chance that the heads in a tape drive may be out of alignment. This either means that you can only read the tapes back in on the same tape drive that you wrote them on, or that they can’t be read at all. You should always verify that you can read and restore files from backups using another device than the one on which they were made. You don’t have to check every tape every day, but random spot checks are important for peace of mind and for job security. Similarly, tapes can just stretch or wear out from use — be prepared to replace the media used to do various types of incremental backups after some set amount of time. Nobody appreciates WORN backup media — write once, read never — even though its storage capacity is apparently infinite.
    One of the problems inherent to backups is that every type of computer media has a shelf life of some period of time, depending on the type of media, the environment in which it is stored, and how lucky you are. No backup media has infinite shelf life. For example, backup tapes can last for years, but they can also be unreadable after a much shorter period of time. Long-lived media such as write-once CD-Rs and DVD-Rs are attractive because of their supposed longevity, but they have other problems, as mentioned earlier in the section entitled “A Few Words About Backup Media.” Media such as these may only be suited for certain types of backups, depending on whether your backup software writes to the backup device as a filesystem or as a raw storage device. Also, no one yet knows exactly how long those types of media will last, but they certainly take up less room than almost any kind of tape or stack of hard drives.
        In addition to spot-checking the backup media that you are currently using, you should always make a point to spot-check old archives every few years to make sure that they're still useful.
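    One simple spot check, assuming the backup was written with GNU tar, is tar's --compare (-d) mode, which re-reads every member of the archive and compares it against the files on disk, exiting non-zero on any mismatch or read error. The wrapper below is just a sketch:

```shell
# Compare an archive against the directory tree it was made from.
# A non-zero exit means a difference was found or the media could
# not be read cleanly.
verify_archive() {
    archive=$1
    tree=$2
    tar -dzf "$archive" -C "$tree"
}

# Example (hypothetical paths):
# verify_archive /backups/home-20100406.tar.gz /home
```

    Running this on a different machine or drive than the one that wrote the archive also catches the head-alignment problem described above.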
    Aside from the fact that backups can be subject to the vagaries of the device on which they’re written, having those devices available when you need to restore backups is an important point to consider. It’s a well-known nerd fact that many government and military sites have huge collections of backup data written on devices that don’t exist anymore, such as super low-speed tape drives and 1” or 7-track tapes. Even if the devices exist, the data is often not recoverable, because it’s written in some ancient, twisted backup format, word size, and so on. When you retire a computer system, deciding if you’ll ever need to restore any of its archive data is an easily overlooked issue. If you’re lucky, you’ll be able to read in the old archives on your new system and write them back out to some newer backup media, using some newer backup format. If you’re not, you’ve just acquired a huge number of large, awkward paperweights that will remind you of this issue forever.


    Deciding What to Back Up
    Aside from cost-saving issues like using higher-density media such as CD-ROMs for archive purposes, another way to reduce the number of old backups that you have to keep around, as well as minimizing the time it takes to do them, is to treat different filesystems differently when you’re backing them up. For example, system software changes very infrequently, so you may only want to back up the partitions holding your operating system when you do an archive. Similarly, even locally developed application software changes relatively infrequently, so you may only want to back that up weekly. I can count on one hand, with one finger, the number of times that I’ve needed to restore an old version of an application. On the other hand, you may not be so lucky. Keeping backups of your operating system and its default applications is important, and is certainly critical to restoring or rebuilding an entire system should you ever need to do so (which is known in backup circles as a bare-metal restore).
        In terms of backups (and thanks to the excellence of the Ubuntu Update Manager), you can usually just preserve your original installation media (or even re-retrieve it over the net) if it is ever necessary to completely restore the system software for your Ubuntu system. However, if your systems run a custom kernel or use special loadable kernel modules, you should always make sure that you have a backup of your current configuration and all of the configuration files in directories such as /etc that describe the state of your system. You’ll be glad you did if the disk on which your finely tuned and heavily tweaked version of an operating system bursts into flames late one night.
    The issues in the first few sections of this text often give system administrators and system managers migraines. Losing critical data is just as painful if you’re only supporting yourself. Thinking about, designing, and implementing reasonable backup policies, schedules, and disaster recovery plans is an important task no matter how many people will be affected by a problem. Backups are like insurance policies — you hope that you never need to use them, but if you do, they had better be available.


    Backup Software for Linux

    Many backup utilities are available for Ubuntu systems. Most of these are traditional command-line utilities that can either create archive files or write to your backup media of choice in various formats, but some interesting open source graphical solutions are also beginning to appear.
        The next few sections discuss the most common open source utilities that are used to do backups on Linux systems, grouping them into sections based on whether they create local backup files or are inherently network-aware. As discussed in the previous section, off-site storage of backups is an important requirement of a good backup strategy. In today’s networked environments, off-site storage can be achieved in two basic ways:
    1. either by writing to local backup media and then physically transporting that media to another location, or 
    2. by using a network-aware backup mechanism to store backups on systems that are physically located elsewhere.

    Local Backup and Restore Software for Linux
    The roots of the core set of Linux utilities lie in Unix, so it’s not surprising that versions of all of the classic Unix backup utilities are available with all Linux distributions. Some of them are starting to show their age, but these utilities have been used for years and guarantee the portability of your backups from any Linux system to another.
        The classic Linux/Unix backup utilities available in the Ubuntu distribution are the following, in alphabetical order:
    • cpio: The cpio utility (copy in/out) was designed for doing backups, taking a list of the files to be archived from standard input and writing the archive to standard output or to a backup device using shell redirection. The cpio utility can be used with filesystems of any type, because it works at the file level and therefore needs no built-in understanding of filesystem data structures.
    • dd: The original Unix bulk-copy utility is called dd, and it does exactly that: it reads raw data from one device or file and writes it to another. The dd utility doesn’t know anything about filesystems, dump levels, or previous runs of the program; it simply reads data from one source and writes it to another, though you can manipulate the data in between the two to do popular party tricks like converting ASCII to EBCDIC. For backup purposes, dd copies the complete contents of a device, such as a disk partition, to a file or to another device such as a tape drive. It wasn’t really designed to do backups, though there are situations in which dd is the perfect tool: for example, dd is the tool for you if you want to copy one partition to another when a disk is failing, make on-disk copies of the partitions on a standard boot disk for easy cloning, or use an application that reads and writes directly to raw disk partitions, which can only be backed up and restored in their entirety. Because dd reads directly from devices and has no concept of a filesystem, individual file restores are impossible from a partition archive created with dd; you must restore the entire partition and then select the files that you want.
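    A minimal sketch of the idea, using a plain file as a stand-in for a disk partition so it can be run safely anywhere:

```shell
# Sketch: use dd to make a byte-for-byte copy of a "device" (a plain file
# stands in for a real disk partition here).
dd if=/dev/zero of=/tmp/fake_part.img bs=1024 count=4 2>/dev/null
dd if=/tmp/fake_part.img of=/tmp/fake_part_copy.img bs=1024 2>/dev/null
cmp /tmp/fake_part.img /tmp/fake_part_copy.img && echo "copies are identical"
# prints "copies are identical"
```

    With a real partition you would use something like `if=/dev/sda1`, and you would need root privileges to read the device.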
    • dump/restore: The dump and restore utilities were designed as a pair of backup utilities, and have existed in Unix since Version 6. Although cpio and tar combine the ability to write archives with the ability to extract files and directories from them, and dd can extract nothing less than an entire backup, the dump program only creates backups and the restore program only extracts files and directories from them. Both dump and restore work at the filesystem data structure level, and therefore can only be used to back up and restore ext2 and ext3 filesystems (at the moment, at least). However, the dump/restore programs can accurately back up and restore any type of file found in ext2 and ext3 filesystems, including device-special files and sparse files (without exploding their contents and losing their “sparseness”). The dump/restore utilities can only be used to back up entire filesystems, but they have built-in support for doing incremental backups, keeping a record of which filesystems have been backed up and which level of backup has been performed for each of them. All of this information is tracked in an easily understood text file named /etc/dumpdates. Archives created with the dump utility can automatically span multiple tapes or other media if the devices support end-of-media detection, and can also span cartridge or magnetic tape media by using command-line options that tell dump the length or capacity of the tape. The most entertaining feature of the restore program is its interactive mode, in which it reads enough information from the tape to build a virtual directory hierarchy for the archived filesystem. You can then use standard commands such as cd to explore the list of files on the tape and mark specific files and directories to be restored.
    The dump/restore programs are not installed as part of a default Ubuntu distribution, but can easily be installed using apt-get (both are located in the dump package).
    • tar: Probably the most widely used and well-known Unix backup utility, the tar command (tape archiver) takes a list of files and/or directories to be backed up and archives those files to an output device or to standard output. The GNU version of tar, once known as gtar to differentiate it from the version of tar that came with the Unix operating system (back when anyone cared), is yet another amazing piece of work from the Free Software Foundation. GNU tar provides capabilities far beyond those of classic Unix tar, including the built-in ability to read from compressed tar archives created with gzip, support for incremental backups, support for multivolume archives, and much more. The tar program is filesystem-independent and accesses files and directories without needing to know their low-level data structures. The tar program is far and away the most popular free archiving utility available for Linux, and is used to archive almost every free software package: DEB packages contain compressed tar files (RPM packages, by contrast, use cpio for their payload), and files with .tgz or .tar.gz extensions (gzipped tar files) are commonly used to distribute most Linux source code.
    The utilities discussed in this section all create local archive files or write their archives to local storage devices. Of course, when you’re using a network-aware operating system such as Ubuntu Linux, the term local storage devices actually includes anything that appears to be local to your system, which therefore includes network storage that is mounted on a directory of the system that you are using. Common examples of this are NFS-mounted directories or directories that are mounted on your Linux system via Samba.
        Directories that are mounted over the network enable you to integrate remote storage with local backup commands in ways such as the following:
    • Back up remote directories to local archives by mounting the remote directories on your local system and including them in the backups that you do.
    • Write your backup files to remote storage by creating your backup archives in remote directories that are mounted on your system as local directories.
    Both of these scenarios provide ways of satisfying the basic off-site requirement of backups through the use of network-mounted directories.


    Network-Oriented Backup Software for Linux
    The utilities discussed in the previous section all create local archive files or write their archives to local storage devices (or storage that appears to be local). The backup utilities discussed in this section are slightly different: they are inherently network-aware, and therefore enable you to create and manage local backups of the contents of remote systems.
        The following are some of the more commonly used, network-aware backup systems that are available for Ubuntu.
    There are many more, which you can find by starting the Synaptic Package Manager and doing a Description and Name search for the term backup. 
    The following are my personal favorites:
    • Amanda: The Advanced Maryland Automated Network Disk Archiver is an open source distributed backup system that was originally developed for Unix systems at the University of Maryland in the early 1990s. Amanda makes it quite easy to back up any number of client workstations to a central backup server, supports Microsoft Windows backups via Samba, and provides a complete backup management system for your Ubuntu system. Amanda supports multiple sets of backups with distinct configurations, supports disk and tape backups, tracks backup levels and dates on its client systems, produces detailed reports that are automatically delivered via e-mail, and keeps extensive logs that make it easy to diagnose and correct the reason(s) behind most problems. Communication between Amanda clients and servers is encrypted to heighten security. Amanda is not installed by default on Ubuntu systems, but is available in the Ubuntu repositories and can easily be installed using Synaptic, apt-get, or aptitude. Amanda consists of two packages, amanda-server and amanda-client. See Amanda’s home Web site for more information.
    • BackupPC: BackupPC is a nice backup system that provides a Web-based interface that enables you to back up remote systems using smb, tar, or rsync. BackupPC creates backups of your remote systems that are stored and managed on your BackupPC server, and also enables authorized users to restore their own files from these archives, removing the number one source of migraines for system administrators. Configuration data for each client system is stored on the BackupPC server, which enables you to back up different types of systems using different commands or protocols, and to easily identify which remote directories or filesystems you want to back up. One especially nice feature of BackupPC is that it uses standard Linux commands on the server to create backups, and therefore doesn’t require the installation of any software on client systems, though some client-side configuration may be necessary for certain backup commands. See BackupPC’s home page, and “Installing and Using the backuppc Utility,” later in this text, for more information about installing, setting up, and using BackupPC.
    • Bacula: Bacula is an extremely powerful set of programs that provide a scalable network backup and restore system that supports Linux, Unix, and Microsoft Windows systems. Its power and flexibility easily match that of Amanda, but it is more flexible in terms of how and where backups are stored. Bacula is not installed by default on Ubuntu systems, but is available in the Ubuntu repositories and can easily be installed using Synaptic, apt-get, or aptitude. Bacula is quite powerful, but can be complex; if you’re interested in exploring Bacula, you may want to start by installing the bacula-doc package and reading its documentation to determine if it is right for your environment. Bacula is primarily command-line oriented, but provides a graphical console as a wrapper around its command-line interface. See Bacula’s home page for more information.
    • Rsync: Rsync (remote sync) is a command-line file and directory synchronization program that makes it easy to copy files and directories from one host to another. When both a local and remote copy of a file or directory hierarchy exist, rsync is able to leverage built-in features that help reduce the amount of data that needs to be transmitted to ensure that the local and remote copies of those files and directories are identical. The remote-update protocol used by the rsync utility enables rsync to transfer only the differences between two sets of files and directories. The rsync program is automatically installed as part of a default Ubuntu installation, but requires some configuration on the remote systems that you want to copy to your local host.


    Backing Up Files to Local, Removable Media

    The introductory section of this text introduced the basic concepts of backups, many of which may seem impractical for home use. Whether or not they are really impractical depends on the problems that you want to be able to solve using your backups.
    • If you’re mostly interested in protecting yourself against disk failures or the accidental deletion of critical files that you’re working on, you may not need to worry about doing archive and incremental backups; doing spot backups of important files and directories to a CD-R or DVD-R may suffice.
    • Similarly, if you don’t need to be able to restore any file from any point in time, but just need to have recent copies of your files, then spot backups of the directories that you want to back up may be sufficient, done with whatever frequency you’re comfortable with. 
    • If you’re not concerned about losing all of your data if your house or apartment is destroyed, then you don’t have to worry about things like storing backups off-site.
    The bottom line is that I can’t tell you what you’re comfortable with — that’s up to you, and defines your backup strategy. The next few sections highlight how you can use some of the utilities mentioned earlier (and even the standard Linux cp command) to create backup copies of important files.
    For home use, the most popular backup method is simply dragging and dropping directories to CD-R or DVD-R media to create spot backups of those directories. The second most popular way of backing up your system is to use hard drives that you can attach to your systems via USB or FireWire ports. On the plus side, unless you’re using a really small removable hard drive, this gives you a larger pool of available storage for backups than a CD or DVD, and enables you to either store more backups of important files and directories or create a single copy of each important directory on removable storage which you can then just update each time you do backups. On the minus side, a removable hard drive is much more expensive than CD-R or DVD-R disks and is more of a pain to store off-site and retrieve each time you do backups.

    Archiving and Restoring Files Using tar
    The tar program is one of the oldest and most classic Linux/Unix utilities. Though it can write to a backup device, such as a tape drive, the tar command is most commonly used to create archive files, such as source code archives, that can easily be shared with others. Archive files created using the tar command typically have the .tar file extension. The GNU tar command, which is the version of tar found on Ubuntu and all other Linux systems, provides built-in compression capabilities and can automatically compress tar archives on the fly. Compressed tar archives typically have either the file extension .tgz or .tar.gz, indicating that they are compressed (and can be uncompressed) using the gzip application, or the file extension .tar.bz2, indicating that they are compressed (and can be uncompressed) using the bzip2 application. Archive files produced using the tar utility are typically referred to as tarballs.
    Because of its age, you have to be kind when passing arguments to the tar command, because in some cases they must be specified in a particular order.
    Creating an archive file using tar is easy. For example, to create a tarball called home_dir_backup.tgz that contains all of the directories in /home, you could use commands like the following:

    $ cd /home
    $ sudo tar czvf /tmp/home_dir_backup.tgz *


    Note that you want to write the backup file somewhere other than the directory that you are backing up. Creating a backup file in the directory that you’re working in would cause the tar command to back up the file that it was creating, which would both not work correctly and waste tremendous amounts of space.
     The tar options in this command have the following meanings:
    • c: Create a new archive file. If a file by the specified name already exists, it will be overwritten and its original contents will be lost.
    • z: Compress the archive file using the same techniques used by the gzip application.
    • v: Be verbose, displaying the name of every file added to the archive file as it is added.
    • f: Write the output of the tar command to the file whose name appears as the next argument on the command-line. In this example, the output of the tar command would be written to the file /tmp/home_dir_backup.tgz.
    After a significant amount of output, the file /tmp/home_dir_backup.tgz will be created, containing a complete recursive copy of all files and directories under /home. You can then copy this file to backup media such as a CD or DVD, or to a removable hard drive.
        After you’ve created a tarball of a given set of directories, you can easily create another tarball that only contains files and directories that have changed since a specific date (such as the date on which the first tarball was created) using commands like the following:


    $ cd /home
    $ sudo tar czvf /tmp/home_dir_newer.tgz * --newer "2006-06-23"
    


    This command produces extremely verbose output, even if you drop the v option, which is puzzling at first. This is an artifact of the format used in tar files: even when used with the --newer option, the tar file header must contain the complete directory structure in which it is looking for files newer than the specified date. This is necessary so that the tar command can create extracted files in the right directory locations. In other words, if you use the tar command to extract the entire contents of a tarball created using the --newer option, it will recreate the complete directory hierarchy but populate it only with files that are newer than the date that was specified when the tarball was created.
        Creating tarballs isn’t much fun without being able to retrieve files from them. You can extract various things from a tarball:
    • Its entire contents. For example, the following command would extract the entire contents of the tarball home_dir_backup.tgz, creating the necessary directory structure under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz
    • One or more directories, which recursively extracts the complete contents of those directories. For example, the following command would extract the directory Ubuntu_Bible and all the subdirectories and files that it contains from the tarball home_dir_backup.tgz, creating the necessary directory structure under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz Ubuntu_Bible
    • One or more specific files, which extracts only those files but creates all of the directories necessary to extract those files in their original location. For example, the following command would create the directory Ubuntu_Bible and extract the file chap22.txt from the tarball home_dir_backup.tgz, creating the Ubuntu_Bible directory under the directory in which you executed the command: $ sudo tar zxvf home_dir_backup.tgz Ubuntu_Bible/chap22.txt
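    The whole cycle can be demonstrated in a self-contained sketch (using throwaway paths, so no sudo is needed): create a tarball, list it, then extract a single file into a separate restore directory.

```shell
# Sketch with throwaway paths: create, list, and selectively extract
# from a tarball to verify the full round trip.
work=$(mktemp -d)
mkdir -p "$work/Ubuntu_Bible"
echo "chapter text" > "$work/Ubuntu_Bible/chap22.txt"
(cd "$work" && tar czf /tmp/demo_backup.tgz Ubuntu_Bible)
tar tzf /tmp/demo_backup.tgz                    # list the archive's contents
mkdir "$work/restore"
(cd "$work/restore" && tar zxf /tmp/demo_backup.tgz Ubuntu_Bible/chap22.txt)
cat "$work/restore/Ubuntu_Bible/chap22.txt"     # prints "chapter text"
```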
    For more detailed information on the tar command, see its online reference information (man tar). As one of the oldest Linux/Unix commands, it has accumulated a huge number of command-line options over the years, many of which you will probably never use. However, command-line options are like bullets — you can never have too many.


    Making an Up-to-Date Copy of a Local Directory Using cp
    If you’re only backing up a few directories and are primarily concerned with keeping copies of the files that you are actively working on, it’s often simplest to just keep copies of those directories on removable media.
        The traditional Linux/Unix cp command provides options that make it easy to create a copy of a specified directory, and then to subsequently update only files that have been updated or that do not already exist in the copy. For example, to back up all of the directories in /home to a removable drive mounted at /media/LACIE (LACIE is a popular manufacturer of prepackaged USB hard drives), you could use a command like the following:

    $ sudo cp -dpRuvx /home /media/LACIE/home

    The cp options in this command have the following meanings:
    • d: Don’t de-reference symbolic links, i.e., copy them as symbolic links instead of copying what they point to.
    • p: Preserve modes and ownership of the original files in the copies.
    • R: Copy the specified directory recursively.
    • u: Copy files only if the original file is newer than an existing copy, or if no copy exists.
    • v: Display information about each file that is copied. (You may not want to use this option, but it’s interesting, at least the first few times you do this.)
    • x: Don’t follow mount points to other filesystems.
    After running this command, you will have a copy of every directory under /home on your system in the directory /media/LACIE/home. You can then detach your removable drive and store it somewhere safe (preferably off-site). Any time that you want to update your backup, retrieve the drive and simply rerun this command.
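    The update-only behavior of the -u option is easy to verify with a self-contained sketch (throwaway paths, so no sudo is needed):

```shell
# Sketch: cp -dpRuvx copies everything on the first run, then only files
# that have actually changed on later runs (the -u option).
src=$(mktemp -d); dest=$(mktemp -d)
echo "v1" > "$src/notes.txt"
cp -dpRuvx "$src/." "$dest/"     # first run: copies notes.txt
sleep 1                          # ensure the next write gets a newer timestamp
echo "v2" > "$src/notes.txt"
cp -dpRuvx "$src/." "$dest/"     # second run: copies only the updated file
cat "$dest/notes.txt"            # prints "v2"
```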


     Making an Up-to-Date Copy of a Remote Directory Using rsync
    As mentioned earlier, rsync is a commonly used command-line utility that enables you to push or pull files to or from remote systems. The rsync program must be configured on the remote systems before you can push or pull files or directories to or from those systems.
    To use rsync on an Ubuntu system, you must first enable it so that the system starts rsync as a background process, and then modify the rsync configuration file to add entries for the specific directories that you want to be able to read from and write to remotely. To enable rsync, edit the file /etc/default/rsync using your favorite text editor and a command like the following:
    $ sudo emacs /etc/default/rsync

    In the line that begins with RSYNC_ENABLE, change false to true, and then save the updated file.
     Next, create the rsync configuration file before actually starting the rsync daemon.
    Most Linux systems use an Internet service manager such as inetd or xinetd to manage incoming requests for on-demand services such as ftp, tftp, rsync, and vnc. These Internet service managers automatically start the appropriate daemon when an incoming request is received. Though these Internet service managers are available in the Ubuntu repositories, they are not installed by default. On Ubuntu systems, a specific system startup file that starts rsync in daemon mode is provided as /etc/init.d/rsync. If you subsequently install xinetd to manage rsync requests, you will want to disable this startup file and create the file /etc/xinetd.d/rsync to make sure that the rsync service is enabled on your system.
    The /etc/default/rsync file just determines whether rsync is enabled or not. The actual configuration information for rsync itself is stored in the file /etc/rsyncd.conf, which does not exist by default on an Ubuntu system. To create this file, use your favorite text editor and a command like the following:
    $ sudo emacs /etc/rsyncd.conf

    A minimal rsync configuration file that contains a definition for remotely synchronizing the directories under /home on your system would look something like the following:

             uid = root
             transfer logging = true
             log format = %h %o %f %l %b
             log file = /var/log/rsyncd.log
             hosts allow = 192.168.6.0/24
             [homes]
                    path = /home
                    comment = Home Directories
                    auth users = wvh
                    secrets file = /etc/rsyncd.secrets

    The first section of this file sets parameters for how the rsync daemon runs. In order, the rsync daemon runs as root (uid), logs all transfers (transfer logging), uses a specific log file format (log format) and log file (log file), and allows access from any host whose IP address is on the 192.168.6.x subnet  (hosts allow).
        The second section of this file identifies a synchronizable entity known as homes that maps to the directory /home on that system. Synchronization to or from this directory is done as the user wvh, whose password must be supplied in the file /etc/rsyncd.secrets.
        After saving this file, use the sudo command and your favorite text editor to create the file /etc/rsyncd.secrets, with a command like the following:
    $ sudo emacs /etc/rsyncd.secrets
    This file should contain an entry for each auth users entry in the /etc/rsyncd.conf file, in this case wvh. Each entry in this file contains the name of a user, a colon, and the plain-text password for that user, as in the following example:
    wvh:hellothere
    Next, save this file and make sure that it is readable only by the root user on your system using a command  like the following:
    $ sudo chmod 600 /etc/rsyncd.secrets
    You can now start the rsync daemon using the following command:
    $ sudo /etc/init.d/rsync restart
    You can now create a local copy of the /home directory on your Ubuntu system using a command like the following, where ubuntu-system is the name or IP address of the system on which you just configured the rsync daemon:
    $ rsync -Havz ubuntu-system::homes /media/LACIE/home
    The arguments to the rsync command in this example have the following meaning:
    • H: Preserve hard links if these exist in any directories that are being copied.
    • a: Use archive mode, which preserves ownership, symbolic links, device files, and so on, and is essentially a shortcut that saves you specifying several other options.
    • v: Be verbose, identifying each file that is copied or considered for copying. (You may not want to use this option, but it’s interesting, at least the first few times you run rsync.)
    • z: Use compression when transferring files, which improves throughput.
    If you have problems using rsync, you should check the /var/log/rsyncd.log file (on the system that you are trying to retrieve files from) for error messages and hints for resolving them. If you are not using the verbose option on the host that is retrieving files, you may want to enable it to see whether you can identify (and resolve) any errors that host reports.
    The rsync configuration file created in this section is just a minimal example, and is not particularly secure. For details about all of the options available in an rsync configuration file and information about making rsync more secure, see the man page for the rsyncd.conf file (man rsyncd.conf).


    Installing and Using the backuppc Utility

    This section explains how to install, configure, and use the backuppc utility to back up a variety of hosts on your local network to a central Ubuntu server. Introduced earlier, backuppc is a great application that is both easy to use for a system administrator and empowering for any authorized user. Any authorized user can initiate backups of the machines that they have admin rights to and can also restore files from existing backups of those machines, all using a convenient Web interface.
        If you have more than one machine on your home network, or if you’re working in a multimachine enterprise or academic environment, the BackupPC software is well worth a look. Its Web-based interface is easy to set up and use; various types of supported backups are easy to configure, initiate, and monitor; it can back up your Linux, Unix, Windows, and Mac OS X systems; and the fact that it doesn’t require that you install any special software on the systems that you want to back up makes backuppc a great package.
        The backuppc utility supports four different backup mechanisms (known in the BackupPC documentation as backup transports) to enable you to back up different types of systems. These are the following:
    • rsync: Back up and restore via rsync via rsh or ssh. This is a good choice for backing up Linux, Unix, or Mac OS X systems, and you can also use it to back up Microsoft Windows systems that support rsync, such as those running the Cygwin Linux emulation environment.
    • rsyncd: Back up and restore via rsync daemon on the client system. This is the best choice for Linux, Unix, and Mac OS X systems that are running an rsync daemon. You can also use this mechanism to back up Microsoft Windows systems that support rsyncd, such as those running the Cygwin Linux emulation environment.
    • smb: Back up and restore using the smbclient and the SMB protocol on the backuppc server. This is the best (and easiest) choice to use when backing up Microsoft Windows systems using backuppc, and you can also use it to back up Mac OS X systems or Linux and Unix systems that are running a Samba server.
    • tar: Back up and restore via tar, over ssh, rsh, or nfs. This is an option for Linux, Unix, and Mac OS X systems. You can also use this mechanism to back up Microsoft Windows systems that support tar, ssh, rsh, and/or nfs, such as those running the Cygwin Linux emulation environment.
    A default backup transport value for all backups is set in the primary backuppc configuration file, /etc/backuppc/config.pl. The specific mechanism used to back up any particular host can be identified in that host’s configuration file, as discussed later in the sections entitled “Defining a Backup Using  rsyncd” and “Defining a Backup Using SMB.”
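    As a sketch of what this looks like (the variable name comes from the BackupPC documentation; the comment lists the transports described above), the relevant line in /etc/backuppc/config.pl or in a per-host configuration file is simply:

```perl
# In /etc/backuppc/config.pl (global default) or a per-host .pl file:
$Conf{XferMethod} = 'rsyncd';    # one of 'rsync', 'rsyncd', 'smb', or 'tar'
```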
    Although backuppc does a great job of backing up systems running Microsoft Windows and Mac OS X, you should be aware of a few issues. First, backuppc is not suitable for backing up Windows systems so that you can do a bare-metal restore. Backuppc uses the smbclient application on your Ubuntu system to back up Windows disks, so it doesn’t back up Windows ACLs and can’t open files that are locked by a Windows client that is currently running (such as, most commonly, things like Outlook mailboxes). Similarly, backuppc doesn’t preserve Mac OS file attributes. See the BackupPC documentation for a list of current limitations in using backuppc; it’s a surprisingly short list!

    Installing backuppc
    Special-purpose backup solutions such as backuppc aren’t installed as part of a default Ubuntu installation because they’re probably overkill for most people. However, as with all software packages on Ubuntu, the Synaptic Package Manager makes it easy to install backuppc and the other software packages that it requires. To install backuppc, start the Synaptic Package Manager from the System ➪ Administration menu and supply your password to start Synaptic. Once the Synaptic application starts, click Search to display the search dialog. Make sure that Description and Name are the selected items to search through, enter backup as the string to search for, and click Search. After the search completes, scroll down in the search results until you see the backuppc package, right-click its name, and select Mark for Installation to select that package for installation from the pop-up menu.
        Depending on what software you have previously installed on your Ubuntu system and what you select in Synaptic, a dialog may display that lists other packages that must also be installed, and asks for confirmation. If you see this dialog, click Mark to accept these related (and required) packages. After you are finished making your selections, click Apply in the Synaptic toolbar to install backuppc and friends on your system.
        Once the installation completes, the configuration phase starts. During this phase, Synaptic automatically runs a script that sets up the initial account that you will use to access backuppc via your Web server. This process displays a dialog, which tells you the initial password for the Web-based backuppc interface.
        Once you see this dialog, write down the password for the backuppc interface and click Forward. Once the remainder of the installation and configuration process completes, you’re ready to back up the system you’re using and the other systems on your network.


    Configuring backuppc
     On Ubuntu systems, backuppc stores its configuration information in two locations.
    1. General backuppc configuration information and passwords are stored in files in the directory /etc/backuppc.
    2. Backup files themselves and host-specific backup configuration information are stored in subdirectories of /var/lib/backuppc.
    Backups of a single system take a significant amount of space, which is only compounded when you begin to back up other hosts to a central backup server.
    If you didn’t specify using logical volumes when you installed your Ubuntu system, you may want to add a new disk to your system before starting to use backuppc and format that disk as a logical volume. 
    You can then copy the default contents of /var/lib/backuppc to the new disk (preserving file permissions and ownership), and mount that disk on the directory /var/lib/backuppc on the system that you are using for backups. When you need more space to store backups in the future, this will enable you to add other disks to your system and add their space to the logical volume used to store backups. The backuppc utility also provides an archive capability that enables you to migrate old backups to other hosts for archival purposes, freeing up disk space on your primary backup server.
        Though not discussed here, setting up archive hosts is covered in the BackupPC documentation, which is great, by the way!

    The first thing that you should do is to change the backuppc password to something easier to remember than the random string generated during the backuppc installation process. You can do this by issuing the  following command:
    $ sudo htpasswd /etc/backuppc/htpasswd backuppc
    This command uses sudo to run the htpasswd command, which changes the password for the user backuppc in the file /etc/backuppc/htpasswd. When you are prompted for a new password, enter something easier to remember than “TLhCi25f,” which was the default password generated for my backuppc installation. You will be prompted to reenter the new password to make sure that you typed it correctly.


    Identifying Hosts to Back Up
    Each host that you want to back up must be identified in the file /etc/backuppc/hosts. Like all backuppc configuration files, this file is easy to update. Any characters on a line that follow a hash mark (#) are comments, which help explain the meaning of the various fields used in the file. A minimal hosts file looks like the following:
    host                     dhcp        user                    moreUsers
    localhost                0           backuppc
    The first non-comment line in /etc/backuppc/hosts defines the names of the various fields in each line, and should therefore not be modified (this is the line beginning with the word “host” in the example). All other lines represent entries for hosts that will be backed up. The first actual host entry, for localhost, is a special entry used for backing up system configuration information on the backuppc server, and should not be changed. The fields in each entry that define a host have the following meanings:
    • The first field identifies a particular machine by hostname, IP address, or NetBIOS name.
    • The second field should be set to 0 for any host whose name can be resolved via DNS, the local hosts file, or an nmblookup broadcast. Set this field to 1 for systems whose names must be discovered by probing a range of DHCP addresses, as is the case in some environments where DHCP and WINS are not fully integrated. Setting this field to 1 also requires setting the $Conf{DHCPAddressRanges} variable in the host-specific configuration file to define the base IP address and the range of IP addresses that should be probed.
    • The third field identifies the name of the person who is primarily responsible for backing up that host. This primary user will receive e-mail about the status of any backup that is attempted. I tend to leave this as the backuppc user, so that this user maintains an e-mail record of all backup attempts, but you can set this to a specific user if you wish.
    • The fourth field (which is optional) consists of one or more users who also have administrative rights to initiate backups or restore files for this machine. The names of multiple users must be separated by commas.
    As an example, the hosts file on one of my backuppc servers looks like the following:
    host                     dhcp        user                    moreUsers
    localhost                0           backuppc
    192.168.6.64             0           backuppc                wvh
    64bit                    0           backuppc                wvh,djf
    64x2                     0           backuppc                juser
    win2k                    0           backuppc                wvh,djf
    The backuppc program checks the timestamp on the /etc/backuppc/hosts file each time the backuppc process wakes up, and reloads this file automatically if it has been updated. For this reason, you should not save changes to the hosts file until you have created the host-specific configuration files, as described in the examples in the next two sections. If the backuppc process reloads the hosts file before you have created the host-specific configuration data and another authorized user initiates a backup of that system, you will either back up the wrong thing or the backup will fail. You can always make changes to the hosts file and leave them commented out (by putting a # as the first character on the line) until you have completed the host-specific configuration.
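
    As a quick sanity check, the hosts file layout above can be parsed with standard tools. The following sketch builds a local sample file rather than reading /etc/backuppc/hosts, so it can be tried anywhere; the awk filter skips full-line comments, blank lines, and the header row:

    ```shell
    # Build a small sample hosts file (a real server reads /etc/backuppc/hosts).
    cat > hosts.sample <<'EOF'
    # comments begin with a hash mark
    host        dhcp    user        moreUsers
    localhost   0       backuppc
    64bit       0       backuppc    wvh,djf
    EOF

    # Print each host and its primary backup user, skipping full-line
    # comments, blank lines, and the header row.
    awk '!/^#/ && NF && $1 != "host" { print $1, $3 }' hosts.sample
    ```

    This prints one line per host entry — here, “localhost backuppc” and “64bit backuppc” — which is a convenient way to double-check a long hosts file before saving it.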


    Defining a Backup Using rsyncd
    The section earlier in this chapter entitled “Making an Up-to-Date Copy of a Remote Directory Using rsync” explained how to set up rsync in daemon mode on an Ubuntu system and how to define synchronization  entries that can be remotely accessed via rsync. The sample rsync configuration file created in that section defined a synchronization entry called homes that would enable an authorized user to synchronize the contents of all directories under /home on a sample Ubuntu system. We’ll use that same configuration file in the example in this section.
        The previous section showed how to define entries in the /etc/backuppc/hosts file for the various hosts that you want to back up via backuppc. The first step in host-specific configuration is to use the sudo command to create a directory to hold host-specific configuration data, logs, and so on. Throughout this section, I’ll use the sample host entry 64bit, which I defined in the section entitled “Identifying Hosts to Back Up” as an example.
    • The first step in host-specific configuration is to use the sudo command to create the directory /var/lib/backuppc/64bit, as in the following command:
              $ sudo mkdir /var/lib/backuppc/64bit

    • Next, use the sudo command and your favorite text editor to create a host-specific configuration file named config.pl in that directory, using a command like the following:
              $ sudo emacs /var/lib/backuppc/64bit/config.pl
      The contents of this file should be something like the following:
               $Conf{XferMethod} = 'rsyncd';
               $Conf{CompressLevel} = '3';
               $Conf{RsyncShareName} = 'homes';
               $Conf{RsyncdUserName} = 'wvh';
               $Conf{RsyncdPasswd} = 'hellothere';
      The first line identifies the backup mechanism used for this host as rsyncd, which overrides the default backup mechanism specified in the generic /etc/backuppc/config.pl file. The second line sets the compression level for this host’s backups to level 3, which provides a good tradeoff between the CPU load and time required to do compression and the amount of compression that you actually get. The last three entries in this file correspond to the synchronization entry in the sample rsyncd.conf and associated rsyncd.secrets file created in “Making an Up-to-Date Copy of a Remote Directory Using rsync” earlier.  
    When using backuppc to do automated backups, I like to create a separate authorized user to use rsync for backup purposes, so that the system logs show who actually requested a  remote sync operation. To do this, you would add this user (I usually use backuppc) to the auth users entry in the remote host’s /etc/rsyncd.conf file and create an appropriate username/password pair in the remote host’s /etc/rsyncd.secrets file. You would then modify the host-specific backuppc configuration file to use this username and password. I didn’t do this here for simplicity’s sake, but doing this would provide more accurate log data on the client system.
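
    For reference, the dedicated-user setup just described might look like the following on the remote host being backed up. This is a sketch only, reusing the illustrative module name, path, and password from this section:

    ```
    # /etc/rsyncd.conf on the remote host
    [homes]
        path = /home
        auth users = backuppc
        secrets file = /etc/rsyncd.secrets
        read only = yes

    # /etc/rsyncd.secrets on the remote host (must be mode 600)
    backuppc:hellothere
    ```

    The host-specific config.pl on the backup server would then set $Conf{RsyncdUserName} and $Conf{RsyncdPasswd} to match this entry, so the client’s logs attribute each sync to the backup account rather than to a regular user.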

    • If the remote system uses an rsync binary other than the default /usr/bin/rsync, or the rsync daemon is listening on a port other than the standard port (873), you should add correct definitions for these to the host-specific configuration file. The default settings for the associated configuration parameters are the following:
              $Conf{RsyncdClientPort} = 873;
              $Conf{RsyncClientPath} = '/usr/bin/rsync';
    • Next, change the ownership and group of the /var/lib/backuppc/64bit directory to backuppc, and change the protection of the configuration file /var/lib/backuppc/64bit/config.pl so that it is not publicly readable (because it contains password information), using the following commands:
              $ sudo chown -Rv backuppc:backuppc /var/lib/backuppc/64bit
              $ sudo chmod 600 /var/lib/backuppc/64bit/config.pl
    • The last step in creating a host-specific backup definition for backuppc is to cause the backuppc process to reread its configuration data, which you can do by explicitly reloading the configuration file, explicitly restarting the backuppc process, or by sending the associated process a hang-up (HUP) signal. You can force backuppc to reload the configuration file using the following command:
              $ sudo /etc/init.d/backuppc reload
      The definition for your backup host can now be selected via the backuppc Web interface.
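
      The steps above can be collected into a short shell sketch. The host name and config values are the illustrative ones from this section; the script writes into a local demo directory so it can be tried safely, whereas a real server would use /var/lib/backuppc and follow up with the sudo chown, chmod, and reload commands shown above:

    ```shell
    #!/bin/sh
    # Sketch: generate an rsyncd host definition for backuppc.
    # ROOT is a local demo directory; a real server uses /var/lib/backuppc.
    ROOT=./backuppc-demo
    HOST=64bit

    mkdir -p "$ROOT/$HOST"

    # Quoted 'EOF' keeps the shell from expanding the $Conf variables.
    cat > "$ROOT/$HOST/config.pl" <<'EOF'
    $Conf{XferMethod} = 'rsyncd';
    $Conf{CompressLevel} = '3';
    $Conf{RsyncShareName} = 'homes';
    $Conf{RsyncdUserName} = 'wvh';
    $Conf{RsyncdPasswd} = 'hellothere';
    EOF

    # The file contains a password, so lock it down.
    chmod 600 "$ROOT/$HOST/config.pl"
    ```

    Scripting the definition this way makes it easy to stamp out consistent config.pl files when you add several similar hosts at once.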

    • At this point, you can follow the instructions in the section entitled “Starting Backups in backuppc” to back up this host. The example in this section only backs up the home directories of users on the remote machine. To back up other directories, you would simply create other synchronization entries for those directories in the remote host’s /etc/rsyncd.conf file, and then add entries for them to the host-specific configuration file. For example, to back up synchronization entries named homes, /, and /boot, you would change the host-specific RsyncShareName entry to look like the following:
              $Conf{RsyncShareName} = ['/', 'homes', '/boot'];
      If you back up multiple filesystems or synchronization points, you may want to create a custom set of arguments to the rsync command in the host-specific configuration file. This enables you to add options such as --one-file-system, which causes backuppc to back up each filesystem separately, simplifying restores.
           You can also add options to exclude certain directories from the backups, which you will certainly want to do if you are backing up a remote system’s root directory (/), as in the following example:
              $Conf{RsyncArgs} = [
                            # original arguments here
                            '--one-file-system',
                            '--exclude', '/dev',
                            '--exclude', '/proc',
                            '--exclude', '/media',
                            '--exclude', '/mnt',
                            '--exclude', '/lost+found',
              ];

    These settings prevent backups of /dev, which contains device nodes and is dynamically populated at boot time on modern Linux systems; /proc, which is the mount point for an in-memory filesystem that contains transient data; directories such as /media and /mnt, on which removable media is often temporarily mounted; and /lost+found, which is a directory used during filesystem consistency checking. You can also exclude directories from rsync backups using the BackupFilesExclude directive, as in the following example:
    $Conf{BackupFilesExclude} = ['/dev', '/proc', '/media', '/mnt', '/lost+found'];
    The backuppc program reads the configuration settings in /etc/backuppc/config.pl first, and then loads host-specific configuration settings, which enables the /etc/backuppc/config.pl file to provide default settings for all backups. After you have used backuppc for a while and are comfortable with various settings, you may want to consider modifying the default settings in the /etc/backuppc/config.pl file for configuration variables such as $Conf{RsyncArgs}, $Conf{BackupFilesExclude}, and $Conf{CompressLevel}, to minimize the number of entries that you have to create in each of your host-specific configuration files.
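
    For example, moving site-wide choices into /etc/backuppc/config.pl might look like the following sketch. The values shown are the ones used in this section’s examples, not backuppc’s shipped defaults:

    ```
    # In /etc/backuppc/config.pl -- defaults inherited by every host:
    $Conf{CompressLevel} = '3';
    $Conf{BackupFilesExclude} = ['/dev', '/proc', '/media', '/mnt', '/lost+found'];
    ```

    With these defaults in place, a host-specific config.pl only needs to set what actually differs for that host, such as $Conf{XferMethod} and the share and credential entries.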



    Defining a Backup Using SMB
    The section of this chapter entitled “Identifying Hosts to Back Up” showed how to define entries in the /etc/backuppc/hosts file for the various hosts that you want to back up via backuppc. The first step in host-specific configuration is to use the sudo command to create a directory to hold host-specific configuration data, logs, and so on. Throughout this section, I’ll use the sample host entry win2k from the sample hosts file as an example. As you might gather from its name, this is indeed a system running Microsoft Windows 2000. There’s no escaping from the Borg.
    • The first step in host-specific configuration is to use the sudo command to create the directory /var/lib/backuppc/win2k, as in the following command:
              $ sudo mkdir /var/lib/backuppc/win2k

    • Next, use the sudo command and your favorite text editor to create a host-specific configuration file named config.pl in that directory, using a command like the following:
              $ sudo emacs /var/lib/backuppc/win2k/config.pl
      The contents of this file should be something like the following:
              $Conf{XferMethod} = 'smb';
              $Conf{CompressLevel} = '3';
              $Conf{SmbShareName} = ['wvh', 'djf'];
              $Conf{SmbShareUserName} = 'backuppc';
              $Conf{SmbSharePasswd} = 'hellothere';
      The first line identifies the backup mechanism used for this host as smb, which overrides the default backup mechanism specified in the generic /etc/backuppc/config.pl file. The second line sets the compression level for this host’s backups to level 3, which provides a good tradeoff between the CPU load and time required to do compression and the amount of compression that you actually get. The last three entries in this file define the Windows shares that you want to back up, the name of an authorized user who has access to these shares, and the password for that user.
    When using backuppc to back up Microsoft Windows systems, you should create a Windows user that you will only use to do backups, and then add this user to the standard Windows Backup Operators group. This prevents you from having to put your Windows administrator password in the backuppc configuration files. Even though you’ll protect those files so that random users can’t read them, the fewer places where you write down a password, the better, especially one with the keys to your entire Windows kingdom.
    • Next, change the ownership and group of the /var/lib/backuppc/win2k directory to backuppc, and change the protection of the configuration file /var/lib/backuppc/win2k/config.pl so that it is not publicly readable (because it contains password information), using the following commands:
               $ sudo chown -Rv backuppc:backuppc /var/lib/backuppc/win2k
               $ sudo chmod 600 /var/lib/backuppc/win2k/config.pl
    • The last step in creating a host-specific backup definition for backuppc is to cause the backuppc process to reread its configuration data, which you can do by explicitly reloading the configuration file, explicitly restarting the backuppc process, or by sending the associated process a hang-up (HUP) signal. You can force backuppc to reload the configuration file using the following command:
               $ sudo /etc/init.d/backuppc reload
       The definition for your backup host can now be selected via the backuppc Web interface. At this point, you can follow the instructions in the section entitled “Starting Backups in backuppc” to back up this host.
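
       As with the rsyncd definition earlier in this chapter, these steps can be scripted. This sketch writes into a local demo directory so it can be tried safely; a real server would use /var/lib/backuppc and follow up with the sudo chown, chmod, and reload commands shown above. The share names and password are the illustrative values from this section:

    ```shell
    #!/bin/sh
    # Sketch: generate an SMB host definition for backuppc.
    # ROOT is a local demo directory; a real server uses /var/lib/backuppc.
    ROOT=./backuppc-demo
    HOST=win2k

    mkdir -p "$ROOT/$HOST"

    # Quoted 'EOF' keeps the shell from expanding the $Conf variables.
    cat > "$ROOT/$HOST/config.pl" <<'EOF'
    $Conf{XferMethod} = 'smb';
    $Conf{CompressLevel} = '3';
    $Conf{SmbShareName} = ['wvh', 'djf'];
    $Conf{SmbShareUserName} = 'backuppc';
    $Conf{SmbSharePasswd} = 'hellothere';
    EOF

    # The file contains a password, so lock it down.
    chmod 600 "$ROOT/$HOST/config.pl"
    ```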
    The example in this section only backs up shares that correspond to the home directories of selected users on the remote machine. As mentioned earlier in this text, backuppc backups do not support bare-metal restores of Windows systems, and I therefore typically don’t back up shares such as C$, which is a default Windows share that represents your system’s boot drive. You may find it useful to do so to make sure that you have backup copies of drivers, the registry, and so on, but I find it simpler to start from scratch when reinstalling Windows.
    Windows systems accumulate so much crap in their filesystems over time that doing a fresh installation from your distribution media often frees up a surprising amount of space. 
    If you have several identical systems, restoring partition images created with Norton Ghost or the Linux partimage or g4u utilities is usually the fastest way to rebuild a Windows system, without having to locate the drivers for every device that you will ever want to use with your rebuilt system or reinstall all of your favorite applications.

    The backuppc program reads the configuration settings in /etc/backuppc/config.pl first, and then loads host-specific configuration settings, which enables the /etc/backuppc/config.pl file to provide default settings for all backups. After you have used backuppc for a while and are comfortable with various settings, you may want to consider modifying the default settings in the /etc/backuppc/config.pl file for configuration variables, such as $Conf{CompressLevel}, to minimize the number of entries that you have to create in each of your host-specific configuration files.


    Starting Backups in backuppc
    Thanks to backuppc’s Web orientation, starting backups, viewing the status of those backups, and checking the backup history for any host is impressively easy.
    1. To start a backup in backuppc, connect to the backuppc Web interface using the URL http://hostname/backuppc, where hostname is the name of the host on which the backuppc server is running. A dialog displays in which you are prompted for the login and password of an authorized user. Once you enter the user/password combination for a user listed in the file /etc/backuppc/htpasswd, the backuppc server’s home page displays.
    2. Once this screen displays, click the Select a host... drop-down box and select one of the hosts from the list that displays.
    3. Selecting the name of any host takes you to a summary page for that host, which provides status information, lists authorized users who can back up and restore files to this host using backuppc, and displays the last e-mail that was sent about this host. Each system’s home page displays the subject of the last e-mail sent to the owner of this host. E-mail is only sent occasionally, so seeing a historical problem report does not mean that this problem is still occurring. 
    4. Once this page displays, you can scroll down on the page to see additional status information about available backups, any transfer errors that occurred during backups, and other tables that show the status of the pool where backup files are archived and the extent to which existing backups have been compressed to save disk space. 
    5. To start a backup, click either Start Full Backup to start a full (archive) backup of the system, or Start Incr Backup to start an incremental backup containing files that have changed since the last full backup. 
    6. A confirmation page displays. Clicking Start Full Backup (or Start Incr Backup for an incremental backup) queues the backup and displays a link that you can click to return to the main page for that host to monitor the state of the backup.
       

    Restoring from Backups in backuppc
    Thanks to backuppc’s Web orientation and the fact that backuppc backups are stored online on the backup server, restoring files from backuppc can be done online, by any authorized user whose name is associated with that host in the /etc/backuppc/hosts file. Backuppc enables you to browse through online backups, interactively select the files and directories that you want to restore, and restore them in  various ways.
    1. To begin restoring files or directories, click the name of the full or incremental backup in which they are located. A screen displays. The bottom of the screen displays a hierarchical listing of the files and directories that are contained in the full or incremental backup that you selected. If you selected an incremental backup, the contents of that incremental backup are overlaid on the contents of the previous full backup to give you an accurate snapshot of the contents of your system when the backup was done. You can drill down into the backup by selecting directories from the tree view at the left, or you can drill down into individual directories by selecting from the view of the current directory shown at the right of the main window.
    2. Once you have selected all of the files and directories that you want to restore, scroll to the bottom of the restore page and click restore selected files. A page that enables you to specify how you want to restore those files displays. You have three options when restoring files using the backuppc Web interface:





      1. Direct restore: Selecting this option restores files directly to the host from which they were backed up. When doing a direct restore, you have the option of restoring files in the locations from which they were originally backed up, or into a subdirectory that backuppc will create for you if it does not already exist. (The latter is almost always a good idea so that you don’t accidentally overwrite any files that you don’t actually mean to.) To select this option, enter the name of any subdirectory that you want to use (I usually specify one called tmp) and click Start restore.
      2. Download Zip archive: Selecting this option restores the selected files and directories into a zip-format archive that you can download to your desktop and manually extract the contents of. When selecting this option, you can optionally specify the compression level used in the zip file, which can be important if you are restoring large numbers of files. To select this option, click Download Zip file.
      3. Download Tar archive: Selecting this option restores the selected files and directories into a tar-format archive that you can download to your desktop and manually extract the contents of. To select this option, click Download Tar file.
      If you selected the Direct restore option, backuppc displays a confirmation screen. This lists the files and directories that you selected for restoration and confirms the location to which they will be restored, including the name of any subdirectory that you specified. 
    3. To proceed, click Restore. If you selected the Zip or Tar archive options, the backuppc application displays your Web browser’s standard file download dialog after the archive file has been created.
    As you can see from this section (and the preceding sections), backuppc provides a powerful, flexible interface for backing up and restoring files on many different systems to a single backuppc server. All you need are a few configuration files and sufficient disk space, and lost files (and the lost time that is usually associated with them) can be a thing of the past.