Setting Up a Web Server
Most significant advances in computing technology have what is known as a killer app (killer application): a unique, powerful, and compelling type of application that draws people to that technology in droves and makes it a part of the computing landscape for the foreseeable future.
- For personal computers in general, that application was the spreadsheet.
- For the Apple Macintosh, that application was desktop publishing.
- For the Internet, that application was the World Wide Web. Sure, everyone loved e-mail, but the World Wide Web has turned the Internet into a seething pool of e-commerce, personal and technical information, social networking, and who knows what else in the future.
World Wide Web 101
If you are new to the Web, this section provides some quick history and a sampling of Web buzzwords so that I won’t surprise you by using new terms at random.
In 1989, what has become the World Wide Web first entered the world in the mind of Tim Berners-Lee at CERN (Conseil Européen pour la Recherche Nucléaire), the European Laboratory for Particle Physics near Geneva, Switzerland. The term World Wide Web wasn’t actually coined until 1990, when Tim Berners-Lee and Robert Cailliau submitted an official project proposal for developing the World Wide Web. They suggested a new way of sharing information between researchers at CERN who used different types of terminals and workstations. The unique aspect of their information sharing model was that the servers would host information and deliver it to clients in a device-independent form, and it would be the responsibility of each client to display (officially known as render) that information. Web clients and servers would communicate using a language (protocol) known as HTTP, which stands for the HyperText Transfer Protocol.
You Say URL, I Say URI...
Different Web-aware applications often use different terms for what you and I might simply think of as “Web addresses.” URL (Uniform Resource Locator) is the traditional acronym and term for a Web address, but the acronym and term URI (Uniform Resource Identifier) is actually more technically correct. Another acronym and term that you may come across is URN (Uniform Resource Name).
The relationship between these acronyms is the following:
- A URI is any way to identify a Web resource.
- A URL is a URI that explicitly provides the location of a resource and the protocol used to retrieve it.
- A URN is a URI that simply provides the name of a resource, and may or may not tell you how to retrieve it or where it is located.
The bottom line is that most people think of and use the terms URI, URL, and “Web address” interchangeably. If you want to pick one, URI is the right term to use.
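As a rough illustration of the difference between these terms, the following shell sketch sorts addresses into URLs and URNs. The function name classify_uri and the sample addresses are made up for this example, and the matching is a deliberately crude heuristic (anything containing :// is treated as a URL, anything beginning with urn: as a URN), not a complete URI parser.

```shell
#!/bin/sh
# classify_uri is a hypothetical helper used only for illustration.
classify_uri() {
    case $1 in
        urn:*) echo "URN" ;;            # names a resource, e.g. by ISBN
        *://*) echo "URL" ;;            # gives a protocol and a location
        *)     echo "unclassified" ;;   # this sketch does not try to decide
    esac
}

# Sample addresses (assumptions chosen for the example).
for address in "http://www.example.com/index.html" "urn:isbn:0764548212"; do
    printf '%s -> %s\n' "$address" "$(classify_uri "$address")"
done
```

Both sample addresses are URIs; the sketch only shows which of the two narrower terms applies to each.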
Hypertext is just text with embedded links to other text in it. The most common examples of hypertext outside of the World Wide Web are various types of online help files, where you navigate from one help topic to another by clicking on keywords or other highlighted text. The most basic form of hypertext used on the Web is HTML, the HyperText Markup Language, which is a structured hypertext format that I’ll talk about a little later in this section.
On the World Wide Web, the servers are Web servers and the clients are typically browsers, such as Firefox, Opera, SeaMonkey, Netscape, Microsoft Internet Explorer, Apple’s Safari, and many others, running on your machine. Retrieving a Web page or other Web resource works as follows:
- You enter its address as a Uniform Resource Identifier (URI) in your browser by either typing it in or clicking on a link that contains a reference to that URI.
- Your browser contacts the appropriate Web server, which uses that URI to locate the resource that you requested.
- The Web server returns that resource as a stream of hypertext information that your browser displays appropriately, and you’re off and running!
A URI has the following general form:
scheme://host/pathname
The scheme is one of http, ftp, file, and many more; it specifies how to contact the server running on host, and that server uses it to determine how to act on your request. The pathname is an optional part of the URI that identifies a location used by the server to locate or generate information to return to you.
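To make the three parts concrete, here is a minimal shell sketch that splits an address of the form scheme://host/pathname using standard parameter expansion. The function name split_uri and the sample address are assumptions made for this example; real URIs can also carry ports, queries, and fragments, which this sketch ignores.

```shell
#!/bin/sh
# split_uri is a hypothetical helper; it handles only scheme://host/pathname.
split_uri() {
    uri=$1
    scheme=${uri%%://*}    # everything before the first "://"
    rest=${uri#*://}       # everything after the first "://"
    host=${rest%%/*}       # up to (not including) the first slash
    case $rest in
        */*) pathname=/${rest#*/} ;;   # the optional pathname, if present
        *)   pathname=/ ;;             # no pathname given: use the top level
    esac
    printf '%s %s %s\n' "$scheme" "$host" "$pathname"
}

# Sample address (an assumption chosen for the example).
split_uri "http://www.example.com/docs/index.html"
```

The call above prints the scheme, host, and pathname separated by spaces, which makes the output easy to feed to other shell tools.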
Web pages consist of a static or dynamically generated text document that can contain text, links to other Web pages or sites, embedded graphics in a variety of formats, references to included documents such as style sheets, and much more. These text documents are created using a structured markup language called HTML, the HyperText Markup Language.
A structured markup language is a markup language that enforces a certain hierarchy where different elements of the document can appear only in certain contexts. Using a structured markup language can be useful to guarantee that, for example, a heading can never appear in the middle of a paragraph. Like documents in other modern markup languages, HTML documents consist of logical elements, each tagged with its type; it is the browser’s responsibility to identify each element and determine how to display (render) it. Using a device-independent markup language simplifies developing tools that render Web pages in different ways, convert the information in Web pages to other structured formats (and vice versa), and so on.
Introduction to Web Servers and Apache
As mentioned in the previous section, the flip side of a Web browser is the Web server, the application that actually locates and delivers content from a specified URI to the browser. What does a Web server have to do? At the most basic level, it simply has to deliver HTML and other content in response to incoming requests. However, to be useful in a modern Web-oriented environment, a Web server has to do several things. The most important of these are the following:
- Be flexible and configurable to make it easy to add new capabilities, Web sites, and support increasing demand without recompilation and/or reinstallation.
- Support authentication to limit users who can access specific pages and Web sites.
- Support applications that dynamically generate Web pages, such as those written in Perl and PHP, to provide a customizable and personal user experience.
- Maintain logs that can track requests for various pages so that you can both identify problems and figure out the popularity of various pages and Web sites.
- Support encrypted communications between the browser and server, to guarantee and validate the security of those communications.
Many different Web servers are available today, depending on your hardware platform, the software requirements of third-party software that a Web site depends on, your fealty to a particular operating system vendor, and whether or not you are willing to run open source software, get additional power, and save money.
As you might expect, the first Web server in the world went online at CERN, along with the first Web browser. These were written and ran on NeXT workstations, not exactly the world’s most popular platform (sadly enough). The first test of a Web server outside of Europe was made using a server running at the Stanford Linear Accelerator Center (SLAC) in the United States.
The development focus of Web servers that ran on more popular machines was initially the NCSA (National Center for Supercomputing Applications) Web server, known as NCSA httpd (HTTP Daemon). Their development of a freely available Web server paralleled their development of the NCSA browser, known as Mosaic. When one of the primary developers of NCSA httpd (Rob McCool) left the NCSA, a group of NCSA httpd fans, maintainers, and developers formed to maintain and support a set of patches for NCSA httpd. This patched server eventually came to be known as the Apache Web server. Though the official Apache Web site used to claim that the name “Apache” was chosen because of their respect for the endurance and fighting skills of the Apache Indians, most people (myself included) think that this was a joke, and that the name was chosen because the Web server initially consisted of many patches. In other words, it was “a patchy Web server.”
Two Apache servers are available, contained in the packages apache and apache2. The primary differences between these two versions of the Apache Web server are their code base, their vintage, and how you install and maintain them.
- The apache package is the latest and greatest version of the Apache 1.x family of Web servers, which was excellent in its day, is still extremely popular, and is still in use in many Web sites across the Net.
- However, the apache2 package contains the latest and greatest version of the Apache 2.x Web server, which is essentially “Apache, the Next Generation.” Though things work differently in Apache 2.x, especially from a system administrator’s point of view, Apache 2.x is a far superior Web server and where future Apache extension development is going to take place.
Apache is installed in different ways depending on whether you are running a system installed from an Ubuntu Server CD, an Ubuntu Alternate CD, or an Ubuntu Desktop CD. The differences boil down to whether or not your system has a GUI as follows:
- If you installed your system from an Ubuntu Server CD and chose the Install to hard disk option, your system does not have a GUI unless you subsequently installed one. You will probably want to install the Apache 2 Web server using aptitude, because this will also install some recommended packages that you will find useful, such as the Apache documentation.
- If you installed your system from an Ubuntu Server CD and chose the Install a LAMP server option, your system does not have a GUI unless you subsequently installed one. However, the Apache 2 Web server was installed as part of your LAMP (Linux, Apache, MySQL, and PHP/Perl/Python) server installation. You can skip this installation section and move on.
- If you installed your system from an Ubuntu Alternate CD, you have even more options:
  - If you selected the Install in text mode option, your system has a GUI and you will probably want to install Apache using Synaptic, as explained in the section entitled “Installing Apache Using Synaptic.”
  - If you selected the Install in OEM mode option, your system has a GUI and you will probably want to install Apache using Synaptic, as explained in the section entitled “Installing Apache Using Synaptic.”
  - If you selected the Install a server option, your system does not have a GUI unless you subsequently installed one. You will probably want to install the Apache 2 Web server using aptitude, as explained in the section entitled “Installing Apache from the Command Line,” because this will also install some recommended packages that you will find useful, such as the Apache documentation.
- If you installed your system from an Ubuntu Desktop CD, your system has a GUI and you will probably want to install Apache using Synaptic, as explained in the section entitled “Installing Apache Using Synaptic.”
Installing Apache from the Command Line
It is easiest to install the Apache Web server from the command line using either apt-get or aptitude. Of these two, I suggest that you use aptitude to take advantage of its ability to install recommended packages as well as the basic packages required to run and monitor an Apache Web server on your Ubuntu system.
As mentioned previously, two versions of the Apache Web server are available in different packages, which have different dependencies and recommended packages. This section focuses on installing the Apache 2 Web server. To install the older, Apache 1.3.x Web server, you must have the universe repositories enabled, and you would specify the apache package on the command line rather than the apache2 package. I strongly suggest that you use the Apache 2 Web server unless you must use the Apache 1.3.x Web server because you need to use libraries or modules that are not yet available for Apache 2.
To install the Apache 2 Web server from the command line using aptitude, execute the following command:
$ sudo aptitude -r install apache2
You will be prompted for your password, and then again to confirm that you want to install the apache2 packages, required packages for apache2, and recommended packages for use with the apache2 package. Press Return or type Y and press Return to accept these packages, and the Apache 2 Web server and friends will be installed, added to your system’s startup sequence, and started for you. You’re now ready to configure your Web server and add content. Skip to the section entitled “Configuring Apache” for more information.
Installing Apache Using Synaptic
To install the packages required to run and monitor an Apache Web server on your Ubuntu system, start the Synaptic Package Manager from the System ➪ Administration menu, and click Search to display the search dialog. Make sure that Names and Descriptions are the selected items to look in, enter apache as the string to search for, and click Search.
After the search completes, and depending on how your repositories are configured, you will see that two Apache servers are available, contained in the packages apache and apache2. The primary differences between these two versions of the Apache Web server are their code base, their vintage, and how you install and maintain them. The apache package is the latest and greatest version of the Apache 1.x family of Web servers, which was great in its day and is still extremely popular and in use in a zillion Web sites across the Net. However, the apache2 package contains the latest and greatest version of the Apache 2.x Web server, which is essentially “Apache, the Next Generation.” Though things work differently in Apache 2.x, especially from a system administrator’s point of view, Apache 2.x is a far superior Web server and where future Apache extension development is going to take place. Telling you to install anything else would be doing you a disservice.
Right-click on the apache2 package and select Mark for Installation from the pop-up menu to select that package for installation. You may also want to select the apache2-doc package, which provides all of the official Apache project documentation for Apache 2.
A dialog will display that lists other packages that must also be installed and asks for confirmation. When you see this dialog, click Mark to accept these related (and required) packages.
Next, click Apply in the Synaptic toolbar to install the Apache 2 server and friends on your system. Once the installation completes, you’re already running an Apache 2 Web server, though it is somewhat limited in its initial capabilities. See the next few sections for information on how to configure it, install Web pages, and generally make your Apache 2 Web server more useful.
Apache 2 File Locations
This section provides a quick overview of the default locations of the configuration files, binaries, and content associated with the Apache 2 Web server on your Ubuntu system:
- /etc/apache2: A directory containing the configuration files for the Apache 2 Web server. The primary configuration file in this directory is the file apache2.conf.
- /etc/apache2/conf.d: A directory containing local configuration directives for Apache 2, such as those associated with third-party or locally installed packages.
- /etc/apache2/envvars: A file containing environment variables that you want to set in the environment used by the apache2ctl script to manage an Apache 2 Web server.
- /etc/apache2/mods-available: A directory containing available Apache 2 modules and their configuration files.
- /etc/apache2/mods-enabled: A directory containing symbolic links to actively enabled Apache 2 modules and their configuration files, located in the /etc/apache2/mods-available directory. This is analogous to the use of symbolic links to start various processes from the scripts in /etc/init.d at different run levels.
- /etc/apache2/sites-available: A directory containing files that define the Web sites supported by this server.
- /etc/apache2/sites-enabled: A directory containing symbolic links to actively enabled Web sites for this server, located in the /etc/apache2/sites-available directory. This is analogous to the use of symbolic links to start various processes from the scripts in /etc/init.d at different run levels.
- /etc/default/apache2: A configuration file that determines whether the Apache 2 Web server should automatically start at boot time.
- /etc/init.d/apache2: A shell script that uses the apache2ctl utility to start and stop an Apache 2 Web server.
- /etc/mime.types: The default MIME (Multipurpose Internet Mail Extensions) file types and the extensions that they are associated with.
- /usr/lib/cgi-bin: The location in which any CGI (Common Gateway Interface) scripts for a default Apache 2 Web server will be installed.
- /usr/sbin/apache2: The actual executable for the Apache 2 Web server.
- /usr/sbin/apache2ctl: An administrative shell script that simplifies starting, stopping, restarting, and monitoring the status of a running Apache 2 Web server.
- /usr/share/doc/apache2-doc: A directory that contains the actual Apache 2 manual (in the manual subdirectory). This directory is present only if you’ve installed the apache2-doc package (as suggested earlier).
- /usr/share/apache2/error: A directory containing the default error responses delivered.
- /usr/share/apache2/icons: A directory containing the default set of icons used by an Apache 2 Web server. This directory is mapped to the directory /icons in your Apache server’s primary configuration file.
- /var/log/apache2/access.log: The default access log file for an Apache 2 Web server. This log file tracks any attempts to access this Web site, the hosts that they came from, and so on.
- /var/log/apache2/error.log: The default error log file for an Apache 2 Web server. This log file tracks internal Web server problems, attempts to retrieve nonexistent files, and so on.
- /var/run/apache2/apache2.pid: A text file used by Apache 2 to record its process ID when it starts. This file is used when terminating or restarting the Apache 2 server using the /etc/init.d/apache2 script.
- /var/www/apache2-default: A directory containing the default home page for this Web server. Note that the default Apache 2 Web server does not display the content of this directory correctly — I’ll use that as an example of configuring a Web site in the next section.
Some of these directories, most specifically the /etc/apache2 configuration directory, contain other files that are included or referenced by other files in that same directory.
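To see which of the locations listed above actually exist on a particular machine, a short shell loop is enough. This is only a sketch: check_path is a hypothetical helper name, and the handful of paths in the loop are a sample taken from the list above, which you can extend as needed.

```shell
#!/bin/sh
# check_path is a hypothetical helper that reports whether a path exists.
check_path() {
    if [ -e "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
    fi
}

# A sample of the default locations described above.
for path in /etc/apache2/apache2.conf \
            /etc/apache2/mods-enabled \
            /etc/apache2/sites-enabled \
            /usr/sbin/apache2ctl \
            /var/log/apache2/error.log; do
    check_path "$path"
done
```

The loop only reads the file system, so it is safe to run on a machine where Apache 2 has not been installed; in that case every line simply reports a missing path.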
Configuring Apache
As mentioned in the previous section, the configuration files for the Apache 2 Web server are located in the directory /etc/apache2. Configuration files for Web sites that are available in an Apache 2 Web server are located in the directory /etc/apache2/sites-available. To actually support a site from your Web server, you must create a configuration file for that Web site in /etc/apache2/sites-available, and then create a symbolic link to that configuration file in the /etc/apache2/sites-enabled directory.
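The available/enabled arrangement is just a matter of symbolic links, which the following sketch demonstrates in a scratch directory so that it can be run without root privileges or an installed server. The scratch directory stands in for /etc/apache2, and the site name mysite is hypothetical.

```shell
#!/bin/sh
# A scratch directory stands in for /etc/apache2 (an assumption that lets
# this sketch run without root privileges or an installed Web server).
conf_root=$(mktemp -d)
mkdir "$conf_root/sites-available" "$conf_root/sites-enabled"

# "mysite" is a hypothetical site; a real file would hold its directives.
echo "# configuration directives for mysite would go here" \
    > "$conf_root/sites-available/mysite"

# Enabling the site means linking its file into sites-enabled.
ln -s "$conf_root/sites-available/mysite" "$conf_root/sites-enabled/mysite"
ls -l "$conf_root/sites-enabled"

# Disabling it again would simply remove the link, not the file:
#   rm "$conf_root/sites-enabled/mysite"
```

On an Ubuntu system, the a2ensite and a2dissite utilities installed with Apache 2 create and remove these links for you in the real /etc/apache2 directory, and the server must be restarted or reloaded before the change takes effect.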
The only Web site that is provided out of the box with a standard Apache 2 installation is its default Web site, which you would expect to be able to access at http://hostname. Unfortunately, attempting to access this URI on a newly installed Ubuntu Web server often displays the Web page Index of /.
If you are creating a new Web site and want it to be your Web server’s default page, you can simply put your content in the /var/www directory, where things would work fine immediately. I’m using the vagaries of Ubuntu’s default Web page to demonstrate some of the statements in a server configuration file.
Let’s use that as an opportunity to explore the configuration file for this Web site, explore its syntax, and change anything that we need to change to see a standard default Apache Web site. The following is a listing of the file /etc/apache2/sites-available/default, to which /etc/apache2/sites-enabled/000-default is a symbolic link that activates the site on this server. (I’ve added line numbers to make it easier to refer to different entries; they do not actually appear in the file!)
1. NameVirtualHost *
2. <VirtualHost *>
3. ServerAdmin webmaster@localhost
4. DocumentRoot /var/www
5. <Directory />
6. Options FollowSymLinks
7. AllowOverride None
8. </Directory>
9. <Directory /var/www/>
10. Options Indexes FollowSymLinks MultiViews
11. AllowOverride None
12. Order allow,deny
13. allow from all
14. # Uncomment this directive is you want to see apache2's
15. # default start page (in /apache2-default) when you go to /
16. #RedirectMatch ^/$ /apache2-default/
17. </Directory>
18. ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
19. <Directory "/usr/lib/cgi-bin">
20. AllowOverride None
21. Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
22. Order allow,deny
23. Allow from all
24. </Directory>
25. ErrorLog /var/log/apache2/error.log
26. # Possible values include: debug, info, notice, warn, error, crit,
27. # alert, emerg.
28. LogLevel warn
29. CustomLog /var/log/apache2/access.log combined
30. ServerSignature On
31. Alias /doc/ "/usr/share/doc/"
32. <Directory "/usr/share/doc/">
33. Options Indexes MultiViews FollowSymLinks
34. AllowOverride None
35. Order deny,allow
36. Deny from all
37. Allow from 127.0.0.0/255.0.0.0 ::1/128
38. </Directory>
39. </VirtualHost>
The first thing that I want to change here is line 3, which sends any mail directed to the Webmaster for this site to webmaster@localhost, which probably doesn’t exist on your machine. You can either set up a local alias for Webmaster in your mail server configuration or simply change this to an explicit site-wide address that you’ve already assigned somewhere. I would change this to email@example.com.
The next thing to fix is line 16, which is commented out by default. When enabled, it redirects requests for the top-level URI (i.e., a path consisting of nothing but a slash, which is what the pattern ^/$ matches) to the /apache2-default/ directory under the DocumentRoot. To enable it, simply remove the hash mark at the beginning of the line.
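If the pattern on line 16 looks cryptic, the following sketch shows which request paths the regular expression ^/$ matches. It uses grep -E rather than Apache itself, on the assumption that grep’s extended regular expressions and Apache’s pattern matching agree for a pattern this simple; matches_root is a hypothetical helper name and the sample paths are made up.

```shell
#!/bin/sh
# matches_root is a hypothetical helper: it reports whether a request path
# matches the pattern used by the RedirectMatch directive on line 16.
matches_root() {
    if printf '%s\n' "$1" | grep -Eq '^/$'; then
        echo "redirected"
    else
        echo "unchanged"
    fi
}

# Sample request paths (assumptions chosen for the example).
for path in / /index.html /apache2-default/; do
    printf '%s -> %s\n' "$path" "$(matches_root "$path")"
done
```

Only the bare top-level path is reported as redirected, so requests for any other page or directory on the site are left alone.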
Now, let’s restart the Web server to see if this has changed things:
$ sudo /etc/init.d/apache2 restart
Visiting the same URI as before now shows the right page, which is more like what you expect to see from a vanilla Apache Web server.
Poking around on this page, you can see that the author of the page created a hyperlink called documentation that points to /manual/. However, there is no such directory or an entry in the server’s configuration file defining a redirect to some other directory. So let’s make one. Create something like the entry for the /doc/ directory that’s shown in lines 31 through 38, but simplify it a bit:
1. Alias /manual/ "/usr/share/doc/apache2-doc/manual/"
2. <Directory "/usr/share/doc/apache2-doc/manual/">
3. Order deny,allow
4. Deny from all
5. Allow from 192.168.6 127.0.0.1
6. </Directory>
The first line defines an alias called /manual/ that actually points to the directory /usr/share/doc/apache2-doc/manual/, which is where Apache’s online manual lives. The rest of the lines define who has access to that directory and under what circumstances. Line 2 defines the beginning of directives related to the directory /usr/share/doc/apache2-doc/manual/, and line 6 identifies the end of a block of directives for a specific directory. Lines 3, 4, and 5 specify how access control works. Line 3 says that any statements denying access to the directory are processed before any that allow access to the directory, so a matching Allow statement overrides a Deny. Line 4 denies all access to that directory, while line 5 allows access to that directory from any host whose first three octets are 192.168.6 (the subnet on which this Web server is running), and from the loopback address for the host. After adding these changes to the file (they must come before the directive shown in line 39 of the previous example because they are part of the definition for this host on this Web server) you can restart the Web server using the same command as before:
$ sudo /etc/init.d/apache2 restart
Visiting the same URI as before and trying to access the Apache documentation hyperlink now shows the desired page, which is more like documentation.
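The way lines 3 through 5 of the /manual/ entry combine can be sketched as a small shell function. This is an illustration of the evaluation order only, not how Apache implements it: access_decision is a hypothetical helper, the sample client addresses are made up, and the shell patterns merely imitate the two addresses given on the Allow line.

```shell
#!/bin/sh
# access_decision is a hypothetical helper that mimics lines 3-5:
#   Order deny,allow / Deny from all / Allow from 192.168.6 127.0.0.1
access_decision() {
    client=$1
    decision=allow     # with Order deny,allow, access is allowed by default
    decision=deny      # "Deny from all" is evaluated first and matches everyone
    case $client in
        192.168.6.*|127.0.0.1)
            decision=allow ;;   # a matching Allow overrides the earlier Deny
    esac
    echo "$decision"
}

# Sample client addresses (assumptions chosen for the example).
for client in 192.168.6.10 127.0.0.1 10.0.0.5; do
    printf '%s -> %s\n' "$client" "$(access_decision "$client")"
done
```

Hosts on the 192.168.6 subnet and the local machine are allowed, and every other client is denied, which is the behavior described above.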
You may note that there was no equivalent to line 33 of the original server configuration file. I left these directory browsing options out because I knew that the directory contained HTML files, which made the following options unnecessary:
- Indexes: Shows an index of the directory if no index.html file is present.
- MultiViews: Enables content negotiation, where the server tries to find the best match for a request based on the preferences sent by the browser. In my case, I only want to see the docs in my default language, locale, and character set, so no negotiation is necessary.
- FollowSymLinks: I know that there are no symbolic links in this directory, so there’s no need to specify that they should be followed.
As in any debugging or troubleshooting exercise, log files are your friends. Lines 25, 28, and 29 in the original server configuration file shown earlier identify the log files used by this server, and the level of logging that occurs.
- Line 25 identifies the name of the error log file as /var/log/apache2/error.log.
- Line 28 sets the logging level to warn (warnings), which is slightly more useful than only logging errors, but is not as useful as debug when actually debugging a new site or server.
- Line 29 tells the server to create a single log file named /var/log/apache2/access.log that will log all access requests to the server in NCSA combined log format.
These two log files contain the following information:
- access.log: Shows all attempts to access the server, listing the IP address of the host that attempted access, a timestamp, the actual request that was made, and information about the browser that the request was received from.
- error.log: Shows all errors of level warning or above (i.e., more serious) that the server encountered when trying to process an access request. This includes pages that can’t be found, directories to which access was denied, and so on.
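Because each entry in the access log is a single line of text, standard tools such as awk can summarize it. The sketch below pulls the client address, the requested path, and the status code out of one entry. The sample entry is made up for this example, summarize_entry is a hypothetical helper, and the field positions assume the combined log format with no spaces inside the requested path.

```shell
#!/bin/sh
# A made-up entry in combined log format (not taken from a real server).
sample='192.168.6.20 - - [10/Oct/2006:13:55:36 -0700] "GET /manual/ HTTP/1.1" 200 2326 "http://hostname/" "Mozilla/5.0"'

# summarize_entry is a hypothetical helper: for each log line on standard
# input it prints the client address, the requested path, and the status code.
summarize_entry() {
    awk '{ print $1, $7, $9 }'
}

printf '%s\n' "$sample" | summarize_entry
```

On a running server you could feed it the real log instead, for example with summarize_entry < /var/log/apache2/access.log, assuming that your account is allowed to read that file.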
The logging levels that you can specify, from the most severe to the most verbose, are the following:
- emerg: reports only emergency conditions that make the Web server unstable
- alert: logs situations requiring immediate action, which may identify problems in the host system
- crit: logs critical errors that may indicate security, server, or system problems
- error: reports noncritical errors that indicate missing pages, bad server configuration directives, and general error conditions
- warn: logs messages that warn of noncritical problems or internal conditions that should be investigated
- notice: reports normal but significant conditions that should still be looked into
- info: logs informational messages that may help you identify potential problems or suggest possible reconfigurations
- debug: logs pretty much every state change on the system, such as every file open, every server activity during initialization and operation, and so on
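The levels listed above are ordered from most severe (emerg) to most verbose (debug), and setting a level means that messages at that level and at every more severe level are recorded. The following sketch models that rule in the shell. The helper names level_rank and is_logged are hypothetical, and the model is a simplification that ignores any special cases in how the server treats individual levels.

```shell
#!/bin/sh
# level_rank is a hypothetical helper: lower numbers mean more severe.
level_rank() {
    case $1 in
        emerg)  echo 0 ;;
        alert)  echo 1 ;;
        crit)   echo 2 ;;
        error)  echo 3 ;;
        warn)   echo 4 ;;
        notice) echo 5 ;;
        info)   echo 6 ;;
        debug)  echo 7 ;;
        *)      echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# is_logged MESSAGE_LEVEL CONFIGURED_LEVEL
# Succeeds if a message at MESSAGE_LEVEL is at least as severe as the
# configured level, and would therefore be recorded.
is_logged() {
    [ "$(level_rank "$1")" -le "$(level_rank "$2")" ]
}

# What gets recorded with "LogLevel warn", as on line 28?
for level in emerg error warn notice debug; do
    if is_logged "$level" warn; then
        echo "$level: logged"
    else
        echo "$level: not logged"
    fi
done
```

Changing the second argument of is_logged to debug shows why that level is so useful when troubleshooting a new site: every message in the list is then recorded.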
- Ubuntu Linux Bible by William von Hagen ISBN-13: 978-0-470-03899-4
- In addition to reference material, the Apache2 docs include several tutorials and how-to style articles that provide practical, hands-on information.
- Apache Server 2 Bible, 2nd Edition by Mohammed J. Kabir (Wiley, 2002; ISBN: 0-7645-4821-2)
- Hardening Apache by Tony Mobily (Apress, 2004; ISBN: 1590593782)