Harrykar's Techies Blog: FILES

15 September 2015

FILES

In the Linux operating system, everything is treated as a file except for the process.
What does this mean? Among other things, Linux file commands can be issued on entities that are not traditional files. The entities treated like files include

directories,
physical devices,
named pipes,
file system links.

Aside from physical devices, there are also some special-purpose programs that are treated like files (for instance, a random number generator).

Files versus Directories

The directory it's a named entity that contains files and sub-directories (or devices, links, etc.). The directory offers the user the ability to organize their files in some reasonable manner, giving the file space a hierarchical structure.

Directories can be created just about anywhere in the file system and can contain just about anything from empty directories to directories that themselves contain directories.

The directory differs from the file in a few significant ways.

we expect directories to be executable. Without that permission, no one (including the owner) can cd into the directory.
the directory does not store content like a file; instead it merely stores other items. That is, whereas the file ultimately is a collection of blocks of data, the directory contains a list of pointers to files.
there are some commands that operate on directories and not files (e.g., cd, pwd, mkdir) and some commands that operate on files but not directories (e.g., wc, diff, less, more). We do find that most Linux file commands will operate on directories themselves, including for instance cp, mv, rm (using the recursive version), and wildcards apply to both files and directories.

Nonfile File Types

Many devices are treated as files in Linux. These devices are listed under the /dev directory. We categorize these devices into two subcategories:

character devices : Character devices are those that input or output streams of characters a char at a time ; like the keyboard, the mouse, a terminal (as in terminal window), and serial devices such as older MODEMs and printers.
block devices : Block devices communicate via blocks of data. The term “block” is traditionally applied to disk drives where the files are broken into fixed-sized blocks. However, here, block is applied to any device that communicates by transmitting chunks of data at a time (as opposed to the previously mentioned character type)

Aside from the quantity of data movement, another differentiating characteristic between character and block devices is how input and output are handled.

For a character device, a program executing a file command must wait until the character is transferred before resuming.
For a block device, blocks are buffered in memory so that the program can continue once the instruction has been issued. Further, as blocks are only portions of entire files, it is typically the case that a file command can request one portion of a file. This is often known as random access. The idea is that we do not have to request block 1 before obtaining block 2. Having to read blocks in order is known as sequential access. But in random access, we can obtain any block desired and it should take no longer to access block j than block i.

Another type of file construct is the domain socket (or local socket) This is not to be confused with a network socket.
The domain socket is used to open communication between two local processes. This permits interprocess communication (IPC) so that the two processes can share data.
We might, for instance, want to use IPC when one process is producing data that another process is to consume. This would be the case when some application software is going to print a file. The application software produces the data to be printed, and the printer’s device driver consumes the data.
The IPC is also used to create a rendezvous between two processes where process B must wait for some event from process A.

There are several distinctions between a network and domain socket.

The network socket is not treated as a file (although the network itself is a device that can interact via file system commands) while the domain socket is.
The network socket is created by the operating system to maintain communication with a remote computer while domain sockets are created by users or running software. Network sockets provide communication lines between computers rather than between processes.

Yet another type of file entity is the named pipe. The named pipe differs from the pipe in that it exists beyond the single usage that occurs when we place a pipe between two Linux commands.
To create a named pipe, you define it through the mkfifo operation. The expression FIFO is short for “first-in-first-out.” FIFO is often used to describe a queue (waiting line) as queues are generally serviced in a first-in, first-out manner. In this case, mkfifo creates a FIFO, or a named pipe. Once the pipe exists, you can assign it to be used between any two processes.

Unlike an ordinary pipe that must be used between two Linux processes in a single command, the named pipe can be used in separate instructions.

Let us examine the usage of a named pipe. First, we define our pipe:

mkfifo a_pipe

This creates a file entity called a_pipe. As with any file or directory, a_pipe has permissions, user and group owner, creation/modification date, and a size (of 0). Now that the pipe exists, we might use the pipe in some operation:

ps aux > a_pipe

Unlike performing ps aux, or even ps aux | more, this instruction does not seem to do anything when executed. In fact, our terminal window seems to hang as there is no output but neither is the cursor returned to us. What we have done is opened one end of the pipe(in writing). But until also the other end of the pipe is open, there is nowhere for the ps aux instruction’s output to “flow.”
To open the other end of the pipe, we might apply an operation (in a different terminal window since we do not have a prompt in the original window) like:

cat a_pipe

Now, the contents “flow” from the ps aux command through the pipe to the cat command. The output appears in the second terminal window and when done, the command line prompt returns in the original window.

You might ask why use a named pipe? In fact, the pipe is being used much like an ordinary pipe. Additionally, the named pipe does roughly the same thing as a domain socket— it is a go between for IPC. There are differences between the named pipe and pipe.

The named pipe remains in existence. We can call upon the named pipe numerous times. Notice here that the source program is immaterial. We can use a_pipe no matter what the source program is.
Additionally, the mkfifo instruction allows us to fine tune the pipe’s performance. Specifically, we can assign permissions to the result of the pipe. This is done using the option –M mode where mode is a set of permissions such as –M 600 or –M u=rwx,g=r,o=r.

The difference between the named pipe and the domain socket is a little more obscure.

The named pipe always transfers one byte (character) at a time. The domain socket is not limited to byte transfers but could conceivably transfer more data at a time.

Links as File Types

The link is a file type. There are two forms of links:

hard links : A hard link is stored in a directory to represent a file. It stores the file’s name and the inode number. When creating a new hard link, it duplicates the original hard link, storing the new link in a different directory.
soft (or symbolic) links (or symlinks) : The symbolic link instead merely creates a pointer to point at the original hard link.

The difference between the two types of links is subtle but important. If you were to create a symbolic link and then attempt to access a file through the symbolic link rather than the original link, you are causing an extra level of indirect access.

The operating system must first access the symbolic link, which is a pointer. The pointer then provides access to the original file link. This file link then provides access to the file’s inode, which then provides access to the file’s disk blocks.

Hard link's drawbacks are:

hard links cannot link files together that exist on separate partitions.
hard links can only link together files whereas symbolic links can link directories and other file system entities together.

On the positive side for hard links, they are always up to date. If you move the original object, all of the hard links are modified at the same time. If you delete or move a file that is linked by a symbolic link, the file’s (hard) link is modified but not the symbolic link; thus you may have an out-of-date symbolic link. This can lead to errors at a later time.

In either case, a link is used so that you can refer to a file that is stored in some other location than the current directory. This can be useful when you do not want to add the file’s location to your PATH variable.

For instance, imagine that user zappaf has created a program called my_program, which is stored in ~zappaf. You want to run the program and are you in your home directory. The symbolic link instead merely creates a pointer to point at the original hard link(and zappaf was nice enough to set its permissions to 755). Rather than adding /home/zappaf to your PATH, or use an absolute pathname you create a symbolic link from your home directory to ~zappaf/my_program. Now you can issue the my_program command from your home directory.

You can determine the number of hard links that exist for a single file when you perform an ls –l. The integer value after the permissions is the number of hard links.

$ ls -l
drwxr-xr-x  5 harrykar harrykar    4096 2011-05-05 23:31 perl5

This number will never be less than 1 because with no hard links, the file will not exist. However, the number could be far larger than 1. Deleting any of the hard links will reduce this number. If the number becomes 0, then the file’s inode is returned to the file system for reuse, and thus access to the file is lost with its disk space available for reuse.

If you have a symbolic link in a directory, you will be able to note this by its type and name when viewing the results of an ls –l command. First, the file type is indicated by an ‘l’ (state for link) and the file name will contain the symbolic link’s name, an arrow (->) and the location of the file being linked.

$ ls -l
lrwxrwxrwx  1 harrykar harrykar      22 2012-04-16 16:17 squeak -> /home/harrykar/.squeak

Unfortunately, unlike the hard link usage, if you were to use ls –l on the original file, you will not see any indication that it is linked to by symbolic links.

Collectively, all of the special types of entities are treated like files in the following ways:

Each item is listed when you do an ls.
Each item can be operated upon by file commands such as mv, cp, rm and we can apply redirection operators on them.
Each item is represented in the directory by means of an inode.

You can determine a file’s type by using ls -l (long listing). The first character of the 10-character permissions is the file’s type. In Linux, the seven types are denoted by the characters in Table .

file type identifiers in ls -l

Every file (no matter the type, e.g., regular file, character type, block type, named pipe) is stored in a directory. The directory maintains the entities stored in it through a list. The listing is a collection of hard and soft links. A hard link of a file stores the file’s name and the inode number dedicated to that file. The symbolic link is a pointer to a hard link stored elsewhere.

As the user modifies the contents of the directory, this list is modified. New files require new hard links pointing to newly allocated inodes. The deletion of a file causes the hard link to be removed and the numeric entry of hard links to a file to be decremented. The inode itself remains allocated to the given file unless the hard link count becomes 0.

Resources

Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170

Harrykar's Techies Blog

Total Pageviews

Search: This Blog, Linked From Here, The Web, My fav sites, My Blogroll

Translate