
15 September 2015

inode

When a file system is first established, it comes with a set number of inodes. The inode is a data structure used to store information about a single file. The information that every inode stores consists of
  • The file type
  • The file’s permissions
  • The file’s owner and group
  • The file’s size
  • The inode number
  • Timestamps indicating when the file was last modified, when its status (inode) was last changed, and when it was last accessed
  • A link count (the number of hard links that point at this file)
  • The location of the file (i.e., the device storing the file) and pointers to the individual file blocks (if this is a regular file)
These pointers break down into:
  •  A set number of pointers that point directly to blocks
  •  A set number of pointers that point to indirect blocks; each indirect block contains pointers that point directly to blocks
  • A set number of pointers that point to doubly indirect blocks, which are blocks that have pointers that point to additional indirect blocks
  • A set number of pointers that point to triply indirect blocks, which are blocks that have pointers that point to additional doubly indirect blocks
Typically, an inode will contain 15 pointers broken down as follows:
  • 12 direct pointers
  • 1 indirect pointer
  • 1 doubly indirect pointer
  • 1 triply indirect pointer
An inode is illustrated in Figure (which contains 1 indirect pointer and 1 doubly indirect pointer but no triply indirect pointer because of a lack of space).

 Let us take a look at how to access a Linux file through the inode. We will make a few assumptions.
First, our Linux inode will store 12 direct pointers, 1 indirect pointer, 1 doubly indirect pointer, and 1 triply indirect pointer. Blocks of pointers will store 12 pointers no matter whether they are indirect, doubly indirect, or triply indirect blocks. We will assume that our file consists of 500 blocks, numbered 0 to 499, each block storing 8 KB (the typical disk block stores between 1 KB and 8 KB depending on the file system utilized).
Our example file then stores 500 * 8 KB = 4000 KB, or roughly 4 MB. Here is the breakdown of how we access the various blocks.
  • Blocks 0–11: direct pointers from the inode.
  • Blocks 12–23: pointers from an indirect block pointed to by the inode’s indirect pointer.
  • For the rest of the file, access is more complicated.
    • We follow the inode’s doubly indirect pointer to a doubly indirect block. This block contains 12 pointers to indirect blocks, and each indirect block contains 12 pointers to disk blocks.
      −− The doubly indirect block’s first pointer points to an indirect block of 12 pointers, which point to blocks 24–35.
      −− The doubly indirect block’s second pointer points to another indirect block of 12 pointers, which point to blocks 36–47.
      −− …
      −− The doubly indirect block’s last pointer points to an indirect block of 12 pointers, which point to blocks 156–167.
    • We follow the inode’s triply indirect pointer to a triply indirect block. This block contains 12 pointers to doubly indirect blocks, each of which contains 12 pointers to indirect blocks, each of which contains 12 pointers to disk blocks. From the triply indirect block, we can reach blocks 168 through 499 (with room to grow the file to block 1895), as the short sketch after this list verifies.
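To double-check these ranges under the simplified assumptions above (12 direct pointers and 12 pointers per indirect block), a few lines of shell arithmetic suffice; the numbers come from those assumptions, not from any real file system:

$ echo "direct:          blocks 0-$((12 - 1))"
direct:          blocks 0-11
$ echo "indirect:        blocks 12-$((12 + 12 - 1))"
indirect:        blocks 12-23
$ echo "doubly indirect: blocks 24-$((24 + 12*12 - 1))"
doubly indirect: blocks 24-167
$ echo "triply indirect: blocks 168-$((168 + 12*12*12 - 1))"
triply indirect: blocks 168-1895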

Earlier, we noted that the disk drive supports random access. The idea is that to track down a block, block i, we have a mechanism to locate it. This is done through the inode pointers as just described.
The above example is far from accurate. A disk block used to store an indirect, doubly indirect, or triply indirect block of pointers would be 8 KB in size, and such a block would store far more than 12 pointers. A pointer is usually 32 or 64 bits (4 or 8 bytes) long. If we assume 8-byte pointers and 8 KB disk blocks, then an indirect, doubly indirect, or triply indirect block would store 8 KB / 8 B = 1 K, or 1024, pointers rather than 12.
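As a back-of-the-envelope sketch (assuming, as above, 8 KB blocks and 8-byte pointers; the shell arithmetic below is only illustrative):

$ echo $(( 8192 / 8 ))          # pointers per 8 KB indirect block
1024
$ blocks=$(( 12 + 1024 + 1024*1024 + 1024*1024*1024 ))
$ echo "$blocks blocks, roughly $(( blocks * 8 / 1024 / 1024 / 1024 )) TB maximum file size"
1074791436 blocks, roughly 8 TB maximum file size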
When a file system is created, it comes with a set number of inodes. The actual number depends on the size of the file system and the size of a disk block. Typically, there is 1 inode for every 2–8 KB of file system space.
If we have a 1 TB file system (a common size for a hard disk today) and 1 inode per 8 KB, we might have as many as 128 million inodes (1 TB / 8 KB = 2^27). The remainder of the file system is made up of disk blocks dedicated to file storage and pointers. Unless nearly all of the files in the file system are very small, the number of inodes should be more than sufficient for any file system usage.
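A one-line check of that count, assuming 1 inode per 8 KB (the actual ratio is selected when the file system is created):

$ echo $(( 1024*1024*1024*1024 / 8192 ))     # inodes in a 1 TB file system at 1 per 8 KB
134217728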


Linux Commands to Inspect inodes and Files

There are several tools available to inspect inodes. The following commands provide such information:
  • stat : provides details on a specific file; the option -c %i displays the file’s inode number
  • ls -i : displays the inode numbers of all entries in the directory
  • df -i : provides information on the utilization of the file system, partition by partition; the -i option includes details on the number of inodes used and free
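As a quick illustration of the three commands (a sketch only; the file name notes.txt and the mount point /home are placeholders, and the values will differ on any real system):

$ ls -i ~/notes.txt          # inode number followed by the file name
$ stat -c %i ~/notes.txt     # the inode number alone
$ df -i /home                # total, used, and free inodes for that partition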
The stat command itself will respond with the name of the file, the size of the file, the number of blocks used to store the file, the device storing the file (specified as a device number), the inode of the file, the number of hard links to the file, the file’s permissions, the UID and GID (in both name and number), and the last access, modification, and change date and time for the file. The stat command has many options. The most significant are:
  • -L : follow links to obtain inode information of files stored in other directories (without this, symbolic links in the given directory are ignored)
  • -f : obtain statistics on an entire file system rather than on a single file
  • -c FORMAT : output only the requested information, where FORMAT uses the characters listed in the Table; when combined with -f (file system statistics), a different set of formatting characters is available (see the second half of the Table)
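As a brief sketch of the -f form (in GNU coreutils stat, %n, %T, %c, and %d are the name, file system type, total inodes, and free inodes; the output depends on the system):

$ stat -f -c "%n %T %c %d" /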

Table: Formatting characters for -c (bottom half for -c combined with -f)
We use stat to display the size of each file in blocks and in bytes, the file name, the inode number of the file, and the time of last access. This command is given as:

$ stat   -c "%b %s %n %i %x" *
736 373337 firestarter-events.txt~ 1486 2010-04-28 14:37:10.843398916 +0300
472544 241938432 FreeBSD-10.2-RELEASE-amd64-bootonly.iso 285743 2015-09-10 20:01:12.310727882 +0300
119344 61098928 FreeBSD-10.2-RELEASE-amd64-bootonly.iso.xz 4349 2015-09-10 19:53:37.250723649 +0300

Next we inspect the devices under /dev, displaying each entry’s device number (%d), owner UID (%u), number of hard links (%h), inode number (%i), name (%n), and file type (%F).
 
$ stat   -c "%d %u %h %i %n %F" /dev/*
5 0 1 4695 /dev/adsp character special file
5 0 1 4699 /dev/audio character special file
5 0 2 1901 /dev/block directory
5 0 2 2342 /dev/bsg directory
5 0 3 1801 /dev/bus directory
5 0 1 4001 /dev/cdrom symbolic link
5 0 2 2001 /dev/char directory
5 0 1 1592 /dev/console character special file
5 0 1 3249 /dev/core symbolic link
5 0 1 1834 /dev/cpu_dma_latency character special file
5 0 5 2377 /dev/disk directory
5 0 1 4698 /dev/dsp character special file
5 0 1 4004 /dev/dvd symbolic link
5 0 1 1585 /dev/ecryptfs character special file
5 0 1 2382 /dev/fb0 character special file
5 0 1 3250 /dev/fd symbolic link
5 0 1 1233 /dev/full character special file
5 0 1 1586 /dev/fuse character special file
5 0 1 2392 /dev/hidraw0 character special file
5 0 1 1662 /dev/hpet character special file
5 0 4 1811 /dev/input directory
5 0 1 1236 /dev/kmsg character special file
5 0 1 4784 /dev/log socket
5 0 1 1716 /dev/loop0 block special file
5 0 1 1719 /dev/loop1 block special file
5 0 1 1722 /dev/loop2 block special file
5 0 1 1725 /dev/loop3 block special file
5 0 1 1728 /dev/loop4 block special file
5 0 1 1731 /dev/loop5 block special file
5 0 1 1734 /dev/loop6 block special file
5 0 1 1737 /dev/loop7 block special file
5 0 2 1819 /dev/mapper directory
5 0 1 1575 /dev/mcelog character special file
5 0 1 1229 /dev/mem character special file
5 0 1 4702 /dev/mixer character special file
5 0 2 1744 /dev/net directory
5 0 1 1835 /dev/network_latency character special file
5 0 1 1836 /dev/network_throughput character special file
5 0 1 1230 /dev/null character special file
5 0 1 5256 /dev/nvidia0 character special file
5 0 1 5255 /dev/nvidiactl character special file
5 0 1 1237 /dev/oldmem character special file
5 0 2 1741 /dev/pktcdvd directory
5 0 1 1231 /dev/port character special file
5 0 1 1743 /dev/ppp character special file
5 0 1 1814 /dev/psaux character special file
5 0 1 1661 /dev/ptmx character special file
12 0 2 1 /dev/pts directory
5 0 1 1668 /dev/ram0 block special file
5 0 1 1671 /dev/ram1 block special file
5 0 1 1698 /dev/ram10 block special file
5 0 1 1701 /dev/ram11 block special file
5 0 1 1704 /dev/ram12 block special file
5 0 1 1707 /dev/ram13 block special file
5 0 1 1710 /dev/ram14 block special file
5 0 1 1713 /dev/ram15 block special file
5 0 1 1674 /dev/ram2 block special file
5 0 1 1677 /dev/ram3 block special file
5 0 1 1680 /dev/ram4 block special file
5 0 1 1683 /dev/ram5 block special file
5 0 1 1686 /dev/ram6 block special file
5 0 1 1689 /dev/ram7 block special file
5 0 1 1692 /dev/ram8 block special file
5 0 1 1695 /dev/ram9 block special file
5 0 1 1234 /dev/random character special file
5 0 1 397 /dev/rfkill character special file
5 0 1 4497 /dev/root symbolic link
5 0 1 2177 /dev/rtc symbolic link
5 0 1 1818 /dev/rtc0 character special file
5 0 1 2374 /dev/scd0 symbolic link
5 0 1 2344 /dev/sda block special file
5 0 1 2345 /dev/sda1 block special file
5 0 1 2346 /dev/sda2 block special file
5 0 1 2347 /dev/sda3 block special file
5 0 1 2348 /dev/sda4 block special file
5 0 1 2349 /dev/sda5 block special file
5 0 1 2350 /dev/sda6 block special file
5 0 1 2351 /dev/sda7 block special file
5 0 1 2352 /dev/sda8 block special file
5 0 1 2353 /dev/sda9 block special file
5 0 1 4371 /dev/sequencer character special file
5 0 1 4375 /dev/sequencer2 character special file
5 0 1 2341 /dev/sg0 character special file
5 0 1 2364 /dev/sg1 character special file
16 0 2 3261 /dev/shm directory
5 0 1 1576 /dev/snapshot character special file
5 0 3 4314 /dev/snd directory
5 0 1 3252 /dev/sndstat symbolic link
5 0 1 2361 /dev/sr0 block special file
5 0 1 3253 /dev/stderr symbolic link
5 0 1 3254 /dev/stdin symbolic link
5 0 1 3255 /dev/stdout symbolic link
5 0 1 1591 /dev/tty character special file
5 0 1 1593 /dev/tty0 character special file
5 0 1 1598 /dev/tty1 character special file
5 0 1 1607 /dev/tty10 character special file
5 0 1 1608 /dev/tty11 character special file
5 0 1 1609 /dev/tty12 character special file
5 0 1 1610 /dev/tty13 character special file
5 0 1 1611 /dev/tty14 character special file
5 0 1 1612 /dev/tty15 character special file
5 0 1 1613 /dev/tty16 character special file
5 0 1 1614 /dev/tty17 character special file
5 0 1 1615 /dev/tty18 character special file
5 0 1 1616 /dev/tty19 character special file
5 0 1 1599 /dev/tty2 character special file
5 0 1 1617 /dev/tty20 character special file
5 0 1 1618 /dev/tty21 character special file
5 0 1 1619 /dev/tty22 character special file
5 0 1 1620 /dev/tty23 character special file
5 0 1 1621 /dev/tty24 character special file
5 0 1 1622 /dev/tty25 character special file
5 0 1 1623 /dev/tty26 character special file
5 0 1 1624 /dev/tty27 character special file
5 0 1 1625 /dev/tty28 character special file
5 0 1 1626 /dev/tty29 character special file
5 0 1 1600 /dev/tty3 character special file
5 0 1 1627 /dev/tty30 character special file
5 0 1 1628 /dev/tty31 character special file
5 0 1 1629 /dev/tty32 character special file
5 0 1 1630 /dev/tty33 character special file
5 0 1 1631 /dev/tty34 character special file
5 0 1 1632 /dev/tty35 character special file
5 0 1 1633 /dev/tty36 character special file
5 0 1 1634 /dev/tty37 character special file
5 0 1 1635 /dev/tty38 character special file
5 0 1 1636 /dev/tty39 character special file
5 0 1 1601 /dev/tty4 character special file
5 0 1 1637 /dev/tty40 character special file
5 0 1 1638 /dev/tty41 character special file
5 0 1 1639 /dev/tty42 character special file
5 0 1 1640 /dev/tty43 character special file
5 0 1 1641 /dev/tty44 character special file
5 0 1 1642 /dev/tty45 character special file
5 0 1 1643 /dev/tty46 character special file
5 0 1 1644 /dev/tty47 character special file
5 0 1 1645 /dev/tty48 character special file
5 0 1 1646 /dev/tty49 character special file
5 0 1 1602 /dev/tty5 character special file
5 0 1 1647 /dev/tty50 character special file
5 0 1 1648 /dev/tty51 character special file
5 0 1 1649 /dev/tty52 character special file
5 0 1 1650 /dev/tty53 character special file
5 0 1 1651 /dev/tty54 character special file
5 0 1 1652 /dev/tty55 character special file
5 0 1 1653 /dev/tty56 character special file
5 0 1 1654 /dev/tty57 character special file
5 0 1 1655 /dev/tty58 character special file
5 0 1 1656 /dev/tty59 character special file
5 0 1 1603 /dev/tty6 character special file
5 0 1 1657 /dev/tty60 character special file
5 0 1 1658 /dev/tty61 character special file
5 0 1 1659 /dev/tty62 character special file
5 0 1 1660 /dev/tty63 character special file
5 0 1 1604 /dev/tty7 character special file
5 0 1 1605 /dev/tty8 character special file
5 0 1 1606 /dev/tty9 character special file
5 0 1 1667 /dev/ttyS0 character special file
5 0 1 1664 /dev/ttyS1 character special file
5 0 1 1665 /dev/ttyS2 character special file
5 0 1 1666 /dev/ttyS3 character special file
5 0 1 1235 /dev/urandom character special file
5 0 1 1749 /dev/usbmon0 character special file
5 0 1 1753 /dev/usbmon1 character special file
5 0 1 1808 /dev/usbmon2 character special file
5 0 1 7508 /dev/vboxdrv character special file
5 0 1 7512 /dev/vboxdrvu character special file
5 0 1 7561 /dev/vboxnetctl character special file
5 0 2 7754 /dev/vboxusb directory
5 0 1 1594 /dev/vcs character special file
5 0 1 1596 /dev/vcs1 character special file
5 0 1 2609 /dev/vcs2 character special file
5 0 1 2618 /dev/vcs3 character special file
5 0 1 2627 /dev/vcs4 character special file
5 0 1 2636 /dev/vcs5 character special file
5 0 1 2645 /dev/vcs6 character special file
5 0 1 2662 /dev/vcs7 character special file
5 0 1 1595 /dev/vcsa character special file
5 0 1 1597 /dev/vcsa1 character special file
5 0 1 2610 /dev/vcsa2 character special file
5 0 1 2619 /dev/vcsa3 character special file
5 0 1 2628 /dev/vcsa4 character special file
5 0 1 2637 /dev/vcsa5 character special file
5 0 1 2646 /dev/vcsa6 character special file
5 0 1 2663 /dev/vcsa7 character special file
5 0 1 387 /dev/vga_arbiter character special file
5 0 1 1232 /dev/zero character special file

Some observations:
All of the entries except pts (device 12) and shm (device 16) are located on device number 5.
All are owned by user 0 (root).
Most of the items have only one hard link; the directories (e.g., input, pts, disk) have more than one. The inode numbers vary from 1 to 7754.
The file type demonstrates that “files” can be a wide variety of entities, from block or character files (devices) to symbolic links to directories to domain sockets. This last field varies in length from one word (directory) to three words (character special file, block special file).
Each new file is given the next available inode. As your file system is used, you will find that newer files have higher inode numbers, although the inodes of deleted files are returned for reuse.
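If you know an inode number, you can locate the entry that owns it with find’s -inum test (a sketch; 1901 is simply the inode shown for /dev/block in the listing above and will differ on other systems):

$ find /dev -xdev -inum 1901     # print the entry on this file system with inode 1901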

Whenever any file is used in Linux, it must first be opened. The opening of a file requires a special designator known as the file descriptor. The file descriptor is an integer assigned to the file while it is open. In Linux, three file descriptors are always made available:
  • 0  stdin
  • 1  stdout
  • 2  stderr
Any remaining files that are utilized during Linux command or program execution must be opened and assigned file descriptors of their own.

When a file is to be opened, the operating system kernel gets involved.
  1. First, it determines if the user has adequate access rights to the file.
  2.  If so, it then generates a file descriptor. 
  3. It then creates an entry in the system’s file table, a data structure that stores file pointers for every open file. The location of this pointer in the file table is equal to the file descriptor generated. For instance, if the file is given the descriptor 185, then the file’s pointer will be the 185th entry in the file table. 
  4. The pointer itself will point to an inode for the given file.
As devices are treated as files, file descriptors will also exist for every device in use: the keyboard, terminal windows, the monitor, the network interface(s), and the disk drives, as well as for the ordinary open files.
You can view the file descriptors of a given process by looking at the fd subdirectory of the process’ entry in the /proc directory (e.g., /proc/16531/fd). There will always be entries labeled 0, 1, and 2 for STDIN, STDOUT, and STDERR, respectively. Other devices and files in use will require additional entries. Alternatively, the lsof command will list any open files.
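A quick way to see this for the current shell (a sketch; $$ expands to the shell’s own PID and the exact entries will differ):

$ ls -l /proc/$$/fd     # 0, 1, and 2 appear as symbolic links to the terminal device
$ lsof -p $$            # every file the shell currently has open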

FILES

In the Linux operating system, everything is treated as a file except for the process.
What does this mean? Among other things, Linux file commands can be issued on entities that are not traditional files. The entities treated like files include
  • directories, 
  • physical devices,
  • named pipes, 
  • file system links.
Aside from physical devices, there are also some special-purpose programs that are treated like files (for instance, a random number generator).


Files versus Directories

A directory is a named entity that contains files and subdirectories (or devices, links, etc.). The directory offers the user the ability to organize their files in some reasonable manner, giving the file space a hierarchical structure.
Directories can be created just about anywhere in the file system and can contain just about anything from empty directories to directories that themselves contain directories.

The directory differs from the file in a few significant ways. 
  1. We expect directories to be executable. Without that permission, no one (including the owner) can cd into the directory.
  2. The directory does not store content as a file does; instead it merely stores other items. That is, whereas a file ultimately is a collection of blocks of data, a directory contains a list of pointers to files.
  3. There are some commands that operate on directories and not files (e.g., cd, pwd, mkdir) and some commands that operate on files but not directories (e.g., wc, diff, less, more). Most Linux file commands will, however, operate on directories as well, including for instance cp, mv, and rm (in their recursive forms), and wildcards apply to both files and directories, as the short example below shows.
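A minimal sketch of the recursive forms mentioned above (the directory name projects is just a placeholder):

$ mkdir -p projects/src           # create a directory and a subdirectory
$ cp -r projects projects.bak     # recursively copy the entire directory
$ rm -r projects.bak              # recursively remove it again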
  

Nonfile File Types

Many devices are treated as files in Linux. These devices are listed under the /dev directory. We categorize these devices into two subcategories: 
  • character devices : devices that input or output a stream of characters one character at a time, such as the keyboard, the mouse, a terminal (as in a terminal window), and serial devices such as older modems and printers.
  • block devices : devices that communicate via blocks of data. The term “block” is traditionally applied to disk drives, where files are broken into fixed-sized blocks. Here, however, block is applied to any device that communicates by transmitting chunks of data at a time (as opposed to the previously mentioned character devices).

Aside from the quantity of data movement, another differentiating characteristic between character and block devices is how input and output are handled. 
  • For a character device, a program executing a file command must wait until the character is transferred before resuming. 
  • For a block device, blocks are buffered in memory so that the program can continue once the instruction has been issued. Further, as blocks are only portions of entire files, it is typically the case that a file command can request one portion of a file. This is often known as random access. The idea is that we do not have to request block 1 before obtaining block 2. Having to read blocks in order is known as sequential access. But in random access, we can obtain any block desired and it should take no longer to access block j than block i.
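The distinction is visible in the first character of a long listing (a sketch; /dev/sda and /dev/tty exist on most, though not all, systems):

$ ls -l /dev/sda /dev/tty                  # first character: 'b' = block device, 'c' = character device
$ stat -c "%n is a %F" /dev/sda /dev/tty   # prints "block special file" and "character special file"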
Another type of file construct is the domain socket (or local socket). This is not to be confused with a network socket.
The domain socket is used to open communication between two local processes. This permits interprocess communication (IPC) so that the two processes can share data.
We might, for instance, want to use IPC when one process is producing data that another process is to consume. This would be the case when some application software is going to print a file. The application software produces the data to be printed, and the printer’s device driver consumes the data.
The IPC is also used to create a rendezvous between two processes where process B must wait for some event from process A.

There are several distinctions between a network and domain socket. 
  • The network socket is not treated as a file (although the network itself is a device that can interact via file system commands) while the domain socket is. 
  • The network socket is created by the operating system to maintain communication with a remote computer while domain sockets are created by users or running software. Network sockets provide communication lines between computers rather than between processes.
Yet another type of file entity is the named pipe. The named pipe  differs from the pipe in that it exists beyond the single usage that occurs when we place a pipe between two Linux commands.
To create a named pipe, you define it through the mkfifo operation. The expression FIFO is short for “first-in-first-out.” FIFO is often used to describe a queue (waiting line) as queues are generally serviced in a first-in, first-out manner. In this case, mkfifo creates a FIFO, or a named pipe. Once the pipe exists, you can assign it to be used between any two processes.
Unlike an ordinary pipe that must be used between two Linux processes in a single command, the named pipe can be used in separate instructions.
Let us examine the usage of a named pipe. First, we define our pipe:

mkfifo a_pipe
This creates a file entity called a_pipe. As with any file or directory, a_pipe has permissions, user and group owner, creation/modification date, and a size (of 0). Now that the pipe exists, we might use the pipe in some operation:

ps aux  > a_pipe

Unlike performing ps aux, or even ps aux | more, this instruction does not seem to do anything when executed. In fact, our terminal window seems to hang: there is no output, but neither is the cursor returned to us. What we have done is open the write end of the pipe. Until the read end of the pipe is also opened, there is nowhere for the ps aux instruction’s output to “flow.”
To open the other end of the pipe, we might apply an operation (in a different terminal window since we do not have a prompt in the original window) like:

cat a_pipe

Now, the contents “flow” from the ps aux command through the pipe to the cat command. The output appears in the second terminal window and when done, the command line prompt returns in the original window.

You might ask: why use a named pipe? In fact, the pipe is being used much like an ordinary pipe. Additionally, the named pipe does roughly the same thing as a domain socket: it is a go-between for IPC. There are, however, differences between the named pipe and the ordinary pipe.
The named pipe remains in existence. We can call upon the named pipe numerous times. Notice here that the source program is immaterial. We can use a_pipe no matter what the source program is.
Additionally, the mkfifo instruction allows us to fine-tune the pipe. Specifically, we can assign permissions to the pipe when it is created. This is done using the option -m mode, where mode is a set of permissions such as -m 600 or -m u=rwx,g=r,o=r.
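A brief sketch of that option (the pipe name is arbitrary; note the leading 'p' in the long listing, the file type character for a named pipe):

$ mkfifo -m 600 b_pipe
$ ls -l b_pipe          # shows something like prw------- ... b_pipe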
The difference between the named pipe and the domain socket is a little more obscure.
The named pipe always transfers one byte (character) at a time. The domain socket is not limited to byte transfers but could conceivably transfer more data at a time.


Links as File Types

The link is a file type. There are two forms of links:
  • hard links : A hard link is stored in a directory to represent a file. It stores the file’s name and the inode number. Creating a new hard link duplicates the original entry, storing the new name (with the same inode number) in the same or a different directory.
  • soft (or symbolic) links (or symlinks) : The symbolic link instead merely creates a pointer to point at the original hard link.
The difference between the two types of links is subtle but important. If you were to create a symbolic link and then attempt to access a file through the symbolic link rather than the original link, you are causing an extra level of indirect access.
The operating system must first access the symbolic link, which is a pointer. The pointer then provides access to the original file link. This file link then provides access to the file’s inode, which then provides access to the file’s disk blocks.
Hard links have two drawbacks:
  1. Hard links cannot link together files that exist on separate partitions.
  2. Hard links can only link together files, whereas symbolic links can link directories and other file system entities together.
On the positive side for hard links, they are always up to date. If you move the original object, all of the hard links are modified at the same time. If you delete or move a file that is linked by a symbolic link, the file’s (hard) link is modified but not the symbolic link; thus you may have an out-of-date symbolic link. This can lead to errors at a later time.

In either case, a link is used so that you can refer to a file that is stored in some other location than the current directory. This can be useful when you do not want to add the file’s location to your PATH variable.
For instance, imagine that user zappaf has created a program called my_program, which is stored in ~zappaf. You want to run the program from your home directory (and zappaf was nice enough to set the program’s permissions to 755). Rather than adding /home/zappaf to your PATH or using an absolute pathname, you create a symbolic link in your home directory that points at ~zappaf/my_program. Now you can issue the my_program command from your home directory.
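A minimal sketch of that link (my_program and the zappaf account are just the example names used above):

$ ln -s ~zappaf/my_program ~/my_program     # create the symbolic link in your home directory
$ ~/my_program                              # run the program through the link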
You can determine the number of hard links that exist for a single file when you perform an ls -l. The integer value after the permissions is the number of hard links.

$ ls -l
drwxr-xr-x  5 harrykar harrykar    4096 2011-05-05 23:31 perl5
This number will never be less than 1 because with no hard links, the file does not exist. However, the number could be far larger than 1. Deleting any of the hard links will reduce this number. If the number becomes 0, then the file’s inode is returned to the file system for reuse; access to the file is lost and its disk space becomes available for reuse.
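A short sketch showing the count change (the file names are placeholders):

$ touch report.txt
$ stat -c %h report.txt        # 1 hard link
$ ln report.txt report2.txt    # create a second hard link to the same inode
$ stat -c %h report.txt        # now 2
$ rm report2.txt               # and back to 1 after deleting one of the links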
If you have a symbolic link in a directory, you will be able to note this by its type and name when viewing the results of an ls -l command. First, the file type is indicated by an ‘l’ (standing for link), and the file name will contain the symbolic link’s name, an arrow (->), and the location of the file being linked.

$ ls -l
lrwxrwxrwx  1 harrykar harrykar      22 2012-04-16 16:17 squeak -> /home/harrykar/.squeak
Unfortunately, unlike hard links, if you use ls -l on the original file, you will not see any indication that it is pointed to by symbolic links.


Collectively, all of the special types of entities are treated like files in the following ways:
  • Each item is listed when you do an ls.
  • Each item can be operated upon by file commands such as mv, cp, and rm, and we can apply redirection operators to them.
  • Each item is represented in the directory by means of an inode.
You can determine a file’s type by using ls -l (long listing). The first character of the 10-character permissions is the file’s type. In Linux, the seven types are denoted by the characters in Table .

file type identifiers in ls -l
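To see several of the type characters at once, a sketch (these paths usually exist; on the system listed earlier /dev/log is a domain socket, though on newer systems it may be a symbolic link):

$ ls -ld /etc /etc/passwd /dev/sda /dev/tty /dev/log     # first characters: d, -, b, c, and s (or l)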


Every file (no matter the type, e.g., regular file, character type, block type, named pipe) is stored in a directory. The directory maintains the entities stored in it through a list. The listing is a collection of hard and soft links. A hard link of a file stores the file’s name and the inode number dedicated to that file. The symbolic link is a pointer to a hard link stored elsewhere.

As the user modifies the contents of the directory, this list is modified. New files require new hard links pointing to newly allocated inodes. The deletion of a file causes the hard link to be removed and the file’s hard link count to be decremented. The inode itself remains allocated to the given file unless the hard link count becomes 0.




Resources


Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170

File Space



STORAGE ACCESS


A collection of storage devices present a file space. This file space exists at two levels:
  • a logical level defined by partitions, directories, and files, and 
  • a physical level defined by file systems, disk blocks, and pointers.
The users and system administrators primarily view the file space at the logical level. The physical level is one that the operating system handles for us based on our commands (requests).

The devices (disk, USB, optical, solid-state, etc.) that make up the file space collectively provide us with storage, where we store executable programs and data files.
We generally perform one of two operations on this storage space:
  • we read from a file (load the file into main memory, or input from file)
  • and we write to a file (store/save information from main memory to a file, or output to file).

Disk Storage and Blocks

Hard disk storage is the most common implementation of a file system, so we focus on it here, although some of the concepts apply to other forms of storage as well.
To store a file on disk, the file is decomposed into fixed-sized units called blocks. Figure illustrates a small file (six blocks) and the physical locations of those blocks. Notice that the last block may not fill up the entire disk block space, so it leaves behind a small fragment.

The operating system must be able to manage this distribution of files to blocks in three ways:
  1. given a file and block, the operating system must map that block number into a physical location on some disk surface. 
  2. the operating system must be able to direct the disk drive to access that particular block through a movement of both the disk and the drive’s read/write head. 
  3. the operating system must be able to maintain free file space (available blocks), including the return of file blocks once a file has  been deleted.
All of these operations are hidden from the user and system administrator.

Let us consider how a disk file might be broken into blocks. 
The files are distributed across all of the disk’s surfaces (a hard disk drive will contain multiple disk platters and each platter has two surfaces, a top and bottom). Given a new file to store, the file is broken into blocks.
The first block is placed at the first available free block on disk.
Where should the next disk block be placed? If the next block after the first is available, we could place the block there, giving us two blocks of contiguous storage. This may or may not be desirable. The disk drive spins the disks very rapidly. If we want to read two blocks, we read the first block and transfer it into a buffer in the disk drive. Then, that data are transferred to memory. However, during that transfer, the disk  continues to spin.
When we are ready to read the second disk block, it is likely that this block has spun past the read/write head and now the disk drive must wait to finish a full disk revolution before reading again. Distributing disk blocks so that they are not contiguous will get around this problem.
In Figure , you will see that the first three disk blocks are located near each other but not in a contiguous block. Instead, the first block lies at location 3018, the second at 3020, and the third at 3022.

Whether initial blocks are contiguous or distributed, we will find that further disk blocks may have to be placed elsewhere because we have reached the end of the available disk blocks in this locality. With the deletion of files and saving of other files, we will eventually find disk blocks of one file scattered around the disk surfaces. This may lead to some inefficiency in access in that we have to move from one location of the disk to another to read consecutive blocks and so the seek time and rotational latency are lengthened.
Returning to the Figure, we might assume that the next available block after 3022 is at 5813, and so as the file continues to grow, its next block lies at 5813 followed by 5815. As the next block, according to the figure, lies at 683, we might surmise that 683 became free space because a file was deleted.


Block Indexing Using a File Allocation Table

How do we locate a particular block of a file on disk? File systems use an indexing scheme. MS-DOS and earlier Windows systems used a file allocation table (FAT).
For every disk block in the file system, the location of the next block is stored in the table under the current block number. That is, block i’s successor location is stored at entry i of the table.
The FAT is loaded from disk into main memory at the time the file system is mounted (e.g., at system initialization time). 
In Figure , a partial listing of a FAT is provided. Here, assume a file  starts at block 151. Its next block is 153 followed by 156, which is the end of the file (denoted by “EOF”).
To find the file’s third block, the operating system will examine the FAT starting at location 151 to find 153 (the file’s second block) and then look at location 153 to find 156, the file’s third block.
Another file might start at block 154. Its second block is at location 732. The entry “Bad” indicates a bad sector that should not be used.
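A toy sketch of that lookup using a shell associative array in place of the table (the entries mirror the example above; a real FAT is a binary structure on disk, not a shell array):

$ declare -A FAT=( [151]=153 [153]=156 [156]=EOF [154]=732 )
$ block=151                      # the first file's starting block
$ while [ "$block" != "EOF" ]; do echo "block $block"; block=${FAT[$block]}; done
block 151
block 153
block 156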

Other Disk File Details

Aside from determining the indexing strategy and the use/reuse of blocks, the file system must also specify a number of other details. These include naming schemes for file entries (files, directories, links). Nowadays it is common for names to permit just about any character, including blank spaces; however, older file systems had limitations such as eight-character names and names consisting only of letters and digits (and perhaps a few types of punctuation marks such as the hyphen, underscore, and period). Some file systems do not differentiate between uppercase and lowercase characters while others do. Most file systems permit but do not require file name extensions.

File systems will also maintain information about the entries, often called metadata. This will include the creation, last modification and last access date/time, owner (and group in many cases), and permissions or access control list. The access control list enumerates for each user of the system the permissions granted to that user so that there can be several levels of permissions over the Linux user/group/other approach.

Many different file system types have been developed over the years. Many early mainframes had their own, unique file systems. Today, operating systems tend to share file systems or provide compatibility so that a given file system can be accessed by many different types of operating systems. Aside from the previously mentioned FAT and NTFS file systems, some of the more common file systems are the extended file system family (ext, ext2, ext3, ext4, derived originally from the Minix OS file system) used in Linux. NFS (the network file system) is also available in Linux, as is Files-11, a descendant of the file system developed for DEC PDP-11 systems and used by the VAX/VMS operating system (and itself a precursor of NTFS). While this multitude of file systems is available, most Linux systems primarily use the ext family as the default file system type.



Resources


Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170

07 September 2015

System Administrator's duties

The Linux operating system provides for three classes of users:
  1. normal users
  2. superusers
  3. software accounts
The term superuser is commonly used in many operating systems although in Linux, we call such a user root. The root user (or users) has access to all system commands and so can access all system resources through those commands.
Normal users have greatly restricted access in that they can execute public programs, access public files and access their own file space.

The role of root is to properly administer the computer system. Operating systems divide accessibility into two or more categories: the ordinary user and the administrator (sometimes called privileged mode). Some operating systems have intermediate categories between the two extremes where a user is given more privileges (e.g., through the sudo command) but not full administrator privileges.
The reason for the division between normal user and administrator modes is to ensure that normal users cannot impact other users.
In a work environment, keeping data secure becomes even more important. Different users would have access to different types of data (financial, personnel, management, research) based on their identified role within the organization. Administrative duties (creating accounts, scheduling backups, installing software, etc.), if performed by the wrong person or performed incorrectly, can have disastrous results. For example, imagine that while installing new software, the person unknowingly wipes out the boot sector of the disk. Upon reboot, the computer no longer functions correctly.
And so we have a unique account in all operating systems that is capable of full system access. It is through this account that all (or most) administrative functions will be performed.
What does a system administrator do? The role of the administrator will vary based on the number of users of the computer system(s), the complexity of the computer system(s), the types of software made available, and more significantly, the size of the organization.
A small organization with just a few employees might employ a single system administrator who is also in charge of network administration, computer security, and user training.
In a large organization, there may be several system administrators, several network administrators, a few people specifically in charge of all aspects of security, and another group in charge of training.

The following list of duties is common to many system administrators:
  • Install the operating system
  • Update the operating system when needed
  • Configure the operating system to fit the needs of the users in the organization
  • Secure the operating system
  • Configure and maintain network communication
  • Install, configure, and maintain application software
  • Create and manage user accounts and ensure the use of strong passwords
  • Install and troubleshoot hardware connected to computers directly or through a network
  • Manage the file system including partitioning the disk drives and performing backups
  • Schedule operations as needed such as backing up file systems, mounting and unmounting file systems, updating the operating system and other application software, examining log files for troubleshooting and suspicious activity
  • Define (for your organization’s management) computer usage policies and disaster recovery plans
  • Create documentation and training materials for users
  • Make recommendations for system upgrades to management
System administrators may not be responsible for all of the above duties. Other forms of administration (e.g., network administration, webserver administration, database administration, DNS administration, and computer security specialist) may take on some of the duties or have overlapping duties with the system administrator(s).
For instance, a network administrator would be in charge of installing, configuring, and securing the network but the system administrator may also be involved by configuring each individual workstation to the network. A webserver administrator would be in charge of configuring, maintaining, and troubleshooting the webserver but the system administrator may be in charge of installing it and setting up a special account for the webserver administrator so that he/she can access some system files.



Resources


Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170

UNIX AND LINUX a short history

Unix is an old operating system, dating back to 1969. Its precursor, known as MULTICS, was developed for a single platform as a joint project of MIT, General Electric, and AT&T Bell Labs. Two Bell Labs employees, Dennis Ritchie and Ken Thompson, wanted to build a simpler operating system of their own along the lines of MULTICS. They called their new system Unics, with its first version written in the assembly language of the DEC PDP-7 (and soon after, the PDP-11), so it was not platform-independent.
They rewrote the operating system in the C programming language (which Ritchie developed in part for Unix) to make it platform independent. This version they named Unix.
Numerous versions of Unix were released between 1972 and the early 1980s, including a version that would run on Intel 8086-based computers such as the early IBM PC and PC-compatible computers.
Unix was not a free operating system. In spite of it being implemented as a platform-independent operating system, it was not available for all hardware platforms.
In 1983, Richard Stallman of MIT began the GNU Project, an effort to complete a Unix-like operating system that was both free and open source. GNU stands for GNU’s Not Unix, an indication that GNU would be a Unix-like operating system but separate from Unix. His goal was to have anyone and everyone contribute to the project. He received help from programmers around the world who freely contributed to the GNU operating system, which they wrote from scratch. Although a completed version of GNU was never released, the approach taken was to lead to what we now call the open source community.
Stallman formed the Free Software Foundation (FSF) and defined the GNU General Public License (GPL).
At around the same time as the initiation of the GNU Project, researchers at the University of California, Berkeley developed their own version of Unix, which was given the name BSD (Berkeley Software Distribution) Unix. This version included networking code to support TCP/IP so that these Unix computers could easily access the growing Internet.
In time, BSD 4.2 would become one of the most widely distributed versions of Unix.
The result of several competing forms of Unix led to what some have called the “Unix Wars.” The war itself was not restricted to fighting over greater distribution. In 1992, Unix System Laboratories (USL) filed a lawsuit against Berkeley Software Design, Inc and the Regents of the University of California. The lawsuit claimed that BSD Unix was built, at least partially, on source code from AT&T’s Unix, in violation of a software license that UC Berkeley had been given when they acquired the software from AT&T. The case was settled out of court in 1993.
By 1990, the Open Software Foundation (OSF), along with members of the open source community, had developed standardized versions of Unix based on BSD Unix.
Today, there are still many different distributions of Unix available, which run on mainframes, minicomputers, and servers.

In 1991, a student from Finland, Linus Torvalds, was dissatisfied with an experimental operating system that was made available through an operating systems textbook by Andrew Tanenbaum. The operating system was called Minix. It was a scaled-down Unix-like operating system that was used for educational purposes. Torvalds decided to build his own operating system kernel and provide it as source code for others to play with and build upon (Torvalds has claimed that had the GNU project kernel been available, he would not have written his own). Early on, his intention was just to explore operating systems. Surprisingly, many programmers were intrigued with the beginnings of this operating system, and through the open source community, the operating system grew and grew.
The development of Linux in many ways accomplished what Stallman set out to do with the GNU project. Stallman and many in the FSF refer to Linux as GNU/Linux as they claim that much of Linux was built on top of the GNU project code that had been developed years earlier. According to some surveys, roughly 75% of Linux has been developed by programmers who work for companies that are investing in Linux. The GPL causes many of these programmers to publish their code rather than keeping the code proprietary for the companies they work for. Additionally, 18% of the code is developed strictly by volunteers who are eager to see Linux grow.
Today, Linux stands on its own as an operating system distinct from Unix. Linux is freely available in source code and the open source community continues to contribute to it. And like Unix, there are many distributions (many more, in fact, than there are Unix dialects).
Linux’s popularity, however, is far greater than that of Unix because, while Unix can run on personal computers, Linux is geared to run on any platform and is very effective on personal computers.
Although there are dozens of dialects of Unix, there are hundreds of different Linux distributions. Navigating between the available dialects can be challenging. Nearly all of the dialects can be categorized into one of four ancestor paths.
  • Debian: This branch includes the very popular Ubuntu which itself has spawned dozens of subdialects. Another popular spin-off of Debian is Knoppix.
  • Red Hat: There are as many or more subdialects of Red Hat as of Debian. The most popular subdialect is Fedora. Other popular descendants are Mandrake and CentOS. Another distribution that is increasing in popularity is Scientific Linux, produced by Fermi National Accelerator Laboratory and CERN.
  • SLS/Slackware: This branch began with SLS, which led to Slackware and in turn to SuSE Linux, produced by a German company. Although there are dozens of spin-offs of SLS/Slackware, this branch is far less popular than either Debian or Red Hat.
  • Miscellany: There are dozens of dialects that either led nowhere or have few successors.
Linux and Unix operating systems are partially or completely POSIX conforming. POSIX is the Portable Operating System Interface, a set of standards that operating system developers might attempt to target when they implement their systems. POSIX defines an Application Programming Interface (API) so that programmers know what functions, data structures, and variables they should define or utilize to implement the code they are developing for the operating system.
In the development of Linux, the POSIX API has been used to generate a standard called the Linux Standard Base (LSB).
Anyone implementing a dialect of Linux who wishes to include this standard knows what is expected by reading the LSB. The LSB, among other things, defines the top-level directory structure of Linux and the location of significant Linux files such as libraries, executables, and configuration files, a base set of Linux commands and utilities to be implemented, and implementations for such programs as gcc, the GNU C compiler.
Thus, underlying most dialects of Linux, you will find commonalities. In this way, learning one version of Linux is made easier once you have learned any other version of Linux.



Resources


Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170

VIRTUAL MACHINES

A Virtual Machine(VM) is an extension to an older idea known as software emulation. Through emulation, a computer could emulate another type of computer. More specifically, the emulator would translate the instructions of some piece of incompatible software into instructions native to the computer. This would allow a user to run programs compiled for another computer with the right emulator.

The VM is a related idea to the emulator. The VM, as the name implies, creates an illusionary computer in your physical computer. The physical computer is set up to run a specific operating system and specific software. However, through emulation, the VM then can provide the user with a different operating system running different software.
One form of VM that you might be familiar with is the Java Virtual Machine (JVM),which is built into web browsers. Through the JVM, most web browsers can execute Java Applets. The JVM takes each Java Applet instruction, stored in an intermediate form called byte code, decodes the instruction into the machine language of the host computer, and executes it. Thus, the JVM is an interpreter rather than a compiler. The JVM became so successful that other forms of interpreters are now commonly available in a variety of software so that you can run, for instance, Java or Ruby code. Today, just about all web browsers contain a JVM.
Today’s VMs are a combination of:
  • software : a program that can perform the emulation
  • data : the operating system, applications software, and data files that the user uses in the virtual environment.

With VM software, you install an operating system. This creates a new VM. You run your VM software and boot to a specific VM from within. This gives you access to a non-native operating system, and any software you wish to install inside of it. Interacting with the VM is like interacting with a computer running that particular operating system. In this way, a Windows-based machine could run the Mac OS X or a Macintosh could run Windows 7.
Commonly, VMs are set up to run some version of Linux. Therefore, as a Linux user, you can access both your physical machine’s operating system (e.g., Windows) and also Linux without having to reboot the computer.
The cost of a VM is as follows:
  • The VM software itself—although some are free, VM software is typically commercially marketed and can be expensive.
  • The operating system(s)—if you want to place Windows 7 in a VM, you will have to purchase a Windows 7 installation CD to have a license to use it. Fortunately, most versions of Linux are free and easily installed in a VM.
  • The load on the computer—a VM requires a great deal of computational and memory overhead; however, modern multicore processors are more than capable of handling the load.
  • The size of the VM on hard disk—the image of the VM must be stored on hard disk and the size of the VM will be similar in size to that of the real operating system, so for instance, 8 GBytes is reasonable for a Linux image and 30 GBytes for a Windows 7 image.
 You could create a Linux VM, a Windows VM, even a mainframe’s operating system, all accessible from your computer. Your computer could literally be several or dozens of different computers.
Each VM could have its own operating system, its own software, and its own file system space. Or, you could run several VMs where each VM is the same operating system and the same software, but each VM has different data files giving you a means of experimentation.
VM software is now available to run on Windows computers, Mac OS X, and Linux/Unix computers. VM software titles include vSphere Client and Server, VMware Workstation, VMware Player, Virtual Box, CoLinux, Windows Virtual PC, Parallels Desktop, VM from IBM, Virtual Iron, QEMU, and Xen. The latter two titles are open source and many of these titles have free versions available.

With VM software readily available, we can expand on its capabilities by implementing virtualization.
With virtualization, an organization hosts a number of VMs through one or more VM servers. The servers operate in a client–server networking model where a client runs a VM program on the user machine and requests access to one of the stored VMs.
The VM servers typically store the VMs on a storage area network (SAN). The collection of VM servers, the SAN, and the data that make up individual VMs can be called a VM Farm.
Now, accessing your VM is a matter of starting your VM client software and logging into the VM server. You select your VM from a list of choices, log into it, and the VM server then runs (emulates) the VM for you. Your client operates as an input/output device while the processing and storage take place on the VM server. As you make modifications to your VM, they are saved in the SAN.

There are numerous advantages to virtualization.
  1. First is accessibility. You are able to access your VM from any computer that has Internet access and runs the proper VM client software. 
  2. Second is cost savings. If your company is small and you cannot afford all of the hardware needed for your employees, you can lease or purchase time and space in a VM farm where the VM servers and SAN become the hardware that you use.
  3. Third, you can rely on the company hosting your VMs to handle security and backups, relieving your own organization of some of its IT needs.
Today, more and more organizations are taking advantage of virtualization to improve their efficiency and lower costs.

Reasons to Use Virtualization

Resources


Linux with Operating System Concepts by Richard Fox
isbn:9781482235906, goodreads:20792170