Filesystems

Table of Contents

1. Tour the Virtual File System (VFS)
1.1. Items Found in the Filesystem
1.2. Hierarchy
1.3. Assessment
2. Filesystem Permissions
2.1. Basic Unix Permissions
2.2. Special Permissions
2.3. Assessment
3. Archival and Backup
3.1. Using tar
3.2. Investigating Backups
3.3. Assessment
3.4. Rdiff-backup Example

This lab should be reasonably relaxed, assuming you have read it before you come to class. We will take a stroll around the standard Unix file hierarchy on your virtual machine Client1, and study some of the things we find there. If you are already familiar with the Unix filesystem, you may find it rather easy.

We will discuss filesystem permissions and how they can be managed, and finally we shall learn about creating archives and backups.

1. Tour the Virtual File System (VFS)

1.1. Items Found in the Filesystem

Unlike some other operating systems you may have heard of, the Unix filesystem is a single tree with only one root (/); there is no concept of drive letters. Every filesystem you use is mounted (attached, or made available) onto a directory (a mount point) somewhere in that tree.
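On a Linux system you can see this tree of mounts for yourself. The following is a minimal sketch that reads /proc/mounts, a kernel-provided file listing every mounted filesystem (the findmnt command from util-linux presents the same information as a tree). It assumes a Linux system with /proc mounted.

```shell
# Each line of /proc/mounts has six fields: device, mount point,
# filesystem type, mount options, and the dump and fsck-pass fields.
# Print just the mount point and the filesystem type.
awk '{ print $2, $3 }' /proc/mounts

# The root of the tree is always mounted on "/". If several entries
# are stacked on "/", the last one listed is the one in effect.
root_type=$(awk '$2 == "/" { print $3 }' /proc/mounts | tail -n 1)
echo "The root filesystem has type: $root_type"
```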

There are several types of item you will find in the filesystem.

Regular files and directories

Nothing unusual about these. In ls -F output (ls is short for list), directories have / appended to them. In ls -l output, the first character on the line is a - for a regular file, or d for a directory.
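A quick way to see both conventions is sketched below, in a throwaway directory created with mktemp -d (the names notes.txt and projects are just examples):

```shell
demo=$(mktemp -d)          # scratch directory, so nothing else is touched
cd "$demo" || exit 1

touch notes.txt            # an empty regular file
mkdir projects             # a directory

ls -F                      # projects is shown as projects/
ls -l                      # notes.txt's line starts with -, projects' with d

# Pick out the first character of the ls -l line for each name.
file_char=$(ls -l | awk '$NF == "notes.txt" { print substr($1, 1, 1) }')
dir_char=$(ls -l | awk '$NF == "projects" { print substr($1, 1, 1) }')
echo "notes.txt: $file_char  projects: $dir_char"

cd / && rm -rf "$demo"     # clean up
```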

Hard link

Recall that Unix filesystems are inode based. An inode is the record that holds a file’s metadata (owner, permissions, timestamps) and the location of its data; each inode is identified by an inode number. Directory entries map a name to an inode, not to the data directly. A hard link, as created using the ln command (short for link), creates another directory entry that points to the same inode.

A hard link can only point to a file on the same filesystem. Hard links are hard to spot in standard ls -l output: apart from the link count (the number just after the permissions), nothing distinguishes them from ordinary files. To find them, use ls -li to display the inode column, and look for files with the same inode number.
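The following sketch demonstrates this in a scratch directory (the file names are illustrative). It compares inode numbers with ls -i, reads the link count from the second column of ls -l, and shows that removing one name leaves the data reachable through the other:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "hello" > original.txt
ln original.txt second-name.txt        # second directory entry, same inode

ls -li                                 # both lines show the same inode number
inode1=$(ls -i original.txt | awk '{ print $1 }')
inode2=$(ls -i second-name.txt | awk '{ print $1 }')
links=$(ls -l original.txt | awk '{ print $2 }')   # link count: now 2

echo "appended" >> second-name.txt     # write through one name...
rm original.txt                        # ...remove the other name...
survivor=$(cat second-name.txt)        # ...and the data is still there
echo "$survivor"

cd / && rm -rf "$demo"
```

The data itself is only freed once the last name pointing to the inode is removed (and no process still has the file open).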

Symbolic link, soft link or symlink

These are similar to shortcuts under Windows, or aliases under Mac OS and Mac OS X, but are more transparent: when a program opens a symbolic link, it opens the file the link points to. Symbolic links can also point to directories and, unlike hard links, to files on other filesystems. In fact, a symbolic link simply stores a path as text, and that text can be anything. If the target does not exist, the link is said to be a broken link or dangling link.

Symbolic links are created with the ln -s command. In ls -l output the first character on the line is l, and the link target is printed after the filename.

The only way I can remember in what order the arguments of ln go is to think of the ln command a bit like cp, where the source goes first, and the destination (or new thing) comes last.
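Here is a minimal sketch, again in a scratch directory with illustrative names. It uses readlink (available in GNU coreutils and on the BSDs) to show the text stored in a link, and the shell tests -L (is a symbolic link) and -e (target exists) to detect a dangling link:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "data" > target.txt
ln -s target.txt link.txt          # like cp: existing target first, new name last

ls -l link.txt                     # first character is l; target printed after the name
stored=$(readlink link.txt)        # the text stored in the link: target.txt
content=$(cat link.txt)            # opening the link opens the target: data

ln -s no-such-file dangling.txt    # the stored text need not name a real file
if [ -L dangling.txt ] && [ ! -e dangling.txt ]; then
    dangling=yes                   # -e follows the link, so it fails here
    echo "dangling.txt is a broken (dangling) link"
fi

cd / && rm -rf "$demo"
```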

Device node

One of the design philosophies of Unix is “Everything is a file”. While this isn’t entirely true in Linux (network interfaces, for example, have no device node), most other devices are represented by a device node in /dev, which may or may not be a virtual filesystem.

For example, the first serial port on a machine would be represented by /dev/ttyS0 (equivalent to COM1 in DOS). An application that interacts with the serial port, such as the serial terminal emulator minicom, will open /dev/ttyS0.

This allows access control to be applied to devices in the same way as to any other filesystem object.

A device node is defined by a major number, a minor number, and whether the device is a block special file or a character special file.

In ls -l output, a line starting with b is a block special file, and one starting with c is a character special file; the major and minor numbers are shown where the file size would normally be. A device such as a hard disk is a block device; most others are character devices.

The major number indicates the type of device, such as a hard disk. The minor number indicates the particular device, such as a particular partition on a particular hard disk. It is this major:minor pair, not the name, that tells the operating system kernel which device the user is opening, and thus which driver to pass the request to.
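You can inspect device nodes without any special privileges. This sketch assumes Linux with GNU coreutils (for stat -c); /dev/null and /dev/zero are character devices provided by the kernel’s memory driver, major number 1:

```shell
# Both lines start with c; "1, 3" and "1, 5" appear where the size would be.
ls -l /dev/null /dev/zero

# GNU stat prints the file type and the major/minor numbers
# (%t and %T print them in hexadecimal).
stat -c '%n: %F, major %t, minor %T' /dev/null /dev/zero

# List any block devices directly under /dev. The names vary between
# systems, and there may be none at all inside a container.
find /dev -maxdepth 1 -type b
```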

Socket

Unix systems have a form of network connection known as a Unix domain socket[29], which is analogous to an IP socket but is entirely local to the machine. Unix domain sockets can do some things that other socket types cannot, such as passing open file descriptors and credentials between processes, and because they live in the filesystem, ordinary file permissions can be used as additional access control.

A server process would create a socket file as its socket address, and client processes on the same machine can connect to the server by using the socket file as the destination address (i.e. the server address).

You can recognise a socket file as follows: in ls -F output, a = is printed after its name; in ls -l output, the first character on the line is s.

There is no dedicated command for creating a socket file. It is created automatically when a program binds a Unix domain socket to that path with the bind(2) system call.
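To see a socket file appear, a program has to call bind(2) itself. This sketch assumes python3 is installed and uses it only as a convenient way to make that system call; the path demo.sock is illustrative:

```shell
demo=$(mktemp -d)
sock="$demo/demo.sock"

# Create a Unix domain (AF_UNIX) stream socket and bind it to $sock.
python3 -c '
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind(sys.argv[1])   # bind(2) creates the socket file
' "$sock"

ls -lF "$demo"                        # the line starts with s; the name ends with =
sock_char=$(ls -l "$sock" | cut -c1)
[ -S "$sock" ] && echo "$sock is a socket file"

rm -r "$demo"
```

Note that the socket file outlives the program that bound it; servers normally remove it themselves when they shut down.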

Named pipe or FIFO

Pipes are very useful in Unix. Anonymous pipes in particular form one of the most important constructs of the Unix command-line environment. They allow the output of one process to be fed into the input of another, as in ls -l | sort -k5,5n, which sorts the output of ls -l by file size (the fifth column). Unlike sockets, pipes are unidirectional.

However, sometimes you want a connection that would be impractical to express using the shell’s pipe (|) operator, for example between processes that are started separately. To do this, you can create a named pipe somewhere in the filesystem using mkfifo. In ls -l output, a p at the start of the line indicates a FIFO.

Named pipes are somewhat deprecated in favour of sockets. However, because they are easier than sockets to use from shell scripts, they are still occasionally used, and don’t appear to be in any danger of disappearing soon.
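A minimal sketch of a named pipe connecting two separately started commands (the path myfifo is illustrative). Opening a FIFO blocks until the other end is also opened, so the writer is started in the background:

```shell
demo=$(mktemp -d)
fifo="$demo/myfifo"

mkfifo "$fifo"
fifo_char=$(ls -l "$fifo" | cut -c1)   # p

# The writer blocks in the background until a reader opens the FIFO.
printf 'banana\napple\ncherry\n' > "$fifo" &

sorted=$(sort < "$fifo")      # the reader
wait                          # reap the background writer
echo "$sorted"                # apple, banana, cherry (one per line)

rm -r "$demo"
```

Unlike ls -l | sort, the writer and reader here are independent commands that share nothing but the FIFO’s path.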

1.2. Hierarchy

The file hierarchy in Unix systems, including Linux, can be a confusing beast at times. In this section, I want to give you some familiarity with the purpose of various directories. Afterwards, you can have a brief read of hier(7), which describes the filesystem hierarchy in more detail.

/

The root of the entire filesystem.

/bin

Programs (binary files) that don’t require administrative rights, and need to be available at boot time. Commands such as ls and mkdir can be found here.

/sbin

Supervisor programs (binary files) that do require administrative rights to make full use of them, and need to be available at boot time. Commands such as mount and network configuration commands such as ifconfig can be found here. Normal users can run many of these commands, but only to query the system; they cannot use them to make changes.

/lib

Libraries that are needed when the system is booting (before /usr is mounted). It also includes kernel modules (device drivers and such).

/usr

This is for static (non-changing) data. Most of the software is installed here. A system should be able to run well if this directory (which is often on its own filesystem or mounted from across the network) is mounted read-only.

Historically, users’ home directories were stored here, which may explain the peculiar name.

This contains bin, sbin and lib directories, in addition to the following:

include

Header files (*.h), which are included into programs when compiling them against libraries.

local

Has a structure much like /usr, but is for software that the system administrator has compiled and installed. This directory is outside the scope of the package management system.

share

Used for architecture-independent files that can be shared among different systems and processor architectures, such as documentation and pictures.

doc

Documentation for all the installed software. This does not include manual pages.

man

This is where the manual pages can be found, although you generally don’t go there yourself; instead, use man.

src

Used to store software source code such as Linux kernel source code, though there is no requirement that source code be kept there.

/var

This is for variable (changing) data, such as databases, mail queues, logs, lock files and other things.

log

Log data produced by the system and its services.

run

Run-time data such as process ID (PID) files, so startup/shutdown scripts (init scripts) know which process to kill. PID files are generally written by daemons (background services) when they start.

mail

Location of e-mail queues and mailboxes.

/etc

This is where configuration files are stored. Historically, it was where anything that didn’t belong elsewhere was stored, which explains the name (etc, as in et cetera).

/dev

This is where device special files are stored. It may be a virtual filesystem, meaning the contents might not exist on disk, but are determined by the operating system device drivers and hardware management facilities (e.g. when a USB device is plugged in).

/tmp

A temporary directory that anyone can write to. The contents may be purged occasionally, or on system boot. If you want a more permanent location, use /var/tmp.

/proc, /sys

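These are virtual filesystems, generated on the fly by the kernel. /proc has a directory for each running process as well as general system information; /sys describes devices, drivers and other kernel objects.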

/root

On Linux systems, this is the home directory of the root user.

/home

On Linux systems, this is where the home directories for the users can usually be found. It is often network mounted and possibly a symbolic link to elsewhere in the system.
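To see how your own system lays this out, a small sketch such as the following reports whether each directory listed above is a real directory or a symbolic link. On several current distributions (those with a “merged /usr”), /bin, /sbin and /lib are symbolic links into /usr, so the output may differ between machines. The function name describe_dir is just for illustration.

```shell
# Report whether a path is a symbolic link (and its target), a directory,
# or missing. describe_dir is a hypothetical helper name.
describe_dir() {
    if [ -L "$1" ]; then
        echo "$1 (symbolic link to $(readlink "$1"))"
    elif [ -d "$1" ]; then
        echo "$1 (directory)"
    else
        echo "$1 (missing)"
    fi
}

for d in /bin /sbin /lib /usr /var /etc /dev /tmp /proc /sys /root /home; do
    describe_dir "$d"
done

# /proc/self refers to whichever process reads it, so the Name: line
# printed here is that of head itself.
head -n 1 /proc/self/status
```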

1.3. Assessment

1.3.1.

Read the ls(1) manual page to familiarise yourself with the available options (don’t try to remember them all, except -l (that’s lowercase L, not I), -R, -a and -h).

Write down the ls command you would use to do the following; most of them will require the -l option and generally some others:

  1. List the contents of your home directory recursively. You can use either ~ or $HOME to refer to your home directory from the shell.

    Note

    There will be a lot of output generated. You can pipe it to less, which will allow you to page through the output using the Space key and the Page Up and Page Down keys. Type q to exit less.

    command | less

    less is a pager which supersedes an older program called more.

  2. List all files, including hidden files and directories, in your home directory, non-recursively. Hidden files or directories are those that begin with a dot, and are not usually shown in ls output.

  3. List the contents of /lib using ls -l /lib. What is meant by the foo -> bar notation?

  4. List the files in your home directory, with file sizes shown with human-friendly units (i.e. bytes, kilobytes, megabytes, gigabytes).

  5. List the info about a directory, and only that directory, without ls recursing into that directory. In other words, the permissions etc. of the directory itself, not the contents of the directory.

    The “size” of a directory in the ls listing does not tell you how much disk space the contents of the directory consume. Use du -s directory to measure that.



[29] Also known as local sockets