12. Appendix: Deploying Many Machines

This optional material is provided for students who find nothing particularly new in installing Ubuntu. There is no work to be done here, but it provides some valuable further insight into related real-world tasks.

If you install many machines by hand, you waste a lot of valuable time and are much more likely to make a mistake. We therefore need a way to get a working system onto many machines in a short amount of time. This is what operating system deployment is all about.

12.1. Deploying Using Disk Images

Perhaps the most common paradigm for deploying workstations is that of the disk image. This is probably due to the predominance of Windows in the IT space, and the experience IT staff have with this paradigm. Norton Ghost is a very well-known piece of software for creating and deploying disk images. It has been around for many years, supports nice features such as multicast image distribution, and now also supports Linux filesystems. A selection of other disk imaging products can be found at Top Ten Reviews, and The Free Country has good coverage of free tools.

There are basically two common methods of imaging: sector-based and file-based. Sector-based images are made by copying each sector on the disk, and so don’t care what filesystem is used on top of them. File-based imagers need to understand the particular filesystems involved. They create smaller images and can offer richer features (such as knowing which files not to copy), but can be more complex and even more time-consuming.[38]

Imagers work roughly as follows. First, the machine to be imaged is prepared, typically by removing any temporary files, defragmenting the filesystem and zeroing out any free space; file-based imagers can often do this for you. This is all done to reduce the size of the resulting, typically compressed, image.

On Linux and similar machines, whose filesystems (such as ext3) are designed to keep fragmentation low without a separate defragmentation step, free space can be zeroed by running the following commands as root: dd if=/dev/zero of=/largefile; rm /largefile[39] The dd command writes a file of zeros to the disk and keeps writing until all the free space is consumed, at which point it stops with a “No space left on device” error. The parts of the filesystem that were free space are now filled with long runs of zeros (NUL bytes), which compress very nicely. Deleting the file doesn’t cause these blocks to be overwritten, so the zeros remain. As a side effect, old data blocks in the free space will have been overwritten, but you should not rely on this for security.
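To see why zeroed free space helps, the following minimal sketch compares how well a region of leftover data and a region of zeros compress. It uses random bytes as a stand-in for stale data from deleted files, and zlib as a stand-in for whatever compression an imaging tool applies; both are assumptions made for illustration.

```python
import os
import zlib

BLOCK = 1024 * 1024  # 1 MiB stand-in for a region of free space


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression."""
    return len(zlib.compress(data, 6))


# Random bytes simulate leftover contents of deleted files (an assumption;
# real stale data may compress somewhat better than this).
stale = os.urandom(BLOCK)
# The same region after it has been overwritten with zeros.
zeroed = bytes(BLOCK)

stale_size = compressed_size(stale)
zeroed_size = compressed_size(zeroed)

print(f"stale region compresses to  {stale_size} bytes")
print(f"zeroed region compresses to {zeroed_size} bytes")
```

The zeroed region shrinks to a tiny fraction of its original size, while the random region barely shrinks at all, which is the effect the zeroing step relies on.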

Apple have made this easy in Mac OS X: in the Disk Utility program, you can use the Erase Free Space… option. Unlike the naïve dd approach, this can securely remove old data.

Most imaging programs understand the Windows filesystems well enough to ignore data blocks that are not currently allocated. Windows machines need to have a tool called SysPrep run on them to remove their identity before imaging; otherwise, after the image is deployed, multiple instances of the same computer (same identity) will show up on the network, which causes problems. This identity is called the Security Identifier, or SID.

The imaging program is not run from the system you are imaging. Instead, you boot from another system, such as a bootable CD or floppy disk, or you remove the hard disk to be imaged and plug it into a different machine as a non-boot device. Apple have made this very easy, as Apple Macintosh machines have a feature called Target Disk Mode[40]. You boot the machine to be imaged while holding down T. The storage devices on the machine can then be accessed as a FireWire storage device, and can be imaged from a neighbouring Mac. A very nice feature, especially for laptops.

When you create the image file, you generally store it on the network, on a machine with plenty of hard disk space, and it is compressed as it is created. Plenty of space matters particularly for Windows deployments, because with Windows you generally need one image for each hardware configuration.

Deployment can be done one machine at a time, typically by booting the target machine over the network or from a CD or floppy, or by mounting its disk on another machine. The process is then run in reverse, writing the disk image contents onto the selected hard disk.
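The create-and-restore cycle described above can be sketched as a very naïve sector-style imager. This is only an illustration of the idea, not how Norton Ghost or any other product works: the function names and chunk size are made up, and an ordinary file stands in for the disk device.

```python
import gzip
import shutil

SECTOR = 512
CHUNK = SECTOR * 2048  # copy 1 MiB at a time


def create_image(source_path, image_path):
    """Copy every byte of the source into a gzip-compressed image file.

    The image is compressed as it is created; the filesystem on the
    source is never interpreted, so any filesystem can be imaged.
    """
    with open(source_path, "rb") as src, gzip.open(image_path, "wb") as img:
        shutil.copyfileobj(src, img, CHUNK)


def restore_image(image_path, target_path):
    """Run the process in reverse: decompress the image onto the target."""
    with gzip.open(image_path, "rb") as img, open(target_path, "wb") as dst:
        shutil.copyfileobj(img, dst, CHUNK)
```

On a real system the source and target would be block devices (and the imager would be run from a separately booted environment, as described earlier), but the byte-for-byte copy is the same.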

However, because you often want to deploy many machines at once, it can be much easier to boot each into the imaging software, such as Norton Ghost, and then multicast the image from the imaging server (generally on the same subnet as the clients) to all the clients at once, which reduces network traffic. This assumes that your network can handle multicast traffic; many cannot[41].
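A rough back-of-the-envelope calculation shows why multicast matters. The sketch below counts only the bytes the imaging server has to send, ignoring protocol overhead and retransmissions; the image size and client count are hypothetical figures chosen for illustration.

```python
GIB = 1024 ** 3


def unicast_traffic(image_bytes, clients):
    """Bytes the server sends when each client receives its own copy."""
    return image_bytes * clients


def multicast_traffic(image_bytes, clients):
    """Bytes the server sends when one stream is shared by all clients."""
    return image_bytes if clients > 0 else 0


image = 4 * GIB   # hypothetical image size
clients = 30      # hypothetical lab of 30 machines

print(unicast_traffic(image, clients) // GIB, "GiB sent by unicast")
print(multicast_traffic(image, clients) // GIB, "GiB sent by multicast")
```

With these figures the server sends 120 GiB by unicast but only 4 GiB by multicast, and the multicast figure does not grow as more clients are added.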

One benefit of disk images, which Apple takes advantage of, is that you can use them for network booting. Apple calls this NetBoot, as opposed to NetInstall, which is for installing (er, deploying) an image over the network to the hard disk. With NetBoot, clients boot from the network and don’t require a hard disk; they run from a read-only image on the network. This is useful when you have a lab of Macs that need to run a particular environment for a short period of time, or when you want to reset the machine to a known state very easily. It does require a fast network, though. Each client will typically have some writable storage on the server (called a “shadow file”) so the entire computer still feels writable, and users will have home directories on the network.
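The shadow-file idea can be modelled conceptually as a copy-on-write overlay. The sketch below is not Apple's implementation; the class and method names are invented, and a tuple of byte strings stands in for the blocks of the shared image.

```python
class ShadowedImage:
    """A read-only base image with a per-client writable overlay."""

    def __init__(self, base_blocks):
        self._base = tuple(base_blocks)  # shared image, never modified
        self._shadow = {}                # block number -> this client's data

    def read(self, block_number):
        # Prefer the client's own copy; fall back to the shared image.
        return self._shadow.get(block_number, self._base[block_number])

    def write(self, block_number, data):
        # Writes only ever touch the shadow, so the base stays pristine.
        self._shadow[block_number] = data

    def reset(self):
        # Discarding the shadow returns the client to the known state.
        self._shadow.clear()


client = ShadowedImage([b"kernel", b"apps", b"settings"])
client.write(2, b"user-changed settings")
changed = client.read(2)
client.reset()
restored = client.read(2)
```

Because every write lands in the overlay, resetting a machine to its known state amounts to throwing the overlay away.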

Disk images are most commonly used for deploying a minimal base system. Other technologies are often used in tandem for deploying applications and managing configuration; one particularly well-known example, if not well loved by users, is Novell’s ZENworks (Zero Effort Networks) product.

12.2. Scripted Installations

Installing by hand is tedious, and image-based deployment can be impractical when you have heterogeneous devices. What would be really nice, therefore, is the ability to perform an unattended or scripted installation. This is commonly done either by booting from the network, or by providing an install-time parameter that points to a file of answers to the questions asked during installation. This file is commonly retrieved via HTTP, but may also be made available on a floppy, etc. If booting from the network, this information may be configured based on the particular machine.

In the Linux space, two tools are commonly used: Kickstart for RPM-based systems such as Fedora, and pre-seed files for Debian-based systems. Pre-seed files can answer questions not only at operating-system install time, but also when individual packages are installed later. For information on using pre-seed files, see Unattended Ubuntu Deployment over Network; for Kickstart, see the Kickstart Installations part of the Red Hat Linux manual. Kickstart also has a nifty little program that makes creating a Kickstart file much easier, and a suitable Kickstart file is generated when you install the OS manually, which provides a useful starting point.
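As a minimal sketch of per-machine answers, the following generates a small Kickstart fragment keyed on a machine's MAC address, much as a web server handing out answer files over HTTP might do. The MAC addresses, hostnames and package lists are hypothetical, and the fragment is deliberately incomplete: a real Kickstart file needs further directives (partitioning, root password and so on), so consult the Kickstart documentation before relying on it.

```python
from string import Template

# Only a handful of common Kickstart directives are shown here.
KICKSTART_TEMPLATE = Template("""\
lang $lang
timezone $timezone
network --bootproto=dhcp --hostname=$hostname
%packages
$packages
%end
""")

# Hypothetical inventory: MAC address -> per-machine answers.
MACHINES = {
    "00:16:3e:00:00:01": {"hostname": "lab01",
                          "packages": ["@core", "openssh-server"]},
    "00:16:3e:00:00:02": {"hostname": "lab02",
                          "packages": ["@core"]},
}


def render_kickstart(mac, lang="en_GB.UTF-8", timezone="Europe/London"):
    """Return the Kickstart text for the machine with the given MAC."""
    machine = MACHINES[mac.lower()]
    return KICKSTART_TEMPLATE.substitute(
        lang=lang,
        timezone=timezone,
        hostname=machine["hostname"],
        packages="\n".join(machine["packages"]),
    )


print(render_kickstart("00:16:3E:00:00:01"))
```

Keeping the shared answers in one template and the differences in a small inventory means adding a machine is a one-line change.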

Windows also supports scripted installations. Mac OS X notably does not; Apple prefers people to use imaging technologies for a base install. Given Apple’s fairly uniform hardware offering and simple installation process, there is less need for scripted installations, so long as the operating system you are installing is at least as modern as the most modern machine you wish to deploy onto. Further client administration is expected to be done using Apple’s Open Directory and Remote Desktop products.

12.3. The “Golden Client” Methodology

One way to deploy, and maintain deployed machines, is to use a golden-client methodology, as made popular by software such as SystemImager and Radmind.

The basic concept is this: you install a system (your Golden Client) and get it to the state you want to deploy. You run some software which uploads all the files (excluding certain files which should be unique to the particular machine) to a server. You then run an initial install client on the machines you want to deploy onto, which downloads the files from the server. Some scripting may be used for tasks such as partitioning the hard disks. A concept of “classes” may be used to provide different files to different machines. For example, you might have a class for machines with ATI graphics cards and another class for NVIDIA, or you might have a class for particular machines which need some software with a limited number of licences.
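The upload exclusions and the class mechanism can be sketched as simple pattern matching on file paths. This is an illustration of the concept only, not how SystemImager or Radmind are configured; the patterns, paths and class names below are all hypothetical.

```python
import fnmatch

# Files that must stay unique to each machine and so are never uploaded.
MACHINE_SPECIFIC = ["/etc/hostname", "/etc/ssh/ssh_host_*", "/var/log/*"]

# Each class names the files that machines in that class should receive.
CLASS_PATTERNS = {
    "base": ["/etc/*", "/usr/*"],
    "ati": ["/opt/drivers/ati/*"],
    "nvidia": ["/opt/drivers/nvidia/*"],
}


def matches_any(path, patterns):
    """True if path matches at least one shell-style pattern."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def files_to_upload(golden_files):
    """Files on the Golden Client that should be sent to the server."""
    return sorted(p for p in golden_files
                  if not matches_any(p, MACHINE_SPECIFIC))


def files_for_machine(server_files, classes):
    """Files on the server that a machine in the given classes receives."""
    patterns = [p for c in classes for p in CLASS_PATTERNS[c]]
    return sorted(p for p in server_files if matches_any(p, patterns))
```

A machine with an ATI card would then be deployed with `files_for_machine(server_files, ["base", "ati"])`, and would never receive the NVIDIA driver files.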

When you want to make an update, you make the update on your Golden Client and run the agent there to synchronise the changes up to the server. You then cause all the other clients (perhaps by initiating a command over the network, or by some scheduled task) to run a differencing engine, which makes each client’s files match the files stored on the server.
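A differencing engine of this kind can be sketched by comparing file checksums. The sketch below assumes each side has a manifest mapping file paths to content digests; the manifest format and function names are assumptions made for illustration, and real tools may compare timestamps, sizes or other attributes instead.

```python
import hashlib


def digest(content: bytes) -> str:
    """Checksum used to decide whether two copies of a file differ."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(server_manifest, client_manifest):
    """Work out what a client must do to match the server.

    Both manifests map file path -> digest. Returns two sorted lists:
    paths to fetch (new or changed) and paths to delete (no longer
    present on the server).
    """
    fetch = sorted(path for path, checksum in server_manifest.items()
                   if client_manifest.get(path) != checksum)
    delete = sorted(path for path in client_manifest
                    if path not in server_manifest)
    return fetch, delete


server = {"/etc/motd": digest(b"Welcome, v2"),
          "/usr/bin/tool": digest(b"tool"),
          "/usr/bin/newtool": digest(b"newtool")}
client = {"/etc/motd": digest(b"Welcome, v1"),
          "/usr/bin/tool": digest(b"tool"),
          "/usr/bin/oldtool": digest(b"oldtool")}

to_fetch, to_delete = plan_sync(server, client)
```

Note that the comparison works on whole files, which is why a one-line change to a configuration file still means replacing the entire file.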

This has the advantage of making updates relatively easy, especially when you have a lot of software installed from different formats. However, there can be a fair bit of work involved in ensuring you don’t upload something to the server that should be local to the machine. The granularity is typically limited to a whole file, which means modifications to isolated parts of a file (such as adjusting a configuration file) become rather more difficult. Configuration engines such as cfengine and Puppet are much more capable in this regard.

12.4. Dealing with Heterogeneity

Deployment can be much easier when the hardware platform you are deploying to is consistent. There is some room for flexibility, depending on the operating system in question. For example, Windows can be quite perplexed if it finds itself running on different hardware from the last time it booted, so Microsoft have a tool called SysPrep that removes some of the more machine-specific aspects of the underlying registry, ready for deployment.

Linux systems tend to be rather more forgiving about suddenly finding themselves on different hardware. So long as the system is not configured for particular hardware, and some hardware detection is performed at bootup, the process can be quite smooth. A lot of problems can be worked around by editing startup scripts to check what devices are present in the system (lspci) and, for example, re-pointing a symbolic link to the most suitable X.org configuration file, depending on what video card is installed. Thankfully, modern Linux systems have done a lot of work to make this less and less necessary.
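A startup check of this kind might look like the sketch below, which picks a configuration file from lspci output and re-points a symbolic link at it. The configuration file names and the fallback choice are hypothetical, and the matching is deliberately simple; the lspci text is passed in as a string so the selection logic can be tried without the hardware.

```python
import os
import re

DEFAULT_CONFIG = "xorg.conf.vesa"  # hypothetical generic fallback

# Whole-word matches only: a plain substring test for "ati" would also
# match the words "compatible" and "Corporation".
VENDOR_CONFIGS = [
    (re.compile(r"\bnvidia\b", re.IGNORECASE), "xorg.conf.nvidia"),
    (re.compile(r"\bati\b", re.IGNORECASE), "xorg.conf.ati"),
]


def choose_xorg_config(lspci_output):
    """Pick a config file name based on the video card lspci reports."""
    for line in lspci_output.splitlines():
        if "VGA compatible controller" not in line:
            continue
        for pattern, config in VENDOR_CONFIGS:
            if pattern.search(line):
                return config
    return DEFAULT_CONFIG


def repoint_symlink(link_path, target):
    """Make link_path a symbolic link to target, replacing any old link."""
    temp_path = link_path + ".new"
    if os.path.lexists(temp_path):
        os.remove(temp_path)
    os.symlink(target, temp_path)
    os.replace(temp_path, link_path)  # swap the new link into place
```

In a startup script, the output of lspci would be captured (for example with `subprocess.run(["lspci"], capture_output=True, text=True)`) and passed to `choose_xorg_config`, and the result handed to `repoint_symlink`.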

Mac OS X, with its narrow hardware divergence, has no apparent problems with this either.

When deploying, you want to avoid as much local configuration as possible. For example, avoid unnecessary local user accounts and instead use network-based accounts and home directories. Linux systems have less ability to store their configuration in directory servers, which we shall cover later in this paper. However, work is being done to make Linux more enterprise-friendly, especially by Novell (which acquired SuSE Linux, a major player in the European Linux space). Microsoft has done a good job of making many information elements configurable via a directory server. Apple has also done a lot of good work, and its offering is maturing nicely. Linux has a harder time because of the plain-text configuration file heritage it inherits from Unix.

12.5. Maintenance

Inevitably, you will need to maintain these machines, and this will include some degree of maintenance to a) possibly numerous disk images, b) installation scripts, or c) a golden client.

Disk images are easily the most work to update, especially if you have multiple images, because you need to restore the image, update the system, recreate the image, and then repeat for every image.

Installation scripts probably won’t need a lot of work, often just adding or removing a directive to install a particular package, although this might be managed in some other way, such as cfengine.

A Golden Client will need to be updated, synced to the server, and then synced to all the clients. This is typically fairly fast, and I have found it to be an effective way of maintaining a lab of similar machines, especially where the files have been customised post-installation.



[38] So dd is a very naïve sector-based imager, whereas dump is a file-based imager; we shall see both in a later lab about filesystems.

[39] This assumes that there is only one filesystem on the disk. If there are multiple, the command should be run replacing /largefile with /mountpoint/largefile for each local filesystem.

[40] An excellent example of how the legacy BIOS impedes PC innovation, and a sign of things to come with EFI.

[41] You need switches that employ IGMP Snooping, otherwise the traffic turns into broadcast traffic, which floods the network and can bring a local network to its knees.