LinuxPlanet: How would you judge the quality of a system's setup?
Surprisingly, I would use the same attributes as those I employ for describing the quality of a software artifact: functionality, reliability, usability, efficiency, maintainability, and portability. I would ask questions like the following. What services does the system offer to its users? Has the system been setup in a way that it can run uninterrupted for months? How easy is it to manage its services, or to restore a file from a backup? Is there any waste in the utilization of the CPUs, the main memory, or the disk storage? How difficult would it be to upgrade the operating system or the installed applications? How difficult would it be to move the services to a different platform?
LP: So what can system administrators learn from programmers?
System administration is sadly a profession that doesn't get the type of attention given to software development. This is unfortunate, because increasingly the reliability of an IT system depends as much on the software comprising the system as on the support infrastructure hosting it. Nowadays, especially when dealing with open source software, you don't simply install an application; you often install with it a database server, an application server, a desktop environment, and libraries providing functionality like XML parsing and graphics drawing. Furthermore, for the application to work on the Internet you need network connectivity, routing, a working DNS server, and a mail transfer agent. And for reliable operation underneath you often deploy redundant servers, RAID disks, and ECC memory.
A well-setup system, say a Linux installation, has many quality attributes that are the same as those of a well-written program. Both system administrators and programmers can use similar techniques for implementing quality systems.
LP: Give me an example. I have a system with serious time performance problems. Where do I start?
I suggest that the first step you should undertake would be to characterize the system's load. Using the top command available for Linux systems you will first see how your system spends its time. Near the screen's top you will see a line like the following.
CPU states: 80.7% user, 17.1% system, 0.0% nice, 2.2% idle
You can typically distinguish three separate cases.
LP: It looks like you can often trade space for time. Is this so?
Yes, I wrote that space and time form a ying/yang relationship, and there are many cases where you can escape a tight corner by trading one of the two to get more of the other. Consider a sluggish database system. You can often improve its performance by adding appropriate indexes, and even precomputing the results of some queries and storing them in new tables. Both the indexes and the new tables take additional space, but this space gives you increased performance. Now consider the opposite case where your data overflows your available storage. (Star Trek's "Space the final frontier" opening line was true in more than one sense.) If you have sufficient CPU time at your disposal you can devote that spare time to compress and decompress your data. Your MP3 player and digital camera have succeeded as products by adopting exactly this design tradeoff.