Linux desktop projects often overlook formal usability testing. Attempts to introduce it are generally short-lived. After a few experiments, most developers fall back on a series of informal alternatives.
Yet as Linux becomes increasingly popular, the need for usability testing becomes more pressing. Implemented properly, such testing might have prevented or mitigated some of the upheavals of the last few years.
Why Is Usability Testing Neglected?
The relative neglect of usability testing is hardly surprising. Of all the parts of the development process, it is probably the one most often skipped or short-changed throughout IT, especially in smaller companies.
Compared to code development, usability testing is considered unglamorous everywhere in the software industry. (One indication of this is the existence of more female testers than coders; even today, women enter low-prestige IT jobs more easily than elite ones.)
Moreover, until the last few years, the distinction between developers and users was blurred in open source. For many years, developers were users, so that Eric Raymond could legitimately describe the start of a project as “a developer scratching his own itch.”
Add in what Raymond describes as a “release early, release often” philosophy, and the lack of usability barely mattered. If a problem emerged, it could easily be fixed in the next release. Under these circumstances, Raymond’s other admonitions to listen to users or customers have always been given less priority than his more colorful statements.
Another problem is that project members are often scattered around the world, meeting at most only a few times a year. As a result, arranging any testing that involves directly observing users is next to impossible on a large scale. Individual developers, no matter how much they support usability testing, are simply unlikely to have the resources to do testing with more than a few people.
For all these reasons, usability testing has seldom been a priority in free software. Whenever one of the increasing numbers of non-developers using an application does register a complaint, the replies still frequently include a sarcastic invitation to contribute the necessary code themselves.
In other words, the user base of the free desktop has grown far beyond the original community of developer-users, but communication between developers and users has not grown with it. The result is a gap that systematic usability testing could bridge directly.
Attempts at Testing
Now and then, projects do recognize the need for usability testing. For example, in 2006, Fedora announced plans for a usability sub-project. As I recall, there was even some ambitious talk about remote testing in which volunteers would be given testing scripts and log on to a testbed server.
But the project was never enthusiastically endorsed. As I write, its home page was last modified in 2010. Discussion on its desktop mailing list is sparse and focuses on specific features more than major usability issues.
Similarly, the KDE Usability Project posted usability testing resources complete with templates and samples. A few KDE applications even posted profiles to help with testing. Yet, although the material on the page looks professional enough, as I write, it was last modified eight months ago. If anyone has actually used the resources on the page, they have yet to publish their findings.
While some exceptions may exist, large-scale usability testing is so rare that the last instance connected to a major project was the study that Calum Benson of Sun Microsystems conducted on GNOME in 2001. This detailed study made headlines in its day, and its results were given permanent if abstracted form in the GNOME Human Interface Guidelines. But, so far as I’m aware, nothing comparable has been done since in any project.
Instead, so far as usability testing has been attempted at all, it has generally been on a small scale, with a variety of stopgap measures.
Alternatives to Formal Testing
Much of the time, changes to interfaces are due to an individual’s interest—in other words, a continuation of the “scratch your own itch” approach. For example, although recent work on Plasma Active, KDE’s tablet interface, appears to have been a collaboration between developers and interface designers, many changes in KDE are due to individual initiative, such as Aaron Seigo’s modifications of KRunner.
Such “eating of your own dog food” can sometimes be useful, especially when working on individual applications. The difficulty is that senior developers are not typical users today because they have a familiarity with the software and a sophistication that casual users lack.
That may have been a main source of the poor reception for GNOME 3. According to Vincent Untz’s blog, the basic design elements of GNOME 3 were determined by leading developers at the User Experience Hackfest in 2008.
Untz’s description does sound like those involved made an honest effort to imagine new users’ experience, criticizing several aspects of the design as “too hard.” All the same, I suspect that GNOME 3’s overall success would have been radically different if they had studied other users’ reactions instead of merely voicing their own. As things are, the best parts of GNOME 3 tend to be specific features, such as the notifications, rather than the overall design.
Similar shortcomings occur when projects rely on bug reports or documentation. Almost by definition, anyone who is comfortable enough to file a bug or contribute to documentation is not a typical user. Although they are perhaps closer to average users than developers are, bug reporters and documentation writers are most reliable when they can project themselves into the experience of less-skilled users. Otherwise, their contribution can be inconsistent, particularly in projects where non-developers are held in low regard and ignored.
Lacking formal usability testing by projects, many developers resort to their own small-scale studies. For instance, Allan Day, a GNOME designer, talking about GNOME 3, tells me that “Personally speaking, I have done two rounds of usability testing. The first was conducted around the time of the GNOME 3.0 release, and was done with friends and family. More recently, I conducted a small study on the lock screen last development cycle.” He adds that “I’m not the only one who has done this type of self-initiated testing.” Almost certainly, the majority of free software projects include people who could make a similar report.
What Day calls “ad hoc usability testing” is considerably better than nothing. However, much of the time, it is done on a developer’s own time, rather than as something they are expected to do. It is, perhaps, a reflection of their frustration at the lack of information with which they have to work.
The problem with such small-scale testing is its uncertain validity. The small pool of testers can be a problem, especially if you want the opinion of an inexperienced user. If asked too often, friends and family may soon cease to be new users, and their feedback can become less valid or useful over time.
To make matters worse, the results of such testing often receive limited circulation. Unlike the results of large-scale usability testing, they may be seen by only a few developers. Instead of promoting a climate in which usability testing is the norm, their long-term effect is to make the little that is done invisible, and to encourage similar informal solutions to continue.
This situation is no one’s fault. It is simply the way that free software development has evolved. The result is that, while most contributors to free software projects would agree in theory that usability testing should be implemented, in practice a lack of resources and examples discourages much action. Throughout free software, any usability testing that occurs remains consistently small-scale.
Studies like Aakanksha Gaur’s current project within GNOME for the Outreach Program for Women are decidedly the exception. With Day as her mentor, Gaur is attempting to answer basic questions that most developers never ask, such as “What are the usability issues encountered by new and existing users of GNOME 3?” and “How is GNOME 3 perceived by new and existing users?”
The increasing number of free software users makes the need for answers to such large-scale questions more urgent than ever.
Looking for the Magic Bullet
Whether usability testing could have prevented the decisions that created the user revolts in KDE, GNOME and Unity is open to debate. Although usability testing is useful, it can only be as good as the questions it sets out to answer. If, for instance, testers never set out to discover whether users prefer an old or a new design, then they are unlikely to ever answer that question.
Just as importantly, the later that usability testing is done in the development process, the more likely developers will be to resist major change. At that point they have too much invested to consider scrapping the code or even making major revisions.
Other factors may also intervene. For example, KDE 4.0 was framed as a developer’s release. It was never intended for general use, but distributions, eager to have the latest releases, failed to make its status clear.
All the same, begun early enough and repeated throughout development, usability testing could at least help developers to make better decisions—especially if the testing is done by people who did not write the original code and who have some authority to return the code for improvement before it is released.
Today, usability testing is only occasionally integrated into free software projects. Yet, as more casual users are attracted to free software, the need to understand their habits is growing, and the current methods of feedback meet that need only imperfectly. Properly carried out, usability testing could reconnect developers and users in ways that are badly needed. However, whether it will ever take hold in free software development is another matter altogether.