There are a lot of great sound-related applications for Linux, from basic audio drivers and sound servers to sophisticated mixers, editors, and special effects engines. That is not much consolation for the user who just wants sound to work on her system, and these days everything old is new again: once again, getting sound to work correctly, or at all, is almost as fun as in the olden days. Only these days it’s because of “progress,” not because of immaturity. This is due to PulseAudio becoming the default sound server on an increasing number of Linux distributions, such as Ubuntu Hardy, Fedora 8 and up, and Mandriva 2008.1. A number of other distributions, such as openSUSE, Debian, Arch Linux, and Gentoo, include it as an option. It promises superior audio functionality, but it brings a few woes with it as well, and has a lot of users asking “How does adding Yet Another Sound Server to Linux help anything?”
The current major Linux sound servers are Enlightened Sound Daemon (ESD or EsounD) for Gnome, analog Real time synthesizer (aRts) for KDE2/3, and Advanced Linux Sound Architecture (ALSA), which works everywhere. Network Audio System (NAS) is a client/server networked sound system for thin clients. For applications that require the older Open Sound System (OSS), ALSA, ESD, and aRts all include an OSS emulator. JACK is a popular professional-level low-latency audio server. One thing the servers layered on top of ALSA all have in common is that they rely on ALSA to provide the audio hardware drivers.
ESD and aRts both support networked sound, and manage sound streams from multiple sources. ESD is hard-coded into Gnome, but thanks to the PulseAudio developers it should soon be divorced from Gnome, as a proper modular Linux application should be. aRts was designed from the beginning as an independent, portable audio framework. ALSA provides device drivers, multi-device management, basic mixing and recording, and works in any Linux environment, including the console.
On their respective desktops, the ESD and aRts stacks perform both low-level and higher-level functions: they interface between sound hardware and applications, and they encode and decode your various file and streaming audio formats. aRts does everything itself; on Gnome, ESD handles sound server duties and GStreamer handles the encoding and decoding. Both eventually pass everything down the pipeline to ALSA.
aRts has been officially deprecated by the KDE team for KDE4, and will be replaced by Phonon. Phonon promises a simpler API (application programming interface) by functioning more as a universal interface between existing audio engines such as ALSA, Xine, MPlayer, and VLC. The Phonon developers also have the worthy goal of designing a friendlier mixer interface that doesn’t require knowledge of sound engineering terminology, but uses sensible labels like Notifications, Music, and Communications.
But that’s not all. Some applications, for example MPlayer and Xine, do everything themselves and do not rely on a sound server. While this might not bother end users, it is a nightmare for developers who have to write support for all these different beasts into their applications. In fact, the fractured nature of audio systems and their many diverse APIs in Linux is a chronic problem for developers.
While describing the current state of Linux audio would require a book, this should give you an idea of its complexity, pitfalls, and booby traps. The good news is we have a lot of great audio applications. The bad news is it’s all rather a messy jumble. But that, perhaps, is changing.
If your Linux audio needs are simple, stick with ALSA. It works on all Linuxes and it works just fine. If your needs are more complex, then you want to look at more complex sound servers.
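If you are not sure whether ALSA sees your sound hardware at all, a quick check is to read the card list that the ALSA drivers publish under /proc. The following is a minimal Python sketch, not an official tool. It assumes the usual layout of /proc/asound/cards (one header line per card, followed by an indented detail line that is ignored here); the function names `parse_cards` and `list_alsa_cards` are made up for this illustration.

```python
import re
from pathlib import Path

# Assumed layout of /proc/asound/cards, one header line per card, e.g.:
#  0 [PCH            ]: HDA-Intel - HDA Intel PCH
# The indented detail line that follows each header is skipped.
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s*(.+?)\s+-\s+(.+)$")


def parse_cards(text):
    """Return a list of (index, card_id, driver, name) tuples."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, card_id, driver, name = match.groups()
            cards.append((int(index), card_id, driver, name.strip()))
    return cards


def list_alsa_cards(path="/proc/asound/cards"):
    """Read the ALSA card list, or return [] if the file is missing."""
    proc_file = Path(path)
    if not proc_file.exists():
        return []
    return parse_cards(proc_file.read_text())


if __name__ == "__main__":
    for index, card_id, driver, name in list_alsa_cards():
        print(f"card {index}: {card_id} ({driver}) - {name}")
```

An empty result suggests that no ALSA driver has claimed a sound device, which is worth ruling out before blaming any sound server sitting on top of it.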
PulseAudio is intended to be a drop-in replacement for ESD on Gnome. It is designed to be cross-platform, running on POSIX-compliant operating systems (like Linux), and on Win32.
Before I discuss PulseAudio further, I must share an amusing true anecdote, which I promise is relevant. A good friend of mine has a number of health problems, so he spends a lot of time seeing doctors and taking a lot of medications. His favorite doctor is a Vietnamese woman with a bent sense of humor. He told her he didn’t like how a certain drug was making him feel. She prescribed an additional medication. He asked why she didn’t give him something to replace the nasty one, and she said “We never replace, we only add.” She wasn’t serious, but there was a grain of truth in it. And so it is with Linux applications and subsystems: it seems we never replace, only add.
However, PulseAudio has the potential to become the common Linux audio server, and actually replace some legacy servers like ESD and aRts. Why would we even want this? For one thing, it has a great advanced feature set:
- Individual volume controls for each playback stream
- Modular, extensible architecture
- Multiple backends for compatibility with other audio servers
- A consistent and common API
- Auto-discovery of other Pulse-enabled computers on a network
- Network sound server
- Mix-and-match multiple sound devices and playback streams
- USB hotplug support
- Both GUI and command-line controls
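To make the first item on that list concrete, here is a toy Python sketch of what per-stream volume control means when several playback streams share one output. It is an illustration of the idea only, not PulseAudio's code or API: the `Stream` class, the `mix` function, and the choice of float samples in the range -1.0 to 1.0 are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """A hypothetical playback stream holding float samples in [-1.0, 1.0]."""
    name: str
    samples: list
    volume: float = 1.0  # 0.0 is muted, 1.0 leaves the stream unchanged


def mix(streams):
    """Scale each stream by its own volume, sum them, and clip the result."""
    length = max((len(s.samples) for s in streams), default=0)
    mixed = []
    for i in range(length):
        # Streams that have already ended contribute nothing at position i.
        total = sum(s.volume * s.samples[i] for s in streams if i < len(s.samples))
        # Clip so the combined signal stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


if __name__ == "__main__":
    music = Stream("music", [0.5, 0.5, 0.5], volume=0.5)
    alert = Stream("alert", [1.0, -1.0], volume=1.0)
    print(mix([music, alert]))
```

Because each stream carries its own volume, turning the music down does not touch the alert sound. A single master volume cannot make that distinction, since it scales everything after the streams have already been combined.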
Ubuntu Hardy users in particular are experiencing a bumpy transition to PulseAudio. It is the default sound server, but Hardy’s implementation is incomplete, and there is no mention of it in the release notes, so users experiencing difficulties waste time looking for the wrong thing. Flash Player and Skype, among other popular proprietary applications, don’t work with Pulse. (Isn’t it funny how those big companies with paid developers can’t keep up with FOSS devs?) However, all of these difficulties are being ironed out (except Ubuntu’s chronically incomplete release notes), and Ubuntu Hardy users can find a lot of help on this PulseAudio Wiki page.
Next week we’ll install PulseAudio on some random Linux and learn some useful and cool things to do with it, and some tips and tricks for getting past some of the bumpy parts.
This article was first published on LinuxPlanet.com.