Digital audio file formats demystified.
By Douglas Whates 15th August 2017
That’s a wrap.
Digital audio can be recorded and subsequently encoded in a variety of formats and qualities. When you're downloading a file from a download site or streaming from an online subscription service, it can be hard to know which file format to choose from the bewildering array on offer.
99.999% of all digital audio is originally recorded in one of two formats: PCM or DSD. Those original files are then “wrapped” for delivery in a few different ways.
For PCM, the most common “wrappers” you will see for an uncompressed digital audio file are WAV (or “wave”) and AIFF. When downloading the files, or when viewing them on your computer or streamer, you will often be provided with information about the sample rate (expressed in kilohertz, e.g. 48kHz) and bit depth (expressed in bits, e.g. 24bit).
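If you are curious what your own files contain, the sample rate and bit depth are stored in the wrapper itself. Here is a minimal sketch using the `wave` module from Python's standard library, which handles plain PCM WAV files only. The function name `describe_wav` and the silent in-memory test file are my own illustrative assumptions, standing in for a real download.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        sample_rate_hz = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        channels = wav.getnchannels()
    return sample_rate_hz, bit_depth, channels


# Build one second of silent 24bit/48kHz stereo audio in memory
# (a stand-in for a real file; silence is just zero-valued samples).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(3)  # 3 bytes per sample = 24 bits
    wav.setframerate(48000)
    wav.writeframes(bytes(48000 * 2 * 3))
buffer.seek(0)

rate, bits, channels = describe_wav(buffer)
print(f"{bits}bit/{rate / 1000:g}kHz, {channels} channels")
```

With a real file you would pass its path instead of the in-memory buffer, e.g. `describe_wav("track.wav")`, where the file name is of course hypothetical.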
Without getting too technical and at the risk of over-simplifying things, when implemented well, anything at or above 16bit/44.1kHz can be considered high enough quality for playback on a good quality, well resolving hi-fi. A CD, for example, contains 16bit/44.1kHz PCM audio.
For DSD, the most common “wrapper” is DSF or DFF. Both are essentially identical, except that the former can embed metadata about the file (DSF is generally preferred for that reason). Again you will often be provided with information about the sample rate of the file, such as DSD64 or 64fs. The “fs” stands for sampling frequency and the “64” is a multiple of the standard sampling rate for CD (44.1kHz). So if you see DSD128 or 128fs it means the sampling rate is 128 × 44.1kHz, i.e. 5,644.8kHz or 5.6448MHz (megahertz). DSD64 was the original spec for DSD as delivered on SA-CD and so you will often see it described as “single rate”, with DSD128 described as “double rate”, and DSD256 as “quad rate”.
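The naming arithmetic is easy to check for yourself. A small sketch, where the helper name `dsd_sample_rate_hz` is my own and not part of any DSD tool:

```python
CD_SAMPLE_RATE_HZ = 44_100  # the "fs" reference: the CD sampling rate


def dsd_sample_rate_hz(multiple):
    """Sampling rate in Hz for DSD<multiple>, e.g. 64 for DSD64."""
    return multiple * CD_SAMPLE_RATE_HZ


for multiple, nickname in [(64, "single rate"), (128, "double rate"), (256, "quad rate")]:
    rate_mhz = dsd_sample_rate_hz(multiple) / 1_000_000
    print(f"DSD{multiple} ({nickname}): {rate_mhz:.4f} MHz")
# DSD64 (single rate): 2.8224 MHz
# DSD128 (double rate): 5.6448 MHz
# DSD256 (quad rate): 11.2896 MHz
```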
At this point an innocent bystander would be forgiven for thinking that DSD at 64 times the sampling rate of PCM at 44.1kHz (i.e. CD sample rate) must by default be vastly superior to PCM. In reality, the two are not directly comparable. PCM and DSD encode digital audio differently (DSD is 1-bit, for example) and both are highly capable formats in the correct hands (and perhaps even indistinguishable from each other in terms of sonics).
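One rough way to see why the “64 times” figure flatters DSD is to count the raw bits each format produces per second. This is only a back-of-envelope sketch, assuming stereo audio, and raw data rate is not a measure of sound quality. The helper name `raw_bits_per_second` is my own.

```python
def raw_bits_per_second(sample_rate_hz, bits_per_sample, channels=2):
    """Raw data rate of an uncompressed stream (stereo assumed by default)."""
    return sample_rate_hz * bits_per_sample * channels


cd_pcm = raw_bits_per_second(44_100, 16)      # CD: 16 bits per sample
dsd64 = raw_bits_per_second(64 * 44_100, 1)   # DSD64: 1 bit per sample

print(f"CD PCM: {cd_pcm:,} bits per second")
print(f"DSD64:  {dsd64:,} bits per second")
print(f"Ratio:  {dsd64 / cd_pcm:g}x")
# CD PCM: 1,411,200 bits per second
# DSD64:  5,644,800 bits per second
# Ratio:  4x
```

So although DSD64 samples 64 times as often, each sample carries 1 bit rather than 16, and the raw stream is 4 times the size of CD audio rather than 64 times.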
Once again I’ll risk oversimplifying things and state that, well implemented, DSD64 can be considered high enough quality for playback on a good quality, well resolving hi-fi, and is practically indistinguishable from PCM at 44.1kHz.
Lossy vs. lossless
So far we have been talking about uncompressed formats. It is also possible to compress (make smaller) digital audio files. We can do this in a “lossy” fashion or a “lossless” fashion.
Lossy would be formats like MP3, where the file size is smaller, but the trade-off is a compromise in sound quality. MP3 quality is expressed in kilobits per second (kbps). The lower this “bitrate”, the more heavily compressed the file and the more compromised the sound quality. In simple terms, a 128kbps MP3 will sound worse than a 320kbps MP3. (By the way, uncompressed files can also be expressed in kbps. A CD, for example, plays back at 1411kbps.)
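The CD figure falls straight out of the sample rate, bit depth and channel count. A quick sketch, assuming two-channel (stereo) audio, with a helper name (`pcm_bitrate_kbps`) of my own choosing:

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bitrate of uncompressed PCM in kilobits per second (stereo by default)."""
    return sample_rate_hz * bit_depth * channels / 1000


cd = pcm_bitrate_kbps(44_100, 16)
print(f"CD: {cd:.1f} kbps")
for mp3_kbps in (320, 128):
    print(f"A {mp3_kbps}kbps MP3 carries roughly 1/{cd / mp3_kbps:.1f} of the CD data rate")
# CD: 1411.2 kbps
# A 320kbps MP3 carries roughly 1/4.4 of the CD data rate
# A 128kbps MP3 carries roughly 1/11.0 of the CD data rate
```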
MQA is the codec everyone is talking about at the moment. It is a lossy format, but less of a compromise in terms of sound quality than MP3. You can read my thoughts on MQA in a previous article.
Lossless would be formats like FLAC. Most of you will be familiar with the ubiquitous “.zip” and “.rar” files on computers; FLAC is just like that except specially formulated for audio. It compresses your audio file to make it smaller, but when you uncompress it, the original data remains completely intact, bit-for-bit identical to the original uncompressed file. Another common lossless compressed format is ALAC. (OGG is often mentioned in the same breath, but it usually refers to Ogg Vorbis, which is lossy.) Like WAV and AIFF, FLAC quality is expressed in bits and kilohertz, e.g. 24bit/96kHz.
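To make the “.zip” comparison concrete, here is a small sketch of a lossless round trip. Note the assumption: it uses `zlib` (the general-purpose compression behind “.zip”) from Python's standard library as a stand-in, not FLAC itself, which uses audio-specific techniques and typically does much better on real music. The one-second 440Hz test tone is also just an illustrative input.

```python
import math
import struct
import zlib

SAMPLE_RATE_HZ = 44_100

# One second of a 440Hz sine tone as 16-bit mono PCM samples.
samples = [
    int(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ))
    for n in range(SAMPLE_RATE_HZ)
]
original = struct.pack(f"<{len(samples)}h", *samples)  # little-endian 16-bit

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(f"Original:   {len(original)} bytes")
print(f"Compressed: {len(compressed)} bytes")
print(f"Identical after round trip: {restored == original}")
```

The last line is the whole point of “lossless”: the restored bytes match the original exactly, whatever the size saving happens to be.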
Cut to the chase: which is best?
For me, if I had to choose one format: PCM at 20bit or higher. If the mastering engineer was good and knew how to downsample well (or if the original recording was at 44.1kHz), I can settle for 44.1kHz. Otherwise 88.2kHz or higher. Whether the file is wrapped/encoded as an AIFF, WAV or FLAC file, I don’t mind. Generally I prefer FLAC because it takes up less disk space.
And finally, my hot tip: please don't waste your money spending more on crazy hi-res 24bit/192kHz recordings. If you have the option of 96kHz and 192kHz, just get the 96kHz version. I guarantee you won't hear a difference unless the mastering engineer is useless (in which case the recording isn't worth hi-res playback anyway).