random things and thoughts … and bad ideas
fun with audio codecs
2015-01-21
Have you ever wondered what the stuff a lossy audio codec throws away actually looks or sounds like? After watching the awesome videos with Xiph’s Monty about digital media (which everyone interested in that topic should have watched, by the way), I thought that it should be possible to visualize and sonify exactly what is lost during lossy compression. I wondered if I could create some form of diff between the source and the compressed audio. Turns out this is actually pretty easy.
If you remember your physics lessons, you hopefully know that sound is a wave of air pressure and that waves can amplify or weaken each other if they overlap in a specific way. Overlapping two identical waves with one being inverted will even lead to total cancellation. I think you can see where this is going. So I ripped a CD I recently got, compressed it with different codecs, inverted the wave form and combined the result back with the source. It actually worked, and those audio diffs also changed depending on codec and encoder settings exactly like I expected. But let’s do this all over again from square one with material that is already available on the net.
It’s not easy to find free, high quality recordings that have a variety of instruments and are not already degraded by lossy compression. Digging through archive.org’s community audio section I did find some lossless orchestra recordings that should work hopefully. I decided on this recording of Igor Stravinsky’s “Petrushka” and used the first 5 minutes. I will not host or embed the actual sound files on this blog. If you want to listen to the files you can download the set at the end of this post. Here’s the spectral view of the source file.
As you can see, the file contains the whole range of frequency data up to a not so sharp cutoff at 21 kHz and some noise above. This most probably comes from me up-sampling the source file to 48 kHz after it was previously down-sampled to 44.1 kHz by the uploader on archive.org. Let me explain why I did this. The Opus codec does not support a sample rate of 44.1 kHz; instead the encoder always up-samples to 48 kHz internally. So the output from the Opus decoder will also be 48 kHz unless you sample it down again. To have a level playing field I decided to do all tests from the same source file and at the same sample rate, so I up-sampled the source again. Also, sample rate conversion can be done practically losslessly, especially if you are up-sampling. I also down-mixed the stereo source to mono to save ~50% space and make it easier to check the spectral views.
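To put numbers on the sample rate change and the mono down-mix, here is a small Python sketch. It only shows the arithmetic and is not a resampler. The helper names are made up for illustration, and averaging the two channels is an assumption about how the down-mix works, not something stated above.

```python
from fractions import Fraction

SOURCE_RATE = 44100   # rate of the archive.org file
TARGET_RATE = 48000   # rate used for all tests in this post

# Exact conversion ratio between the two rates.
ratio = Fraction(TARGET_RATE, SOURCE_RATE)

def upsampled_length(sample_count):
    """Number of samples after converting from SOURCE_RATE to TARGET_RATE."""
    return int(sample_count * ratio)

def downmix(stereo_frames):
    """Average each (left, right) frame into one mono sample (assumed method)."""
    return [(left + right) / 2 for left, right in stereo_frames]

five_minutes = 5 * 60 * SOURCE_RATE
print(ratio)                           # 160/147
print(upsampled_length(five_minutes))  # 14400000
```

So every 147 input samples become 160 output samples, and the mono file holds one value per frame instead of two.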
Because I want everyone to be able to repeat this and check for themselves if I’m talking nonsense here, I will explain how I did this step by step. After creating the source file I encoded it with the four, in my opinion, most popular lossy codecs: MP3, Vorbis, AAC and Opus, each of them with two different average bitrates, 32 and 64 kbit/s. The exact commands and version information about the tools used for compression can be found in the download below. So I had eight different lossy compressed files. I fired up Audacity and imported one of the compressed files. The effect “Invert” apparently does exactly what it says and flips every positive sample from above the 0-line down and every negative sample up. If you import the same file again and merge it with the inverted wave form you will get a flat line of silence, because for every given sample point you will have X + −X = 0. Instead I imported the uncompressed source file and merged those two together, leaving only the difference between both wave forms. I exported the result using FLAC so it would not be degraded by any compression and repeated the same procedure with the other files.
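For anyone who prefers code over clicking through Audacity, the invert-and-merge step can be sketched in a few lines of Python. This is only a minimal sketch. The function names are made up for illustration, and `toy_lossy` is a hypothetical stand-in (coarse rounding) for a real encode and decode round trip.

```python
import math

def invert(samples):
    """Flip every sample around the zero line."""
    return [-s for s in samples]

def mix(a, b):
    """Sum two equally long tracks sample by sample."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same length")
    return [x + y for x, y in zip(a, b)]

def toy_lossy(samples, step=0.25):
    """Hypothetical codec stand-in: round each sample to a coarse grid."""
    return [round(s / step) * step for s in samples]

RATE = 48000
# 10 ms of a 440 Hz sine as a tiny "source file".
source = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(480)]

# Same file plus its own inversion: X + -X = 0 for every sample.
silence = mix(source, invert(source))

# Source plus the inverted "compressed" file: only the difference remains.
difference = mix(source, invert(toy_lossy(source)))

print(max(abs(s) for s in silence))
print(max(abs(d) for d in difference))
```

The first value shows total cancellation, the second shows how large the leftover difference gets for this toy codec.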
What are those expected results I mentioned earlier? Let’s make some very basic assumptions. With a higher bitrate the same encoder should create files that are more similar to the source, so the difference will be smaller and thus less pronounced, loud or generally noticeable. Also any “information” that is lost in the compressed file, which is usually audible as an artifact, should show up in the difference. This means that the more noticeable an audio artifact is in the compressed file, the more detailed and clear that lost piece of “information” should sound in the difference. And the other way around: with more subtle audio artifacts in the compressed file there will be less detail in the difference, which should sound more mushy. So let’s have a look.
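The first assumption can also be checked with a number instead of by eye: the RMS level of the difference should drop when the encoder gets more bits. Here is a hedged Python sketch. The `toy_lossy` function is again a hypothetical stand-in where a smaller rounding step plays the role of a higher bitrate, and it is not how any real codec works.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); silence gives -inf."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

def toy_lossy(samples, step):
    """Hypothetical codec stand-in: round each sample to a grid of size `step`."""
    return [round(s / step) * step for s in samples]

def difference(source, decoded):
    """Sample-wise source minus decoded, same as mixing with the inversion."""
    return [s - d for s, d in zip(source, decoded)]

RATE = 48000
source = [0.8 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(4800)]

# Coarse grid stands in for the low bitrate, fine grid for the high bitrate.
coarse = rms_dbfs(difference(source, toy_lossy(source, step=1 / 8)))
fine = rms_dbfs(difference(source, toy_lossy(source, step=1 / 64)))
print(f"coarse: {coarse:.1f} dBFS, fine: {fine:.1f} dBFS")
```

A lower (more negative) value means a quieter difference. The same function could be run on the real difference files after loading their samples.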
There are two things about MP3 I should mention. First, LAME usually tries to change the sample rate to something appropriate for the resulting file to save some space. As you can see, the frequency cutoff for the lower bitrate MP3 is at 8 kHz, so only a sample rate of something over 16 kHz would be needed for that file and anything beyond that would waste space. Still, I prevented that re-sampling because I wanted to do the test at exactly the same sample rate. So if you let LAME decide everything automatically, the result should sound a little better than what I created. The other thing is that LAME apparently shifts the wave form by a few microseconds. So to properly merge the waves together I had to manually align the MP3 wave form to the source in Audacity. I don’t know if this is a general issue with MP3 or just with LAME.
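I did the alignment by hand, but the shift could in principle also be found automatically by trying different offsets and keeping the one where both wave forms correlate best. The following Python sketch shows that idea under some assumptions: the decoded file is only delayed (never early), the delay is a whole number of samples, and the excerpt is short enough for brute force. The function names are made up for illustration.

```python
import random

def best_lag(reference, delayed, max_lag):
    """Return the delay (in samples) at which `delayed` best matches `reference`.

    Tries every lag from 0 to max_lag and keeps the one with the highest
    correlation sum. Brute force, so only suitable for short excerpts.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(overlap))
        if score > best_score:
            best, best_score = lag, score
    return best

def align(delayed, lag, length):
    """Drop the first `lag` samples, then trim or zero-pad to `length`."""
    shifted = delayed[lag:lag + length]
    return shifted + [0.0] * (length - len(shifted))

# Deterministic noise as a test signal; 37 samples of delay are made up.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 37 + source

lag = best_lag(source, delayed, max_lag=100)
aligned = align(delayed, lag, len(source))
print(lag)
```

After `align`, the two tracks start at the same sample and can be inverted and merged as before.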
So what is there to see? The first assumption definitely applies here, because comparing both difference views the one for the 64 kbit/s MP3 is generally quieter. For the second assumption take a look at the difference view of the 32 kbit/s file and compare it to the source’s view. You will see that above the cutoff frequency of 8 kHz both views are identical apart from a slight increase of noise in the inaudible range above 20 kHz. Because the encoder dropped everything above 8 kHz, that range ends up in the difference completely unchanged, and instruments that use that frequency range are clearly audible there.
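Why the range above the cutoff shows up untouched in the difference can be seen with two test tones. In this Python sketch the "codec" is a hypothetical perfect low-pass at 8 kHz that changes nothing below the cutoff. A real encoder also alters the part it keeps, so this is a simplification, and the helper names are made up for illustration.

```python
import cmath
import math

RATE = 48000   # sample rate used throughout the post
N = 480        # 10 ms, so both test tones complete whole cycles

def tone(freq, amplitude):
    """Sine wave of the given frequency and peak amplitude, N samples long."""
    return [amplitude * math.sin(2 * math.pi * freq * n / RATE) for n in range(N)]

def amplitude_at(samples, freq):
    """Peak amplitude of one frequency component, from a single DFT bin."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / RATE)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

low = tone(1000, 0.5)     # below the 8 kHz cutoff
high = tone(10000, 0.25)  # above the 8 kHz cutoff
source = [a + b for a, b in zip(low, high)]

# Hypothetical codec: keeps the 1 kHz tone as is, drops the 10 kHz tone.
decoded = low
diff = [s - d for s, d in zip(source, decoded)]

print(amplitude_at(source, 10000), amplitude_at(diff, 10000))
print(amplitude_at(source, 1000), amplitude_at(diff, 1000))
```

The 10 kHz tone has the same level in the source and in the difference, while the 1 kHz tone cancels out of the difference.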
Nothing special to say about Vorbis. The compression is sample exact and the tools are straightforward to use.
Again it’s clearly visible that assumption one is true when comparing both difference views. Comparing the difference views of MP3 and Vorbis, however, doesn’t tell much at first glance other than that they look different. I recommend listening to the difference files instead and comparing those. You will notice the distinct differences between both compression artifacts.
AAC has a few quirks again. The first is the same problem MP3 had with the shifted wave form. Manually aligning the wave form in Audacity solved that. The second one is that some decoders seem to always decode to stereo even if the source file used for compression was mono. This is not a big deal, as the decompressed stereo stream can be converted to mono easily in Audacity. Still, it’s quite confusing.
While AAC and HE-AAC are not exactly the same, at double the bitrate AAC still looks a lot closer to the source file. Looking just at the spectral views of all the codecs, the 64 kbit/s AAC seems to be the closest to the source, at least in the lower frequency spectrum.
Again, like Vorbis, absolutely no issues with Opus except that the output will be at a sample rate of 48 kHz, as I explained above.
Nothing special to say here either. I think nobody is surprised that assumption one is still valid. One interesting thing in the 64 kbit/s difference view is the clearly visible frequency bands where compression obviously differs from other bands, plus one band with nearly no difference to the source file at all.
One important thing to mention is that looking at the spectral views alone won’t tell you everything about the quality of a codec. All modern codecs use psychoacoustics to evaluate how to compress different frequencies. A missing frequency that is clearly visible in the spectral view might not be audible at all because of how our ears and brain receive and, more importantly, perceive sound. If you want to find out which codec sounds best, you won’t get around doing listening tests. Still, I think it is possible to say just from looking at the difference views that Opus and AAC are generally more advanced codecs than MP3 and Vorbis.
Another thing is that using mono files does not test the full potential of modern codecs. Another feature of audio codecs is channel coupling, which avoids storing two completely separate audio streams for a stereo file. So it’s also necessary to test how well a codec preserves the stereo information if channel coupling is enabled. I chose mono files because it is a lot easier to look at a single spectrogram per audio file instead of two.
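To give an idea of what coupling channels can mean, here is a Python sketch of mid/side coding, one simple technique from that family. This is only an illustration and not necessarily what any of the four encoders tested here does internally. The function names and sample values are made up.

```python
def encode_mid_side(left, right):
    """Turn left/right into mid (average) and side (half the difference)."""
    mid = [(a + b) / 2 for a, b in zip(left, right)]
    side = [(a - b) / 2 for a, b in zip(left, right)]
    return mid, side

def decode_mid_side(mid, side):
    """Rebuild left/right from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two channels that are mostly identical, as is common in real recordings.
left = [0.5, 0.25, -0.75, 0.125]
right = [0.5, 0.25, -0.5, 0.0]

mid, side = encode_mid_side(left, right)
print(side)  # zero wherever both channels agree
print(decode_mid_side(mid, side) == (left, right))
```

When both channels are similar the side signal is close to silence and cheap to store, which is the part a mono test cannot show.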
I created the spectrograms with Spek, which is a very straightforward little tool.
The complete set is 178.4 MB in size. If you want to download only a few specific files, use the torrent. The MEGA download is a single compressed 7z file. Because WordPress apparently doesn’t like Magnet URIs, the torrent link is not clickable.