music - "The Show Must Be Go" by Kevin MacLeod, CC-BY 3.0
voice - LibriVox recording of "The Art of War" by Sun Tzu, Read by Moira Fogarty, Public Domain
Test\_mp3\_opus\_16kbps.wav - Audio sample by Shishirdasika, licensed under CC-BY-SA 4.0

# Audio Codecs
By Logan G
[comment]: # (!!!) <!--------------------------------------------------------->

<video width="960" height="540" controls>
<source src="media/output.webm" type="video/webm">
Your browser does not support the video tag.
<div style="font-size: 0.66em">
ffmpeg \
-i $INPUT \
-filter_complex "[0:a]pan=mono|c0=0.5*FL+0.5*FR,asplit=2[aout][audiovis]; \
[audiovis]showwaves=s=1920x1080:mode=p2p:colors=white,format=yuv420p[v]" \
-map "[v]" -map "[aout]" -c:v libx264 -crf 36 -c:a aac -b:a 64k -ac 1
[comment]: # (!!!) <!--------------------------------------------------------->

### Digital Audio Representation
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
![Picture of PCM encoded sine wave](media/sinewave.svg) <!-- .element: style="image-rendering: crisp-edges;background-color:white;" -->
<div style="font-size: 0.33em; line-height: 0.1em;">
Image by Aquegg
Licensed under [CC-BY-SA 3.0](
<div> <!-- Right pane -->
<!-- Title -->
![Picture of PCM encoded waveform](media/linear-PCM.svg) <!-- .element: style="image-rendering: crisp-edges;background-color:white;" -->
<div style="font-size: 0.33em; line-height: 0.1em;">
Image by Aquegg
Licensed under [CC-BY-SA 3.0](
Audio is typically represented in a digital form using Pulse Code Modulation, or PCM.
On the left there is an example of a sine wave. The red line represents the original analog signal, with the blue dots representing the data points
in the PCM encoded version. A similar idea is happening on the right, but with a more complex waveform.
During analog-to-digital conversion, the amplitude of the analog signal is measured at regular intervals (sampling), and each measurement
is rounded to a value that can be stored digitally (quantization). In this example, a simple rounding algorithm is used. There are more complicated algorithms,
but you could spend many hours talking about those alone, and the specifics aren't particularly important here.
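
A minimal sketch of the sampling and rounding described above, assuming a unit-amplitude sine wave and signed integer samples. The function name and the tiny parameters (1 Hz tone, 8 samples per second, 4-bit depth) are made up for illustration and do not come from the figures.

```python
import math

def quantize_sine(freq_hz, sample_rate, bit_depth, num_samples):
    """Sample a unit-amplitude sine wave and round each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 7 for 4-bit signed samples
    samples = []
    for n in range(num_samples):
        t = n / sample_rate                            # time of the n-th sample
        analog = math.sin(2 * math.pi * freq_hz * t)   # "analog" amplitude in [-1, 1]
        samples.append(round(analog * max_level))      # simple rounding quantizer
    return samples

pcm = quantize_sine(freq_hz=1, sample_rate=8, bit_depth=4, num_samples=8)
print(pcm)  # [0, 5, 7, 5, 0, -5, -7, -5]
```

Each printed number corresponds to one blue dot in the figures: the curve's height at that instant, rounded to the nearest level the bit depth can store.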

### Important Terminology
<div style="font-size: 0.8em;">
- Bit/Sampling Depth
- Number of bits that represent each data point
- Typically 16/24/32 bits
- Sampling Rate
- Number of data points per second
- Typically 44.1/48kHz
- Bit Rate
- Number of bits per second of audio
- Varies wildly
- Audio Channels
- Number of audio streams contained
- Typically 2 (Stereo)
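
For uncompressed PCM, the terms above are tied together by simple multiplication. The sketch below assumes raw PCM with no container overhead; the function name is illustrative only.

```python
def pcm_bit_rate(bit_depth, sample_rate_hz, channels):
    """Bits per second of raw PCM: depth x rate x channels."""
    return bit_depth * sample_rate_hz * channels

cd_quality = pcm_bit_rate(bit_depth=16, sample_rate_hz=44_100, channels=2)
print(cd_quality)             # 1411200 bit/s
print(cd_quality / 8 * 60)    # 10584000.0 bytes per minute
```

Compressed codecs break this relationship, which is why bit rate "varies wildly" between formats.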
[comment]: # (!!!) <!--------------------------------------------------------->

### Common Audio Codecs
- Dolby Digital
- MP3
- Vorbis
- Opus
[comment]: # (|||) <!--------------------------------------------------------->

### "Lossless" vs "Lossy"
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Lossless
- Perfect representation of the digital input*
- Typically very large (>1Mbit/s)
##### Lossy
- Deliberately removes data to save space
- Typically very small (<128kbit/s)
There are two main types of audio codecs: "lossless" and "lossy".
Lossless codecs perfectly reproduce whatever digital audio was put into them. Now, it is important
to note that digital audio is itself not a perfect representation of an analog signal, so "lossless" does not mean
that the original analog signal can be perfectly recreated.
Lossy codecs, by comparison, deliberately destroy parts of the signal in order to achieve a smaller file size.
This destruction isn't random, though: lossy codecs specifically make changes to the original waveform that
are least likely to be perceived by a human listener.
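
The difference can be sketched with two stand-ins, neither of which is a real audio codec: `zlib` plays the lossless role, and throwing away the low 8 bits of each sample plays a very crude lossy role. Real lossy codecs choose what to discard with a perceptual model rather than simply dropping bits, and the sample values here are made up.

```python
import zlib

samples = [0, 1200, -3400, 25000, -32768, 32767, 513, -77]  # hypothetical 16-bit PCM values
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: decoding gives back exactly the bytes that went in.
lossless_roundtrip = zlib.decompress(zlib.compress(raw))

def crude_lossy(sample):
    """Keep only the top 8 bits of a 16-bit sample (illustrative, not perceptual)."""
    return (sample >> 8) << 8

lossy = [crude_lossy(s) for s in samples]

print(lossless_roundtrip == raw)  # True
print(lossy == samples)           # False
```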

### Common Audio Codecs
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
###### Lossless
- WAV*
- WMA*
- Dolby TrueHD
###### Lossy
- MP3
- WMA*
- Dolby Digital
- Vorbis
- Opus

### WAV
- Published in 1991
- Formerly patented
- Not really an audio codec
- Typically used as a container for raw audio
- Extremely easy to play
- Supported natively across many OSs
- Supports virtually any analog signal(s)
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.wav">WAV Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.wav">WAV Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
The first codec I'm going to talk about is WAV. Now, WAV isn't actually an audio codec.
It is just a container for raw audio, which I'll talk more about later.
WAV was originally developed by IBM and Microsoft in 1991. It used to be a patented audio format,
but the patent has long since expired.
Fun Fact: WAV is not restricted to audio. It can represent many analog signals, up to 2.1GHz.
WAV is also extremely easy to parse since it's basically just raw audio.

<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
<div style="font-size: 0.33em;">
| WAV/RIFF Header Format |
| :------: |
| FileTypeBlocID<br>4 bytes<br><br><br> |
| FileSize<br>4 bytes<br><br><br> |
| FileFormatID<br>4 bytes<br><br><br> |
| FormatBlocID<br>4 bytes<br><br><br> |
| BlocSize<br>4 bytes<br><br><br> |
| AudioFormat<br>2 bytes |
| NbrChannels<br>2 bytes |
| Frequence<br>4 bytes<br><br><br> |
| BytePerSec<br>4 bytes<br><br><br> |
| BytePerBloc<br>2 bytes |
| BitsPerSample<br>2 bytes |
<div> <!-- Right pane -->
<!-- Title -->
<div style="font-size: 0.33em;">
| WAV Data Chunk Format |
| :------: |
| DataBlocID<br>4 bytes<br><br><br> |
| DataSize<br>4 bytes<br><br><br> |
| SampledData<br>2-3 bytes typically<br><br><br> |
So easy in fact that I can just show you the entire format.
[Master RIFF chunk]
FileTypeBlocID (4 bytes) : Identifier « RIFF » (0x52, 0x49, 0x46, 0x46)
FileSize (4 bytes) : Overall file size minus 8 bytes
FileFormatID (4 bytes) : Format = « WAVE » (0x57, 0x41, 0x56, 0x45)
[Chunk describing the data format]
FormatBlocID (4 bytes) : Identifier « fmt␣ » (0x66, 0x6D, 0x74, 0x20)
BlocSize (4 bytes) : Chunk size minus 8 bytes, which is 16 bytes here (0x10)
AudioFormat (2 bytes) : Audio format (1: PCM integer, 3: IEEE 754 float)
NbrChannels (2 bytes) : Number of channels
Frequence (4 bytes) : Sample rate (in hertz)
BytePerSec (4 bytes) : Number of bytes to read per second (Frequence * BytePerBloc).
BytePerBloc (2 bytes) : Number of bytes per block (NbrChannels * BitsPerSample / 8).
BitsPerSample (2 bytes) : Number of bits per sample
[Chunk containing the sampled data]
DataBlocID (4 bytes) : Identifier « data » (0x64, 0x61, 0x74, 0x61)
DataSize (4 bytes) : SampledData size
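
To show how little work parsing takes, here is a minimal sketch that reads the fields listed above with Python's `struct` module. It assumes the canonical 44-byte layout with 16-bit integer PCM, where the `fmt ` chunk directly follows the RIFF header and the `data` chunk directly follows that; real files may carry extra chunks that this sketch does not handle. The helper names `build_wav` and `parse_wav_header` are made up for illustration.

```python
import struct

# Field layout of the canonical 44-byte header described above (little-endian).
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")
BITS_PER_SAMPLE = 16

def build_wav(samples, sample_rate=8000, channels=1):
    """Wrap 16-bit integer samples in a minimal in-memory WAV file."""
    payload = struct.pack(f"<{len(samples)}h", *samples)
    byte_per_bloc = channels * BITS_PER_SAMPLE // 8
    header = WAV_HEADER.pack(
        b"RIFF", 36 + len(payload), b"WAVE",      # master RIFF chunk
        b"fmt ", 16, 1, channels, sample_rate,    # format chunk, 1 = PCM integer
        sample_rate * byte_per_bloc, byte_per_bloc, BITS_PER_SAMPLE,
        b"data", len(payload),                    # data chunk header
    )
    return header + payload

def parse_wav_header(data):
    """Unpack the header fields and check the four identifiers."""
    (riff, file_size, wave, fmt, bloc_size, audio_format, channels,
     frequence, byte_per_sec, byte_per_bloc, bits_per_sample,
     data_id, data_size) = WAV_HEADER.unpack(data[:WAV_HEADER.size])
    if (riff, wave, fmt, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAV header")
    return {
        "file_size": file_size,
        "audio_format": audio_format,
        "channels": channels,
        "sample_rate": frequence,
        "byte_per_sec": byte_per_sec,
        "byte_per_bloc": byte_per_bloc,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }

wav = build_wav([0, 1000, -1000, 0])
info = parse_wav_header(wav)
print(info["sample_rate"], info["channels"], info["bits_per_sample"], info["data_size"])
```

Everything after the first 44 bytes is the sampled data itself, which is why players need so little logic to handle the format.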

### ALAC
<sup><sup>**A**pple **L**ossless **A**udio **C**odec</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 2004
- Fully Open Source, Patent and Royalty Free
- Decent software support
- Supports:
- 32-bit sampling depth
- 384kHz sampling rate
- 8 audio channels
- Typically 50% the size of equivalent PCM audio
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/musicalac.m4a">ALAC Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voicealac.m4a">ALAC Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Distinct from Alcohol Advisory Council of New Zealand
Apache 2 license
Now, let's talk about actual audio codecs.
ALAC was first developed in 2004 by Apple.
Although originally proprietary, Apple eventually opened up the standard
and reference implementation under the Apache 2 license.
Software support for ALAC is decent. Most modern platforms can play it, although Firefox is
not one of them.
It also has decent capabilities, such as a 32-bit sampling depth, 384kHz sampling rate and up to 8 full bandwidth audio channels.
One thing that many lossless codecs, like ALAC, do is compress raw audio. Specialized audio codecs
are more efficient at this job than, say, gzip, as they can shrink files to around 50% of their original size without
any degradation of the audio signal.
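
A rough sketch of why audio-aware compression beats a general-purpose compressor. This is not ALAC's actual algorithm: it uses the simplest possible predictor ("the next sample equals the previous one") and then hands the prediction errors to `zlib`, whereas real lossless codecs use more elaborate predictors and entropy coders. The test signal (a 220 Hz tone with a little seeded noise) is invented for the example.

```python
import math
import random
import struct
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# One second of a hypothetical test signal at 44.1 kHz, 16-bit.
samples = [round(10_000 * math.sin(2 * math.pi * 220 * n / 44_100)) + rng.randint(-3, 3)
           for n in range(44_100)]

def pack(values):
    """Serialize integers as little-endian 16-bit PCM."""
    return struct.pack(f"<{len(values)}h", *values)

# Predict each sample as "same as the previous one" and keep only the error.
residuals = [cur - prev for prev, cur in zip([0] + samples[:-1], samples)]

plain = zlib.compress(pack(samples), 9)        # general-purpose compression only
predicted = zlib.compress(pack(residuals), 9)  # prediction first, then compression

# Decoding undoes the prediction with a running sum, so nothing is lost.
decoded, total = [], 0
for r in struct.unpack(f"<{len(residuals)}h", zlib.decompress(predicted)):
    total += r
    decoded.append(total)

print(len(pack(samples)), len(plain), len(predicted), decoded == samples)
```

The prediction errors are much smaller numbers than the samples themselves, so they take fewer bits to store, yet the original samples are recovered exactly.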

### WMA
<sup><sup>**W**indows **M**edia **A**udio</sup></sup>
- Proprietary and Garbage
Now, let's talk about WMA
It's shit.

### WMA
<sup><sup>**W**indows **M**edia **A**udio</sup></sup>
- Developed in 1999 by Microsoft
- Very poor software support
- Patented with royalties
- Lossless and Lossy versions
- Supports:
- 24-bit sampling depth
- 96kHz sampling rate
- 8* audio channels
Okay there's a little more to say about it.
WMA was originally developed in 1999 by none other than Microsoft in
order to replace MP3 and WAV. Software support is extremely poor, so poor in fact
that I could not even make audio samples for this presentation, let alone play them.
This may surprise you, but WMA is both patented and has royalties for its use.
WMA does come in both lossy and lossless versions, but they both are kinda garbage.
It does however support up to 24-bit sampling depth, 96kHz sampling rate and up to 8 channels,
as long as you're using the lossy version.
Designed to replace MP3 and WAV
WMA Lossless only supports 6 channels
Microsoft lied about the audio quality
FFmpeg only supports WMAv1 lossy and WMAv2 lossy

### Dolby Digital
<sup><sup>Doesn't stand for anything</sup></sup>
<div style="font-size: 0.6em;">
- Patented with royalties
- Some software support
<div style="font-size: 0.6em; text-align: center; display: grid; grid-template-columns: 1fr 1fr 1fr;">
##### Dolby AC-3
- Lossy
- 16-bit sampling depth
- Up to 48kHz sampling rate
- Up to 5+1 audio channels
- Up to 640 kbit/s*
##### Dolby Digital Plus
- Lossy
- 16-bit sampling depth
- Up to 48kHz sampling rate
- Up to 15+1 audio channels
- Up to 6144 kbit/s
##### Dolby TrueHD
- Lossless
- 24-bit sampling depth
- Up to 192kHz sampling rate
- Up to 7+1 audio channels
Now, I'm gonna briefly talk about Dolby Digital. I only mentioned this one and not
its competitor, DTS, because you can actually use this one.
It's a proprietary format developed by Dolby, although it does have some software support.
For instance, FFmpeg can both encode and decode AC-3 audio.
Commonly seen on Blu-rays
Mention DTS

### FLAC
<sup><sup>**F**ree **L**ossless **A**udio **C**odec</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 2001 by Josh Coalson, later merged into Xiph.Org
- Fully Open Source, Patent and Royalty Free
- Very good software support
- Supports:
- 32-bit sampling depth
- 192kHz sampling rate
- 8 audio channels
- Typically slightly better than ALAC
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.flac">FLAC Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.flac">FLAC Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Now, for the holy grail of lossless audio codecs, FLAC.
FLAC was originally made in 2001 by Josh Coalson, but was later adopted by the Xiph.Org
foundation, who you will hear about several times throughout this presentation.
It started as a fully open standard with no royalties, and is the most popular lossless codec
for music.
Many things are able to play it, even some lower end music players have FLAC support.
Similar capabilities to ALAC as well.
Made by Xiph Foundation

### Lossy Codecs Comparison
<div style="font-size: 0.33em">
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
![Picture of various lossy audio codecs and their dynamic range at certain bitrates](media/Opus_quality_comparison_colorblind_compatible.svg) <!-- .element: style="width: 80%;image-rendering: crisp-edges;background-color:white;" -->
Image by Jean-Marc Valin
Licensed under [CC-BY 3.0](
![Chart of various audio codecs showcasing audio delay, processing power and bitrate](media/Opus_bitrate+latency_comparison.svg) <!-- .element: style="width: 80%;image-rendering: crisp-edges;background-color:white;" -->
Image by Jean-Marc Valin and Flugaal
Licensed under [CC-BY 3.0](

### MP3
<sup><sup>**MP**EG-1 Audio Layer **3**</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 1992
- Formerly patented with royalties
- Extremely good software support
- Supports:
- 16-bit sampling depth?
- 48kHz sampling rate
- 2 audio channels
- 320 kbit/s bitrate
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.mp3">MP3 Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.mp3">MP3 Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Runs on toaster
Prominent in less than legitimate music obtaining sources
There are also MPEG-2 and MPEG-2.5 versions of Layer 3, but they are for low bitrates and kinda suck

### AAC
<sup><sup>**A**dvanced **A**udio **C**oding</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 1998
- Open standard with royalties
- Also sometimes called "MP4 Audio"
- Very good software support
- Supports:
- 24-bit sampling depth
- 96kHz sampling rate
- 48* audio channels
- 256 kbit/s/channel bitrate
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/musicaac.m4a">AAC Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voiceaac.m4a">AAC Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
48 full-bandwidth channels at up to 96kHz sampling rate, plus 16 low-frequency channels limited to 120Hz, plus 16 "dialog" channels, plus 16 digital data channels
Used in Blu-ray movies
Fairly common Bluetooth audio format

### Vorbis
<sup><sup>It doesn't stand for anything :(</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 2000 by Xiph.Org Foundation
- Fully Open Standard and Royalty Free
- Extremely good software support
- Supports:
- Floating point samples
- 655.36kHz sampling rate
- 255 audio channels
- Superseded by Opus
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.ogg">Vorbis Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.ogg">Vorbis Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Encodes with quality presets, not constant/variable/target bitrates
Floating point samples make Vorbis very flexible, although they are only 32-bit floats
Fairly popular audiobook format
Commonly paired with the Theora video format
Created because of a licensing dispute with MP3
FSF approved
Standard is public domain
Apple are weird and don't support it

### Opus
<sup><sup>It doesn't stand for anything either >:(</sup></sup>
<div style="font-size: 0.55em;">
- Developed in 2012 by Xiph.Org
- Fully Open Standard and Royalty Free
- Very good software support
- Low latency, music or speech optimized audio codec
- Supports:
- 16-bit sampling depth only
- 48kHz sampling rate
- 255 audio channels
- 256 kbit/s/channel bitrate
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.opus">Opus Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.opus">Opus Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Commonly used for low latency voice applications (Discord, Skype (new), Teams, Source Engine 1&2, Zoom, basically anything WebRTC)
Also commonly used for music playback (YouTube, Spotify)
Typical latency is 27ms for default frame size
2-way Bluetooth audio codec

### Weird Audio Codecs
- Codec2
- Lyra
- Speex
- Full Rate
- G.7xx
Full Rate and G.7xx are for telephony
SILK is Skype

### Speex
<div style="font-size: 0.55em;">
- Developed in 2003 by Xiph.Org
- Speech optimized, low latency audio codec
- Fully Open Standard and Royalty Free
- Okay software support
- Supports:
- 16-bit sampling depth
- 32kHz sampling rate
- 2 audio channels
- 44 kbit/s bitrate
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/music.spx">Speex Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voice.spx">Speex Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty

### Codec2
<!-- Samples encoded at 2400 bit/s -->
<div style="font-size: 0.55em;">
- Developed in 2010 by David Grant Rowe
- Speech optimized, ultra low bandwidth audio codec
- Fully Open Standard and Royalty Free
- Very poor software support
- Supports:
- 16-bit sampling depth only
- 8kHz sampling rate only
- 1 audio channel
- 3.2 kbit/s bitrate
<div style="text-align: center; display: grid; grid-template-columns: 1fr 1fr;">
<div> <!-- Left pane -->
<!-- Title -->
##### Music
<audio controls src="media/samples/musicc2.flac">Codec2 Music Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
"The Show Must Be Go" by Kevin MacLeod
Licensed under [CC-BY 3.0](
<div> <!-- Right pane -->
<!-- Title -->
##### Voice
<audio controls src="media/samples/voicec2.flac">Codec2 Voice Sample</audio>
<div style="font-size: 0.33em; line-height: 0.1em;">
LibriVox recording of "The Art of War" by Sun Tzu
Read by Moira Fogarty
Officially supports 700 bit/s to 3200 bit/s, but a white paper exists for 450 bit/s
Developer worked on Speex

![QR code containing a Codec2 encoded recording](media/voiceqrcode.svg) <!-- .element: style="width:50%; image-rendering: pixelated;" -->
cat $FILE | c2dec 700C - - | ffplay -f s16le -ar 8000 -
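
A back-of-the-envelope sketch of how much speech can fit in a QR code at Codec2 bitrates. It assumes the largest standard QR code (version 40, lowest error correction, byte mode), which holds 2953 bytes, and ignores any framing overhead; the QR code on this slide may well be smaller. The function name is illustrative.

```python
QR_MAX_BYTES = 2953  # version 40, lowest error correction level, byte mode

def seconds_that_fit(capacity_bytes, bit_rate):
    """Seconds of audio that fit in a given number of bytes at a constant bit rate."""
    return capacity_bytes * 8 / bit_rate

for rate in (700, 1600, 2400, 3200):
    print(rate, round(seconds_that_fit(QR_MAX_BYTES, rate), 1))
```

At 700 bit/s that works out to roughly 33.7 seconds of speech in a single code.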

### Low Bitrate Comparison
<div style="font-size: 0.3em">
<audio controls src="media/samples/voice.flac">FLAC Voice Sample</audio>
LibriVox recording of "The Art of War" by Sun Tzu
<span style="padding-right: 5em"></span>
Read by Moira Fogarty
<div style="font-size: 0.5em; text-align: center; display: grid; grid-template-columns: 1fr 1fr 1fr;">
##### Codec2
2400 bit/s:
<audio controls src="media/samples/voicec2.flac">C2 2400 bit/s Voice Sample</audio>
1600 bit/s:
<audio controls src="media/samples/voicec2lb.flac">C2 1600 bit/s Voice Sample</audio>
##### Speex
16000 bit/s:
<audio controls src="media/samples/voice.spx">Speex 16kbit/s 8kHz Voice Sample</audio>
8000 bit/s:
<audio controls src="media/samples/voicelb.spx">Speex 8kbit/s 8kHz Voice Sample</audio>
##### Opus
32000 bit/s:
<audio controls src="media/samples/voicemb.opus">Opus 32kbit/s Voice Sample</audio>
16000 bit/s:
<audio controls src="media/samples/voicelb.opus">Opus 16kbit/s Voice Sample</audio>
8000 bit/s:
<audio controls src="media/samples/voiceulb.opus">Opus 8kbit/s Voice Sample</audio>

### Multimedia Containers
- Many audio formats cannot be stored "as-is" on disk
- Common audio containers:
- RIFF/WAV (wav)
- MPEG-4 (mp4/m4a)
- OGG (ogg)
- Uncommon audio containers:
- WebM (webm)
- Matroska (mka)
MPEG-4 supports AAC, FLAC, MP3, ALAC, and Opus
OGG supports Vorbis, Opus, Speex, and FLAC
WebM supports Vorbis and Opus, and is a subset of Matroska apparently
Matroska supports everything made maybe ever
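
The notes above can be written down as a small lookup table. This sketch only encodes the pairings listed in these notes (Matroska is left out because it accepts nearly everything); the dictionary and function names are made up for illustration, and the list is not an exhaustive compatibility reference.

```python
# Container -> codecs it can carry, as listed in the notes above.
CONTAINER_CODECS = {
    "mp4": {"aac", "flac", "mp3", "alac", "opus"},
    "ogg": {"vorbis", "opus", "speex", "flac"},
    "webm": {"vorbis", "opus"},
}

def containers_for(codec):
    """Return the containers from the table that can hold the given codec."""
    return sorted(name for name, codecs in CONTAINER_CODECS.items() if codec in codecs)

print(containers_for("opus"))   # ['mp4', 'ogg', 'webm']
print(containers_for("speex"))  # ['ogg']
```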

### Useful Resources

![Presentation Source QR](media/presentationsourceqr2.svg) <!-- .element: style="width:30%; image-rendering: pixelated;" -->
[Presentation Source](
This presentation is licensed under [CC-BY-NC-SA 4.0](
![CC-BY-NC-SA Icon](media/Cc-by-nc-sa_icon.svg)

## Fun Facts:
This presentation took 23 hours
LibreOffice crashed 6 times, so I stopped using it
Made with mdslides + reveal.js
[comment]: # (!!!) <!--------------------------------------------------------->