Audio Recording Settings for Geeks

Some parts of modern digital audio equipment easily produce MUCH more accuracy than human ears can hear, even at “low quality” settings!

TIP: Just want the basics of using Audacity for voice over? See Schoolofvoiceover.com/talent-resources/sound-design-tricks-for-audacity/

You’ve probably heard of sample rate, e.g. 44,100 times a second (44.1k) for “CD quality” audio. That just means how often we take a digital “picture” (a snapshot of the sound wave) of whatever audio we are converting; the faster we snap, the higher the pitches/frequencies we can capture.

SIMPLE ANALOGY (Already understand sample rate? Skip this)

Picture ripples/waves coming at you on water. You take photos of the waves with your smartphone for ten seconds. If you want to later look through your photos and count how many waves there were, and there were only two waves in ten seconds, you don’t need 50 photos!

But if there were lots of fast, tiny ripples, you’ll need more photos to count how many reached the beach in those ten seconds. Waves closer together are like high frequency/high pitch sounds, and waves farther apart are low frequency. Sample rate just means how often you take a picture. So digital equipment “takes a picture” as often as the sampling rate you set it for, such as 44,100 times a second for “CD quality” audio.
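The wave-counting analogy can be sketched in a few lines of Python (a toy illustration, not real DSP; the function names and numbers are made up for this example). We “photograph” a 1 kHz tone 44,100 times a second, then count the waves from the snapshots alone:

```python
import math

def sample_sine(freq_hz, rate_hz, seconds):
    """Take amplitude 'snapshots' of a sine wave at a fixed sample rate."""
    n = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def estimate_freq(samples, rate_hz):
    """Count positive-going zero crossings, like counting waves in photos."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / (len(samples) / rate_hz)

tone = sample_sine(1000.0, 44100.0, 1.0)   # 1 kHz tone, CD sample rate
print(round(estimate_freq(tone, 44100.0)))  # within a wave or two of 1000
```

The snapshots alone are enough to recover how many waves went by, which is the whole job of the sample rate.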

THE PART SOME PEOPLE DON’T BELIEVE

It is possible to produce PERFECT reproduction from digital samples back to the analog sound wave without taking absurdly frequent “pictures” of the source sound. Watch this video for visual and audible proof: Xiph.org/video/vid2.shtml

BIT DEPTH

You also have to “take a picture” of the volume (amplitude) of the audio, and when you do, *some* error is ALWAYS introduced. In digital terms, you need enough bit depth, e.g. 16 bits in CD-quality audio, to quantize the amplitude: each “picture” gets rounded to the nearest available step, and that rounding process is called “quantization” (breaking the signal into discrete steps before saving it digitally).
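How much error does that rounding cause? A rule of thumb says roughly 6 dB of signal-to-noise ratio per bit. Here is a minimal sketch that quantizes a full-scale test tone and measures it (the function names and the 997 Hz test frequency are my own choices for illustration):

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1] to the nearest of 2**bits steps."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps

def quantization_snr_db(bits, rate=48000, freq=997):
    """SNR left over after quantizing one second of a full-scale sine."""
    tone = [math.sin(2 * math.pi * freq * i / rate) for i in range(rate)]
    noise = sum((quantize(s, bits) - s) ** 2 for s in tone) / len(tone)
    signal = sum(s * s for s in tone) / len(tone)
    return 10 * math.log10(signal / noise)

# Rule of thumb: SNR is about 6.02 * bits + 1.76 dB
print(round(quantization_snr_db(8)))    # about 50 dB
print(round(quantization_snr_db(16)))   # about 98 dB
```

At 16 bits the rounding error sits roughly 98 dB below a full-scale signal, which is why CD audio already sounds clean.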

MODERN TECH FIXES IT AGAIN!

The noise introduced when saving digital “pictures” of the amplitude (volume) is a form of distortion. But there’s a modern trick that makes this problem so small that you CAN’T hear it. It’s called dithering.

Intentionally adding a small amount of noise to the signal BEFORE taking the picture of the amplitude (quantization) turns the ugly quantization distortion into a smooth hiss far too quiet to notice: you don’t hear it at all!

Listen to how well dithering preserves low-level audio here: Audiocheck.net/audiotests_dithering.php (Audiocheck.net has lots of helpful ways to test your audio equipment.) “Shaped dither,” which moves quantization-noise energy into frequencies where it’s harder to hear, makes this trick work even better. Does dither sound too good to be true? Read this thesis: Uwspace.uwaterloo.ca/bitstream/handle/10012/3867/thesis.pdf
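You can watch dither rescue a signal in a toy sketch (my own simplified numbers: 8-bit quantization, a tone only 0.4 quantization steps tall, standard triangular “TPDF” dither). Without dither the tone rounds away to pure silence; with dither, the quantized output still tracks the tone:

```python
import math, random

random.seed(1)  # reproducible noise for the demo

def quantize(x, bits):
    """Round a sample in [-1, 1] to the nearest of 2**bits steps."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps

def tpdf_dither(bits):
    """Triangular dither: two uniform randoms summed, +/- 1 step peak."""
    step = 1.0 / 2 ** (bits - 1)
    return (random.random() - random.random()) * step

bits = 8
step = 1.0 / 2 ** (bits - 1)
# A tone only 0.4 quantization steps tall, far below the smallest step
tone = [0.4 * step * math.sin(2 * math.pi * i / 100) for i in range(20000)]

plain = [quantize(s, bits) for s in tone]                      # no dither
dithered = [quantize(s + tpdf_dither(bits), bits) for s in tone]

print(max(abs(s) for s in plain))   # 0.0: the tone vanished entirely
corr = sum(a * b for a, b in zip(dithered, tone))
print(corr > 0)                     # True: dithered output still tracks it
```

Averaged over time, the dithered output preserves detail that is quieter than the quantizer’s smallest step, at the cost of a faint hiss.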

WHEN MORE QUALITY MEANS WORSE SOUND

When you play back sound pitched higher than our ears can hear (ultrasonics), audio transducers (speakers and headphones) and power amplifiers introduce EXTRA distortion trying to reproduce those inaudible sounds, and the extra work makes them perform worse on the sounds you CAN hear.

In other words, high-sample rate sound encoding can make it so hard for your equipment that it reduces the fidelity (quality) of what you hear. Yup: “too much” encoding makes worse sound on playback!

We’re talking about 192kHz digital music files here: a far higher rate than you need for playback.

Because neither audio transducers nor power amplifiers are free of distortion, and distortion tends to increase rapidly at the lowest and highest frequencies, if the same transducer tries to reproduce ultrasonics along with audible content, any nonlinearity will shift some of the ultrasonic content down into the audible range as “an uncontrolled spray of intermodulation distortion products covering the entire audible spectrum.” Oops! Source: People.xiph.org/~xiphmont/demo/neil-young.html
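The arithmetic behind that “spray” is simple to sketch. A nonlinearity mixes frequencies, producing sums and differences; here are the second-order products of two hypothetical ultrasonic tones (30 kHz and 33 kHz are my example numbers, not from the article above):

```python
# Two ultrasonic tones you cannot hear...
f1, f2 = 30000.0, 33000.0

# ...fed through any nonlinearity produce second-order mix products:
products = sorted({abs(f1 - f2), f1 + f2, 2 * f1, 2 * f2})

# Which of those land in the audible 20 Hz - 20 kHz band?
audible = [f for f in products if 20 <= f <= 20000]
print(audible)   # [3000.0]: the difference tone is squarely audible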

AVOIDING SOUND YOU CAN’T HEAR

If you accidentally sample sound you can’t hear (high-pitched ultrasonics), it may be digitally converted INTO sound that you can hear. This is known as the “aliasing” problem.

To fix this, analog-to-digital converters filter OUT the sound you don’t want converted BEFORE conversion happens. That filtering (often called a low-pass or anti-aliasing filter) is easier to do at higher sample rates.

But in modern equipment you can sample at LOWER rates like 44.1kHz or 48kHz audio with all the fidelity benefits of sampling at high rates (low aliasing) and none of the drawbacks (ultrasonics that cause intermodulation distortion, wasted space from large file sizes). Why? Nearly all of today’s analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) oversample at very high rates. Few people realize this is happening because it’s completely automatic and hidden. Source: People.xiph.org/~xiphmont/demo/neil-young.html

Oversampling at high rates inside the converter (as opposed to storing and playing back high-rate files) helps prevent aliasing. In modern integrated-circuit technology, the digital filter is easier and cheaper to implement than the comparable analog filter a non-oversampled system would require. Lots more info here: Wiki.xiph.org/Videos/A_Digital_Media_Primer_For_Geeks
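A quick back-of-the-envelope shows why oversampling makes the analog filter’s job easy (my own simplified model: content above “rate minus 20 kHz” would fold back into the audible band, so the analog filter must be finished attenuating by then):

```python
import math

audible_edge = 20000.0   # top of human hearing, in Hz

# Transition band available to the analog anti-alias filter, in octaves,
# at the plain rate versus an 8x oversampled rate.
for rate in (44100.0, 8 * 44100.0):
    octaves = math.log2((rate - audible_edge) / audible_edge)
    print(int(rate), round(octaves, 2))   # 44100 -> 0.27, 352800 -> 4.06
```

At 44.1 kHz the analog filter gets barely a quarter octave to go from passing everything to blocking everything, a brutally steep (expensive, phase-mangling) requirement. At 8x oversampling it gets over four octaves, so a cheap gentle filter suffices and the steep filtering moves into easy digital math.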

HIGHER BIT DEPTH *PRACTICAL* ADVANTAGES

Now we’re back to talking about bit depth (the “container” for the loudness of sound) as opposed to sampling rate (the “container” for the pitch of sound).

There ARE some reasons to record at 24 bits instead of 16 bits, though. It’s more forgiving of recording-level mistakes (noise and clipping), and it provides more data for effects processing. When digitized sound is manipulated and changed over and over again by effects, more data is better for the mathematics of processing. But again, you don’t need to deliver the finished file at 24 bits! 16 bits is MORE than enough.

And recording at the setting called “32-bit float” is VERY effective at preventing clipping. Using the float format means (oversimplifying here) that almost no matter how loud your sound is, there is a digital number to represent it. However, if you convert down to a fixed-point format like 16-bit when exporting without first turning down any over-loud peaks, clipping can occur at that point.
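Here is a tiny sketch of that headroom (my own toy example: a hypothetical peak at 2.5, about 8 dB over full scale, and a simple hard-clipping 16-bit export):

```python
import struct

def clip_int16(x):
    """Convert a sample to 16-bit fixed point; overs are hard-clipped."""
    n = int(round(x * 32767))
    return max(-32768, min(32767, n)) / 32767

hot_peak = 2.5   # hypothetical recording peak about 8 dB over full scale

# 32-bit float stores the over-loud peak without damage...
survives = struct.unpack('f', struct.pack('f', hot_peak))[0]
print(survives)                               # 2.5

# ...but a straight 16-bit export slams it into the ceiling...
print(clip_int16(hot_peak))                   # 1.0 (clipped flat)

# ...while turning the level down BEFORE export keeps the shape intact.
print(round(clip_int16(hot_peak * 0.3), 3))   # 0.75
```

That’s the whole workflow in miniature: record in float, fix the levels, then export to 16-bit.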

FOR SKEPTICS

In 554 trials, audiophile listeners correctly identified which audio was the high-resolution version only 49.8% of the time. Pretty much 50/50. In other words, they were guessing. Not one listener throughout the entire test was able to identify which audio was 16/44.1 and which was high-resolution…and the 16-bit signal wasn’t even dithered!

Here’s the original test: Aes.org/e-lib/browse.cfm?elib=14195 and some online discussions of it:
mixonline.com/recording/mixing/audio_emperors_new_sampling/
Hydrogenaudio.org/forums/index.php?showtopic=57406
Bostonaudiosociety.org/explanation.htm

People say, “I did a test and I could hear the difference,” but there are several ways to fool yourself into thinking you can hear a difference. The minimum standard for a good comparison test is ABX: Wikipedia.org/wiki/ABX_test

Here’s an article on how difficult it can be to run a good, fair test from a famous example: Bostonaudiosociety.org/bas_speaker/abx_testing2.htm

ABOUT SPEAKERS

In the real world, speakers sometimes hang on a wall, sometimes sit on a table, with infinite possibilities in how the space around them absorbs and reflects sound. Read about how a famous speaker model became very controversial and was eventually scrapped at the link below, and see how speakers need to be designed for their likely environment:

Soundonsound.com/reviews/yamaha-ns10-story