What do sample rate and bit depth have to do with digital audio?
What sample rate and bit depth should I work at?
Find out how the two terms are related and learn some common myths and facts.
Sample rate and bit depth are terms you will have no doubt heard at some point. They are important concepts in digital audio that very directly affect audio quality and how sound is produced in general. Thankfully, it’s easy to get a handle on what they mean and how they work.
In this article, we’ll get right into the fundamentals behind sample rate and bit depth and look at how they vary across different media types (think CD audio vs Blu-ray). But more importantly, you’ll learn which settings are crucial for ensuring your songs are in the best possible quality.
Sample Rate And Bit Depth Explained
What Are Samples?
Before we learn about sample rates and bit depth, we need to know what ‘samples’ are in this context. First things first: this has nothing to do with sample packs or the art of sampling, although the two are technically related, as we’ll discover shortly.
Samples are the tiny ‘frames’ of digital audio that make up our waveforms. If you’re familiar with how digital images work, you can think of them as being like pixels but for waveforms. You might be able to see them if you zoom in far enough, although it depends on what software you are using.
So this is why samplers are called samplers! It’s just a convenient coincidence that it also describes the process of borrowing musical elements from other songs.
The terms sample rate and bit depth describe how samples are used to create sound with digital audio. We’ll look at what these terms mean, and do some myth-busting surrounding these concepts.
The sample rate is the number of samples per second, which is why we measure it in Hz (often kHz). You can think of it as the “frame rate” of sound, although it is much faster than the frame rate of film and video games.
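To make the idea concrete, here is a minimal Python sketch that "samples" a sine wave at a chosen sample rate. The function name `sample_sine` and its parameters are made up for this illustration, and a real DAW or audio interface works very differently under the hood.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, num_samples):
    """Return num_samples amplitude values (-1 to 1) of a sine wave."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# At 48 kHz, one cycle of a 1 kHz tone is spread over 48 samples.
one_cycle = sample_sine(1000, 48000, 48)
print(len(one_cycle))
print(round(one_cycle[12], 6))  # a quarter of the way in: the peak of the wave
```

Each value in the list is one sample, so doubling the sample rate would simply give twice as many values for the same stretch of time.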
When you work on music in your DAW, you have a set sample rate for all the sound you are hearing. This is usually found in an options or preferences menu. In Ableton Live, for example, you will find this setting under Options->Preferences, or by pressing CTRL + , (comma) on Windows or CMD + , on Mac.
Any audio clips you are using have their own individual sample rate. If the sample rate of an audio clip differs from your DAW’s current sample rate, your DAW will resample the clip so that it plays at the correct pitch and speed (which are directly linked in sound and music). This is also why you can’t simply change the sample rate stored in a file: without resampling to compensate, the audio plays back at a different speed and pitch.
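As a rough illustration of that link between speed and pitch, the sketch below works out what happens when a file is played at the wrong sample rate with no resampling. The helper name `playback_shift` is hypothetical, and the semitone figure assumes standard equal temperament (12 semitones per doubling of frequency).

```python
import math

def playback_shift(file_rate_hz, playback_rate_hz):
    """Speed factor and pitch shift (in semitones) when a file is played
    at a different sample rate without being resampled."""
    speed = playback_rate_hz / file_rate_hz
    semitones = 12 * math.log2(speed)
    return speed, semitones

# A 44.1 kHz clip played as if it were 48 kHz
speed, semitones = playback_shift(44100, 48000)
print(f"{speed:.4f}x speed, {semitones:+.2f} semitones")
```

So a 44.1 kHz clip misread as 48 kHz plays nearly 9% too fast and about a semitone and a half sharp, which is exactly the error that resampling prevents.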
The sample rate directly affects what frequencies can be present in a sound wave. If it is too low, then there is not enough information to represent frequencies beyond a certain point. This point is called the Nyquist frequency and is always exactly half the sampling rate because it takes at least two samples to make an oscillation.
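The relationship is simple enough to express in a couple of lines. This is only a sketch with made-up function names, and it treats the Nyquist frequency itself as out of reach, which is the safe assumption in practice.

```python
def nyquist_frequency(sample_rate_hz):
    """The highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def can_represent(frequency_hz, sample_rate_hz):
    """True if the frequency sits below the Nyquist frequency."""
    return frequency_hz < nyquist_frequency(sample_rate_hz)

# Can each sample rate hold a 20 kHz tone?
for rate in (22050, 44100, 48000, 96000):
    print(rate, nyquist_frequency(rate), can_represent(20000, rate))
```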
44100 Hz (or 44.1 kHz) is considered to be the minimum sample rate required for high fidelity digital audio, and this is the same sample rate used for CDs. This means the Nyquist frequency for CD-quality audio is 22.05 kHz, which is too high to hear, meaning we are using the full frequency range.
This sample rate was settled on as it provided the best compromise between audio quality and album length – ensuring a maximum of 74 minutes of music (later improved to 80 minutes for some releases).
It’s important to remember that the sampling rate is per channel. A stereo or multichannel waveform (like a 5.1 surround WAV file) does technically store more samples per second, because its channels are sampled simultaneously, but this isn’t reflected in the sample rate.
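To see how the channel count multiplies the amount of data while the sample rate stays put, here is a small sketch. The function names are invented for this example, and the size calculation assumes plain uncompressed PCM and ignores file headers.

```python
def total_samples_per_second(sample_rate_hz, channels):
    """Samples stored per second, summed across all channels."""
    return sample_rate_hz * channels

def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed PCM audio in bytes (file headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return total_samples_per_second(sample_rate_hz, channels) * bytes_per_sample * seconds

# A 5.1 surround file at 48 kHz has six channels, but is still "48 kHz".
print(total_samples_per_second(48000, 6))

# 74 minutes of 16 bit stereo audio at 44.1 kHz, as on a CD
cd_bytes = pcm_bytes(44100, 16, 2, 74 * 60)
print(round(cd_bytes / 1_000_000, 1), "MB")
```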
Aliasing is what happens when you try to fit frequencies beyond the Nyquist into a waveform: there’s no way to represent them at that sample rate. The sampling process ends up grabbing bits of audio whose frequencies are too high to be sampled. These inaccuracies are heard as ‘reflections’, as if the frequencies were being folded back in the opposite direction once they pass the Nyquist.
So from this, we can see how frequencies we cannot hear still matter: if they are filtered out before the audio is sampled, we won’t get aliasing. Thankfully, most plugins that introduce harmonics have “anti-aliasing” features, so this isn’t something we need to be too worried about.
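The "reflection" can be predicted with a little arithmetic. The sketch below assumes a single pure tone sampled with no anti-aliasing filter at all, and the function name `aliased_frequency` is just a label for this example.

```python
def aliased_frequency(frequency_hz, sample_rate_hz):
    """Frequency that ends up in the waveform when a pure tone is sampled
    without any anti-aliasing filter."""
    folded = frequency_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        # Past the Nyquist, the tone is mirrored back down.
        folded = sample_rate_hz - folded
    return folded

print(aliased_frequency(10000, 44100))  # below the Nyquist, so unchanged
print(aliased_frequency(30000, 44100))  # folds back to 14100
```

A 30 kHz harmonic that nobody could hear turns into a very audible 14.1 kHz tone that has no musical relationship to the original note.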
Oversampling is the process of increasing the sample rate by a factor of two or more, usually temporarily inside a plugin so it can perform a process in higher resolution – most often distortion.
Oversampling means more headroom for harmonics that are introduced by distortion and overdrive effects – which can go far beyond 22 kHz.
As we learned, if the harmonics go beyond the Nyquist and aren’t properly filtered, you get aliasing. Oversampling helps to avoid aliasing as it increases the Nyquist frequency and gives us more ‘spectral headroom’.
A good distortion plugin will likely already have some form of oversampling built-in, so even if you are working at 44.1 kHz, it will be upscaled to handle the extra harmonics, then safely filtered in the process of downsampling back to your DAW’s sample rate.
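One way to picture this ‘spectral headroom’ is to count how many harmonics of a note fit below the Nyquist before and after oversampling. This is a simplified sketch with a hypothetical function name, and it ignores the filtering a real plugin performs when it returns to the original sample rate.

```python
def harmonics_below_nyquist(fundamental_hz, sample_rate_hz, oversampling=1):
    """Count how many harmonics of a tone fit below the Nyquist frequency
    at the (optionally oversampled) internal sample rate."""
    nyquist = sample_rate_hz * oversampling / 2
    count = 0
    while fundamental_hz * (count + 1) < nyquist:
        count += 1
    return count

# Harmonics of a 5 kHz tone at 44.1 kHz with 1x, 2x and 4x oversampling
for factor in (1, 2, 4):
    print(f"{factor}x:", harmonics_below_nyquist(5000, 44100, factor))
```

At 1x only the first four harmonics are safe and anything above them would alias, while at 4x there is room for seventeen before the plugin filters and downsamples.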
Bit depth directly relates to the resolution of each individual sample. Imagine our waveform on a graph where the amplitude of each sample is in the range of -1 to 1. The more bits we have, the more individual ‘steps’ there are between the min and max amplitude values, meaning there is more detail in each sample.
With low bit depths, there are only a limited number of possible values for our samples. For example, with 2 bit audio, we can have only four different possible values for each sample. Our wave quite literally lacks detail, and can only roughly convey the original sound. For this reason, 2 bit audio is never used, and it sounds absolutely terrible! In fact, anything less than 16 bits is considered low quality.
16 bit audio files, the standard for CDs, can have 65,536 possible values per sample. While this sounds like a lot, when you factor in multiple stages of processing combined with the dynamic range of modern music, we start to realize that even this isn’t enough for mixing.
Sure, it’s perfectly fine for music once it has been rendered. But in the process of calculating that final waveform, we are processing audio so much that we need extremely precise values for each sample, and 16 bits will not hold up here.
For this reason, with digital audio, we use bit depths of 24 bits and above when recording and producing, and 16 bits and above for final renders. 24 bit audio can have over 16 million values per sample, which is a whopping increase over 16 bit audio. 32 bit float predictably takes things even further, providing over 4 billion different values.
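These figures all come from the same rule: every extra bit doubles the number of possible values. The sketch below uses made-up helper names, and `quantize` is a simplified model of integer audio that snaps a sample to its nearest step. Note that for 32 bit float the count is the number of bit patterns, and those values are not evenly spaced the way integer steps are.

```python
def levels(bit_depth):
    """Number of distinct values a sample of this bit depth can take."""
    return 2 ** bit_depth

def quantize(sample, bit_depth):
    """Snap a sample in the range -1 to 1 to the nearest available step."""
    steps = 2 ** (bit_depth - 1)
    value = round(sample * steps)
    value = max(-steps, min(steps - 1, value))  # stay inside the legal range
    return value / steps

for bits in (2, 16, 24, 32):
    print(f"{bits} bit: {levels(bits):,} values")

# The same sample stored at 2 bits and at 16 bits
print(quantize(0.3, 2), quantize(0.3, 16))
```

At 2 bits a sample of 0.3 gets rounded all the way to 0.5, whereas at 16 bits the error is far too small to matter for a finished track.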
The difference between 24 bit and 32 bit float audio is impossible to hear, even in music with a wide dynamic range. The main advantage of 32 bit float audio is waveforms can “clip” without losing any data.
With 24 bit audio, once the maximum value of 0 dB is reached, that’s it! There is no room left to record louder signals. But 32 bit float audio has a much higher ceiling, and if a signal happens to go beyond 0 dB, it can be safely reduced later on with no clipping.
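Here is a rough sketch of that difference. It assumes a simple model where 24 bit samples are whole numbers that get clamped at full scale, and it uses ordinary Python floats as a stand-in for 32 bit float. The helper names are invented for the example.

```python
INT24_MAX = 2 ** 23 - 1   # largest value a 24 bit sample can hold
INT24_MIN = -(2 ** 23)    # smallest value a 24 bit sample can hold

def store_int24(sample):
    """Convert a float sample to a 24 bit integer, clipping anything over."""
    value = round(sample * 2 ** 23)
    return max(INT24_MIN, min(INT24_MAX, value))

def load_int24(value):
    """Convert a stored 24 bit integer back to the -1 to 1 range."""
    return value / 2 ** 23

hot_sample = 1.5  # a peak roughly 3.5 dB over full scale

# Fixed point: the overshoot is lost the moment the sample is stored.
fixed = load_int24(store_int24(hot_sample)) * 0.5
# Floating point: the overshoot survives, so turning it down recovers it.
floating = hot_sample * 0.5

print(fixed, floating)
```

After halving the gain, the float version lands on the correct 0.75, while the 24 bit version is stuck just under 0.5 because the top of the peak was already gone.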
In terms of recording quality, 16 bit audio is certainly adequate, but I wouldn’t say it’s ideal, as quiet sections in particular can sound rather noisy when boosted, so music with a wide dynamic range will suffer.
Bit Depth and Dynamic Range
As you may have gathered, there is a direct relationship between bit depth and dynamic range. A 16 bit audio file can have a maximum dynamic range of 96 dB, which is certainly very acceptable.
But it should be noted that very soft sounds recorded in 16 bit quality will have less definition, as they are not using the full range.
If this is confusing, consider a full-scale waveform in 16 bit quality, where the lowest sample value is -32,768 (-1 on a graph) and the highest is 32,767 (1 on a graph), with a sample value of 0 being -inf dB, or pure silence. Counting zero, that range contains 65,536 values, so in this case we are using the full range of values available to represent this waveform.
But if we halve the amplitude of this wave, we are only using half of this range!
While this logic also applies to 24 and 32 bit waves, with these we have so many possible values that we don’t need to worry about any quality loss when reducing amplitude. Audio recorded at 24 bits can have up to 144 dB of dynamic range, which is bloody huge!
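The 96 dB and 144 dB figures follow from the usual decibel formula applied to the number of available values, which works out to roughly 6 dB per bit. This sketch covers integer bit depths only, and the function name is just for illustration.

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of integer audio at a given bit depth."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")

# Halving the amplitude uses half the values, which costs one bit.
print(f"{dynamic_range_db(16) - dynamic_range_db(15):.2f} dB lost")
```

This is why a 16 bit wave at half amplitude behaves like a 15 bit recording, while a 24 bit wave can be turned down a long way before it drops to 16 bit territory.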
In the early days of (16 bit) digital recording, it was important that the full range of values was used to ensure the best quality digital recording. This meant engineers would play a dangerous game as the ideal recording level was only a notch down from the level at which clipping occurred.
With 24 bit audio, this is no longer the case, and digital audio that is recorded softly can be safely amplified without any noticeable noise. Obviously gain staging is still important and if you don’t pay attention to preamp noise and other issues in your signal path, it won’t matter what bit depth you record in as you’ll still have noise.
So, higher bit depths don’t guarantee a clean signal or a wide dynamic range, they just make them possible. It’s up to you to make the most of these things.
Dithering is a term that you may have seen but are otherwise unsure about. It is a specific process that is applied when we decrease the bit depth (reducing the bit depth is sometimes called ‘bit crushing’). It’s actually easier to learn what dithering is if we first learn why we need it.
So, when we decrease the bit depth, we quite literally lose detail in each sample, and this manifests as mathematical imprecision. Some sample values can be reduced without errors. For example, a value of 1024 can easily be halved to 512 with no loss in quality. But a value of, say, 619 cannot be divided as easily. We end up with 309.5, but we can’t store decimal places in an integer sample. So this value is truncated to 309, and we’ve lost definition.
This loss in definition creates ‘quantization noise’ which is similar to aliasing noise. Dithering counters this noise by – drumroll – applying even more noise! Yes, a small amount of random noise is added to each sample, so that the harsh noise created by quantization errors is replaced with a more natural-sounding noise. This may seem counter-intuitive but it’s actually quite ingenious.
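The 619 example can be run as a small experiment. The sketch below drops one bit from the same value many times, once by plain truncation and once with dither. The triangular (TPDF) noise used here is one common choice that I am assuming for the example, not necessarily what any particular DAW applies, and the function names are made up.

```python
import math
import random

def reduce_one_bit_truncate(value):
    """Drop the lowest bit by halving and truncating."""
    return math.floor(value / 2)

def reduce_one_bit_dithered(value, rng):
    """Halve the value, add triangular (TPDF) dither, then round."""
    dither = rng.random() - rng.random()  # triangular noise between -1 and 1 step
    return math.floor(value / 2 + dither + 0.5)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
truncated = [reduce_one_bit_truncate(619) for _ in range(n)]
dithered = [reduce_one_bit_dithered(619, rng) for _ in range(n)]

print(sum(truncated) / n)           # always 309: the .5 is gone for good
print(round(sum(dithered) / n, 2))  # averages out very close to 309.5
```

Each dithered sample is individually noisier, but on average the lost half-step is preserved, which is why the result sounds like gentle hiss rather than harsh distortion.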
One important thing to remember with dithering is to only apply it to the very final bounce of a song – i.e. once it has been mixed and mastered. If you apply dithering to a track before it is mastered, you’re adding in unnecessary noise which will affect the final master. You shouldn’t even find yourself in this situation though, as dithering pretty much only applies when you are going from 24/32 bit to 16 bit, and you shouldn’t be sending 16 bit tracks to mastering engineers anyway.
What Bit Depth Should I Use?
There are three different scenarios where this question applies: recording, mixing, and rendering.
As I’ve already stated, you should record in 24 bit quality, as 16 bits is too low.
Experimental musicians and sound designers who rely on resampling may appreciate 32 bit quality as you can go beyond 0 dB without clipping, which is useful for those long jams with unpredictable level changes. However, keep in mind that if the input clips on the preamps of your interface, it still clips regardless of the bit depth! This may also be true of certain plugins or pedals in your effects chain.
You should mix in 32 bit quality (or above), but there’s a good chance your DAW already does this and it’s not usually a setting you can change anyway. So you don’t really need to worry about this one.
With rendering, I would suggest 24 bit quality, or 32 bit float if you prefer. Then, you can make a 16 bit copy of this audio file if you need to. In this day and age, the size difference between 24 and 16 bit audio files isn’t worth worrying about, so you may as well pick the higher quality option.
Rendering in 24 bit or above is especially important if the song is not yet mastered. Don’t make the audio 16 bit until the final master is complete. This is also true with dithering – do not apply dithering until the very final stage.
One last thing – do not confuse bit depth with the bit rate of MP3 files. They are separate, and while an MP3 file has a sample rate that affects the bit rate, MP3s do not have bit depth in the same way as WAV files.
What Sample Rate Should I Use? (44.1 kHz vs 48 kHz)
I would argue that all producers should work at 48 kHz. But it’s also ok if you don’t. Music comes first, and even at 44.1 kHz, you’ll still have excellent sound quality. So, as is often the case when it comes to creative tools we use, the choice you want to make is the right choice.
So, this is a controversial topic that prompts a lot of furious debate, and I personally believe the benefits of working at 48 kHz outweigh the downsides at this point. The benefits are small, and not exactly essential, but I am strangely comforted by them.
Benefits of working at 48 kHz:
Better compatibility with YouTube and other video services.
Slightly higher quality – which adds up with complex projects with lots of plugins.
Easy to calculate the number of samples per millisecond (i.e. no ugly decimal places).
Easy to calculate samples per video frame at 24 frames per second (exactly 2000 samples per frame).
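Easy to calculate samples per cycle for frequencies that divide evenly into 48,000 (a 500 Hz tone is exactly 96 samples long, although an A note at 440 Hz is not a whole number of samples, at roughly 109.09).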
Downsampling to 44.1 kHz is better quality than upsampling 44.1 to 48 kHz.
It’s slowly becoming “the new standard”.
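The arithmetic behind the "easy to calculate" points in the list above can be checked with exact fractions. This is only a sketch, the helper name `samples_per` is hypothetical, and the result is a whole number only when the event rate divides evenly into the sample rate.

```python
from fractions import Fraction

def samples_per(sample_rate_hz, events_per_second):
    """Exact number of samples per event (a millisecond, a video frame,
    or one cycle of a tone), given how many events occur per second."""
    return Fraction(sample_rate_hz, events_per_second)

for rate in (44100, 48000):
    per_ms = samples_per(rate, 1000)     # 1000 milliseconds per second
    per_frame = samples_per(rate, 24)    # 24 video frames per second
    per_cycle = samples_per(rate, 500)   # one cycle of a 500 Hz tone
    print(rate, float(per_ms), float(per_frame), float(per_cycle))
```

At 48 kHz all three come out as whole numbers (48, 2000 and 96), while at 44.1 kHz each one has a fractional part.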
Disadvantages of working at 48 kHz:
Reduced compatibility with various streaming services and older audio hardware like vintage samplers.
This means you will likely end up with copies of your samples at different sample rates, which can eat up storage space.
More CPU usage spent processing audio as you mix, which may overload less powerful computers resulting in clicks.
One of my main arguments is that 44.1kHz is an arbitrary restriction from the CD era, yet we’ve come so far since then with digital audio technology. As stated, 48000 Hz is most often used in video production, and digital video has become a big money-spinner, employing many sound designers, composers, and audio engineers.
Let’s just say you end up working in this industry, and the DAW you use is the same DAW that pays your bills. Are you going to constantly switch sample rates whenever you go to make music, just to satisfy the nerds at Sony who originally decided on 44.1kHz way back in the ‘70s? No disrespect intended, but let’s face it, they are nerds.
It’s easy enough to change the final rendered waveform to other sample rates, regardless of what you are working at. You can do this with a dedicated wave editor like Audition or WaveLab, or you can check the render settings in your DAW for a different output file format.
It’s also a matter of fact that 48kHz is higher-resolution audio than 44.1kHz, and because they are so close to each other, it’s not unreasonable to reach for the higher resolution option based on pure convenience if nothing else.
Having said all this, I think it’s totally fair to say that the difference is minute or insignificant. You particularly may not notice the difference if you have a simple acoustic song with one vocal track.
But with music that is more texturally dense, or just has more plugins and processing, it’s at least scientifically accurate to say that you would have a higher resolution or higher ‘quality’ waveform. Now, our idea of quality differs from this mathematical definition, though they are at least correlated up to a certain point.
On that note, there are a few myths I see repeated in the course of this debate that I want to deal with right here and now because I am sick to death of them. These involve common misconceptions about how working at different sample rates and rendering in different formats can affect the quality of the audio.
Myth 1: “We Can’t Hear Higher Frequencies So Higher Sample Rates Are Pointless”
The first myth relates to using higher sampling rates and is often expressed as:
“We can’t hear frequencies beyond 22kHz, which is basically the Nyquist frequency for 44.1kHz, so why go higher?”.
That seems to make sense, right? We can’t hear beyond this range, so why should we go higher than 44.1 kHz? What is the point of higher sample rates at all?
It’s easy for someone to take a 48 kHz wave, downsample to 44.1 kHz, and claim that there is no difference, so we should just work at 44.1 kHz. Indeed if you heard both it is very unlikely you would hear any difference, so maybe they are right?
However, this demonstrates a lack of understanding about how the sample rate affects nearly every single process inside our DAW’s audio engine.
In this case, we have incorrectly assumed that the sample rate is only relevant to the final waveform. But to make this waveform, your DAW processes each track of audio, and all the plugins, at a sample rate. This is usually the same sample rate as the final wave, and I’d suggest you keep them the same, too.
Inside each plugin, samples of audio are captured, processed, then passed to the next plugin (or the master channel if it’s the last plugin in the chain). What’s more, each plugin may process the audio multiple times to get the final sound. So the difference in the sample rate is not only factored in for the output, it’s a factor in every step involved in computing the final waveform.
So we can now see how the true way to compare sample rates would be to render from your DAW at 48 kHz, then change your DAW’s sample rate to 44.1 kHz and render a copy.
Myth 2: “96 kHz And Above Is Pointless”
This is more of a sub-myth to the first one, but it’s worth examining in its own right.
Just like with 44.1 kHz vs 48 kHz, it’s unlikely that you’ll hear any difference between 96 kHz and 48 kHz waveforms on any speakers or headphones. But all the same processing considerations still apply at 96 kHz, meaning it’s super high-res audio! But is it really worth the disk space?
Yes and no. I would say it’s certainly not worth it for large recording projects unless you are a really committed and brave audiophile. But if you have a studio with heaps of hard drive space and unlimited cloud storage, plus amazingly fast computers, I say go for it. If you’re feeling game, you can even try 192 kHz!
But for most producers, I would not recommend working at 96 kHz or higher sampling rates. So, is this a myth then?
Well, for some people, it’s not “pointless”. For example, many sound designers who rely heavily on resampling and intense processing work at 96 kHz, or at least make some sounds at higher sampling rates. If you’re slowing down a sound file to a crawl, that extra resolution can really help.
So really, using higher sample rates isn’t about wanting to hear above 22 kHz; it’s more about having a higher resolution in which to represent all frequencies. But also, if we slow down 96 kHz audio, we can begin to hear frequencies captured in the recording that are not otherwise audible.
These ‘extra’ harmonics can be unwanted, depending on how the audio is captured. But I’ve found with a nice clean signal path, analog synths recorded at 96 kHz can absolutely produce harmonics beyond 22 kHz, and in general sounds produced at 96 kHz and above will still have more ‘treble’ when pitched down. Try it out!
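A quick calculation shows why the extra ‘treble’ appears. The sketch assumes the simple kind of slowdown where pitch and speed change together (as with a sampler or tape), and the function names and the 20 kHz hearing limit are assumptions made for this example.

```python
def frequency_after_slowdown(frequency_hz, speed):
    """Where a frequency lands after playback at the given speed.
    A speed of 0.5 means half speed, i.e. one octave down."""
    return frequency_hz * speed

def highest_content_after_slowdown(sample_rate_hz, speed):
    """Highest frequency a recording can contain once it is slowed down."""
    return frequency_after_slowdown(sample_rate_hz / 2, speed)

HEARING_LIMIT_HZ = 20000  # commonly quoted upper limit, assumed here

# A 30 kHz harmonic captured at 96 kHz, played back at half speed
shifted = frequency_after_slowdown(30000, 0.5)
print(shifted, shifted <= HEARING_LIMIT_HZ)

# Top of the spectrum at half speed for recordings made at each rate
for rate in (44100, 96000):
    print(rate, highest_content_after_slowdown(rate, 0.5))
```

At half speed, a 44.1 kHz recording has nothing left above about 11 kHz, while a 96 kHz recording can still fill the spectrum up to 24 kHz.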
Myth 3: “Streaming Services Use 44.1 kHz So Why Bother?”
Defenders of 44.1 kHz may argue that music streaming services will expect 44.1kHz audio files, so we should work at this rate. I would counter that this is a regressive hangover from the CD era – they expect this quality because all the recorded music is already in this format. So this is a myth that feeds into itself, in a way.
Downsampling a copy of your final waveform from 48kHz to 44.1kHz is a simple solution to this problem anyhow. When done correctly, there will be no noticeable loss in quality.
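For a sense of what the converter has to do, the two rates are related by a fixed ratio. This sketch only works out that ratio and the resulting length in samples; it does not perform any actual resampling or filtering, and the function name is made up.

```python
from fractions import Fraction

def resample_ratio(source_rate_hz, target_rate_hz):
    """Exact ratio of output samples to input samples."""
    return Fraction(target_rate_hz, source_rate_hz)

ratio = resample_ratio(48000, 44100)
print(ratio)  # 147/160

# A three-minute song at 48 kHz, converted to 44.1 kHz
source_samples = 48000 * 180
target_samples = source_samples * ratio
print(source_samples, int(target_samples))
```

Every 160 samples at 48 kHz become 147 samples at 44.1 kHz, so the song keeps the same duration and pitch while the file gets slightly smaller.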
So 44.1 kHz will be around for as long as it’s supported. If Spotify and other services start streaming in 48 kHz quality, expect this to “signal” the end of 44.1 kHz.
So with digital audio, we have “samples” that are basically pixels of digital audio that describe a waveform. The sampling rate is the frequency at which samples are played back. We need a high enough sampling rate to ensure all audible frequencies can be represented.
The bit depth is the quality of each sample in terms of its mathematical accuracy. Lower bit depths mean there is less digital information available to describe each sample, and this results in noise and a lack of dynamics. 24 bit audio is excellent quality, though 16 bits is perfectly adequate for mastered tracks as long as they are not too soft.
Setting the sample rate and bit depth is something you should be able to do once and then forget about. There’s no practical benefit to using different sample rates for different projects unless you really need to for video projects and the like. For the most part, it is easy to set and forget, but there are still plenty of times when you will need to deal with conversions, especially if you are uploading a lot of music to SoundCloud and Spotify.
Thankfully these days converting between different sample rates and bit depths is a breeze and the algorithms available result in high-quality sound. But it’s still very useful as a producer to have a solid idea of how they relate to audio and music in general.