OK, I’ve gotten some audio clips to illustrate what I’m trying to do with my audio. Last night I was in my office at home, very late at night. Everyone was asleep; the environment was just about as quiet as it gets. So I recorded the first chapter of Hodgson’s “The House on the Borderland.”
By cranking the audio buffer size up to 4096, a ridiculously high number, I seem to be able to get a recording that is free of audio “glitches.” Actually, I am not completely positive that this is the case: the glitches were always sporadic, and I have not listened to the whole session yet, so there may still be some. Using a 4096-sample buffer means I can’t monitor on my headphones, because a delay that long makes it too confusing to speak. So I have to take off the headphones, turn off monitoring, and hope for the best.
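To put a number on that delay, here is a minimal sketch of the arithmetic. It assumes the 44.1 kHz sample rate of the recording described below, and it counts only the time one buffer represents; the real round-trip monitoring delay is typically longer, since input and output are each buffered and the driver adds its own overhead. The function name is made up for illustration.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: float = 44100.0) -> float:
    """Time in milliseconds that one buffer of audio represents."""
    return buffer_samples / sample_rate_hz * 1000.0


# Compare a few buffer sizes (the smaller ones are just for contrast).
for size in (256, 1024, 4096):
    print(f"{size:5d} samples -> {buffer_latency_ms(size):6.1f} ms")
```

At 4096 samples a single buffer is already a little over 90 ms, which is well into the range where hearing your own voice late becomes distracting.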
I’ve had an ongoing problem with audio levels from the Snowball mic. In order to get the highest level possible, I positioned the mic (using a boom mic stand) right in my face, perhaps two inches or less from my mouth, so that I can look past it to read. That’s quite close. I don’t have a pop filter, so I’m getting some noticeable plosive “thuds” as the air blast from a “b” or “p” sound hits the diaphragm. I probably should have one, especially if I’m going to mic that closely, but there isn’t much space to mount one in this configuration, I’m on a spending hiatus, and it isn’t the worst of the issues with my audio, so I’ll do without it for now.
For this experiment, I suspected that setting the Snowball’s gain too high in Audio MIDI Setup might be adding to the background noise problem without giving me much more clean signal, so I left it at what seems to be the default, 0.79 (out of 1.0). Later I can do an A/B comparison to see whether this is true, but for now all these clips are recorded at the default setting.
I recorded for about thirty minutes, using no plug-ins, and did no editing. All my on-the-fly retakes are there. For example, for some reason (it was late, I was tired) I was unable to say the word “promontory” correctly. It isn’t a word I use every day; I know the pronunciation is supposed to be “PROM-uhn-tor-ee” but I kept saying “pro-MONT-or-ee.” Anyway, there’s some editing to do, but my raw file is a 16-bit, 44.1 kHz AIFF file of about 150 megabytes in size.
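As a sanity check on that file size, uncompressed PCM audio takes sample rate × bytes per sample × channels × seconds. The sketch below assumes a mono recording (the channel count isn't stated above) and ignores the small AIFF header; the function name is hypothetical.

```python
def pcm_size_bytes(seconds: float, sample_rate_hz: int = 44100,
                   bits_per_sample: int = 16, channels: int = 1) -> int:
    """Approximate size of uncompressed PCM audio, ignoring file headers."""
    return int(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


thirty_minutes = 30 * 60  # seconds
for channels in (1, 2):
    size = pcm_size_bytes(thirty_minutes, channels=channels)
    print(f"{channels} channel(s): {size / 1_000_000:.1f} MB ({size / 2**20:.1f} MiB)")
```

Thirty minutes of 16-bit, 44.1 kHz mono works out to roughly 159 MB (about 151 MiB), which is in line with the "about 150 megabytes" figure; a stereo file would be twice that.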
So, after all that, how quiet is it? According to DSP Quattro, the RMS level is -35.34dB, and the hottest peak is at -12dB. This is definitely too low. By comparison, AudioLeak says the A-weighted RMS level of a BBC interview with Douglas Adams measures around -21.2dB, and Escape Pod, a science fiction podcast I like, measures around -18.2dB. I’ve seen some general recommendations that material ought to have an RMS level of about -17.5dB, which seems reasonable; I think anywhere from -14dB to -21dB would be fine.
-35dB to -14dB is a big difference, especially when you consider that dB is a logarithmic scale. So, obviously I’ve got to do something about that; if I made that audio file available, it would sound extremely quiet next to other podcasts. It would be better to have a more dynamic signal to begin with, but I’ve got to work with what I have.
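To make the logarithmic point concrete: for amplitude, a change of N dB corresponds to a factor of 10^(N/20), so the 21dB gap between -35dB and -14dB is a little over an elevenfold difference in signal amplitude. A minimal sketch of the conversion, using the RMS figures quoted above (the helper names are made up for illustration):

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio back to dB."""
    return 20 * math.log10(ratio)


raw_rms_db = -35.34     # measured RMS of the raw recording
target_rms_db = -17.5   # the commonly recommended RMS level mentioned above
gain_db = target_rms_db - raw_rms_db
print(f"Gain needed: {gain_db:.2f} dB, "
      f"an amplitude factor of about {db_to_amplitude_ratio(gain_db):.1f}")
```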
Here is an AIFF file of a small section of the raw recording:
sample-1-close-mic-vol-079-dry.aiff
If I do a straightforward Normalize with DSP Quattro, the results are very unflattering. Let’s say I normalize such that the one detected peak is at -0.3dB. This gives me an RMS level of -23.18dB. That’s still lower than I want. If I ask DSP Quattro to just do its default normalization, it puts that peak at 0dB and the result is a file with an RMS level of -22.79dB.
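Peak normalization is just one fixed gain applied to every sample, so the RMS level rises by exactly the same number of dB as the peak, and a single loud peak limits how far everything else can come up. The sketch below illustrates this on a synthetic signal (a quiet tone with one spike near -12dB), not on the actual recording, and it is not necessarily how DSP Quattro implements its Normalize command. It assumes samples are floats in the range -1.0 to 1.0 and that the input is not pure silence.

```python
import math


def peak_dbfs(samples):
    """Level of the largest sample, in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    """RMS level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def normalize_peak(samples, target_peak_dbfs=-0.3):
    """Scale all samples by one gain so the peak lands on the target level."""
    gain_db = target_peak_dbfs - peak_dbfs(samples)
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples], gain_db


# Synthetic stand-in for a quiet recording: a low-level tone plus one spike.
quiet = [0.02 * math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
quiet[1000] = 0.25  # a single peak at about -12 dBFS

louder, gain_db = normalize_peak(quiet)
print(f"gain applied: {gain_db:.2f} dB")
print(f"RMS before:   {rms_dbfs(quiet):.2f} dBFS")
print(f"RMS after:    {rms_dbfs(louder):.2f} dBFS")
```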
Understand that, because dB is a logarithmic scale, even after normalization the signal’s RMS level sits at less than 10% of full-scale amplitude, so it is using only a small fraction of the available range. Compressing the signal, even lightly, and then raising it would reduce that peak and allow me to boost the RMS a little higher. But even taking it to -22.79dB gives me a level of background noise that is quite audible. That is an inevitable part of raising the gain, but not what I had hoped for.
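The compress-then-raise idea can be sketched with a deliberately simplified compressor: a static one with no attack or release smoothing, which is not how Blockfish or any real plug-in behaves. The threshold and ratio values are arbitrary assumptions, and the test signal is synthetic (a quiet tone with one spike), not the actual recording.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))


def normalize_to_full_scale(samples):
    """Scale so the largest sample sits at exactly 0 dBFS."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


def compress(samples, threshold_dbfs=-20.0, ratio=4.0):
    """Static compressor: the part of each sample above the threshold
    is reduced by the ratio. No attack/release, purely for illustration."""
    out = []
    for s in samples:
        level = abs(s)
        if level > 0.0:
            level_db = 20 * math.log10(level)
            if level_db > threshold_dbfs:
                new_db = threshold_dbfs + (level_db - threshold_dbfs) / ratio
                s = math.copysign(10 ** (new_db / 20), s)
        out.append(s)
    return out


# Quiet tone with a single spike at about -12 dBFS.
signal = [0.02 * math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
signal[1000] = 0.25

plain = normalize_to_full_scale(signal)
squashed = normalize_to_full_scale(compress(signal))
print(f"normalize only:          RMS {rms_dbfs(plain):.2f} dBFS")
print(f"compress then normalize: RMS {rms_dbfs(squashed):.2f} dBFS")
```

Because the spike is pulled down before normalizing, the rest of the signal can be raised further before the peak hits full scale. The catch is the same as with plain normalization: the background noise comes up by the same amount.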
Here’s the same short clip after normalization to -22.79dB:
sample-2-close-mic-vol-079-normalized.aiff
What is the cause of this background noise? I can’t quite tell. The room is as close to quiet as I can get it without turning it into a full-blown vocal booth. This is noise that has an RMS of about -79dB, and sounds like tape hiss. To me, it doesn’t sound like the remaining ambient noise that I hear in the room, which is the Mac Mini, which is not very loud at all, and some very faint noise from an air conditioner some ways away, through the window, perhaps across the parking lot, and maybe a bit of distant traffic sound.
I’ve made a recording of the background noise; this was recorded while I was out of the room. Here it is:
sample-3-room-ambience-vol-079.aiff
Anyway, a gate or expander is supposed to help with this kind of thing. Here is the raw file with the Floorfish plug-in applied, using a preset called “exp: vox backgnd bleed”:
sample-4-close-mic-vol-079-floorfish.aiff
As you can hear, it is gating; it seems to be doing basically the right thing, but the settings are clearly wrong for the audio levels represented in the file. It sounds to me like the settings would be right for a much louder source. So, let’s try it again after the normalization:
sample-5-close-mic-vol-079-normalized-floorfish.aiff
That’s with a little tweaking and creation of my own preset: lowering the expansion ratio, changing the sense frequency, and tweaking the sensitivity. It sounds reasonable, and preserves my breaths, but the background noise is still higher than I’d like, even under the voice.
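For readers unfamiliar with what an expansion ratio and threshold actually do, here is a minimal sketch of a downward expander. It is not Floorfish's algorithm: it uses a crude peak envelope with a fixed release, has no attack smoothing and nothing like a sense-frequency filter, and every parameter value is an assumption chosen for illustration.

```python
import math


def expand(samples, threshold_dbfs=-40.0, ratio=2.0, release=0.999):
    """Downward expander: when the signal envelope falls below the threshold,
    push the level down further, by (ratio - 1) dB per dB below threshold."""
    out = []
    envelope = 0.0
    for s in samples:
        # Crude envelope follower: jump up instantly, decay slowly.
        envelope = max(abs(s), envelope * release)
        env_db = 20 * math.log10(max(envelope, 1e-9))
        if env_db < threshold_dbfs:
            gain_db = (env_db - threshold_dbfs) * (ratio - 1.0)
        else:
            gain_db = 0.0
        out.append(s * 10 ** (gain_db / 20))
    return out


# Synthetic test: low-level "hiss" at -60 dBFS followed by "voice" at -20 dBFS.
hiss = [0.001 * (1 if n % 2 == 0 else -1) for n in range(2000)]
voice = [0.1 * (1 if n % 2 == 0 else -1) for n in range(2000)]
processed = expand(hiss + voice)

print(f"hiss sample before/after:  {abs(hiss[100]):.5f} / {abs(processed[100]):.5f}")
print(f"voice sample before/after: {abs(voice[500]):.5f} / {abs(processed[2500]):.5f}")
```

A lower ratio gives a gentler reduction in the gaps, which is one way to keep breaths audible. Note that an expander like this only turns the signal down when the envelope is below the threshold, so it does nothing about noise sitting under the voice itself.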
Can I do better? My experiments so far with the Blockfish compressor and with various attempts to filter the unwanted background noise have produced results that worked, but which did not sound very good.
It is too loud in here for me to record tonight (a tremendous series of thunderstorms is still coming through the area), but I will try again with a new dry recording at an increased gain, and also continue to see what I can do with this signal. Ideally, once I have a set of processing steps that produces good results, I can just apply them to pretty much any clip I record in the identical setup.