Basics of vocoder

The vocoder is a kind of simulation of our vocal organ, which consists primarily of the vocal cords and the mouth. The nasal cavity is also involved, and the lungs supply the air. The vocal cords vibrate (oscillate) when air streams past them and create a sound that varies mainly in frequency (aha, the oscillator). The mouth and nasal cavity form a filter whose characteristic depends on many factors, such as mouth aperture, lip shape and tongue position while speaking (aha again, the VCF). This filter is, however, quite a bit more complex than the ones we know from synthesizers. There are many complex filter models that try to simulate the one we have on board, and they still can’t match it. OK, they sound synthetic, but that is just what we want. Getting back to the vocoder, let’s call things by their proper names: the oscillating part is the CARRIER section and the complex filtering part is the FORMANT section.

In the vocoder, the formant section is implemented with a set of band-pass filters that extract the spectral information of the signal applied to it (usually our speech). The resulting level of each filter band in the formant section is then used as the gain for the corresponding band of a second set of band-pass filters, which are applied to the carrier signal.

The “quality” of the vocoder depends primarily on the characteristics of these band-pass filters. I put “quality” in quotes intentionally, because it is hard to define what we mean by this term. If we want a faithful reproduction of our voice, then we need good “quality”, but this is not what I expect from a vocoder. So what should the quality be?
For me, the quality (without quotes) of a vocoder means good intelligibility combined with a sound that is very different from my voice. Another important aspect of a vocoder’s quality is its ability to detect and reproduce unvoiced consonants like “s”, “t”, “h”, etc. If this feature is poor or missing, there are methods to improve it a little.

Returning to the filters, the quality depends on the number of band-pass filters and their characteristics. How many filters there should be and what characteristics (dB/octave) they should have will not be discussed here. A vocoder with 16 bands and 24 or 36 dB/octave would be enough for me. So far, I don’t know the specs of the R3’s vocoder (maybe someone can tell me), so I will assume nothing and try to get the best effect quality (a new quality term?) out of it. I remember testing a vocoder with 256 bands, and it sounded nearly identical to my voice, which is not what I am looking for.

Effect quality?

Yes. Once the discussion about the number and characteristics of the filter bank is settled, and we accept that we have to live with what we get, let’s see how to get the best out of it. Remember: these hints are useful for any kind of vocoder, which is why I am not getting specific to the R3 for now. To achieve the best results, it is important to select appropriate sources for the formant and the carrier.

The carrier:

The carrier signals should be sharp and crisp, with a wide frequency spectrum, so that there is something to feed to each filter. Selecting warm or soft sounds may sound nice, but it will decrease intelligibility, and some filters will have nothing to do. Try using dry sounds. If you want, you can apply some compression or EQ. Avoid effects like chorus, flanger or delay in this section; save them for the post-vocoder signal. Played dry, these sounds may sound dreadful, but trust me, they are the best ones.

The best waveforms are pulse and sawtooth. You can add a little noise to them to increase intelligibility. For the VCF, set the cutoff to maximum and the resonance to a low or zero level. The carrier can be applied (played) as chords or as single notes. To improve single notes (solo and robot voices), you can play the same note one or more octaves higher or lower simultaneously, or use a sub-oscillator.

The formant:

Since the formant is the most active and important part of the vocoder, it is essential to apply a good formant signal to obtain satisfactory results. Here are some hints:
Intelligibility:

Part of the intelligibility of the text depends on the vocoder’s ability to reproduce unvoiced consonants like “s”, “t”, “h”, etc. Some vocoders offer special detection of these unvoiced sounds. If not, they should at least offer a high-pass filter (HPF) that lets part of the formant signal bypass the vocoder. Because these unvoiced sounds are rich in high frequencies, we use this HPF to add them from the original signal to the vocoded one. Setting the amount of this HPF is a balance between hearing the unvoiced consonants and not hearing the original formant. For me, a good balance is when I can understand the text without being able to identify the speaker (in this case, my voice).

Other settings:

Other common vocoder settings are the envelope followers, the band shifter (or band patch matrix) and band panorama.
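To make the formant/carrier filter-bank idea and these settings more concrete, here is a minimal sketch of a channel vocoder in Python (standard library only). It is not the R3’s implementation, whose specs I don’t know; every name and value in it is a hypothetical choice for illustration: the functions `vocode`, `envelope`, `band_centres` and `saw_carrier`, 16 log-spaced bands, RBJ “Audio EQ Cookbook” biquads, and the attack/release times of the envelope followers. Each 2nd-order band-pass stage gives roughly 6 dB/octave per side, so `stages` cascades them to steepen the slopes (about 24 dB/octave with `stages=4`). `shift` acts as a simple band shifter, and `hpf_mix` mimics the HPF bypass for unvoiced consonants.

```python
import math
import random

# Hypothetical channel-vocoder sketch (standard library only).
# Names, defaults and filter design are illustrative assumptions, not the R3's.

def bandpass(fc, q, fs):
    """RBJ cookbook band-pass biquad (0 dB peak gain), coefficients divided by a0."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0), (-2 * math.cos(w0) / a0, (1 - alpha) / a0)

def highpass(fc, fs, q=0.7071):
    """RBJ cookbook high-pass biquad, used for the unvoiced-consonant bypass."""
    w0 = 2 * math.pi * fc / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0 = (1 + cw) / 2 / a0
    return (b0, -2 * b0, b0), (-2 * cw / a0, (1 - alpha) / a0)

def biquad(x, coeffs, stages=1):
    """Filter x through `stages` identical biquads in series (direct form I)."""
    (b0, b1, b2), (a1, a2) = coeffs
    for _ in range(stages):
        x1 = x2 = y1 = y2 = 0.0
        y = []
        for s in x:
            out = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, s, y1, out
            y.append(out)
        x = y
    return x

def envelope(x, fs, attack_ms=5.0, release_ms=50.0):
    """Envelope follower: full-wave rectify, then a one-pole smoother that
    rises with the attack time constant and falls with the release one."""
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for s in x:
        r = abs(s)
        c = att if r > env else rel
        env = c * env + (1.0 - c) * r
        out.append(env)
    return out

def band_centres(n_bands, f_lo, f_hi):
    """Log-spaced centre frequencies (constant ratio between neighbours)."""
    if n_bands < 2:
        raise ValueError("need at least two bands")
    ratio = (f_hi / f_lo) ** (1.0 / (n_bands - 1))
    return [f_lo * ratio ** k for k in range(n_bands)]

def saw_carrier(freq, fs, n, noise=0.05, seed=0):
    """Naive (aliasing) sawtooth in [-1, 1) plus a little white noise."""
    rng = random.Random(seed)
    return [2.0 * ((freq * i / fs) % 1.0) - 1.0 + noise * rng.uniform(-1.0, 1.0)
            for i in range(n)]

def vocode(formant, carrier, fs, n_bands=16, f_lo=100.0, f_hi=8000.0, q=5.0,
           stages=2, shift=0, hpf_mix=0.0, hpf_cutoff=6000.0):
    """Each formant band's envelope sets the gain of a carrier band.
    `shift` routes analysis band k to synthesis band k + shift."""
    if f_hi >= fs / 2:
        raise ValueError("f_hi must be below fs / 2")
    n = min(len(formant), len(carrier))
    formant, carrier = formant[:n], carrier[:n]
    centres = band_centres(n_bands, f_lo, f_hi)
    out = [0.0] * n
    for k, fc in enumerate(centres):
        j = k + shift
        if not 0 <= j < n_bands:
            continue  # band shifted off the end of the bank: dropped
        env = envelope(biquad(formant, bandpass(fc, q, fs), stages), fs)
        band = biquad(carrier, bandpass(centres[j], q, fs), stages)
        for i in range(n):
            out[i] += env[i] * band[i]
    if hpf_mix > 0.0:  # HPF bypass: let the hiss of "s", "t", "h" through
        if hpf_cutoff >= fs / 2:
            raise ValueError("hpf_cutoff must be below fs / 2")
        hiss = biquad(formant, highpass(hpf_cutoff, fs))
        for i in range(n):
            out[i] += hpf_mix * hiss[i]
    return out
```

For example, with a hypothetical list of voice samples `voice` recorded at 44.1 kHz, `vocode(voice, saw_carrier(110.0, 44100, len(voice)), 44100, shift=1, hpf_mix=0.2)` would route each analysis band to the next higher carrier band and blend in some high-passed hiss from the voice. The pure-Python loops favour readability over speed, and the output level is not normalised.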
If you want to know more about vocoders, search for them on the internet. There are a lot of articles about them. I wrote some articles too, but that was years ago, before I had access to the internet.