1. Binary Spectrogram Masking

The idea:
Create time-frequency islands (binary masks) that mark speech activity in the spectrogram. The islands represent formant locations and can take any shape. However, each island must be continuous in both frequency and time, because it follows the evolution of a formant in the time-frequency (TF) plane.
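
To make the notion of an island concrete, here is a minimal sketch that labels the connected regions of a binary mask. It assumes the mask is a boolean array indexed as [frame, frequency bin] and that "continuous" means 4-connected (neighbours in time or in frequency). The function name `label_islands` is hypothetical and is not part of the original method.

```python
from collections import deque

import numpy as np


def label_islands(mask):
    """Label 4-connected regions of True cells; returns (labels, count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # cell already belongs to a labelled island
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:  # breadth-first flood fill over time and frequency
            t, f = queue.popleft()
            for nt, nf in ((t - 1, f), (t + 1, f), (t, f - 1), (t, f + 1)):
                inside = 0 <= nt < mask.shape[0] and 0 <= nf < mask.shape[1]
                if inside and mask[nt, nf] and not labels[nt, nf]:
                    labels[nt, nf] = count
                    queue.append((nt, nf))
    return labels, count
```

Cells that share a label form one island; cells labelled 0 lie outside every island.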

The idea came in 1999, after implementing the RASTA speech enhancement method on a Motorola DSP and reading papers by Dan Ellis, Paris Smaragdis and Malcolm Slaney.

Method description:
The method consists of 6 steps:
1. Transforming the 1D speech signal into a 2D spectrogram (i.e., a time-frequency-energy representation).
2. Continuous estimation of the signal-to-noise ratio (SNR) and voice activity detection (VAD) in every FFT frequency band.
3. Creating the island-like continuous binary mask based on the common VAD onsets and offsets of several frequency bands.

4. Smoothing of the binary mask by 2D filtering of the island edges.

5. Weighting the original spectrogram by the created mask, which (hopefully) represents the speech formants in the time-frequency plane.
6. Resynthesis of the 1D speech signal from the weighted spectrogram using the inverse FFT (IFFT) and overlap-add (OLA).

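
The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the original implementation: the noise level in each band is estimated from the first few frames (assumed to be speech-free), the 6 dB threshold is arbitrary, the "common onsets and offsets of several bands" rule is replaced by a simple vote among neighbouring bands, and the 2D smoothing is a 3x3 box filter. All function and parameter names are hypothetical.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Step 1: Hann-windowed frames -> complex spectrogram [frame, bin]."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann, sums to 1 at 50% overlap
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def band_vad(power, noise_frames=10, thresh_db=6.0):
    """Step 2: per-band SNR and VAD (assumes the first frames hold only noise)."""
    noise = power[:noise_frames].mean(axis=0) + 1e-12
    snr_db = 10.0 * np.log10((power + 1e-12) / noise)
    return snr_db, snr_db > thresh_db


def island_mask(vad, width=5, min_bands=3):
    """Step 3 (stand-in): keep a cell only if enough nearby bands agree."""
    kernel = np.ones(width)
    counts = np.stack([np.convolve(row, kernel, mode="same")
                       for row in vad.astype(float)])
    return vad & (counts >= min_bands)


def smooth_mask(mask, size=3):
    """Step 4: soften the island edges with a size x size box filter."""
    pad = size // 2
    padded = np.pad(mask.astype(float), pad, mode="edge")
    out = np.zeros(mask.shape)
    for dt in range(size):
        for df in range(size):
            out += padded[dt:dt + mask.shape[0], df:df + mask.shape[1]]
    return out / size ** 2


def istft(spec, n_fft=256, hop=128):
    """Step 6: inverse FFT of every frame, then overlap-add (OLA)."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out


def enhance(x, n_fft=256, hop=128):
    """Run steps 1 to 6; returns the resynthesised signal and the soft mask."""
    spec = stft(x, n_fft, hop)
    _, vad = band_vad(np.abs(spec) ** 2)
    soft = smooth_mask(island_mask(vad))
    return istft(spec * soft, n_fft, hop), soft  # step 5: weight the spectrogram
```

Calling `enhance(x)` on a mono signal `x` returns the filtered signal together with the smoothed mask. With this window and hop, an all-ones mask reproduces the input exactly, apart from the first and last half-frame.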

Credits:
Martin Plšek (BUT), for implementing the band-based VAD in 2000.

References:
This method was first published in 2000 at the TSP conference under the title “Spectrogram Mapping Method (SMM)” [1]. Since then, time-frequency masking and binary masking have become the accepted and widely used terms in the literature. Ideal ratio masks represent a special case of the method.

[1] Képesi, M.: “Spectrogram Mapping Method (SMM),” TSP 2000, Brno, Czech Republic.
[2] Képesi, M. – Plšek, M.: “One-Channel Speech Separation Using Spectrogram Modifications,” Proc. of the Czech-German Workshop on Speech Processing, Prague, September 2001, pp. 75-7x, ISBN 8086269078.
[3] Képesi, M. – Macku, J.: “One-Channel Speech Separation Techniques,” Proc. of Telecommunications and Signal Processing 2000, Brno, 6-7 September 2000, pp. 130-133, ISBN 8072041614.

Related Links:
DESCRIPTION and audio examples
