Problem:
Need for a powerful but straightforward pitch estimation method.
The idea:
Reorder the information represented by the frequency bins of a spectrogram (FFT, FChT or ChT) into an FoGram.
Auditory Perceptual Integration:
The main idea is to scan through all possible pitch candidates and assign, to every candidate fo, the sum of the energy values at fo, 2fo, …, nH·fo. As an equation:
$$\mathrm{FoGram}(f_o) = \sum_{i=1}^{n_H} S(i \cdot f_o)$$
where
fo … pitch candidates (usually between 80 and 380 Hz),
nH … number of harmonics considered for gathering,
S() … spectral sample at i × fo.
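The harmonic summation described above can be sketched in a few lines of Python for a single spectrogram frame. This is a minimal illustration, not the authors' implementation: the function name `fogram_frame`, its parameters, and the use of linear interpolation to read the spectrum between bins are all assumptions made for the example.

```python
import numpy as np

def fogram_frame(spectrum, bin_hz, f0_candidates, n_harmonics):
    """Sum spectral energy at i*f0, i = 1..n_harmonics, for each candidate f0.

    spectrum      : 1-D array of energy values, one per frequency bin
    bin_hz        : spacing between bins in Hz (bin k sits at k * bin_hz)
    f0_candidates : 1-D array of pitch candidates in Hz
    n_harmonics   : number of harmonics gathered per candidate (nH)
    """
    spectrum = np.asarray(spectrum, dtype=float)
    f0_candidates = np.asarray(f0_candidates, dtype=float)
    bin_freqs = np.arange(len(spectrum)) * bin_hz
    harmonics = np.arange(1, n_harmonics + 1)

    # Grid of harmonic frequencies, shape (n_candidates, n_harmonics).
    harm_freqs = np.outer(f0_candidates, harmonics)

    # Assumption: read S(i*f0) by linear interpolation between bins.
    # Harmonics beyond the last bin contribute zero energy.
    energies = np.interp(harm_freqs.ravel(), bin_freqs, spectrum, right=0.0)
    energies = energies.reshape(harm_freqs.shape)

    return energies.sum(axis=1)

# Synthetic frame: 10 Hz per bin, unit-energy peaks at 200, 400, ..., 1000 Hz.
bin_hz = 10.0
spectrum = np.zeros(257)
spectrum[[20, 40, 60, 80, 100]] = 1.0

# Candidate grid much finer than the bin spacing (0.5 Hz steps, 80 to 380 Hz).
f0_candidates = np.arange(80.0, 380.5, 0.5)
fogram = fogram_frame(spectrum, bin_hz, f0_candidates, n_harmonics=5)
best_f0 = f0_candidates[np.argmax(fogram)]
```

In this synthetic case the candidate grid is spaced at 0.5 Hz although the bins are 10 Hz apart; because each candidate is scored from several harmonics at once, small shifts in fo move the higher harmonics noticeably, which is what lets the summed value discriminate between closely spaced candidates.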
An example of such an FoGram derived from an HChT spectrogram is shown below (courtesy of Cancela et al.):
After zooming into the image, we see that the FoGram provides extremely high frequency resolution (below 1 Hz!), far finer than the frequency resolution of the spectral representation it is derived from (usually 10-30 Hz per frequency bin).
References:
[1] M. Képesi, L. Weruaga, E. Schofield, “Detailed Multidimensional Analysis of our Acoustical Environment,” Forum Acusticum. Budapest (Hu), September 2005, pp. 2649-2654.
[2] M. Képesi, L. Weruaga, “High-resolution noise-robust spectral-based pitch estimation,” Interspeech 2005. Lisboa (P), September 2005, pp. 313-316.
Related Work:
[5] Pei Zhao, Zhiping Zhang, Xihong Wu: Monaural speech separation based on multi-scale Fan-Chirp Transform, ICASSP 2008. March 31 2008, Page(s): 161 – 164

