Accomplishments
Voice conversion system using salient sub-bands and radial basis function
- Abstract
The objective of voice conversion is to replace the speaker-dependent characteristics of the source speaker so that it is perceptually similar to that of the target speaker. The speaker-dependent spectral parameters are characterized using single-scale interpolation techniques such as linear predictive coefficients, formant frequencies, mel cepstrum envelope and line spectral frequencies. These features provide a good approximation of the vocal tract, but produce artifacts at the frame boundaries which result in inaccurate parameter estimation and distortion in re-synthesis of the speech signal. This paper presents a novel approach of voice conversion based on multi-scale wavelet packet transform in the framework of radial basis neural network. The basic idea is to split the signal acoustic space into different salient frequency sub-bands, which are finely tuned to capture the speaker identity, conveyed by the speech signal. Characteristics of different wavelet filters are studied to determine the best filter for the proposed voice conversion system. A relative performance of the proposed algorithm is compared with the state-of-the-art wavelet-based voice morphing using various subjective and objective measures. The results reveal that the proposed algorithm performs better than the conventional wavelet-based voice morphing.