Volume 4, Issue 2, October 2013, Pages 353–358
Mohammed Anwer¹ and Rezwan-Al-Islam Khan²
1 School of Engineering and Computer Science, Independent University, Dhaka, Bangladesh
2 School of Engineering and Computer Science, Independent University, Dhaka, Bangladesh
Original language: English
Copyright © 2013 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In today's business and consumer environments, a robust voice identification system is needed to reduce false positives and false negatives. This work describes a modified voice identification system that uses oversampled Haar wavelets followed by proper orthogonal decomposition. The audio signal is first decomposed using oversampled Haar wavelets, which splits it into several uncorrelated frequency bands. Linear predictive cepstral coefficients are then calculated for these bands to capture the characteristics of individual speakers. An adaptive threshold is applied to reduce noise interference, followed by a multi-layered vector quantization technique to eliminate interference between the multi-band coefficients. Finally, proper orthogonal decomposition is used to extract unique characteristics that capture more detail of the phonemes. The proposed algorithm was applied to the KING and MAT-400 databases, which were chosen because previous extraction results were available for them. In the present study, the system was trained on three sentences and tested on two for the KING database; for the MAT-400 database, it was trained on two seconds of random voice signal and tested on another two seconds. Results were compared with vector quantization and Gaussian mixture models. The present model gave consistently better performance on speech collected through mouthpieces, but comparatively poor performance on audio collected over telephones. The better performance is obtained at the cost of higher computation time.
Author Keywords: Voice identification, Haar wavelet, Proper Orthogonal Decomposition, Signal Processing, Modeling.
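The first and last stages of the pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "oversampled" means an undecimated (stationary) Haar transform with periodic boundaries, and it substitutes simple per-frame log band energies for the linear predictive cepstral coefficients, adaptive threshold, and vector quantization stages, which are omitted. The function names, the toy signal, and all parameter values are hypothetical.

```python
import numpy as np


def undecimated_haar(signal, levels):
    """Undecimated Haar decomposition (a-trous scheme, periodic boundary).

    Assumption: this is one reading of "oversampled Haar wavelets".
    Every band keeps the input length, and signal == approx + sum(details).
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for level in range(levels):
        step = 2 ** level                        # filter taps spread apart per level
        shifted = np.roll(approx, -step)         # periodic extension (assumption)
        details.append((approx - shifted) / 2.0)  # high-pass (detail) band
        approx = (approx + shifted) / 2.0         # low-pass (approximation) band
    return approx, details


def band_energy_features(bands, frame_len):
    """Log energy of each band in non-overlapping frames.

    Stand-in feature (assumption), not the LPCC features of the abstract.
    Returns an array of shape (n_bands, n_frames).
    """
    n_frames = len(bands[0]) // frame_len
    feats = []
    for band in bands:
        frames = band[: n_frames * frame_len].reshape(n_frames, frame_len)
        feats.append(np.log(np.sum(frames ** 2, axis=1) + 1e-12))
    return np.array(feats)


def pod_modes(snapshots, n_modes):
    """Proper orthogonal decomposition via SVD of mean-centered snapshots.

    snapshots has shape (n_features, n_snapshots). Returns the leading
    orthonormal modes and the fraction of total energy each one carries.
    """
    X = np.asarray(snapshots, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    energy = s ** 2 / np.sum(s ** 2)
    return U[:, :n_modes], energy[:n_modes]


# Toy signal standing in for speech: two tones plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(2048) / 8000.0
signal = (np.sin(2 * np.pi * 200 * t)
          + 0.5 * np.sin(2 * np.pi * 1200 * t)
          + 0.05 * rng.standard_normal(t.size))

approx, details = undecimated_haar(signal, levels=3)
features = band_energy_features(details + [approx], frame_len=256)
modes, energy = pod_modes(features, n_modes=2)
print(features.shape, modes.shape)
```

Because the transform is undecimated, each band has as many samples as the input, which is the redundancy that "oversampled" suggests. The projection of a speaker's feature matrix onto the leading modes could then serve as a compact signature, though how the paper builds and compares such signatures is not stated in the abstract.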
How to Cite this Article
Mohammed Anwer and Rezwan-Al-Islam Khan, “Voice identification Using a Composite Haar Wavelets and Proper Orthogonal Decomposition,” International Journal of Innovation and Applied Studies, vol. 4, no. 2, pp. 353–358, October 2013.