In present-day business and consumer environments, a robust voice identification system is needed to reduce false positives and false negatives. This work describes a modified voice identification system based on oversampled Haar wavelets followed by proper orthogonal decomposition. The audio signal is first decomposed with oversampled Haar wavelets into several uncorrelated frequency bands, from which linear predictive cepstral coefficients are calculated to capture the characteristics of individual speakers. An adaptive threshold is applied to reduce noise interference, followed by a multi-layered vector quantization technique to eliminate interference between the multi-band coefficients. Finally, proper orthogonal decomposition is used to extract the distinguishing characteristics and capture finer phoneme detail. The proposed algorithm was evaluated on the KING and MAT-400 databases, which were chosen because previous extraction results were available for them. For the KING database, the system was trained on three sentences and tested on two. For the MAT-400 database, it was trained on two seconds of random voice signal and tested on another two seconds. Results were compared with vector quantization and Gaussian mixture models. The proposed model performed consistently better on speech collected through mouthpieces but comparatively poorly on audio collected over telephones. The improved performance comes at the cost of higher computation time.
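The feature-extraction chain described above (oversampled Haar decomposition, per-band linear predictive cepstral coefficients, proper orthogonal decomposition) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. The function names, the number of decomposition levels, the LPC order and the circular boundary handling are all assumptions made for illustration. The undecimated "à trous" Haar scheme is one common reading of "oversampled Haar wavelets" and may differ from the filter bank actually used. The adaptive threshold and the multi-layered vector quantization stages are omitted.

```python
import numpy as np

def oversampled_haar(x, levels):
    """Undecimated Haar decomposition (assumed 'a trous' scheme).
    Returns `levels` detail bands plus one approximation band,
    each the same length as x. The bands sum back to x."""
    approx = np.asarray(x, dtype=float)
    bands = []
    for j in range(levels):
        shifted = np.roll(approx, 2 ** j)        # circular boundary: an assumption
        bands.append((approx - shifted) / 2.0)   # detail (high-pass) band
        approx = (approx + shifted) / 2.0        # approximation (low-pass) band
    bands.append(approx)
    return bands

def lpc(frame, order):
    """Predictor coefficients a[1..p] for s[n] ~ sum_k a[k] s[n-k],
    via the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0] + 1e-12                           # small floor guards silent bands
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err = max(err * (1.0 - k * k), 1e-12)
    return a[1:]

def lpcc(a):
    """Cepstral coefficients c[1..p] of the all-pole model 1 / (1 - sum_k a[k] z^-k),
    using c[n] = a[n] + sum_{k<n} (k/n) c[k] a[n-k]."""
    p = len(a)
    c = np.zeros(p)
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def frame_features(frame, levels=3, order=8):
    """Concatenate the LPCC vectors of every wavelet band of one frame."""
    bands = oversampled_haar(frame, levels)
    return np.concatenate([lpcc(lpc(b, order)) for b in bands])

def pod_basis(features, n_modes):
    """Proper orthogonal decomposition of a (frames x coefficients) matrix via SVD.
    Returns the mean feature vector and the leading orthonormal modes."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_modes]

def project(features, mean, modes):
    """Reduced coordinates of the features in the POD basis."""
    return (features - mean) @ modes.T
```

In such a sketch, a speaker's enrollment frames would be passed through `frame_features`, stacked into a matrix, and reduced with `pod_basis`; test frames would then be mapped with `project` and compared in the reduced space. How the comparison is scored is left open here, since the abstract does not specify it.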