nexiwave.com, a speech indexing cloud service company based in Boston, MA, announces that it has completed the GPU acceleration of its speech indexing service, Nexiwave 2.0. Without sacrificing accuracy, Nexiwave reports a relative speed improvement of over 75% (comparing a stock Sphinx4 running on a 2.5 GHz, 8-core, 24 GB RAM server with Sphinx4 on a 2.5 GHz quad-core, 4 GB RAM machine with an NVIDIA GTX 470 GPU).
Nexiwave 2.0 GPU-accelerated Speech Indexing
June 3rd, 2010
Harnessing Graphics Processors for the Fast Computation of Acoustic Likelihoods in Speech Recognition
February 10th, 2010
Abstract:
In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations to move this burden off the main processor. In this paper we describe our new GPU scheme that can achieve a very substantial improvement in recognition speed whilst incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models with varying complexity and the results consistently show by using the GPU it is possible to reduce the recognition time with largest improvements occurring in systems with large numbers of Gaussians. For the systems which achieve the best accuracy we obtained between 2.5 and 3 times speed-ups. The faster decoding times translate to reductions in space, power and hardware costs by only requiring standard hardware that is already widely installed.
(Paul R. Dixon, Tasuku Oonishi, Sadaoki Furui, “Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition”, Computer Speech & Language, Volume 23, Issue 4, October 2009, Pages 510-526, ISSN 0885-2308, DOI: 10.1016/j.csl.2009.03.005)
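To make the phrase "acoustic computations" concrete, here is a minimal sketch of why Gaussian scoring suits a GPU. It is an illustration under stated assumptions, not the decoder described in the paper: the function names (`pack_gaussian`, `expand_frame`, `score_frames`) are hypothetical, and it assumes diagonal-covariance Gaussians. Under that assumption each log-likelihood can be written as a dot product between an expanded feature vector and a precomputed parameter vector, so scoring all frames against all Gaussians becomes one large matrix product, the kind of data-parallel work a GPU handles well. The code runs on the CPU in plain Python to show the arithmetic only.

```python
import math

def pack_gaussian(mean, var, log_weight=0.0):
    """Precompute one diagonal Gaussian as a single parameter vector.

    log N(x) = sum_d(-0.5/v_d * x_d^2 + m_d/v_d * x_d) + const,
    so the result is laid out as [quadratic terms, linear terms, const].
    """
    quad = [-0.5 / v for v in var]
    lin = [m / v for m, v in zip(mean, var)]
    const = log_weight - 0.5 * sum(
        math.log(2.0 * math.pi * v) + m * m / v for m, v in zip(mean, var)
    )
    return quad + lin + [const]

def expand_frame(x):
    """Expand a feature vector to [x^2, x, 1] to match pack_gaussian."""
    return [xi * xi for xi in x] + list(x) + [1.0]

def score_frames(frames, gaussians):
    """Return scores[t][g]: weighted log-likelihood of frame t under Gaussian g.

    Every entry is an independent dot product, i.e. one element of a
    (frames x params) matrix product that a GPU could compute in parallel.
    """
    params = [pack_gaussian(*g) for g in gaussians]
    expanded = [expand_frame(x) for x in frames]
    return [[sum(a * b for a, b in zip(e, p)) for p in params] for e in expanded]

def direct_log_density(x, mean, var, log_weight=0.0):
    """Reference implementation using the textbook formula."""
    return log_weight - 0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
```

For example, `score_frames([[0.5, -1.0]], [([0.0, 0.0], [1.0, 1.0])])` scores one two-dimensional frame against a single unit-variance Gaussian and agrees with `direct_log_density` up to floating-point rounding. The number of dot products grows with the number of Gaussians, which is consistent with the abstract's observation that the largest improvements occur in systems with many Gaussians.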
Universal employment of modern graphics hardware by the example of the optimization of a speech recognition system
May 24th, 2006
In this master's thesis by Christian Fenzl (completed at the University of Applied Sciences in Darmstadt), an easy-to-use framework is implemented, with additional demos that show the main concepts of GPGPU. It also includes a demo implementation that calculates scores on the feature vectors used in a speech recognition system (about 12 times faster than an equivalent CPU implementation). An application with several demos built on the framework is available, including the fully documented source code (English) and the thesis itself (German). The framework code is recommended especially for GPGPU beginners who want to study OpenGL and DirectX code that shows how GPGPU programs can be developed.