Abstract:
Techniques are described to reduce rounding errors during computation of a discrete cosine transform using fixed-point calculations. According to these techniques, a matrix of scaled coefficients is calculated by multiplying coefficients in a matrix of coefficients by scale factors. Next, a midpoint bias value and a supplemental bias value are added to a DC coefficient of the matrix of scaled coefficients. Next, an inverse discrete cosine transform is applied to the resulting matrix of scaled coefficients. Values in the resulting matrix are then right-shifted in order to derive a matrix of pixel component values. As described herein, the addition of the supplemental bias value to the DC coefficient reduces rounding errors attributable to this right-shifting. As a result, a final version of a digital media file decompressed using these techniques may more closely resemble an original version of the digital media file.
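The effect of biasing the DC coefficient before the right shift can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two-point butterfly `toy_inverse_transform` merely stands in for a real inverse discrete cosine transform, `P = 3` fractional bits is an arbitrary choice, and `supplemental_bias` is left as a free parameter because the abstract does not say how its value is chosen.

```python
P = 3  # assumed number of fractional (fixed-point) bits


def toy_inverse_transform(dc, ac):
    """Two-point butterfly standing in for a real IDCT (illustrative only).

    The DC term contributes equally to every output, so a bias added to
    the DC coefficient reaches every output value.
    """
    return [dc + ac, dc - ac]


def reconstruct_plain(dc, ac):
    """Right-shift without any bias: every output is rounded down."""
    return [v >> P for v in toy_inverse_transform(dc, ac)]


def reconstruct_biased(dc, ac, supplemental_bias=0):
    """Add the midpoint bias (and an optional supplemental bias) to the
    DC coefficient, apply the transform, then right-shift by P bits."""
    midpoint_bias = 1 << (P - 1)  # half of one output unit
    biased_dc = dc + midpoint_bias + supplemental_bias
    return [v >> P for v in toy_inverse_transform(biased_dc, ac)]


# Exact outputs would be 27/8 = 3.375 and 13/8 = 1.625.
plain = reconstruct_plain(20, 7)
biased = reconstruct_biased(20, 7)
```

In this sketch the unbiased shift truncates 1.625 down to 1, while the midpoint bias lets the same shift produce the nearest integer, 2. A single addition to the DC coefficient replaces a separate addition for every output value.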
Abstract:
Techniques for efficiently performing computation for signal and data processing are described. For multiplication-free processing, a series of intermediate values is generated based on an input value for data to be processed. At least one intermediate value in the series is generated based on at least one other intermediate value in the series. One intermediate value in the series is provided as an output value for a multiplication of the input value with a constant value. The constant value may be an integer constant, a rational constant, or an irrational constant. An irrational constant may be approximated with a rational dyadic constant having an integer numerator and a denominator that is a power of two. The multiplication-free processing may be used for various transforms (e.g., DCT and IDCT), filters, and other types of signal and data processing.
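As an illustration of the idea, the sketch below multiplies an integer by an approximation of the irrational constant 1/sqrt(2) using only shifts and additions. The dyadic approximation 181/256 and the particular series of intermediate values are choices made here for the example, not values taken from the abstract; the function name is likewise hypothetical.

```python
def mul_inv_sqrt2(x):
    """Approximate x / sqrt(2) as x * 181 / 256 using shifts and adds only.

    Each intermediate value is built from an earlier one:
        x2 = 5 * x     (x + 4x)
        x3 = 45 * x    (x2 + 8 * x2)
        x4 = 181 * x   (4 * x3 + x)
    The final right shift by 8 divides by the denominator 256 = 2**8.
    """
    x2 = x + (x << 2)
    x3 = x2 + (x2 << 3)
    x4 = (x3 << 2) + x
    return x4 >> 8


result = mul_inv_sqrt2(1000)  # exact value would be about 707.1
```

Three additions and four shifts replace one multiplication. Reusing intermediate values (here `x2` and `x3`) is what keeps the operation count below that of a plain binary expansion of 181, which would need four additions.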
Abstract:
Methods and apparatus are described for transmitting information units over a plurality of constant bit rate communication channels. The techniques include encoding the information units, thereby creating a plurality of data packets. The encoding is constrained such that the data packet sizes match physical layer packet sizes of the communication channels. The information units may include a variable bit rate data stream, multimedia data, video data, and audio data. The communication channels include CDMA channels, WCDMA channels, GSM channels, GPRS channels, and EDGE channels.
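The size constraint can be pictured with a small sketch that picks target packet sizes for one information unit. This is only an illustration of the constraint, not the encoder itself: the function name, the greedy selection rule, the use of padding for the final packet, and the example sizes are all assumptions introduced here.

```python
def target_packet_sizes(unit_size, phy_sizes):
    """Choose packet sizes for an information unit of unit_size bytes so
    that every packet size equals one of the available physical layer
    packet sizes in phy_sizes.

    Assumed greedy rule: use the largest available size while it fits,
    then cover the remainder with the smallest size that can hold it
    (any unused bytes in that last packet would be padding).
    """
    sizes = sorted(phy_sizes)
    packets = []
    remaining = unit_size
    while remaining > 0:
        if remaining >= sizes[-1]:
            chosen = sizes[-1]
        else:
            chosen = next(s for s in sizes if s >= remaining)
        packets.append(chosen)
        remaining -= chosen
    return packets


# Hypothetical physical layer packet sizes, in bytes.
sizes_for_unit = target_packet_sizes(90, [20, 40, 80])
```

With these hypothetical sizes, a 90-byte unit maps to one 80-byte packet and one 20-byte packet, so each packet fills exactly one physical layer packet.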
Abstract:
A voice recognition (VR) system is disclosed that utilizes a combination of speaker independent (SI) (230 and 232) and speaker dependent (SD) (234) acoustic models. At least one SI acoustic model (230 and 232) is used in combination with at least one SD acoustic model (234) to provide a level of speech recognition performance that at least equals that of a purely SI acoustic model. The disclosed hybrid SI/SD VR system continually uses unsupervised training to update the acoustic templates in the one or more SD acoustic models (234). The hybrid VR system then uses the updated SD acoustic models (234) in combination with the at least one SI acoustic model (230 and 232) to provide improved VR performance during VR testing.
Abstract:
A system and method are described for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit (102) and a server VR engine in a server (160). The local VR engine comprises a feature extraction (FE) module (104) that extracts features from a speech signal, and a voice activity detection (VAD) module (106) that detects voice activity within a speech signal. The voice activity signal and the features are downsampled before they are transmitted from the local engine to the server engine. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit (102) to the server (160). The indication of detected voice activity is transmitted ahead of the extracted features in order to avoid long recognition delays. The system also includes a module that generates additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and provides them to the speech server (160).