Abstract:
A device and a method for detecting a speech endpoint using a weighted finite state transducer (WFST) are provided. The present invention includes: a speech decision unit for receiving a frame-unit feature vector converted from a speech signal, analyzing the received feature vector, and classifying each frame into a speech class or a noise class; a frame-level WFST for receiving the classified speech and noise classes and converting them into WFST-type data; a speech-level WFST for analyzing the relation among the classified speech class, noise class, and predetermined states, and detecting a speech endpoint; a WFST combination unit for combining the frame-level WFST and the speech-level WFST; and an optimization unit for minimizing the combined WFST.
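The two-level structure above (frame-level speech/noise decisions feeding an utterance-level state machine) can be sketched without WFST machinery as a plain finite-state detector; the state names, hangover thresholds, and 'S'/'N' labels below are illustrative assumptions, not the patent's implementation:

```python
# Minimal finite-state endpoint detector over per-frame speech/noise labels.
# This is a sketch of the idea only; the real invention encodes the frame
# and speech levels as WFSTs, composes them, and minimizes the result.

def detect_endpoints(labels, min_speech=3, min_silence=5):
    """Return (start, end) frame indices of the first detected utterance,
    or None if no utterance is found.

    labels      : sequence of 'S' (speech) or 'N' (noise) frame decisions
    min_speech  : consecutive 'S' frames needed to confirm a start point
    min_silence : consecutive 'N' frames needed to confirm an end point
    """
    state, run, start = "NOISE", 0, None
    for i, lab in enumerate(labels):
        if state == "NOISE":
            run = run + 1 if lab == "S" else 0
            if run >= min_speech:                 # confirmed speech onset
                start, state, run = i - min_speech + 1, "SPEECH", 0
        else:  # state == "SPEECH"
            run = run + 1 if lab == "N" else 0
            if run >= min_silence:                # confirmed speech offset
                return (start, i - min_silence)
    if state == "SPEECH":                         # utterance ran to the end
        return (start, len(labels) - 1)
    return None
```

The hangover counts play the role of the "predetermined status" transitions: a single stray frame decision cannot flip the detector between states.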
Abstract:
A method for improving voice recognition performance according to an embodiment of the present invention is provided to improve recognition performance for a voice inputted under noisy circumstances, based on one or more voice recognition feature vectors. The method includes the steps of: extracting two or more feature vectors according to two or more sound models, each set per phoneme, for an inputted Korean voice; extracting an observation probability value for each phoneme from the feature vectors of the two or more sound models and from two or more feature vectors preset for an integrated sound model activated in a Viterbi decoder; and resetting the integrated sound model based on the extracted observation probability value for each phoneme, and re-recognizing the Korean voice phoneme by phoneme.
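A minimal sketch of the score-combination idea, assuming the per-model observation scores are log-probabilities fused by a weighted sum; the phoneme labels, scores, and weighting scheme are hypothetical choices for illustration, not specified by the abstract:

```python
# Hypothetical fusion of per-phoneme observation scores from two acoustic
# models into one "integrated" score. All names and values are assumptions.

def fuse_observation_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two per-phoneme log-probability dictionaries."""
    return {ph: weight_a * scores_a[ph] + (1.0 - weight_a) * scores_b[ph]
            for ph in scores_a}

def best_phoneme(fused_scores):
    """Pick the phoneme with the highest fused observation score."""
    return max(fused_scores, key=fused_scores.get)
```

In a full system the fused scores would feed back into the Viterbi decoder's integrated model before the phoneme-by-phoneme re-recognition pass.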
Abstract:
The present invention relates to improving the performance of voice endpoint detection and of the feature vector extractor used in a voice recognition system. According to the present invention, a method for detecting an audio signal comprises the steps of: detecting voice segments by performing frame-unit endpoint detection on an input signal; extracting a feature value of the signal in at least a partial segment corresponding to a plurality of windows among the detected voice segments; and comparing the extracted feature value with a predetermined threshold to detect the actual voice segment among the voice segments. Using the method provided in the present invention improves the normalization of the feature vector used in the voice recognition system and improves voice recognition performance in a noisy environment.
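The second-pass confirmation step can be sketched as follows, assuming log energy as the window feature (the abstract does not fix which feature value is compared against the threshold):

```python
import math

# Sketch of the two-pass check: first-pass endpoint detection proposes
# candidate segments; a per-window feature (log energy here -- an assumed
# choice) confirms which candidates are actual voice segments.

def log_energy(frame):
    """Log energy of one analysis window (epsilon avoids log(0))."""
    return math.log(sum(x * x for x in frame) + 1e-12)

def confirm_segments(signal, segments, win=4, threshold=-5.0):
    """signal   : list of samples
    segments : list of (start, end) sample ranges from first-pass detection
    A segment is kept if any window inside it exceeds the energy threshold."""
    confirmed = []
    for s, e in segments:
        feats = [log_energy(signal[i:i + win]) for i in range(s, e - win + 1)]
        if feats and max(feats) > threshold:
            confirmed.append((s, e))
    return confirmed
```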
Abstract:
Disclosed is a method for improving automatic voice recognition performance using an intra-frame feature. According to the present invention, the method includes: a step of collecting speech signals and preprocessing the collected speech signals by boosting or attenuating them; a step of dividing the preprocessed speech signals by critical band using a gamma-tone filter bank and channelizing the signals in each band; a step of frame-blocking the channelized speech signals with a frame shift of 10 ms and a frame size of 20-25 ms; a step of Hamming-windowing each blocked channel and extracting a predefined amount of data from a predefined section; a step of estimating signal intensity from the extracted data in the time-frequency domain and estimating energy based on the estimated signal intensity; a step of obtaining cepstral coefficients and their derivatives through a logarithmic operation and a discrete cosine transform applied to the estimated energy; a step of performing sub-frame analysis on the preprocessed speech signals and extracting intra-frame features from the sub-frame-analyzed speech signals; and a step of obtaining voice recognition features by combining the cepstral coefficients, the derivatives, and the intra-frame features.
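The framing, Hamming windowing, log-energy, and DCT steps above can be sketched in pure Python; the gamma-tone filter bank and sub-frame analysis are omitted for brevity, and frame sizes are given in samples rather than milliseconds:

```python
import math

# Sketch of the framing -> windowing -> log energy -> DCT pipeline.
# Sample counts stand in for the 10 ms shift / 20-25 ms frame at a given
# sample rate; coefficient counts are illustrative.

def frame_signal(x, frame_len, frame_shift):
    """Split x into overlapping frames (e.g. 20-25 ms frames, 10 ms shift)."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_shift)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def log_energies(frames):
    """Log energy of each Hamming-windowed frame."""
    w = hamming(len(frames[0]))
    return [math.log(sum((s * wk) ** 2 for s, wk in zip(f, w)) + 1e-12)
            for f in frames]

def dct(values, n_coeffs):
    """Type-II DCT, yielding the first n_coeffs cepstral coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n_coeffs)]
```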
Abstract:
PURPOSE: An utterance-verification-based mass voice data automatic processing device and a method thereof are provided to utilize a voice model in voice modeling data collection and error data verification, by automatically classifying mass voice data through a voice recognition system and generating a voice model using the classified voice data. CONSTITUTION: An utterance verification unit(160) classifies the mass voice data into normally recognized data and abnormally recognized data, using a voice feature extracted by an extraction unit(140), a context-dependent adaptive model, and a context-independent adaptive anti-phoneme model. An acoustic modeling unit(180) classifies the mass voice data into acoustic modeling data and generates an acoustic model based on the classified acoustic modeling data. [Reference numerals] (120) Saving unit; (140) Extraction unit; (160) Utterance verification unit; (180) Acoustic modeling unit
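One conventional realization of such a verification decision is a log-likelihood ratio between the context-dependent model score and the anti-phoneme model score, thresholded to separate normally and abnormally recognized data; the scores and threshold below are illustrative assumptions, not values from the patent:

```python
# Sketch of utterance verification by log-likelihood ratio (LLR).
# model_loglik : score under the context-dependent adaptive model
# anti_loglik  : score under the context-independent anti-phoneme model

def classify_utterance(model_loglik, anti_loglik, threshold=0.0):
    """Accept as 'normal' when the LLR clears the decision threshold."""
    llr = model_loglik - anti_loglik
    return "normal" if llr > threshold else "abnormal"

def partition(utterances, threshold=0.0):
    """utterances: list of (id, model_loglik, anti_loglik).
    Returns (normal ids, abnormal ids) for downstream acoustic modeling."""
    normal, abnormal = [], []
    for uid, m, a in utterances:
        (normal if classify_utterance(m, a, threshold) == "normal"
         else abnormal).append(uid)
    return normal, abnormal
```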
Abstract:
PURPOSE: A pronunciation evaluation device and method are provided to evaluate foreign language pronunciation using an acoustic model of a foreign language learner, pronunciations generated using a pronunciation model in which pronunciation errors are reflected, and an acoustic model of a native speaker, thereby increasing the accuracy of the pronunciation generated for the sound of the foreign language learner. CONSTITUTION: A pronunciation evaluation device(100) includes a sound input part(110), a sentence input part(120), a storage part(130), a pronunciation generation part(140), a pronunciation evaluation part(150), and an output part(160). The sound input part receives the sound of a foreign language learner, and the sentence input part receives a sentence corresponding to that sound. The storage part stores an acoustic model and a pronunciation dictionary for the sound of the foreign language learner. The pronunciation generation part performs sound recognition based on the acoustic model and pronunciation dictionary stored in the storage part. The pronunciation evaluation part detects vocalization errors by analyzing the pronunciations generated for the sound of the foreign language learner. The output part outputs the vocalization errors detected by the pronunciation evaluation part. [Reference numerals] (110) Sound input part; (120) Sentence input part; (130) Storage part; (140) Pronunciation generation part; (150) Pronunciation evaluation part; (160) Output part
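A common way to detect vocalization errors of the kind described is to align the recognized phoneme sequence against the canonical dictionary pronunciation; the edit-distance sketch below is one such alignment, assumed here for illustration rather than taken from the patent:

```python
# Edit distance between a recognized phoneme sequence and the canonical
# pronunciation: each unit of distance corresponds to one substitution,
# insertion, or deletion, i.e. one candidate vocalization error.

def phoneme_errors(recognized, canonical):
    """Return the minimum edit distance between two phoneme sequences."""
    m, n = len(recognized), len(canonical)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if recognized[i - 1] == canonical[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[m][n]
```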
Abstract:
PURPOSE: A corpus-based language model discrimination learning method and a device thereof are provided to easily build and use a learning database corresponding to a target domain, by building a discrimination learning training corpus database from a text corpus. CONSTITUTION: A database for language model discrimination learning is built(S301), and a voice feature vector is extracted from the built corpus database(S302). Continuous speech voice recognition is performed by receiving the voice feature vector(S303). Language model discrimination learning is performed by using a sentence score and the voice recognition result outputted through continuous speech voice recognition(S304). A discriminative language model is generated(S305). [Reference numerals] (AA) Start; (BB) End; (S301) Build a DB for language model discrimination learning; (S302) Extract a voice feature vector; (S303) Recognize voice of continuous speech; (S304) Perform the language model discrimination learning; (S305) Generate a discriminative language model
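The discrimination-learning step (S304) can be illustrated with a perceptron-style update that rewards n-grams of the reference sentence and penalizes n-grams of the recognizer's hypothesis, a standard form of discriminative language model training assumed here purely as an example:

```python
from collections import Counter

# Perceptron-style discriminative LM update sketch: bigram weights move
# toward the reference transcript and away from the recognition hypothesis.
# Learning rate and bigram features are illustrative choices.

def bigrams(words):
    """Bigram counts of a word sequence."""
    return Counter(zip(words, words[1:]))

def perceptron_update(weights, reference, hypothesis, lr=1.0):
    """weights: dict bigram -> float, updated in place and returned."""
    delta = bigrams(reference)
    delta.subtract(bigrams(hypothesis))   # +1 for reference-only bigrams,
    for bg, d in delta.items():           # -1 for hypothesis-only bigrams
        if d:
            weights[bg] = weights.get(bg, 0.0) + lr * d
    return weights
```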
Abstract:
PURPOSE: A confusion network rescoring device for Korean continuous voice recognition, a method for generating a confusion network by using the same, and a rescoring method thereof are provided to improve the generation speed of the confusion network by setting a limit on the lattice link probability in the process of converting a lattice structure into a confusion network structure. CONSTITUTION: A confusion network rescoring device receives one or more lattices generated through voice recognition(S105). The device calculates each posterior probability of the lattices(S110). The device allocates the nodes included in the lattices to plural equivalence classes based on the posterior probability(S120,S130,S135). The device generates confusion sets by using the equivalence classes(S150,S155). The device generates a confusion network based on the confusion sets. [Reference numerals] (AA) Start; (BB,DD,FF,HH,JJ) No; (CC,EE,GG,II,KK) Yes; (LL) End; (S105) Inputting lattices through voice recognition; (S110) Calculating each posterior probability of the lattices; (S115) Inputting SLF?; (S120) Allocating a first node(n_0) of the lattices to a first equivalence class(N_0); (S125) N_i and n_i links exist?; (S130) Allocating an i-th node(n_i) of the lattices to a j-th equivalence class(N_j); (S135) Allocating the i-th node(n_i) of the lattices to an i-th equivalence class(N_i); (S140) Allocating all nodes of the lattices?; (S145) If u∈N_s and n_i∈N_t with t=s+1 for link e(u->n_i); (S150) Classifying the e(u->n_i) as CS(N_s,N_t); (S155) Classifying the e(u->n_i) as CS(N_k,N_k+1); (S160) Normalizing link probability in an extracted CS sequence; (S165) Adding a null link, and allocating the remaining probability value of the normalized values; (S170) Probability value of the null link > probability value of the other links?; (S175) Excluding the CS sequence from the voice recognition result
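Steps S160-S175 (normalizing link probabilities within a confusion set, adding a null link that absorbs the remaining mass, and discarding the set from the result when the null link wins) can be sketched as follows; the word labels and probability values are illustrative, not from a real lattice:

```python
# Sketch of confusion-set normalization with a null link.
# link_probs maps each competing word in one confusion set to its raw
# posterior mass; leftover mass (below 1.0) goes to the '<null>' link.

def normalize_confusion_set(link_probs):
    """Return (normalized dict including '<null>', best word or None).
    None means the null link won and the set is excluded from the result."""
    total = sum(link_probs.values())
    norm = {w: p / max(total, 1.0) for w, p in link_probs.items()}
    norm["<null>"] = max(0.0, 1.0 - sum(norm.values()))
    best = max(norm, key=norm.get)
    return norm, (None if best == "<null>" else best)
```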
Abstract:
PURPOSE: A voice recognition apparatus and a method thereof are provided to increase the recognition speed of an input signal and to perform recognition of the input signal in parallel. CONSTITUTION: A global database unit(10) includes a global feature vector(12), a global vocabulary model(14), and a global sound model(16). A recognition unit(20) includes separate recognition units(22a~22n). The plurality of separate recognition units performs voice recognition in parallel. A separate database unit(30) includes separate language models. A collection and evaluation unit(40) collects and evaluates the recognition results of the separate recognition units.
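Dispatching one input across several separate recognition units in parallel and collecting their scored results can be sketched with stand-in recognizer functions; a real system would wrap full decoders (each with its own language model) behind the same callable interface:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallel recognition plus collection/evaluation: each
# recognizer returns (hypothesis, score); the best-scoring hypothesis wins.
# The recognizer callables here are placeholders, not real decoders.

def run_parallel(recognizers, signal):
    """recognizers: list of callables, each signal -> (hypothesis, score).
    Returns the highest-scoring hypothesis across all recognizers."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        results = list(pool.map(lambda rec: rec(signal), recognizers))
    return max(results, key=lambda hyp_score: hyp_score[1])[0]
```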
Abstract:
PURPOSE: A voice recognition system is provided to increase recognition performance for abnormal utterances and, by recognizing abnormal utterances, to reduce the need for a user to repeat them. CONSTITUTION: A determining unit(120) determines whether a speech of a user is segment speech. A first recognition unit(130) recognizes the voice of the user by using a phonemic probability model. A second recognition unit(140) recognizes the voice of the user according to a comparison result between the voice signal and a previously learned learning probability model.