Abstract:
The present invention relates to a speaking-rate determination device and method based on harmonic-component reset detection, intended to improve natural-language recognition performance for various speaking rates in a voice recognition system. By exploiting the harmonic-component reset that arises from the strong harmonic components of vowels, the degradation of a natural-language recognizer caused by differences in speaking rate can be reduced; by estimating syllable boundaries, vowel-lengthening phenomena can be detected and used to improve the recognizer's performance; and because estimating harmonic components in the frequency domain is more precise than computing the pitch gain, an accurate speaking rate is obtained and voice recognition performance is improved.
Abstract:
A device for controlling a mobile terminal according to the present invention comprises: a conversation recognizing unit to recognize a conversation among users of mobile terminals; a user intention identifying unit to identify the intention of at least one of the users based on the recognition result; and an additional function control unit to execute an additional function corresponding to the identified user intention in that user's mobile terminal. According to the present invention, the device can recognize the conversation among the users and directly provide information or a service associated with the conversation, thereby improving communication among the users.
Abstract:
Disclosed is a method for improving automatic voice recognition performance using intra-frame features. According to the present invention, the method includes: a step of collecting speech signals and preprocessing the collected signals by boosting or attenuating them; a step of dividing the preprocessed speech signals into critical bands using a gammatone filter bank and channelizing the signal in each critical band; a step of frame-blocking the channelized speech signals with a frame shift of 10 ms and a frame size of 20 - 25 ms; a step of Hamming-windowing each blocked channel and extracting a predefined amount of data from a predefined section; a step of estimating signal intensity from the extracted data on a time-frequency basis and estimating energy from the estimated intensity; a step of obtaining cepstral coefficients and their derivatives through a logarithmic operation and a discrete cosine transform applied to the estimated energy; a step of performing sub-frame analysis on the preprocessed speech signals and extracting intra-frame features from the sub-frame-analyzed signals; and a step of obtaining voice recognition features by combining the cepstral coefficients, the derivatives, and the intra-frame features.
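The frame-blocking, Hamming-windowing, logarithm, and DCT steps above can be sketched minimally as follows. This is an illustration, not the patented method: the preprocessing, the gammatone filter bank, the intra-frame sub-frame analysis, and the delta derivatives are omitted, and a 16 kHz sampling rate is assumed.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25, shift_ms=10):
    """Block a 1-D signal into overlapping frames (the abstract specifies
    a 10 ms frame shift and a 20-25 ms frame size)."""
    flen = int(sr * frame_ms / 1000)
    fshift = int(sr * shift_ms / 1000)
    n_frames = 1 + max(0, (len(x) - flen) // fshift)
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    return x[idx]

def cepstra(frames, n_ceps=13):
    """Hamming-window each frame, estimate per-bin energy from the power
    spectrum, then apply a log and a DCT-II to get cepstral coefficients."""
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    log_e = np.log(power + 1e-10)  # logarithmic operation on the energy
    n = log_e.shape[1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # DCT-II basis
    return log_e @ basis.T

# Toy usage on one second of a synthetic 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
frames = frame_signal(np.sin(2 * np.pi * 440 * t), sr)
feats = cepstra(frames)
```

With a 25 ms frame and 10 ms shift at 16 kHz, one second of audio yields 98 frames of 400 samples each, and 13 cepstral coefficients per frame.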
Abstract:
PURPOSE: A pronunciation evaluation device and method are provided to evaluate foreign-language pronunciation using an acoustic model of a foreign-language learner, pronunciations generated with a pronunciation model in which pronunciation errors are reflected, and an acoustic model of a native speaker, thereby increasing the accuracy of the pronunciations generated for the learner's speech. CONSTITUTION: A pronunciation evaluation device(100) includes a sound input part(110), a sentence input part(120), a storage part(130), a pronunciation generation part(140), a pronunciation evaluation part(150), and an output part(160). The sound input part receives the speech of a foreign-language learner, and the sentence input part receives a sentence corresponding to that speech. The storage part stores an acoustic model and a pronunciation dictionary for the learner's speech. The pronunciation generation part performs speech recognition based on the acoustic model and pronunciation dictionary stored in the storage part. The pronunciation evaluation part detects vocalization errors by analyzing the pronunciations of the learner's speech. The output part outputs the vocalization errors detected by the pronunciation evaluation part. [Reference numerals] (110) Sound input part; (120) Sentence input part; (130) Storage part; (140) Pronunciation generation part; (150) Pronunciation evaluation part; (160) Output part
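One simple way to detect the vocalization errors described above is to align the recognized phone sequence against the canonical dictionary pronunciation and report the mismatches. The sketch below uses a plain Levenshtein alignment with illustrative phone labels; this is only an assumption about how the pronunciation evaluation part might operate, since the abstract does not specify the comparison method.

```python
def align_phones(recognized, canonical):
    """Align a recognized phone sequence with the canonical pronunciation
    (Levenshtein alignment) and report mismatches as candidate errors."""
    n, m = len(recognized), len(canonical)
    # DP table of edit costs
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if recognized[i - 1] == canonical[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    # Backtrace to collect substitution/insertion/deletion errors
    errors, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                d[i][j] == d[i - 1][j - 1] + (recognized[i - 1] != canonical[j - 1])):
            if recognized[i - 1] != canonical[j - 1]:
                errors.append(("sub", canonical[j - 1], recognized[i - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            errors.append(("ins", None, recognized[i - 1]))
            i -= 1
        else:
            errors.append(("del", canonical[j - 1], None))
            j -= 1
    return d[n][m], list(reversed(errors))

# Illustrative phone labels (hypothetical): learner said "rice" with a
# final /s/ where the dictionary expects /t/ for "right"
dist, errs = align_phones(["r", "ai", "s"], ["r", "ai", "t"])
```

Each reported tuple names the error type, the expected phone, and the produced phone, which is one possible shape for the output of the pronunciation evaluation part.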
Abstract:
PURPOSE: A corpus-based language model discriminative learning method and a device thereof are provided to easily build and use a learning database corresponding to a target domain by building a discriminative learning training corpus database from a text corpus. CONSTITUTION: A database for language model discriminative learning is built, and a voice feature vector is extracted from the built corpus database(S302). Continuous speech voice recognition is performed on the voice feature vector(S303). Language model discriminative learning is performed using the sentence scores and the voice recognition results output by the continuous speech voice recognition step(S304). A discriminative language model is generated(S305). [Reference numerals] (AA) Start; (BB) End; (S301) Build a DB for language model discriminative learning; (S302) Extract a voice feature vector; (S303) Recognize continuous speech; (S304) Perform language model discriminative learning; (S305) Generate a discriminative language model
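Discriminative language model learning from recognition output is commonly formulated as structured-perceptron reranking of an n-best list. The toy sketch below is an assumption about the training criterion, which the abstract does not specify: bigram count features are scored together with the recognizer score, and the weights are updated toward the reference transcript whenever the top-rescored hypothesis is wrong.

```python
from collections import Counter

def ngram_feats(words, n=2):
    """Bigram count features for a hypothesis (a toy stand-in for the
    discriminative language model's feature set)."""
    return Counter(zip(words, words[1:]))

def perceptron_dlm(train, epochs=5):
    """Structured-perceptron training: rescore each n-best list with the
    current weights plus the recognizer score, and update toward the
    reference whenever the best-scoring hypothesis is not the reference."""
    w = Counter()
    for _ in range(epochs):
        for nbest, ref in train:
            # nbest: list of (hypothesis_words, recognizer_score)
            def score(hyp, s):
                return s + sum(w[g] * c for g, c in ngram_feats(hyp).items())
            best = max(nbest, key=lambda p: score(*p))[0]
            if best != ref:
                w.update(ngram_feats(ref))     # reward reference n-grams
                w.subtract(ngram_feats(best))  # penalize the wrong winner
    return w

# Toy utterance: the recognizer slightly prefers the wrong homophone
train = [([(["i", "see"], 0.0), (["i", "sea"], 0.1)], ["i", "see"])]
w = perceptron_dlm(train)
```

After training, the learned weights rerank the n-best list so the reference hypothesis wins despite its lower recognizer score.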
Abstract:
PURPOSE: A confusion network rescoring device for Korean continuous voice recognition, a method for generating a confusion network by using the same, and a rescoring method thereof are provided to improve the generation speed of the confusion network by limiting the lattice link probability in the process of converting a lattice structure into a confusion network structure. CONSTITUTION: A confusion network rescoring device receives one or more lattices generated through voice recognition(S105). The device calculates the posterior probability of each lattice link(S110). The device allocates the nodes included in the lattices to plural equivalence classes based on the posterior probabilities(S120,S130,S135). The device generates confusion sets by using the equivalence classes(S150,S155). The device generates a confusion network based on the confusion sets. [Reference numerals] (AA) Start; (BB,DD,FF,HH,JJ) No; (CC,EE,GG,II,KK) Yes; (LL) End; (S105) Inputting lattices through voice recognition; (S110) Calculating the posterior probability of each lattice; (S115) Inputting SLF?; (S120) Allocating a first node(n_0) of the lattices to a first equivalence class(N_0); (S125) N_i and n_i links exist?; (S130) Allocating an i-th node(n_i) of the lattices to a j-th equivalence class(N_j); (S135) Allocating the i-th node(n_i) of the lattices to an i-th equivalence class(N_i); (S140) All nodes of the lattices allocated?; (S145) If u∈N_s and n_i∈N_t, t=s+1 in e(u->n_i); (S150) Classifying e(u->n_i) as CS(N_s,N_t); (S155) Classifying e(u->n_i) as CS(N_k,N_k+1); (S160) Normalizing link probabilities in an extracted CS sequence; (S165) Adding a null link and allocating the remaining probability value of the normalized values; (S170) Probability value of the null link > probability values of the other links?; (S175) Excluding the CS sequence from the voice recognition result
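Steps S160 - S175 can be illustrated with a small post-processing sketch. The data layout is an assumption: each confusion set maps words to posterior probabilities that may sum to less than one after lattice pruning, a null link absorbs the residual mass, and a set dominated by its null link is excluded from the recognition result.

```python
def finalize_confusion_network(raw_sets):
    """Post-process confusion sets along the lines of S160-S175: each set
    maps word -> posterior probability. A null (epsilon) link takes the
    remaining probability mass, and a set whose null link outweighs every
    word link is excluded from the recognition result."""
    network = []
    for cs in raw_sets:
        total = sum(cs.values())
        null_p = max(0.0, 1.0 - total)        # S165: residual mass
        if cs and null_p > max(cs.values()):  # S170/S175: null dominates
            continue                          # drop this slot entirely
        slot = dict(cs)
        slot["<eps>"] = null_p                # keep the null link
        network.append(slot)
    return network

# Two toy confusion sets: the second loses most of its mass to pruning,
# so its null link dominates and the slot is excluded
net = finalize_confusion_network([
    {"hello": 0.7, "hollow": 0.2},
    {"there": 0.3},
])
```

Excluding null-dominated sets models the flowchart's final decision: a slot where "no word" is the most probable outcome contributes nothing to the rescored hypothesis.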
Abstract:
PURPOSE: A voice recognition method and a device thereof are provided to obtain features of both a short section and a long section of speech, in which temporal characteristics are reflected in the long section. CONSTITUTION: A segment dividing unit(531) partitions a voice signal into segment sections. The temporal length of a segment section is longer than the temporal length of a frame section. A segment feature extracting unit(532) extracts a segment voice feature vector around the partition boundary of each segment section. A segment voice recognizing unit(533) recognizes the voice using the segment voice feature vector and a segment-based probability model. A combination synchronizing unit(540) combines the voice recognition result of a frame-based voice recognizing unit with the voice recognition result of the segment voice recognizing unit.
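The abstract does not say how the combination synchronizing unit merges the two results; one plausible sketch is a per-hypothesis log-linear interpolation of the frame-based and segment-based recognizer scores. The weight and the score format below are assumptions for illustration.

```python
def combine_results(frame_scores, segment_scores, lam=0.5):
    """Combine per-hypothesis log scores from a frame-based recognizer and
    a segment-based recognizer by log-linear interpolation with weight
    lam in (0, 1). Hypotheses missing from one list score -inf there."""
    hyps = set(frame_scores) | set(segment_scores)
    combined = {h: lam * frame_scores.get(h, float("-inf"))
                   + (1 - lam) * segment_scores.get(h, float("-inf"))
                for h in hyps}
    best = max(combined, key=combined.get)
    return best, combined

# Toy log scores: the frame-based unit slightly prefers "hi", but the
# segment-based unit strongly prefers "high", which wins after combination
best, scores = combine_results(
    {"hi": -10.0, "high": -12.0},
    {"hi": -11.0, "high": -8.0},
)
```

Equal weighting is only a starting point; in practice the interpolation weight would be tuned on held-out data.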