Abstract:
PURPOSE: A remote controller, a method and an apparatus for controlling an input interface are provided to enable a user to conveniently input Hangul, English, number, and symbol characters through a keypad. CONSTITUTION: An input keypad(1100) combines two keys among a number key, an asterisk key, a sharp key, a directional key, and a special character key. The input keypad selects one of Hangul, English, number, or symbol input, and a control unit(1200) recognizes a key operation through the input keypad. The control unit processes a key signal corresponding to the recognized key operation, and a wireless transmission unit(1400) transmits the key signal processed in the control unit.
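A minimal sketch of the two-key mode-selection idea follows: pairing the asterisk or sharp key with a second key to pick an input mode. The specific key pairings below are illustrative assumptions, not the patent's actual key assignments.
```python
# Hypothetical mapping of two-key combinations to input modes (assumed pairings).
MODE_BY_KEY_PAIR = {
    ("*", "left"): "hangul",
    ("*", "right"): "english",
    ("#", "left"): "number",
    ("#", "right"): "symbol",
}

def select_input_mode(first_key: str, second_key: str) -> str | None:
    """Return the input mode for a two-key combination, if one is defined."""
    return MODE_BY_KEY_PAIR.get((first_key, second_key))

if __name__ == "__main__":
    print(select_input_mode("*", "left"))   # -> "hangul"
    print(select_input_mode("#", "right"))  # -> "symbol"
```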
Abstract:
PURPOSE: A garbage and anti-word model based rejection apparatus and method for voice recognition are provided to effectively reject various operating noises or unenrolled words by performing a rejection process on each recognized word. CONSTITUTION: An extracting unit(104) extracts a feature vector from a voice signal. A searcher(110) assigns a score through pattern matching on the feature vector and outputs a recognition result. A rejection network generator(114) generates a rejection network for rejection evaluation from the recognition result. A rejection searcher(124) outputs a recognition score of the word model comprising the rejection network, based on a garbage sound model. A decision logic unit(128) decides whether to reject the recognized word by comparing the recognition scores.
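A hedged sketch of the decision logic: a common way to compare a word-model score against a garbage/anti-word score is a log-likelihood ratio test against a tuned threshold. The scores and threshold below are illustrative placeholders, not values from the patent.
```python
def reject_word(word_log_score: float,
                garbage_log_score: float,
                threshold: float = 0.0) -> bool:
    """Reject when the word model does not beat the garbage model by a margin."""
    llr = word_log_score - garbage_log_score
    return llr < threshold

if __name__ == "__main__":
    # Well-matched word: word score well above garbage score -> accept.
    print(reject_word(-120.0, -150.0, threshold=10.0))  # False (accept)
    # Unenrolled word: garbage model scores nearly as well -> reject.
    print(reject_word(-140.0, -138.0, threshold=10.0))  # True (reject)
```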
Abstract:
PURPOSE: A home network service method using a ubiquitous intelligent robot, which offers services based on the location of a user and the robot's coordinate information, is provided to remove the need for a remote controller by supplying a robot that performs voice input through a location sensor. CONSTITUTION: User interface information is inputted through a ubiquitous intelligent robot. The inputted user interface information is transmitted to the ubiquitous intelligent robot server(S300, S302). The ubiquitous intelligent robot server searches a home network device group for a multimedia device having the multimedia information corresponding to the user interface information(S304). If such a multimedia device is detected, the information search result is outputted as user interface information through the ubiquitous intelligent robot.
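A sketch of the server-side lookup (S304) under stated assumptions: the robot server scans its home network device group for a device holding multimedia matching the user's request. The device record layout and the matching rule are assumptions for illustration only.
```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    media_titles: set  # titles of multimedia held by this device

def find_multimedia_device(devices: list, query: str):
    """Return the first device whose media library contains the query title."""
    for device in devices:
        if query in device.media_titles:
            return device
    return None  # no match: the robot reports that the search found nothing

if __name__ == "__main__":
    group = [Device("tv", {"news"}), Device("media-server", {"movie", "music"})]
    hit = find_multimedia_device(group, "movie")
    print(hit.name if hit else "not found")  # -> "media-server"
```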
Abstract:
PURPOSE: A multiple recognition candidate formation apparatus and a method thereof are provided to improve the efficiency of a voice recognition engine by reducing the memory usage and search time needed to create multiple recognition candidates. CONSTITUTION: A voice feature extractor(502) creates a feature vector from a consecutive-numbers voice input. A search unit(504) creates a single recognition candidate string through pattern recognition on the feature vector, and outputs the likelihood score and feature vector for each discrete numerical sound composing the single recognition candidate string. A multiple recognition candidate generation part(508) creates the multiple recognition candidates by referring to the per-numerical-sound confidence ordering from the confidence measure generator(506) and a preset confusion matrix.
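An illustrative sketch of the expansion step: substitute the lowest-confidence digits of the single candidate string with their most confusable alternatives from a preset confusion matrix. The confusion entries below are made-up examples, not measured confusions from the patent.
```python
CONFUSION = {  # digit -> digits it is most often confused with (assumed)
    "1": ["7"], "7": ["1"], "5": ["9"], "9": ["5"],
}

def expand_candidates(digits: str, confidences: list, k: int = 3) -> list:
    """Yield up to k alternatives, replacing digits in rising-confidence order."""
    candidates = []
    order = sorted(range(len(digits)), key=lambda i: confidences[i])
    for i in order:
        for alt in CONFUSION.get(digits[i], []):
            candidates.append(digits[:i] + alt + digits[i + 1:])
            if len(candidates) == k:
                return candidates
    return candidates

if __name__ == "__main__":
    # "5" has the lowest confidence, so it is substituted first.
    print(expand_candidates("157", [0.9, 0.4, 0.6]))  # ['197', '151', '757']
```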
Abstract:
A voice recognition method is provided to statistically model various textual language phenomena among various knowledge sources. Morpheme analysis is performed on a primitive text language corpus consisting of the separate words of Korean(S201), generating a morpheme language corpus in which each separate word is segmented into morphemes. A language model consisting of morpheme unigrams, bigrams, and trigrams is generated from the morpheme language corpus(S202). A first N-best list of up to N recognition candidates is generated for an input voice(S204). The recognition candidates are re-evaluated by applying morph-syntactic constraints(S205). The second N-best list generated in the above step is re-evaluated(S206), and a final N-best list is generated.
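A hedged sketch of the re-evaluation steps (S205, S206): each N-best candidate is rescored by adding a morpheme trigram language-model log-probability to its acoustic score, and the list is re-sorted. The trigram table, backoff value, and weight below are placeholders, not the patent's trained model.
```python
TRIGRAM_LOGP = {}  # (m1, m2, m3) -> log P(m3 | m1, m2), trained offline (assumed)

def lm_logprob(morphemes: list, backoff: float = -10.0) -> float:
    """Trigram log-probability of a morpheme sequence with a flat backoff."""
    padded = ["<s>", "<s>"] + morphemes
    return sum(
        TRIGRAM_LOGP.get(tuple(padded[i:i + 3]), backoff)
        for i in range(len(morphemes))
    )

def rescore_nbest(nbest: list, lm_weight: float = 0.8) -> list:
    """Sort (morpheme-sequence, acoustic-score) pairs by combined score."""
    return sorted(
        nbest,
        key=lambda cand: cand[1] + lm_weight * lm_logprob(cand[0]),
        reverse=True,
    )
```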
Abstract:
A method and apparatus for distinguishing voice by using the voiced sound features of the human voice are provided to overcome the degraded performance of existing voice and non-voice determination techniques in actual noise environments. An input signal sound-quality enhancing part(201) removes additional noise from a sound signal containing both a voice signal and a noise signal, minimizing the degradation of the input signal's sound quality by the additional noise. A voiced sound feature detecting part(205) extracts voiced sound features from the voice signal received from the sound-quality enhancing part. A voiced sound/unvoiced sound determining model part(207) stores the threshold or critical values of the voiced sound features extracted from a pure voice model in which no noise is included. A voiced sound/unvoiced sound determining unit(209) compares the 11 voiced sound features extracted by the voiced sound feature detecting part with the stored threshold or critical values.
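A minimal sketch of the determining unit (209), assuming each extracted feature is compared against its stored threshold from the model part (207) and a simple vote decides voiced versus unvoiced. The feature names, thresholds, and voting rule are illustrative assumptions; the patent compares 11 features.
```python
# feature -> (threshold, True if the feature indicates voiced when high)
THRESHOLDS = {
    "autocorr_peak": (0.5, True),
    "frame_energy": (0.01, True),
    "zero_crossing_rate": (0.3, False),  # voiced speech has a low ZCR
}

def is_voiced(features: dict) -> bool:
    """Majority vote over per-feature threshold comparisons."""
    votes = 0
    for name, (thr, voiced_if_high) in THRESHOLDS.items():
        high = features[name] >= thr
        votes += 1 if high == voiced_if_high else -1
    return votes > 0

if __name__ == "__main__":
    frame = {"autocorr_peak": 0.7, "frame_energy": 0.05, "zero_crossing_rate": 0.1}
    print(is_voiced(frame))  # -> True
```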
Abstract:
A method for estimating the a priori speech absence probability based on a statistical model is provided to enable more accurate estimation by applying a nonlinear property to the a priori speech absence probability. Observation signal log energy is obtained(S110), and noise signal log energy is obtained(S120). A posteriori signal-to-noise ratio is obtained by using the observation signal log energy and the noise signal log energy(S130). Local and global averages of the posteriori signal-to-noise ratio on a log scale are obtained(S140). Local and global parameters are obtained by applying a sigmoid function and a threshold decision to the local and global averages(S145). A frame average of the posteriori signal-to-noise ratio on the log scale is obtained(S150). An average parameter is obtained by using this frame average(S155). An instant speech absence probability is obtained by using the local parameter, the global parameter, and the average parameter(S170). The a priori speech absence probability is obtained by using the instant speech absence probability(S180).
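A sketch of the estimation chain under stated assumptions: the posteriori SNR (S130) is smoothed over local and global windows (S140), each smoothed value is mapped through a sigmoid with a threshold (S145), the three parameters are combined multiplicatively into an instant speech absence probability (S170), and that value is recursively smoothed (S180). The window sizes, sigmoid slope, and multiplicative combination are illustrative choices, not the patent's values.
```python
import numpy as np

def sigmoid_param(avg_snr_db, threshold_db=0.0, slope=1.0):
    # High average SNR -> speech likely present -> low absence parameter.
    return 1.0 / (1.0 + np.exp(slope * (avg_snr_db - threshold_db)))

def speech_absence_probability(obs_log_energy, noise_log_energy,
                               local_win=3, global_win=15, alpha=0.9):
    """obs_log_energy, noise_log_energy: 1-D per-frame arrays (dB)."""
    post_snr_db = obs_log_energy - noise_log_energy                       # S130
    kernel = lambda n: np.ones(n) / n
    local_avg = np.convolve(post_snr_db, kernel(local_win), "same")       # S140
    global_avg = np.convolve(post_snr_db, kernel(global_win), "same")
    p_local = sigmoid_param(local_avg)                                    # S145
    p_global = sigmoid_param(global_avg)
    frame_avg = np.full_like(post_snr_db, post_snr_db.mean())             # S150
    p_avg = sigmoid_param(frame_avg)                                      # S155
    instant = p_local * p_global * p_avg                                  # S170
    prior = np.empty_like(instant)                                        # S180
    prior[0] = instant[0]
    for t in range(1, len(instant)):  # recursive smoothing of the instant value
        prior[t] = alpha * prior[t - 1] + (1 - alpha) * instant[t]
    return prior
```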
Abstract:
PURPOSE: An additional noise removing apparatus using a human auditory model is provided to improve the performance of a speech recognition system by applying the human auditory model to an input audio signal in a pre-processing step that removes additional noise. CONSTITUTION: The apparatus includes a buffering and framing unit(10), a human auditory model application unit(100), a frequency spectrum estimator(40), an additional noise estimator(30), and an additional noise removing unit(50). The buffering and framing unit buffers an input audio signal and segments it into frames at a predetermined time interval. The human auditory model application unit applies the human auditory model to the input audio signal. The frequency spectrum estimator transforms the input audio signal into the frequency domain to generate a frequency spectrum composed of an amplitude component and a phase component. The additional noise estimator estimates the spectrum information of the noise added to the audio signal using the frequency spectrum. The additional noise removing unit removes the estimated additional noise from the frequency spectrum.
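A hedged sketch of the estimator/remover pair (30, 50) using plain spectral subtraction: the noise magnitude is estimated from the first few frames (assumed speech-free) and subtracted from each frame's amplitude spectrum while the phase is kept. The patent's auditory-model weighting is not reproduced here; this only illustrates the surrounding pipeline.
```python
import numpy as np

def remove_noise(frames: np.ndarray, noise_frames: int = 5) -> np.ndarray:
    """frames: 2-D array (num_frames, frame_len) of windowed time samples."""
    spectrum = np.fft.rfft(frames, axis=1)              # frequency spectrum (40)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    noise_mag = magnitude[:noise_frames].mean(axis=0)   # noise estimate (30)
    clean_mag = np.maximum(magnitude - noise_mag, 0.0)  # subtraction (50)
    clean_spec = clean_mag * np.exp(1j * phase)         # re-attach the phase
    return np.fft.irfft(clean_spec, n=frames.shape[1], axis=1)
```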
Abstract:
PURPOSE: A method for converting the time axis of a voice signal is provided, which applies three-level center clipping and a level crossing method to a synthesis voice signal and an analysis voice signal before performing synchronization, thereby reducing the amount of calculation by eliminating the normalization step and shortening the search period. CONSTITUTION: An analysis voice frame is initialized as a synthesis voice frame(S1). When all voice data have been inputted(S2), the time axis conversion method finishes. Otherwise, a clipping level of the synthesis voice frame and the analysis voice frame is determined(S3). The synthesis voice frame and the analysis voice frame are divided into three levels by using the determined clipping level(S4). Level crossing points of the synthesis voice frame and the analysis voice frame are searched(S5). A synchronization point between the synthesis voice frame and the analysis voice frame is found by using the analysis voice signal, the synthesis voice signal, and the level crossing points processed through the three-level center clipping(S6). On the basis of the found synchronization point, the synthesis voice signal and the analysis voice signal are rearranged, and the two signals are overlapped and added(S7).
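An illustrative sketch of the clipping step (S4): samples above +clip_level map to +1, below -clip_level to -1, and otherwise to 0, so matching two clipped frames needs no normalization, which is the source of the claimed savings. The sign-agreement score used for the synchronization search below is an assumption, not the patent's exact criterion.
```python
import numpy as np

def three_level_clip(frame: np.ndarray, clip_level: float) -> np.ndarray:
    """Three-level center clipping: map each sample to -1, 0, or +1."""
    out = np.zeros_like(frame, dtype=np.int8)
    out[frame > clip_level] = 1
    out[frame < -clip_level] = -1
    return out

def find_sync_point(analysis: np.ndarray, synthesis: np.ndarray,
                    clip_level: float, max_lag: int = 160) -> int:
    """Lag at which the clipped signals agree most (no normalization needed)."""
    a = three_level_clip(analysis, clip_level)
    s = three_level_clip(synthesis, clip_level)
    n = max(min(len(a), len(s)) - max_lag, 1)
    scores = [int(np.sum(a[:n] * s[lag:lag + n])) for lag in range(max_lag)]
    return int(np.argmax(scores))
```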