-
1.
Publication No.: JP2005031632A
Publication date: 2005-02-03
Application No.: JP2004101094
Filing date: 2004-03-30
Applicant: ATR ADVANCED TELECOMM RES INST
Inventor: FRANK K SOONG , NAKAMURA SATORU , ASHIKARI YUTAKA , ITO GEN
Abstract: PROBLEM TO BE SOLVED: To provide an utterance section detecting device capable of properly detecting an utterance section regardless of environmental noise. SOLUTION: The utterance section detecting device includes a speech input part 104 that generates speech data in frames; a frame buffer 110 that stores the energy values of the framed speech on a FIFO basis; an initial environmental noise calculation part 112 that processes the energy values of the frames in the frame buffer 110 by a specified statistical method to calculate an initial estimate of the environmental noise; a dynamic threshold calculation part 116 that calculates, frame by frame, energy thresholds for detecting an utterance section from the energy values stored in the frame buffer 110, so that the thresholds track the environmental noise contained in the speech data; and a state decision part 118 that decides the state of each frame according to the threshold. COPYRIGHT: (C)2005,JPO&NCIPI
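The pipeline in this abstract (frame energies in a FIFO buffer, an initial noise estimate, a threshold that follows the noise, a per-frame state decision) can be illustrated with a minimal sketch. The abstract does not name the statistic, the threshold rule, or the update rule, so everything specific here is an assumption made for illustration: the median as the initial noise estimate, the multiplicative `margin`, the exponential `smoothing`, and the names `EnergyVad` and `frame_energy`.

```python
from collections import deque
from statistics import median


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


class EnergyVad:
    """Hypothetical energy-based utterance detector with a noise-tracking threshold."""

    def __init__(self, buffer_len=10, margin=3.0, smoothing=0.9):
        self.energies = deque(maxlen=buffer_len)  # FIFO buffer of frame energies
        self.noise = None                         # current noise energy estimate
        self.margin = margin                      # assumed threshold factor
        self.smoothing = smoothing                # assumed noise update rate

    def push(self, frame):
        energy = frame_energy(frame)
        self.energies.append(energy)
        if self.noise is None:
            # Assumption: the first buffer_len frames contain no speech.
            if len(self.energies) < self.energies.maxlen:
                return "noise"
            # Assumed statistic for the initial estimate: the median.
            self.noise = median(self.energies)
        threshold = self.margin * self.noise
        if energy > threshold:
            return "speech"
        # Update the noise estimate only on frames judged to be non-speech,
        # so the threshold follows slow changes in the background level.
        self.noise = self.smoothing * self.noise + (1 - self.smoothing) * energy
        return "noise"
```

Calling `push` once per frame returns `"noise"` or `"speech"`; an utterance section would then be a run of consecutive `"speech"` frames, possibly with hangover logic that this sketch omits.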
-
2.
Publication No.: JP2003177781A
Publication date: 2003-06-27
Application No.: JP2001378546
Filing date: 2001-12-12
Applicant: ATR ADVANCED TELECOMM RES INST
Inventor: IDA MASAKI , NAKAMURA SATORU
Abstract: PROBLEM TO BE SOLVED: To provide an acoustic model that does not require the SN ratio of the input voice to be known. SOLUTION: A Gaussian mixture model generation part 11 generates a single-state Gaussian mixture model with multiple mixture components from the waveform signal data of a plurality of kinds of environmental noise for learning stored in a database 21, such that the output likelihood is maximized. An HMM synthesizing part 13 then generates, from a prescribed noiseless voice HMM and the generated noise Gaussian mixture models, a plurality of adapted HMMs corresponding to a plurality of SN ratios between the noiseless voice HMM and the generated noise Gaussian mixture models; each state of an adapted HMM contains a Gaussian mixture distribution expressed as a linear combination of Gaussian distributions, weighted by prescribed weight coefficients, over all combinations of the respective states. The generated adapted HMMs are arranged in parallel to form a multipath acoustic model. A voice recognition part 4 uses the adapted acoustic model to recognize an uttered voice signal on the basis of extracted feature quantities. COPYRIGHT: (C)2003,JPO
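The composition step can be pictured with a heavily simplified sketch. It is not the method claimed in the patent: it assumes one-dimensional features in the linear power domain, where independent speech and noise add, so each combined Gaussian has mean `s.mean + gain * n.mean` and variance `s.var + gain**2 * n.var`, with the product of the two component weights as its weight. HMM transition probabilities are ignored, and the names `Gaussian`, `compose_state`, and `build_multipath` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float  # power-domain mean (1-D for brevity)
    var: float


def mixture_mean(mix):
    """Weighted mean of a Gaussian mixture."""
    return sum(g.weight * g.mean for g in mix)


def compose_state(speech_mix, noise_mix, gain):
    """Combine every speech component with every noise component."""
    return [
        Gaussian(s.weight * n.weight,
                 s.mean + gain * n.mean,
                 s.var + gain ** 2 * n.var)
        for s in speech_mix
        for n in noise_mix
    ]


def build_multipath(speech_states, noise_mix, snrs_db):
    """Return one adapted set of states per SNR, keyed by SNR in dB."""
    # Assumption: average state mean stands in for the clean speech power.
    speech_power = sum(mixture_mean(m) for m in speech_states) / len(speech_states)
    noise_power = mixture_mean(noise_mix)
    paths = {}
    for snr_db in snrs_db:
        # Scale the noise so that speech_power / (gain * noise_power) equals the SNR.
        gain = speech_power / (noise_power * 10 ** (snr_db / 10))
        paths[snr_db] = [compose_state(m, noise_mix, gain) for m in speech_states]
    return paths
```

Each entry of the returned dictionary plays the role of one parallel path; a recognizer would score an utterance against all paths, so the SN ratio need not be known in advance.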
-
3.
Publication No.: JP2003169399A
Publication date: 2003-06-13
Application No.: JP2001366148
Filing date: 2001-11-30
Applicant: ATR ADVANCED TELECOMM RES INST
Inventor: SHIMADA MASAHARU , HOKARI HARUHIDE , MAYAHARA TATSUHIRO , MIZUMACHI MITSUNORI , NAKAMURA SATORU
IPC: H04S1/00
Abstract: PROBLEM TO BE SOLVED: To provide a stereophonic sound image controller that can expand, shrink, and rotate the sound image range by digital signal processing substantially in real time, with substantially no deterioration of sound quality. SOLUTION: The stereophonic sound image controller comprises means for calculating the principal value of the phase between the 2 channel signals obtained by a time-frequency converting means; means for expanding/shrinking the sound image generating range based on the calculated principal value of the phase and a preset expansion/shrinkage rate; means for moving the sound image based on the output of the sound image expanding/shrinking means and a preset rotational angle of movement; and a frequency-time converting means for converting the signal of each channel output from the sound image moving means into a time-domain signal. COPYRIGHT: (C)2003,JPO
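The per-frequency processing between the two transforms can be sketched as follows. This is an illustrative reading, not the patented procedure: it assumes that expansion/shrinkage scales the inter-channel phase difference of each frequency bin by `rate`, that the movement step adds a phase offset `rotation` (in radians), and that magnitudes are left unchanged. The function name `reshape_bin` and this mapping of "rotational angle" to a phase offset are assumptions.

```python
import cmath


def reshape_bin(left, right, rate, rotation):
    """Scale and offset the inter-channel phase difference of one frequency bin.

    left, right: complex spectrum values of the two channels for the same bin.
    rate: assumed expansion (>1) or shrinkage (<1) factor.
    rotation: assumed phase offset in radians that shifts the image.
    """
    # Phase difference between the channels, wrapped to [-pi, pi].
    phi = cmath.phase(left * right.conjugate())
    # Phase midway between the two channels, kept fixed by the transformation.
    mid = cmath.phase(left) - phi / 2
    new_diff = rate * phi + rotation
    # Split the new difference symmetrically and keep the original magnitudes.
    new_left = cmath.rect(abs(left), mid + new_diff / 2)
    new_right = cmath.rect(abs(right), mid - new_diff / 2)
    return new_left, new_right
```

With `rate=1` and `rotation=0` the bin is returned unchanged. In a complete system this function would be applied to every bin of a short-time Fourier transform of each frame before the inverse transform; that framing is omitted here.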
-
-