-
1.
Publication number: US09460733B2
Publication date: 2016-10-04
Application number: US14301870
Filing date: 2014-06-11
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Nam In Park
IPC: G10L19/00 , G10L21/038 , G10L19/02
CPC classification number: G10L21/038 , G10L19/0212
Abstract: Disclosed is an apparatus for extending the bandwidth of a sound signal. The apparatus includes a database that stores predetermined training information obtained from at least one of Gaussian mixture model (GMM) training and hidden Markov model (HMM) training; a modified discrete cosine transform (MDCT) transformer that transforms a first band signal through MDCT; a feature extractor that extracts a feature parameter of the first band signal from an MDCT coefficient output from the MDCT transformer; an extender that provides an extended MDCT coefficient for a second band signal based on the MDCT coefficient of the first band signal output from the MDCT transformer; and a subband energy estimator that estimates subband energy of the second band signal, based on the feature parameter, with reference to information stored in the database.
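The abstract above names two building blocks, the MDCT and subband energy, without giving formulas. The following is a minimal sketch of both, assuming the textbook MDCT definition (no window applied) and equal-width subbands. The function names `mdct` and `subband_energies` are hypothetical, and the patent's actual framing, windowing, and band layout are not stated in this listing.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT of a 2N-sample frame; returns N coefficients.

    Uses the textbook definition
        X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5)),
    with no analysis window (an assumption made for brevity).
    """
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number")
    n_half = two_n // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]


def subband_energies(coeffs, num_bands):
    """Sum of squared MDCT coefficients in num_bands equal-width bands."""
    if num_bands <= 0 or len(coeffs) % num_bands:
        raise ValueError("coefficient count must be divisible by num_bands")
    width = len(coeffs) // num_bands
    return [
        sum(c * c for c in coeffs[b * width:(b + 1) * width])
        for b in range(num_bands)
    ]


# Usage: a 16-sample frame yields 8 coefficients, grouped into 4 subbands.
frame = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
energies = subband_energies(mdct(frame), 4)
```

A production codec would normally apply a window and compute the transform with an FFT-based fast algorithm; the direct sum is used here only because it mirrors the definition.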
-
Publication number: US09288602B2
Publication date: 2016-03-15
Application number: US14301830
Filing date: 2014-06-11
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Nam In Park
CPC classification number: H04S5/00
Abstract: Disclosed herein are a stereo extension apparatus and method. The apparatus includes a database that stores predetermined information obtained from Gaussian mixture model (GMM) training or hidden Markov model (HMM) training; a modified discrete cosine transform (MDCT) transformer that transforms a mono signal through MDCT; a feature parameter extractor that extracts a feature parameter of the mono signal from an MDCT coefficient output from the MDCT transformer; a side signal energy estimator that estimates subband energy of a side signal, based on the feature parameter, with reference to information stored in the database; an energy controller that obtains the MDCT coefficient of an estimated side signal from the estimated subband energy of the side signal; and an inverse MDCT transformer that obtains the estimated side signal by transforming its MDCT coefficient through inverse MDCT.
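The abstract above does not say how the energy controller turns an estimated subband energy into side-signal MDCT coefficients. One common way to impose a target energy, shown here only as an assumption and not as the patented method, is to scale a set of template coefficients band by band so that each band's energy matches the estimate. The function name `scale_to_subband_energy` and the equal-width band layout are hypothetical.

```python
import math


def scale_to_subband_energy(coeffs, target_energies):
    """Scale coefficients per band so each band's energy equals its target.

    coeffs: template MDCT coefficients (for example, taken from the mono signal).
    target_energies: one non-negative energy per equal-width band.
    A band whose template energy is zero is left as zeros, since no gain
    can raise its energy.
    """
    num_bands = len(target_energies)
    if num_bands == 0 or len(coeffs) % num_bands:
        raise ValueError("coefficient count must be divisible by the band count")
    width = len(coeffs) // num_bands
    scaled = []
    for b, target in enumerate(target_energies):
        band = coeffs[b * width:(b + 1) * width]
        energy = sum(c * c for c in band)
        # The gain is the square root of the energy ratio because energy
        # grows with the square of the coefficient amplitude.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        scaled.extend(c * gain for c in band)
    return scaled


# Usage: two bands of two coefficients each, with target energies 8 and 1.
side = scale_to_subband_energy([1.0, 1.0, 2.0, 0.0], [8.0, 1.0])
```

Gain matching preserves the spectral shape inside each band and changes only its level, which is why it is a frequent choice when only band energies are predicted.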
-
Publication number: US11754748B2
Publication date: 2023-09-12
Application number: US17371763
Filing date: 2021-07-09
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Seong Yeop Jeong , In Young Park
Abstract: A temperature prediction system may include a data input module configured to receive data related to climate, a prediction module having installed therein a trained model for predicting a temperature based on input data from the data input module, and an output module configured to output temperature information predicted by the prediction module.
-
Publication number: US09877129B2
Publication date: 2018-01-23
Application number: US14435720
Filing date: 2013-10-04
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Chan Jun Chun
CPC classification number: H04S1/002 , G06F3/0412 , G10L13/02 , H04R5/04
Abstract: The present invention extracts azimuth information on a sound source, reads the touch state of a touch screen on which an image is displayed, and enables a sound source having azimuth information corresponding to a place touched on the image to be synthesized so as to be distinguished from other sound sources. According to the present invention, since the user can listen to the distinguished sound of a desired location on an image, user satisfaction may be improved.
-
Publication number: US20220146707A1
Publication date: 2022-05-12
Application number: US17371763
Filing date: 2021-07-09
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Seong Yeop Jeong , In Young Park
Abstract: A temperature prediction system may include a data input module configured to receive data related to climate, a prediction module having installed therein a trained model for predicting a temperature based on input data from the data input module, and an output module configured to output temperature information predicted by the prediction module.
-
Publication number: US20150271618A1
Publication date: 2015-09-24
Application number: US14435720
Filing date: 2013-10-04
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Chan Jun Chun
CPC classification number: H04S1/002 , G06F3/0412 , G10L13/02 , H04R5/04
Abstract: The present invention extracts azimuth information on a sound source, reads the touch state of a touch screen on which an image is displayed, and enables a sound source having azimuth information corresponding to a place touched on the image to be synthesized so as to be distinguished from other sound sources. According to the present invention, since the user can listen to the distinguished sound of a desired location on an image, user satisfaction may be improved.
-
Publication number: US09866984B2
Publication date: 2018-01-09
Application number: US15355053
Filing date: 2016-11-18
Applicant: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventor: Hong Kook Kim , Su Yeon Park , Chan Jun Chun
CPC classification number: H04S5/005 , G10L19/008 , G10L19/0212 , G10L19/16 , G10L25/30
Abstract: A method includes extracting a difference value through extraction of features of a front audio channel signal and a surround channel of multichannel sound content by setting the front audio channel signal and the surround channel as input and output channel signals, respectively, training a deep neural network (DNN) model by setting the input channel signal and the difference value as an input and an output of the DNN model, respectively, normalizing a frequency-domain signal of the input channel signal by converting the input channel signal into the frequency-domain signal, and extracting estimated difference values by decoding the normalized frequency-domain signal through the DNN model, deriving an estimated spectral amplitude of the surround channel based on the front audio channel signal and the difference value, and deriving an audio signal of a final surround channel by converting the estimated spectral amplitude of the surround channel into the time domain.
-
8.
Publication number: US08909539B2
Publication date: 2014-12-09
Application number: US13708346
Filing date: 2012-12-07
Applicant: Gwangju Institute of Science and Technology
Inventor: Hong Kook Kim , Nam In Park
IPC: G10L21/00
CPC classification number: G10L19/02 , G10L21/0388 , G10L25/93
Abstract: A method for extending a bandwidth of a speech signal received, according to an embodiment of the present invention, includes: transforming the received speech signal into a frequency domain by decoding the received speech signal; normalizing the transformed speech signal; differentiating a voiced sound period or unvoiced sound period from the received speech signal; extracting, from the normalized speech signal, a first period including a harmonic component of the voiced sound period on the basis of the voiced sound period; extracting, from the normalized speech signal, a second period on the basis of correlation between the unvoiced sound period and the normalized speech signal; generating a high-band speech signal on the basis of the first period and the second period; and synthesizing the generated high-band speech signal and the transformed speech signal to output a wideband speech signal.
-
Publication number: US12300251B2
Publication date: 2025-05-13
Application number: US18070499
Filing date: 2022-11-29
Applicant: Gwangju Institute of Science and Technology
Inventor: Dong Keon Park , Hong Kook Kim , Ye Chan Yu
IPC: G10L17/18 , G10L21/0272 , G10L21/0308 , G10L25/18
Abstract: The present invention relates to speaker diarization technology, and more specifically to an end-to-end speaker diarization system and method that use transformer learning with an auxiliary loss-based residual connection to separate speakers by time interval. Through speaker labeling based on transformer learning with the auxiliary loss, the system and method can differentiate and separate speakers even if their speech overlaps in a multi-speaker environment.