METHOD AND DEVICE FOR SIMULTANEOUS VOICE RECOGNITION, SPEAKER SEGMENTATION AND SPEAKER CLASSIFICATION

    Publication No.: JP2001060098A

    Publication date: 2001-03-06

    Application No.: JP2000188625

    Filing date: 2000-06-23

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To obtain a method in which audio information from an audio/video source is automatically transcribed and the speakers are identified at the same time, by transcribing the audio source, simultaneously identifying potential segment boundaries, and assigning a speaker label to each identified segment. SOLUTION: The method includes a step in which the audio source is transcribed to generate a text version of the audio information, a step that simultaneously identifies potential segment boundaries, and a step in which a speaker label is assigned to each identified segment. A simultaneous transcription, segmentation and speaker identification process 500 generates, in real time, a transcription of the audio information that indicates the speaker associated with each segment. A segmentation process 600 identifies all frames in which segment boundaries may exist. A speaker identification process 700 assigns a speaker label to each segment using a registered speaker database.

    METHOD AND DEVICE FOR TRACKING LOUDSPEAKER IN AUDIO STREAM

    Publication No.: JP2001051691A

    Publication date: 2001-02-23

    Application No.: JP2000188613

    Filing date: 2000-06-23

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To provide a method and a device for automatically identifying speakers in an audio (or video) source. SOLUTION: The audio/video source is first processed 300 to identify frames where a segment boundary indicating a speaker change exists, based on the Bayesian information criterion (BIC) model selection criterion. Segments corresponding to the same speaker are then clustered 400, and a cluster identifier is assigned to each identified segment. A speaker classification system 100 generates a clustering output file 160 that provides a series of segment numbers (with the start and end times of each segment) together with the corresponding identified cluster numbers.

    METHOD AND DEVICE FOR RETRIEVING VOICE INFORMATION BY USING CONTENTS INFORMATION AND SPEAKER INFORMATION

    Publication No.: JP2000348064A

    Publication date: 2000-12-15

    Application No.: JP2000102972

    Filing date: 2000-04-05

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To retrieve voice information according to both voice content and speaker identity, by comparing a user query with the content index and speaker index of a voice source to identify the voice information that matches the query. SOLUTION: The user query includes a text string containing one or more keywords and a given speaker identity. The constraints of the query are compared with an indexed voice and/or video database to retrieve the relevant voice and video segments. A voice retrieval system 100 comprises an indexing system 500, which transcribes and indexes voice information, and a content and speaker voice retrieval system 600. In the indexing stage, the indexing system 500 processes the text output by a voice recognition system to perform content indexing and speaker indexing. In the retrieval stage, the content and speaker voice retrieval system 600 uses the content indexes and speaker indexes built in the indexing stage to find documents that match the query on voice content and speaker identity, and returns the relevant documents to the user.
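
    The two-index lookup described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the segment dictionaries, the names `build_indexes` and `search`, and the rule that every keyword must match are not taken from the patent, which does not specify its index structures in the abstract.

```python
from collections import defaultdict


def build_indexes(segments):
    """Build a content index (word -> segment ids) and a speaker index
    (speaker -> segment ids). `segments` is a hypothetical list of dicts
    with "text" (transcript) and "speaker" (label) keys."""
    content_index = defaultdict(set)
    speaker_index = defaultdict(set)
    for seg_id, seg in enumerate(segments):
        for word in seg["text"].lower().split():
            content_index[word].add(seg_id)
        speaker_index[seg["speaker"]].add(seg_id)
    return content_index, speaker_index


def search(content_index, speaker_index, keywords, speaker):
    """Return ids of segments spoken by `speaker` that contain every keyword.
    Requiring all keywords is an illustrative choice, not the patent's rule."""
    hits = set(speaker_index.get(speaker, set()))
    for word in keywords:
        hits &= content_index.get(word.lower(), set())
    return sorted(hits)


segments = [
    {"text": "the budget vote is tomorrow", "speaker": "alice"},
    {"text": "the budget was rejected", "speaker": "bob"},
    {"text": "weather looks fine", "speaker": "alice"},
]
content_index, speaker_index = build_indexes(segments)
matches = search(content_index, speaker_index, ["budget"], "alice")
```

    Keeping the speaker constraint and the content constraint in separate indexes lets either one be applied on its own, which matches the abstract's description of a query that carries both keywords and a speaker identity.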

    Tracking speakers in an audio stream

    Publication No.: GB2351592A

    Publication date: 2001-01-03

    Application No.: GB0015194

    Filing date: 2000-06-22

    Applicant: IBM

    Abstract: Audio information is processed to identify potential segment boundaries, corresponding to speaker changes 220. Thereafter, homogeneous segments (generally corresponding to the same speaker) are clustered 230, and a cluster identifier is assigned to each identified segment. A segmentation subroutine identifies potential segment boundaries using the BIC model selection criterion. A window selection scheme considers a relatively small amount of data in areas where new boundaries are very likely to occur, and the window size is increased when boundaries are not very likely to occur. When a segment boundary is found in a window, the next window begins after the detected boundary, using the minimal window size. BIC tests can be eliminated when they correspond to locations where the detection of a boundary is very unlikely.
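
    The window selection scheme in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: each segment is modelled as a single full-covariance Gaussian, the window grows by a fixed `growth` step, and the tests skipped as unlikely are simply those within `margin` frames of a window edge. The names `delta_bic`, `find_boundaries`, `min_window`, `growth`, `margin` and `lam` are hypothetical.

```python
import numpy as np


def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood covariance of the rows of x.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(window, i, lam=1.0):
    """BIC(one Gaussian for the window) minus BIC(two Gaussians split at i).
    A negative value favours a boundary at frame i."""
    n, d = window.shape
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return (-0.5 * n * _logdet_cov(window)
            + 0.5 * i * _logdet_cov(window[:i])
            + 0.5 * (n - i) * _logdet_cov(window[i:])
            + penalty)


def find_boundaries(frames, min_window=100, growth=50, margin=20, lam=1.0):
    """Scan `frames` (n_frames x n_features) with a growing window."""
    boundaries = []
    start, size = 0, min_window
    while start + size <= len(frames):
        window = frames[start:start + size]
        # Skip candidate frames too close to the window edges.
        candidates = range(margin, size - margin)
        scores = [delta_bic(window, i, lam) for i in candidates]
        best = int(np.argmin(scores))
        if scores[best] < 0:
            # Boundary found: restart after it with the minimal window.
            boundary = start + candidates[best]
            boundaries.append(boundary)
            start, size = boundary, min_window
        else:
            # No boundary found: enlarge the window and test again.
            size += growth
    return boundaries
```

    Called on a feature matrix such as `find_boundaries(features)`, the function returns frame indices of the detected boundaries. Restarting with the minimal window after each detection keeps the number of BIC tests small where speaker changes are frequent.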

    Methods and apparatus for tracking speakers in an audio stream

    Publication No.: GB2351592B

    Publication date: 2003-05-21

    Application No.: GB0015194

    Filing date: 2000-06-22

    Applicant: IBM

    Abstract: Speakers are automatically identified in an audio (or video) source. The audio information is processed to identify potential segment boundaries. Homogeneous segments are clustered substantially concurrently with the segmentation routine, and a cluster identifier is assigned to each identified segment. A segmentation subroutine identifies potential segment boundaries using the BIC model selection criterion. A clustering subroutine uses a BIC model selection criterion to assign a cluster identifier to each of the identified segments. If the difference of BIC values for each model is positive, the two clusters are merged.
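
    The merge rule stated at the end of this abstract can be illustrated with a short sketch. It assumes, beyond what the abstract says, that each cluster is modelled as one full-covariance Gaussian and that the difference is taken as BIC(merged model) minus BIC(two separate models); `should_merge` and the penalty weight `lam` are hypothetical names.

```python
import numpy as np


def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood covariance of the rows of x.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def should_merge(a, b, lam=1.0):
    """Return True when the BIC difference is positive, i.e. a single
    Gaussian explains both clusters better than two separate ones."""
    merged = np.vstack([a, b])
    n, d = merged.shape
    # Penalty for the extra mean and covariance parameters of a second model.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    delta = (-0.5 * n * _logdet_cov(merged)
             + 0.5 * len(a) * _logdet_cov(a)
             + 0.5 * len(b) * _logdet_cov(b)
             + penalty)
    return bool(delta > 0)
```

    Here `a` and `b` are feature matrices (frames by features) for two clusters. Under this sign convention a positive difference means the extra parameters of a second model are not justified by the data, so the clusters are treated as the same speaker.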
