Multithreaded speech data preprocessing

    Publication No.: US11862171B2

    Publication date: 2024-01-02

    Application No.: US17993385

    Filing date: 2022-11-23

    Abstract: An apparatus includes a processor to: receive, from a requesting device, a request to perform speech-to-text conversion of a speech data set; within a first thread of a thread pool, perform a first pause detection technique to identify a first set of likely sentence pauses; within a second thread of the thread pool, perform a second pause detection technique to identify a second set of likely sentence pauses; perform a speaker diarization technique to identify a set of likely speaker changes; divide the speech data set into data segments representing speech segments based on a combination of at least the first set of likely sentence pauses, the second set of likely sentence pauses, and the set of likely speaker changes; use at least an acoustic model with each data segment to identify likely speech sounds; and generate a transcript based, at least in part, on the identified likely speech sounds.
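
The two-thread arrangement described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: both detectors (`energy_pauses`, `zero_run_pauses`), their thresholds, and the plain union used to combine the cut points are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def energy_pauses(samples, threshold=0.1, min_len=3):
    """Hypothetical detector 1: midpoints of runs of low-amplitude samples."""
    pauses, start = [], None
    for i, s in enumerate(samples + [1.0]):  # loud sentinel closes a trailing run
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                pauses.append((start + i) // 2)
            start = None
    return pauses

def zero_run_pauses(samples, min_len=3):
    """Hypothetical detector 2: same scan, but only near-zero samples count."""
    return energy_pauses(samples, threshold=1e-9, min_len=min_len)

def segment(samples, speaker_changes):
    """Run both detectors in a thread pool, then cut at every candidate point."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(energy_pauses, samples)
        second = pool.submit(zero_run_pauses, samples)
        # Assumed combination rule: plain union of all candidate cut points.
        cuts = sorted(set(first.result()) | set(second.result()) | set(speaker_changes))
    bounds = [0] + [c for c in cuts if 0 < c < len(samples)] + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]
```

Each returned segment would then be passed to the acoustic model; that stage is outside the scope of this sketch.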

    Human language analyzer for detecting clauses, clause types, and clause relationships

    Publication No.: US10699081B2

    Publication date: 2020-06-30

    Application No.: US16655615

    Filing date: 2019-10-17

    Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.

    AUTOMATED NEAR-DUPLICATE DETECTION FOR TEXT DOCUMENTS

    Publication No.: US20250165536A1

    Publication date: 2025-05-22

    Application No.: US18896244

    Filing date: 2024-09-25

    Abstract: Techniques described herein provide for automated detection of near-duplicate documents. In one example, a system can cluster documents into a set of clusters based on character frequencies associated with the documents. For a given cluster, the system can generate first similarity scores associated with every pair of documents in the cluster. The system can then select a filtered group of documents associated with first similarity scores that meet or exceed a first predefined similarity threshold. Next, the system can convert the filtered group of documents into matrix representations. The system can generate second similarity scores for every pair of matrix representations. The system can then identify documents, from among the filtered group of documents, associated with second similarity scores that meet or exceed a second predefined similarity threshold. The identified documents can be duplicate or near-duplicate text documents.
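
The staged filtering in this abstract can be illustrated with a small sketch. The abstract does not name the clustering method, the similarity measures, or the matrix representation, so the choices here are assumptions: greedy clustering on letter counts, word-set Jaccard as the first score, and cosine similarity over sparse character-bigram count matrices as the second score. All function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def char_freq(text):
    """Letter counts, used for the coarse clustering stage."""
    return Counter(c for c in text.lower() if c.isalpha())

def bigram_matrix(text):
    """Sparse matrix of character-bigram counts, keyed by (row, column)."""
    t = text.lower()
    return Counter(zip(t, t[1:]))

def jaccard(a, b):
    """Overlap of the two documents' word sets."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicates(docs, cluster_t=0.9, t1=0.5, t2=0.8):
    # Stage 1: greedy clustering on character frequencies.
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if cosine(char_freq(doc), char_freq(docs[cluster[0]])) >= cluster_t:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    result = []
    for cluster in clusters:
        # Stage 2: cheap first-pass score for every pair in the cluster.
        kept = [(i, j) for i, j in combinations(cluster, 2)
                if jaccard(docs[i], docs[j]) >= t1]
        # Stage 3: second score on matrix representations of the survivors.
        for i, j in kept:
            if cosine(bigram_matrix(docs[i]), bigram_matrix(docs[j])) >= t2:
                result.append((i, j))
    return result
```

As a simplification, the sketch rescoring step only revisits pairs that passed the first threshold, rather than every pair within the filtered group.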

    Automated near-duplicate detection for text documents

    Publication No.: US12124518B1

    Publication date: 2024-10-22

    Application No.: US18394209

    Filing date: 2023-12-22

    CPC classification number: G06F16/906 G06F16/93 G06F16/355

    Abstract: Techniques described herein provide for automated detection of near-duplicate documents. In one example, a system can cluster documents into a set of clusters based on character frequencies associated with the documents. For a given cluster, the system can generate first similarity scores associated with every pair of documents in the cluster. The system can then select a filtered group of documents associated with first similarity scores that meet or exceed a first predefined similarity threshold. Next, the system can convert the filtered group of documents into matrix representations. The system can generate second similarity scores for every pair of matrix representations. The system can then identify documents, from among the filtered group of documents, associated with second similarity scores that meet or exceed a second predefined similarity threshold. The identified documents can be duplicate or near-duplicate text documents.

    SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION

    Publication No.: US20240347064A1

    Publication date: 2024-10-17

    Application No.: US18634155

    Filing date: 2024-04-12

    Abstract: A system, method, and computer-program product includes receiving speech audio of a multi-turn conversation, generating, via a speech-to-text process, a transcript of the speech audio, wherein the transcript of the speech audio textually segments speech spoken during the multi-turn conversation into a plurality of utterances, generating a speaker diarization prompt that includes contextual information about a plurality of speakers participating in the multi-turn conversation, inputting, to a large language model, the speaker diarization prompt and the transcript of the speech audio, and obtaining, from the large language model, an output comprising an enhanced transcript of the speech audio, wherein the enhanced transcript of the speech audio textually segments the speech spoken during the multi-turn conversation into a plurality of refined utterances and associates a speaker identification value with each of the plurality of refined utterances.
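
To make the prompt-construction step concrete, here is a minimal sketch of how the two model inputs might be assembled. The wording of the prompt, the function name `build_diarization_request`, and the numbered transcript layout are all assumptions; the abstract specifies only that the prompt carries contextual information about the speakers. The call to the large language model itself is deliberately left out.

```python
def build_diarization_request(speakers, utterances):
    """Return (prompt, transcript) strings to send to a large language model.

    speakers:   dict mapping a speaker id to a short description (hypothetical shape)
    utterances: list of utterance strings from the speech-to-text step
    """
    prompt_lines = [
        "The transcript below is a multi-turn conversation.",
        "Speakers taking part:",
    ]
    # Contextual information about each participant.
    prompt_lines += [f"- {sid}: {desc}" for sid, desc in speakers.items()]
    prompt_lines.append(
        "Re-segment the text into utterances and label each with one speaker id."
    )
    # Number the utterances so the model's output can be mapped back.
    transcript = "\n".join(f"[{i}] {u}" for i, u in enumerate(utterances, 1))
    return "\n".join(prompt_lines), transcript
```

The model's reply would then be parsed into refined utterances, each paired with a speaker identification value.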

    Systems and methods for configuring and using an audio transcript correction machine learning model

    Publication No.: US11922947B2

    Publication date: 2024-03-05

    Application No.: US18214336

    Filing date: 2023-06-26

    Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-text post-processing sequence of a speech-to-text service.
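
One plausible reading of the "transcript correction identifier" is a task prefix prepended to the model input, as is common when training text-to-text transformer models. The sketch below builds training pairs on that assumption; the identifier string, the class name, and the field names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical identifier string; the abstract does not give its wording.
CORRECTION_ID = "correct transcript: "

@dataclass
class CorrectionSample:
    incorrect: str  # transcript as produced by speech-to-text
    correct: str    # verified transcript of the same audio

    def to_pair(self):
        """Return (model_input, target) for text-to-text training."""
        return CORRECTION_ID + self.incorrect, self.correct

def build_corpus(rows):
    """Turn (incorrect, correct) string pairs into labeled training pairs."""
    return [CorrectionSample(bad, good).to_pair() for bad, good in rows]
```

At inference time the same prefix would be applied to a likely incorrect transcript so that the trained model predicts the corrected text.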

    Method for Configuring and Using a Numeric-to-Alphabetic Expression Machine Learning Model

    Publication No.: US20230386473A1

    Publication date: 2023-11-30

    Application No.: US18220632

    Filing date: 2023-07-11

    Abstract: A system, method, and computer-program product includes constructing a transcript adaptation training data corpus that includes a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes: a predicted audio transcript that includes at least one numerical expression, an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression, and a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens; configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using the transcript adaptation training data corpus; and executing the numeric-to-alphabetic expression machine learning model.
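
The pairing of a predicted transcript with its alphabetic equivalent can be shown with a toy example. The patent trains a machine learning model for this; the rule-based `spell` function below is only an assumed, hypothetical way to produce the adapted side of a training pair, and it handles standalone integers from 0 to 999 only (decimals, ordinals, and longer numbers are not handled correctly).

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def spell(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + spell(rest) if rest else "")

def adapt(transcript):
    """Replace each standalone 1-3 digit number with its spelled-out form."""
    return re.sub(r"\b\d{1,3}\b", lambda m: spell(int(m.group())), transcript)

def make_sample(predicted):
    """Build one (predicted transcript, adapted transcript) training pair."""
    return predicted, adapt(predicted)
```

Numbers with four or more digits are left untouched because the word-boundary pattern does not match inside a longer digit run.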

    MULTI-THREADED SPEAKER IDENTIFICATION

    Publication No.: US20230317083A1

    Publication date: 2023-10-05

    Application No.: US18207433

    Filing date: 2023-06-08

    Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.
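
The final step, merging per-file tentative speakers into global speakers, might look like the sketch below. The abstract does not name the clustering algorithm, so greedy centroid clustering with a cosine threshold is an assumption, as are the function names, the tuple layout of the input, and the threshold value. The distribution of files to computing nodes and threads is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def global_speakers(tentative, threshold=0.9):
    """Group tentative speakers whose embedding signatures are similar.

    tentative: list of (file_id, local_speaker, embedding) tuples
               (hypothetical layout).
    Returns one list of (file_id, local_speaker) members per global speaker.
    """
    clusters = []  # each: {"centroid": [...], "members": [...]}
    for file_id, local, emb in tentative:
        best = max(clusters, key=lambda c: cosine(c["centroid"], emb), default=None)
        if best is not None and cosine(best["centroid"], emb) >= threshold:
            best["members"].append((file_id, local))
            n = len(best["members"])
            # Running mean keeps the centroid representative of all members.
            best["centroid"] = [(c * (n - 1) + e) / n
                                for c, e in zip(best["centroid"], emb)]
        else:
            clusters.append({"centroid": list(emb), "members": [(file_id, local)]})
    return [c["members"] for c in clusters]
```

Because the clustering is greedy, the result can depend on the order in which tentative speakers arrive; a batch algorithm would avoid that at higher cost.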

    Speech audio pre-processing segmentation

    Publication No.: US11049502B1

    Publication date: 2021-06-29

    Application No.: US17138521

    Filing date: 2020-12-30

    Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; configure a neural network to implement an acoustic model that includes a CTC output; provide each data chunk to the neural network and monitor the CTC output for a string of blank symbols; designate each string of blank symbols from the CTC output that is at least as long as a predetermined blank threshold length as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion, to identify a sentence spoken in a selected language in each speech segment.
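
The blank-string test described in this abstract reduces to finding runs of the CTC blank symbol that reach a minimum length. A minimal sketch follows; the blank marker `"_"`, the function name, and the default threshold are assumptions for illustration, and the neural network that produces the CTC output is not modeled.

```python
def blank_runs(ctc_symbols, blank="_", min_len=4):
    """Return (start, end) index pairs, end exclusive, for every run of
    blank symbols that is at least min_len long."""
    runs, start = [], None
    # A trailing None sentinel closes a run that reaches the end of the output.
    for i, sym in enumerate(list(ctc_symbols) + [None]):
        if sym == blank:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

Each returned run would be treated as a candidate sentence pause, and the speech data set would be divided at positions chosen from that candidate set.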

    SYSTEMS AND METHODS FOR CONFIGURING AND USING AN AUDIO TRANSCRIPT CORRECTION MACHINE LEARNING MODEL

    Publication No.: US20230360652A1

    Publication date: 2023-11-09

    Application No.: US18214336

    Filing date: 2023-06-26

    Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-text post-processing sequence of a speech-to-text service.
