Abstract:
The invention relates to a method and a device for the transcription of spoken and written utterances. To this end, the utterances undergo speech or text recognition, and the recognition result (ME) is combined with a manually created transcription (MT) of the utterances in order to obtain the transcription. The additional information that the recognition result (ME) contributes to this combination allows the transcriber to work relatively roughly, and therefore quickly, on the manual transcription. When using a keyboard (25), the transcriber can, for example, hit only the keys of a single row and/or omit some keystrokes completely. In addition, the manual transcription can be accelerated by suggesting continuations (31) of the text input so far (30), these continuations being anticipated on the basis of the recognition result (ME).
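The single-row keyboard idea can be illustrated with a minimal sketch. It assumes a QWERTY layout in which every key is collapsed onto the home-row key of its column, and it assumes the recognizer supplies scored word candidates already aligned with the typed tokens. The names `collapse` and `combine`, the layout table, and the alignment are illustrative assumptions, not details taken from the abstract.

```python
# Assumed QWERTY layout; each key is collapsed onto the home-row key in its column.
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]

# Map every key to the home-row key that sits in the same column.
HOME = {ch: ROWS[1][col] for row in ROWS for col, ch in enumerate(row)}


def collapse(word):
    """Return the home-row keystrokes a transcriber would type for `word`."""
    return "".join(HOME.get(c, c) for c in word.lower())


def combine(rough_tokens, candidates_per_position):
    """Pick, per position, the best-scoring recognizer candidate that is
    consistent with the rough keystrokes; fall back to the best candidate
    overall when none matches (hypothetical fallback rule)."""
    result = []
    for typed, candidates in zip(rough_tokens, candidates_per_position):
        matching = [(w, s) for w, s in candidates if collapse(w) == typed]
        pool = matching or candidates
        result.append(max(pool, key=lambda ws: ws[1])[0])
    return result


if __name__ == "__main__":
    recognizer_output = [
        [("thee", 0.5), ("the", 0.4)],
        [("cat", 0.6), ("cut", 0.3)],
    ]
    print(combine(["ghd", "djg"], recognizer_output))
```

In this sketch the rough keystrokes act only as a filter on the recognizer's candidates; a fuller system would presumably weigh both sources probabilistically and tolerate omitted keystrokes.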
Abstract:
The method estimates the probability of a speech vocabulary element by raising different M-gram probabilities of that element to the power of M-gram-specific optimised parameter values and multiplying the results. The estimate excludes the case in which an M-gram probability with M greater than 1, estimated for a speech vocabulary element from a first training corpus, is multiplied by a quotient raised to the power of an optimised parameter value. In that excluded case, the optimised parameter value is determined using a GIS (generalized iterative scaling) algorithm, the dividend of the quotient is a unigram probability estimated from a second training corpus, and the divisor is a unigram probability estimated from the first training corpus. An independent claim is also included for a speech recognition system.
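The core operation, raising several probability estimates to individual exponents and multiplying them, can be sketched as follows. The renormalization over the vocabulary and the function name `log_linear_combine` are assumptions added for illustration; the abstract does not state how the product is normalized, and the exponents here are simply given rather than optimised.

```python
import math


def log_linear_combine(component_probs, exponents):
    """Combine per-word probabilities from several M-gram models.

    component_probs maps each word to a list of probabilities (one per model,
    all strictly positive); exponents holds one exponent per model.
    The product is computed in the log domain and then renormalized so the
    result sums to 1 (the renormalization is an assumption of this sketch).
    """
    scores = {}
    for word, probs in component_probs.items():
        log_score = sum(lam * math.log(p) for lam, p in zip(exponents, probs))
        scores[word] = math.exp(log_score)
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


if __name__ == "__main__":
    # Two hypothetical models (e.g. a unigram and a bigram estimate).
    probs = {"a": [0.5, 0.25], "b": [0.5, 0.75]}
    print(log_linear_combine(probs, [1.0, 1.0]))
```

Working in the log domain avoids underflow when many small probabilities are multiplied.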
Abstract:
A small vocabulary pattern recognition system is used for recognizing a sequence of words, such as a sequence of digits (e.g. telephone number) or a sequence of commands. A representation of reference words is stored in a vocabulary 132, 134. Input means 110 are used for receiving a time-sequential input pattern representative of a spoken or written word sequence. A pattern recognizer 120 comprises a word-level matching unit 130 for generating a plurality of possible sequences of words by statistically comparing the input pattern to the representations of the reference words of the vocabulary 132, 134. A cache 150 is used for storing a plurality of most recently recognized words. A sequence-level matching unit 140 selects a word sequence from the plurality of sequences of words in dependence on a statistical language model which provides a probability of a sequence of M words, M>=2. The probability depends on a frequency of occurrence of the sequence in the cache. In this way for many small vocabulary systems where no reliable data is available on frequency of use of word sequences, the cache is used to provide data representative of the actual use.
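A cache-driven language model of this kind can be sketched for M = 2. The class name `CacheBigramModel`, the add-one style smoothing, the cache size, and the way word-level scores are combined with the language model score are all assumptions of this sketch; the abstract only states that the sequence probability depends on how often the sequence occurs in the cache.

```python
import math
from collections import Counter, deque


class CacheBigramModel:
    """Bigram model estimated only from the most recently recognized words."""

    def __init__(self, vocab, cache_size=100, smoothing=1.0):
        self.vocab = list(vocab)
        # Oldest words drop out automatically once the cache is full.
        self.cache = deque(maxlen=cache_size)
        self.smoothing = smoothing

    def update(self, words):
        """Add newly recognized words to the cache."""
        self.cache.extend(words)

    def prob(self, prev, word):
        """Smoothed relative frequency of `word` following `prev` in the cache.
        The cache is treated as one flat stream, so pairs that span two
        separate utterances are counted as well (a simplification)."""
        items = list(self.cache)
        pair_counts = Counter(zip(items, items[1:]))
        prev_count = sum(1 for w in items[:-1] if w == prev)
        numerator = pair_counts[(prev, word)] + self.smoothing
        denominator = prev_count + self.smoothing * len(self.vocab)
        return numerator / denominator

    def best_sequence(self, hypotheses):
        """Select among (words, word_level_log_score) hypotheses by adding
        the cache-based bigram log probability to the word-level score."""
        def total(hyp):
            words, score = hyp
            lm = sum(math.log(self.prob(a, b)) for a, b in zip(words, words[1:]))
            return score + lm
        return max(hypotheses, key=total)[0]


if __name__ == "__main__":
    model = CacheBigramModel(vocab="0123456789")
    model.update(["1", "2", "3", "1", "2", "3"])
    print(model.best_sequence([(["1", "2"], -1.0), (["1", "3"], -1.0)]))
```

With equal word-level scores, the hypothesis whose digit pairs were seen recently wins, which is the effect the abstract attributes to the cache.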
Abstract:
The invention describes a method for storing broadcast contents and a broadcast content storage system. A plurality of content categories (KAT1, KAT2) is pre-defined, each of which is defined or described by at least one content descriptor (OB1, OB2). Broadcast contents transmitted over at least one broadcast transmission channel are received, preferably continually, or over pre-defined lengths of time. Received broadcast contents, which are described by a content descriptor (OB1, OB2), are automatically assigned to the content category (KAT1, KAT2) defined or described by the corresponding content descriptor (OB1, OB2). The broadcast contents assigned to a content category (KAT1, KAT2) and the assignments of the broadcast contents to the corresponding content categories (KAT1, KAT2) are automatically stored.
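The assignment step can be sketched as a simple descriptor match. The function name `assign_and_store`, the in-memory dictionary standing in for the storage system, and the example descriptor strings are assumptions of this sketch; only the category and descriptor labels (KAT1, KAT2, OB1, OB2) come from the abstract.

```python
from collections import defaultdict


def assign_and_store(categories, received_items):
    """Assign received broadcast contents to categories and record the result.

    categories maps a category name to the set of descriptors that define it.
    received_items is an iterable of (content_id, descriptors) pairs.
    Returns a dict mapping each category to the content ids stored under it;
    contents whose descriptors match no category are not stored.
    """
    store = defaultdict(list)
    for content_id, descriptors in received_items:
        for name, category_descriptors in categories.items():
            # A content item belongs to a category if they share a descriptor.
            if category_descriptors & set(descriptors):
                store[name].append(content_id)
    return dict(store)


if __name__ == "__main__":
    categories = {"KAT1": {"OB1"}, "KAT2": {"OB2"}}
    received = [
        ("clip-a", {"OB1"}),
        ("clip-b", {"OB2"}),
        ("clip-c", {"OB1", "OB2"}),
        ("clip-d", {"OB9"}),
    ]
    print(assign_and_store(categories, received))
```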
Abstract:
The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic-specific statistical models. A text document, which may be obtained from a first speech recognition pass, is segmented, and a topic-specific model is assigned to each resulting section. Each model of the set of models provides statistical information about language model probabilities and about text processing or formatting rules, such as the interpretation of commands for punctuation, formatting, text highlighting, or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary that is characteristic of each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding. Based on the assignment of models to sections of text, an improved speech recognition and/or text formatting procedure is performed.
Abstract:
The invention relates to a method, a computer program product and a computer system for structuring an unstructured text by making use of statistical models trained on annotated training data. Each section into which the text is segmented is further assigned to a topic, which is associated with a set of labels. The statistical models for the segmentation of the text and for the assignment of a topic and its associated labels to a section of text explicitly account for: correlations between a section of text and a topic, a topic transition between sections, a topic position within the document and a (topic-dependent) section length. Hence structural information of the training data is exploited in order to perform segmentation and annotation of unknown text.
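Two of the listed factors, the correlation between text and topic and the topic transition between sections, can be combined in a standard Viterbi search. The sketch below is a simplified illustration under stated assumptions: it labels each sentence with a topic, uses hand-written toy probabilities instead of models trained on annotated data, and leaves out topic position and section length. The function name `viterbi_topics` and all parameter names are hypothetical.

```python
import math


def viterbi_topics(sentences, topics, emission, transition, start, floor=1e-6):
    """Return the most likely topic for each sentence.

    emission[topic][word] is the probability of a word under a topic
    (unseen words get `floor`), transition[a][b] the probability of moving
    from topic a to topic b, and start[topic] the initial topic probability.
    Consecutive sentences with the same topic form one section.
    """
    def emit(topic, sentence):
        return sum(math.log(emission[topic].get(w, floor)) for w in sentence.split())

    best = [{t: math.log(start[t]) + emit(t, sentences[0]) for t in topics}]
    back = []
    for sentence in sentences[1:]:
        scores, pointers = {}, {}
        for t in topics:
            # Best predecessor topic for ending in topic t at this sentence.
            prev = max(topics, key=lambda p: best[-1][p] + math.log(transition[p][t]))
            scores[t] = best[-1][prev] + math.log(transition[prev][t]) + emit(t, sentence)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Trace the best path backwards from the best final topic.
    path = [max(topics, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    topics = ["history", "findings"]
    emission = {
        "history": {"patient": 0.5, "reports": 0.5},
        "findings": {"scan": 0.5, "shows": 0.5},
    }
    transition = {
        "history": {"history": 0.7, "findings": 0.3},
        "findings": {"history": 0.1, "findings": 0.9},
    }
    start = {"history": 0.9, "findings": 0.1}
    text = ["patient reports", "patient reports", "scan shows", "scan shows"]
    print(viterbi_topics(text, topics, emission, transition, start))
```

Section boundaries fall wherever the returned topic changes; position and length models would enter as additional log-probability terms in the same search.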