Efficiently Extendable In-Interpreter Natural Language Agent

    Publication Number: US20240346246A1

    Publication Date: 2024-10-17

    Application Number: US18300930

    Filing Date: 2023-04-14

    CPC classification number: G06F40/279

    Abstract: A trained natural language model is provided that uses an input session history to generate outputs to an interpreter. Outputs to the interpreter, and inputs responsively received therefrom, are added to the history to generate additional model outputs as the history is updated. The model is trained to engage in goal-oriented dialog with the interpreter and with the user (optionally through interpreter function calls) to identify the user's goals, to learn information about modules, functions, and methods available in the interpreter that are relevant to the user's goals, and to execute function calls and/or commands, based on the learned information, that accomplish the user's goals. The use of a history that may be completely blank at the beginning of the session reduces the computational requirements of running the model and allows the model to ‘update’ itself as the available modules are updated, added, or removed.
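    By way of illustration only (this is not the patent's implementation), the interpreter-in-the-loop pattern described in the abstract can be sketched roughly as follows in Python. The model call is a stub, and the "CALL:"/"SAY:" action convention is a hypothetical format assumed for the example.

    # Minimal sketch of an in-interpreter agent loop, under the assumed conventions above.
    # model_generate is a hypothetical stand-in for the trained language model.
    import contextlib
    import io

    def model_generate(history: str) -> str:
        """Stand-in for a trained language model conditioned on the session history."""
        # A real system would query the model here; this stub simply replies to the user.
        return "SAY: (model output would appear here)"

    def run_session(user_goal: str, max_turns: int = 5) -> str:
        history = f"USER: {user_goal}\n"  # the history may start completely blank
        interpreter_globals: dict = {}    # shared namespace for in-interpreter execution

        for _ in range(max_turns):
            action = model_generate(history)
            history += f"MODEL: {action}\n"

            if action.startswith("CALL: "):
                code = action[len("CALL: "):]
                # Execute the model-proposed code in the interpreter, capture its
                # output, and append the result to the history for the next turn.
                buffer = io.StringIO()
                try:
                    with contextlib.redirect_stdout(buffer):
                        exec(code, interpreter_globals)
                    result = buffer.getvalue() or "(no output)"
                except Exception as exc:
                    result = f"ERROR: {exc}"
                history += f"INTERPRETER: {result}\n"
            else:
                break  # the model addressed the user directly; end this turn

        return history

    if __name__ == "__main__":
        print(run_session("List the functions available in the math module"))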

    Data augmentation for intent classification

    Publication Number: US12243518B2

    Publication Date: 2025-03-04

    Application Number: US17978690

    Filing Date: 2022-11-01

    Inventor: Dzmitry Bahdanau

    Abstract: The present disclosure relates to a data augmentation system and method that uses a large pre-trained encoder language model to generate new, useful intent samples from existing intent samples without fine-tuning. In certain embodiments, for a given class (intent), a limited number of sample utterances of a seed intent classification dataset may be concatenated and provided as input to the encoder language model, which may generate new sample utterances for the given class (intent). Additionally, when the augmented dataset is used to fine-tune an encoder language model of an intent classifier, this technique improves the performance of the intent classifier.
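    As a rough sketch only (the generation call and prompt format are assumptions, not the patent's exact procedure), the augmentation step can be illustrated as follows: a few seed utterances for one intent are concatenated into a prompt, new utterances are sampled from a pre-trained language model without fine-tuning it, and the augmented set can then be used to fine-tune the downstream intent classifier.

    # Minimal sketch of prompt-based intent-sample augmentation, under the assumptions noted above.
    # generate_text is a hypothetical stand-in for a large pre-trained language model.
    from typing import Callable, List

    def generate_text(prompt: str) -> str:
        """Stand-in for the pre-trained language model's completion call."""
        return "book me a table for two tonight"  # placeholder completion

    def augment_intent(
        seed_utterances: List[str],
        n_new: int,
        lm: Callable[[str], str] = generate_text,
    ) -> List[str]:
        """Generate new sample utterances for one intent class from a few seeds."""
        # Concatenate the limited seed utterances into a single conditioning prompt.
        prompt = "\n".join(f"Example: {u}" for u in seed_utterances) + "\nExample:"
        new_samples: List[str] = []
        for _ in range(n_new):
            candidate = lm(prompt).strip()
            if candidate and candidate not in seed_utterances and candidate not in new_samples:
                new_samples.append(candidate)
        return new_samples

    if __name__ == "__main__":
        seeds = ["reserve a table for 4 at 7pm", "can I book dinner for two?"]
        augmented = seeds + augment_intent(seeds, n_new=3)
        print(augmented)  # the augmented set can then fine-tune the intent classifier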

    Systems and methods for translating natural language queries into a constrained domain-specific language

    Publication Number: US11768831B2

    Publication Date: 2023-09-26

    Application Number: US17397117

    Filing Date: 2021-08-09

    CPC classification number: G06F16/24522 G06F16/243

    Abstract: A natural language query to domain-specific language query (NLQ-to-DSLQ) translation system includes a language model and a domain-specific language (DSL) parser that constrains the output of the language model to a DSL, such as structured query language (SQL). At each decoding step, the language model generates a predicted next token for each of a set of potential translations of an NLQ. The DSL parser evaluates each of the potential translations at each decoding step based on a set of stored DSL rules, which define valid terminology, syntax, grammar, and/or other constraints of the DSL. The DSL parser may reject and remove from consideration partial potential translations that are invalid or receive a low parsing score, such that the language model only continues to generate new tokens at the next decoding step for partial potential translations that are determined to be valid and/or sufficiently high scoring.
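    A rough sketch of the parser-constrained decoding follows, with hypothetical stand-ins for the language model's next-token predictions and the DSL parser's prefix check (neither reflects the actual system). At each decoding step, every partial translation in the beam is extended by one token, prefixes the parser rejects are dropped, and only valid, high-scoring partial translations survive to the next step.

    # Minimal sketch of parser-constrained beam decoding over a toy SQL-like grammar.
    # next_token_candidates and is_valid_prefix are hypothetical stand-ins, as noted above.
    from typing import List, Tuple

    def next_token_candidates(prefix: List[str]) -> List[Tuple[str, float]]:
        """Stand-in for the language model's next-token predictions as (token, log-prob)."""
        vocab = ["SELECT", "name", "FROM", "users", ";", "banana"]
        return [(tok, -1.0) for tok in vocab]

    def is_valid_prefix(tokens: List[str]) -> bool:
        """Stand-in for the DSL parser: accept only prefixes of one toy SQL statement."""
        statement = ["SELECT", "name", "FROM", "users", ";"]
        return tokens == statement[: len(tokens)]

    def constrained_beam_search(beam_width: int = 3, max_steps: int = 5) -> List[str]:
        beams: List[Tuple[List[str], float]] = [([], 0.0)]  # (partial translation, score)
        for _ in range(max_steps):
            expanded = []
            for tokens, score in beams:
                for tok, logp in next_token_candidates(tokens):
                    candidate = tokens + [tok]
                    # The parser rejects invalid partial translations, so the model
                    # only keeps generating for prefixes that remain valid DSL.
                    if is_valid_prefix(candidate):
                        expanded.append((candidate, score + logp))
            if not expanded:
                break
            expanded.sort(key=lambda item: item[1], reverse=True)
            beams = expanded[:beam_width]
        return [" ".join(tokens) for tokens, _ in beams]

    if __name__ == "__main__":
        print(constrained_beam_search())  # e.g. ['SELECT name FROM users ;']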

    SYSTEMS AND METHODS FOR TRANSLATING NATURAL LANGUAGE QUERIES INTO A CONSTRAINED DOMAIN-SPECIFIC LANGUAGE

    Publication Number: US20220358125A1

    Publication Date: 2022-11-10

    Application Number: US17397117

    Filing Date: 2021-08-09

    Abstract: A natural language query to domain-specific language query (NLQ-to-DSLQ) translation system includes a language model and a domain-specific language (DSL) parser that constrains the output of the language model to a DSL, such as structured query language (SQL). At each decoding step, the language model generates a predicted next token for each of a set of potential translations of an NLQ. The DSL parser evaluates each of the potential translations at each decoding step based on a set of stored DSL rules, which define valid terminology, syntax, grammar, and/or other constraints of the DSL. The DSL parser may reject and remove from consideration partial potential translations that are invalid or receive a low parsing score, such that the language model only continues to generate new tokens at the next decoding step for partial potential translations that are determined to be valid and/or sufficiently high scoring.

    DATA AUGMENTATION FOR INTENT CLASSIFICATION
    Publication Type: Invention Publication

    Publication Number: US20230141398A1

    Publication Date: 2023-05-11

    Application Number: US17978690

    Filing Date: 2022-11-01

    Inventor: Dzmitry Bahdanau

    CPC classification number: G10L15/1815 G10L15/063 G10L2015/0631

    Abstract: The present disclosure relates to a data augmentation system and method that uses a large pre-trained encoder language model to generate new, useful intent samples from existing intent samples without fine-tuning. In certain embodiments, for a given class (intent), a limited number of sample utterances of a seed intent classification dataset may be concatenated and provided as input to the encoder language model, which may generate new sample utterances for the given class (intent). Additionally, when the augmented dataset is used to fine-tune an encoder language model of an intent classifier, this technique improves the performance of the intent classifier.
