TRAINING NEURAL NETWORKS ON ARBITRARILY LARGE DATA FILES

    Publication No.: US20250013869A1

    Publication Date: 2025-01-09

    Application No.: US18762333

    Filing Date: 2024-07-02

    Abstract: In an example, a method for training a Machine Learning (ML) model using arbitrarily sized training data files, which selectively identifies informative portions of one or more training data files to improve the ML model, includes: automatically identifying, by a computing system, one or more informative portions of one or more training data files; calculating, by the computing system, gradients for the identified one or more informative portions; and updating, by the computing system, weights of the ML model using the calculated gradients.
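
The abstracted flow (select informative portions, compute gradients only for them, update weights) can be sketched in a few lines. This is a minimal illustration, not the patented method: rows of a data matrix stand in for file portions, "informative" is approximated as highest per-example loss, and all function names are assumptions.

```python
import numpy as np

# Hypothetical sketch: pick the highest-loss ("informative") rows each
# step, then update weights using gradients from only those rows.

def select_informative(X, y, w, k):
    """Indices of the k rows with the largest squared error."""
    residuals = (X @ w - y) ** 2
    return np.argsort(residuals)[-k:]

def update_on_informative(X, y, w, k, lr=0.1):
    """One step: gradients computed only on the selected informative rows."""
    idx = select_informative(X, y, w, k)
    Xi, yi = X[idx], y[idx]
    grad = 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)  # MSE gradient
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                      # exactly realizable target
w = np.zeros(3)
mse_start = float(np.mean((X @ w - y) ** 2))
for _ in range(200):
    w = update_on_informative(X, y, w, k=20)
mse_end = float(np.mean((X @ w - y) ** 2))
```

Because only a fifth of the rows enter each gradient computation, the per-step cost stays bounded even if the underlying file grows arbitrarily large.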

    ACCELERATED INFORMATION EXTRACTION THROUGH FACILITATED RULE DEVELOPMENT

    Publication No.: US20240193366A1

    Publication Date: 2024-06-13

    Application No.: US18534210

    Filing Date: 2023-12-08

    CPC classification number: G06F40/289 G06F40/284

    Abstract: A computing system is configured to process a first document using an anchor rule, wherein the anchor rule identifies tokens for a domain. The computing system is further configured to identify, using the anchor rule, a first set of phrases from the first document that match the tokens. The computing system is further configured to receive a first selection from a first subset of the first set of phrases. The computing system is further configured to determine, based on the first selection, a word list, wherein the word list is a list of words ranked by rate of appearance in the first document. The computing system is further configured to process, based on the word list, a second document to extract one or more points of information from the second document.
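
The anchor-rule pipeline described above can be sketched as follows. Everything here is an illustrative assumption: the anchor rule is modeled as a set of domain tokens, "phrases" are approximated by sentences, and a tiny stop-word list stands in for whatever ranking the patent actually uses.

```python
import re
from collections import Counter

# Hypothetical sketch: anchor tokens -> matched phrases in document 1 ->
# frequency-ranked word list -> extraction from document 2.

STOPWORDS = {"the", "is", "a", "of", "on", "in", "and"}

def match_phrases(document, anchor_tokens):
    """Sentences (a stand-in for phrases) containing any anchor token."""
    phrases = re.split(r"[.!?]\s*", document)
    return [p for p in phrases if any(t in p.lower() for t in anchor_tokens)]

def build_word_list(selected_phrases):
    """Words in the selected phrases, ranked by rate of appearance."""
    words = re.findall(r"[a-z]+", " ".join(selected_phrases).lower())
    return [w for w, _ in Counter(words).most_common() if w not in STOPWORDS]

def extract(document, word_list, top_n=5):
    """Sentences of a second document mentioning top-ranked words."""
    keywords = set(word_list[:top_n])
    phrases = re.split(r"[.!?]\s*", document)
    return [p for p in phrases
            if keywords & set(re.findall(r"[a-z]+", p.lower()))]

doc1 = "The lease term is five years. Rent is due monthly. The weather was nice."
phrases = match_phrases(doc1, {"lease", "rent"})
word_list = build_word_list(phrases)
doc2 = "Rent payment arrives monthly. Cats sleep and dream."
extracted = extract(doc2, word_list)
```

The user's selection step from the abstract is elided here; in the sketch the matched phrases go straight into the word-list ranking.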

    MEDIA ATTRIBUTION VERIFICATION
    Publication Type: Invention Publication

    Publication No.: US20240147025A1

    Publication Date: 2024-05-02

    Application No.: US18471171

    Filing Date: 2023-09-20

    CPC classification number: H04N21/83 H04N21/814

    Abstract: In general, the disclosure describes techniques for obtaining, by a computing system, a content item and a purported source for the content item, wherein the content item may include multimodal data. The techniques may further include generating, by the computing system, a plurality of modality feature vectors representative of the multimodal data, wherein each of the generated modality feature vectors has a different, corresponding modality feature. The techniques may further include mapping, by the computing system, the generated modality feature vectors based on a statistical distribution associated with the purported source. The techniques may further include determining, by the computing system, a score based on the mapping. The techniques may further include outputting, by the computing system and based on the score, an indication of whether the content item originated from the purported source.
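
A minimal sketch of the scoring idea, under stated assumptions: each modality of a content item is already encoded as a feature vector, the purported source is summarized by per-modality mean/std statistics learned elsewhere, and an average z-score serves as the mapping-based score. Names and the threshold are illustrative.

```python
import numpy as np

# Hypothetical sketch: per-modality feature vectors are compared against
# a purported source's statistical distribution; the score decides origin.

def modality_score(vec, mean, std):
    """Mean absolute z-score of one modality vector under the source stats."""
    return float(np.mean(np.abs((vec - mean) / std)))

def attribution_score(modality_vectors, source_stats):
    """Average the per-modality scores; lower means closer to the source."""
    scores = [modality_score(v, *source_stats[m])
              for m, v in modality_vectors.items()]
    return sum(scores) / len(scores)

def verify(modality_vectors, source_stats, threshold=2.0):
    """True if the item plausibly originated from the purported source."""
    return attribution_score(modality_vectors, source_stats) < threshold

source_stats = {"text": (np.zeros(4), np.ones(4)),
                "image": (np.zeros(4), np.ones(4))}
genuine = {"text": np.array([0.1, -0.2, 0.0, 0.3]),
           "image": np.array([0.2, 0.1, -0.1, 0.0])}
forged = {"text": np.full(4, 5.0), "image": np.full(4, -5.0)}
```

Keeping one feature vector per modality makes the check multimodal by construction: an item must be statistically plausible in every modality, not just one, to score close to the source.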

    GENERATING TRAINING EXAMPLES FOR TRANSLATION OF NATURAL LANGUAGE QUERIES TO EXECUTABLE DATABASE QUERIES

    Publication No.: US20240378198A1

    Publication Date: 2024-11-14

    Application No.: US18391339

    Filing Date: 2023-12-20

    Abstract: In an example, a method includes, generating, by a machine learning system, one or more formal queries based on data contained in a database repository; generating, by the machine learning system, a natural language query for each formal query of the one or more formal queries to generate pairs of formal queries and corresponding natural language queries by applying a general grammar for a language of each formal query; and training, by the machine learning system, a neural network configured to translate natural language queries into formal queries using the pairs of the formal queries and corresponding natural language queries generated by the machine learning system.
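
The pairing step can be sketched with a toy grammar: formal (SQL) queries are derived from values that actually occur in a database, and a template renders a natural-language counterpart for each, yielding (formal, natural) training pairs. The grammar, schema, and function names below are illustrative assumptions, not from the patent.

```python
import sqlite3

# Hypothetical sketch: one grammar rule maps a SQL template to a
# natural-language template; database contents supply the parameters.

GRAMMAR = {
    "count": ("SELECT COUNT(*) FROM {t} WHERE {c} = ?",
              "how many rows in {t} have {c} equal to {v}"),
}

def generate_pairs(conn, table, column, limit=3):
    """Return (formal_query, natural_query, parameter) training triples."""
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} LIMIT {limit}").fetchall()
    sql_t, nl_t = GRAMMAR["count"]
    return [(sql_t.format(t=table, c=column),
             nl_t.format(t=table, c=column, v=value),
             value) for (value,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, kind TEXT)")
conn.executemany("INSERT INTO pets VALUES (?, ?)",
                 [("Rex", "dog"), ("Tom", "cat"), ("Ann", "dog")])
pairs = generate_pairs(conn, "pets", "kind")
```

Because each formal query is generated first and the natural-language question is rendered from it, every pair is correct by construction, which is what makes the pairs usable as supervision for a translation network.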

    RAPID ADAPTATION TO CONTEMPORARY TEXT DATASETS

    Publication No.: US20240152540A1

    Publication Date: 2024-05-09

    Application No.: US18472761

    Filing Date: 2023-09-22

    CPC classification number: G06F16/353

    Abstract: In an example, a method for adapting a machine learning model includes receiving first input data; choosing a first set of unlabeled textual spans in the first input data, wherein the chosen first set of unlabeled textual spans is associated with a first domain; labeling the chosen first set of unlabeled textual spans to generate a labeled first set of textual spans; categorizing the labeled first set of textual spans to generate a categorized labeled first set of textual spans; receiving second input data; choosing a second set of unlabeled textual spans, wherein the second set of unlabeled textual spans is associated with a second domain; and adapting the machine learning model to the second domain based on the categorized second set of unlabeled textual spans that is generated based on the categorized labeled first set of textual spans.
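
The transfer from categorized first-domain spans to second-domain spans can be sketched as follows. This is a simple stand-in for the adaptation step, assuming bag-of-token category profiles and overlap-based assignment; the domains, spans, and function names are all illustrative.

```python
from collections import Counter

# Hypothetical sketch: compress labeled first-domain spans into
# per-category token profiles, then transfer those categories to
# unlabeled second-domain spans by token overlap.

def build_profiles(labeled_spans):
    """Map each category to a token bag built from its first-domain spans."""
    profiles = {}
    for text, category in labeled_spans:
        profiles.setdefault(category, Counter()).update(text.lower().split())
    return profiles

def categorize(span, profiles):
    """Assign the category whose token profile overlaps the span most."""
    tokens = Counter(span.lower().split())
    return max(profiles, key=lambda c: sum((tokens & profiles[c]).values()))

first_domain = [("stock prices fell sharply", "finance"),
                ("the team won the match", "sports")]
profiles = build_profiles(first_domain)
second_domain = ["bond prices fell today", "the match ended in a draw"]
labels = [categorize(s, profiles) for s in second_domain]
```

The resulting pseudo-labels for the second domain would then feed whatever model-adaptation step follows, which is the part the abstract leaves to the full specification.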
