Human Language Analyzer for Detecting Clauses, Clause Types, and Clause Relationships

    Publication number: US20200050672A1

    Publication date: 2020-02-13

    Application number: US16655615

    Application date: 2019-10-17

    Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
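
The clause-hierarchy step in this abstract can be illustrated with a small Python sketch. It is not the patented method: it assumes the token data set is a dependency parse, that clause heads can be recognized by Universal Dependencies relation labels, and that the parse is a well-formed tree whose root token carries the label `root`. The `Token` class, the `CLAUSE_RELATIONS` set, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: tokens carrying these dependency relations head a clause.
CLAUSE_RELATIONS = {"root", "ccomp", "xcomp", "advcl", "acl:relcl", "csubj"}


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    text: str
    head: int     # idx of the governing token, 0 for the sentence root
    deprel: str   # dependency relation to the head


def clause_head_of(token, by_idx):
    """Walk up the tree to the nearest token that heads a clause."""
    current = token
    while current.deprel not in CLAUSE_RELATIONS:
        current = by_idx[current.head]
    return current


def clause_hierarchy(tokens):
    """Return one row per token with its clause head, clause type, and depth."""
    by_idx = {t.idx: t for t in tokens}
    rows = []
    for t in tokens:
        head = clause_head_of(t, by_idx)
        # Depth = number of clause heads strictly above this clause head.
        depth, node = 0, head
        while node.head != 0:
            node = by_idx[node.head]
            if node.deprel in CLAUSE_RELATIONS:
                depth += 1
        rows.append({"token": t.text, "clause_head": head.text,
                     "clause_type": head.deprel, "depth": depth})
    return rows


# Hand-written parse of "She left because he was late" (illustrative only).
SENTENCE = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "left", 0, "root"),
    Token(3, "because", 6, "mark"),
    Token(4, "he", 6, "nsubj"),
    Token(5, "was", 6, "cop"),
    Token(6, "late", 2, "advcl"),
]

if __name__ == "__main__":
    for row in clause_hierarchy(SENTENCE):
        print(row)
```

In this sketch the "new data set" is simply the token list annotated with clause membership; a query with two requirements (for example, a token in an `advcl` clause at depth 1) could then be answered by filtering the rows.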

    AUTOMATED NEAR-DUPLICATE DETECTION FOR TEXT DOCUMENTS

    Publication number: US20250165536A1

    Publication date: 2025-05-22

    Application number: US18896244

    Application date: 2024-09-25

    Abstract: Techniques described herein provide for automated detection of near-duplicate documents. In one example, a system can cluster documents into a set of clusters based on character frequencies associated with the documents. For a given cluster, the system can generate first similarity scores associated with every pair of documents in the cluster. The system can then select a filtered group of documents associated with first similarity scores that meet or exceed a first predefined similarity threshold. Next, the system can convert the filtered group of documents into matrix representations. The system can generate second similarity scores for every pair of matrix representations. The system can then identify documents, from among the filtered group of documents, associated with second similarity scores that meet or exceed a second predefined similarity threshold. The identified documents can be duplicate or near-duplicate text documents.
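
The two-stage filtering described in this abstract can be sketched in Python as follows. The abstract does not name the clustering algorithm, the similarity measures, or the matrix representation, so every concrete choice here is an assumption made for illustration: greedy clustering on cosine similarity of character frequencies, word-set Jaccard for the first score, and a sparse character-bigram count matrix compared by cosine for the second score. All function names and threshold values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def char_freq(text):
    """Character-frequency vector over alphanumeric characters."""
    return Counter(c for c in text.lower() if c.isalnum())


def cluster_by_char_freq(docs, threshold):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for doc_id, text in docs.items():
        vec = char_freq(text)
        for cluster in clusters:
            if cosine(vec, char_freq(docs[cluster[0]])) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def word_jaccard(a, b):
    """First-stage score (assumed): Jaccard overlap of word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def bigram_matrix(text):
    """Matrix representation (assumed): sparse counts keyed by (row, column)."""
    t = text.lower()
    return Counter(zip(t, t[1:]))


def near_duplicates(docs, cluster_t=0.9, first_t=0.5, second_t=0.9):
    """Return pairs of document ids that pass both similarity thresholds."""
    pairs = []
    for cluster in cluster_by_char_freq(docs, cluster_t):
        # Stage 1: cheap score for every pair in the cluster, then filter.
        kept = [(a, b) for a, b in combinations(cluster, 2)
                if word_jaccard(docs[a], docs[b]) >= first_t]
        filtered = sorted({d for pair in kept for d in pair})
        # Stage 2: matrix representations, scored pairwise.
        matrices = {d: bigram_matrix(docs[d]) for d in filtered}
        for a, b in combinations(filtered, 2):
            if cosine(matrices[a], matrices[b]) >= second_t:
                pairs.append((a, b))
    return pairs


DOCS = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "The quick brown fox jumps over the lazy dog!",
    "c": "Completely different content about quarterly tax filings.",
}

if __name__ == "__main__":
    print(near_duplicates(DOCS))
```

The point of the staging is cost: clustering and the first score prune most pairs cheaply, so the more expensive matrix comparison only runs on the filtered group.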

    Automated near-duplicate detection for text documents

    Publication number: US12124518B1

    Publication date: 2024-10-22

    Application number: US18394209

    Application date: 2023-12-22

    CPC classification number: G06F16/906 G06F16/93 G06F16/355

    Abstract: Techniques described herein provide for automated detection of near-duplicate documents. In one example, a system can cluster documents into a set of clusters based on character frequencies associated with the documents. For a given cluster, the system can generate first similarity scores associated with every pair of documents in the cluster. The system can then select a filtered group of documents associated with first similarity scores that meet or exceed a first predefined similarity threshold. Next, the system can convert the filtered group of documents into matrix representations. The system can generate second similarity scores for every pair of matrix representations. The system can then identify documents, from among the filtered group of documents, associated with second similarity scores that meet or exceed a second predefined similarity threshold. The identified documents can be duplicate or near-duplicate text documents.

    Leveraging text profiles to select and configure models for use with textual datasets

    Publication number: US11423680B1

    Publication date: 2022-08-23

    Application number: US17565824

    Application date: 2021-12-30

    Abstract: Text profiles can be leveraged to select and configure models according to some examples described herein. In one example, a system can analyze a reference textual dataset and a target textual dataset using text-mining techniques to generate a first text profile and a second text profile, respectively. The first text profile can contain first metrics characterizing the reference textual dataset and the second text profile can contain second metrics characterizing the target textual dataset. The system can determine a similarity value by comparing the first text profile to the second text profile. The system can also receive a user selection of a model that is to be applied to the target textual dataset. The system can then generate an insight relating to an anticipated accuracy of the model on the target textual dataset based on the similarity value. The system can output the insight to the user.
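
A minimal Python sketch of the profile-comparison idea in this abstract follows. The abstract does not list the metrics, the similarity formula, or the wording of the insight, so the three metrics, the averaged min/max ratio, the 0.8 threshold, and all function names below are assumptions chosen only to make the flow concrete.

```python
def text_profile(docs):
    """Compute a few simple metrics characterizing a textual dataset."""
    tokens = [tok for doc in docs for tok in doc.lower().split()]
    if not docs or not tokens:
        raise ValueError("need at least one non-empty document")
    return {
        "avg_doc_len": len(tokens) / len(docs),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
    }


def profile_similarity(p, q):
    """Similarity in [0, 1]: mean of per-metric min/max ratios (assumed formula)."""
    ratios = []
    for key in p:
        hi = max(p[key], q[key])
        ratios.append(min(p[key], q[key]) / hi if hi else 1.0)
    return sum(ratios) / len(ratios)


def accuracy_insight(model_name, similarity, threshold=0.8):
    """Turn the similarity value into a hedged message for the user."""
    if similarity >= threshold:
        return (f"{model_name}: target data looks similar to the reference "
                f"data (similarity {similarity:.2f}); accuracy is likely comparable.")
    return (f"{model_name}: target data differs from the reference data "
            f"(similarity {similarity:.2f}); accuracy may degrade.")


if __name__ == "__main__":
    reference = ["the cat sat on the mat", "a dog barked"]
    target = ["antidisestablishmentarianism notwithstanding"]
    sim = profile_similarity(text_profile(reference), text_profile(target))
    print(accuracy_insight("sentiment-model", sim))
```

Here the reference dataset stands in for data on which the selected model is known to perform well, so a low similarity value is a warning sign rather than a measured accuracy.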

    Human language analyzer for detecting clauses, clause types, and clause relationships

    Publication number: US10467344B1

    Publication date: 2019-11-05

    Application number: US16380353

    Application date: 2019-04-10

    Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.

    SYSTEMS, METHODS, AND GRAPHICAL USER INTERFACES FOR PREDICTING AND ANALYZING ACTION LIKELIHOOD

    Publication number: US20250156467A1

    Publication date: 2025-05-15

    Application number: US18812637

    Application date: 2024-08-22

    Abstract: A computer-implemented system, computer-implemented method, and computer-program product includes obtaining a text document that includes text describing an action; extracting one or more action tokens from the text document; executing a plurality of linguistic pattern searches that search the text document for one or more likelihood tokens associated with the one or more action tokens; classifying the action to a likelihood category associated with a respective linguistic pattern search of the plurality of linguistic pattern searches that identified the one or more likelihood tokens; classifying the text document to a respective domain; computing a priority value of the action described in the text document based on an input of the likelihood category and the respective domain; and generating a priority summary artifact that visually prioritizes the text document over one or more other text documents when the priority value of the action satisfies a predefined maximum priority threshold value.
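
The pipeline in this abstract (action tokens, likelihood pattern searches, domain classification, priority value) can be sketched in Python with regular expressions. None of the lexicons, patterns, weights, or the 0.7 threshold come from the abstract; they are hypothetical placeholders, and the priority is computed as a simple product of two assumed weights.

```python
import re

# All lexicons, patterns, and weights below are illustrative assumptions.
ACTION_LEXICON = {"cancel", "renew", "upgrade"}
LIKELIHOOD_PHRASES = [  # one pattern search per likelihood category
    ("low", r"(?:will not|won't|unlikely to)"),
    ("medium", r"(?:may|might|could)"),
    ("high", r"(?:will|intends? to|plans? to)"),
]
DOMAIN_KEYWORDS = {
    "billing": {"invoice", "refund", "subscription"},
    "support": {"ticket", "outage"},
}
LIKELIHOOD_WEIGHT = {"low": 0.2, "medium": 0.5, "high": 0.9}
DOMAIN_WEIGHT = {"billing": 1.0, "support": 0.8, "other": 0.5}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def extract_action_tokens(text):
    """Keep tokens that appear in the (assumed) action lexicon."""
    return [t for t in tokenize(text) if t in ACTION_LEXICON]


def classify_likelihood(text, action):
    """Run each pattern search; return the category of the first that matches."""
    for category, phrase in LIKELIHOOD_PHRASES:
        pattern = rf"\b{phrase}\s+{re.escape(action)}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return category
    return None


def classify_domain(text):
    """Pick the domain with the most keyword hits, or 'other' if none hit."""
    tokens = set(tokenize(text))
    best = max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))
    return best if tokens & DOMAIN_KEYWORDS[best] else "other"


def analyze(text, threshold=0.7):
    """Return a summary dict for the first action with a likelihood, else None."""
    for action in extract_action_tokens(text):
        likelihood = classify_likelihood(text, action)
        if likelihood is None:
            continue
        domain = classify_domain(text)
        priority = LIKELIHOOD_WEIGHT[likelihood] * DOMAIN_WEIGHT[domain]
        return {"action": action, "likelihood": likelihood, "domain": domain,
                "priority": priority, "prioritized": priority >= threshold}
    return None


if __name__ == "__main__":
    print(analyze("The customer will cancel the subscription."))
    print(analyze("The customer might renew after the outage ticket."))
```

The `prioritized` flag stands in for the priority summary artifact: documents with the flag set would be surfaced ahead of the others in whatever interface displays them.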

    Leveraging text profiles to select and configure models for use with textual datasets

    Publication number: US11501547B1

    Publication date: 2022-11-15

    Application number: US17858634

    Application date: 2022-07-06

    Abstract: Text profiles can be leveraged to select and configure models according to some examples described herein. In one example, a system can analyze a reference textual dataset and a target textual dataset using text-mining techniques to generate a first text profile and a second text profile, respectively. The first text profile can contain first metrics characterizing the reference textual dataset and the second text profile can contain second metrics characterizing the target textual dataset. The system can determine a similarity value by comparing the first text profile to the second text profile. The system can also receive a user selection of a model that is to be applied to the target textual dataset. The system can then generate an insight relating to an anticipated accuracy of the model on the target textual dataset based on the similarity value. The system can output the insight to the user.

    Human language analyzer for detecting clauses, clause types, and clause relationships

    Publication number: US10699081B2

    Publication date: 2020-06-30

    Application number: US16655615

    Application date: 2019-10-17

    Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
