Intelligent data curation
    Granted invention patent

    Publication No.: US10909460B2

    Publication date: 2021-02-02

    Application No.: US16726339

    Filing date: 2019-12-24

    Abstract: An apparatus includes a processor to: provide a set of feature routines to a set of processor cores to detect features of a data set distributed thereamong; generate metadata indicative of the detected features; generate context data indicative of contextual aspects of the data set; provide the metadata and context data to each processor core, and distribute a set of suggestion models thereamong to enable derivation of a suggested subset of data preparation operations to be suggested to be performed on the data set; transmit indications of the suggested subset to a viewing device, and receive therefrom indications of a selected subset of data preparation operations selected to be performed; compare the selected and suggested subsets; and in response to differences therebetween, re-train at least one suggestion model of the set of suggestion models based at least on the combination of the metadata, context data and selected subset.
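
    Illustrative sketch: the compare-and-retrain step at the end of the abstract can be pictured as a set comparison between the suggested and the selected data preparation operations. The function names, operation names, and record layout below are hypothetical and are not taken from the patent; the snippet only shows when a re-training record would be produced, not how a suggestion model is actually trained.

```python
from typing import Optional


def subset_difference(suggested: set, selected: set) -> dict:
    """Report which suggested operations were rejected and which were added by the user."""
    return {
        "rejected": suggested - selected,
        "added": selected - suggested,
    }


def retraining_example(metadata: dict, context: dict,
                       suggested: set, selected: set) -> Optional[dict]:
    """Build a training record only when the selection differs from the suggestion."""
    diff = subset_difference(suggested, selected)
    if not diff["rejected"] and not diff["added"]:
        return None  # the suggestion was accepted unchanged, so there is nothing to learn
    # The record combines metadata, context data and the selected subset.
    return {"metadata": metadata, "context": context, "label": sorted(selected)}


# Hypothetical operation names, for illustration only.
suggested = {"impute_missing", "normalize", "drop_duplicates"}
selected = {"impute_missing", "drop_duplicates", "parse_dates"}
example = retraining_example({"has_nulls": True}, {"source": "csv"}, suggested, selected)
```

    Here `example` is a record because the user dropped one suggested operation and added another; an identical selection would yield `None`.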

    Distributed data set task selection

    Publication No.: US09753767B2

    Publication date: 2017-09-05

    Application No.: US15431573

    Filing date: 2017-02-13

    Abstract: An apparatus may include a processor and storage to store instructions that cause the processor to perform operations including: generate a current data set model descriptive of a characteristic of a current data set; compare the current data set model to at least one previously generated data set model descriptive of a characteristic of a previously analyzed data set; in response to detection of a match within a similarity threshold: retrieve an indication from a correlation database of an action previously performed on a previously analyzed data set; select a computer language based on node data descriptive of characteristics of a node device execution environment; generate node instructions in the selected computer language and based on the current data set model to cause the node device to perform the previously performed action on a portion of the current data set; and transmit the node instructions to the node device.
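
    Illustrative sketch: one way to picture the matching step is to treat each data set model as a numeric vector, compare it to stored models, and look up the previously performed action when the similarity clears a threshold. The use of cosine similarity, the 0.9 threshold, the `interpreters` field, and every name below are assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_prior_action(current_model, prior_models, correlation_db, threshold=0.9):
    """Return the action recorded for the most similar prior model, or None if no match."""
    best_id, best_score = None, threshold
    for model_id, model in prior_models.items():
        score = cosine_similarity(current_model, model)
        if score >= best_score:
            best_id, best_score = model_id, score
    return correlation_db.get(best_id) if best_id is not None else None


def select_language(node_data, preferences=("python", "sql")):
    """Pick the first preferred language that the node's execution environment supports."""
    for language in preferences:
        if language in node_data.get("interpreters", ()):
            return language
    return None


# Hypothetical stored models and correlation database.
prior_models = {"sales_2016": [1.0, 0.0, 2.0], "logs": [0.0, 5.0, 0.0]}
correlation_db = {"sales_2016": "aggregate_by_month"}
action = find_prior_action([1.0, 0.1, 2.0], prior_models, correlation_db)
language = select_language({"interpreters": ["sql"]})
```

    Generating and transmitting the node instructions is left out; the sketch stops once an action and a language have been chosen.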

    Systems, methods, and graphical user interfaces for taxonomy-based classification of unlabeled structured datasets

    Publication No.: US11841851B1

    Publication date: 2023-12-12

    Application No.: US18124299

    Filing date: 2023-03-21

    CPC classification number: G06F16/2428 G06F16/287 G06F40/284

    Abstract: A system, method, and computer-program product includes identifying a target hierarchical taxonomy that includes a plurality of distinct hierarchical taxonomy categories, extracting a plurality of distinct taxonomy tokens from the plurality of distinct hierarchical taxonomy categories, computing a taxonomy vector corpus based on the plurality of distinct taxonomy tokens, computing a plurality of distinct taxonomy clusters based on an input of the taxonomy vector corpus, constructing a hierarchical taxonomy classifier based on the plurality of distinct taxonomy clusters, converting a volume of unlabeled structured datasets to a plurality of distinct corpora of taxonomy-labeled structured datasets based on the hierarchical taxonomy classifier, and outputting at least one corpus of taxonomy-labeled structured datasets of the plurality of distinct corpora of taxonomy-labeled structured datasets based on an input of a data classification query.
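
    Illustrative sketch: the abstract describes a chain from taxonomy tokens to a vector corpus, clusters, and a hierarchical classifier. The snippet below is a deliberately simplified stand-in that keeps only the token-extraction idea and replaces the vector, clustering, and classifier stages with a plain Jaccard token overlap between a dataset's column names and each taxonomy category. The category paths, column names, and function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_taxonomy_index(categories):
    """Map each taxonomy category path to the tokens extracted from it."""
    return {category: tokenize(category) for category in categories}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(column_names, index):
    """Label a dataset with the category whose tokens best overlap its column names."""
    if not index:
        return None
    tokens = set()
    for name in column_names:
        tokens |= tokenize(name)
    best = max(index, key=lambda category: jaccard(tokens, index[category]))
    # No shared token means the dataset stays unlabeled.
    return best if jaccard(tokens, index[best]) > 0 else None


index = build_taxonomy_index(["finance/invoice/amount", "hr/employee/salary"])
label = classify(["invoice_id", "amount_usd"], index)
```

    A dataset whose column names share no token with any category is returned as `None` rather than forced into the nearest category.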

    Systems, methods, and graphical user interfaces for taxonomy-based classification of unlabeled structured datasets

    Publication No.: US11809460B1

    Publication date: 2023-11-07

    Application No.: US18221695

    Filing date: 2023-07-13

    CPC classification number: G06F16/287

    Abstract: A computer-implemented system includes identifying a target hierarchical taxonomy comprising a plurality of distinct hierarchical taxonomy categories; extracting a plurality of distinct taxonomy tokens from the plurality of distinct hierarchical taxonomy categories; computing a taxonomy vector corpus based on the plurality of distinct taxonomy tokens; computing a plurality of distinct taxonomy clusters based on an input of the taxonomy vector corpus; constructing a hierarchical taxonomy classifier based on the plurality of distinct taxonomy clusters; converting a volume of unlabeled structured datasets to a plurality of distinct corpora of taxonomy-labeled structured datasets based on the hierarchical taxonomy classifier; and outputting at least one corpus of taxonomy-labeled structured datasets of the plurality of distinct corpora of taxonomy-labeled structured datasets based on an input of a data classification query.

    Computerized pipelines for transforming input data into data structures compatible with models

    Publication No.: US11106694B1

    Publication date: 2021-08-31

    Application No.: US17173308

    Filing date: 2021-02-11

    Abstract: Computerized pipelines can transform input data into data structures compatible with models in some examples. In one such example, a system can obtain a first table that includes first data referencing a set of subjects. The system can then execute a sequence of processing operations on the first data in a particular order defined by a data-processing pipeline to modify an analysis table to include features associated with the set of subjects. Executing each respective processing operation in the sequence to generate the modified analysis table may involve: deriving a respective set of features from the first data by executing a respective feature-extraction operation on the first data; and adding the respective set of features to the analysis table. The system may then execute a predictive model on the modified analysis table for generating a predicted value based on the modified analysis table.
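
    Illustrative sketch: the ordered feature-extraction loop in the abstract can be pictured as a list of functions applied in sequence, each deriving features from the first table and adding them to an analysis table keyed by subject. The row layout, the two feature functions, and all names below are hypothetical, and the final predictive-model step is omitted.

```python
def run_pipeline(first_table, subjects, operations):
    """Apply each feature-extraction operation in order and merge its output per subject."""
    analysis_table = {subject: {} for subject in subjects}
    for operation in operations:
        features = operation(first_table)  # {subject: {feature_name: value}}
        for subject, values in features.items():
            if subject in analysis_table:
                analysis_table[subject].update(values)
    return analysis_table


def total_spend(rows):
    """Sum the 'amount' column per subject."""
    out = {}
    for row in rows:
        entry = out.setdefault(row["subject"], {"total_spend": 0.0})
        entry["total_spend"] += row["amount"]
    return out


def visit_count(rows):
    """Count the rows recorded for each subject."""
    out = {}
    for row in rows:
        entry = out.setdefault(row["subject"], {"visit_count": 0})
        entry["visit_count"] += 1
    return out


rows = [
    {"subject": "a", "amount": 10.0},
    {"subject": "a", "amount": 5.0},
    {"subject": "b", "amount": 2.5},
]
analysis = run_pipeline(rows, ["a", "b"], [total_spend, visit_count])
```

    Keeping each operation as a plain function makes the pipeline order explicit: it is simply the order of the list passed to `run_pipeline`.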

    Multi-Domain Impact Analysis Using Object Relationships
    Invention patent application
    Legal status: in force

    Publication No.: US20140280349A1

    Publication date: 2014-09-18

    Application No.: US13913658

    Filing date: 2013-06-10

    CPC classification number: G06F17/30292

    Abstract: Systems and methods for impact analysis across multiple domains using non-data types of relationships between objects are provided. A data model can be formed. The data model can include objects representative of physical data in separate domains and relationships of non-data types between the objects. An impact analysis interface can be generated using the data model. The impact analysis interface can depict the objects and the non-data types of relationships between the objects.
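
    Illustrative sketch: a data model of objects linked by typed, non-data relationships can be pictured as a directed graph, and an impact analysis as a traversal from a changed object to everything reachable from it. The object names, relationship types, and the breadth-first traversal are assumptions for illustration; the abstract does not prescribe a traversal method.

```python
from collections import deque


def impacted_objects(relationships, start):
    """Breadth-first walk over (source, relation_type, target) triples from `start`.

    Returns the edges through which each impacted object was first reached.
    """
    adjacency = {}
    for source, relation, target in relationships:
        adjacency.setdefault(source, []).append((relation, target))

    seen = {start}
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for relation, target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                impacted.append((node, relation, target))
                queue.append(target)
    return impacted


# Hypothetical objects from three domains: database, ETL, reporting.
relationships = [
    ("db.orders", "feeds", "etl.load_orders"),
    ("etl.load_orders", "produces", "report.sales"),
    ("db.users", "owned_by", "team.crm"),
]
impact = impacted_objects(relationships, "db.orders")
```

    The returned edge list keeps the relationship type alongside each impacted object, which is the information an impact analysis interface would need to depict.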

    Systems, methods, and graphical user interfaces for taxonomy-based classification of unlabeled structured datasets

    Publication No.: US20240028621A1

    Publication date: 2024-01-25

    Application No.: US18221684

    Filing date: 2023-07-13

    CPC classification number: G06F16/287

    Abstract: A computer-implemented system includes identifying a target hierarchical taxonomy comprising a plurality of distinct hierarchical taxonomy categories; extracting a plurality of distinct taxonomy tokens from the plurality of distinct hierarchical taxonomy categories; computing a taxonomy vector corpus based on the plurality of distinct taxonomy tokens; computing a plurality of distinct taxonomy clusters based on an input of the taxonomy vector corpus; constructing a hierarchical taxonomy classifier based on the plurality of distinct taxonomy clusters; converting a volume of unlabeled structured datasets to a plurality of distinct corpora of taxonomy-labeled structured datasets based on the hierarchical taxonomy classifier; and outputting at least one corpus of taxonomy-labeled structured datasets of the plurality of distinct corpora of taxonomy-labeled structured datasets based on an input of a data classification query.

    Intelligent data curation
    Granted invention patent

    Publication No.: US10552739B1

    Publication date: 2020-02-04

    Application No.: US16503742

    Filing date: 2019-07-05

    Abstract: An apparatus includes a processor to: provide a set of feature routines to a set of processor cores to detect features of a data set distributed thereamong; generate metadata indicative of the detected features; generate context data indicative of contextual aspects of the data set; provide the metadata and context data to each processor core, and distribute a set of suggestion models thereamong to enable derivation of a suggested subset of data preparation operations to be suggested to be performed on the data set; transmit indications of the suggested subset to a viewing device, and receive therefrom indications of a selected subset of data preparation operations selected to be performed; compare the selected and suggested subsets; and in response to differences therebetween, re-train at least one suggestion model of the set of suggestion models based at least on the combination of the metadata, context data and selected subset.

    Multi-domain impact analysis using object relationships
    Granted invention patent
    Legal status: in force

    Publication No.: US09239854B2

    Publication date: 2016-01-19

    Application No.: US13913658

    Filing date: 2013-06-10

    CPC classification number: G06F17/30292

    Abstract: Systems and methods for impact analysis across multiple domains using non-data types of relationships between objects are provided. A data model can be formed. The data model can include objects representative of physical data in separate domains and relationships of non-data types between the objects. An impact analysis interface can be generated using the data model. The impact analysis interface can depict the objects and the non-data types of relationships between the objects.