Invention Grant
- Patent Title: Identifying gene signatures and corresponding biological pathways based on an automatically curated genomic database
- Application No.: US16157660
- Application Date: 2018-10-11
- Publication No.: US11354591B2
- Publication Date: 2022-06-07
- Inventors: Sanjoy Dey, Achille B. Fokoue-Nkoutche, William S. Spangler, Ping Zhang
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen J. Walder, Jr.; Kelsey Skodje
- Main IPC: G06N20/00
- IPC: G06N20/00; G06N5/02; G06F16/22; G16B20/00; G16B50/30; G16B40/00

Abstract:
Mechanisms are provided to implement a genomic database curation (GDC) system. The GDC system generates a ground truth database based on a training subset of datasets from an uncurated large scale genomic database, and label metadata for the training subset. The GDC system trains at least one classification engine of the GDC system based on the training subset and the ground truth database at least by performing a machine learning operation on the at least one classification engine. The GDC system automatically applies the at least one trained classification engine on the uncurated large scale genomic database to generate an automatically curated large scale genomic database. A meta-classifier engine generates an output specifying at least one of significant gene signatures or gene pathways for at least one of diseases or drug agents based on the automatically curated large scale genomic database.
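The abstract describes a four-stage flow. First, a ground truth is built from a labeled training subset. Next, a classification engine is trained on it and then applied to the uncurated database. Finally, gene signatures are derived from the curated result. The Python sketch below illustrates that flow under clearly stated assumptions, and it is not the patented implementation. Each dataset is a hypothetical record with an `id`, free-text metadata (`text`) and per-gene values (`expr`). The `labels` list stands in for the label metadata. Multinomial naive Bayes over metadata words stands in for the classification engine. Ranking genes by absolute mean value per predicted label stands in for the meta-classifier engine. All names (`NaiveBayesCurator`, `curate`, `gene_signatures`, `DS1`, `GENE_A`, and so on) and all values are invented toy data.

```python
import math
from collections import Counter, defaultdict

# ASSUMPTION: every name, record field and value below is hypothetical.
# This only illustrates the curate-then-summarize flow from the abstract.

def tokenize(text):
    return text.lower().split()

class NaiveBayesCurator:
    """Stand-in classification engine: multinomial naive Bayes over the
    words in each dataset's free-text metadata."""

    def fit(self, texts, labels):
        self.n = len(labels)
        self.prior = Counter(labels)
        self.word_counts = {c: Counter() for c in self.prior}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        def log_posterior(c):
            counts = self.word_counts[c]
            total = sum(counts.values())
            score = math.log(self.prior[c] / self.n)
            for w in tokenize(text):
                # Laplace smoothing keeps unseen words from zeroing a class.
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return score
        # Sorting the labels first makes tie-breaking deterministic.
        return max(sorted(self.prior), key=log_posterior)

def curate(training, labels, uncurated):
    """Build a ground truth from the labeled training subset, train the
    stand-in engine on it, then label every uncurated dataset."""
    ground_truth = {d["id"]: label for d, label in zip(training, labels)}
    model = NaiveBayesCurator().fit([d["text"] for d in training], labels)
    curated = [dict(d, label=model.predict(d["text"])) for d in uncurated]
    return ground_truth, curated

def gene_signatures(curated, top_k=2):
    """Stand-in meta-classifier: for each predicted label, rank genes by the
    absolute value of their mean expression and keep the top_k."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for d in curated:
        counts[d["label"]] += 1
        for gene, value in d["expr"].items():
            sums[d["label"]][gene] += value
    signatures = {}
    for label, gene_sums in sums.items():
        means = {g: s / counts[label] for g, s in gene_sums.items()}
        ranked = sorted(means, key=lambda g: (-abs(means[g]), g))
        signatures[label] = ranked[:top_k]
    return signatures

# Toy data (invented): a small labeled training subset and an uncurated set.
training = [
    {"id": "DS1", "text": "breast tumor tissue samples"},
    {"id": "DS2", "text": "lung tumor biopsy"},
    {"id": "DS3", "text": "breast cancer cell line"},
    {"id": "DS4", "text": "lung adenocarcinoma samples"},
]
labels = ["breast_cancer", "lung_cancer", "breast_cancer", "lung_cancer"]
uncurated = [
    {"id": "DS10", "text": "breast tissue",
     "expr": {"GENE_A": 2.0, "GENE_B": 0.1, "GENE_C": -1.0}},
    {"id": "DS11", "text": "lung biopsy",
     "expr": {"GENE_A": 0.0, "GENE_B": 3.0, "GENE_C": -0.5}},
    {"id": "DS12", "text": "breast cell line",
     "expr": {"GENE_A": 1.0, "GENE_B": 0.3, "GENE_C": -2.0}},
]

ground_truth, curated = curate(training, labels, uncurated)
print([d["label"] for d in curated])
# ['breast_cancer', 'lung_cancer', 'breast_cancer']
print(gene_signatures(curated))
# {'breast_cancer': ['GENE_A', 'GENE_C'], 'lung_cancer': ['GENE_B', 'GENE_C']}
```

Naive Bayes is used here only because it is compact and easy to check by hand. Any trained text classifier could fill the same slot, and the summarization step could be replaced by a proper statistical test without changing the overall flow.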