Rapid genomic sequence classification using probabilistic data structures

Invention Grant

US11037654B2 Rapid genomic sequence classification using probabilistic data structures (Active)

Patent Title: Rapid genomic sequence classification using probabilistic data structures
Application No.: US15977667

Application Date: 2018-05-11
Publication No.: US11037654B2

Publication Date: 2021-06-15
Inventor: Masooda Omari, Tyler W. Barrus, Mark Sanders, Daniel Negron
Applicant: NOBLIS, INC.
Applicant Address: US VA Reston
Assignee: NOBLIS, INC.
Current Assignee: NOBLIS, INC.
Current Assignee Address: US VA Reston
Agency: Morrison & Foerster LLP
Main IPC: G16B30/10
IPC: G16B30/10; G16B30/00; G06F16/9038; G06F16/903; G16B40/00

Abstract:

Techniques for identifying and/or classifying genomic information are provided. In some embodiments, genomic information may be identified by computing systems without access to a database of reference genomic information, instead relying on locally stored probabilistic data structures representing reference genomic information. Query genomic data, such as data taken from a read-set, may be divided into sub-strings, and each of the locally-stored probabilistic data structures may be queried by each of the extracted sub-strings, generating probabilistic outputs indicating either that (a) the sub-string is probably included in the set of data represented by the probabilistic data structure or (b) the sub-string is definitely not included in the set of data. Based on the number and/or proportion of sub-strings from a read-set that are indicated as being likely represented by a probabilistic data structure, a likely identity or classification for the genomic information in the read-set may be determined.
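The workflow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a Bloom filter as the probabilistic data structure (one common choice; the abstract does not name one), fixed-length sub-strings (k-mers), and a simple hit-fraction threshold. The names `BloomFilter`, `kmers`, `build_filter`, `classify`, and the parameters `num_bits`, `num_hashes`, `k`, and `min_fraction` are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably present"; False means "definitely not present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(sequence, k):
    """Yield every overlapping sub-string of length k."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_filter(reference, k):
    """Represent one reference sequence as a Bloom filter of its k-mers."""
    bf = BloomFilter()
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def classify(read, filters, k, min_fraction=0.5):
    """Return (best_label_or_None, per-reference fraction of k-mer hits)."""
    subs = list(kmers(read, k))
    if not subs:
        return None, {}
    scores = {name: sum(s in bf for s in subs) / len(subs)
              for name, bf in filters.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_fraction else None), scores


# Toy usage with made-up sequences; real references would be whole genomes.
ref_a = "ATGGCGTACGTTAGCCGATAGGCTTAACGGATCCGATTACAGGCTA"
ref_b = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGG"
filters = {"organism_a": build_filter(ref_a, 8),
           "organism_b": build_filter(ref_b, 8)}
label, scores = classify(ref_a[5:35], filters, 8)
print(label, scores)
```

Because the filters are built once and stored locally, classification in this sketch needs only the filters themselves, not the reference sequences. The filter size and number of hash functions trade memory against the false-positive rate and would need tuning for real genome-scale data.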

Public/Granted literature

US20180330054A1 RAPID GENOMIC SEQUENCE CLASSIFICATION USING PROBABILISTIC DATA STRUCTURES Public/Granted day: 2018-11-15

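IPC Classification:

G	Physics
G16	Information and communication technology [ICT] specially adapted for specific application fields
G16B	Bioinformatics, i.e. ICT specially adapted for genetic or protein-related data processing in computational molecular biology
G16B30/00	ICT specially adapted for sequence analysis involving nucleotides or amino acids
G16B30/10	.Sequence alignment; homology search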