Rapid genomic sequence classification using probabilistic data structures
Abstract:
Techniques for identifying and/or classifying genomic information are provided. In some embodiments, genomic information may be identified by computing systems without access to a database of reference genomic information, instead relying on locally stored probabilistic data structures representing reference genomic information. Query genomic data, such as data taken from a read-set, may be divided into sub-strings, and each of the locally-stored probabilistic data structures may be queried by each of the extracted sub-strings, generating probabilistic outputs indicating either that (a) the sub-string is probably included in the set of data represented by the probabilistic data structure or (b) the sub-string is definitely not included in the set of data. Based on the number and/or proportion of sub-strings from a read-set that are indicated as being likely represented by a probabilistic data structure, a likely identity or classification for the genomic information in the read-set may be determined.
Information query
Patent Agency Ranking
0/0