Abstract:
Provided herein are systems, compositions and methods for tracking, sorting and/or identifying sample polynucleotides using nucleic acid barcodes. The barcodes provided herein are oligonucleotides designed to be uniquely identifiable. The nucleic acid barcodes have properties that permit them to be sequenced with high accuracy and/or reduced error rates. In some embodiments, the nucleic acid barcodes are designed to have nucleotide sequences that make up overlapping dibase color positions (also called color positions). The order of the overlapping dibase color positions can be determined using fluorophore-encoded dibase probes in a fluorophore color-calling scheme to give high-fidelity reads.
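A minimal Python sketch of how a barcode maps onto overlapping dibase color positions follows. It assumes the widely used SOLiD-style two-base code: each base gets a 2-bit value (A=0, C=1, G=2, T=3), and each color is the XOR of two adjacent bases. The abstract does not say which color assignment is used. The helper names, and the idea of screening a barcode set by its minimum pairwise color distance, are illustrative assumptions rather than the claimed design method. The sketch also ignores any known adapter base that precedes the barcode.

```python
from itertools import combinations

# Assumed SOLiD-style 2-bit base values; color = XOR of adjacent bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(barcode):
    """Return the overlapping dibase colors (len(barcode) - 1 positions)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(barcode, barcode[1:])]


def color_distance(x, y):
    """Count color positions at which two equal-length barcodes differ."""
    return sum(a != b for a, b in zip(to_colors(x), to_colors(y)))


def min_color_distance(barcodes):
    """Hypothetical design check: smallest pairwise distance in color space."""
    return min(color_distance(x, y) for x, y in combinations(barcodes, 2))
```

AAAA and CCCC produce the same color string, so a barcode set that looks well separated in base space can still collide in color space unless a known leading base resolves the ambiguity.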
Abstract:
Disclosed are systems and methods for resequencing using color calls. A DNA sample is encoded and sequenced according to a multi-base code, producing a string of read color calls for a fragment of the sample. A reference sequence is obtained, and the string of read color calls is mapped to it. A base sequence is extracted from the reference sequence and encoded as a string of reference color codes according to the same multi-base code. The string of read color calls is aligned with the string of reference color codes, and mismatches in the alignment are detected. One or more of these mismatches are annotated as inconsistent, and the inconsistent mismatches are corrected. The string of corrected read color calls is then decoded into bases, producing a read sequence.
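The encode, compare, annotate, correct and decode steps can be sketched in Python as below. This sketch assumes a SOLiD-style two-base code (color = XOR of 2-bit base values, A=0, C=1, G=2, T=3). It also uses a simplified consistency rule chosen for illustration. Two adjacent color mismatches count as a consistent single-base variant when the XOR of the two read colors equals the XOR of the two reference colors. Every other mismatch is annotated as inconsistent and replaced with the reference color. The read is assumed to be already mapped, gap-free, and aligned to the reference from its first base. The function names are hypothetical.

```python
# Assumed SOLiD-style two-base code; not specified by the abstract.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq):
    """Encode bases as overlapping dibase colors (len(seq) - 1 colors)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    bases = [first_base]
    for c in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ c])
    return "".join(bases)


def correct_read(read_colors, ref_bases):
    """Annotate inconsistent color mismatches, correct them, and decode.

    Assumes len(read_colors) == len(ref_bases) - 1 and no gaps.
    Returns (sorted inconsistent positions, decoded read sequence).
    """
    ref_colors = encode(ref_bases)
    mismatches = [i for i, (r, f) in enumerate(zip(read_colors, ref_colors)) if r != f]
    mismatch_set = set(mismatches)

    # Simplified rule: an adjacent mismatch pair whose XOR matches the
    # reference XOR is what a single-base substitution would produce.
    consistent = set()
    for i in mismatches:
        j = i + 1
        if j in mismatch_set and (read_colors[i] ^ read_colors[j]) == (ref_colors[i] ^ ref_colors[j]):
            consistent.update((i, j))

    inconsistent = sorted(mismatch_set - consistent)
    corrected = list(read_colors)
    for i in inconsistent:
        corrected[i] = ref_colors[i]  # treat as a measurement error
    return inconsistent, decode(ref_bases[0], corrected)
```

A single-base change produces two adjacent, mutually consistent color mismatches and survives decoding. An isolated color error is corrected back to the reference color before the read is decoded.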
Abstract:
Systems and methods for annotating variants within a genome can call variants from reads, or receive called variants directly, and associate the called variants with functional and interpretive annotations. A summary report of the called variants and their associated functional and interpretive annotations can then be generated.
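The data flow above (called variants in, annotations attached, summary out) can be sketched in Python as follows. The `Variant` record, the dictionaries that stand in for functional and interpretive annotation sources, and labels such as "missense" are hypothetical placeholders. The abstract does not specify any annotation database or report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen so a Variant can be used as a dict key
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def summarize(variants, functional, interpretive):
    """Build one report row per called variant with its two annotation kinds.

    `functional` and `interpretive` are hypothetical lookups keyed by Variant;
    variants missing from a lookup are reported as "unannotated".
    """
    rows = []
    for v in variants:
        rows.append({
            "variant": f"{v.chrom}:{v.pos} {v.ref}>{v.alt}",
            "functional": functional.get(v, "unannotated"),
            "interpretive": interpretive.get(v, "unannotated"),
        })
    return rows
```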
Abstract:
Systems and methods are used to identify an exon junction from a single read of a transcript. A transcript sample is interrogated, and a read sequence is produced using a nucleic acid sequencer. A first exon sequence and a second exon sequence are obtained using a processor. Using the processor, the first exon sequence is mapped to a prefix of the read sequence and the second exon sequence is mapped to a suffix of the read sequence. The processor then calculates the sum of three terms: the number of sequence elements of the first exon sequence that overlap the prefix of the read sequence, the number of sequence elements of the second exon sequence that overlap the suffix of the read sequence, and a constant. If this sum equals the length of the read sequence, the processor identifies a junction in the read.
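Once the two overlaps are known, the junction test is simple arithmetic. The Python sketch below makes several assumptions. It uses exact string matching. It takes the prefix overlap as the longest read prefix that equals the end of the first exon, and the suffix overlap as the longest read suffix that equals the start of the second exon. It also sets the constant to 0 by default. The abstract specifies neither the mapping method nor the constant's value, so the function names and the default are illustrative.

```python
def prefix_overlap(exon1, read):
    """Largest k such that the first k read bases equal the last k exon1 bases."""
    for k in range(min(len(exon1), len(read)), 0, -1):
        if read[:k] == exon1[-k:]:
            return k
    return 0


def suffix_overlap(exon2, read):
    """Largest m such that the last m read bases equal the first m exon2 bases."""
    for m in range(min(len(exon2), len(read)), 0, -1):
        if read[-m:] == exon2[:m]:
            return m
    return 0


def is_junction(read, exon1, exon2, constant=0):
    """Junction if prefix overlap + suffix overlap + constant == read length.

    The default constant of 0 is an assumption for this sketch.
    """
    total = prefix_overlap(exon1, read) + suffix_overlap(exon2, read) + constant
    return total == len(read)
```

Suppose a read is built from the last four bases of one exon and the first four bases of the next. Both overlaps are 4, so with a constant of 0 the sum equals the read length of 8.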
Abstract:
A system for performing quality control for nucleic acid sample sequencing is disclosed. The system has a set of solid supports, each support having a plurality of nucleic acid sequences attached to it. The set has plural groups of solid supports, and each group contains solid supports having the same nucleic acid sequences attached. The nucleic acid sequences of each group differ from those of the other groups. The nucleic acid sequences are synthetically derived. A method of preparing a quality control for performing nucleic acid sample sequencing and a method of validating a nucleic acid sequencing instrument are also disclosed.
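The attached sequences are synthetic and therefore known, so one way to validate an instrument is to compare the reads from each group of supports against that group's expected sequence. The Python sketch below illustrates this idea under several assumptions and is not the disclosed method. It assumes gap-free reads, compares positions only up to the shorter of read and expected length, and uses a hypothetical pass threshold (`max_error`, default 1%).

```python
def group_error_rates(reads_by_group, expected_by_group):
    """Per-group mismatch rate of reads against the known synthetic sequence.

    A group with no comparable bases gets NaN, which fails the check below.
    """
    rates = {}
    for group, reads in reads_by_group.items():
        expected = expected_by_group[group]
        errors = sum(a != b for read in reads for a, b in zip(read, expected))
        total = sum(min(len(read), len(expected)) for read in reads)
        rates[group] = errors / total if total else float("nan")
    return rates


def instrument_passes(rates, max_error=0.01):
    """Hypothetical criterion: every group at or below max_error (NaN fails)."""
    return all(r <= max_error for r in rates.values())
```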