Method and systems for genome sequence compression
Abstract:
Systems and methods for genome sequence compression and decompression are provided. The method for compression encoding of a genome sequence includes partitioning a genome sequence into a plurality of Group of Bases (GoBs) and processing each of the plurality of GoBs independently to encode the genome sequence into a bit stream. Processing each of the plurality of GoBs includes dividing each of the plurality of GOBs into a first part and a second part, the first part including an initial context part and the second part including a learning-based inference part. The processing each of the plurality of GoBs further includes encoding the first part in accordance with a Markov model, encoding the second part in accordance with a learning-based model, and encoding the encoded first part and the encoded second part into the bit stream with an arithmetic encoder. The learning-based model may include Long and Short-Term Memory (LSTM)-based neural networks.
Public/Granted literature
Information query
Patent Agency Ranking
0/0