Invention Grant
- Patent Title: Distributed dataset modification, retention, and replication
- Application No.: US16248640
- Application Date: 2019-01-15
- Publication No.: US10976950B1
- Publication Date: 2021-04-13
- Inventor: Chris Trezzo, Jason Sprowl, Joep Rottinghuis
- Applicant: Twitter, Inc.
- Applicant Address: US CA San Francisco
- Assignee: Twitter, Inc.
- Current Assignee: Twitter, Inc.
- Current Assignee Address: US CA San Francisco
- Agency: Fish & Richardson P.C.
- Main IPC: G06F3/06
- IPC: G06F3/06

Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data retention and modification. One of the methods includes dividing partitions into a set of generations according to a retention policy; accumulating modification and deletion events that define changes to be applied to data of the distributed dataset; and when a triggering event occurs for a triggered generation in the set of generations, rolling an oldest partition out of the triggered generation, the rolling comprising: if the oldest partition has reached the end of a retention period for the dataset, marking the oldest partition for deletion in the triggered generation; otherwise: creating a new partition corresponding to the data of the oldest partition, wherein the data is cleaned using a scrubbing process; adding the new partition to a next generation in the set of generations; and marking the oldest partition for deletion in the triggered generation.
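
The rolling step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. All names here (`Partition`, `Generation`, `scrub`, `roll_oldest`, the `created_day` field, and the day-based retention check) are hypothetical and chosen only for illustration. The sketch assumes records are dictionaries keyed by an `id` field and that accumulated events are held as a set of deleted ids plus a map of field updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Record = Dict[str, str]


@dataclass
class Partition:
    created_day: int                  # day the underlying data was first written (assumed unit)
    records: List[Record]
    marked_for_deletion: bool = False


@dataclass
class Generation:
    partitions: List[Partition] = field(default_factory=list)


def scrub(records: List[Record], deleted_ids: Set[str],
          modifications: Dict[str, Record]) -> List[Record]:
    """Return a cleaned copy of records with accumulated events applied."""
    cleaned = []
    for rec in records:
        if rec["id"] in deleted_ids:
            continue  # deletion event: drop the record
        # modification event: overlay updated fields on a copy of the record
        cleaned.append({**rec, **modifications.get(rec["id"], {})})
    return cleaned


def roll_oldest(generations: List[Generation], index: int, today: int,
                retention_days: int, deleted_ids: Set[str],
                modifications: Dict[str, Record]) -> None:
    """Roll the oldest live partition out of generations[index]."""
    triggered = generations[index]
    live = [p for p in triggered.partitions if not p.marked_for_deletion]
    if not live:
        return
    oldest = min(live, key=lambda p: p.created_day)

    expired = today - oldest.created_day >= retention_days
    if not expired:
        if index + 1 >= len(generations):
            # Assumption of this sketch: the last generation only rolls expired data.
            raise ValueError("no next generation for unexpired partition")
        # Create a new, scrubbed partition and add it to the next generation.
        new_partition = Partition(
            created_day=oldest.created_day,
            records=scrub(oldest.records, deleted_ids, modifications),
        )
        generations[index + 1].partitions.append(new_partition)

    # In both branches the oldest partition is marked for deletion.
    oldest.marked_for_deletion = True
```

In this sketch the old partition is only marked, not physically removed, which mirrors the abstract's wording; when and how marked partitions are actually deleted is left out.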