Distributed columnar data set subset retrieval

    Publication No.: US11263175B2

    Publication date: 2022-03-01

    Application No.: US17039584

    Filing date: 2020-09-30

    Abstract: An apparatus includes a processor to: within each reading thread, retrieve a data set part and corresponding part metadata from storage device(s), analyze row group metadata for each row group within the data set part to identify candidate row group(s) meeting specified criteria, and store the candidate row group(s) and corresponding row group metadata within a data buffer of a queue; operate the queue as a FIFO buffer; within each provision thread, retrieve one of multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to identify rows meeting the criteria, and provide those rows to the requesting device or an application; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
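The row group selection step in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that the row group metadata holds a per-column minimum and maximum and that the criteria are a closed value range on one column; the abstract does not specify either. The names `RowGroup`, `is_candidate`, and `matching_rows` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    columns: dict  # column name -> list of values (columnar layout)
    stats: dict    # assumed metadata: column name -> (min, max)


def make_row_group(columns):
    """Build a row group and derive the assumed min/max metadata."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return RowGroup(columns=columns, stats=stats)


def is_candidate(group, column, low, high):
    """Metadata-only check: could any row fall within [low, high]?"""
    col_min, col_max = group.stats[column]
    return col_max >= low and col_min <= high


def matching_rows(group, column, low, high):
    """Scan one candidate row group and return the rows meeting the criteria."""
    names = list(group.columns)
    values = group.columns[column]
    return [
        {name: group.columns[name][i] for name in names}
        for i, value in enumerate(values)
        if low <= value <= high
    ]


# One data set part made of three row groups.
data_set_part = [
    make_row_group({"id": [1, 2, 3], "temp": [10, 12, 11]}),
    make_row_group({"id": [4, 5, 6], "temp": [30, 25, 41]}),
    make_row_group({"id": [7, 8, 9], "temp": [18, 22, 27]}),
]

# Reading side: keep only row groups whose metadata overlaps the range.
candidates = [g for g in data_set_part if is_candidate(g, "temp", 20, 30)]

# Provision side: identify the individual rows within each candidate.
rows = [r for g in candidates for r in matching_rows(g, "temp", 20, 30)]
```

In this sketch the first row group is skipped using metadata alone, so its values are never scanned.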

    Distributed columnar data set storage

    Publication No.: US10983957B2

    Publication date: 2021-04-20

    Application No.: US17037652

    Filing date: 2020-09-29

    Abstract: An apparatus includes a processor to: instantiate collection threads, data buffers of a queue, and aggregation threads; within each collection thread, assemble a row group from a subset of the multiple rows, reorganize the data values from row-wise to columnar organization, and store the row group within a data buffer of the queue; operate the buffer queue as a FIFO buffer; within each aggregation thread, retrieve multiple row groups from multiple data buffers of the queue, assemble a data set part from the multiple row groups, transmit, to storage device(s) via a network, the data set part; and in response to each instance of retrieval of a row group from a data buffer of the buffer queue for use within an aggregation thread, analyze a level of availability of at least storage space within the node device to determine whether to dynamically adjust the quantity of data buffers of the buffer queue.
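The reorganization from row-wise to columnar form can be sketched as follows. This is a single-threaded illustration only: it assumes every row is a dict with the same keys, and the function name `to_row_groups` and the fixed `rows_per_group` parameter are hypothetical. The threading, the buffer queue, and the dynamic buffer adjustment are not modelled.

```python
def to_row_groups(rows, rows_per_group):
    """Split rows into fixed-size subsets and transpose each to columns."""
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # Row-wise -> columnar: one list of values per column name.
        yield {name: [row[name] for row in chunk] for name in chunk[0]}


rows = [{"id": i, "name": f"n{i}"} for i in range(5)]
row_groups = list(to_row_groups(rows, rows_per_group=2))

# Aggregation side: a data set part is assembled from several row groups.
data_set_part = {
    "row_groups": row_groups,
    "row_counts": [len(group["id"]) for group in row_groups],
}
```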

    Distributed data storage grouping

    Publication No.: US10789207B2

    Publication date: 2020-09-29

    Application No.: US16233644

    Filing date: 2018-12-27

    Abstract: An apparatus includes a processor component to: transmit node device identifiers to multiple node devices to define an ordering thereamong; following block exchanges redistributing the subsets among a reduced number of node devices, receive sizes of blocks or sub-blocks of data within each subset from the reduced number of node devices; based on the received sizes, generate map data organized to define an ordering among the blocks stemming from the ordering among the multiple node devices; determine whether the total size of the map data and metadata, together, exceeds a minimum size for data transmissions to storage device(s); and in response to the total size exceeding the minimum size, form the map data and metadata into segment(s) that each fit the minimum size and a maximum size, and transmit the segment(s) at least partially in parallel with other segments of the blocks transmitted by the reduced number of node devices.
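One way to read the segmenting step is sketched below. It assumes the map data and metadata are already serialized into a single byte string and that segments should be as even in size as possible; neither detail comes from the abstract, and `split_into_segments` is a hypothetical name. The abstract also does not say what happens when the total size does not exceed the minimum, so the sketch simply returns no segments in that case.

```python
import math


def split_into_segments(payload: bytes, min_size: int, max_size: int):
    """Split payload into segments whose sizes all lie in [min_size, max_size]."""
    total = len(payload)
    if total <= min_size:
        # Total does not exceed the minimum transmission size: not segmented here.
        return []
    # Fewest segments that keeps every segment at or below max_size.
    count = math.ceil(total / max_size)
    base, extra = divmod(total, count)
    if base < min_size:
        raise ValueError("no split satisfies both the minimum and maximum size")
    segments, offset = [], 0
    for i in range(count):
        # The first `extra` segments carry one additional byte.
        size = base + (1 if i < extra else 0)
        segments.append(payload[offset:offset + size])
        offset += size
    return segments


payload = bytes(range(25))
segments = split_into_segments(payload, min_size=4, max_size=10)
sizes = [len(s) for s in segments]
```

Each resulting segment could then be transmitted independently, which is what makes the parallel transmission described in the abstract possible.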

    Distributed columnar data set retrieval

    Publication No.: US20210026805A1

    Publication date: 2021-01-28

    Application No.: US17039314

    Filing date: 2020-09-30

    Abstract: An apparatus includes a processor to: instantiate data buffers of a queue, reading threads, and provision threads; within each reading thread, use an identifier provided in a data buffer of the queue to retrieve the corresponding data set part and part metadata from storage device(s), and store both within the data buffer; operate the queue as a first-in-first-out (FIFO) buffer; within each provision thread, retrieve a row group from among multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to decompress at least one column, and provide the data values of the row group to the requesting device or an application routine; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
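The reading-thread and provision-thread arrangement resembles a bounded producer/consumer pipeline, sketched below with Python's standard `queue`, `threading`, and `zlib` modules. Everything specific here is an assumption for illustration: the in-memory `STORAGE` dict stands in for the storage device(s), each part holds a single zlib-compressed column, and the thread counts are fixed, so the dynamic adjustment of reading threads described in the abstract is not modelled.

```python
import queue
import threading
import zlib

# Hypothetical storage: part identifier -> part metadata plus compressed column.
STORAGE = {
    part_id: {
        "metadata": {"column": "value", "codec": "zlib"},
        "payload": zlib.compress(
            ",".join(str(part_id * 10 + i) for i in range(3)).encode()
        ),
    }
    for part_id in range(4)
}


def reader(ids, buffers):
    """Reading thread: fetch the part for each identifier and buffer it."""
    while (part_id := ids.get()) is not None:
        buffers.put((part_id, STORAGE[part_id]))


def provider(buffers, results, lock):
    """Provision thread: use the metadata to decompress the column."""
    while (item := buffers.get()) is not None:
        part_id, part = item
        raw = part["payload"]
        if part["metadata"]["codec"] == "zlib":
            raw = zlib.decompress(raw)
        values = [int(v) for v in raw.decode().split(",")]
        with lock:
            results[part_id] = values


def retrieve(part_ids, n_readers=2, n_providers=2, n_buffers=2):
    ids = queue.Queue()
    buffers = queue.Queue(maxsize=n_buffers)  # bounded FIFO of data buffers
    results, lock = {}, threading.Lock()
    readers = [threading.Thread(target=reader, args=(ids, buffers))
               for _ in range(n_readers)]
    providers = [threading.Thread(target=provider, args=(buffers, results, lock))
                 for _ in range(n_providers)]
    for thread in readers + providers:
        thread.start()
    for part_id in part_ids:
        ids.put(part_id)
    for _ in readers:       # one stop marker per reading thread
        ids.put(None)
    for thread in readers:
        thread.join()
    for _ in providers:     # one stop marker per provision thread
        buffers.put(None)
    for thread in providers:
        thread.join()
    return results


retrieved = retrieve([0, 1, 2, 3])
```

The bounded `queue.Queue` makes reading threads block when all buffers are full, which is the back-pressure a fixed set of data buffers would provide.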

    Distributed data set storage and retrieval

    Publication No.: US10185721B2

    Publication date: 2019-01-22

    Application No.: US15804570

    Filing date: 2017-11-06

    Abstract: An apparatus includes a processor component caused to: retrieve metadata of organization of data within a data set, and map data of organization of data blocks within a data file; receive indications of which node devices are available to perform a processing task with a data set portion; and in response to the data set including partitioned data, compare the quantities of available node devices and of the node devices last involved in storing the data set. In response to a match, for each map data map entry: retrieve a hashed identifier for a data sub-block, and a size for each of the data sub-blocks within the corresponding data block; divide the hashed identifier by the quantity of available node devices; compare the modulo value to a designation assigned to each of the available node devices; and provide a pointer to the available node device assigned the matching designation.
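The modulo-based assignment in this abstract can be shown with a short sketch. It assumes, for illustration, that node designations are the integers 0 through N-1 and that a "pointer" is an offset and size within the data file; the entry fields and the name `assign_sub_blocks` are hypothetical.

```python
def assign_sub_blocks(map_entries, nodes):
    """Map each data sub-block to the node whose designation matches
    hashed_id modulo the number of available nodes."""
    # Assumed designations: position in the list of available nodes.
    by_designation = dict(enumerate(nodes))
    assignments = {node: [] for node in nodes}
    for entry in map_entries:
        designation = entry["hashed_id"] % len(nodes)
        node = by_designation[designation]
        # The pointer tells the node where to read its sub-block.
        assignments[node].append((entry["offset"], entry["size"]))
    return assignments


map_entries = [
    {"hashed_id": 17, "offset": 0, "size": 100},
    {"hashed_id": 42, "offset": 100, "size": 250},
    {"hashed_id": 8, "offset": 350, "size": 75},
]
assignments = assign_sub_blocks(map_entries, ["node-a", "node-b", "node-c"])
```

Because the mapping depends only on the hashed identifier and the node count, sub-blocks with the same identifier land on the same node whenever the number of available nodes matches the number used at storage time, which is the condition the abstract checks first.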
