-
Publication No.: US20190102087A1
Publication date: 2019-04-04
Application No.: US15720949
Filing date: 2017-09-29
Applicant: Oracle International Corporation
Inventor: Jia Shi , Yiliang Jin , Zheren R. Zhang , Zuoyu Tao , Vijay Sridharan , Kothanda Umamageswaran , Graham Ivey , Yunrui Li
IPC: G06F3/06 , G06F9/455 , G06F12/02 , G06F12/0871 , H04L29/08
Abstract: A shared storage architecture persistently stores database files in non-volatile random access memories (NVRAMs) of computing nodes of a multi-node DBMS. The computing nodes of the multi-node DBMS not only collectively store database data on NVRAMs of the computing nodes, but also host database server instances that process queries in parallel, host database sessions and database processes, and together manage access to a database stored on the NVRAMs of the computing nodes. To perform a data block read operation from persistent storage, a data block may be transferred directly over a network from NVRAM of a computing node that persistently stores the data block to a database buffer in non-volatile RAM of another computing node that requests the data block. The transfer is accomplished using remote direct memory access (“RDMA”). In addition to techniques for performing a data block read operation to NVRAM, computing nodes perform a data block write operation to data blocks stored in NVRAM of the NVRAM shared storage architecture. The data block write operation is referred to herein as a one-sided write because only one database process needs to participate in the writing of a data block to NVRAM in order to successfully commit the write.
-
Publication No.: US10248685B2
Publication date: 2019-04-02
Application No.: US15253626
Filing date: 2016-08-31
Applicant: Oracle International Corporation
Inventor: Kartik Kulkarni , Juan R. Loaiza , Vivekanandhan Raja , Kothanda Umamageswaran , Sanket Hase , Vasudha Krishnaswamy , Tirthankar Lahiri
IPC: G06F17/30
Abstract: A minimum value (MV) is computed for start timestamps that each correspond to an uncommitted transaction. In an embodiment, the MV is computed for a pluggable database that is open on at least first and second instances of a database. The MV is computed for the first instance as of a first current timestamp (CT). The MV and the first CT are communicated to a second instance that has a second CT. If the first and second CTs are equal, the second instance stores the MV. If the first CT is bigger, the second CT also becomes equal to the first CT. If the first CT is smaller, the MV is discarded, and the first CT becomes equal to the second CT. In an embodiment, if the MV remains unchanged for a predetermined time period, a start timestamp corresponding to the MV is advanced to a current or future timestamp.
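The timestamp comparison described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Instance` class, the `min_start_timestamp` and `receive_mv` functions, and the use of integer timestamps are all hypothetical. The sketch also assumes that when the first CT is bigger, the second instance stores the MV in addition to advancing its CT, which is one reading of the word "also" in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for one database instance."""
    current_ts: int                  # the instance's current timestamp (CT)
    stored_mv: Optional[int] = None  # last minimum value (MV) it accepted


def min_start_timestamp(uncommitted_start_ts: Iterable[int]) -> int:
    """MV: the smallest start timestamp among uncommitted transactions."""
    return min(uncommitted_start_ts)


def receive_mv(first: Instance, second: Instance, mv: int) -> bool:
    """Apply the three cases from the abstract; return True if the MV is kept."""
    if first.current_ts >= second.current_ts:
        # Equal CTs: store the MV. Bigger first CT: advance the second CT too.
        # (Storing the MV in the "bigger" case is an assumption.)
        second.current_ts = first.current_ts
        second.stored_mv = mv
        return True
    # Smaller first CT: discard the MV and advance the first CT.
    first.current_ts = second.current_ts
    return False
```

For example, with `first = Instance(80)` and `second = Instance(100)`, calling `receive_mv(first, second, 70)` discards the MV and moves `first.current_ts` to 100.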
-
Publication No.: US09898490B2
Publication date: 2018-02-20
Application No.: US14313984
Filing date: 2014-06-24
Applicant: Oracle International Corporation
Inventor: Umesh Panchaksharaiah , Krishnan Meiyyappan , Kothanda Umamageswaran , Alex Tsukerman , Semen Ustimenko , Adrian Ng , Devang Mundhra , Yinian Qi
IPC: G06F17/30
CPC classification number: G06F17/30339 , G06F17/30289
Abstract: Techniques are described herein for supporting multiple versions of a database server within a database machine comprising a separate database layer and storage layer. In an embodiment, the database layer includes compute nodes each hosting one or more instances of a database server. The storage layer includes storage nodes each hosting one or more instances of a storage server, also referred to herein as a “cell server.” In general, the database servers may receive data requests, such as SQL queries, from client applications and service the requests in coordination with the cell servers of the storage layer.
-
Publication No.: US20170116269A1
Publication date: 2017-04-27
Application No.: US15331599
Filing date: 2016-10-21
Applicant: Oracle International Corporation
Inventor: Roger D. Macnicol , Viral Shah , Xia Hua , Jesse Kamp , Shasank K. Chavan , Maria Colgan , Tirthankar Lahiri , Adrian Tsz Him Ng , Krishnan Meiyyappan , Amit Ganesh , Juan R. Loaiza , Kothanda Umamageswaran , Yiran Qin
IPC: G06F17/30 , G06F12/0811 , G06F12/0897 , G06F3/06
CPC classification number: G06F16/24539 , G06F3/061 , G06F3/0647 , G06F3/065 , G06F3/067 , G06F12/0811 , G06F12/0897 , G06F16/22 , G06F16/221 , G06F2212/1016 , G06F2212/163 , G06F2212/225
Abstract: Techniques are provided for storing an in-memory unit (IMU) in a lower-storage tier and copying the IMU to DRAM when needed for query processing. Techniques are also provided for copying IMUs to lower tiers of storage when evicted from the cache of higher tiers of storage. Techniques are provided for implementing functionality of IMUs within a storage system, to enable database servers to push tasks, such as filtering, to the storage system where the storage system may access IMUs within its own memory to perform the tasks. Metadata associated with a set of data may be used to indicate whether an IMU for the data should be created by the database server machine or within the storage system.
-
Publication No.: US09430383B2
Publication date: 2016-08-30
Application No.: US14336860
Filing date: 2014-07-21
Applicant: Oracle International Corporation
Inventor: Zuoyu Tao , Jia Shi , Kothanda Umamageswaran , Selcuk Aya
CPC classification number: G06F12/0893 , G06F3/06 , G06F3/0611 , G06F3/0632 , G06F3/0643 , G06F3/0685 , G06F12/0246 , G06F12/0804 , G06F12/0806 , G06F12/0866 , G06F17/30115 , G06F2212/1008 , G06F2212/1024 , G06F2212/225 , G06F2212/46 , G06F2212/604 , G06F2212/608 , G06F2212/7207 , G06F2212/7208
Abstract: A method and system for fast file initialization is provided. An initialization request to create or extend a file is received. The initialization request comprises or identifies file template metadata. A set of allocation units are allocated, the set of allocation units comprising at least one allocation unit for the file on a primary storage medium without initializing at least a portion of the file on the primary storage medium. The file template metadata is stored in a cache. The cache resides in at least one of volatile memory and persistent flash storage. A second request is received corresponding to a particular allocation unit of the set of allocation units. Particular file template metadata associated with the particular allocation unit is obtained. In response to the second request, at least a portion of a new allocation unit is generated.
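As a rough illustration of the idea in this abstract, the sketch below allocates allocation units without writing them and generates their content from cached template metadata only when a later request touches them. Everything here is hypothetical: the `LazyFile` class, its method names, and the template format (a single `fill` byte) are assumptions chosen for brevity, and an in-memory dictionary stands in for both the primary storage medium and the cache.

```python
class LazyFile:
    """Toy model of fast file initialization (all names are illustrative)."""

    def __init__(self, au_size: int):
        self.au_size = au_size
        self.primary = {}         # AU index -> bytes actually written to storage
        self.template_cache = {}  # AU index -> template metadata for unwritten AUs
        self.num_aus = 0

    def initialize(self, count: int, template: dict) -> range:
        """Create or extend the file: allocate AUs, record the template, write nothing."""
        start = self.num_aus
        for au in range(start, start + count):
            self.template_cache[au] = template
        self.num_aus += count
        return range(start, start + count)

    def read(self, au: int) -> bytes:
        """Serve a later request; generate the AU from its template if unwritten."""
        if not 0 <= au < self.num_aus:
            raise IndexError("allocation unit not allocated")
        if au in self.primary:
            return self.primary[au]
        template = self.template_cache[au]
        return bytes([template["fill"]]) * self.au_size

    def write(self, au: int, data: bytes) -> None:
        """Materialize an AU on primary storage and drop its cached template."""
        if not 0 <= au < self.num_aus:
            raise IndexError("allocation unit not allocated")
        if len(data) != self.au_size:
            raise ValueError("data must fill exactly one allocation unit")
        self.primary[au] = data
        self.template_cache.pop(au, None)
```

The point of the design is that `initialize` costs time proportional to the metadata, not to the file size; the expensive formatting work is deferred to the first access of each allocation unit.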
-
Publication No.: US20160092534A1
Publication date: 2016-03-31
Application No.: US14823212
Filing date: 2015-08-11
Applicant: Oracle International Corporation
Inventor: Nilesh Choudhury , Scott Martin , Zuoyu Tao , Jia Shi , Alexander Tsukerman , Kothanda Umamageswaran
IPC: G06F17/30
CPC classification number: G06F16/27 , G06F3/0608 , G06F3/0617 , G06F11/1435 , G06F11/1458
Abstract: Techniques herein are for creating a database snapshot by creating a sparse database. A method involves receiving a creation request to create a sparse database. The creation request has an identity of a parent database. The creation request is processed to create a sparse database. The sparse database has the identity of the parent database. The sparse database does not contain data copied from the parent database. A write request to write data into the sparse database is received. The write request is processed by writing the data into the sparse database. The parent database does not receive the data.
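The snapshot behavior described in this abstract resembles copy-on-write, and a minimal sketch of that reading follows. The `Database` and `SparseDatabase` classes are hypothetical, block contents are plain byte strings, and the read-through to the parent for blocks the sparse database has not written is an assumption that the abstract does not state explicitly.

```python
class Database:
    """Hypothetical parent database: a mapping of block number to content."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def read(self, block_no: int) -> bytes:
        return self.blocks[block_no]


class SparseDatabase:
    """Snapshot that starts empty and keeps only its own writes."""

    def __init__(self, parent: Database):
        self.parent = parent  # identity of the parent, taken from the creation request
        self.blocks = {}      # no data is copied from the parent at creation

    def write(self, block_no: int, data: bytes) -> None:
        # The write lands in the sparse database only; the parent never sees it.
        self.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Assumption: blocks not yet written locally are read from the parent.
        if block_no in self.blocks:
            return self.blocks[block_no]
        return self.parent.read(block_no)
```

Creating the snapshot is cheap because it records only a reference to the parent; storage grows only with the blocks that are later written.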
-
Publication No.: US20160092454A1
Publication date: 2016-03-31
Application No.: US14849012
Filing date: 2015-09-09
Applicant: Oracle International Corporation
Inventor: Zuoyu Tao , Nilesh Choudhury , Scott Martin , Mingmin Chen , Jia Shi , Alexander Tsukerman , Kothanda Umamageswaran
IPC: G06F17/30
CPC classification number: G06F16/1744
Abstract: Techniques herein are for accessing non-materialized blocks of a sparse file. A method involves a storage system receiving a storage command to access a sparse file. A combined content of a set of materialized blocks and a header that identifies one or more non-materialized blocks is assembled. The combined content does not comprise a content of the one or more non-materialized blocks. Responsive to the assembling, the combined content is transferred between the storage system and a computer system.
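One way to picture the "combined content" in this abstract is a header listing the non-materialized blocks plus the concatenated materialized blocks, so that holes are never sent. The sketch below is an assumption-laden illustration: the `assemble` and `expand` functions, the dictionary header layout, and the fixed block size are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple


def assemble(blocks: List[Optional[bytes]]) -> Tuple[Dict, bytes]:
    """Build (header, payload); None marks a non-materialized block."""
    holes = [i for i, block in enumerate(blocks) if block is None]
    payload = b"".join(block for block in blocks if block is not None)
    header = {"block_count": len(blocks), "holes": holes}
    return header, payload


def expand(header: Dict, payload: bytes, block_size: int) -> List[Optional[bytes]]:
    """Receiver side: rebuild the block list from the header and payload."""
    holes = set(header["holes"])
    blocks: List[Optional[bytes]] = []
    offset = 0
    for index in range(header["block_count"]):
        if index in holes:
            blocks.append(None)  # content of a hole was never transferred
        else:
            blocks.append(payload[offset:offset + block_size])
            offset += block_size
    return blocks
```

For a file with blocks `[b"AAAA", None, b"CCCC", None]`, only the eight materialized bytes travel, alongside a header that names blocks 1 and 3 as holes.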
-
Publication No.: US11256627B2
Publication date: 2022-02-22
Application No.: US16907703
Filing date: 2020-06-22
Applicant: Oracle International Corporation
Inventor: Juan R. Loaiza , J. William Lee , Wei-Ming Hu , Kothanda Umamageswaran , Neil J. S. MacNaughton , Adam Y. Lee
IPC: G06F12/0873 , G06F12/0868 , G06F12/0866 , G06F16/13 , G06F16/172
Abstract: A method and an apparatus for implementing a buffer cache for a persistent file system in a non-volatile memory is provided. A set of data is maintained in one or more extents in a non-volatile random-access memory (NVRAM) of a computing device. At least one buffer header is allocated in a dynamic random-access memory (DRAM) of the computing device. In response to a read request by a first process executing on the computing device to access one or more first data blocks in a first extent of the one or more extents, the first process is granted direct read access of the first extent in the NVRAM. A reference to the first extent in the NVRAM is stored in a first buffer header. The first buffer header is associated with the first process. The first process uses the first buffer header to directly access the one or more first data blocks in the NVRAM.
-
Publication No.: US10956335B2
Publication date: 2021-03-23
Application No.: US15720972
Filing date: 2017-09-29
Applicant: Oracle International Corporation
Inventor: Zuoyu Tao , Jia Shi , Kothanda Umamageswaran , Juan R. Loaiza
IPC: G06F12/0873 , G06F12/0864 , G06F16/22 , G06F16/2455 , G06F12/02 , G06F12/0868 , G06F12/0871 , G06F15/173
Abstract: Data blocks are cached in a persistent cache (“NV cache”) allocated from non-volatile RAM (“NVRAM”). The data blocks may be accessed in place in the NV cache of a “source” computing element by another “remote” computing element over a network using remote direct memory access (“RDMA”). In order for a remote computing element to access the data block in NV cache on a source computing element, the remote computing element needs the memory address of the data block within the NV cache. For this purpose, a hash table is stored and maintained in RAM on the source computing element. The hash table identifies the data blocks in the NV cache and specifies a location of the cached data block within the NV cache.
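The lookup path in this abstract (consult the hash table for a block's location, then read the block in place) can be modeled in a few lines. This is a simulation only: a `bytearray` stands in for NVRAM, slice reads stand in for RDMA reads, and the `SourceNode` class, `remote_read` function, and `BLOCK_SIZE` constant are hypothetical names introduced for illustration.

```python
from typing import Optional

BLOCK_SIZE = 16  # illustrative block size in bytes


class SourceNode:
    """Toy source computing element holding an NV cache and its hash table."""

    def __init__(self, capacity_blocks: int):
        self.nv_cache = bytearray(capacity_blocks * BLOCK_SIZE)  # stand-in for NVRAM
        self.hash_table = {}  # block id -> byte offset within nv_cache (kept in RAM)
        self._next_offset = 0

    def cache_block(self, block_id: str, data: bytes) -> int:
        """Place a block in the NV cache and record its location."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        if self._next_offset + BLOCK_SIZE > len(self.nv_cache):
            raise MemoryError("NV cache is full")
        offset = self._next_offset
        self.nv_cache[offset:offset + BLOCK_SIZE] = data
        self.hash_table[block_id] = offset
        self._next_offset += BLOCK_SIZE
        return offset


def remote_read(source: SourceNode, block_id: str) -> Optional[bytes]:
    """Remote element: find the block's address, then read it in place."""
    offset = source.hash_table.get(block_id)  # simulated read of the hash table
    if offset is None:
        return None                           # block is not in the NV cache
    return bytes(source.nv_cache[offset:offset + BLOCK_SIZE])  # simulated RDMA read
```

In this model the source node runs no code during `remote_read`; the remote side performs both the table lookup and the block read, which is the property that makes in-place access attractive.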
-
Publication No.: US10936616B2
Publication date: 2021-03-02
Application No.: US14733691
Filing date: 2015-06-08
Applicant: Oracle International Corporation
Inventor: Dmitry Mikhailovich Potapov , Krishnan Meiyyappan , Alexander Tsukerman , Kothanda Umamageswaran , Semen Ustimenko , Wei Zhang , Adrian Tsz Him Ng , Daniel McClary , Allen Brumm , James Stenoish , Robert K. Abbott
IPC: G06F16/2453 , G06F16/25
Abstract: A storage system communicatively coupled to a database management system (DBMS) performs storage-side scanning of data sources that are not stored in native database storage format of the DBMS. Data sources for external tables are accessible in a storage system referred to as a distributed data access system (DDAS), e.g. a Hadoop Distributed File System. To execute a query that references an external table, a DBMS first generates an execution plan. The DDAS supplies the DBMS with information that specifies each portion of the data source, and specifies which data node to use to access the portion. The DBMS sends a request for each portion to the respective data node, requesting that the data node generate rows from data in the portion. The request may specify scanning criteria, specifying one or more columns to project and/or filter on, and code modules for the data node to execute to generate records.