Invention Grant
- Patent Title: Probabilistic text index for semi-structured data in columnar analytics storage formats
- Application No.: US16929949
- Application Date: 2020-07-15
- Publication No.: US11514697B2
- Publication Date: 2022-11-29
- Inventor: Jian Wen, Hamed Ahmadi, Sanjay Jinturkar, Nipun Agarwal, Lijian Wan, Shrikumar Hariharasubrahmanian
- Applicant: Oracle International Corporation
- Applicant Address: US CA Redwood Shores
- Assignee: Oracle International Corporation
- Current Assignee: Oracle International Corporation
- Current Assignee Address: US CA Redwood Shores
- Agency: Hickman Becker Bingham Ledesma LLP
- Agent: Brian N. Miller
- Main IPC: G06V30/40
- IPC: G06V30/40; G06F16/13; G06F21/62; G06F40/289; G06F16/22; G06F16/81; G06K9/62

Abstract:
Herein is a probabilistic indexing technique for searching semi-structured text documents in columnar storage formats such as Parquet, using columnar input/output (I/O) avoidance, and needing minimal storage overhead. In an embodiment, a computer associates columns with text strings that occur in semi-structured documents. Text words that occur in the text strings are detected. Respectively for each text word, a bitmap, of a plurality of bitmaps, that contains a respective bit for each column is generated. Based on at least one of the bitmaps, some of the columns or some of the semi-structured documents are accessed.
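The per-word bitmap idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the index becomes probabilistic, so this sketch assumes words are hashed into a fixed number of bitmap slots (collisions then yield false positives but never false negatives). The names `NUM_SLOTS`, `tokenize`, `slot`, `build_index`, and `candidate_columns` are hypothetical, and plain Python dictionaries stand in for Parquet columns.

```python
import re
import zlib

# Hypothetical parameter: number of bitmaps kept. Distinct words may share a
# slot, which is what makes this sketch probabilistic (an assumption, not a
# detail taken from the abstract).
NUM_SLOTS = 64


def tokenize(text):
    """Split a text string into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def slot(word):
    """Map a word to one of NUM_SLOTS bitmaps using a stable hash."""
    return zlib.crc32(word.encode("utf-8")) % NUM_SLOTS


def build_index(columns):
    """Build one integer bitmap per slot, with one bit per column.

    columns: dict mapping a column name to the list of text strings it holds.
    Returns (column_names, bitmaps); bit i of a bitmap refers to column_names[i].
    """
    names = list(columns)
    bitmaps = [0] * NUM_SLOTS
    for bit, name in enumerate(names):
        for text in columns[name]:
            for word in tokenize(text):
                bitmaps[slot(word)] |= 1 << bit
    return names, bitmaps


def candidate_columns(names, bitmaps, word):
    """Return the columns that may contain the word; all others can be skipped."""
    bitmap = bitmaps[slot(word.lower())]
    return [name for bit, name in enumerate(names) if (bitmap >> bit) & 1]


if __name__ == "__main__":
    docs = {
        "user.name": ["Alice Smith"],
        "user.bio": ["likes parquet files"],
        "address.city": ["Redwood Shores"],
    }
    names, bitmaps = build_index(docs)
    # Only the columns listed here would need to be read from storage.
    print(candidate_columns(names, bitmaps, "parquet"))
```

A column missing from the result definitely lacks the word, so its I/O can be avoided; a column present in the result still has to be read and checked, because a hash collision may have set its bit.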
Public/Granted literature
- US20220019784A1 PROBABILISTIC TEXT INDEX FOR SEMI-STRUCTURED DATA IN COLUMNAR ANALYTICS STORAGE FORMATS Public/Granted day: 2022-01-20