Swappable online machine learning algorithms implemented in a data intake and query system

    Publication No.: US11615102B2

    Publication Date: 2023-03-28

    Application No.: US16779509

    Application Date: 2020-01-31

    Applicant: Splunk Inc.

    Inventor: Ram Sriharsha

    Abstract: Systems and methods are described for testing one or more machine learning algorithms in parallel with an existing machine learning algorithm implemented within a data processing pipeline. Each machine learning algorithm can train a machine learning model that receives a live stream of raw machine data. The output of the machine learning model trained by the existing machine learning algorithm may be written to an external storage system, but the output of the machine learning model(s) trained by the test machine learning algorithm(s) may not be written to an external storage system. After some time, the performance of the test machine learning algorithm(s) and the existing machine learning algorithm is evaluated. If a test machine learning algorithm performs better than the existing machine learning algorithm, then the machine learning algorithms can be swapped without any downtime and without needing to re-train a machine learning model using previously seen raw machine data.
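The parallel test-and-swap idea in this abstract can be illustrated with a minimal Python sketch. Everything named here (`RunningMean`, `Ewma`, `ShadowPipeline`, the absolute-error metric, the in-memory `sink` list standing in for external storage) is a hypothetical stand-in chosen for illustration and is not taken from the patent.

```python
class RunningMean:
    """Existing (live) online model: predicts the mean of all values seen."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def predict(self) -> float:
        return self.mean

    def update(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count


class Ewma:
    """Test (candidate) online model: exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.value = 0.0
        self.seen = False

    def predict(self) -> float:
        return self.value

    def update(self, value: float) -> None:
        if not self.seen:
            self.value, self.seen = value, True
        else:
            self.value = self.alpha * value + (1 - self.alpha) * self.value


class ShadowPipeline:
    """Feeds the same live stream to both models; only live output is stored."""

    def __init__(self, live, candidate, sink: list) -> None:
        self.live = live
        self.candidate = candidate
        self.sink = sink  # assumption: a list stands in for external storage
        self.live_error = 0.0
        self.candidate_error = 0.0

    def process(self, value: float) -> None:
        live_pred = self.live.predict()
        candidate_pred = self.candidate.predict()
        self.sink.append(live_pred)  # candidate output is never written
        # Assumed metric: cumulative absolute one-step-ahead prediction error.
        self.live_error += abs(value - live_pred)
        self.candidate_error += abs(value - candidate_pred)
        # Both models keep learning from every record in the stream.
        self.live.update(value)
        self.candidate.update(value)

    def swap_if_better(self) -> bool:
        """Promote the candidate if its error is lower; state is preserved."""
        if self.candidate_error < self.live_error:
            self.live, self.candidate = self.candidate, self.live
            self.live_error, self.candidate_error = (
                self.candidate_error,
                self.live_error,
            )
            return True
        return False
```

Because the candidate has already been trained on the same stream, promoting it in this sketch is just a reference swap: no replay of earlier data is needed.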

    Systems and methods for integration of multiple programming languages within a pipelined search query

    Publication No.: US11567735B1

    Publication Date: 2023-01-31

    Application No.: US17074280

    Application Date: 2020-10-19

    Applicant: SPLUNK Inc.

    Abstract: According to one embodiment, a method that supports queries deploying operators based on multiple programming languages is described. A sequence of operators associated with a query is identified, where the sequence of operators includes at least two neighboring operators including a first operator based on a first programming language and a second operator based on a second programming language that is different from the first programming language. Thereafter, a schema associated with the first operator and a schema associated with the second operator are determined, along with the compatibility between the schema of the first operator and the schema of the second operator. A query error message is generated in response to incompatibility between the first operator schema and the second operator schema. Compatibility is determined when an output generated by execution of the first operator provides machine data needed as input for execution of the second operator.
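A compatibility check between neighboring operators might look like the following Python sketch. The `Operator` record, the field-name-to-type dictionaries, and the error message wording are all assumptions made for illustration; the abstract does not specify how schemas are represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description: name, language, and schemas."""

    name: str
    language: str
    input_schema: Dict[str, str]   # fields the operator needs: name -> type
    output_schema: Dict[str, str]  # fields the operator produces: name -> type


def check_pipeline(operators: List[Operator]) -> List[str]:
    """Return one error message per field that a neighbor fails to provide."""
    errors: List[str] = []
    # Walk each pair of neighboring operators in pipeline order.
    for first, second in zip(operators, operators[1:]):
        for field_name, needed_type in second.input_schema.items():
            produced_type = first.output_schema.get(field_name)
            if produced_type is None:
                errors.append(
                    f"{second.name} ({second.language}) needs field "
                    f"'{field_name}', which {first.name} ({first.language}) "
                    f"does not output"
                )
            elif produced_type != needed_type:
                errors.append(
                    f"{second.name} ({second.language}) needs '{field_name}' "
                    f"as {needed_type}, but {first.name} ({first.language}) "
                    f"outputs {produced_type}"
                )
    return errors
```

An empty result means every operator receives the fields it needs from its predecessor; any returned string would be surfaced as a query error before execution.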

    Systems and methods for detecting DNS communications through time-to-live analyses

    Publication No.: US11477161B1

    Publication Date: 2022-10-18

    Application No.: US17514814

    Application Date: 2021-10-29

    Applicant: SPLUNK Inc.

    Abstract: A computerized method is disclosed that includes: accessing domain name server (DNS) record data including a plurality of DNS records spanning a first time period; performing a time-to-live (TTL) analysis to determine a TTL run length distribution for the DNS record data, wherein the TTL analysis includes generating a vector of the TTL values of each DNS record ordered sequentially in time, parsing the vector of the TTL values into segments, where a segment consists of one or more TTL values where a current TTL value is less than an immediately preceding TTL value, and determining the TTL run length distribution; determining whether DNS beaconing is present based on a result of the TTL analysis; and, in response to determining that DNS beaconing is present, generating an alert for a system administrator.
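The TTL analysis steps can be sketched in Python as follows. The `(timestamp, ttl)` tuple layout and the function names are assumptions for illustration. The rule that turns the distribution into a beaconing decision is deliberately left out, because the abstract does not state it.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ttl_vector(records: Iterable[Tuple[float, int]]) -> List[int]:
    """Order (timestamp, ttl) records by time and keep only the TTL values."""
    return [ttl for _, ttl in sorted(records, key=lambda record: record[0])]


def ttl_segments(ttls: List[int]) -> List[List[int]]:
    """Split the vector into runs where each TTL is below the previous one."""
    segments: List[List[int]] = []
    current: List[int] = []
    for ttl in ttls:
        if current and ttl < current[-1]:
            current.append(ttl)  # still counting down: extend the run
        else:
            if current:
                segments.append(current)
            current = [ttl]      # TTL did not decrease: start a new run
    if current:
        segments.append(current)
    return segments


def run_length_distribution(records: Iterable[Tuple[float, int]]) -> Counter:
    """Map each run length to the number of segments with that length."""
    return Counter(len(segment) for segment in ttl_segments(ttl_vector(records)))
```

For example, the time-ordered TTLs 300, 240, 180, 300, 250, 300 split into runs of length 3, 2, and 1.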

    LOG SOURCETYPE INFERENCE MODEL TRAINING FOR A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20220036002A1

    Publication Date: 2022-02-03

    Application No.: US16945448

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetype may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. The machine learning model may be trained to infer a log sourcetype of a log regardless of the messagetype of the log.
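The tokenize, filter, vectorize, and compare-and-update loop can be sketched with a small multiclass perceptron in plain Python. The perceptron, the regular-expression tokenizer, the length-based filter, and the sparse token-count vector are all substitutions chosen for brevity; the abstract does not name a specific model or vectorization scheme.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Weights = Dict[str, Dict[str, float]]  # sourcetype -> token -> weight


def vectorize(log: str) -> Counter:
    """Tokenize, filter, and convert a log line into a sparse count vector."""
    tokens = re.findall(r"[a-z_]+", log.lower())  # drops digits and symbols
    kept = [token for token in tokens if len(token) >= 2]  # assumed filter
    return Counter(kept)


def score(weights: Dict[str, float], vector: Counter) -> float:
    return sum(weights.get(token, 0.0) * n for token, n in vector.items())


def predict(model: Weights, log: str) -> str:
    """Return the sourcetype with the highest score for this log."""
    vector = vectorize(log)
    return max(model, key=lambda label: score(model[label], vector))


def train(samples: List[Tuple[str, str]], epochs: int = 5) -> Weights:
    """Perceptron training on (log, known_sourcetype) pairs."""
    labels = sorted({label for _, label in samples})
    model: Weights = {label: {} for label in labels}
    for _ in range(epochs):
        for log, known in samples:
            inferred = predict(model, log)
            if inferred == known:
                continue
            # Inferred sourcetype is wrong: move weights toward the known
            # sourcetype and away from the inferred one.
            for token, n in vectorize(log).items():
                model[known][token] = model[known].get(token, 0.0) + n
                model[inferred][token] = model[inferred].get(token, 0.0) - n
    return model
```

Since the weights attach to tokens that recur across a sourcetype (such as a daemon name), a log with an unseen messagetype can still be scored toward the right sourcetype in this sketch.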

    DATA FIELD EXTRACTION MODEL TRAINING FOR A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20220035775A1

    Publication Date: 2022-02-03

    Application No.: US16945229

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for training an artificial intelligence model to extract one or more data fields from a log. For example, the artificial intelligence model may be a neural network. The neural network may be trained using training data obtained by iterating through a plurality of logs using active learning, and selecting a subset of the logs in the plurality to be labeled by a user. For example, the selected subset of logs may be logs that are not similar to other logs already labeled by a user. The user may be prompted to label the selected subset of logs to identify one or more data fields to extract. Once the selected subset of logs is labeled, these labeled logs can be used as the training data to train the neural network.
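The selection step, picking logs that are unlike those already labeled, can be sketched in Python as below. Jaccard similarity over whitespace tokens, the `threshold` value, and the `budget` parameter are assumptions for illustration; the abstract does not say how similarity is measured.

```python
from typing import List, Set


def token_set(log: str) -> Set[str]:
    """Lowercased whitespace tokens of a log line."""
    return set(log.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_for_labeling(
    unlabeled: List[str],
    labeled: List[str],
    budget: int,
    threshold: float = 0.5,
) -> List[str]:
    """Pick up to `budget` logs that are dissimilar to every labeled log."""
    reference = [token_set(log) for log in labeled]
    selected: List[str] = []
    for log in unlabeled:
        if len(selected) >= budget:
            break
        tokens = token_set(log)
        if all(jaccard(tokens, seen) < threshold for seen in reference):
            selected.append(log)
            # Treat the pick as labeled so near-duplicates are not chosen too.
            reference.append(tokens)
    return selected
```

The returned logs are the ones a user would be prompted to label; the labeled results would then feed the field-extraction training, which this sketch does not cover.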

    ONLINE MACHINE LEARNING ALGORITHM FOR A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20210117857A1

    Publication Date: 2021-04-22

    Application No.: US16779456

    Application Date: 2020-01-31

    Applicant: Splunk Inc.

    Inventor: Ram Sriharsha

    Abstract: Systems and methods are described for processing ingested data using an online machine learning algorithm as the data is being ingested. For example, the online machine learning algorithm can be an adaptive thresholding algorithm used to identify outliers in a moving window of data. As another example, the online machine learning algorithm can be a sequential outlier detector that detects anomalous sequences of logs or events. As another example, the online machine learning algorithm can be a sentiment analyzer that determines whether text has a positive, negative, or neutral sentiment. As another example, the online machine learning algorithm can be a drift detector that detects whether ingested data marks the start of a change in the distribution of a time-series.
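As one illustration of the first example in this abstract, adaptive thresholding over a moving window could be sketched in Python like this. The mean plus `k` standard deviations rule, the window size, and the class name `AdaptiveThreshold` are assumptions; the abstract does not define how the threshold is computed.

```python
from collections import deque
from statistics import fmean, pstdev


class AdaptiveThreshold:
    """Flags a value that falls far outside the recent window of values."""

    def __init__(self, window_size: int = 5, k: float = 3.0) -> None:
        self.window = deque(maxlen=window_size)  # oldest values drop off
        self.k = k

    def observe(self, value: float) -> bool:
        """Return True if `value` is an outlier, then add it to the window."""
        is_outlier = False
        # Only judge once the window is full, so the statistics are stable.
        if len(self.window) == self.window.maxlen:
            mean = fmean(self.window)
            spread = pstdev(self.window)
            # The threshold adapts because mean and spread track the window.
            is_outlier = abs(value - mean) > self.k * spread
        self.window.append(value)
        return is_outlier
```

Each record is examined once as it arrives and only the last `window_size` values are kept, which is what makes the approach usable during ingestion rather than in a later batch pass.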
