Abstract:
Various systems, methods, and programs embodied on a computer-readable medium are provided that facilitate monitoring of services and servers. In one embodiment, an amount of data is stored in at least one storage device, the data being generated by a plurality of services executed on a plurality of servers, and by the servers upon which the services are executed. A plurality of monitoring applications are executed in a monitoring server, the monitoring applications being configured to perform a plurality of monitoring functions with respect to at least a portion of the data to facilitate an assessment of an operating condition of the services and the servers. An interface layer surrounds the monitoring applications in the monitoring server. The interface layer defines a messaging format that is used by devices external to the interface layer to interact with the monitoring applications.
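The interface layer described above can be pictured as a single entry point that accepts messages in one agreed format and routes them to the monitoring applications behind it. The following is a minimal sketch under stated assumptions: the abstract does not specify the messaging format, so the JSON envelope, the `InterfaceLayer` class, and the field names `application` and `parameters` are all hypothetical.

```python
import json


class InterfaceLayer:
    """Hypothetical wrapper that exposes monitoring applications via one message format."""

    def __init__(self):
        self._applications = {}  # application name -> callable taking a dict

    def register(self, name, handler):
        """Place a monitoring application behind the interface layer."""
        self._applications[name] = handler

    def handle(self, raw_message):
        """Decode an external message, dispatch it, and encode the reply."""
        message = json.loads(raw_message)
        handler = self._applications.get(message.get("application"))
        if handler is None:
            return json.dumps({"status": "error", "reason": "unknown application"})
        result = handler(message.get("parameters", {}))
        return json.dumps({"status": "ok", "result": result})


def error_rate(parameters):
    """Example monitoring function (assumed): fraction of failed requests."""
    return parameters["errors"] / parameters["requests"]


layer = InterfaceLayer()
layer.register("error_rate", error_rate)
reply = layer.handle(json.dumps(
    {"application": "error_rate", "parameters": {"errors": 1, "requests": 4}}
))
```

External devices only ever see the message envelope, so a monitoring application can be replaced without changing its callers.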
Abstract:
The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora, such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses that guide resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of answers generated by the system. Such predictions, when coupled with a utility model, can provide the system with the ability to make decisions about the number of queries issued to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information-extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced with the cost of information extraction and analysis to compose the answers.
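The trade-off between answer quality and query cost can be illustrated with a small expected-utility calculation. This is a sketch only: the linear utility model, the function name `best_query_count`, and the predicted quality values are illustrative assumptions, not figures or methods taken from the abstract.

```python
def best_query_count(predicted_quality, value_of_quality, cost_per_query):
    """Return (n, utility) for the query count with the highest net utility.

    predicted_quality[n] stands in for a statistical model's prediction of
    answer quality (0..1) after issuing n queries. The utility model here is
    assumed to be linear: value * quality - cost * n.
    """
    best_n, best_utility = 0, float("-inf")
    for n, quality in enumerate(predicted_quality):
        utility = value_of_quality * quality - cost_per_query * n
        if utility > best_utility:
            best_n, best_utility = n, utility
    return best_n, best_utility


# Made-up predictions showing diminishing returns from additional queries.
predicted = [0.0, 0.40, 0.60, 0.70, 0.74, 0.75]
n, utility = best_query_count(predicted, value_of_quality=10.0, cost_per_query=0.8)
```

With these invented numbers the fourth and fifth queries cost more than the quality they add, so the search stops at three queries.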
Abstract:
When a digital item is submitted for publication, an automated system may determine whether the digital item includes content from other digital items. In some implementations, when the digital item is an electronic book (eBook), the automated system may select sets of words from the eBook and compute hash codes, such that each hash code corresponds to a set of words. The automated system may compare the computed hash codes with retained hash codes associated with other electronic books to determine whether the digital item includes duplicate content.
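The comparison of hash codes over sets of words can be sketched as follows. The abstract does not say how word sets are selected or which hash is used, so the overlapping five-word windows, the SHA-1 hash, and the function names below are assumptions made for illustration.

```python
import hashlib


def word_set_hashes(text, words_per_set=5):
    """Hash every run of `words_per_set` consecutive words (window size is assumed)."""
    words = text.lower().split()
    hashes = set()
    for start in range(max(len(words) - words_per_set + 1, 0)):
        chunk = " ".join(words[start:start + words_per_set])
        hashes.add(hashlib.sha1(chunk.encode("utf-8")).hexdigest())
    return hashes


def overlap_fraction(submitted_text, retained_hashes):
    """Fraction of the submitted item's hash codes already present in the retained set."""
    submitted = word_set_hashes(submitted_text)
    if not submitted:
        return 0.0
    return len(submitted & retained_hashes) / len(submitted)


retained = word_set_hashes("the quick brown fox jumps over the lazy dog")
score = overlap_fraction("a quick brown fox jumps over a sleepy cat", retained)
```

A publisher could flag an item for review when `overlap_fraction` exceeds a chosen threshold; the threshold itself would be a policy decision and is not given in the abstract.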
Abstract:
Disclosed in various embodiments are systems and methods providing for storage of mass data such as metrics. A plurality of data models are generated in a server from a stream of metrics describing a state of a system. Each of the metrics is associated with one of a plurality of consecutive periods of time, and each data model represents the metrics associated with a corresponding one of the consecutive periods of time. The data models are stored in a data store, and each of the metrics is discarded after use in generating at least one of the data models.
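One way to picture per-period data models is a running summary that absorbs each metric and then lets the raw value go. The sketch below assumes a data model is a simple count/total/min/max summary and that periods are fixed 60-second windows; neither detail comes from the abstract, and the names `PeriodModel` and `build_models` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodModel:
    """Assumed form of a data model: summary statistics for one time period."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


def build_models(metric_stream, period_seconds=60):
    """Fold (timestamp, value) pairs into one model per consecutive period.

    Raw metrics are not retained: each value only updates its period's model.
    """
    models = {}
    for timestamp, value in metric_stream:
        period_start = int(timestamp // period_seconds) * period_seconds
        models.setdefault(period_start, PeriodModel()).add(value)
    return models


models = build_models([(0, 1.0), (30, 3.0), (61, 10.0)])
```

Storage then grows with the number of periods rather than with the number of metrics received.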
Abstract:
A plurality of data models are generated in a server from a stream of metrics describing a state of at least one system. Each of the data models represents a time grouping of a subset of the metrics. One or more dimensions are associated with each of the metrics. The data models are stored in association with respective ones of the dimensions in a memory. The dimensions with which the data models are associated in the memory are increased based upon an appearance of at least one previously non-existing dimension associated with a metric in the stream.
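The growth of the stored dimensions as new ones appear in the stream can be sketched with a store keyed by dimension. This is an illustrative assumption about the layout: the `DimensionedStore` class, the `(name, value)` tuple form of a dimension, and the count/total summary are not specified in the abstract.

```python
class DimensionedStore:
    """Hypothetical store: dimension -> period start -> [count, total]."""

    def __init__(self, period_seconds=60):
        self.period_seconds = period_seconds
        self.models = {}

    def ingest(self, timestamp, value, dimensions):
        """Update the time-grouped model of every dimension attached to the metric."""
        period = int(timestamp // self.period_seconds) * self.period_seconds
        for dimension in dimensions:
            if dimension not in self.models:
                # A previously non-existing dimension appeared: start tracking it.
                self.models[dimension] = {}
            summary = self.models[dimension].setdefault(period, [0, 0.0])
            summary[0] += 1
            summary[1] += value


store = DimensionedStore()
store.ingest(5, 2.0, [("host", "web-1")])
store.ingest(20, 4.0, [("host", "web-1"), ("region", "eu")])
```

No dimension has to be declared in advance; the second metric introduces `("region", "eu")` and the store simply begins keeping models for it.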
Abstract:
Systems, devices, and processes for classifying a digital item are described. In some examples, a first classifier determines whether a digital item, such as an electronic book (eBook), includes content of a first category that is acceptable for publication by a publisher. A second classifier determines whether the digital item includes content of a second category that is acceptable for publication by a publisher. In response to determining that the digital item includes content of the first category or content of the second category, a third classifier may determine whether the digital item includes a phrase that is indicative of content of a third category that is unacceptable for publication.
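The ordering of the three classifiers can be sketched as a short cascade in which the phrase check runs only when one of the first two categories matches. The function name, the outcome labels, the keyword-based stand-in classifiers, and the phrase list are all hypothetical; the abstract does not describe how any of the classifiers work internally.

```python
def classify_item(text, first_classifier, second_classifier, flagged_phrases):
    """Run the third (phrase) check only if the first or second category matches."""
    in_first = first_classifier(text)
    in_second = second_classifier(text)
    if not (in_first or in_second):
        return "no_category_match"  # third classifier is not consulted
    lowered = text.lower()
    if any(phrase in lowered for phrase in flagged_phrases):
        return "third_category_suspected"
    return "acceptable"


# Stand-in classifiers for illustration only: simple keyword tests.
def is_romance(text):
    return "romance" in text.lower()


def is_thriller(text):
    return "thriller" in text.lower()


phrases = ["forbidden phrase"]
clean = classify_item("A romance set in Paris", is_romance, is_thriller, phrases)
flagged = classify_item("A thriller with a forbidden phrase", is_romance, is_thriller, phrases)
skipped = classify_item("A cookbook", is_romance, is_thriller, phrases)
```

Running the cheaper category classifiers first means the phrase check is applied only to the items where it is relevant.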