Abstract:
A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages into useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products. Techniques are provided for creating classifiers and schematizers on large data sets. Exercising the classifiers and schematizers on hundreds of millions of items may expose value inherent in the data by adding usable meta-data. Some aspects include active labeling exploration, automatic regularization and cold start, scaling with the number of items and the number of classifiers, active featuring, and segmentation and schematization.
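A minimal sketch of the core idea, assuming scikit-learn and a few illustrative categories (the queries, labels, and feature values below are hypothetical, not taken from the abstract): a small query classifier is trained, and its per-category scores are appended to an existing relevance feature vector.

    # Sketch: use category-classifier scores as extra relevance features.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy labeled queries (illustrative only).
    queries = ["cheap flights to paris", "python list comprehension",
               "buy running shoes", "how to sort a dict in python"]
    categories = ["travel", "programming", "shopping", "programming"]

    vectorizer = TfidfVectorizer()
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(vectorizer.fit_transform(queries), categories)

    def relevance_features(query, base_features):
        """Append per-category classification scores to existing relevance features."""
        scores = classifier.predict_proba(vectorizer.transform([query]))[0]
        return np.concatenate([base_features, scores])

    print(relevance_features("flights to rome", np.array([0.3, 0.7])))

In a full system the same category scores would also be computed for web pages and supplied to the ranker alongside the other relevance signals.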
Abstract:
Distributed computing devices comprising a system for sharing computing resources can provide shared computing resources to users having sufficient resource credits. A user can earn resource credits by reliably offering a computing resource for sharing for a predetermined amount of time. The conversion rate between the amount of credits awarded, and the computing resources provided by a user can be varied to maintain balance within the system, and to foster beneficial user behavior. Once earned, the credits can be used to fund the user's account, joint accounts which include the user and others, or others' accounts that do not provide any access to the user. Computing resources can be exchanged on a peer-to-peer basis, though a centralized mechanism can link relevant peers together. To verify integrity, and protect against maliciousness, offered resources can be periodically tested.
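One way to picture the variable conversion rate is the following sketch (the rate formula and all names are assumptions made for illustration, not the described mechanism): credits earned for sharing scale up when demand for resources exceeds the supply being offered, and down otherwise.

    # Sketch: award resource credits at a demand-sensitive conversion rate.
    def conversion_rate(offered_hours, requested_hours, base_rate=1.0):
        """Pay more credits per shared hour when demand exceeds supply."""
        if offered_hours == 0:
            return base_rate
        return base_rate * (requested_hours / offered_hours)

    def award_credits(account, hours_shared, offered_hours, requested_hours):
        """Credit a user for reliably offering a resource for hours_shared."""
        earned = hours_shared * conversion_rate(offered_hours, requested_hours)
        account["credits"] += earned
        return earned

    user = {"credits": 0.0}
    # 10 shared hours when system-wide demand is 1.5x supply -> 15.0 credits.
    award_credits(user, hours_shared=10, offered_hours=500, requested_hours=750)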
Abstract:
Human Interaction Proofs ("HIPs", sometimes referred to as "captchas") may be generated automatically. A captcha specification language may be defined, which allows a captcha scheme to be defined in terms of how symbols are to be chosen and drawn, and how those symbols are obscured. The language may provide mechanisms to specify the various ways in which to obscure symbols. New captcha schemes may be generated from existing specifications, by using genetic algorithms that combine features from existing captcha schemes that have been successful. Moreover, the likelihood that a captcha scheme has been broken by attackers may be estimated by collecting data on the time that it takes existing captcha schemes to be broken, and using regression to estimate the time to breakage as a function of either the captcha's features or its measured quality.
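The regression step can be sketched roughly as follows, assuming scikit-learn; the feature names and all numbers are placeholders for illustration, not measured data. Each past scheme is described by a few features and paired with the observed time until it was broken, and a regression model then estimates the expected time to breakage for a proposed new scheme.

    # Sketch: regress observed time-to-breakage on captcha-scheme features.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Placeholder features per past scheme: [symbol count, distortion, clutter].
    scheme_features = np.array([[6, 0.2, 0.1],
                                [8, 0.5, 0.3],
                                [6, 0.8, 0.6],
                                [10, 0.9, 0.8]])
    days_until_broken = np.array([14, 60, 120, 300])  # placeholder observations

    model = LinearRegression().fit(scheme_features, days_until_broken)

    # Estimated survival time for a candidate new scheme.
    print(model.predict(np.array([[8, 0.7, 0.5]])))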
Abstract:
A method and system for implementing character recognition is described herein. An input character is received. The input character is composed of one or more logical structures in a particular layout. The layout of the one or more logical structures is identified. One or more of a plurality of classifiers are selected based on the layout of the one or more logical structures in the input character. The entire character is input into the selected classifiers. The selected classifiers classify the logical structures. The outputs from the selected classifiers are then combined to form an output character vector.
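A minimal sketch of the routing-and-combining step (the layout detector and the per-layout classifiers are assumed, hypothetical callables): the detected layout selects which classifiers run on the whole character, and their score vectors are combined, here by averaging, into a single output character vector.

    # Sketch: select classifiers by detected layout and combine their outputs.
    import numpy as np

    def classify_character(char_image, detect_layout, classifiers_by_layout):
        layout = detect_layout(char_image)         # e.g. "left-right", "top-bottom"
        selected = classifiers_by_layout[layout]   # one or more classifiers
        outputs = [clf(char_image) for clf in selected]  # whole character to each
        return np.mean(outputs, axis=0)            # combined output character vector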
Abstract:
Systems and methods are disclosed that facilitate normalizing and beautifying digitally generated handwriting, such as can be generated on a tablet PC or via scanning a handwritten document. A classifier can identify extrema in the digital handwriting and label such extrema according to predefined categories (e.g., bottom, baseline, midline, top, other, …). Multi-linear regression, polynomial regression, etc., can be performed to align labeled extrema to respective and corresponding desired points as indicated by the labels. Additionally, displacement techniques can be applied to the regressed handwriting to optimize legibility for reading by a human viewer and/or for character recognition by a handwriting recognition application. The displacement techniques can comprise a "rubber sheet" displacement algorithm in conjunction with a "rubber rod" displacement algorithm, which can collectively preserve spatial features of the handwriting during warping thereof.
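The regression-based alignment can be sketched as follows (NumPy only; the extrema detection, the labels, and the target height are assumed inputs, and only the baseline case is shown): a line is fitted through the extrema labeled as baseline, and the ink is shifted so that fitted line sits at the desired height.

    # Sketch: align handwriting ink so its fitted baseline becomes horizontal.
    import numpy as np

    def normalize_baseline(stroke_points, baseline_extrema, target_y=0.0):
        """stroke_points: (N, 2) ink coordinates; baseline_extrema: (M, 2) points
        labeled 'baseline' by the classifier."""
        x, y = baseline_extrema[:, 0], baseline_extrema[:, 1]
        slope, intercept = np.polyfit(x, y, deg=1)        # linear regression fit
        fitted_y = slope * stroke_points[:, 0] + intercept
        aligned = stroke_points.astype(float)
        aligned[:, 1] = stroke_points[:, 1] - fitted_y + target_y
        return aligned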