Abstract:
One or more techniques and/or systems are provided for supplying a discussion summary corresponding to a search query and/or for supplying discussion session search results. For example, discussion data (e.g., corresponding to real-time messaging, such as a microblog discussion) may be evaluated to identify a discussion topic for a discussion session (e.g., a kitchen renovation topic may be assigned to a one-hour exchange of kitchen renovation messages by a discussion group). A discussion summary of a discussion session may be provided based upon the discussion session having a discussion topic corresponding to a search query topic of a search query. The discussion summary may be provided along with other results for the query and may describe the discussion group, identifiers such as hashtags used by the discussion group, meeting dates/times, average number(s) of participants, other discussion sessions hosted by the discussion group, future discussion sessions, and/or other information.
Abstract:
The privacy of a dataset is protected. A private dataset is received that includes multiple rows of multidimensional data. Each row may correspond to a user, and each dimension may be an attribute of the user. A projection matrix is applied to each row to generate a lower dimensional sketch of the row. Noise is added to each of the lower dimensional sketches. The sketches with the added noise may be published together with the projection matrix. The sketches preserve geometric relationships of the original dataset including clustering, distances, and nearest neighbor, and therefore may be useful for data mining purposes while still protecting the privacy of the users.
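The project-then-perturb steps described above can be illustrated with a minimal sketch. The abstract does not state how the projection matrix is drawn or how the noise is distributed, so the Gaussian projection, the Gaussian noise, and the names `make_projection`, `sketch_rows`, and `noise_std` are assumptions made only for illustration. The sketch does not claim any particular privacy guarantee.

```python
import random

def make_projection(d, k, rng):
    """Return a d x k matrix with i.i.d. N(0, 1/k) entries (an assumed choice)."""
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(d)]

def sketch_rows(rows, projection, noise_std, rng):
    """Project each row to k dimensions, then add noise to every coordinate."""
    k = len(projection[0])
    sketches = []
    for row in rows:
        projected = [
            sum(row[i] * projection[i][j] for i in range(len(row)))
            for j in range(k)
        ]
        # Gaussian noise is an assumption; the source only says "noise is added".
        sketches.append([v + rng.gauss(0.0, noise_std) for v in projected])
    return sketches

rng = random.Random(0)
rows = [
    [1, 0, 1, 0, 1, 0],  # one user per row, one attribute per column
    [0, 1, 0, 1, 0, 1],
]
projection = make_projection(d=6, k=3, rng=rng)
published = sketch_rows(rows, projection, noise_std=0.1, rng=rng)
```

Here `published` and `projection` together stand in for what would be released. A larger `noise_std` hides individual rows more strongly but distorts distances between sketches more.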
Abstract:
A document is received for segmentation. The document includes multiple atomic textual units in a sequence. These units may correspond to sentences, phrases, paragraphs, concept phrases, chapters, etc. A distance function is selected that determines a distance between one set of atomic textual units and another set of atomic textual units. The distance between the sets is large for sets that are dissimilar, and small for sets that are similar. The distance function is applied to the atomic textual units to partition the atomic textual units into multiple segments, while maintaining the sequence of the atomic textual units.
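As a rough illustration of distance-based segmentation, the sketch below uses a Jaccard distance over word sets and a greedy left-to-right pass. Both choices, along with the names `jaccard_distance`, `segment`, and `threshold`, are assumptions; the abstract leaves the distance function and the partitioning procedure open.

```python
def jaccard_distance(units_a, units_b):
    """Distance between two sets of units: 0.0 for identical vocabularies, 1.0 for disjoint ones."""
    words_a = {w for u in units_a for w in u.lower().split()}
    words_b = {w for u in units_b for w in u.lower().split()}
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)

def segment(units, threshold=0.8):
    """Greedy pass: extend the current segment while the next unit is close to it.

    Units are only ever appended in their original order, so the sequence
    is preserved across segments.
    """
    segments = []
    for unit in units:
        if segments and jaccard_distance(segments[-1], [unit]) < threshold:
            segments[-1].append(unit)
        else:
            segments.append([unit])
    return segments

units = [
    "cats chase mice",
    "mice fear cats",
    "stocks rose today",
    "today stocks fell",
]
segments = segment(units)
# The first two units share vocabulary, as do the last two,
# so this input yields two segments of two units each.
```

A greedy pass is only one option; a dynamic program over all boundary placements would optimize the same distance globally at higher cost.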
Abstract:
Text messages over some period of time are collected. Topic identifiers, such as hashtags, are extracted from the text messages. The text messages associated with each topic identifier are processed to identify which topic identifiers are associated with group chats based on information associated with the text messages such as the times when the text messages were generated and whether the text messages identify user accounts. The topic identifiers that are determined to be associated with the group chats are incorporated into applications that allow users to search for group chats, and to view text messages from past group chats.
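The following sketch shows one way the timing and user-mention signals could be combined to flag hashtags as group chats. The heuristic (share of messages in the busiest weekday/hour slot, plus share of messages that mention a user) and every threshold are assumptions for illustration; `find_group_chat_tags` is a hypothetical name, not one from the source.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@\w+")

def find_group_chat_tags(messages, min_messages=3, min_burst=0.6, min_mentions=0.3):
    """Return hashtags whose messages cluster in time and often address users.

    messages: iterable of (datetime, text) pairs.
    All thresholds are illustrative assumptions.
    """
    by_tag = defaultdict(list)
    for timestamp, text in messages:
        for tag in set(HASHTAG.findall(text.lower())):
            by_tag[tag].append((timestamp, text))

    chats = []
    for tag, items in by_tag.items():
        if len(items) < min_messages:
            continue
        # Share of messages in the single busiest (weekday, hour) slot.
        slots = Counter((ts.weekday(), ts.hour) for ts, _ in items)
        burst = slots.most_common(1)[0][1] / len(items)
        # Share of messages that mention at least one user account.
        mentions = sum(1 for _, text in items if MENTION.search(text)) / len(items)
        if burst >= min_burst and mentions >= min_mentions:
            chats.append(tag)
    return sorted(chats)

messages = [
    (datetime(2024, 1, 2, 20, 5), "@ana which tile works best? #kitchenchat"),
    (datetime(2024, 1, 2, 20, 17), "@bo try porcelain #kitchenchat"),
    (datetime(2024, 1, 9, 20, 2), "welcome back everyone #kitchenchat"),
    (datetime(2024, 1, 3, 8, 0), "need #coffee"),
    (datetime(2024, 1, 5, 15, 30), "more #coffee"),
    (datetime(2024, 1, 7, 22, 45), "late #coffee"),
]
print(find_group_chat_tags(messages))  # prints ['kitchenchat']
```

In this toy data, `#kitchenchat` recurs in the same weekly hour and its messages address other users, while `#coffee` is scattered and mention-free.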
Abstract:
A document such as a book or textbook includes multiple sections such as chapters. Concept phrases are determined for each of the sections based on the text of each section. A set of content items such as videos is received, and each content item is associated with one or more queries that were submitted by users who were provided the content item in a set of search results. These queries are processed to determine concept phrases that are associated with the content items. The content items and their associated concept phrases are compared with the concept phrases associated with the sections to determine, for some or all of the content items, a minimum subset of the sections whose associated concept phrases cover most of the concept phrases that are associated with the content item. The content items are inserted into, or linked with, the sections in their corresponding minimum subsets.
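Choosing a small set of sections that covers most of a content item's concept phrases resembles a partial set-cover problem. The sketch below uses the standard greedy approximation, which is not guaranteed to find a true minimum subset. The function name `cover_sections`, the `min_coverage` parameter, and the sample phrases are assumptions for illustration.

```python
def cover_sections(item_phrases, section_phrases, min_coverage=0.75):
    """Greedily pick sections until min_coverage of the item's phrases is covered.

    item_phrases: set of concept phrases for one content item.
    section_phrases: dict mapping section name -> set of concept phrases.
    Returns section names in the order they were picked.
    """
    target = set(item_phrases)
    if not target or not section_phrases:
        return []
    need = min_coverage * len(target)
    uncovered = set(target)
    chosen = []
    while len(target) - len(uncovered) < need:
        # Pick the section that covers the most still-uncovered phrases.
        best = max(section_phrases, key=lambda s: len(section_phrases[s] & uncovered))
        gain = section_phrases[best] & uncovered
        if not gain:
            break  # no section can improve coverage further
        chosen.append(best)
        uncovered -= gain
    return chosen

sections = {
    "ch1": {"mass"},
    "ch2": {"force", "acceleration"},
    "ch3": {"energy"},
}
video = {"force", "acceleration", "mass", "friction"}
print(cover_sections(video, sections))  # prints ['ch2', 'ch1']
```

The phrase "friction" is left uncovered here, which is acceptable because the target is only three of the four phrases.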
Abstract:
Messages are collected and processed to determine topic identifiers that correspond to discussion groups. Queries are received and multiple discussion groups that are relevant to the query are determined based on the messages that are associated with the discussion groups and the topic identifiers associated with the discussion groups. The relevant discussion groups are ranked using a group preference model that simulates the behavior of a hypothetical seeker that considers discussion groups by selecting a message author who is an authority in a particular group, and exploring the discussion groups that are preferred by the selected author. The behavior of the seeker is simulated using a stationary Markov process and is used to generate a probability distribution that is used to rank the relevant discussion groups. The ranked relevant discussion groups are provided in response to the query.
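The seeker's walk can be sketched as a Markov chain over groups: from a group, pick an author in proportion to that author's authority there, then move to a group in proportion to the author's preferences. The abstract gives no formulas, so the probability tables, the power-iteration solver, and the names `transition_matrix` and `stationary_distribution` are assumptions. Query relevance filtering is omitted.

```python
def transition_matrix(groups, authority, preference):
    """Build P[g][h] = sum over authors a of authority[g][a] * preference[a][h]."""
    matrix = {}
    for g in groups:
        row = {h: 0.0 for h in groups}
        for author, p_author in authority[g].items():
            for h, p_group in preference[author].items():
                row[h] += p_author * p_group
        matrix[g] = row
    return matrix

def stationary_distribution(matrix, iterations=200):
    """Approximate the stationary distribution by repeated multiplication.

    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    groups = list(matrix)
    dist = {g: 1.0 / len(groups) for g in groups}
    for _ in range(iterations):
        dist = {h: sum(dist[g] * matrix[g][h] for g in groups) for h in groups}
    return dist

groups = ["kitchen", "bath", "garden"]
# authority[g][a]: probability the seeker picks author a while in group g.
authority = {
    "kitchen": {"ana": 1.0},
    "bath": {"ana": 0.5, "bo": 0.5},
    "garden": {"bo": 1.0},
}
# preference[a][h]: probability that author a leads the seeker to group h.
preference = {
    "ana": {"kitchen": 0.5, "bath": 0.5},
    "bo": {"kitchen": 0.5, "garden": 0.5},
}

dist = stationary_distribution(transition_matrix(groups, authority, preference))
ranking = sorted(groups, key=lambda g: dist[g], reverse=True)
print(ranking)  # prints ['kitchen', 'bath', 'garden']
```

For these toy tables the stationary probabilities work out to 1/2, 1/3, and 1/6, which produces the ranking shown.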
Abstract:
Systems, methods, and computer storage media are provided for generating rich navigational study aids for electronic books. For a particular section of interest in a document, one or more related sections for providing additional context to the particular section are determined. The related sections are ranked based on a score indicating significance to the particular section. Based on a user's information processing preference, a set of ranked navigational links to each related section is presented to the user for additional context related to the particular section.