ALIGNING LARGE LANGUAGE MODELS WITH SPECIFIC OBJECTIVES USING REINFORCEMENT LEARNING AND HUMAN PREFERENCE

    Publication number: US20240289632A1

    Publication date: 2024-08-29

    Application number: US18588622

    Filing date: 2024-02-27

    Applicant: Maplebear Inc.

    CPC classification number: G06N3/092 H04L51/02

    Abstract: An online system trains a specific-purpose LLM. The online system obtains training examples and divides them into batches. The online system generates a specific response by applying parameters of the specific-purpose LLM to a batch of training examples, and generates a general response by applying parameters of a general-purpose LLM to the same batch. The online system computes a human readability score representing the difference between the specific response and the general response. The online system computes an objective compliance score by applying an evaluation model to the specific response, where the evaluation model is trained to score the specific response based on a specific objective. The online system updates the parameters of the specific-purpose LLM based on the human readability score and the objective compliance score.
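The way the two scores might be combined into a single training signal can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how the difference between responses is measured, so the sketch assumes it is the mean per-token log-probability gap between the two models on the specific response (a common choice in preference-based fine-tuning). The function names and the weight `beta` are hypothetical.

```python
from typing import Sequence


def readability_score(specific_logprobs: Sequence[float],
                      general_logprobs: Sequence[float]) -> float:
    """Assumed measure: negative mean per-token log-probability gap.

    Both sequences score the same tokens (the specific response), once under
    the specific-purpose model and once under the general-purpose model.
    A score near zero means the two models still agree closely.
    """
    if not specific_logprobs or len(specific_logprobs) != len(general_logprobs):
        raise ValueError("log-probability sequences must be non-empty and aligned")
    gap = sum(s - g for s, g in zip(specific_logprobs, general_logprobs))
    return -gap / len(specific_logprobs)


def combined_reward(readability: float, compliance: float,
                    beta: float = 0.1) -> float:
    """Hypothetical weighting of the two scores into one scalar reward."""
    return compliance + beta * readability


# Example: the specific model is more confident than the general model.
r = readability_score([-1.0, -2.0], [-1.5, -2.5])
reward = combined_reward(r, compliance=0.8, beta=0.2)
```

Under this assumption, drifting far from the general-purpose model lowers the reward, while a high evaluation-model score raises it; the parameter update itself (for example a policy-gradient step) is left out.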

    PROVIDING AND DISPLAYING SEARCH RESULTS IN RESPONSE TO A QUERY

    Publication number: US20240249335A1

    Publication date: 2024-07-25

    Application number: US18159357

    Filing date: 2023-01-25

    CPC classification number: G06Q30/0631 G06F16/9535 G06Q30/0201

    Abstract: An online system receives a query from a customer and displays search results in response. The online system accesses a set of candidate items and computes a relevance score and a personalization score for each item. The online system computes the relevance score based on query data and item data and may normalize the relevance score. The online system computes the personalization score based on item data, such as an item embedding, and user data, such as a user embedding. The online system computes a query specificity score and adjusts the personalization score with the query specificity score such that generic queries have high personalization scores and specific queries have low personalization scores. The online system combines the relevance and personalization scores for each candidate item into a ranking score and displays the candidate items to the customer based on their ranking scores.
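The ranking described in this abstract can be sketched as below. The abstract does not give formulas, so several choices here are assumptions: the personalization score is taken to be the dot product of the user and item embeddings, the query specificity score is assumed to lie in [0, 1], the adjustment multiplies personalization by (1 - specificity), and the two scores are combined by simple addition. The name `rank_items` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_items(relevance: Dict[str, float],
               item_embeddings: Dict[str, Sequence[float]],
               user_embedding: Sequence[float],
               query_specificity: float) -> List[Tuple[str, float]]:
    """Return (item_id, ranking_score) pairs, best first.

    Assumption: specificity 0.0 means a fully generic query (personalization
    counts in full) and 1.0 means a fully specific query (it is ignored).
    """
    if not 0.0 <= query_specificity <= 1.0:
        raise ValueError("query_specificity must be in [0, 1]")
    scored = []
    for item_id, rel in relevance.items():
        personalization = dot(user_embedding, item_embeddings[item_id])
        adjusted = (1.0 - query_specificity) * personalization
        scored.append((item_id, rel + adjusted))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


relevance = {"oat_milk": 0.5, "whole_milk": 0.6}
embeddings = {"oat_milk": [1.0, 0.0], "whole_milk": [0.0, 1.0]}
user = [1.0, 0.0]  # this user's history leans toward oat milk

generic = rank_items(relevance, embeddings, user, query_specificity=0.0)
specific = rank_items(relevance, embeddings, user, query_specificity=1.0)
```

With the generic query the user's preferred item ranks first; with the fully specific query the order falls back to relevance alone.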

    Systems and Methods for Intelligent Promotion Design with Promotion Scoring

    Publication number: US20240193637A1

    Publication date: 2024-06-13

    Application number: US18582839

    Filing date: 2024-02-21

    Applicant: Maplebear Inc.

    Inventor: Michael Montero

    Abstract: Systems and methods for scoring promotions are provided. A set of training offers is received, each of which includes a combination of variable values. Each combination of variable values is converted into a vector. The offers are paired and their vectors are subtracted from one another, resulting in a pair vector. Metrics for the success of the offers are collected and, for each pair of offers, subtracted from one another to generate a raw score. This raw score is then normalized using the pair vector. The normalized scores are used to generate a model of the impact any variable value has on offer success, which may then be applied, using linear regression, to new offers to generate an expected level of success. The newly scored offers are ranked, and the top-ranked offers are selected for inclusion in a promotional campaign.
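The pairwise scoring steps in this abstract can be sketched roughly as follows. The abstract leaves the details open, so this is only one possible reading: offers are assumed to be one-hot (0/1) vectors, the raw score is normalized by the number of components in which the two offers differ, and the per-variable impact is a plain average of the normalized scores rather than a fitted regression. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def fit_impacts(vectors: Sequence[Sequence[int]],
                metrics: Sequence[float]) -> List[float]:
    """Estimate the impact of each variable value from paired offers."""
    n_features = len(vectors[0])
    totals = [0.0] * n_features
    counts = [0] * n_features
    for i, j in combinations(range(len(vectors)), 2):
        # Pair vector: component-wise difference of the two offer vectors.
        pair_vector = [a - b for a, b in zip(vectors[i], vectors[j])]
        differing = sum(abs(d) for d in pair_vector)
        if differing == 0:
            continue  # identical offers carry no information
        # Raw score: difference in success metrics; normalized (assumption)
        # by how many components differ between the two offers.
        normalized = (metrics[i] - metrics[j]) / differing
        for k, d in enumerate(pair_vector):
            if d != 0:
                totals[k] += normalized * d  # d is +1 or -1 for one-hot input
                counts[k] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def score_offer(vector: Sequence[int], impacts: Sequence[float]) -> float:
    """Apply the impacts as a linear model to a new offer."""
    return sum(x * w for x, w in zip(vector, impacts))


# Two training offers that differ only in discount type.
training = [[1, 0], [0, 1]]  # columns: "10% off", "free shipping"
success = [10.0, 4.0]        # e.g. redemptions per 1,000 views
impacts = fit_impacts(training, success)
ranked = sorted([[1, 0], [0, 1]],
                key=lambda v: score_offer(v, impacts), reverse=True)
```

New offers are then sorted by their expected score, and the top-ranked ones would be the candidates for a campaign.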

    Cumulative incrementality scores for evaluating the performance of machine learning models

    Publication number: US11972464B2

    Publication date: 2024-04-30

    Application number: US17752800

    Filing date: 2022-05-24

    Applicant: Maplebear Inc.

    CPC classification number: G06Q30/0601 G06N5/022

    Abstract: An online concierge system uses a cumulative incrementality score to evaluate the performance of incrementality models used by the online concierge system to identify users for treatment. The online concierge system applies an incrementality model to a set of examples to generate predicted incrementality scores for the examples. The online concierge system ranks the examples based on the predicted incrementality scores for the examples and groups the examples based on their rankings. The online concierge system iteratively computes cumulative incrementality scores for each grouping based on the examples of each grouping, and computes a final cumulative incrementality score for the incrementality model based on each of the cumulative incrementality scores.
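A minimal sketch of the evaluation loop in this abstract is shown below. The abstract does not define how a cumulative incrementality score is computed, so the sketch makes assumptions: each example carries a treatment flag and an observed outcome, the cumulative score for the top groups is the difference in mean outcome between treated and control examples, and the final score is the mean of the cumulative scores. The `Example` layout and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (predicted incrementality, was treated?, observed outcome)
Example = Tuple[float, bool, float]


def cumulative_incrementality(examples: Sequence[Example],
                              n_groups: int) -> List[float]:
    """Cumulative treated-minus-control outcome for the top 1..n groups."""
    if not examples or n_groups < 1:
        raise ValueError("need at least one example and one group")
    # Rank by predicted incrementality, highest first.
    ranked = sorted(examples, key=lambda e: e[0], reverse=True)
    size = -(-len(ranked) // n_groups)  # ceiling division
    scores = []
    for end in range(size, len(ranked) + size, size):
        prefix = ranked[:end]  # all groups up to and including this one
        treated = [y for _, t, y in prefix if t]
        control = [y for _, t, y in prefix if not t]
        if treated and control:
            scores.append(sum(treated) / len(treated)
                          - sum(control) / len(control))
    return scores


def final_score(scores: Sequence[float]) -> float:
    """Assumed aggregation: mean of the cumulative scores."""
    return sum(scores) / len(scores) if scores else 0.0


examples = [
    (0.9, True, 1.0), (0.8, False, 0.0),  # top-ranked group
    (0.2, True, 0.0), (0.1, False, 0.0),  # bottom-ranked group
]
scores = cumulative_incrementality(examples, n_groups=2)
overall = final_score(scores)
```

A model that ranks truly responsive users first shows a large treated-versus-control gap in the early groups, which pulls the final score up.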
