WEAKLY SUPERVISED EXTRACTION OF ATTRIBUTES FROM UNSTRUCTURED DATA TO GENERATE TRAINING DATA FOR MACHINE LEARNING MODELS

    Publication No.: US20250117442A1

    Publication Date: 2025-04-10

    Application No.: US18987482

    Filing Date: 2024-12-19

    Applicant: Maplebear Inc.

    Abstract: An online concierge system receives unstructured data describing items offered for purchase by various warehouses. To generate attributes for products from the unstructured data, the online concierge system extracts candidate values for attributes from the unstructured data through natural language processing. One or more users associate a subset of the candidate values with corresponding attributes, and the online concierge system clusters the remaining candidate values with the candidate values of the subset associated with attributes. One or more users provide input on the accuracy of the generated clusters. The candidate values are applied as labels to items by the online concierge system, which uses the labeled items as training data for an attribute extraction model to predict values for one or more attributes from unstructured data about an item.
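The labeling flow described in the abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the similarity measure (Jaccard overlap of character trigrams), the `threshold` value, and the function names `propagate_labels` and `label_items` are all assumptions chosen for illustration. The sketch takes candidate values as already extracted, treats a small user-labeled subset as seeds, attaches each remaining candidate to the attribute of its most similar seed, and then labels item descriptions with the resulting clusters. The human review of clusters and the training of the attribute extraction model are left out.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the character trigram sets of two strings."""
    sa, sb = char_ngrams(a), char_ngrams(b)
    return len(sa & sb) / len(sa | sb)


def propagate_labels(candidates, seeds, threshold=0.2):
    """Group candidate values under the attribute of their most similar seed.

    `seeds` maps a user-labeled candidate value to its attribute name.
    Candidates whose best similarity falls below `threshold` (an assumed
    cutoff) are left unassigned.
    """
    clusters = defaultdict(list)
    for value, attribute in seeds.items():
        clusters[attribute].append(value)
    for value in candidates:
        if value in seeds:
            continue
        best_seed = max(seeds, key=lambda s: jaccard(value, s))
        if jaccard(value, best_seed) >= threshold:
            clusters[seeds[best_seed]].append(value)
    return dict(clusters)


def label_items(descriptions, clusters):
    """Attach attribute labels to each description by substring match.

    When several values of one attribute match, the longest one is kept.
    """
    labeled = []
    for text in descriptions:
        lowered = text.lower()
        labels = {}
        for attribute, values in clusters.items():
            matches = [v for v in values if v.lower() in lowered]
            if matches:
                labels[attribute] = max(matches, key=len)
        labeled.append((text, labels))
    return labeled


# Hypothetical seed labels and candidate values, for illustration only.
seeds = {"organic": "dietary", "12 oz": "size"}
candidates = ["organic", "12 oz", "organically grown", "16 oz", "xyz"]
clusters = propagate_labels(candidates, seeds)
training_data = label_items(["Organically grown kale, 16 oz bag"], clusters)
```

The `(text, labels)` pairs in `training_data` stand in for the labeled items that the abstract says are used as training data; in practice the clusters would be checked by reviewers before any labels are applied.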

    WEAKLY SUPERVISED EXTRACTION OF ATTRIBUTES FROM UNSTRUCTURED DATA TO GENERATE TRAINING DATA FOR MACHINE LEARNING MODELS

    Publication No.: US20230058829A1

    Publication Date: 2023-02-23

    Application No.: US17407158

    Filing Date: 2021-08-19

    Abstract: An online concierge system receives unstructured data describing items offered for purchase by various warehouses. To generate attributes for products from the unstructured data, the online concierge system extracts candidate values for attributes from the unstructured data through natural language processing. One or more users associate a subset of the candidate values with corresponding attributes, and the online concierge system clusters the remaining candidate values with the candidate values of the subset associated with attributes. One or more users provide input on the accuracy of the generated clusters. The candidate values are applied as labels to items by the online concierge system, which uses the labeled items as training data for an attribute extraction model to predict values for one or more attributes from unstructured data about an item.
