GENERATING NATURAL LANGUAGE MODEL INSIGHTS FOR DATA CHARTS USING LIGHT LANGUAGE MODELS DISTILLED FROM LARGE LANGUAGE MODELS

    Publication number: US20240320421A1

    Publication date: 2024-09-26

    Application number: US18338033

    Application date: 2023-06-20

    Applicant: Adobe Inc.

    CPC classification number: G06F40/186

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media for generating naturally phrased insights about data charts using light language models distilled from large language models. To synthesize training data for the light language model, in some embodiments, the disclosed systems use insight templates to prompt a large language model to generate naturally phrased insights. In some embodiments, the disclosed systems anonymize and augment the synthesized training data to improve the accuracy and robustness of model predictions. For example, the disclosed systems anonymize training data by injecting noise into data charts before prompting the large language model to generate naturally phrased insights from insight templates. In some embodiments, the disclosed systems further augment the (anonymized) training data by splitting or partitioning data charts into folds that act as individual data charts.
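
    Illustrative sketch (not part of the patent record): the abstract describes three data-preparation steps, namely noise injection for anonymization, fold splitting for augmentation, and template-based prompting. The Python below is one minimal way these steps might look. Everything concrete in it is an assumption made for illustration: the chart representation as (label, value) pairs, the uniform relative noise model, the contiguous fold strategy, the "peak" insight template, and all function names. The record does not specify any of these, and no large language model is actually called here.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical chart representation: a sequence of (x label, y value) pairs.
Point = Tuple[str, float]


def inject_noise(chart: Sequence[Point], rel_scale: float = 0.05,
                 seed: int = 0) -> List[Point]:
    """Perturb each value by at most +/- rel_scale of its magnitude.

    Assumed noise model (uniform, relative); the source only says that
    noise is injected into data charts.
    """
    rng = random.Random(seed)
    return [(x, y * (1.0 + rng.uniform(-rel_scale, rel_scale)))
            for x, y in chart]


def split_into_folds(chart: Sequence[Point], n_folds: int) -> List[List[Point]]:
    """Partition a chart into contiguous folds, each usable as its own chart."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    size, extra = divmod(len(chart), n_folds)
    folds: List[List[Point]] = []
    start = 0
    for i in range(n_folds):
        # The first `extra` folds take one additional point.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty folds when n_folds > len(chart)
            folds.append(list(chart[start:end]))
        start = end
    return folds


def peak_template(chart: Sequence[Point]) -> str:
    """Fill a hypothetical insight template describing the chart's maximum."""
    label, value = max(chart, key=lambda p: p[1])
    return f"The maximum value is {value:.1f} at {label}."


def build_prompt(chart: Sequence[Point]) -> str:
    """Build a prompt asking a large model to rephrase the templated insight."""
    return "Rephrase this chart insight in natural language: " + peak_template(chart)


chart = [("Mon", 10.0), ("Tue", 12.0), ("Wed", 30.0), ("Thu", 11.0),
         ("Fri", 9.0), ("Sat", 14.0), ("Sun", 8.0)]
noisy = inject_noise(chart, rel_scale=0.05, seed=42)
folds = split_into_folds(noisy, n_folds=3)
prompts = [build_prompt(fold) for fold in folds]
```

    In this sketch each fold yields its own prompt, so one source chart produces several training examples. The responses to those prompts would then serve as targets for training the light language model, a step that is omitted here.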
