Representation learning for tax rule bootstrapping
Abstract:
A rule having text is pre-processed by replacing terms with dummy tokens. A first machine learning model (MLM) uses the dummy tokens to generate a dependency graph with nodes related by edges tagged with dependency tags. A second MLM uses the dependency graph to generate a canonical version with node labels. The node labels are sorted into lexicographic order to form a document. A third MLM uses the document to generate a machine-readable vector (MRV) that embeds the document as a sequence of numbers representative of a structure of the rule. The MRV is compared to additional MRVs corresponding to additional rules for which computer-usable program code blocks have been generated. A set of MRVs is identified that matches the MRV within a range. The set of MRVs corresponds to a set of rules from the additional rules. The set of rules is displayed to a user.
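To make the staged pipeline concrete, below is a minimal sketch in Python. It is illustrative only: the abstract does not specify the three machine learning models or the matching metric, so all names here (replace_terms_with_dummy_tokens, DependencyEdge, canonicalize, embed, matching_rules) are hypothetical, the bag-of-labels vector stands in for the learned MRV embedding, and cosine similarity with a threshold stands in for the "match within a range" comparison.

```python
# Sketch of the pipeline: pre-process -> dependency edges -> canonical node
# labels sorted lexicographically -> MRV -> match against stored MRVs.
# All interfaces are assumptions, not the patented models.

from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np


def replace_terms_with_dummy_tokens(rule_text: str, terms: Dict[str, str]) -> str:
    """Pre-process the rule text by swapping domain terms for dummy tokens."""
    for term, dummy in terms.items():
        rule_text = rule_text.replace(term, dummy)
    return rule_text


@dataclass
class DependencyEdge:
    head: str        # node label of the head token
    dependent: str   # node label of the dependent token
    tag: str         # dependency tag on the edge (e.g., "nsubj", "dobj")


def canonicalize(edges: List[DependencyEdge]) -> List[str]:
    """Stand-in for the second model: collect node labels from the graph
    and sort them lexicographically to form the 'document'."""
    labels = {e.head for e in edges} | {e.dependent for e in edges}
    return sorted(labels)


def embed(document: List[str], vocab: List[str]) -> np.ndarray:
    """Stand-in for the third model: a simple bag-of-labels count vector
    plays the role of the machine-readable vector (MRV)."""
    return np.array([document.count(tok) for tok in vocab], dtype=float)


def matching_rules(mrv: np.ndarray,
                   candidates: List[Tuple[str, np.ndarray]],
                   threshold: float = 0.8) -> List[str]:
    """Return the rules whose stored MRVs match the query MRV within a
    range (here: cosine similarity >= threshold)."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0
    return [rule for rule, vec in candidates if cosine(mrv, vec) >= threshold]
```

In this sketch the returned rule names would then be displayed to the user; in practice each stage would be a trained model rather than the simple stand-ins shown here.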