Scalable streaming decision tree learning
Abstract:
In one embodiment, a computer-implemented method includes receiving training data including a plurality of records, each record having a plurality of attributes. The training data is horizontally parallelized across two or more processing elements. This horizontal parallelizing includes dividing the training data into two or more subsets of records; assigning each subset of records to a corresponding processing element of the two or more processing elements; transmitting each subset of records to its assigned processing element; and sorting, at the two or more processing elements, the two or more subsets of records to two or more candidate leaves of a decision tree. The output from horizontally parallelizing is converted into input for vertically parallelizing the training data. The training data is vertically parallelized across the two or more processing elements. The decision tree is grown based at least in part on the horizontally parallelizing, the converting, and the vertically parallelizing.
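The data flow described in the abstract can be illustrated with a minimal, single-process sketch. It is not the claimed implementation: the processing elements are simulated as plain lists, and the round-robin partitioning, the `route` function that maps a record to a candidate leaf, the per-leaf count statistics, and the attribute-to-worker ownership rule are all assumptions introduced here for illustration.

```python
from collections import Counter


def partition_records(records, num_workers):
    """Divide records into num_workers subsets (round-robin is an assumed policy)."""
    subsets = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        subsets[i % num_workers].append(record)
    return subsets


def sort_to_leaves(subset, route):
    """Horizontal step on one worker: route each record to a candidate leaf
    and count (leaf, attribute, value, label) occurrences."""
    stats = Counter()
    for attributes, label in subset:
        leaf = route(attributes)  # hypothetical routing through the current tree
        for name, value in attributes.items():
            stats[(leaf, name, value, label)] += 1
    return stats


def convert_to_vertical(per_worker_stats, attribute_names, num_workers):
    """Regroup the horizontal output so each worker owns a disjoint set of
    attributes and holds the merged counts for those attributes."""
    owner = {name: i % num_workers for i, name in enumerate(sorted(attribute_names))}
    vertical = [Counter() for _ in range(num_workers)]
    for stats in per_worker_stats:
        for key, count in stats.items():
            attribute = key[1]
            vertical[owner[attribute]][key] += count
    return vertical


# Toy training data: (attributes, class label). Values are made up.
records = [
    ({"color": "red", "size": "S"}, "yes"),
    ({"color": "red", "size": "L"}, "no"),
    ({"color": "blue", "size": "S"}, "yes"),
    ({"color": "blue", "size": "L"}, "yes"),
]


def route(attributes):
    # The tree starts as a single candidate leaf, so every record lands in it.
    return "root"


subsets = partition_records(records, 2)
horizontal = [sort_to_leaves(subset, route) for subset in subsets]
vertical = convert_to_vertical(horizontal, {"color", "size"}, 2)
```

After the conversion, each simulated worker holds the complete counts for the attributes it owns, so it could evaluate candidate splits on those attributes without seeing the others. How splits are scored and how the tree is then grown is not specified in the abstract and is left out of the sketch.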