Scripting distributed, parallel programs
Abstract:
Provided is a process having steps including: obtaining a specification of a data analysis to be performed in parallel on a computing cluster; parsing the specification of the data analysis; determining which data is implicated in each portion of the data analysis to be assigned to a plurality of computing nodes of the computing cluster; determining that a portion of the implicated data is not already present in memory of at least some of the plurality of computing nodes of the computing cluster; distributing the portion of the implicated data according to an index that positions related values of the data on the same computing nodes of the computing cluster; determining which computing nodes of the computing cluster have data relevant to which rules in the data analysis and sending the relevant rules to the corresponding computing nodes; executing the rules on the computing nodes; and aggregating results of executing the rules.
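The steps recited in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the claimed implementation. The specification syntax (`rule_name: dataset op`), the `Node` class, the `node_index` function, and the use of a CRC32 hash as the index are all assumptions made for illustration, and the "cluster" is simulated with in-memory objects rather than networked machines.

```python
import zlib
from collections import defaultdict


def parse_spec(text):
    """Parse lines of the hypothetical form 'rule_name: dataset op'."""
    rules = []
    for line in text.strip().splitlines():
        name, body = line.split(":")
        dataset, op = body.split()
        rules.append({"name": name.strip(), "dataset": dataset, "op": op})
    return rules


class Node:
    """Stand-in for one computing node; memory maps dataset name -> rows."""

    def __init__(self):
        self.memory = {}

    def execute(self, rule):
        # Compute a partial result over the rows this node holds.
        partial = defaultdict(float)
        for key, value in self.memory.get(rule["dataset"], []):
            partial[key] += value if rule["op"] == "sum" else 1
        return partial


def node_index(key, n_nodes):
    # Assumed index: a deterministic hash, so equal keys share a node.
    return zlib.crc32(str(key).encode()) % n_nodes


def run_analysis(spec_text, storage, nodes):
    rules = parse_spec(spec_text)

    # Determine which data the analysis implicates.
    implicated = {rule["dataset"] for rule in rules}

    # Distribute only the data that is not already in node memory.
    for dataset in sorted(implicated):
        if any(dataset in node.memory for node in nodes):
            continue
        for key, value in storage[dataset]:
            target = nodes[node_index(key, len(nodes))]
            target.memory.setdefault(dataset, []).append((key, value))

    results = {}
    for rule in rules:
        # Send each rule only to nodes holding relevant data.
        holders = [n for n in nodes if rule["dataset"] in n.memory]
        # Execute on those nodes and aggregate the partial results.
        merged = defaultdict(float)
        for node in holders:
            for key, value in node.execute(rule).items():
                merged[key] += value
        results[rule["name"]] = dict(merged)
    return results


# Hypothetical specification and data, for illustration only.
SPEC = """
total_sales: sales sum
order_count: sales count
"""
STORAGE = {"sales": [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]}
NODES = [Node() for _ in range(3)]
RESULTS = run_analysis(SPEC, STORAGE, NODES)
print(RESULTS)
```

Because every row with the same key hashes to the same node, each per-key partial result is already complete on one node, so aggregation only has to merge disjoint partial results. Running `run_analysis` a second time against the same nodes skips the distribution step, since the dataset is already resident in memory.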