System and method for executing data processing tasks using resilient distributed datasets (RDDs) in a storage device
Abstract:
A system and method of providing enhanced data processing and analysis in an infrastructure for distributed computing and large-scale data processing. This infrastructure uses the Apache Spark framework to divide an application into a large number of small fragments of work, each of which may be performed on one of a large number of compute nodes. The work may involve Spark transformations, operations, and actions, which may be used to categorize and analyze large amounts of data in distributed systems. This infrastructure includes a cluster with a driver node and a plurality of worker nodes. The worker nodes may be, or may include, intelligent solid state drives capable of executing data processing functions under the Apache Spark framework. The use of intelligent solid state drives reduces the need to exchange data with a central processing unit (CPU) in a server.
Information query
Patent Agency Ranking
0/0