-
Publication number: US20180143861A1
Publication date: 2018-05-24
Application number: US15873095
Application date: 2018-01-17
Applicant: Ab Initio Technology LLC
Inventor: Tim Wakeling, Mark Buxbaum, Mark Staknis
IPC: G06F9/50
CPC classification number: G06F9/5038, G06F2209/506
Abstract: Managing task execution includes: receiving a specification of a plurality of tasks to be performed by respective functional modules; processing a flow of input data using a dataflow graph that includes nodes representing data processing components connected by links representing flows of data between data processing components; in response to at least one flow of data provided by at least one data processing component, generating a flow of messages; and in response to each of the messages in the flow of messages, performing an iteration of a set of one or more tasks using one or more corresponding functional modules.
-
Publication number: US20170220646A1
Publication date: 2017-08-03
Application number: US15431984
Application date: 2017-02-14
Applicant: Ab Initio Technology LLC
Inventor: Ian Schechter, Tim Wakeling, Ann M. Wollrath
IPC: G06F17/30
CPC classification number: G06F16/2471, G06F9/5066, G06F16/13, G06F16/1734, G06F16/254, G06F16/284, G06F16/285, G06F16/9024
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.
-
Publication number: US09189529B2
Publication date: 2015-11-17
Application number: US13834491
Application date: 2013-03-15
Applicant: Ab Initio Technology LLC
Inventor: Mark Buxbaum, Tim Wakeling
CPC classification number: G06F17/30563, G06F17/30572
Abstract: A method includes receiving information provided by a data processing application during execution of the data processing application. The information is indicative of at least one of a source of data for the data processing application and a destination of data from the data processing application. The method includes dynamically analyzing the information during execution of the data processing application to identify a queue in communication with the data processing application; and dynamically analyzing the information during execution of the data processing application to identify a relationship between the data processing application and the queue, including at least one of identifying that the queue is the source of data for the data processing application and identifying that the queue is the destination of data from the data processing application.
-
Publication number: US20220365928A1
Publication date: 2022-11-17
Application number: US17878106
Application date: 2022-08-01
Applicant: Ab Initio Technology LLC
Inventor: Ian Schechter, Tim Wakeling, Ann M. Wollrath
IPC: G06F16/2458, G06F16/13, G06F16/25, G06F16/28, G06F16/17, G06F16/901, G06F9/50
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.
-
Publication number: US11188381B2
Publication date: 2021-11-30
Application number: US16294329
Application date: 2019-03-06
Applicant: Ab Initio Technology LLC
Inventor: Dino LaChiusa, Joyce L. Vigneau, Mark Buxbaum, Brad Lee Miller, Tim Wakeling
Abstract: A method of managing components in a processing environment is provided. The method includes monitoring (i) a status of each of one or more computing devices, (ii) a status of each of one or more applications, each application hosted by at least one of the computing devices, and (iii) a status of each of one or more jobs, each job associated with at least one of the applications; determining that one of the status of one of the computing devices, the status of one of the applications, and the status of one of the jobs is indicative of a performance issue associated with the corresponding computing device, application, or job, the determination being made based on a comparison of a performance of the computing device, application, or job and at least one predetermined criterion; and enabling an action to be performed associated with the performance issue.
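Illustrative note (not part of the patent record): the monitoring loop described in this abstract can be sketched roughly as below. The `Status` type, the metric meanings, the threshold values in `CRITERIA`, and the `alert:` action strings are all hypothetical and chosen only for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    kind: str      # "device", "application", or "job"
    name: str
    metric: float  # hypothetical performance reading (e.g. load or runtime)


# Hypothetical "predetermined criteria": one upper bound per component kind.
CRITERIA = {"device": 0.90, "application": 0.80, "job": 3600.0}


def find_issues(statuses, criteria=CRITERIA):
    """Return the statuses whose metric exceeds the criterion for their kind."""
    return [s for s in statuses if s.metric > criteria[s.kind]]


def enable_actions(issues):
    """Map each problematic component to a placeholder action name."""
    return {s.name: f"alert:{s.kind}" for s in issues}


if __name__ == "__main__":
    observed = [
        Status("device", "host-a", 0.95),
        Status("application", "billing", 0.40),
        Status("job", "nightly-load", 5400.0),
    ]
    print(enable_actions(find_issues(observed)))
```

A single threshold per kind is the simplest possible comparison; a real criterion could equally be a range, a trend, or a rule combining several readings.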
-
Publication number: US20200319932A1
Publication date: 2020-10-08
Application number: US16906193
Application date: 2020-06-19
Applicant: Ab Initio Technology LLC
Inventor: Harry Michael Wolfson, Joel Gould, Anthony Yeracaris, Tim Wakeling
IPC: G06F9/50
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.
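Illustrative note (not part of the patent record): one way to picture the linkage described in this abstract is as two directed graphs joined by a job-to-output map. The sketch below is a minimal reading under that assumption; the dictionary layouts and the names `job_deps`, `lineage`, `job_outputs`, and `impact_of_change` are hypothetical and do not come from the patent.

```python
from collections import deque


def downstream(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact_of_change(job, job_deps, lineage, job_outputs):
    """Return (later jobs, data stores) affected if `job` is changed or delayed.

    job_deps:    job -> jobs scheduled to run after it
    lineage:     data store -> data stores derived from it by a transformation
    job_outputs: job -> data stores it writes (the "created links")
    """
    later_jobs = downstream(job_deps, job)
    stores = set()
    for j in later_jobs | {job}:
        for store in job_outputs.get(j, ()):
            stores.add(store)
            stores |= downstream(lineage, store)
    return later_jobs, stores


if __name__ == "__main__":
    job_deps = {"load": ["clean"], "clean": ["report"]}
    job_outputs = {"load": ["raw"], "clean": ["staged"], "report": ["summary"]}
    lineage = {"raw": ["staged"], "staged": ["summary", "audit"]}
    jobs, stores = impact_of_change("clean", job_deps, lineage, job_outputs)
    print(sorted(jobs), sorted(stores))
```

Here the `audit` store is reported as affected even though no scheduled job writes it directly, which is the kind of result that needs the lineage graph in addition to the schedule.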
-
Publication number: US20200311098A1
Publication date: 2020-10-01
Application number: US16902949
Application date: 2020-06-16
Applicant: Ab Initio Technology LLC
Inventor: Tim Wakeling, Adam Weiss
IPC: G06F16/25, G06F16/28, G06F16/2457, G06F40/197
Abstract: Mapping data stored in a data storage system for use by a computer system includes processing specifications of dataflow graphs that include nodes representing computations interconnected by links representing flows of data. At least one of the dataflow graphs receives a flow of data from at least one input dataset and at least one of the dataflow graphs provides a flow of data to at least one output dataset. A mapper identifies one or more sets of datasets. Each dataset in a given set matches one or more criteria for identifying different versions of a single dataset. A user interface is provided to receive a mapping between at least two datasets in a given set. The mapping received over the user interface is stored in association with a dataflow graph that provides data to or receives data from the datasets of the mapping.
-
Publication number: US20200265047A1
Publication date: 2020-08-20
Application number: US16865975
Application date: 2020-05-04
Applicant: Ab Initio Technology LLC
Inventor: Ian Schechter, Tim Wakeling, Ann M. Wollrath
IPC: G06F16/2458, G06F9/50, G06F16/901, G06F16/17, G06F16/28, G06F16/25, G06F16/13
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.
-
Publication number: US10528395B2
Publication date: 2020-01-07
Application number: US15873095
Application date: 2018-01-17
Applicant: Ab Initio Technology LLC
Inventor: Tim Wakeling, Mark Buxbaum, Mark Staknis
Abstract: Managing task execution includes: receiving a specification of a plurality of tasks to be performed by respective functional modules; processing a flow of input data using a dataflow graph that includes nodes representing data processing components connected by links representing flows of data between data processing components; in response to at least one flow of data provided by at least one data processing component, generating a flow of messages; and in response to each of the messages in the flow of messages, performing an iteration of a set of one or more tasks using one or more corresponding functional modules.
-
Publication number: US09507682B2
Publication date: 2016-11-29
Application number: US13678921
Application date: 2012-11-16
Applicant: Ab Initio Technology LLC
Inventor: Mark Buxbaum, Michael G. Mulligan, Tim Wakeling, Matthew Darcy Atterbury
CPC classification number: G06F11/3041, G06F11/3003, G06F11/3082, G06F11/323, G06F11/3404, G06F11/3419, G06F11/3476, G06F2201/865, G06Q30/0201
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic graph performance monitoring. One of the methods includes receiving multiple units of work that each include one or more work elements. The method includes determining a characteristic of the first unit of work. The method includes identifying, by a component of the first dataflow graph, a second dataflow graph from multiple available dataflow graphs based on the determined characteristic, the multiple available dataflow graphs being stored in a data storage system. The method includes processing the first unit of work using the second dataflow graph. The method includes determining one or more performance metrics associated with the processing.