PROCESSING DATA FROM MULTIPLE SOURCES
    22.
    Invention application

    Publication No.: US20170220646A1

    Publication date: 2017-08-03

    Application No.: US15431984

    Filing date: 2017-02-14

    Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.

    Queue monitoring and visualization
    23.
    Granted patent (in force)

    Publication No.: US09189529B2

    Publication date: 2015-11-17

    Application No.: US13834491

    Filing date: 2013-03-15

    CPC classification number: G06F17/30563 G06F17/30572

    Abstract: A method includes receiving information provided by a data processing application during execution of the data processing application. The information is indicative of at least one of a source of data for the data processing application and a destination of data from the data processing application. The method includes dynamically analyzing the information during execution of the data processing application to identify a queue in communication with the data processing application; and dynamically analyzing the information during execution of the data processing application to identify a relationship between the data processing application and the queue, including at least one of identifying that the queue is the source of data for the data processing application and identifying that the queue is the destination of data from the data processing application.

    PROCESSING DATA FROM MULTIPLE SOURCES

    Publication No.: US20220365928A1

    Publication date: 2022-11-17

    Application No.: US17878106

    Filing date: 2022-08-01

    Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.

    Integrated monitoring and control of processing environment

    Publication No.: US11188381B2

    Publication date: 2021-11-30

    Application No.: US16294329

    Filing date: 2019-03-06

    Abstract: A method of managing components in a processing environment is provided. The method includes monitoring (i) a status of each of one or more computing devices, (ii) a status of each of one or more applications, each application hosted by at least one of the computing devices, and (iii) a status of each of one or more jobs, each job associated with at least one of the applications; determining that one of the status of one of the computing devices, the status of one of the applications, and the status of one of the jobs is indicative of a performance issue associated with the corresponding computing device, application, or job, the determination being made based on a comparison of a performance of the computing device, application, or job and at least one predetermined criterion; and enabling an action to be performed associated with the performance issue.
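The monitoring step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Status` record, the metric names, and the threshold values in `CRITERIA` are all hypothetical, chosen only to show a status being compared against a predetermined criterion before an action is enabled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Status:
    kind: str     # "device", "application", or "job"
    name: str     # hypothetical identifier of the monitored component
    metric: str   # hypothetical metric name, e.g. "cpu_pct"
    value: float  # observed performance value


# Hypothetical predetermined criteria: (kind, metric) -> maximum acceptable value.
CRITERIA: Dict[Tuple[str, str], float] = {
    ("device", "cpu_pct"): 90.0,
    ("application", "error_rate"): 0.05,
    ("job", "runtime_s"): 3600.0,
}


def find_issues(statuses: List[Status]) -> List[Status]:
    """Return the statuses whose value exceeds the matching criterion."""
    issues = []
    for s in statuses:
        limit = CRITERIA.get((s.kind, s.metric))
        if limit is not None and s.value > limit:
            issues.append(s)
    return issues


def enable_actions(issues: List[Status]) -> List[str]:
    """Stand-in for 'enabling an action': here, just build alert messages."""
    return [f"alert: {s.kind} {s.name} {s.metric}={s.value}" for s in issues]
```

In this sketch a single table of thresholds covers devices, applications, and jobs, which mirrors the abstract's point that all three are monitored together; a real system would presumably support richer criteria than a simple upper bound.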

    WORKLOAD AUTOMATION AND DATA LINEAGE ANALYSIS

    Publication No.: US20200319932A1

    Publication date: 2020-10-08

    Application No.: US16906193

    Filing date: 2020-06-19

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.
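As a rough illustration of the idea in this abstract, the sketch below links job dependency information to data lineage information and walks both to estimate the impact of a change. The job names, transformation names, and data store names are invented for the example, and the traversal is an assumption about one way such an analysis could work, not the method claimed in the application.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical job dependency information: job -> jobs that run after it.
JOB_SUCCESSORS: Dict[str, List[str]] = {
    "extract": ["transform"],
    "transform": ["load"],
    "load": [],
}

# Hypothetical data lineage: transformation -> (input store, output store).
LINEAGE: Dict[str, Tuple[str, str]] = {
    "t_clean": ("raw_db", "staging_db"),
    "t_agg": ("staging_db", "warehouse"),
}

# Hypothetical links between the two: job -> transformations it executes.
JOB_TO_TRANSFORMS: Dict[str, List[str]] = {
    "transform": ["t_clean"],
    "load": ["t_agg"],
}


def impact_of_change(job: str) -> Tuple[Set[str], Set[str]]:
    """Return (affected jobs, affected data stores) if `job` is changed or delayed."""
    affected_jobs: Set[str] = set()
    affected_stores: Set[str] = set()
    queue = deque([job])
    while queue:
        current = queue.popleft()
        if current in affected_jobs:
            continue
        affected_jobs.add(current)
        # Follow the links into the lineage: every store this job writes is affected.
        for transform in JOB_TO_TRANSFORMS.get(current, []):
            affected_stores.add(LINEAGE[transform][1])
        # Follow the job dependency order to downstream jobs.
        queue.extend(JOB_SUCCESSORS.get(current, []))
    return affected_jobs, affected_stores
```

For example, `impact_of_change("transform")` reports that both `transform` and `load` are affected, along with the `staging_db` and `warehouse` stores they write.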

    MAPPING INSTANCES OF A DATASET WITHIN A DATA MANAGEMENT SYSTEM

    Publication No.: US20200311098A1

    Publication date: 2020-10-01

    Application No.: US16902949

    Filing date: 2020-06-16

    Abstract: Mapping data stored in a data storage system for use by a computer system includes processing specifications of dataflow graphs that include nodes representing computations interconnected by links representing flows of data. At least one of the dataflow graphs receives a flow of data from at least one input dataset and at least one of the dataflow graphs provides a flow of data to at least one output dataset. A mapper identifies one or more sets of datasets. Each dataset in a given set matches one or more criteria for identifying different versions of a single dataset. A user interface is provided to receive a mapping between at least two datasets in a given set. The mapping received over the user interface is stored in association with a dataflow graph that provides data to or receives data from the datasets of the mapping.
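The mapper step in this abstract, which identifies sets of datasets that look like different versions of a single dataset, might be sketched as follows. The matching criterion used here (same file name after dropping the directory, the extension, and a trailing `_vN` or eight-digit date suffix) is purely an assumption for illustration; the abstract does not say which criteria are used.

```python
import re
from collections import defaultdict
from typing import Dict, List


def base_name(path: str) -> str:
    """Hypothetical criterion: normalize a dataset path to a version-free name."""
    name = path.rsplit("/", 1)[-1]                  # drop the directory
    name = re.sub(r"\.[A-Za-z0-9]+$", "", name)     # drop the file extension
    name = re.sub(r"(_v\d+|_\d{8})$", "", name)     # drop a _vN or _YYYYMMDD suffix
    return name.lower()


def candidate_sets(paths: List[str]) -> Dict[str, List[str]]:
    """Group dataset paths that appear to be versions of the same dataset."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[base_name(path)].append(path)
    # Only groups with at least two members are candidates for a mapping.
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}
```

Each returned set would then be shown in a user interface so that a person can confirm or adjust the mapping between its members, which is the part of the abstract this sketch leaves out.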

    PROCESSING DATA FROM MULTIPLE SOURCES
    28.
    Invention application

    Publication No.: US20200265047A1

    Publication date: 2020-08-20

    Application No.: US16865975

    Filing date: 2020-05-04

    Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.

    Dynamic graph performance monitoring
    30.
    Granted patent (in force)

    Publication No.: US09507682B2

    Publication date: 2016-11-29

    Application No.: US13678921

    Filing date: 2012-11-16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic graph performance monitoring. One of the methods includes receiving multiple units of work that each include one or more work elements. The method includes determining a characteristic of the first unit of work. The method includes identifying, by a component of the first dataflow graph, a second dataflow graph from multiple available dataflow graphs based on the determined characteristic, the multiple available dataflow graphs being stored in a data storage system. The method includes processing the first unit of work using the second dataflow graph. The method includes determining one or more performance metrics associated with the processing.
