1.
Publication No.: US11663033B2
Publication Date: 2023-05-30
Application No.: US17179155
Filing Date: 2021-02-18
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
CPC classification number: G06F9/46, G06F16/211, G06F16/14
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
2.
Publication No.: US20200065136A1
Publication Date: 2020-02-27
Application No.: US16667609
Filing Date: 2019-10-29
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
3.
Publication No.: US20190138654A1
Publication Date: 2019-05-09
Application No.: US15943603
Filing Date: 2018-04-02
Applicant: Cloudera, Inc.
Inventor: Sudhanshu Arora, Mark Donsky, Guang Yao Leng, Naren Koneru, Chang She, Vikas Singh, Himabindu Vuppula
Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
4.
Publication No.: US20210334301A1
Publication Date: 2021-10-28
Application No.: US17367194
Filing Date: 2021-07-02
Applicant: Cloudera, Inc.
Inventor: Sudhanshu Arora, Mark Donsky, Guang Yao Leng, Naren Koneru, Chang She, Vikas Singh, Himabindu Vuppula
Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
5.
Publication No.: US10514948B2
Publication Date: 2019-12-24
Application No.: US15808805
Filing Date: 2017-11-09
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
6.
Publication No.: US11086917B2
Publication Date: 2021-08-10
Application No.: US16802196
Filing Date: 2020-02-26
Applicant: Cloudera, Inc.
Inventor: Sudhanshu Arora, Mark Donsky, Guang Yao Leng, Naren Koneru, Chang She, Vikas Singh, Himabindu Vuppula
Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
7.
Publication No.: US20210173696A1
Publication Date: 2021-06-10
Application No.: US17179155
Filing Date: 2021-02-18
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
8.
Publication No.: US20190138345A1
Publication Date: 2019-05-09
Application No.: US15808805
Filing Date: 2017-11-09
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
IPC: G06F9/46
CPC classification number: G06F9/46, G06F16/14, G06F16/211
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
9.
Publication No.: US11663257B2
Publication Date: 2023-05-30
Application No.: US17367194
Filing Date: 2021-07-02
Applicant: Cloudera, Inc.
Inventor: Sudhanshu Arora, Mark Donsky, Guang Yao Leng, Naren Koneru, Chang She, Vikas Singh, Himabindu Vuppula
CPC classification number: G06F16/345, G06F9/45558, G06F16/288, G06F16/38, G06N5/04, G06F16/182, G06F2009/4557
Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
10.
Publication No.: US10929173B2
Publication Date: 2021-02-23
Application No.: US16667609
Filing Date: 2019-10-29
Applicant: Cloudera, Inc.
Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She
Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
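The design-time inference described in the abstract shared by the Singh et al. publications (for example US10514948B2) can be illustrated with a minimal Python sketch. It is not the patented method: it assumes a hypothetical run-time artifact format (`JobArtifact`, recording which datasets each job read and wrote) and treats "design-time information" narrowly as the job-dependency DAG, recovered by linking each job to the jobs that produced its inputs and then ordering them with `graphlib.TopologicalSorter` (Python 3.9+).

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class JobArtifact:
    """Hypothetical run-time artifact: one job's execution record."""
    job_id: str
    inputs: tuple[str, ...]   # datasets the job read
    outputs: tuple[str, ...]  # datasets the job wrote


def build_lineage(artifacts: list[JobArtifact]) -> set[tuple[str, str]]:
    """Data-level lineage: an edge from every input to every output of a job."""
    return {(src, dst) for a in artifacts for src in a.inputs for dst in a.outputs}


def infer_job_dependencies(artifacts: list[JobArtifact]) -> dict[str, set[str]]:
    """Assume job B depends on job A if B read a dataset that A wrote."""
    producers: dict[str, set[str]] = defaultdict(set)
    for a in artifacts:
        for out in a.outputs:
            producers[out].add(a.job_id)
    deps: dict[str, set[str]] = {a.job_id: set() for a in artifacts}
    for a in artifacts:
        for inp in a.inputs:
            # Datasets with no known producer (e.g. external sources) add no edge.
            deps[a.job_id] |= producers.get(inp, set()) - {a.job_id}
    return deps


def infer_workflow_order(artifacts: list[JobArtifact]) -> list[str]:
    """A valid execution order for the inferred DAG (raises CycleError on cycles)."""
    return list(TopologicalSorter(infer_job_dependencies(artifacts)).static_order())
```

For three artifacts forming a chain (`ingest` writes `staging/logs`, `clean` reads it and writes `clean/logs`, `report` reads `clean/logs`), `infer_workflow_order` recovers the order `ingest`, `clean`, `report`. A real system would draw on far richer artifacts (query plans, job configurations, scheduler logs) than this input/output summary.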
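The cross-cluster lineage tracking in the Arora et al. abstract (for example US20190138654A1) can likewise be sketched in Python. This is an illustrative assumption, not the claimed system: `OperationRecord` and `MetadataStore` are hypothetical names, records are kept in memory rather than in a real metadata service, a dependency is assumed whenever a workflow on one cluster reads a dataset written on another, and two workflows are flagged as redundant when they read and write exactly the same datasets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    """Hypothetical operational-metadata record extracted from a (possibly transient) cluster."""
    cluster_id: str
    workflow_id: str
    inputs: frozenset[str]
    outputs: frozenset[str]


class MetadataStore:
    """Aggregates records centrally so they outlive the clusters that produced them."""

    def __init__(self) -> None:
        self.records: list[OperationRecord] = []

    def ingest(self, records: list[OperationRecord]) -> None:
        self.records.extend(records)

    def cluster_summary(self) -> dict[str, dict[str, set[str]]]:
        """Per-cluster rollup of workflows run and datasets read/written."""
        summary = defaultdict(lambda: {"workflows": set(), "reads": set(), "writes": set()})
        for r in self.records:
            s = summary[r.cluster_id]
            s["workflows"].add(r.workflow_id)
            s["reads"] |= r.inputs
            s["writes"] |= r.outputs
        return dict(summary)

    def cross_cluster_dependencies(self) -> set[tuple[tuple[str, str], tuple[str, str]]]:
        """(producer, consumer) pairs where the consumer reads data written on another cluster."""
        writers: dict[str, set[tuple[str, str]]] = defaultdict(set)
        for r in self.records:
            for out in r.outputs:
                writers[out].add((r.cluster_id, r.workflow_id))
        deps = set()
        for r in self.records:
            for inp in r.inputs:
                for cluster, wf in writers.get(inp, set()):
                    if cluster != r.cluster_id:
                        deps.add(((cluster, wf), (r.cluster_id, r.workflow_id)))
        return deps

    def redundant_workflows(self) -> list[list[tuple[str, str]]]:
        """Groups of workflows with identical input and output sets."""
        by_signature: dict[tuple, list[tuple[str, str]]] = defaultdict(list)
        for r in self.records:
            by_signature[(r.inputs, r.outputs)].append((r.cluster_id, r.workflow_id))
        return [group for group in by_signature.values() if len(group) > 1]
```

Because every query runs over the aggregated records, a cluster that has already been torn down still appears in `cluster_summary()`. The dependency and redundancy rules here are deliberately simple heuristics; they only show where such relationships could feed back into cluster provisioning.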