CAPACITY-BASED LOAD BALANCING IN SHARED RESOURCE POOL

    Publication No.: US20250094237A1

    Publication Date: 2025-03-20

    Application No.: US18470772

    Filing Date: 2023-09-20

    Abstract: A system provides capacity-based load balancing across model endpoints of a cloud-based artificial intelligence (AI) model. The system includes a consumption determination engine executable to determine a net resource consumption for processing tasks in a workload generated by a client application for input to the trained machine learning model. The system also includes a load balancer that determines a distribution of available resource capacity in a shared resource pool comprising compute resources at each of the multiple model endpoints. The load balancer allocates parallelizable tasks of the workload among the compute resources at the multiple model endpoints based on the net resource consumption of the tasks and on the distribution of available resource capacity in the shared resource pool.
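    The allocation step can be illustrated with a small sketch. This is not the patented method. It assumes that each task's net resource consumption and each endpoint's free capacity are expressed in one common unit (for example, tokens per minute). It also substitutes a simple greedy policy: place the largest task first, on whichever endpoint has the most remaining capacity. The names `Endpoint` and `allocate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical model endpoint; `available` is its free share of the shared pool
    in an assumed common capacity unit (e.g. tokens per minute)."""
    name: str
    available: int
    assigned: list = field(default_factory=list)


def allocate(tasks: dict, endpoints: list) -> dict:
    """Greedy sketch: assign each task (name -> net consumption) to the endpoint with
    the most remaining capacity, placing the most expensive tasks first."""
    placement = {}
    for task, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        # max() returns the first endpoint among ties, so ordering breaks ties
        best = max(endpoints, key=lambda e: e.available)
        if best.available < cost:
            raise RuntimeError(f"no endpoint can absorb task {task!r} ({cost} units)")
        best.available -= cost
        best.assigned.append(task)
        placement[task] = best.name
    return placement


# Example: two endpoints with unequal free capacity, four parallelizable tasks.
east, west = Endpoint("east", 1000), Endpoint("west", 600)
placement = allocate({"a": 500, "b": 400, "c": 300, "d": 200}, [east, west])
```

    Here `placement` maps tasks a, c and d to "east" and task b to "west". Afterward, "east" has 0 units left and "west" has 200. Placing large tasks first is one common heuristic for packing variable-size work. A proportional split of each task across endpoints would be another valid reading of "based on the distribution of available resource capacity".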

    CONGESTION CONTROL FOR AUTOMATIC COMPUTE CAPACITY SATURATION

    Publication No.: US20250094240A1

    Publication Date: 2025-03-20

    Application No.: US18470795

    Filing Date: 2023-09-20

    Abstract: A disclosed method facilitates an increase in utilization with respect to a resource quota allocated to a tenant from a shared resource pool. The method includes transmitting a lease request to a quota service on behalf of the tenant, where the lease request identifies a processing task and specifies a quantity of cloud-based resources requested from the shared resource pool for execution of the processing task. The method further provides for determining, based on a feedback signal received from the quota service, whether grant of the lease request would cause the tenant to exceed a resource quota allocated to the tenant, and dynamically decreasing parallelism of active tasks being processed by the cloud-based resources on behalf of the tenant in response to determining that grant of the lease request would cause the tenant to exceed the resource quota.
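    The feedback loop described above resembles classic congestion control. The sketch below assumes an AIMD policy (additive increase, multiplicative decrease) borrowed from TCP: parallelism grows by one while leases fit within the quota and is halved when the quota service signals that a grant would exceed it. The abstract does not specify this policy. `QuotaService` and `TenantController` are hypothetical stand-ins, not a real API.

```python
from dataclasses import dataclass


@dataclass
class QuotaService:
    """Hypothetical quota service tracking leased units against one tenant's quota."""
    quota: int
    leased: int = 0

    def request_lease(self, units: int) -> bool:
        """Feedback signal: False if granting would exceed the quota, else grant it."""
        if self.leased + units > self.quota:
            return False
        self.leased += units
        return True

    def release(self, units: int) -> None:
        """Return units to the pool when a task finishes."""
        self.leased = max(0, self.leased - units)


class TenantController:
    """AIMD-style parallelism control (an assumed policy, not the patented method).
    `parallelism` is the cap on concurrently dispatched tasks for the tenant."""

    def __init__(self, service: QuotaService, max_parallel: int = 64):
        self.service = service
        self.parallelism = 1
        self.max_parallel = max_parallel

    def submit(self, task_units: int) -> bool:
        granted = self.service.request_lease(task_units)
        if granted:
            # additive increase: probe for more headroom while leases keep fitting
            self.parallelism = min(self.max_parallel, self.parallelism + 1)
        else:
            # multiplicative decrease: back off when a grant would exceed the quota
            self.parallelism = max(1, self.parallelism // 2)
        return granted


# Example: a 100-unit quota and five lease requests of 30 units each.
svc = QuotaService(quota=100)
ctl = TenantController(svc)
results = [ctl.submit(30) for _ in range(5)]
```

    With these numbers the first three leases are granted (90 units leased, parallelism climbs to 4). The next two are refused, which halves parallelism to 2 and then to 1. Additive increase keeps utilization close to the quota, while halving sheds load quickly once the limit is reached.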

    REQUEST SEGMENTATION FOR REDUCED MEMORY CONSUMPTION BY TRAINED SEQUENTIAL MODELS

    Publication No.: US20250094233A1

    Publication Date: 2025-03-20

    Application No.: US18470827

    Filing Date: 2023-09-20

    Abstract: A disclosed method reduces memory consumption of a trained sequential model. The method includes receiving, from a client application, an initial processing request identifying an input sequence to be processed by the trained sequential model and an initial value for an output size parameter specifying a requested size of output from the trained sequential model. The method further includes sequentially transmitting, to the trained sequential model, multiple partial processing requests based on the initial processing request that each specify a fraction of the initial value as the output size parameter and receiving a sequence of output responses from the trained sequential model generated in response to processing the multiple partial processing requests. The method further provides for returning, to the client application, a final merged response that includes the sequence of output responses.
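    The segmentation idea can be sketched as follows. The sketch assumes that a sequential model reserves memory in proportion to the requested output size. Under that assumption, several smaller requests issued one after another cap the peak reservation. It also assumes each partial request should see the original input plus the output generated so far, so that the merged result reads as one continuous response. The model interface `(text, max_tokens) -> text`, the `segmented_generate` helper, and the toy model that emits one character per "token" are all hypothetical.

```python
from typing import Callable, List

# Hypothetical model interface: (input text, output size) -> generated text
Model = Callable[[str, int], str]


def segmented_generate(model: Model, prompt: str, max_tokens: int, parts: int = 4) -> str:
    """Split one request for `max_tokens` of output into `parts` sequential partial
    requests, each asking for a fraction of the budget, then merge the outputs."""
    if parts < 1 or max_tokens < 1:
        raise ValueError("parts and max_tokens must be positive")
    base, extra = divmod(max_tokens, parts)
    budgets = [base + (1 if i < extra else 0) for i in range(parts)]
    outputs: List[str] = []
    for budget in budgets:
        if budget == 0:
            continue
        # each partial request continues from the input plus everything generated so far
        chunk = model(prompt + "".join(outputs), budget)
        outputs.append(chunk)
        if not chunk:  # model stopped early; skip the remaining partial requests
            break
    return "".join(outputs)


# Toy model: emits n letters continuing the alphabet from the current text length.
calls: List[int] = []


def toy_model(text: str, n: int) -> str:
    calls.append(n)
    start = len(text)
    return "".join(chr(ord("a") + (start + i) % 26) for i in range(n))


merged = segmented_generate(toy_model, "", max_tokens=10, parts=4)
```

    In this example the output budget of 10 is split into partial requests of 3, 3, 2 and 2. The merged response is "abcdefghij", identical to what a single 10-token request to the toy model would return, while no individual request asks for more than 3 tokens.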
