RULE ENABLED COMPOSITIONAL REASONING SYSTEM
    Invention Application

    Publication Number: WO2022060574A1

    Publication Date: 2022-03-24

    Application Number: PCT/US2021/048811

    Filing Date: 2021-09-02

    Abstract: A computer-implemented method is provided for compositional reasoning. The method includes producing (320) a set of primitive predictions from an input sequence. Each of the primitive predictions is of a single action of a tracked subject to be composed in a complex action comprising multiple single actions. The method further includes performing (330) contextual rule filtering of the primitive predictions to pass through filtered primitive predictions that interact with one or more entities of interest in the input sequence with respect to predefined contextual interaction criteria. The method includes performing (340), by a processor device, temporal rule matching by matching the filtered primitive predictions according to predefined temporal rules to identify complex event patterns in the sequence of primitive predictions.
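    The three-stage pipeline described above (primitive predictions, contextual rule filtering, temporal rule matching) can be illustrated with a toy example. Every function name, data format, and rule below is an assumption for illustration only, not the patented design:

```python
# Hypothetical sketch: primitive predictions -> contextual rule filtering
# -> temporal rule matching. Rule/record formats are illustrative assumptions.

def filter_by_context(predictions, entities_of_interest):
    """Pass through only predictions that interact with an entity of interest."""
    return [p for p in predictions if p["object"] in entities_of_interest]

def match_temporal_rule(predictions, rule):
    """Match a complex event if the rule's single actions occur in order."""
    idx = 0
    for p in sorted(predictions, key=lambda p: p["t"]):
        if p["action"] == rule[idx]:
            idx += 1
            if idx == len(rule):
                return True
    return False

primitives = [
    {"t": 0, "action": "approach", "object": "door"},
    {"t": 1, "action": "wave", "object": "friend"},
    {"t": 2, "action": "open", "object": "door"},
    {"t": 3, "action": "enter", "object": "door"},
]
filtered = filter_by_context(primitives, {"door"})
entry_rule = ["approach", "open", "enter"]  # a complex action pattern
print(match_temporal_rule(filtered, entry_rule))  # True
```

    Here the "wave" primitive is dropped by contextual filtering because it does not involve the entity of interest, and the remaining primitives match the ordered temporal rule.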

    ACTION RECOGNITION WITH HIGH-ORDER INTERACTION THROUGH SPATIAL-TEMPORAL OBJECT TRACKING

    Publication Number: WO2021050772A1

    Publication Date: 2021-03-18

    Application Number: PCT/US2020/050254

    Filing Date: 2020-09-10

    Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, the systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for action recognition (AR). An object detector is applied to each frame to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to the model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction, which is formulated as an efficient matrix operation without iterative processing delay.
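    The key claim, that inter-track interaction can be formulated as a matrix operation rather than an iterative pairwise loop, can be sketched as follows. The dimensions, pooling, and dot-product attention here are illustrative assumptions, not the patented model:

```python
import numpy as np

# Illustrative sketch: object tracks as embedding matrices, with inter-track
# interaction computed in one matrix operation (dot-product attention)
# instead of iterating over track pairs. All shapes are assumptions.

rng = np.random.default_rng(0)
T, N, D = 8, 5, 16                     # frames, object tracks, embedding dim
tracks = rng.normal(size=(N, T, D))    # one embedding per track per frame

track_emb = tracks.mean(axis=1)                     # (N, D) pooled over time
scores = track_emb @ track_emb.T / np.sqrt(D)       # (N, N) interaction scores
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
video_feature = (weights @ track_emb).mean(axis=0)  # (D,) video-level feature
print(video_feature.shape)  # (16,)
```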

    SELF-SUPERVISED SEQUENTIAL VARIATIONAL AUTOENCODER FOR DISENTANGLED DATA GENERATION

    Publication Number: WO2021096739A1

    Publication Date: 2021-05-20

    Application Number: PCT/US2020/058857

    Filing Date: 2020-11-04

    Abstract: A computer-implemented method is provided for disentangled data generation. The method includes accessing (410), by a variational autoencoder, a plurality of supervision signals. The method further includes accessing (420), by the variational autoencoder, a plurality of auxiliary tasks that utilize the supervision signals as reward signals to learn a disentangled representation. The method also includes training (430) the variational autoencoder to disentangle a sequential data input into a time-invariant factor and a time-varying factor using a self-supervised training approach which is based on outputs of the auxiliary tasks obtained by using the supervision signals to accomplish the plurality of auxiliary tasks.
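    The core factorization, one time-invariant factor shared across a sequence plus a time-varying factor per step, can be shown with a toy generative process. This is an assumption-level illustration of what "disentangling" targets, not the patented model or its training procedure:

```python
import numpy as np

# Toy illustration: a sequence generated from a time-invariant factor f
# (e.g. content/identity) plus a time-varying factor z_t per step (e.g.
# motion). Disentanglement aims to recover the two factors independently.

rng = np.random.default_rng(1)
T, D = 6, 4
f = rng.normal(size=D)          # time-invariant factor, shared by all steps
z = rng.normal(size=(T, D))     # time-varying factor, one per step
x = f[None, :] + z              # observed sequence, shape (T, D)

# A disentangled encoder would invert this factorization; here we use the
# known generative process to demonstrate the split.
f_hat = x.mean(axis=0) - z.mean(axis=0)
print(np.allclose(f_hat, f))  # True
```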

    SPATIO-TEMPORAL INTERACTIONS FOR VIDEO UNDERSTANDING

    Publication Number: WO2021050769A1

    Publication Date: 2021-03-18

    Application Number: PCT/US2020/050251

    Filing Date: 2020-09-10

    Abstract: Aspects of the present disclosure describe systems, methods, and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time are learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension, and then providing them to a transformer network that learns higher-order relationship(s) between them. To effectively learn object-like representations, we 1) combine channels from the first and last convolutional layers in the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
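    The two numbered steps, combining channels from early and late layers and grouping channels by similarity, can be sketched with array operations. Layer sizes and the cosine-similarity grouping below are assumptions for illustration, not the disclosed network:

```python
import numpy as np

# Sketch under assumptions: 2D CNN channels (feature maps) are treated as
# object-like tokens, flattened to a common size, and compared by cosine
# similarity as a crude stand-in for clustering channels per object type.

rng = np.random.default_rng(2)
C1, C2, H, W = 4, 6, 8, 8
first_layer = rng.normal(size=(C1, H, W))   # channels from an early conv layer
last_layer = rng.normal(size=(C2, H, W))    # channels from the last conv layer

# 1) Combine channels from the first and last convolutional layers.
channels = np.concatenate([first_layer, last_layer], axis=0)  # (C1+C2, H, W)

# 2) Flatten each channel to a token and compute pairwise cosine similarity,
#    which a clustering step could use to group same-object channels.
tokens = channels.reshape(channels.shape[0], -1)              # (10, 64)
tokens /= np.linalg.norm(tokens, axis=1, keepdims=True)
similarity = tokens @ tokens.T                                # (10, 10)
print(tokens.shape, similarity.shape)
```

    The resulting tokens would then be fed to a transformer to learn higher-order relationships between them.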

    SPATIO-TEMPORAL INTERACTION NETWORK FOR LEARNING OBJECT INTERACTIONS

    Publication Number: WO2019013913A1

    Publication Date: 2019-01-17

    Application Number: PCT/US2018/036814

    Filing Date: 2018-06-11

    Abstract: Systems and methods for improving video understanding tasks based on higher-order object interactions (HOIs) between object features are provided. A plurality of frames of a video are obtained. A coarse-grained feature representation is generated by generating an image feature for each of a plurality of timesteps respectively corresponding to each of the frames and performing attention based on the image features. A fine-grained feature representation is generated by generating an object feature for each of the plurality of timesteps and generating the HOIs between the object features. The coarse-grained and the fine-grained feature representations are concatenated to generate a concatenated feature representation.
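    The two-branch structure, attended image features (coarse) plus object-interaction features (fine), concatenated at the end, can be sketched as follows. Dimensions, the attention form, and the interaction pooling are illustrative assumptions, not the claimed method:

```python
import numpy as np

# Sketch: a coarse-grained branch (attention over per-frame image features)
# and a fine-grained branch (pairwise object interactions), concatenated.

rng = np.random.default_rng(3)
T, N, D = 5, 3, 8                       # timesteps, objects/frame, feature dim
image_feats = rng.normal(size=(T, D))   # one image feature per timestep
object_feats = rng.normal(size=(T, N, D))

# Coarse-grained: attention over the per-frame image features.
logits = image_feats @ rng.normal(size=D)        # (T,) attention logits
attn = np.exp(logits) / np.exp(logits).sum()
coarse = attn @ image_feats                      # (D,)

# Fine-grained: pairwise (higher-order) object interactions, pooled.
inter = np.einsum("tnd,tmd->tnm", object_feats, object_feats)       # (T, N, N)
fine = (inter[..., None] * object_feats[:, None]).sum((0, 1, 2)) / (T * N * N)

video_feature = np.concatenate([coarse, fine])   # (2 * D,)
print(video_feature.shape)  # (16,)
```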

    SEMI-AUTOMATIC DATA COLLECTION AND ASSOCIATION FOR MULTI-CAMERA TRACKING

    Publication Number: WO2022250970A1

    Publication Date: 2022-12-01

    Application Number: PCT/US2022/028945

    Filing Date: 2022-05-12

    Abstract: A surveillance system is provided. The surveillance system includes a processor device (110) configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected ones of the persons a local track id, (ii) associating a same person in overlapping camera views to a global track id and collecting associated track boxes as the same person moves in different camera views over time using a priority queue and the local track id and the global track id, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of a same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
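    The local-to-global id association in steps (i) and (ii) can be illustrated with a minimal bookkeeping sketch. The data structures and the `associate` helper are hypothetical, not the patented system:

```python
# Illustrative sketch: per-camera local track ids are mapped to a global id,
# reusing an existing global id when the same person is matched in an
# overlapping camera view. Names and structures are assumptions.

from itertools import count

global_ids = count(1)
local_to_global = {}   # (camera, local_id) -> global id

def associate(camera, local_id, overlapping_match=None):
    """Assign a global id; reuse the match's id for overlapping views."""
    key = (camera, local_id)
    if key in local_to_global:
        return local_to_global[key]
    if overlapping_match is not None and overlapping_match in local_to_global:
        gid = local_to_global[overlapping_match]   # same person, other camera
    else:
        gid = next(global_ids)                     # newly seen person
    local_to_global[key] = gid
    return gid

a = associate("cam1", 7)                                  # new person
b = associate("cam2", 3, overlapping_match=("cam1", 7))   # same person
c = associate("cam3", 9)                                  # different person
print(a, b, c)  # 1 1 2
```

    Track boxes collected under one global id across cameras then provide the matched pairs used to derive the spatial transformation in step (iii).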

    MULTI-HOP TRANSFORMER FOR SPATIO-TEMPORAL REASONING AND LOCALIZATION

    Publication Number: WO2022066388A1

    Publication Date: 2022-03-31

    Application Number: PCT/US2021/048832

    Filing Date: 2021-09-02

    Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting (1001) feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing (1003) object representation learning and detection, linking (1005) objects through time via tracking to generate object tracks and image feature tracks, feeding (1007) the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing (1009) video representation learning and recognition from the objects and image context to locate a target object within the video stream.
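    The "hopping" idea, a query that repeatedly attends over frame features and refines itself across several passes rather than answering in one, can be sketched minimally. The update rule and all dimensions are assumptions, not the disclosed multi-hop transformer:

```python
import numpy as np

# Minimal sketch of multi-hop attention: a query vector attends over
# per-frame features and is refined on each hop. Illustrative only.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(4)
T, D, hops = 10, 8, 3
frames = rng.normal(size=(T, D))   # frame-level features (e.g. from a CNN)
query = rng.normal(size=D)         # question / target-object query

for _ in range(hops):              # hop over frames, updating the query
    attn = softmax(frames @ query / np.sqrt(D))   # (T,) attention weights
    query = query + attn @ frames                 # attend, then refine

print(query.shape)  # (8,)
```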

    KEYPOINT BASED POSE-TRACKING USING ENTAILMENT

    Publication Number: WO2021050773A1

    Publication Date: 2021-03-18

    Application Number: PCT/US2020/050255

    Filing Date: 2020-09-10

    Abstract: Aspects of the present disclosure describe systems, methods, and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step, without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms commonly used bounding box propagation in top-down methods. Finally, we disclose tokenization and embedding of multi-person pose keypoint information in the transformer architecture that can be reused for other pose tasks such as pose-based action recognition.
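    A parameter-free, keypoint-only tracking step can be illustrated by greedy nearest-neighbor linking of poses across frames. This is an assumption-level stand-in for the disclosed entailment-based matcher, showing only that keypoints alone (no optical flow, no convolutions) suffice to link tracks:

```python
import numpy as np

# Keypoint-only tracking sketch: link poses across frames by greedy nearest
# mean keypoint distance. Illustrative, not the patented entailment model.

def pose_distance(p, q):
    """Mean Euclidean distance between corresponding keypoints."""
    return float(np.linalg.norm(p - q, axis=1).mean())

def link(prev_poses, curr_poses):
    """Greedily assign each current pose to the nearest previous track."""
    assignments = {}
    for j, q in enumerate(curr_poses):
        i = min(range(len(prev_poses)),
                key=lambda i: pose_distance(prev_poses[i], q))
        assignments[j] = i
    return assignments

rng = np.random.default_rng(5)
prev = [rng.normal(size=(17, 2)) for _ in range(2)]   # 17 COCO-style keypoints
curr = [prev[1] + 0.01, prev[0] + 0.01]               # slight motion, swapped
print(link(prev, curr))  # {0: 1, 1: 0}
```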
