Zixi Cai

CV · 3 papers · 282 citations

3 Papers

DC · Jun 3
Clownfish: Scaling DAG-based BFT Consensus via Sparse Edges

Feifan Wang, Jingfan Yu, Zixi Cai et al.

Directed Acyclic Graph (DAG)-based BFT protocols have demonstrated significantly high throughput in practice. Recent advances have focused on minimizing the good-case latency of these protocols, approaching the theoretical lower bound. However, the high communication complexity inherent in existing DAG-based protocols limits their scalability. This complexity arises primarily because each vertex in the DAG must include a linear number of edges (references) to vertices from previous rounds. We present Clownfish, a partially synchronous DAG-based BFT protocol designed to address this scalability bottleneck. Clownfish achieves lower communication complexity by selectively reducing the number of edges in DAG vertices. When using a communication-optimal consistent broadcast, Clownfish attains quadratic total communication complexity per round, outperforming prior DAG-based protocols. Clownfish also reduces the additional latency incurred in failure cases by optimizing the round advancement rule. Additionally, Clownfish supports multiple leaders per round to reduce average latency while maintaining its lower communication complexity. Our experimental evaluation demonstrates that Clownfish provides significantly better scalability than existing DAG-based protocols.
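The scaling argument above can be illustrated with a back-of-the-envelope count. This is a simplified model, not Clownfish's actual edge-selection rule: it assumes n = 3f + 1 parties, one vertex per party per round, every vertex delivered to all n parties, and it counts only references (ignoring payload and consistent-broadcast overhead). The function names and the constant per-vertex reference budget `k` are hypothetical.

```python
def dense_refs_per_round(n: int, f: int) -> int:
    # Each of the n vertices in a round references n - f vertices from the
    # previous round (a linear number of edges), and every vertex is
    # delivered to all n parties.
    return n * (n - f) * n


def sparse_refs_per_round(n: int, k: int) -> int:
    # Hypothetical sparse rule (an assumption, not the paper's rule):
    # each vertex carries only k references, where k does not grow with n.
    return n * k * n


for n in (4, 16, 64, 256):
    f = (n - 1) // 3  # largest f satisfying n >= 3f + 1
    print(n, dense_refs_per_round(n, f), sparse_refs_per_round(n, k=2))
```

Under these assumptions the dense count grows cubically in n while the sparse count grows quadratically, which mirrors the abstract's claim of quadratic per-round communication; the protocol's real bounds depend on how edges are selected and on the broadcast primitive used.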

CV · Dec 22, 2018
Temporal Hockey Action Recognition via Pose and Optical Flows

Zixi Cai, Helmut Neher, Kanav Vats et al.

Recognizing actions in ice hockey using computer vision is challenging because of bulky equipment and inadequate image quality. A novel two-stream framework with three main components has been designed to improve action recognition accuracy in hockey. First, pose is estimated with the Part Affinity Fields model to extract meaningful cues from the player. Second, optical flow (computed with LiteFlowNet) is used to extract temporal features. Third, the pose and optical flow streams are fused and passed to fully-connected layers to estimate the hockey player's action. A new, publicly available dataset named HARPET (Hockey Action Recognition Pose Estimation, Temporal) was created, composed of annotated sequences of hockey players' actions and poses, with the hockey stick included as an extension of the human body pose. The work makes three contributions. (1) The novel two-stream architecture achieves 85% action recognition accuracy, with the inclusion of optical flow increasing accuracy by about 10%. (2) The unique localization of hand-held objects (e.g., hockey sticks) as part of the pose increases accuracy by about 13%. (3) For pose estimation, a larger and more general dataset, MSCOCO, is successfully used for transfer learning to the smaller and more specific HARPET dataset, achieving a PCKh of 87%.
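The fusion step can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes late fusion by concatenating a pose feature vector with an optical-flow feature vector, followed by two fully-connected layers with a ReLU in between and a softmax over action classes. Feature sizes, weights, class count, and the function names (`fuse_and_classify` and helpers) are hypothetical; in practice the features would come from the Part Affinity Fields and LiteFlowNet streams rather than hand-written lists.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # Fully-connected layer: y[j] = sum_i w[j][i] * x[i] + b[j]
    return [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(pose_feat: Vector, flow_feat: Vector,
                      w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    fused = pose_feat + flow_feat           # late fusion by list concatenation
    hidden = relu(linear(fused, w1, b1))    # first fully-connected layer
    return softmax(linear(hidden, w2, b2))  # per-class probabilities


# Toy example with two hypothetical action classes.
pose_feat = [1.0, 0.0]
flow_feat = [0.0, 1.0]
w1 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # 4x4 identity
b1 = [0.0] * 4
w2 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 2.0]]
b2 = [0.0, 0.0]
probs = fuse_and_classify(pose_feat, flow_feat, w1, b1, w2, b2)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)  # class 1 receives the higher probability
```

Concatenation keeps each stream's features intact and lets the fully-connected layers learn how to weight pose against motion cues; other fusion schemes (averaging scores, attention) are equally plausible and the abstract does not specify which one is used.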

CV · Mar 28, 2018
Pose2Seg: Detection Free Human Instance Segmentation

Song-Hai Zhang, Ruilong Li, Xin Dong et al.

The standard approach to image instance segmentation is to first perform object detection and then segment the object within the detection bounding box. More recently, deep learning methods like Mask R-CNN perform both steps jointly. However, little research takes into account the uniqueness of the "human" category, which can be well defined by the pose skeleton. Moreover, the human pose skeleton can distinguish instances under heavy occlusion better than bounding boxes can. In this paper, we present a new pose-based instance segmentation framework for humans which separates instances based on human pose rather than on region-proposal detection. We demonstrate that our pose-based framework can achieve better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and can moreover better handle occlusion. Furthermore, there are few public datasets containing many heavily occluded humans along with comprehensive annotations, which makes this a challenging problem seldom noticed by researchers. Therefore, in this paper we introduce a new benchmark, "Occluded Human (OCHuman)", which focuses on occluded humans with comprehensive annotations including bounding boxes, human poses, and instance masks. This dataset contains 8,110 thoroughly annotated human instances within 4,731 images. With an average MaxIoU of 0.67 per person, OCHuman is the most complex and challenging dataset related to human instance segmentation. With this dataset, we aim to highlight occlusion as a challenging problem for researchers to study.
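The abstract reports MaxIoU without defining it. A common reading, assumed here, is that a person's MaxIoU is the largest bounding-box IoU between that person and any other person in the same image, so higher values indicate heavier overlap. The sketch below computes it for a single image under that assumption; the `(x1, y1, x2, y2)` box format and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    # Width and height of the intersection rectangle (0 if the boxes do not overlap)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def max_iou_per_person(boxes: List[Box]) -> List[float]:
    # For each person, the largest IoU with any *other* person in the image;
    # a person who is alone in the image gets 0.0.
    return [
        max((iou(box, other) for j, other in enumerate(boxes) if j != i), default=0.0)
        for i, box in enumerate(boxes)
    ]


people = [(0, 0, 2, 2), (1, 0, 3, 2), (10, 10, 11, 11)]
scores = max_iou_per_person(people)
print(scores, sum(scores) / len(scores))  # per-person MaxIoU and their mean
```

Averaging these per-person values over every annotated person would give a dataset-level figure of the kind quoted above, assuming the same definition.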