CVOct 23, 2023Code
GRLib: An Open-Source Hand Gesture Detection and Recognition Python LibraryJan Warchocki, Mikhail Vlasenko, Yke Bauke Eisma
Hand gesture recognition systems provide a natural way for humans to interact with computer systems. Although various algorithms have been designed for this task, a host of external conditions, such as poor lighting or distance from the camera, make it difficult to create an algorithm that performs well across a range of environments. In this work, we present GRLib: an open-source Python library able to detect and classify static and dynamic hand gestures. Moreover, the library can be trained on existing data for improved classification robustness. The proposed solution utilizes a feed from an RGB camera. The retrieved frames are then subjected to data augmentation and passed on to MediaPipe Hands to perform hand landmark detection. The landmarks are then classified into their respective gesture class. The library supports dynamic hand gestures through trajectories and keyframe extraction. It was found that the library outperforms another publicly available HGR system - MediaPipe Solutions, on three diverse, real-world datasets. The library is available at https://github.com/mikhail-vlasenko/grlib and can be installed with pip.
CYSep 19, 2024
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics examJoost de Winter, Dimitra Dodou, Yke Bauke Eisma
The processes underlying human cognition are often divided into System 1, which involves fast, intuitive thinking, and System 2, which involves slow, deliberate reasoning. Previously, large language models were criticized for lacking the deeper, more analytical capabilities of System 2. In September 2024, OpenAI introduced the o1 model series, designed to handle System 2-like reasoning. While OpenAI's benchmarks are promising, independent validation is still needed. In this study, we tested the o1-preview model twice on the Dutch 'Mathematics B' final exam. It scored a near-perfect 76 and 74 out of 76 points. For context, only 24 out of 16,414 students in the Netherlands achieved a perfect score. By comparison, the GPT-4o model scored 66 and 62 out of 76, well above the Dutch students' average of 40.63 points. Neither model had access to the exam figures. Since there was a risk of model contami-nation (i.e., the knowledge cutoff for o1-preview and GPT-4o was after the exam was published online), we repeated the procedure with a new Mathematics B exam that was published after the cutoff date. The results again indicated that o1-preview performed strongly (97.8th percentile), which suggests that contamination was not a factor. We also show that there is some variability in the output of o1-preview, which means that sometimes there is 'luck' (the answer is correct) or 'bad luck' (the output has diverged into something that is incorrect). We demonstrate that the self-consistency approach, where repeated prompts are given and the most common answer is selected, is a useful strategy for identifying the correct answer. It is concluded that while OpenAI's new model series holds great potential, certain risks must be considered.
CVJul 15, 2024
Anticipating Future Object Compositions without ForgettingYoussef Zahran, Gertjan Burghouts, Yke Bauke Eisma
Despite the significant advancements in computer vision models, their ability to generalize to novel object-attribute compositions remains limited. Existing methods for Compositional Zero-Shot Learning (CZSL) mainly focus on image classification. This paper aims to enhance CZSL in object detection without forgetting prior learned knowledge. We use Grounding DINO and incorporate Compositional Soft Prompting (CSP) into it and extend it with Compositional Anticipation. We achieve a 70.5% improvement over CSP on the harmonic mean (HM) between seen and unseen compositions on the CLEVR dataset. Furthermore, we introduce Contrastive Prompt Tuning to incrementally address model confusion between similar compositions. We demonstrate the effectiveness of this method and achieve an increase of 14.5% in HM across the pretrain, increment, and unseen sets. Collectively, these methods provide a framework for learning various compositions with limited data, as well as improving the performance of underperforming compositions when additional data becomes available.