psifx -- Psychological and Social Interactions Feature Extraction Package
This work addresses the need for efficient, accessible data processing in psychological and social science research. Its contribution is incremental, since it packages existing methods into a toolkit.
The authors address the problem of automating and standardizing data annotation for human sciences research by developing psifx, a multi-modal feature extraction toolkit with tools for audio, video, and text analysis. The result is a plug-and-play package designed to democratize access to state-of-the-art machine learning techniques.
psifx is a plug-and-play multi-modal feature extraction toolkit that aims to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research. It is motivated by a need (a) to automate and standardize data annotation processes that typically require expensive, lengthy, and inconsistent human labour; (b) to develop and distribute open-source community-driven psychology research software; and (c) to enable large-scale access and ease of use for non-expert users. The framework contains an array of tools for tasks such as speaker diarization, closed-caption transcription and translation from audio; body, hand, and facial pose estimation and gaze tracking with multi-person tracking from video; and interactive textual feature extraction supported by large language models. The package has been designed with a modular and task-oriented approach, enabling the community to add new tools or update existing ones easily. This combination of audio, video, and text tools creates new opportunities for in-depth study of real-time behavioral phenomena in psychological and social science research.