CV · Dec 18, 2024
Real-Time Position-Aware View Synthesis from Single-View Input
Manu Gond, Emin Zerman, Sebastian Knorr et al.
Recent advancements in view synthesis have significantly enhanced immersive experiences across various computer graphics and multimedia applications, including telepresence and entertainment. By enabling the generation of new perspectives from a single input view, view synthesis allows users to better perceive and interact with their environment. However, many state-of-the-art methods, while achieving high visual quality, face limitations in real-time performance, which makes them less suitable for live applications where low latency is critical. In this paper, we present a lightweight, position-aware network designed for real-time view synthesis from a single input image and a target camera pose. The proposed framework consists of a Position Aware Embedding, which efficiently maps positional information from the target pose to high-dimensional feature maps. These feature maps, along with the input image, are fed into a Rendering Network that merges features from dual encoder branches to resolve both high- and low-level details, producing a realistic new view of the scene. Experimental results demonstrate that our method achieves superior efficiency and visual quality compared to existing approaches, particularly in handling complex translational movements without explicit geometric operations like warping. This work marks a step toward enabling real-time live and interactive telepresence applications.
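To make the idea of mapping a target pose to feature maps concrete, the following is a minimal sketch only, not the authors' network. The function name `position_aware_embedding`, the six-component pose vector, the sinusoidal encoding, and the random untrained linear layer are all assumptions introduced for illustration; the abstract does not specify any of them.

```python
import numpy as np


def position_aware_embedding(pose, height=8, width=8, channels=16,
                             num_freqs=4, seed=0):
    """Hypothetical stand-in: map a pose vector to a (C, H, W) feature map."""
    pose = np.asarray(pose, dtype=np.float64).ravel()

    # Assumption: encode each pose component with sines and cosines
    # at a few octave-spaced frequencies.
    freqs = 2.0 ** np.arange(num_freqs)
    angles = pose[:, None] * freqs[None, :]
    encoded = np.concatenate([np.sin(angles).ravel(), np.cos(angles).ravel()])

    # Assumption: a fixed random linear layer stands in for learned weights.
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((channels * height * width, encoded.size))
    weights /= np.sqrt(encoded.size)

    # Squash to [-1, 1] and reshape the flat vector into feature maps.
    features = np.tanh(weights @ encoded)
    return features.reshape(channels, height, width)


# Example: a hypothetical pose of three translation and three rotation values.
feature_maps = position_aware_embedding([0.1, 0.0, -0.2, 0.0, 0.0, 0.0])
```

In a full system, feature maps like these would be passed to the rendering stage together with the input image; the weights here are untrained and only show the shape of the data flow.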
HC · Jul 1, 2025
Scope Meets Screen: Lessons Learned in Designing Composite Visualizations for Marksmanship Training Across Skill Levels
Emin Zerman, Jonas Carlsson, Mårten Sjöström
Marksmanship practice is required of police, military personnel, and hunters, as well as sport shooters in disciplines such as Olympic shooting, biathlon, and modern pentathlon. The current form of training and coaching is mostly based on repetition, where the coach does not see through the eyes of the shooter, and analysis is limited to stance and accuracy post-session. In this study, we present a shooting visualization system and evaluate its perceived effectiveness for both novice and expert shooters. To achieve this, five composite visualizations were developed using first-person shooting video recordings enriched with overlaid metrics and graphical summaries. These views were evaluated with 10 participants (5 expert marksmen, 5 novices) through a mixed-methods study including shot-count and aiming interpretation tasks, pairwise preference comparisons, and semi-structured interviews. The results show that a dashboard-style composite view, combining raw video with a polar plot and selected graphs, was preferred in 9 of 10 cases and supported understanding across skill levels. The insights gained from this design study point to the broader value of integrating first-person video with visual analytics for coaching, and we suggest directions for applying this approach to other precision-based sports.
CV · Aug 7, 2020
A Study on Visual Perception of Light Field Content
Ailbhe Gill, Emin Zerman, Cagri Ozcinar et al.
The effective design of visual computing systems depends heavily on the anticipation of visual attention, or saliency. While visual attention is well investigated for conventional 2D images and video, it remains a very active research area for emerging immersive media. In particular, visual attention of light fields (light rays of a scene captured by a grid of cameras or micro lenses) has only recently become a focus of research. As light fields may be rendered and consumed in various ways, a primary challenge is defining what visual perception of light field content should be. In this work, we present a visual attention study on light field content. We conducted perception experiments displaying light fields to users in various ways and collected corresponding visual attention data. Our analysis highlights characteristics of user behaviour in light field imaging applications. The light field data set and attention data are provided with this paper.
HC · Jul 17, 2020
A Case Study on Video Color Transfer: Exploring User Motivations, Expectations, and Satisfaction
Mairéad Grogan, Emin Zerman, Gareth W. Young et al.
Multimedia and creativity software products are being used to edit and control various elements of creative media practices. These days, the technical affordances of mobile multimedia devices and the advent of high-speed 5G internet access mean that these abilities are simpler and more readily available to be harnessed by mobile applications. In this paper, using a prototype application, we discuss how potential users of such technology are motivated to use a video recoloring application and explore the role that user expectation and satisfaction play in this process. By exploring this topic and focusing on the human-computer interaction, we found that color transfer interactions are driven by several intrinsic motivations and that user expectations and satisfaction ratings can be maintained via clear visualizations of the processes to be undertaken. Furthermore, we reveal the specific language that users employ to describe video recoloring with respect to their motivations, expectations, and satisfaction. This research provides important information for developers of state-of-the-art recoloring processes and contributes to dialogues surrounding the users of mobile multimedia technology in practice.
MM · Aug 22, 2019
ColorNet -- Estimating Colorfulness in Natural Images
Emin Zerman, Aakanksha Rana, Aljosa Smolic
Measuring the colorfulness of a natural or virtual scene is critical for many applications in the image processing field, ranging from capture to display. In this paper, we propose the first deep learning-based colorfulness estimation metric. For this purpose, we develop a color rating model which simultaneously learns to extract the pertinent characteristic color features and the mapping from feature space to the ideal colorfulness scores for a variety of natural colored images. Additionally, we propose to overcome the lack of an adequately annotated dataset by combining and aligning two publicly available colorfulness databases using the results of a new subjective test which employs a common subset of both databases. Using the resulting subjectively annotated dataset of 180 colored images, we finally demonstrate the efficacy of our proposed model over traditional methods, both quantitatively and qualitatively.
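For context on what a traditional, hand-crafted colorfulness measure looks like, here is a minimal sketch of the widely used Hasler and Süsstrunk metric. The abstract does not name the traditional methods it compares against, so treating this metric as one of those baselines is an assumption, and the function name is hypothetical. This is not the ColorNet model.

```python
import numpy as np


def hasler_susstrunk_colorfulness(image):
    """Hand-crafted colorfulness score for an RGB image of shape (H, W, 3)."""
    rgb = np.asarray(image, dtype=np.float64)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")

    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Opponent color channels: red-green and yellow-blue.
    rg = r - g
    yb = 0.5 * (r + g) - b

    # Combine the spread and the mean offset of the two opponent channels.
    std_term = np.hypot(rg.std(), yb.std())
    mean_term = np.hypot(rg.mean(), yb.mean())
    return float(std_term + 0.3 * mean_term)


# Example: a uniform gray image has no chroma, so its score is zero.
gray_score = hasler_susstrunk_colorfulness(np.full((4, 4, 3), 128))
```

A learned metric such as the one proposed replaces these fixed statistics with features and a score mapping trained on subjective ratings.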