HCSep 6, 2022
The HoloLens in Medicine: A systematic Review and TaxonomyChristina Gsaxner, Jianning Li, Antonio Pepe et al.
The HoloLens (Microsoft Corp., Redmond, WA), a head-worn, optically see-through augmented reality display, is the main player in the recent boost in medical augmented reality research. In medical settings, the HoloLens enables the physician to obtain immediate insight into patient information, directly overlaid with their view of the clinical scenario, the medical student to gain a better understanding of complex anatomies or procedures, and even the patient to execute therapeutic tasks with improved, immersive guidance. In this systematic review, we provide a comprehensive overview of the usage of the first-generation HoloLens within the medical domain, from its release in March 2016, until the year of 2021, were attention is shifting towards it's successor, the HoloLens 2. We identified 171 relevant publications through a systematic search of the PubMed and Scopus databases. We analyze these publications in regard to their intended use case, technical methodology for registration and tracking, data sources, visualization as well as validation and evaluation. We find that, although the feasibility of using the HoloLens in various medical scenarios has been shown, increased efforts in the areas of precision, reliability, usability, workflow and perception are necessary to establish AR in clinical practice.
CVMay 11, 2018Code
Clinical evaluation of semi-automatic opensource algorithmic software segmentation of the mandibular bone: Practical feasibility and assessment of a new course of actionJürgen Wallner, Kerstin Hochegger, Xiaojun Chen et al.
Computer assisted technologies based on algorithmic software segmentation are an increasing topic of interest in complex surgical cases. However - due to functional instability, time consuming software processes, personnel resources or licensed-based financial costs many segmentation processes are often outsourced from clinical centers to third parties and the industry. Therefore, the aim of this trial was to assess the practical feasibility of an easy available, functional stable and licensed-free segmentation approach to be used in the clinical practice. In this retrospective, randomized, controlled trail the accuracy and accordance of the open-source based segmentation algorithm GrowCut (GC) was assessed through the comparison to the manually generated ground truth of the same anatomy using 10 CT lower jaw data-sets from the clinical routine. Assessment parameters were the segmentation time, the volume, the voxel number, the Dice Score (DSC) and the Hausdorff distance (HD). Overall segmentation times were about one minute. Mean DSC values of over 85% and HD below 33.5 voxel could be achieved. Statistical differences between the assessment parameters were not significant (p<0.05) and correlation coefficients were close to the value one (r > 0.94). Complete functional stable and time saving segmentations with high accuracy and high positive correlation could be performed by the presented interactive open-source based approach. In the cranio-maxillofacial complex the used method could represent an algorithmic alternative for image-based segmentation in the clinical practice for e.g. surgical treatment planning or visualization of postoperative results and offers several advantages. Systematic comparisons to other segmentation approaches or with a greater data amount are areas of future works.
GRJul 1, 2025
A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External MemoryFelix Windisch, Thomas Köhler, Lukas Radl et al.
Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks -- a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU -- without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes -- from broad aerial views to fine-grained ground-level details.
GRApr 17, 2025
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian RenderingMichael Steiner, Thomas Köhler, Lukas Radl et al.
Although 3D Gaussian Splatting (3DGS) has revolutionized 3D reconstruction, it still faces challenges such as aliasing, projection artifacts, and view inconsistencies, primarily due to the simplification of treating splats as 2D entities. We argue that incorporating full 3D evaluation of Gaussians throughout the 3DGS pipeline can effectively address these issues while preserving rasterization efficiency. Specifically, we introduce an adaptive 3D smoothing filter to mitigate aliasing and present a stable view-space bounding method that eliminates popping artifacts when Gaussians extend beyond the view frustum. Furthermore, we promote tile-based culling to 3D with screen-space planes, accelerating rendering and reducing sorting costs for hierarchical rasterization. Our method achieves state-of-the-art quality on in-distribution evaluation sets and significantly outperforms other approaches for out-of-distribution views. Our qualitative evaluations further demonstrate the effective removal of aliasing, distortions, and popping artifacts, ensuring real-time, artifact-free rendering.
GRJun 23, 2025
SOF: Sorted Opacity Fields for Fast Unbounded Surface ReconstructionLukas Radl, Felix Windisch, Thomas Deixelberger et al.
Recent advances in 3D Gaussian representations have significantly improved the quality and efficiency of image-based scene reconstruction. Their explicit nature facilitates real-time rendering and fast optimization, yet extracting accurate surfaces - particularly in large-scale, unbounded environments - remains a difficult task. Many existing methods rely on approximate depth estimates and global sorting heuristics, which can introduce artifacts and limit the fidelity of the reconstructed mesh. In this paper, we present Sorted Opacity Fields (SOF), a method designed to recover detailed surfaces from 3D Gaussians with both speed and precision. Our approach improves upon prior work by introducing hierarchical resorting and a robust formulation of Gaussian depth, which better aligns with the level-set. To enhance mesh quality, we incorporate a level-set regularizer operating on the opacity field and introduce losses that encourage geometrically-consistent primitive shapes. In addition, we develop a parallelized Marching Tetrahedra algorithm tailored to our opacity formulation, reducing meshing time by up to an order of magnitude. As demonstrated by our quantitative evaluation, SOF achieves higher reconstruction accuracy while cutting total processing time by more than a factor of three. These results mark a step forward in turning efficient Gaussian-based rendering into equally efficient geometry extraction.
CVMar 18, 2024
Deep Medial Voxels: Learned Medial Axis Approximations for Anatomical Shape ModelingAntonio Pepe, Richard Schussnig, Jianning Li et al.
Shape reconstruction from imaging volumes is a recurring need in medical image analysis. Common workflows start with a segmentation step, followed by careful post-processing and,finally, ad hoc meshing algorithms. As this sequence can be timeconsuming, neural networks are trained to reconstruct shapes through template deformation. These networks deliver state-ofthe-art results without manual intervention, but, so far, they have primarily been evaluated on anatomical shapes with little topological variety between individuals. In contrast, other works favor learning implicit shape models, which have multiple benefits for meshing and visualization. Our work follows this direction by introducing deep medial voxels, a semi-implicit representation that faithfully approximates the topological skeleton from imaging volumes and eventually leads to shape reconstruction via convolution surfaces. Our reconstruction technique shows potential for both visualization and computer simulations.
CVAug 18, 2025
IntelliCap: Intelligent Guidance for Consistent View SamplingAyaka Yasunaga, Hideo Saito, Dieter Schmalstieg et al.
Novel view synthesis from images, for example, with 3D Gaussian splatting, has made great progress. Rendering fidelity and speed are now ready even for demanding virtual reality applications. However, the problem of assisting humans in collecting the input images for these rendering algorithms has received much less attention. High-quality view synthesis requires uniform and dense view sampling. Unfortunately, these requirements are not easily addressed by human camera operators, who are in a hurry, impatient, or lack understanding of the scene structure and the photographic process. Existing approaches to guide humans during image acquisition concentrate on single objects or neglect view-dependent material characteristics. We propose a novel situated visualization technique for scanning at multiple scales. During the scanning of a scene, our method identifies important objects that need extended image coverage to properly represent view-dependent appearance. To this end, we leverage semantic segmentation and category identification, ranked by a vision-language model. Spherical proxies are generated around highly ranked objects to guide the user during scanning. Our results show superior performance in real scenes compared to conventional view sampling strategies.
IVNov 22, 2021
Automated cross-sectional view selection in CT angiography of aortic dissections with uncertainty awareness and retrospective clinical annotationsAntonio Pepe, Jan Egger, Marina Codari et al.
Objective: Surveillance imaging of chronic aortic diseases, such as dissections, relies on obtaining and comparing cross-sectional diameter measurements at predefined aortic landmarks, over time. Due to a lack of robust tools, the orientation of the cross-sectional planes is defined manually by highly trained operators. We show how manual annotations routinely collected in a clinic can be efficiently used to ease this task, despite the presence of a non-negligible interoperator variability in the measurements. Impact: Ill-posed but repetitive imaging tasks can be eased or automated by leveraging imperfect, retrospective clinical annotations. Methodology: In this work, we combine convolutional neural networks and uncertainty quantification methods to predict the orientation of such cross-sectional planes. We use clinical data randomly processed by 11 operators for training, and test on a smaller set processed by 3 independent operators to assess interoperator variability. Results: Our analysis shows that manual selection of cross-sectional planes is characterized by 95% limits of agreement (LOA) of $10.6^\circ$ and $21.4^\circ$ per angle. Our method showed to decrease static error by $3.57^\circ$ ($40.2$%) and $4.11^\circ$ ($32.8$%) against state of the art and LOA by $5.4^\circ$ ($49.0$%) and $16.0^\circ$ ($74.6$%) against manual processing. Conclusion: This suggests that pre-existing annotations can be an inexpensive resource in clinics to ease ill-posed and repetitive tasks like cross-section extraction for surveillance of aortic dissections.
HCOct 12, 2020
Evaluating Mixed and Augmented Reality: A Systematic Literature Review (2009-2019)Leonel Merino, Magdalena Schwarzl, Matthias Kraus et al.
We present a systematic review of 458 papers that report on evaluations in mixed and augmented reality (MR/AR) published in ISMAR, CHI, IEEE VR, and UIST over a span of 11 years (2009-2019). Our goal is to provide guidance for future evaluations of MR/AR approaches. To this end, we characterize publications by paper type (e.g., technique, design study), research topic (e.g., tracking, rendering), evaluation scenario (e.g., algorithm performance, user performance), cognitive aspects (e.g., perception, emotion), and the context in which evaluations were conducted (e.g., lab vs. in-the-wild). We found a strong coupling of types, topics, and scenarios. We observe two groups: (a) technology-centric performance evaluations of algorithms that focus on improving tracking, displays, reconstruction, rendering, and calibration, and (b) human-centric studies that analyze implications of applications and design, human factors on perception, usability, decision making, emotion, and attention. Amongst the 458 papers, we identified 248 user studies that involved 5,761 participants in total, of whom only 1,619 were identified as female. We identified 43 data collection methods used to analyze 10 cognitive aspects. We found nine objective methods, and eight methods that support qualitative analysis. A majority (216/248) of user studies are conducted in a laboratory setting. Often (138/248), such studies involve participants in a static way. However, we also found a fair number (30/248) of in-the-wild studies that involve participants in a mobile fashion. We consider this paper to be relevant to academia and industry alike in presenting the state-of-the-art and guiding the steps to designing, conducting, and analyzing results of evaluations in MR/AR.
CVJul 22, 2019
DetectFusion: Detecting and Segmenting Both Known and Unknown Dynamic Objects in Real-time SLAMRyo Hachiuma, Christian Pirchheim, Dieter Schmalstieg et al.
We present DetectFusion, an RGB-D SLAM system that runs in real-time and can robustly handle semantically known and unknown objects that can move dynamically in the scene. Our system detects, segments and assigns semantic class labels to known objects in the scene, while tracking and reconstructing them even when they move independently in front of the monocular camera. In contrast to related work, we achieve real-time computational performance on semantic instance segmentation with a novel method combining 2D object detection and 3D geometric segmentation. In addition, we propose a method for detecting and segmenting the motion of semantically unknown objects, thus further improving the accuracy of camera tracking and map reconstruction. We show that our method performs on par or better than previous work in terms of localization and object reconstruction accuracy, while achieving about 20 FPS even if the objects are segmented in each frame.
CVMar 12, 2018
In-depth Assessment of an Interactive Graph-based Approach for the Segmentation for Pancreatic Metastasis in Ultrasound Acquisitions of the Liver with two Specialists in Internal MedicineJan Egger, Xiaojun Chen, Lucas Bettac et al.
The manual outlining of hepatic metastasis in (US) ultrasound acquisitions from patients suffering from pancreatic cancer is common practice. However, such pure manual measurements are often very time consuming, and the results repeatedly differ between the raters. In this contribution, we study the in-depth assessment of an interactive graph-based approach for the segmentation for pancreatic metastasis in US images of the liver with two specialists in Internal Medicine. Thereby, evaluating the approach with over one hundred different acquisitions of metastases. The two physicians or the algorithm had never assessed the acquisitions before the evaluation. In summary, the physicians first performed a pure manual outlining followed by an algorithmic segmentation over one month later. As a result, the experts satisfied in up to ninety percent of algorithmic segmentation results. Furthermore, the algorithmic segmentation was much faster than manual outlining and achieved a median Dice Similarity Coefficient (DSC) of over eighty percent. Ultimately, the algorithm enables a fast and accurate segmentation of liver metastasis in clinical US images, which can support the manual outlining in daily practice.
CVOct 9, 2017
Algorithm guided outlining of 105 pancreatic cancer liver metastases in UltrasoundAlexander Hann, Lucas Bettac, Mark M. Haenle et al.
Manual segmentation of hepatic metastases in ultrasound images acquired from patients suffering from pancreatic cancer is common practice. Semiautomatic measurements promising assistance in this process are often assessed using a small number of lesions performed by examiners who already know the algorithm. In this work, we present the application of an algorithm for the segmentation of liver metastases due to pancreatic cancer using a set of 105 different images of metastases. The algorithm and the two examiners had never assessed the images before. The examiners first performed a manual segmentation and, after five weeks, a semiautomatic segmentation using the algorithm. They were satisfied in up to 90% of the cases with the semiautomatic segmentation results. Using the algorithm was significantly faster and resulted in a median Dice similarity score of over 80%. Estimation of the inter-operator variability by using the intra class correlation coefficient was good with 0.8. In conclusion, the algorithm facilitates fast and accurate segmentation of liver metastases, comparable to the current gold standard of manual segmentation.
CVAug 17, 2017
Computer-aided position planning of miniplates to treat facial bone defectsJan Egger, Jürgen Wallner, Markus Gall et al.
In this contribution, a software system for computer-aided position planning of miniplates to treat facial bone defects is proposed. The intra-operatively used bone plates have to be passively adapted on the underlying bone contours for adequate bone fragment stabilization. However, this procedure can lead to frequent intra-operatively performed material readjustments especially in complex surgical cases. Our approach is able to fit a selection of common implant models on the surgeon's desired position in a 3D computer model. This happens with respect to the surrounding anatomical structures, always including the possibility of adjusting both the direction and the position of the used osteosynthesis material. By using the proposed software, surgeons are able to pre-plan the out coming implant in its form and morphology with the aid of a computer-visualized model within a few minutes. Further, the resulting model can be stored in STL file format, the commonly used format for 3D printing. Using this technology, surgeons are able to print the virtual generated implant, or create an individually designed bending tool. This method leads to adapted osteosynthesis materials according to the surrounding anatomy and requires further a minimum amount of money and time.
CVApr 18, 2017
Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound ImagesJan Egger, Dieter Schmalstieg, Xiaojun Chen et al.
Ultrasound (US) is the most commonly used liver imaging modality worldwide. Due to its low cost, it is increasingly used in the follow-up of cancer patients with metastases localized in the liver. In this contribution, we present the results of an interactive segmentation approach for liver metastases in US acquisitions. A (semi-) automatic segmentation is still very challenging because of the low image quality and the low contrast between the metastasis and the surrounding liver tissue. Thus, the state of the art in clinical practice is still manual measurement and outlining of the metastases in the US images. We tackle the problem by providing an interactive segmentation approach providing real-time feedback of the segmentation results. The approach has been evaluated with typical US acquisitions from the clinical routine, and the datasets consisted of pancreatic cancer metastases. Even for difficult cases, satisfying segmentations results could be achieved because of the interactive real-time behavior of the approach. In total, 40 clinical images have been evaluated with our method by comparing the results against manual ground truth segmentations. This evaluation yielded to an average Dice Score of 85% and an average Hausdorff Distance of 13 pixels.
HCApr 12, 2017
How Sensemaking Tools Influence Display Space UsageThomas Geymayer, Manuela Waldner, Alexander Lex et al.
We explore how the availability of a sensemaking tool influences users' knowledge externalization strategies. On a large display, users were asked to solve an intelligence analysis task with or without a bidirectionally linked concept-graph (BLC) to organize insights into concepts (nodes) and relations (edges). In BLC, both nodes and edges maintain "deep links" to the exact source phrases and sections in associated documents. In our control condition, we were able to reproduce previously described spatial organization behaviors using document windows on the large display. When using BLC, however, we found that analysts apply spatial organization to BLC nodes instead, use significantly less display space and have significantly fewer open windows.
HCMar 22, 2017
Adaptive User Perspective Rendering for Handheld Augmented RealityPeter Mohr, Markus Tatzgern, Jens Grubert et al.
Handheld Augmented Reality commonly implements some variant of magic lens rendering, which turns only a fraction of the user's real environment into AR while the rest of the environment remains unaffected. Since handheld AR devices are commonly equipped with video see-through capabilities, AR magic lens applications often suffer from spatial distortions, because the AR environment is presented from the perspective of the camera of the mobile device. Recent approaches counteract this distortion based on estimations of the user's head position, rendering the scene from the user's perspective. To this end, approaches usually apply face-tracking algorithms on the front camera of the mobile device. However, this demands high computational resources and therefore commonly affects the performance of the application beyond the already high computational load of AR applications. In this paper, we present a method to reduce the computational demands for user perspective rendering by applying lightweight optical flow tracking and an estimation of the user's motion before head tracking is started. We demonstrate the suitability of our approach for computationally limited mobile devices and we compare it to device perspective rendering, to head tracked user perspective rendering, as well as to fixed point of view user perspective rendering.
GRMar 22, 2017
HTC Vive MeVisLab integration via OpenVR for medical applicationsJan Egger, Markus Gall, Jürgen Wallner et al.
Virtual Reality, an immersive technology that replicates an environment via computer-simulated reality, gets a lot of attention in the entertainment industry. However, VR has also great potential in other areas, like the medical domain, Examples are intervention planning, training and simulation. This is especially of use in medical operations, where an aesthetic outcome is important, like for facial surgeries. Alas, importing medical data into Virtual Reality devices is not necessarily trivial, in particular, when a direct connection to a proprietary application is desired. Moreover, most researcher do not build their medical applications from scratch, but rather leverage platforms like MeVisLab, MITK, OsiriX or 3D Slicer. These platforms have in common that they use libraries like ITK and VTK, and provide a convenient graphical interface. However, ITK and VTK do not support Virtual Reality directly. In this study, the usage of a Virtual Reality device for medical data under the MeVisLab platform is presented. The OpenVR library is integrated into the MeVisLab platform, allowing a direct and uncomplicated usage of the head mounted display HTC Vive inside the MeVisLab platform. Medical data coming from other MeVisLab modules can directly be connected per drag-and-drop to the Virtual Reality module, rendering the data inside the HTC Vive for immersive virtual reality inspection.
HCJan 15, 2017
A feasibility study on SSVEP-based interaction with motivating and immersive virtual and augmented realityJosef Faller, Brendan Z. Allison, Clemens Brunner et al.
Non-invasive steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) systems offer high bandwidth compared to other BCI types and require only minimal calibration and training. Virtual reality (VR) has been already validated as effective, safe, affordable and motivating feedback modality for BCI experiments. Augmented reality (AR) enhances the physical world by superimposing informative, context sensitive, computer generated content. In the context of BCI, AR can be used as a friendlier and more intuitive real-world user interface, thereby facilitating a more seamless and goal directed interaction. This can improve practicality and usability of BCI systems and may help to compensate for their low bandwidth. In this feasibility study, three healthy participants had to finish a complex navigation task in immersive VR and AR conditions using an online SSVEP BCI. Two out of three subjects were successful in all conditions. To our knowledge, this is the first work to present an SSVEP BCI that operates using target stimuli integrated in immersive VR and AR (head-mounted display and camera). This research direction can benefit patients by introducing more intuitive and effective real-world interaction (e.g. smart home control). It may also be relevant for user groups that require or benefit from hands free operation (e.g. due to temporary situational disability).
HCJan 14, 2017
Towards Interaction Around Unmodified Camera-equipped Mobile DevicesJens Grubert, Eyal Ofek, Michel Pahud et al.
Around-device interaction promises to extend the input space of mobile and wearable devices beyond the common but restricted touchscreen. So far, most around-device interaction approaches rely on instrumenting the device or the environment with additional sensors. We believe, that the full potential of ordinary cameras, specifically user-facing cameras, which are integrated in most mobile devices today, are not used to their full potential, yet. We To this end, we present a novel approach for extending the input space around unmodified mobile devices using built-in front-facing cameras of unmodified handheld devices. Our approach estimates hand poses and gestures through reflections in sunglasses, ski goggles or visors. Thereby, GlassHands creates an enlarged input space, rivaling input reach on large touch displays. We discuss the idea, its limitations and future work.
CVMar 2, 2016
US-Cut: Interactive Algorithm for rapid Detection and Segmentation of Liver Tumors in Ultrasound AcquisitionsJan Egger, Philip Voglreiter, Mark Dokter et al.
Ultrasound (US) is the most commonly used liver imaging modality worldwide. It plays an important role in follow-up of cancer patients with liver metastases. We present an interactive segmentation approach for liver tumors in US acquisitions. Due to the low image quality and the low contrast between the tumors and the surrounding tissue in US images, the segmentation is very challenging. Thus, the clinical practice still relies on manual measurement and outlining of the tumors in the US images. We target this problem by applying an interactive segmentation algorithm to the US data, allowing the user to get real-time feedback of the segmentation results. The algorithm has been developed and tested hand-in-hand by physicians and computer scientists to make sure a future practical usage in a clinical setting is feasible. To cover typical acquisitions from the clinical routine, the approach has been evaluated with dozens of datasets where the tumors are hyperechoic (brighter), hypoechoic (darker) or isoechoic (similar) in comparison to the surrounding liver tissue. Due to the interactive real-time behavior of the approach, it was possible even in difficult cases to find satisfying segmentations of the tumors within seconds and without parameter settings, and the average tumor deviation was only 1.4mm compared with manual measurements. However, the long term goal is to ease the volumetric acquisition of liver tumors in order to evaluate for treatment response. Additional aim is the registration of intraoperative US images via the interactive segmentations to the patient's pre-interventional CT acquisitions.
CVOct 21, 2015
Interactive Volumetry Of Liver Ablation ZonesJan Egger, Harald Busse, Philipp Brandmaier et al.
Percutaneous radiofrequency ablation (RFA) is a minimally invasive technique that destroys cancer cells by heat. The heat results from focusing energy in the radiofrequency spectrum through a needle. Amongst others, this can enable the treatment of patients who are not eligible for an open surgery. However, the possibility of recurrent liver cancer due to incomplete ablation of the tumor makes post-interventional monitoring via regular follow-up scans mandatory. These scans have to be carefully inspected for any conspicuousness. Within this study, the RF ablation zones from twelve post-interventional CT acquisitions have been segmented semi-automatically to support the visual inspection. An interactive, graph-based contouring approach, which prefers spherically shaped regions, has been applied. For the quantitative and qualitative analysis of the algorithm's results, manual slice-by-slice segmentations produced by clinical experts have been used as the gold standard (which have also been compared among each other). As evaluation metric for the statistical validation, the Dice Similarity Coefficient (DSC) has been calculated. The results show that the proposed tool provides lesion segmentation with sufficient accuracy much faster than manual segmentation. The visual feedback and interactivity make the proposed tool well suitable for the clinical workflow.