Martin Huber

EM
h-index: 10
14 papers
264 citations
Novelty: 35%
AI Score: 36

14 Papers

CV · Oct 26, 2022 · Code
Rapid and robust endoscopic content area estimation: A lean GPU-based pipeline and curated benchmark dataset

Charlie Budd, Luis C. Garcia-Peraza-Herrera, Martin Huber et al.

Endoscopic content area refers to the informative area enclosed by the dark, non-informative, border regions present in most endoscopic footage. The estimation of the content area is a common task in endoscopic image processing and computer vision pipelines. Despite the apparent simplicity of the problem, several factors make reliable real-time estimation surprisingly challenging. The lack of rigorous investigation into the topic combined with the lack of a common benchmark dataset for this task has been a long-lasting issue in the field. In this paper, we propose two variants of a lean GPU-based computational pipeline combining edge detection and circle fitting. The two variants differ by relying on handcrafted features and learned features, respectively, to extract content area edge point candidates. We also present a first-of-its-kind dataset of manually annotated and pseudo-labelled content areas across a range of surgical indications. To encourage further developments, the curated dataset and an implementation of both algorithms have been made public (https://doi.org/10.7303/syn32148000, https://github.com/charliebudd/torch-content-area). We compare our proposed algorithm with a state-of-the-art U-Net-based approach and demonstrate significant improvement in terms of both accuracy (Hausdorff distance: 6.3 px versus 118.1 px) and computational time (average runtime per frame: 0.13 ms versus 11.2 ms).
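The circle-fitting step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes edge point candidates are already available as (x, y) pixel coordinates and uses a plain algebraic least-squares fit on the CPU; the function name `fit_circle` and the choice of fitting method are assumptions made for illustration, not the authors' GPU implementation.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (cx, cy, radius).

    Hypothetical helper for illustration only.
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")

    # Shift to the centroid to keep the normal equations well conditioned.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    shifted = [(x - mx, y - my) for x, y in points]

    # Model: x^2 + y^2 + D*x + E*y + F = 0, which is linear in (D, E, F).
    z = [x * x + y * y for x, y in shifted]
    sx = sum(x for x, _ in shifted)
    sy = sum(y for _, y in shifted)
    sxx = sum(x * x for x, _ in shifted)
    syy = sum(y * y for _, y in shifted)
    sxy = sum(x * y for x, y in shifted)
    sxz = sum(x * zi for (x, _), zi in zip(shifted, z))
    syz = sum(y * zi for (_, y), zi in zip(shifted, z))
    sz = sum(z)

    # Normal equations (A^T A) p = A^T b with rows A = [x, y, 1], b = -z.
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [-sxz, -syz, -sz]

    det = _det3(normal)
    if abs(det) < 1e-12:  # crude guard against collinear input
        raise ValueError("points are (nearly) collinear")

    # Solve the 3x3 system with Cramer's rule.
    solution = []
    for col in range(3):
        replaced = [row[:] for row in normal]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(_det3(replaced) / det)
    d, e, f = solution

    cx, cy = -d / 2.0, -e / 2.0
    radius = math.sqrt(cx * cx + cy * cy - f)
    return cx + mx, cy + my, radius
```

Because the fit is linear in its parameters, it needs no iteration, which is one reason this family of fits suits per-frame use; a robust variant would additionally have to reject outlier edge points, which this sketch does not attempt.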

CV · Jul 21, 2023
Deep Reinforcement Learning Based System for Intraoperative Hyperspectral Video Autofocusing

Charlie Budd, Jianrong Qiu, Oscar MacCormac et al.

Hyperspectral imaging (HSI) captures a greater level of spectral detail than traditional optical imaging, making it a potentially valuable intraoperative tool when precise tissue differentiation is essential. Hardware limitations of current optical systems used for handheld real-time video HSI result in a limited focal depth, thereby posing usability issues for integration of the technology into the operating room. This work integrates a focus-tunable liquid lens into a video HSI exoscope, and proposes novel video autofocusing methods based on deep reinforcement learning. A first-of-its-kind robotic focal-time scan was performed to create a realistic and reproducible testing dataset. We benchmarked our proposed autofocus algorithm against traditional policies, and found our novel approach to perform significantly ($p<0.05$) better than traditional techniques ($0.070\pm.098$ mean absolute focal error compared to $0.146\pm.148$). In addition, we performed a blinded usability trial by having two neurosurgeons compare the system with different autofocus policies, and found our novel approach to be the most favourable, making our system a desirable addition for intraoperative HSI.

RO · Apr 29, 2025 · Code
Hydra: Marker-Free RGB-D Hand-Eye Calibration

Martin Huber, Huanyu Tian, Christopher E. Mower et al.

This work presents an RGB-D imaging-based approach to marker-free hand-eye calibration using a novel implementation of the iterative closest point (ICP) algorithm with a robust point-to-plane (PTP) objective formulated on a Lie algebra. Its applicability is demonstrated through comprehensive experiments using three well-known serial manipulators and two RGB-D cameras. With only three randomly chosen robot configurations, our approach achieves approximately 90% successful calibrations, demonstrating 2-3x higher convergence rates to the global optimum compared to both marker-based and marker-free baselines. We also report 2 orders of magnitude faster convergence time (0.8 +/- 0.4 s) for 9 robot configurations over other marker-free methods. Our method exhibits significantly improved accuracy (5 mm in task space) over classical approaches (7 mm in task space) whilst being marker-free. The benchmarking dataset and code are open sourced under the Apache 2.0 License, and a ROS 2 integration with robot abstraction is provided to facilitate deployment.

RO · Oct 27, 2025
Localising under the drape: proprioception in the era of distributed surgical robotic system

Martin Huber, Nicola A. Cavalcanti, Ayoob Davoodi et al.

Despite their mechanical sophistication, surgical robots remain blind to their surroundings. This lack of spatial awareness causes collisions, system recoveries, and workflow disruptions, issues that will intensify with the introduction of distributed robots with independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, providing only limited views of the surgical scene and adding hardware burden in crowded operating rooms. We present a marker-free proprioception method that enables precise localisation of surgical robots under their sterile draping despite associated obstruction of visual cues. Our method solely relies on lightweight stereo-RGB cameras and novel transformer-based deep learning models. It builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images from human cadaveric and preclinical in vivo studies). By tracking the entire robot and surgical scene, rather than individual markers, our approach provides a holistic view robust to occlusions, supporting surgical scene understanding and context-aware control. We demonstrate an example of potential clinical benefits during in vivo breathing compensation with access to tissue dynamics, unobservable under state of the art tracking, and accurately locate in multi-robot systems for future intelligent interaction. In addition, and compared with existing systems, our method eliminates markers and improves tracking visibility by 25%. To our knowledge, this is the first demonstration of marker-free proprioception for fully draped surgical robots, reducing setup complexity, enhancing safety, and paving the way toward modular and autonomous robotic surgery.

IV · Oct 21, 2021
2020 CATARACTS Semantic Segmentation Challenge

Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi et al.

Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.

IV · Sep 30, 2021
Deep Homography Estimation in Dynamic Surgical Scenes for Laparoscopic Camera Motion Extraction

Martin Huber, Sébastien Ourselin, Christos Bergeles et al.

Current laparoscopic camera motion automation relies on rule-based approaches or only focuses on surgical tools. Imitation Learning (IL) methods could alleviate these shortcomings, but have so far been applied to oversimplified setups. Instead of extracting actions from oversimplified setups, in this work we introduce a method that allows extracting a laparoscope holder's actions from videos of laparoscopic interventions. We synthetically add camera motion to a newly acquired dataset of camera motion free da Vinci surgery image sequences through a novel homography generation algorithm. The synthetic camera motion serves as a supervisory signal for camera motion estimation that is invariant to object and tool motion. We perform an extensive evaluation of state-of-the-art (SOTA) Deep Neural Networks (DNNs) across multiple compute regimes, finding our method transfers from our camera motion free da Vinci surgery dataset to videos of laparoscopic interventions, outperforming classical homography estimation approaches in both precision, by 41%, and runtime on a CPU, by 43%.

GN · May 4, 2021
Business analytics meets artificial intelligence: Assessing the demand effects of discounts on Swiss train tickets

Martin Huber, Jonas Meier, Hannes Wallimann

We assess the demand effects of discounts on train tickets issued by the Swiss Federal Railways, the so-called 'supersaver tickets', based on machine learning, a subfield of artificial intelligence. Considering a survey-based sample of buyers of supersaver tickets, we investigate which customer- or trip-related characteristics (including the discount rate) predict buying behavior, namely: booking a trip otherwise not realized by train, buying a first- rather than second-class ticket, or rescheduling a trip (e.g. away from rush hours) when being offered a supersaver ticket. Predictive machine learning suggests that customers' age, demand-related information for a specific connection (like departure time and utilization), and the discount level permit forecasting buying behavior to a certain extent. Furthermore, we use causal machine learning to assess the impact of the discount rate on rescheduling a trip, which seems relevant in the light of capacity constraints at rush hours. Assuming that (i) the discount rate is quasi-random conditional on our rich set of characteristics and (ii) the buying decision increases weakly monotonically in the discount rate, we identify the discount rate's effect among 'always buyers', who would have traveled even without a discount, based on our survey that asks about customer behavior in the absence of discounts. We find that on average, increasing the discount rate by one percentage point increases the share of rescheduled trips by 0.16 percentage points among always buyers. Investigating effect heterogeneity across observables suggests that the effects are higher for leisure travelers and during peak hours when controlling for several other characteristics.

ML · Apr 22, 2021
Deep learning for detecting bid rigging: Flagging cartel participants based on convolutional neural networks

Martin Huber, David Imhof

Adding to the literature on the data-driven detection of bid-rigging cartels, we propose a novel approach based on deep learning (a subfield of artificial intelligence) that flags cartel participants based on their pairwise bidding interactions with other firms. More concisely, we combine a so-called convolutional neural network for image recognition with graphs that in a pairwise manner plot the normalized bid values of some reference firm against the normalized bids of any other firms participating in the same tenders as the reference firm. Based on Japanese and Swiss procurement data, we construct such graphs for both collusive and competitive episodes (i.e. when a bid-rigging cartel is or is not active) and use a subset of graphs to train the neural network such that it learns to distinguish collusive from competitive bidding patterns. We use the remaining graphs to test the neural network's out-of-sample performance in correctly classifying collusive and competitive bidding interactions. We obtain a very decent average accuracy of around 90% or slightly higher when applying the method within Japanese, Swiss, or mixed data (in which Swiss and Japanese graphs are pooled). When using data from one country for training to test the trained model's performance in the other country (i.e. transnationally), predictive performance decreases (likely due to institutional differences in procurement procedures across countries), but often remains satisfactorily high. All in all, the generally quite high accuracy of the convolutional neural network, despite being trained on a rather small sample of a few hundred graphs, points to a large potential of deep learning approaches for flagging and fighting bid-rigging cartels.

GN · Jan 19, 2021
The fiscal response to revenue shocks

Simon Berset, Martin Huber, Mark Schelker

We study the impact of fiscal revenue shocks on local fiscal policy. We focus on the very volatile revenues from the immovable property gains tax in the canton of Zurich, Switzerland, and analyze fiscal behavior following large and rare positive and negative revenue shocks. We apply causal machine learning strategies and implement the post-double-selection LASSO estimator to identify the causal effect of revenue shocks on public finances. We show that local policymakers overall predominantly smooth fiscal shocks. However, we also find some patterns consistent with fiscal conservatism, where positive shocks are smoothed, while negative ones are mitigated by spending cuts.

EM · Dec 1, 2020
Evaluating (weighted) dynamic treatment effects by double machine learning

Hugo Bodory, Martin Huber, Lukáš Lafférs

We consider evaluating the causal effects of dynamic treatments, i.e. of multiple treatment sequences in various periods, based on double machine learning to control for observed, time-varying covariates in a data-driven way under a selection-on-observables assumption. To this end, we make use of so-called Neyman-orthogonal score functions, which imply the robustness of treatment effect estimation to moderate (local) misspecifications of the dynamic outcome and treatment models. This robustness property permits approximating outcome and treatment models by double machine learning even under high dimensional covariates and is combined with data splitting to prevent overfitting. In addition to effect estimation for the total population, we consider weighted estimation that permits assessing dynamic treatment effects in specific subgroups, e.g. among those treated in the first treatment period. We demonstrate that the estimators are asymptotically normal and $\sqrt{n}$-consistent under specific regularity conditions and investigate their finite sample properties in a simulation study. Finally, we apply the methods to the Job Corps study in order to assess different sequences of training programs under a large set of covariates.

EM · Nov 30, 2020
Double machine learning for sample selection models

Michela Bia, Martin Huber, Lukáš Lafférs

This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent under specific regularity conditions concerning the machine learners and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data for evaluating the effect of training on hourly wages which are only observed conditional on employment. The estimator is available in the causalweight package for the statistical software R.
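As a point of reference for the score functions and cross-fitting mentioned above, the block below writes out the standard doubly robust (AIPW) score for an average treatment effect with a binary treatment under selection on observables. This is the textbook building block only, stated here as an assumption-laden sketch: it is not the paper's score, which additionally has to account for the sample selection process. The symbols $D$, $Y$, $X$, $\mu_d$, $p$ and the fold notation $I_k$ are introduced here for illustration.

```latex
% Notation (assumed for illustration):
%   D \in \{0,1\}  treatment,  Y  outcome,  X  covariates,  W = (Y, D, X)
%   \mu_d(X) = E[Y \mid D = d, X],   p(X) = \Pr(D = 1 \mid X),
%   \eta = (\mu_0, \mu_1, p)  nuisance functions,  \theta  average treatment effect.
\psi(W; \theta, \eta)
  = \mu_1(X) - \mu_0(X)
  + \frac{D \,\bigl(Y - \mu_1(X)\bigr)}{p(X)}
  - \frac{(1 - D)\,\bigl(Y - \mu_0(X)\bigr)}{1 - p(X)}
  - \theta

% Cross-fitting: split the sample into folds I_1, \dots, I_K; for each fold k,
% estimate \hat\eta^{(-k)} by machine learning on all observations outside I_k.
\hat\theta
  = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in I_k}
    \Bigl[ \hat\mu_1^{(-k)}(X_i) - \hat\mu_0^{(-k)}(X_i)
      + \frac{D_i \bigl(Y_i - \hat\mu_1^{(-k)}(X_i)\bigr)}{\hat p^{(-k)}(X_i)}
      - \frac{(1 - D_i)\bigl(Y_i - \hat\mu_0^{(-k)}(X_i)\bigr)}{1 - \hat p^{(-k)}(X_i)}
    \Bigr]
```

The score has expectation zero at the true $\theta$ and $\eta$, and small errors in either the outcome models or the propensity model affect it only to second order, which is the sense in which such scores are robust to regularization bias in the machine learners.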

EM · Apr 12, 2020
A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels

Hannes Wallimann, David Imhof, Martin Huber

We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.
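The subgroup idea described above can be sketched in a few lines. The example assumes the coefficient of variation as the screen, which is only one illustrative choice; the function names and the use of the population standard deviation are likewise assumptions, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def coefficient_of_variation(bids):
    """Illustrative screen (assumed): standard deviation divided by mean."""
    return pstdev(bids) / mean(bids)


def subgroup_screens(bids, sizes=(3, 4), screen=coefficient_of_variation):
    """Evaluate the screen on every subgroup of the given sizes within one tender."""
    return [screen(subset) for k in sizes for subset in combinations(bids, k)]


def subgroup_screen_features(bids, sizes=(3, 4), screen=coefficient_of_variation):
    """Summarise the subgroup screens into predictors for a classifier."""
    values = subgroup_screens(bids, sizes, screen)
    if not values:
        raise ValueError("tender has too few bids for the requested subgroup sizes")
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }


# Hypothetical tender: three tightly clustered bids plus one much higher bid.
features = subgroup_screen_features([100.0, 101.0, 102.0, 130.0])
```

In this toy tender, the minimum picks up the subgroup of three clustered bids, while the mean and maximum are pulled up by subgroups containing the outlying bid; that contrast is the kind of signal the summary statistics are meant to expose to the learning algorithm.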

EM · Oct 1, 2019
An introduction to flexible methods for policy evaluation

Martin Huber

This chapter covers different approaches to policy evaluation for assessing the causal effect of a treatment or intervention on an outcome of interest. As an introduction to causal inference, the discussion starts with the experimental evaluation of a randomized treatment. It then reviews evaluation methods based on selection on observables (assuming a quasi-random treatment given observed covariates), instrumental variables (inducing a quasi-random shift in the treatment), difference-in-differences and changes-in-changes (exploiting changes in outcomes over time), as well as regression discontinuities and kinks (using changes in the treatment assignment at some threshold of a running variable). The chapter discusses methods particularly suited for data with many observations for a flexible (i.e. semi- or nonparametric) modeling of treatment effects, and/or many (i.e. high dimensional) observed covariates by applying machine learning to select and control for covariates in a data-driven way. This is not only useful for tackling confounding by controlling for instance for factors jointly affecting the treatment and the outcome, but also for learning effect heterogeneities across subgroups defined upon observable covariates and optimally targeting those groups for which the treatment is most effective.

HC · Jan 25, 2017
Design and Implementation of a Semantic Dialogue System for Radiologists

Daniel Sonntag, Martin Huber, Manuel Möller et al.

This chapter describes a semantic dialogue system for radiologists in a comprehensive case study within the large-scale MEDICO project. MEDICO addresses the need for advanced semantic technologies in the search for medical image and patient data. The objectives are, first, to enable a seamless integration of medical images and different user applications by providing direct access to image semantics, and second, to design and implement a multimodal dialogue shell for the radiologist. Speech-based semantic image retrieval and annotation of medical images should provide the basis for help in clinical decision support and computer aided diagnosis. We will describe the clinical workflow and interaction requirements and focus on the design and implementation of a multimodal user interface for patient/image search or annotation and its implementation while using a speech-based dialogue shell. Ontology modeling provides the backbone for knowledge representation in the dialogue shell and the specific medical application domain; ontology structures are the communication basis of our combined semantic search and retrieval architecture which includes the MEDICO server, the triple store, the semantic search API, the medical visualization toolkit MITK, and the speech-based dialogue shell, amongst others. We will focus on usability aspects of multimodal applications, our storyboard and the implemented speech and touchscreen interaction design.