OC Jun 23, 2019
On the Ergodic Control of Ensembles
Andre R. Fioravanti, Jakub Marecek, Robert N. Shorten et al.
Across smart-grid and smart-city application domains, there are many problems where an ensemble of agents is to be controlled such that both the aggregate behaviour and individual-level perception of the system's performance are acceptable. In many applications, traditional PI control is used to regulate aggregate ensemble performance. Our principal contribution in this note is to demonstrate that PI control may not always be suitable for this purpose, and in some situations may lead to a loss of ergodicity for closed-loop systems. Building on this observation, a theoretical framework is proposed to both analyse and design control systems for the regulation of large-scale ensembles of agents with a probabilistic intent. Examples are given to illustrate our results.
SY Apr 23, 2018
On the Design of an Intelligent Speed Advisory System for Cyclists
Yingqi Gu, Mingming Liu, Matheus Souza et al.
Traffic-related pollution is becoming a major societal problem globally. Cyclists are particularly exposed to this form of pollution due to their proximity to vehicles' tailpipes. In a number of recent studies, it has been shown that exposure to this form of pollution eventually outweighs the cardio-vascular benefits associated with cycling. Hence during cycling there are conflicting effects that affect the cyclist. On the one hand, cycling effort gives rise to health benefits, whereas exposure to pollution clearly does not. Mathematically speaking, these conflicting effects give rise to convex utility functions that describe the health threats accrued to cyclists. More particularly, and roughly speaking, for a given level of background pollution, there is an optimal length of journey time that minimises the health risks to a cyclist. In this paper, we consider a group of cyclists that share a common route. These may be recreational cyclists, or cyclists that travel together from an origin to a destination. Given this context, we ask the following question. What is the common speed at which the cyclists should travel, so that the overall health risks can be minimised? We formulate this as an optimisation problem with consensus constraints. More specifically, we design an intelligent speed advisory system that recommends a common speed to a group of cyclists taking into account different levels of fitness of the cycling group, or different levels of electric assist in the case that some or all cyclists use e-bikes (electric bikes). To do this, we extend a recently derived consensus result to the case of quasi-convex utility functions. Simulation studies in different scenarios demonstrate the efficacy of our proposed system.
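As a rough illustration of the common-speed question posed in this abstract, the sketch below minimises a sum of per-cyclist convex cost functions over a single shared speed. It is a centralised toy version, not the consensus algorithm of the paper: the quadratic costs, the preferred speeds, the weights, and the names `common_speed` and `make_cost` are all hypothetical choices made for the example.

```python
from typing import Callable, Sequence


def common_speed(costs: Sequence[Callable[[float], float]],
                 lo: float, hi: float, tol: float = 1e-6) -> float:
    """Ternary search for the speed minimising the summed cost on [lo, hi].

    Assumes the summed cost is convex (unimodal) on the interval.
    """
    def total(v: float) -> float:
        return sum(f(v) for f in costs)

    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if total(m1) < total(m2):
            hi = m2  # the minimiser cannot lie to the right of m2
        else:
            lo = m1  # the minimiser cannot lie to the left of m1
    return (lo + hi) / 2.0


def make_cost(preferred: float, weight: float) -> Callable[[float], float]:
    """Hypothetical health-risk cost: quadratic penalty around a preferred speed."""
    return lambda v: weight * (v - preferred) ** 2


# Three hypothetical cyclists (speeds in km/h, weights unitless).
costs = [make_cost(18.0, 1.0), make_cost(22.0, 1.0), make_cost(26.0, 2.0)]
v_star = common_speed(costs, 10.0, 35.0)
```

For quadratic costs the minimiser is the weighted mean of the preferred speeds, which gives a simple check on the search. The quasi-convex case treated in the paper needs more care, since a sum of quasi-convex functions is not guaranteed to be unimodal.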
CV Mar 20, 2022
CRISPnet: Color Rendition ISP Net
Matheus Souza, Wolfgang Heidrich
Image signal processors (ISPs) are historically grown legacy software systems for reconstructing color images from noisy raw sensor measurements. They are usually composed of many heuristic blocks for denoising, demosaicking, and color restoration. Color reproduction in this context is of particular importance, since the raw colors are often severely distorted, and each smart phone manufacturer has developed their own characteristic heuristics for improving the color rendition, for example of skin tones and other visually important colors. In recent years there has been strong interest in replacing the historically grown ISP systems with deep learned pipelines. Much progress has been made in approximating legacy ISPs with such learned models. However, so far the focus of these efforts has been on reproducing the structural features of the images, with less attention paid to color rendition. Here we present CRISPnet, the first learned ISP model to specifically target color rendition accuracy relative to a complex, legacy smart phone ISP. We achieve this by utilizing both image metadata (like a legacy ISP would), as well as by learning simple global semantics based on image classification -- similar to what a legacy ISP does to determine the scene type. We also contribute a new ISP image dataset consisting of both high dynamic range monitor data, as well as real-world data, both captured with an actual cell phone ISP pipeline under a variety of lighting conditions, exposure times, and gain settings.
CV May 18
Low Latency Gaze Tracking via Latent Optical Sensing
Yidan Zheng, Matheus Souza, Kaizhang Kang et al.
We present a real-time gaze tracking system that directly acquires task-relevant latent features using a fully passive optical encoder. Instead of forming and processing full-resolution images, our approach leverages a microlens array with a co-designed binary chromium mask to perform spatially multiplexed optical encoding, producing a compact set of measurements sufficient for gaze estimation. By integrating sensing and feature extraction in the optical domain, the proposed system eliminates the need for high-bandwidth image readout and substantially reduces computational overhead. The encoded measurements are captured by a 4 x 4 phototransistor array and mapped to gaze direction using a lightweight neural network. Our proof-of-concept prototype enables an end-to-end sensing-to-inference latency of 3.4 ms, outperforming published research systems. We demonstrate the effectiveness of our approach on both simulated and real-world data, achieving competitive gaze estimation accuracy while significantly improving latency and energy efficiency compared to conventional camera-based pipelines. This work highlights the potential of task-driven optical sensing for ultra-low-latency, computationally efficient human-computer interaction systems.
CV Jan 8, 2024
Limitations of Data-Driven Spectral Reconstruction -- An Optics-Aware Analysis
Qiang Fu, Matheus Souza, Eunsue Choi et al.
Hyperspectral imaging empowers machine vision systems with the distinct capability of identifying materials through recording their spectral signatures. Recent efforts in data-driven spectral reconstruction aim at extracting spectral information from RGB images captured by cost-effective RGB cameras, instead of dedicated hardware. Published work reports exceedingly high numerical scores for this reconstruction task, yet real-world performance lags substantially behind. We systematically analyze the performance of such methods. First, we evaluate the overfitting limitations with respect to current datasets by training the networks with less data, validating the trained models with unseen yet slightly modified data, and performing cross-dataset validation. Second, we reveal fundamental limitations in the ability of RGB-to-spectral methods to deal with metameric or near-metameric conditions, which have so far gone largely unnoticed due to the insufficiencies of existing datasets. We validate the trained models with metamer data generated by metameric black theory and re-train the networks with various forms of metamers. This methodology can also be used for data augmentation as a partial mitigation of the dataset issues, although the RGB-to-spectral inverse problem remains fundamentally ill-posed. Finally, we analyze the potential for modifying the problem setting to achieve better performance by exploiting optical encoding provided by either optical aberrations or deliberate optical design. Our experiments show such approaches provide improved results under certain circumstances, but their overall performance is limited by the same dataset issues. We conclude that future progress on snapshot spectral imaging will heavily depend on the generation of improved datasets which can then be used to design effective optical encoding strategies. Code: https://github.com/vccimaging/OpticsAwareHSI-Analysis.
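The metamer idea mentioned in this abstract can be sketched in a few lines. A metameric black is a spectral component that lies in the null space of the camera's spectral sensitivities, so adding it to a spectrum leaves the RGB response unchanged. The example below is a generic illustration under stated assumptions and not the authors' pipeline: the random 3 x 31 sensitivity matrix, the random spectrum, the scale factor 0.1, and all function names are hypothetical, and physical constraints such as non-negativity of the resulting spectrum are not enforced.

```python
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_rows(rows):
    """Modified Gram-Schmidt: orthonormal basis for the span of the rows."""
    basis = []
    for row in rows:
        v = list(row)
        for q in basis:
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def metameric_black(sens, z):
    """Remove from z everything the camera can see; the rest is a metameric black."""
    black = list(z)
    for q in orthonormal_rows(sens):
        c = dot(black, q)
        black = [bi - c * qi for bi, qi in zip(black, q)]
    return black


def render_rgb(sens, spectrum):
    """Linear camera model: one inner product per colour channel."""
    return [dot(row, spectrum) for row in sens]


rng = random.Random(0)
n_bands = 31  # hypothetical number of spectral bands
sens = [[rng.random() for _ in range(n_bands)] for _ in range(3)]
spectrum = [rng.random() for _ in range(n_bands)]

black = metameric_black(sens, [rng.gauss(0.0, 1.0) for _ in range(n_bands)])
metamer = [s + 0.1 * b for s, b in zip(spectrum, black)]

rgb_original = render_rgb(sens, spectrum)
rgb_metamer = render_rgb(sens, metamer)
```

Here `rgb_original` and `rgb_metamer` agree up to floating-point error even though the two spectra differ, which is why an RGB-only reconstruction cannot tell them apart.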
IV Jul 9, 2024
Latent Space Imaging
Matheus Souza, Yidan Zheng, Kaizhang Kang et al.
Digital imaging systems have traditionally relied on brute-force measurement and processing of pixels arranged on regular grids. In contrast, the human visual system performs significant data reduction from the large number of photoreceptors to the optic nerve, effectively encoding visual information into a low-bandwidth latent space representation optimized for brain processing. Inspired by this, we propose a similar approach to advance artificial vision systems. Latent Space Imaging introduces a new paradigm that combines optics and software to encode image information directly into the semantically rich latent space of a generative model. This approach substantially reduces bandwidth and memory demands during image capture and enables a range of downstream tasks focused on the latent space. We validate this principle through an initial hardware prototype based on a single-pixel camera. By implementing an amplitude modulation scheme that encodes into the generative model's latent space, we achieve compression ratios ranging from 1:100 to 1:1000 during imaging, and up to 1:16384 for downstream applications. This approach leverages the model's intrinsic linear boundaries, demonstrating the potential of latent space imaging for highly efficient imaging hardware, adaptable future applications in high-speed imaging, and task-specific cameras with significantly reduced hardware complexity.
CV Jan 6, 2024
MetaISP -- Exploiting Global Scene Structure for Accurate Multi-Device Color Rendition
Matheus Souza, Wolfgang Heidrich
Image signal processors (ISPs) are historically grown legacy software systems for reconstructing color images from noisy raw sensor measurements. Each smartphone manufacturer has developed its ISPs with its own characteristic heuristics for improving the color rendition, for example, of skin tones and other visually essential colors. Recent interest in replacing the historically grown ISP systems with deep-learned pipelines that match DSLR image quality has improved the structural features of the resulting images. However, these works ignore the superior color processing based on semantic scene analysis that distinguishes mobile phone ISPs from DSLRs. Here, we present MetaISP, a single model designed to learn how to translate between the color and local contrast characteristics of different devices. MetaISP takes the RAW image from device A as input and translates it to RGB images that inherit the appearance characteristics of devices A, B, and C. We achieve this result by employing a lightweight deep learning technique that conditions its output appearance based on the device of interest. In this approach, we leverage novel attention mechanisms inspired by cross-covariance to learn global scene semantics. Additionally, we use the metadata that typically accompanies RAW images and estimate scene illuminants when they are unavailable.
GR Jun 2, 2024
End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model
Xinge Yang, Matheus Souza, Kunyi Wang et al.
Hybrid refractive-diffractive lenses combine the light efficiency of refractive lenses with the information encoding power of diffractive optical elements (DOE), showing great potential as the next generation of imaging systems. However, accurately simulating such hybrid designs is generally difficult, and in particular, there are no existing differentiable image formation models for hybrid lenses with sufficient accuracy. In this work, we propose a new hybrid ray-tracing and wave-propagation (ray-wave) model for accurate simulation of both optical aberrations and diffractive phase modulation, where the DOE is placed between the last refractive surface and the image sensor, i.e. away from the Fourier plane that is often used as a DOE position. The proposed ray-wave model is fully differentiable, enabling gradient back-propagation for end-to-end co-design of refractive-diffractive lens optimization and the image reconstruction network. We validate the accuracy of the proposed model by comparing the simulated point spread functions (PSFs) with theoretical results, as well as simulation experiments that show our model to be more accurate than solutions implemented in commercial software packages like Zemax. We demonstrate the effectiveness of the proposed model through real-world experiments and show significant improvements in both aberration correction and extended depth-of-field (EDoF) imaging. We believe the proposed model will motivate further investigation into a wide range of applications in computational imaging, computational photography, and advanced optical design. Code will be released upon publication.
SP Jul 31, 2020
Predictability and Fairness in Social Sensing
Ramen Ghosh, Jakub Marecek, Wynita M. Griggs et al.
We consider the design of distributed algorithms that govern the manner in which agents contribute to a social sensing platform. Specifically, we are interested in situations where fairness among the agents contributing to the platform is needed. Notable examples are platforms operated by public bodies, where fairness is a legal requirement. The design of such distributed systems is challenging because we wish to simultaneously realise an efficient social sensing platform, but also deliver a predefined quality of service to the agents (for example, a fair opportunity to contribute to the platform). In this paper, we introduce iterated function systems (IFS) as a tool for the design and analysis of systems of this kind. We show how the IFS framework can be used to realise systems that deliver a predictable quality of service to agents, can be used to underpin contracts governing the interaction of agents with the social sensing platform, and which are efficient. To illustrate our design via a use case, we consider a large, high-density network of participating parked vehicles. When awoken by an administrative centre, this network proceeds to search for moving missing entities of interest using RFID-based techniques. We regulate which vehicles are actively searching for the moving entity of interest at any point in time. In doing so, we seek to equalise vehicular energy consumption across the network. This is illustrated through simulations of a search for a missing Alzheimer's patient in Melbourne, Australia. Experimental results are presented to illustrate the efficacy of our system and the predictability of access of agents to the platform independent of initial conditions.
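To make the notion of an iterated function system concrete, the sketch below simulates a generic one-dimensional IFS: at each step one of two contractive affine maps is drawn at random and applied to the state. It is a textbook-style toy, not the vehicle-search model of the paper; the maps, the probabilities, the step count, and the name `run_ifs` are hypothetical. Both runs share a seed so that they see the same sequence of maps and differ only in their initial condition.

```python
import random


def run_ifs(x0, maps, probs, steps, seed):
    """Iterate x <- a*x + b with (a, b) drawn from `maps` with weights `probs`.

    Returns the final state and the time average of the visited states.
    """
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(steps):
        a, b = rng.choices(maps, weights=probs)[0]
        x = a * x + b
        total += x
    return x, total / steps


# Two contractive affine maps on [0, 1] (slope 0.5), chosen with
# probabilities 0.3 and 0.7 respectively.
maps = [(0.5, 0.0), (0.5, 0.5)]
probs = [0.3, 0.7]

x_a, avg_a = run_ifs(0.0, maps, probs, 20000, seed=1)
x_b, avg_b = run_ifs(1.0, maps, probs, 20000, seed=1)
```

Because every map halves the distance between two states, the trajectories started at 0 and 1 merge, and the time averages agree closely regardless of the starting point. For this choice of maps the stationary mean solves m = 0.5 m + 0.7 x 0.5, that is m = 0.7, so both averages should sit near that value.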