Zhuoyue Huang

2papers

2 Papers

62.0LGApr 13
Loss-Driven Bayesian Active Learning

Zhuoyue Huang, Freddie Bickford Smith, Tom Rainforth · oxford

The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We propose a rigorous loss-driven approach to Bayesian active learning that allows data acquisition to directly target the loss associated with a given decision problem. In particular, we show how any loss can be used to derive a unique objective for optimal data acquisition. Critically, we then show that any loss taking the form of a weighted Bregman divergence permits analytic computation of a central component of its corresponding objective, making the approach applicable in practice. In regression and classification experiments with a range of different losses, we find our approach reduces test losses relative to existing techniques.

LGJun 14, 2024
SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings

Barbora Barancikova, Zhuoyue Huang, Cristopher Salvi

Score-based diffusion models have recently emerged as state-of-the-art generative models for a variety of data modalities. Nonetheless, it remains unclear how to adapt these models to generate long multivariate time series. Viewing a time series as the discretisation of an underlying continuous process, we introduce SigDiffusion, a novel diffusion model operating on log-signature embeddings of the data. The forward and backward processes gradually perturb and denoise log-signatures while preserving their algebraic structure. To recover a signal from its log-signature, we provide new closed-form inversion formulae expressing the coefficients obtained by expanding the signal in a given basis (e.g. Fourier or orthogonal polynomials) as explicit polynomial functions of the log-signature. Finally, we show that combining SigDiffusions with these inversion formulae results in high-quality long time series generation, competitive with the current state-of-the-art on various datasets of synthetic and real-world examples.