EPOct 9, 2025
Understanding Exoplanet Habitability: A Bayesian ML Framework for Predicting Atmospheric Absorption SpectraVasuda Trehan, Kevin H. Knuth, M. J. Way
The evolution of space technology in recent years, fueled by advancements in computing such as Artificial Intelligence (AI) and machine learning (ML), has profoundly transformed our capacity to explore the cosmos. Missions like the James Webb Space Telescope (JWST) have made information about distant objects more easily accessible, resulting in extensive amounts of valuable data. As part of this work-in-progress study, we are working to create an atmospheric absorption spectrum prediction model for exoplanets. The eventual model will be based on both collected observational spectra and synthetic spectral data generated by the ROCKE-3D general circulation model (GCM) developed by the climate modeling program at NASA's Goddard Institute for Space Studies (GISS). In this initial study, spline curves are used to describe the bin heights of simulated atmospheric absorption spectra as a function of one of the values of the planetary parameters. Bayesian Adaptive Exploration is then employed to identify areas of the planetary parameter space for which more data are needed to improve the model. The resulting system will be used as a forward model so that planetary parameters can be inferred given a planet's atmospheric absorption spectrum. This work is expected to contribute to a better understanding of exoplanetary properties and general exoplanet climates and habitability.
AIFeb 13, 2016
Designing Intelligent InstrumentsKevin H. Knuth, Philip M. Erner, Scott Frasso
Remote science operations require automated systems that can both act and react with minimal human intervention. One such vision is that of an intelligent instrument that collects data in an automated fashion, and based on what it learns, decides which new measurements to take. This innovation implements experimental design and unites it with data analysis in such a way that it completes the cycle of learning. This cycle is the basis of the Scientific Method. The three basic steps of this cycle are hypothesis generation, inquiry, and inference. Hypothesis generation is implemented by artificially supplying the instrument with a parameterized set of possible hypotheses that might be used to describe the physical system. The act of inquiry is handled by an inquiry engine that relies on Bayesian adaptive exploration where the optimal experiment is chosen as the one which maximizes the expected information gain. The inference engine is implemented using the nested sampling algorithm, which provides the inquiry engine with a set of posterior samples from which the expected information gain can be estimated. With these computational structures in place, the instrument will refine its hypotheses, and repeat the learning cycle by taking measurements until the system under study is described within a pre-specified tolerance. We will demonstrate our first attempts toward achieving this goal with an intelligent instrument constructed using the LEGO MINDSTORMS NXT robotics platform.
DATA-ANJan 21, 2015
Convergent Bayesian formulations of blind source separation and electromagnetic source estimationKevin H. Knuth, Herbert G. Vaughan
We consider two areas of research that have been developing in parallel over the last decade: blind source separation (BSS) and electromagnetic source estimation (ESE). BSS deals with the recovery of source signals when only mixtures of signals can be obtained from an array of detectors and the only prior knowledge consists of some information about the nature of the source signals. On the other hand, ESE utilizes knowledge of the electromagnetic forward problem to assign source signals to their respective generators, while information about the signals themselves is typically ignored. We demonstrate that these two techniques can be derived from the same starting point using the Bayesian formalism. This suggests a means by which new algorithms can be developed that utilize as much relevant information as possible. We also briefly mention some preliminary work that supports the value of integrating information used by these two techniques and review the kinds of information that may be useful in addressing the ESE problem.
DATA-ANJan 21, 2015
Difficulties applying recent blind source separation techniques to EEG and MEGKevin H. Knuth
High temporal resolution measurements of human brain activity can be performed by recording the electric potentials on the scalp surface (electroencephalography, EEG), or by recording the magnetic fields near the surface of the head (magnetoencephalography, MEG). The analysis of the data is problematic due to the fact that multiple neural generators may be simultaneously active and the potentials and magnetic fields from these sources are superimposed on the detectors. It is highly desirable to un-mix the data into signals representing the behaviors of the original individual generators. This general problem is called blind source separation and several recent techniques utilizing maximum entropy, minimum mutual information, and maximum likelihood estimation have been applied. These techniques have had much success in separating signals such as natural sounds or speech, but appear to be ineffective when applied to EEG or MEG signals. Many of these techniques implicitly assume that the source distributions have a large kurtosis, whereas an analysis of EEG/MEG signals reveals that the distributions are multimodal. This suggests that more effective separation techniques could be designed for EEG and MEG signals.
DATA-ANDec 19, 2014
Information-Theoretic Methods for Identifying Relationships among Climate VariablesKevin H. Knuth, Deniz Gençağa, William B. Rossow
Information-theoretic quantities, such as entropy, are used to quantify the amount of information a given variable provides. Entropies can be used together to compute the mutual information, which quantifies the amount of information two variables share. However, accurately estimating these quantities from data is extremely challenging. We have developed a set of computational techniques that allow one to accurately compute marginal and joint entropies. These algorithms are probabilistic in nature and thus provide information on the uncertainty in our estimates, which enable us to establish statistical significance of our findings. We demonstrate these methods by identifying relations between cloud data from the International Satellite Cloud Climatology Project (ISCCP) and data from other sources, such as equatorial pacific sea surface temperatures (SST).
MENov 11, 2014
Bayesian Evidence and Model SelectionKevin H. Knuth, Michael Habeck, Nabin K. Malakar et al.
In this paper we review the concepts of Bayesian evidence and Bayes factors, also known as log odds ratios, and their application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Specific attention is paid to the Laplace approximation, variational Bayes, importance sampling, thermodynamic integration, and nested sampling and its recent variants. Analogies to statistical physics, from which many of these techniques originate, are discussed in order to provide readers with deeper insights that may lead to new techniques. The utility of Bayesian model testing in the domain sciences is demonstrated by presenting four specific practical examples considered within the context of signal processing in the areas of signal detection, sensor characterization, scientific model selection and molecular force characterization.
IMMar 18, 2014
Bayesian Source Separation Applied to Identifying Complex Organic Molecules in SpaceKevin H. Knuth, Man Kit Tse, Joshua Choinsky et al.
Emission from a class of benzene-based molecules known as Polycyclic Aromatic Hydrocarbons (PAHs) dominates the infrared spectrum of star-forming regions. The observed emission appears to arise from the combined emission of numerous PAH species, each with its unique spectrum. Linear superposition of the PAH spectra identifies this problem as a source separation problem. It is, however, of a formidable class of source separation problems given that different PAH sources potentially number in the hundreds, even thousands, and there is only one measured spectral signal for a given astrophysical site. Fortunately, the source spectra of the PAHs are known, but the signal is also contaminated by other spectral sources. We describe our ongoing work in developing Bayesian source separation techniques relying on nested sampling in conjunction with an ON/OFF mechanism enabling simultaneous estimation of the probability that a particular PAH species is present and its contribution to the spectrum.
MLNov 13, 2013
Informed Source Separation: A Bayesian TutorialKevin H. Knuth
Source separation problems are ubiquitous in the physical sciences; any situation where signals are superimposed calls for source separation to estimate the original signals. In this tutorial I will discuss the Bayesian approach to the source separation problem. This approach has a specific advantage in that it requires the designer to explicitly describe the signal model in addition to any other information or assumptions that go into the problem description. This leads naturally to the idea of informed source separation, where the algorithm design incorporates relevant information about the specific problem. This approach promises to enable researchers to design their own high-quality algorithms that are specifically tailored to the problem at hand.