LGNov 27, 2022
Machine Learning for Smart and Energy-Efficient BuildingsHari Prasanna Das, Yu-Wen Lin, Utkarsha Agwan et al. · berkeley
Energy consumption in buildings, both residential and commercial, accounts for approximately 40% of all energy usage in the U.S., and similar numbers are being reported from countries around the world. This significant amount of energy is used to maintain a comfortable, secure, and productive environment for the occupants. So, it is crucial that the energy consumption in buildings must be optimized, all the while maintaining satisfactory levels of occupant comfort, health, and safety. Recently, Machine Learning has been proven to be an invaluable tool in deriving important insights from data and optimizing various systems. In this work, we review the ways in which machine learning has been leveraged to make buildings smart and energy-efficient. For the convenience of readers, we provide a brief introduction of several machine learning paradigms and the components and functioning of each smart building system we cover. Finally, we discuss challenges faced while implementing machine learning algorithms in smart buildings and provide future avenues for research at the intersection of smart buildings and machine learning.
LGMar 10, 2022
Conditional Synthetic Data Generation for Personal Thermal Comfort ModelsHari Prasanna Das, Costas J. Spanos · berkeley
Personal thermal comfort models aim to predict an individual's thermal comfort response, instead of the average response of a large group. Recently, machine learning algorithms have proven to be having enormous potential as a candidate for personal thermal comfort models. But, often within the normal settings of a building, personal thermal comfort data obtained via experiments are heavily class-imbalanced. There are a disproportionately high number of data samples for the "Prefer No Change" class, as compared with the "Prefer Warmer" and "Prefer Cooler" classes. Machine learning algorithms trained on such class-imbalanced data perform sub-optimally when deployed in the real world. To develop robust machine learning-based applications using the above class-imbalanced data, as well as for privacy-preserving data sharing, we propose to implement a state-of-the-art conditional synthetic data generator to generate synthetic data corresponding to the low-frequency classes. Via experiments, we show that the synthetic data generated has a distribution that mimics the real data distribution. The proposed method can be extended for use by other smart building datasets/use-cases.
SYMay 21, 2019
HVAC Energy Cost Optimization for a Multi-zone Building via a Decentralized ApproachYu Yang, Guoqiang Hu, Costas J. Spanos
It has been well acknowledged that buildings account for a large proportion of the world's energy consumption. However, the energy use of buildings, especially the heating, ventilation and air-conditioning (HVAC), is far from being efficient. There still exists a dramatic potential to save energy through improving building energy efficiency. Therefore, this paper studies the control of HVAC system for multi-zone buildings with the objective to reduce energy consumption cost while satisfying thermal comfort. In particular, the thermal couplings due to the heat transfer between the adjacent zones are incorporated in the optimization. Considering that a centralized method is generally computationally prohibitive for large buildings, an efficient decentralized approach is developed, based on the Accelerated Distributed Augmented Lagrangian (ADAL) method [1]. To evaluate the performance of the proposed method, we first compare it with a centralized method, in which the optimal solution of a small-scale problem can be obtained. We find that this decentralized approach can almost approach the optimal solution of the problem. Further, this decentralized approach is compared with the Distributed Token-Based Scheduling Strategy (DTBSS) [2]. The numeric results reveal that when the number of zones is relatively small (less than 20), the two decentralized methods can achieve a comparable performance regarding the cost of the HVAC system. However, with an increase of the number of zones in the building, the proposed decentralized approach demonstrates better performance with a considerable reduction of the total cost. Moreover, the decentralized approach proposed in this paper demonstrate better scalability with less average computation required.
LGOct 13, 2022
Personalized Federated Hypernetworks for Privacy Preservation in Multi-Task Reinforcement LearningDoseok Jang, Larry Yan, Lucas Spangher et al.
Multi-Agent Reinforcement Learning currently focuses on implementations where all data and training can be centralized to one machine. But what if local agents are split across multiple tasks, and need to keep data private between each? We develop the first application of Personalized Federated Hypernetworks (PFH) to Reinforcement Learning (RL). We then present a novel application of PFH to few-shot transfer, and demonstrate significant initial increases in learning. PFH has never been demonstrated beyond supervised learning benchmarks, so we apply PFH to an important domain: RL price-setting for energy demand response. We consider a general case across where agents are split across multiple microgrids, wherein energy consumption data must be kept private within each microgrid. Together, our work explores how the fields of personalized federated learning and RL can come together to make learning efficient across multiple tasks while keeping data secure.
LGOct 16, 2019Code
Design, Benchmarking and Explainability Analysis of a Game-Theoretic Framework towards Energy Efficiency in Smart InfrastructureIoannis C. Konstantakopoulos, Hari Prasanna Das, Andrew R. Barkan et al.
In this paper, we propose a gamification approach as a novel framework for smart building infrastructure with the goal of motivating human occupants to reconsider personal energy usage and to have positive effects on their environment. Human interaction in the context of cyber-physical systems is a core component and consideration in the implementation of any smart building technology. Research has shown that the adoption of human-centric building services and amenities leads to improvements in the operational efficiency of these cyber-physical systems directed towards controlling building energy usage. We introduce a strategy in form of a game-theoretic framework that incorporates humans-in-the-loop modeling by creating an interface to allow building managers to interact with occupants and potentially incentivize energy efficient behavior. Prior works on game theoretic analysis typically rely on the assumption that the utility function of each individual agent is known a priori. Instead, we propose novel utility learning framework for benchmarking that employs robust estimations of occupant actions towards energy efficiency. To improve forecasting performance, we extend the utility learning scheme by leveraging deep bi-directional recurrent neural networks. Using the proposed methods on data gathered from occupant actions for resources such as room lighting, we forecast patterns of energy resource usage to demonstrate the prediction performance of the methods. The results of our study show that we can achieve a highly accurate representation of the ground truth for occupant energy resource usage. We also demonstrate the explainable nature on human decision making towards energy usage inherent in the dataset using graphical lasso and granger causality algorithms. Finally, we open source the de-identified, high-dimensional data pertaining to the energy game-theoretic framework.
AIJun 1, 2025
Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused DecisionJiahui Zhou, Dan Li, Lin Li et al.
The reasoning capabilities of large language models (LLMs) have significantly advanced their performance by enabling in-depth understanding of diverse tasks. With growing interest in applying LLMs to the time series domain, this has proven nontrivial, as evidenced by the limited efficacy of straightforwardly adapting text-domain reasoning techniques. Although recent work has shown promise in several time series tasks, further leveraging advancements in LLM reasoning remains under-explored for time series classification (TSC) tasks, despite their prevalence and significance in many real-world applications. In this paper, we propose ReasonTSC, a novel framework designed to effectively leverage LLM reasoning for time series classification through both a multi-turn reasoning and a fused decision-making strategy tailored to TSC. Rather than straightforwardly applying existing reasoning techniques or relying solely on LLMs' built-in reasoning capabilities, ReasonTSC first steers the model to think over the essential characteristics of time series data. Next, it integrates predictions and confidence scores from plug-in classifiers, e.g., domain-specific time series models, as in-context examples. Finally, ReasonTSC guides the LLM through a structured reasoning process: it evaluates the initial assessment, backtracks to consider alternative hypotheses, and compares their merits before arriving at a final classification. Extensive experiments and systematic ablation studies demonstrate that ReasonTSC consistently outperforms both existing time series reasoning baselines and plug-in models, and is even capable of identifying and correcting plug-in models' false predictions.
LGSep 14, 2021
Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic DataHari Prasanna Das, Ryan Tran, Japjot Singh et al.
$\textbf{Background:}$ At the onset of a pandemic, such as COVID-19, data with proper labeling/attributes corresponding to the new disease might be unavailable or sparse. Machine Learning (ML) models trained with the available data, which is limited in quantity and poor in diversity, will often be biased and inaccurate. At the same time, ML algorithms designed to fight pandemics must have good performance and be developed in a time-sensitive manner. To tackle the challenges of limited data, and label scarcity in the available data, we propose generating conditional synthetic data, to be used alongside real data for developing robust ML models. $\textbf{Methods:}$ We present a hybrid model consisting of a conditional generative flow and a classifier for conditional synthetic data generation. The classifier decouples the feature representation for the condition, which is fed to the flow to extract the local noise. We generate synthetic data by manipulating the local noise with fixed conditional feature representation. We also propose a semi-supervised approach to generate synthetic samples in the absence of labels for a majority of the available data. $\textbf{Results:}$ We performed conditional synthetic generation for chest computed tomography (CT) scans corresponding to normal, COVID-19, and pneumonia afflicted patients. We show that our method significantly outperforms existing models both on qualitative and quantitative performance, and our semi-supervised approach can efficiently synthesize conditional samples under label scarcity. As an example of downstream use of synthetic data, we show improvement in COVID-19 detection from CT scans with conditional synthetic data augmentation.
CVAug 25, 2021
CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial TrainingHari Prasanna Das, Ryan Tran, Japjot Singh et al.
How to generate conditional synthetic data for a domain without utilizing information about its labels/attributes? Our work presents a solution to the above question. We propose a transfer learning-based framework utilizing normalizing flows, coupled with both maximum-likelihood and adversarial training. We model a source domain (labels available) and a target domain (labels unavailable) with individual normalizing flows, and perform domain alignment to a common latent space using adversarial discriminators. Due to the invertible property of flow models, the mapping has exact cycle consistency. We also learn the joint distribution of the data samples and attributes in the source domain by employing an encoder to map attributes to the latent space via adversarial training. During the synthesis phase, given any combination of attributes, our method can generate synthetic samples conditioned on them in the target domain. Empirical studies confirm the effectiveness of our method on benchmarked datasets. We envision our method to be particularly useful for synthetic data generation in label-scarce systems by generating non-trivial augmentations via attribute transformations. These synthetic samples will introduce more entropy into the label-scarce domain than their geometric and photometric transformation counterparts, helpful for robust downstream tasks.
LGOct 5, 2019
A Novel Graphical Lasso based approach towards Segmentation Analysis in Energy Game-Theoretic FrameworksHari Prasanna Das, Ioannis C. Konstantakopoulos, Aummul Baneen Manasawala et al.
Energy game-theoretic frameworks have emerged to be a successful strategy to encourage energy efficient behavior in large scale by leveraging human-in-the-loop strategy. A number of such frameworks have been introduced over the years which formulate the energy saving process as a competitive game with appropriate incentives for energy efficient players. However, prior works involve an incentive design mechanism which is dependent on knowledge of utility functions for all the players in the game, which is hard to compute especially when the number of players is high, common in energy game-theoretic frameworks. Our research proposes that the utilities of players in such a framework can be grouped together to a relatively small number of clusters, and the clusters can then be targeted with tailored incentives. The key to above segmentation analysis is to learn the features leading to human decision making towards energy usage in competitive environments. We propose a novel graphical lasso based approach to perform such segmentation, by studying the feature correlations in a real-world energy social game dataset. To further improve the explainability of the model, we perform causality study using grangers causality. Proposed segmentation analysis results in characteristic clusters demonstrating different energy usage behaviors. We also present avenues to implement intelligent incentive design using proposed segmentation method.
LGAug 22, 2019
Efficient Task-Specific Data Valuation for Nearest Neighbor AlgorithmsRuoxi Jia, David Dao, Boxin Wang et al.
Given a data set $\mathcal{D}$ containing millions of data points and a data consumer who is willing to pay for \$$X$ to train a machine learning (ML) model over $\mathcal{D}$, how should we distribute this \$$X$ to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all $N$ data points, it requires $O(2^N)$ model evaluations for exact computation and $O(N\log N)$ for $(ε, δ)$-approximation. In this paper, we focus on one popular family of ML models relying on $K$-nearest neighbors ($K$NN). The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity! Moreover, for $(ε, δ)$-approximation, we are able to develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity $O(N^{h(ε,K)}\log N)$ when $ε$ is not too small and $K$ is not too large. We empirically evaluate our algorithms on up to $10$ million data points and even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm. The LSH-based approximation algorithm can accelerate the value calculation process even further. We then extend our algorithms to other scenarios such as (1) weighed $K$NN classifiers, (2) different data points are clustered by different data curators, and (3) there are data analysts providing computation who also requires proper valuation.
LGAug 5, 2019
Likelihood Contribution based Multi-scale Architecture for Generative FlowsHari Prasanna Das, Pieter Abbeel, Costas J. Spanos
Deep generative modeling using flows has gained popularity owing to the tractable exact log-likelihood estimation with efficient training and synthesis process. However, flow models suffer from the challenge of having high dimensional latent space, the same in dimension as the input space. An effective solution to the above challenge as proposed by Dinh et al. (2016) is a multi-scale architecture, which is based on iterative early factorization of a part of the total dimensions at regular intervals. Prior works on generative flow models involving a multi-scale architecture perform the dimension factorization based on static masking. We propose a novel multi-scale architecture that performs data-dependent factorization to decide which dimensions should pass through more flow layers. To facilitate the same, we introduce a heuristic based on the contribution of each dimension to the total log-likelihood which encodes the importance of the dimensions. Our proposed heuristic is readily obtained as part of the flow training process, enabling the versatile implementation of our likelihood contribution based multi-scale architecture for generic flow models. We present such implementations for several state-of-the-art flow models and demonstrate improvements in log-likelihood score and sampling quality on standard image benchmarks. We also conduct ablation studies to compare the proposed method with other options for dimension factorization.
LGOct 24, 2018
Segmentation Analysis in Human Centric Cyber-Physical Systems using Graphical LassoHari Prasanna Das, Ioannis C. Konstantakopoulos, Aummul Baneen Manasawala et al.
A generalized gamification framework is introduced as a form of smart infrastructure with potential to improve sustainability and energy efficiency by leveraging humans-in-the-loop strategy. The proposed framework enables a Human-Centric Cyber-Physical System using an interface to allow building managers to interact with occupants. The interface is designed for occupant engagement-integration supporting learning of their preferences over resources in addition to understanding how preferences change as a function of external stimuli such as physical control, time or incentives. Towards intelligent and autonomous incentive design, a noble statistical learning algorithm performing occupants energy usage behavior segmentation is proposed. We apply the proposed algorithm, Graphical Lasso, on energy resource usage data by the occupants to obtain feature correlations--dependencies. Segmentation analysis results in characteristic clusters demonstrating different energy usage behaviors. The features--factors characterizing human decision-making are made explainable.
CRJul 11, 2016
Privacy-Enhanced Architecture for Occupancy-based HVAC ControlRuoxi Jia, Roy Dong, S. Shankar Sastry et al.
Large-scale sensing and actuation infrastructures have allowed buildings to achieve significant energy savings; at the same time, these technologies introduce significant privacy risks that must be addressed. In this paper, we present a framework for modeling the trade-off between improved control performance and increased privacy risks due to occupancy sensing. More specifically, we consider occupancy-based HVAC control as the control objective and the location traces of individual occupants as the private variables. Previous studies have shown that individual location information can be inferred from occupancy measurements. To ensure privacy, we design an architecture that distorts the occupancy data in order to hide individual occupant location information while maintaining HVAC performance. Using mutual information between the individual's location trace and the reported occupancy measurement as a privacy metric, we are able to optimally design a scheme to minimize privacy risk subject to a control performance guarantee. We evaluate our framework using real-world occupancy data: first, we verify that our privacy metric accurately assesses the adversary's ability to infer private variables from the distorted sensor measurements; then, we show that control performance is maintained through simulations of building operations using these distorted occupancy readings.
MLJul 16, 2014
Sequential Logistic Principal Component Analysis (SLPCA): Dimensional Reduction in Streaming Multivariate Binary-State SystemZhaoyi Kang, Costas J. Spanos
Sequential or online dimensional reduction is of interests due to the explosion of streaming data based applications and the requirement of adaptive statistical modeling, in many emerging fields, such as the modeling of energy end-use profile. Principal Component Analysis (PCA), is the classical way of dimensional reduction. However, traditional Singular Value Decomposition (SVD) based PCA fails to model data which largely deviates from Gaussian distribution. The Bregman Divergence was recently introduced to achieve a generalized PCA framework. If the random variable under dimensional reduction follows Bernoulli distribution, which occurs in many emerging fields, the generalized PCA is called Logistic PCA (LPCA). In this paper, we extend the batch LPCA to a sequential version (i.e. SLPCA), based on the sequential convex optimization theory. The convergence property of this algorithm is discussed compared to the batch version of LPCA (i.e. BLPCA), as well as its performance in reducing the dimension for multivariate binary-state systems. Its application in building energy end-use profile modeling is also investigated.
HCJul 16, 2014
SoundLoc: Acoustic Method for Indoor Localization without InfrastructureRuoxi Jia, Ming Jin, Costas J. Spanos
Identifying locations of occupants is beneficial to energy management in buildings. A key observation in indoor environment is that distinct functional areas are typically controlled by separate HVAC and lighting systems and room level localization is sufficient to provide a powerful tool for energy usage reduction by occupancy-based actuation of the building facilities. Based upon this observation, this paper focuses on identifying the room where a person or a mobile device is physically present. Existing room localization methods, however, require special infrastructure to annotate rooms. SoundLoc is a room-level localization system that exploits the intrinsic acoustic properties of individual rooms and obviates the needs for infrastructures. As we show in the study, rooms' acoustic properties can be characterized by Room Impulse Response (RIR). Nevertheless, obtaining precise RIRs is a time-consuming and expensive process. The main contributions of our work are the following. First, a cost-effective RIR measurement system is implemented and the Noise Adaptive Extraction of Reverberation (NAER) algorithm is developed to estimate room acoustic parameters in noisy conditions. Second, a comprehensive physical and statistical analysis of features extracted from RIRs is performed. Also, SoundLoc is evaluated using the dataset consisting of ten (10) different rooms. The overall accuracy of 97.8% achieved demonstrates the potential to be integrated into automatic mapping of building space.
HCJun 22, 2014
Environmental Sensing by Wearable Device for Indoor Activity and Location EstimationMing Jin, Han Zou, Kevin Weekly et al.
We present results from a set of experiments in this pilot study to investigate the causal influence of user activity on various environmental parameters monitored by occupant carried multi-purpose sensors. Hypotheses with respect to each type of measurements are verified, including temperature, humidity, and light level collected during eight typical activities: sitting in lab / cubicle, indoor walking / running, resting after physical activity, climbing stairs, taking elevators, and outdoor walking. Our main contribution is the development of features for activity and location recognition based on environmental measurements, which exploit location- and activity-specific characteristics and capture the trends resulted from the underlying physiological process. The features are statistically shown to have good separability and are also information-rich. Fusing environmental sensing together with acceleration is shown to achieve classification accuracy as high as 99.13%. For building applications, this study motivates a sensor fusion paradigm for learning individualized activity, location, and environmental preferences for energy management and user comfort.