CV · Aug 16, 2022
Context-Aware Streaming Perception in Dynamic Environments
Gur-Eyal Sela, Ionel Gog, Justin Wong et al. · berkeley
Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, performance improvement using our method comes in addition to, and not instead of, advances in offline accuracy.
NI · Nov 27, 2018
Pible: Battery-Free Mote for Perpetual Indoor BLE Applications
Francesco Fraternali, Bharathan Balaji, Yuvraj Agarwal et al.
Smart building applications require a large-scale deployment of sensors distributed across the environment. Recent innovations in smart environments are driven by wireless networked sensors as they are easy to deploy. However, replacing their batteries at scale is a non-trivial, labor-intensive task. Energy harvesting has emerged as a potential solution to avoid battery replacement but requires compromises such as application-specific design, simplified communication protocols, or reduced quality of service. We explore the design space of battery-free sensor nodes using commercial off-the-shelf components, and present Pible: a Perpetual Indoor BLE sensor node that leverages ambient light and can support numerous smart building applications. We analyze node-lifetime, quality of service and light availability trade-offs and present a predictive algorithm that adapts to changing lighting conditions to maximize node lifetime and application quality of service. Using a 20-node, 15-day deployment in a real building under varying lighting conditions, we show feasible applications that can be implemented using Pible and the boundary conditions under which they can fail.
LG · Sep 30, 2022 · Code
B2RL: An open-source Dataset for Building Batch Reinforcement Learning
Hsin-Yu Liu, Xiaohan Fu, Bharathan Balaji et al.
Batch reinforcement learning (BRL) is an emerging research area in the RL community. It learns exclusively from static datasets (i.e. replay buffers) without interaction with the environment. In offline settings, existing replay experiences are used as prior knowledge for BRL models to find the optimal policy. Thus, generating replay buffers is crucial for benchmarking BRL models. In our B2RL (Building Batch RL) dataset, we collected real-world data from our building management systems, as well as buffers generated by several behavioral policies in simulation environments. We believe it could help building experts with BRL research. To the best of our knowledge, we are the first to open-source building datasets for the purpose of BRL learning.
SY · Jan 27, 2016
Quiver: Using Control Perturbations to Increase the Observability of Sensor Data in Smart Buildings
Jason Koh, Bharathan Balaji, Vahideh Akhlaghi et al.
Modern buildings consist of hundreds of sensors and actuators for monitoring and operation of systems such as HVAC, light and security. To enable portable applications in next generation smart buildings, we need models and standardized ontologies that represent these sensors across diverse types of buildings. Recent research has shown that extracting information such as sensor type with available metadata and timeseries data analysis is difficult due to heterogeneity of systems and lack of support for interoperability. We propose perturbations in the control system as a mechanism to increase the observability of building systems to extract contextual information and develop standardized models. We design Quiver, an experimental framework for actuation of building HVAC system that enables us to perturb the control system safely. Using Quiver, we demonstrate three applications using empirical experiments on a real commercial building: colocation of data points, identification of point type and mapping of dependency between actuators. Our results show that we can colocate data points in HVAC terminal units with 98.4 % accuracy and 63 % coverage. We can identify point types of the terminal units with 85.3 % accuracy. Finally, we map the dependency links between actuators with an accuracy of 73.5 %, with 8.1 % and 18.4 % false positives and false negatives respectively.
SY · Jan 7, 2017
Modeling Actuation Constraints for IoT Applications
Bharathan Balaji, Brad Campbell, Amit Levy et al.
The Internet of Things (IoT) promises to bring ease of monitoring, better efficiency and innovative services across many domains with connected devices around us. With information from critical parts of infrastructure and powerful cloud-based data analytics, many applications can be developed to gain insights about IoT systems as well as transform their capabilities. Actuation applications form an essential part of these IoT systems, as they enable automation as well as fast low-level decision making. However, modern IoT systems are designed for data acquisition, and actuation applications are implemented in an ad-hoc manner. We identify modeling constraints in a systematic manner as indispensable to support actuation applications because constraints encompass high-level policies dictated by laws of physics, legal policies, and user preferences. We explore data models for constraints in an IoT system with the example of a home heating system and illustrate the challenges in enforcing these constraints in the IoT system architecture.
CV · Apr 10, 2023
Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras
Sandeep Singh Sandha, Bharathan Balaji, Luis Garcia et al.
Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment in resource-constrained devices. We present an end-to-end deep reinforcement learning (RL) solution called Eagle to train a neural network policy that directly takes images as input to control the PTZ camera. Training reinforcement learning is cumbersome in the real world due to labeling effort, runtime environment stochasticity, and fragile experimental setups. We introduce a photo-realistic simulation framework for training and evaluation of PTZ camera control policies. Eagle achieves superior camera control performance by maintaining the object of interest close to the center of captured images at high resolution and has up to 17% more tracking duration than the state-of-the-art. Eagle policies are lightweight (90x fewer parameters than Yolo5s) and can run on embedded camera platforms such as Raspberry PI (33 FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking for resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios.
CL · Nov 11, 2025
SpiderGen: Towards Procedure Generation For Carbon Life Cycle Assessments with Generative AI
Anupama Sitaraman, Bharathan Balaji, Yuvraj Agarwal
Investigating the effects of climate change and global warming caused by GHG emissions has been a key concern worldwide. These emissions are largely contributed to by the production, use and disposal of consumer products. Thus, it is important to build tools to estimate the environmental impact of consumer goods, an essential part of which is conducting Life Cycle Assessments (LCAs). LCAs specify and account for the appropriate processes involved with the production, use, and disposal of the products. We present SpiderGen, an LLM-based workflow which integrates the taxonomy and methodology of traditional LCA with the reasoning capabilities and world knowledge of LLMs to generate graphical representations of the key procedural information used for LCA, known as Product Category Rules Process Flow Graphs (PCR PFGs). We additionally evaluate the output of SpiderGen by comparing it with 65 real-world LCA documents. We find that SpiderGen provides accurate LCA process information that is either fully correct or has minor errors, achieving an F1-Score of 65% across 10 sample data points, as compared to 53% using a one-shot prompting method. We observe that the remaining errors occur primarily due to differences in detail between LCA documents, as well as differences in the "scope" of which auxiliary processes must also be included. We also demonstrate that SpiderGen performs better than several baseline techniques, such as chain-of-thought prompting and one-shot prompting. Finally, we highlight SpiderGen's potential to reduce the human effort and costs for estimating carbon impact, as it is able to produce LCA process information for less than $1 USD in under 10 minutes as compared to the status quo LCA, which can cost over $25,000 USD and take up to 21 person-days.
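The F1-Score quoted in this abstract is the harmonic mean of precision and recall. As a minimal sketch, assuming generated and reference processes are compared by exact name match (the abstract does not say how SpiderGen's graphs are matched against LCA documents, and the process names below are hypothetical), it could be computed as:

```python
def f1_score(predicted: set[str], reference: set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of labels."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)   # share of predictions that are correct
    recall = true_positives / len(reference)      # share of reference items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical process names, for illustration only.
reference = {"raw material extraction", "manufacturing", "transport", "use", "disposal"}
predicted = {"raw material extraction", "manufacturing", "transport", "packaging"}

score = f1_score(predicted, reference)
print(f"F1 = {score:.3f}")
```

Here three of four predicted processes are correct (precision 0.75) and three of five reference processes are recovered (recall 0.6).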
CV · Dec 30, 2025
RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics
Gur-Eyal Sela, Kumar Krishna Agrawal, Bharathan Balaji et al.
Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic model size selection (DMSS) is an attractive approach: it is content-aware but treats models as black boxes, and could potentially reduce cost by up to 10x without model retraining or modification. Without ground truth labels at runtime, we observe that DMSS methods use two stages per segment: (i) sampling a few models to calculate prediction statistics (e.g., confidences), then (ii) selection of the model size from those statistics. Prior systems fail to generalize to diverse workloads, particularly to mobile videos and lower accuracy targets. We identify that the failure modes stem from inefficient sampling whose cost exceeds its benefit, and inaccurate per-segment accuracy prediction. In this work, we present RedunCut, a new DMSS system that addresses both: It uses a measurement-driven planner that estimates the cost-benefit tradeoff of sampling, and a lightweight, data-driven performance model to improve accuracy prediction. Across road-vehicle, drone, and surveillance videos and multiple model families and tasks, RedunCut reduces compute cost by 14-62% at fixed accuracy and remains robust to limited historical data and to drift.
RO · Jan 30, 2024 · Code
OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering
Alexander Schperberg, Yusuke Tanaka, Saviz Mowlavi et al.
State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioceptive and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState
94.3 · LG · May 11
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
Han Zheng, Yining Ma, Karthick Gunasekaran et al.
In LLM Reinforcement Fine-Tuning (RFT), curriculum learning drives both efficiency and performance. Yet, current methods externalize curriculum judgment via handcrafted heuristics or auxiliary models, risking misalignment with the policy's training dynamics. In this paper, we introduce METIS (METacognitive Internalized Self-judgment), a novel framework that internalizes curriculum judgment as a native capability. Leveraging a critical observation that within-prompt reward variance effectively gauges prompt informativeness, METIS predicts this metric based on recent training outcomes as lightweight in-context learning examples. This intrinsic self-judgment then dynamically dictates the training allocation. Moreover, METIS closes the loop between judgment and optimization by jointly optimizing the standard RFT rewards and a self-judgment reward. This allows the policy to learn what to learn next, as a form of metacognition. Across extensive discrete and continuous RFT benchmarks from mathematical reasoning, code generation, to agentic function-calling, METIS consistently delivers superior performance while accelerating convergence by up to 67%. By bypassing handcrafted heuristics and auxiliary models, our work establishes a simple, closed-loop, and highly efficient curriculum internalization paradigm for LLM reinforcement fine-tuning.
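The observation that within-prompt reward variance gauges prompt informativeness can be illustrated with a small sketch. It only computes the empirical variance over sampled rollouts; it does not reproduce METIS's in-context prediction of that variance or its training allocation, and the prompt names and rewards are hypothetical.

```python
from statistics import pvariance


def reward_variance(rewards: list[float]) -> float:
    """Population variance of the rewards obtained for one prompt."""
    return pvariance(rewards)


# Hypothetical binary rewards from four rollouts per prompt.
rollout_rewards = {
    "too_easy": [1.0, 1.0, 1.0, 1.0],     # always solved: variance 0
    "informative": [1.0, 0.0, 1.0, 0.0],  # solved half the time: variance 0.25
    "too_hard": [0.0, 0.0, 0.0, 0.0],     # never solved: variance 0
}

# Rank prompts from most to least informative under this proxy.
ranked = sorted(
    rollout_rewards,
    key=lambda prompt: reward_variance(rollout_rewards[prompt]),
    reverse=True,
)
print(ranked)
```

For binary rewards with success rate p, the variance is p(1 - p), so prompts the policy always or never solves score zero and prompts near p = 0.5 score highest.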
CL · Oct 22, 2025 · Code
An Expert-grounded benchmark of General Purpose LLMs in LCA
Artur Donaldson, Bharathan Balaji, Cajetan Oriekezie et al.
Purpose: Artificial intelligence (AI), and in particular large language models (LLMs), are increasingly being explored as tools to support life cycle assessment (LCA). While demonstrations exist across environmental and social domains, systematic evidence on their reliability, robustness, and usability remains limited. This study provides the first expert-grounded benchmark of LLMs in LCA, addressing the absence of standardized evaluation frameworks in a field where no clear ground truth or consensus protocols exist. Methods: We evaluated eleven general-purpose LLMs, spanning both commercial and open-source families, across 22 LCA-related tasks. Seventeen experienced practitioners reviewed model outputs against criteria directly relevant to LCA practice, including scientific accuracy, explanation quality, robustness, verifiability, and adherence to instructions. We collected 168 expert reviews. Results: Experts judged 37% of responses to contain inaccurate or misleading information. Accuracy and quality of explanation were generally rated average or good for many models, even smaller ones, and format adherence was generally rated favourably. Hallucination rates varied significantly, with some models producing hallucinated citations at rates of up to 40%. There was no clear-cut distinction between ratings on open-weight versus closed-weight LLMs, with open-weight models outperforming or competing on par with closed-weight models on criteria such as accuracy and quality of explanation. Conclusion: These findings highlight the risks of applying LLMs naïvely in LCA, such as when LLMs are treated as free-form oracles, while also showing benefits especially around quality of explanation and alleviating labour intensiveness of simple tasks. The use of general-purpose LLMs without grounding mechanisms presents ...
CL · Aug 5, 2025 · Code
CF-RAG: A Dataset and Method for Carbon Footprint QA Using Retrieval-Augmented Generation
Kaiwen Zhao, Bharathan Balaji, Stephen Lee
Product sustainability reports provide valuable insights into the environmental impacts of a product and are often distributed in PDF format. These reports often include a combination of tables and text, which complicates their analysis. The lack of standardization and the variability in reporting formats further exacerbate the difficulty of extracting and interpreting relevant information from large volumes of documents. In this paper, we tackle the challenge of answering questions related to carbon footprints within sustainability reports available in PDF format. Unlike previous approaches, our focus is on addressing the difficulties posed by the unstructured and inconsistent nature of text extracted from PDF parsing. To facilitate this analysis, we introduce CarbonPDF-QA, an open-source dataset containing question-answer pairs for 1735 product report documents, along with human-annotated answers. Our analysis shows that GPT-4o struggles to answer questions with data inconsistencies. To address this limitation, we propose CarbonPDF, an LLM-based technique specifically designed to answer carbon footprint questions on such datasets. We develop CarbonPDF by fine-tuning Llama 3 with our training data. Our results show that our technique outperforms current state-of-the-art techniques, including question-answering (QA) systems finetuned on table and text data.
LG · Nov 5, 2019 · Code
DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
Bharathan Balaji, Sunil Mallya, Sahika Genc et al.
DeepRacer is a platform for end-to-end experimentation with RL and can be used to systematically investigate the key challenges in developing intelligent control systems. Using the platform, we demonstrate how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera. It is trained in simulation with no additional tuning in the physical world and demonstrates: 1) formulation and solution of a robust reinforcement learning algorithm, 2) narrowing the reality gap through joint perception and dynamics, 3) distributed on-demand compute architecture for training optimal policies, and 4) a robust evaluation method to identify when to stop training. It is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning. We open source our code and video demo on GitHub: https://git.io/fjxoJ.
IV · Mar 19, 2024
FUELVISION: A Multimodal Data Fusion and Multimodel Ensemble Algorithm for Wildfire Fuels Mapping
Riyaaz Uddien Shaik, Mohamad Alipour, Eric Rowell et al.
Accurate assessment of fuel conditions is a prerequisite for fire ignition and behavior prediction, and risk management. The method proposed herein leverages diverse data sources including Landsat-8 optical imagery, Sentinel-1 (C-band) Synthetic Aperture Radar (SAR) imagery, PALSAR (L-band) SAR imagery, and terrain features to capture comprehensive information about fuel types and distributions. An ensemble model was trained to predict landscape-scale fuels such as the 'Scott and Burgan 40' using the as-received Forest Inventory and Analysis (FIA) field survey plot data obtained from the USDA Forest Service. However, this basic approach yielded relatively poor results due to the inadequate amount of training data. Pseudo-labeled and fully synthetic datasets were developed using generative AI approaches to address the limitations of ground truth data availability. These synthetic datasets were used for augmenting the FIA data from California to enhance the robustness and coverage of model training. The use of an ensemble of methods including deep learning neural networks, decision trees, and gradient boosting offered a fuel mapping accuracy of nearly 80%. Through extensive experimentation and evaluation, the effectiveness of the proposed approach was validated for regions of the 2021 Dixie and Caldor fires. Comparative analyses against high-resolution data from the National Agriculture Imagery Program (NAIP) and timber harvest maps affirmed the robustness and reliability of the proposed approach, which is capable of near-real-time fuel mapping.
HC · Jul 18, 2020
Quick Question: Interrupting Users for Microtasks with Reinforcement Learning
Bo-Jhang Ho, Bharathan Balaji, Mehmet Koseoglu et al.
Human attention is a scarce resource in modern computing. A multitude of microtasks vie for user attention to crowdsource information, perform momentary assessments, personalize services, and execute actions with a single touch. A lot gets done when these tasks take up the invisible free moments of the day. However, an interruption at an inappropriate time degrades productivity and causes annoyance. Prior works have exploited contextual cues and behavioral data to identify interruptibility for microtasks with much success. With Quick Question, we explore the use of reinforcement learning (RL) to schedule microtasks while minimizing user annoyance and compare its performance with supervised learning. We model the problem as a Markov decision process and use the Advantage Actor-Critic algorithm to identify interruptible moments based on context and history of user interactions. In our 5-week, 30-participant study, we compare the proposed RL algorithm against supervised learning methods. While the mean number of responses between both methods is commensurate, RL is more effective at avoiding dismissal of notifications and improves user experience over time.
LG · Nov 24, 2019
ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems
Bharathan Balaji, Jordan Bell-Masterson, Enes Bilgin et al.
Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a range of practical applications: Bin Packing, Newsvendor, and Vehicle Routing. While there is a nascent literature that applies RL to these problems, there are no commonly accepted benchmarks which can be used to compare proposed approaches rigorously in terms of performance, scale, or generalizability. This paper aims to fill that gap. For each problem we apply both standard approaches as well as newer RL algorithms and analyze results. In each case, the performance of the trained RL policy is competitive with or superior to the corresponding baselines, while not requiring much in the way of domain knowledge. This highlights the potential of RL in real-world dynamic resource allocation problems.
SY · Sep 4, 2019
ACES -- Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning
Francesco Fraternali, Bharathan Balaji, Yuvraj Agarwal et al.
The Internet of Things forms the backbone of modern building applications. Wireless sensors are being increasingly adopted for their flexibility and reduced cost of deployment. However, most wireless sensors are powered by batteries today and large deployments are inhibited by manual battery replacement. Energy harvesting sensors provide an attractive alternative, but they need to provide adequate quality of service to applications given uncertain energy availability. We propose using reinforcement learning to optimize the operation of energy harvesting sensors to maximize sensing quality with available energy. We present our system ACES that uses reinforcement learning for periodic and event-driven sensing indoors with ambient light energy harvesting. Our custom-built board uses a supercapacitor to store energy temporarily, senses light and motion events, and relays them using Bluetooth Low Energy. Using simulations and real deployments, we show that our sensor nodes adapt to their lighting conditions and continuously send measurements and events across nights and weekends. We use deployment data to continually adapt sensing to changing environmental patterns and transfer learning to reduce the training time in real deployments. In our 60-node deployment lasting two weeks, we observe a dead time of 0.1%. The periodic sensors that measure luminosity have a mean sampling period of 90 seconds and the event sensors that detect motion with PIR captured 86% of the events on average compared to a battery-powered node.
LG · Nov 27, 2018
Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning
Francesco Fraternali, Bharathan Balaji, Rajesh Gupta
With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. As of today, the configuration of sensors is either manual or heuristics based, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it is not yet adopted in practice. We propose solutions to bridge this gap: reduce the training phase of RL so that nodes are operational within a short time after deployment and reduce the computational requirements to scale to large deployments. We focus on configuration of the sampling rate of indoor solar panel based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions.
CL · Jan 2, 2018
Did you hear that? Adversarial Examples Against Automatic Speech Recognition
Moustafa Alzantot, Bharathan Balaji, Mani Srivastava
Speech is a common and effective way of communication between humans, and modern consumer devices such as smartphones and home hubs are equipped with deep learning based accurate automatic speech recognition to enable natural interaction between humans and machines. Recently, researchers have demonstrated powerful attacks against machine learning models that can fool them into producing incorrect results. However, nearly all previous research in adversarial attacks has focused on image recognition and object detection models. In this short paper, we present a first-of-its-kind demonstration of adversarial attacks against a speech classification model. Our algorithm performs targeted attacks with 87% success by adding small background noise without having to know the underlying model parameters and architecture. Our attack only changes the least significant bits of a subset of audio clip samples, and the noise does not change the human listener's perception of the audio clip 89% of the time, as evaluated in our human study.
CY · Dec 19, 2016
Managing Commercial HVAC Systems: What do Building Operators Really Need?
Bharathan Balaji, Nadir Weibel, Yuvraj Agarwal
Buildings form an essential part of modern life; people spend a significant amount of their time in them, and they consume large amounts of energy. A variety of systems provide services such as lighting, air conditioning and security which are managed using Building Management Systems (BMS) by building operators. To better understand the capability of current BMS and characterize common practices of building operators, we investigated their use across five institutions in the US. We interviewed ten operators and discovered that BMS do not address a number of key concerns for the management of buildings. Our analysis is rooted in the everyday work of building operators and highlights a number of design suggestions to help improve the user experience and management of BMS, ultimately leading to improvements in productivity, as well as building comfort and energy efficiency.
HC · Jan 26, 2016
Genie: A Longitudinal Study Comparing Physical and Software-augmented Thermostats in Office Buildings
Bharathan Balaji, Jason Koh, Nadir Weibel et al.
Thermostats are primary interfaces for occupants of office buildings to express their comfort preferences. However, standard thermostats are often ineffective due to inaccessibility, lack of information, or limited responsiveness, leading to occupant discomfort. Software thermostats based on web or smartphone applications provide alternative interfaces to occupants with minimal deployment cost. However, their usage and effectiveness have not been studied extensively in real settings. In this paper we present Genie, a novel software-augmented thermostat that we deployed and studied at our university over a period of 21 months. Our data shows that providing wider thermal control to users does not lead to system abuse and that the effect on energy consumption is minimal while improving comfort and energy awareness. We believe that increased introduction of software thermostats in office buildings will have important effects on comfort and energy consumption and we provide key design recommendations for their implementation and deployment.
SY · Sep 17, 2015
HVACMeter: Apportionment of HVAC Power to Thermal Zones and Air Handler Units
Jason Koh, Bharathan Balaji, Rajesh Gupta et al.
Heating, Ventilation and Air Conditioning (HVAC) systems consume almost half of the total energy use of commercial buildings. To optimize HVAC energy usage, it is important to understand the energy consumption of individual HVAC components at fine granularities. However, buildings typically only have aggregate building-level power and thermal meters. We present HVACMeter, a system which leverages existing sensors in commercial HVAC systems to estimate the energy consumed by individual components of the HVAC system, as well as by each thermal zone in buildings. HVACMeter can be generalized to any HVAC system as it uses the basic understanding of HVAC operation, heat transfer equations, and historical sensor data to estimate energy. We deploy HVACMeter to three buildings on our campus, to identify the set of sensors that are important for accurately disaggregating energy use at the level of each Air Handler Unit and each thermal zone within these buildings. HVACMeter power estimations have on average 44.5% less RMSE than that of mean power estimates. Furthermore, we highlight the usefulness of the HVACMeter energy estimation model for a building fault detection application by quantifying the amount of energy that can be saved by fixing particular faults.
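The RMSE comparison against mean power estimates can be made concrete with a short sketch. The power readings and model outputs below are hypothetical and are not taken from the HVACMeter deployment; the sketch only shows how a relative RMSE reduction over a constant-mean baseline is calculated.

```python
import math


def rmse(actual: list[float], estimated: list[float]) -> float:
    """Root mean squared error between measured and estimated values."""
    squared_errors = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical measured power (kW) and model estimates for one component.
measured = [10.0, 14.0, 12.0, 16.0]
model_estimate = [11.0, 13.0, 12.0, 15.0]

# Baseline: predict the mean measured power at every time step.
mean_power = sum(measured) / len(measured)
baseline_estimate = [mean_power] * len(measured)

reduction_pct = 100 * (1 - rmse(measured, model_estimate) / rmse(measured, baseline_estimate))
print(f"RMSE reduction over mean baseline: {reduction_pct:.1f}%")
```

A positive value means the model tracks the measured power more closely than the constant mean does.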