CL | Jun 30, 2022
Democratizing Ethical Assessment of Natural Language Generation Models
Amin Rasekh, Ian Eisenberg
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context. Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms by generating discriminatory language, hateful speech, profane content, and other harmful material. Ethical assessment of these models is therefore critical. But it is also a challenging task, requiring expertise in several specialized domains, such as computational linguistics and social justice. While the research community has made significant strides in this domain, high entry barriers limit the accessibility of such ethical assessments to the wider population. This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models: Tool for Ethical Assessment of Language generation models (TEAL), a component of Credo AI Lens, an open-source assessment framework.
CY | Jul 19, 2021
Independent Ethical Assessment of Text Classification Models: A Hate Speech Detection Case Study
Amitoj Singh, Jingshu Chen, Lihao Zhang et al.
An independent ethical assessment of an artificial intelligence system is an impartial examination of the system's development, deployment, and use in alignment with ethical values. System-level qualitative frameworks that describe high-level requirements and component-level quantitative metrics that measure individual ethical dimensions have been developed over the past few years. However, there exists a gap between the two, which hinders the execution of independent ethical assessments in practice. This study bridges this gap and designs a holistic independent ethical assessment process for a text classification model with a special focus on the task of hate speech detection. The assessment is further augmented with protected attributes mining and counterfactual-based analysis to enhance bias assessment. It covers assessments of technical performance, data bias, embedding bias, classification bias, and interpretability. The proposed process is demonstrated through an assessment of a deep hate speech detection model.
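The counterfactual-based analysis mentioned above can be illustrated with a minimal sketch: swap a protected-attribute term for its counterpart and compare the classifier's scores on the two texts. Everything here is an assumption for illustration only. The term pair, the `toy_model` scorer (deliberately biased so that a gap shows up), and the function names are hypothetical and are not taken from the study.

```python
import re

def counterfactual(text, swaps):
    """Swap each whole-word term with its counterpart, in both directions."""
    mapping = {**swaps, **{v: k for k, v in swaps.items()}}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

def score_gap(model, text, swaps):
    """Absolute change in the model score when protected terms are swapped."""
    return abs(model(text) - model(counterfactual(text, swaps)))

def toy_model(text):
    """Hypothetical stand-in classifier that is biased on purpose."""
    return 0.7 if "women" in text.split() else 0.2

swaps = {"men": "women"}          # hypothetical protected-attribute term pair
original = "the men are at the meeting"
flipped = counterfactual(original, swaps)
gap = score_gap(toy_model, original, swaps)
```

A gap near zero across many such pairs would suggest the score does not depend on the swapped term; a large gap, as with the toy scorer here, flags a sentence for closer inspection.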
CR | Jan 25, 2020
A Review of Cybersecurity Incidents in the Water Sector
Amin Hassanzadeh, Amin Rasekh, Stefano Galelli et al.
This study presents a critical review of disclosed, documented, and malicious cybersecurity incidents in the water sector to inform safeguarding efforts against cybersecurity threats. The review is presented within a technical context of industrial control system architectures, attack-defense models, and security solutions. Fifteen incidents were selected and analyzed through a search strategy that included a variety of public information sources ranging from federal investigation reports to scientific papers. For each individual incident, the situation, response, remediation, and lessons learned were compiled and described. The findings of this review indicate an increase in the frequency, diversity, and complexity of cyberthreats to the water sector. Although the emergence of new threats, such as ransomware or cryptojacking, was found, a recurrence of similar vulnerabilities and threats, such as insider threats, was also evident, emphasizing the need for an adaptive, cooperative, and comprehensive approach to water cyberdefense.
LG | Jan 30, 2019
Enhanced Variational Inference with Dyadic Transformation
Sarin Chandy, Amin Rasekh
The variational autoencoder (VAE) is a powerful deep generative model based on variational inference. The practice of modeling latent variables in the VAE's original formulation as normal distributions with a diagonal covariance matrix limits the flexibility to match the true posterior distribution. We propose a new transformation, dyadic transformation (DT), that can model a multivariate normal distribution. DT is a single-stage transformation with low computational requirements. We demonstrate empirically on the MNIST dataset that DT enhances the posterior flexibility and attains competitive results compared to other VAE enhancements.
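The abstract does not give the form of DT, so the sketch below does not implement it. It only illustrates the limitation DT is meant to address: a reparameterized sample with diagonal covariance has uncorrelated components, whereas applying a linear map to the same kind of noise produces correlated ones. The lower-triangular factor `L` is a generic, hypothetical choice and not the paper's transformation.

```python
import random

def sample_diagonal(mu, sigma, rng):
    """Reparameterized draw z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def sample_full(mu, chol, rng):
    """Reparameterized draw z = mu + L @ eps, so that Cov[z] = L @ L^T."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][j] * eps[j] for j in range(i + 1))
            for i, m in enumerate(mu)]

def covariance_01(samples):
    """Empirical covariance between the first two coordinates."""
    n = len(samples)
    mean0 = sum(s[0] for s in samples) / n
    mean1 = sum(s[1] for s in samples) / n
    return sum((s[0] - mean0) * (s[1] - mean1) for s in samples) / (n - 1)

rng = random.Random(0)            # fixed seed for repeatability
mu = [0.0, 0.0]
L = [[1.0, 0.0], [0.8, 0.6]]      # hypothetical factor; L @ L^T has off-diagonal 0.8

diag_samples = [sample_diagonal(mu, [1.0, 1.0], rng) for _ in range(20000)]
full_samples = [sample_full(mu, L, rng) for _ in range(20000)]

cov_diag = covariance_01(diag_samples)   # close to 0: components are independent
cov_full = covariance_01(full_samples)   # close to 0.8: components are correlated
```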
CR | May 31, 2018
Cyberattack Detection using Deep Generative Models with Variational Inference
Sarin E. Chandy, Amin Rasekh, Zachary A. Barker et al.
Recent years have witnessed a rise in the frequency and intensity of cyberattacks targeted at critical infrastructure systems. This study designs a versatile, data-driven cyberattack detection platform for infrastructure systems cybersecurity, with a special demonstration in the water sector. A deep generative model with variational inference autonomously learns normal system behavior and detects attacks as they occur. The model can process the natural data in its raw form and automatically discover and learn its representations, hence augmenting system knowledge discovery and reducing the need for laborious human engineering and domain expertise. The proposed model is applied to a simulated cyberattack detection problem involving a drinking water distribution system subject to programmable logic controller hacks, malicious actuator activation, and deception attacks. The model is only provided with observations of the system, such as pump pressure and tank water level readings, and is blind to the internal structures and workings of the water distribution system. The simulated attacks are manifested in the model's generated reproduction probability plot, indicating its ability to discern the attacks. There is, however, a need to reduce false alarms, especially by optimizing detection thresholds. Altogether, the results indicate the model's ability to distinguish attacks and their repercussions from normal system operation in water distribution systems, and the promise it holds for cyberattack detection in other domains.
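The role of the detection threshold can be sketched as follows, assuming the model emits one score per time step (for example a log reproduction probability) and that low scores indicate unusual behavior. The quantile rule, the function names, and all numbers below are hypothetical illustrations, not the study's procedure or data.

```python
def quantile_threshold(normal_scores, false_alarm_rate):
    """Pick a threshold so that roughly `false_alarm_rate` of the scores
    observed under normal operation fall strictly below it."""
    ordered = sorted(normal_scores)
    k = min(len(ordered) - 1, int(false_alarm_rate * len(ordered)))
    return ordered[k]

def detect(scores, threshold):
    """Flag every time step whose score falls strictly below the threshold."""
    return [s < threshold for s in scores]

# Made-up scores from a period known to be attack-free.
normal = [-1.0, -1.2, -0.9, -1.1, -1.3, -0.8, -1.0, -1.4, -1.1, -0.95]
threshold = quantile_threshold(normal, false_alarm_rate=0.1)

# Made-up scores from a later period; the two very low values mimic an attack.
observed = [-1.0, -1.1, -5.2, -6.0, -0.9]
flags = detect(observed, threshold)
```

Lowering `false_alarm_rate` moves the threshold down, which trades fewer false alarms for a higher chance of missing subtle attacks.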
OH | Jul 1, 2014
A Dynamic Simulation-Optimization Model for Adaptive Management of Urban Water Distribution System Contamination Threats
Amin Rasekh, Kelly Brumbelow
Urban water distribution systems hold a critical and strategic position in preserving public health and industrial growth. Despite the ubiquity of these urban systems, aging infrastructure, and increased risk of terrorism, decision support models for a timely and adaptive contamination emergency response remain at an undeveloped stage. Emergency response is characterized as a progressive, interactive, and adaptive process that involves parallel activities of processing streaming information and executing response actions. This study develops a dynamic decision support model that adaptively simulates the time-varying emergency environment and tracks changing best health protection response measures at every stage of an emergency in real time. Feedback mechanisms between the contaminated network, emergency managers, and consumers are incorporated in a dynamic simulation model to capture time-varying characteristics of an emergency environment. An evolutionary-computation-based dynamic optimization model is developed to adaptively identify time-dependent optimal health protection measures during an emergency. This dynamic simulation-optimization model treats perceived contaminant source attributes as time-varying parameters to account for perceived contamination source updates as more data stream in over time. Performance of the developed dynamic decision support model is analyzed and demonstrated using a mid-size virtual city that resembles the dynamics and complexity of real-world urban systems. This adaptive emergency response optimization model is intended to be a major component of an all-inclusive cyberinfrastructure for efficient contamination threat management, which is currently under development.
CY | Jan 30, 2014
Human Activity Recognition using Smartphone
Amin Rasekh, Chien-An Chen, Yan Lu
Human activity recognition has wide applications in medical research and human survey systems. In this project, we design a robust activity recognition system based on a smartphone. The system uses a 3-dimensional smartphone accelerometer as the only sensor to collect time series signals, from which 31 features are generated in both the time and frequency domains. Activities are classified using four passive learning methods: a quadratic classifier, the k-nearest neighbor algorithm, a support vector machine, and artificial neural networks. Dimensionality reduction is performed through both feature extraction and subset selection. Besides passive learning, we also apply active learning algorithms to reduce data labeling expense. Experimental results show that the classification rate of passive learning reaches 84.4% and that it is robust to common positions and poses of the cellphone. The results of active learning on real data demonstrate a reduction in the labeling effort needed to achieve performance comparable to passive learning.
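The pipeline of windowed features followed by a classifier can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses three simple time-domain features rather than the project's 31, a plain k-nearest neighbor vote, and synthetic signals standing in for accelerometer magnitude. All names and data are hypothetical.

```python
import math
from collections import Counter

def window_features(window):
    """Mean, standard deviation, and mean absolute step of one signal window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    step = sum(abs(b - a) for a, b in zip(window, window[1:])) / (n - 1)
    return (mean, std, step)

def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def synthetic_window(amplitude, phase, length=50):
    """Synthetic acceleration magnitude: gravity plus a sinusoidal motion term."""
    return [9.8 + amplitude * math.sin(0.8 * i + phase) for i in range(length)]

# Labeled training windows: small oscillation = standing, large = walking.
train = [(window_features(synthetic_window(0.02, p)), "standing") for p in (0.0, 1.0, 2.0)]
train += [(window_features(synthetic_window(2.0, p)), "walking") for p in (0.0, 1.0, 2.0)]

walk_guess = knn_predict(train, window_features(synthetic_window(1.8, 0.5)))
stand_guess = knn_predict(train, window_features(synthetic_window(0.03, 0.5)))
```

On real data the features would usually be standardized before computing distances, since k-nearest neighbor is sensitive to feature scale.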