CL · Jul 5, 2023 · Code
PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records
Viktor Schlegel, Hao Li, Yuping Wu et al. · tencent-ai
This paper describes PULSAR, our system submission at the ImageClef 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training, to produce a specialised language model which is trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.
AI · Apr 27, 2023 · Code
Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap et al.
Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.
CL · Jun 5, 2023
PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models
Hao Li, Yuping Wu, Viktor Schlegel et al. · tencent-ai
Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.
CV · Jun 28, 2022
Generating near-infrared facial expression datasets with dimensional affect labels
Calvin Chen, Stefan Winkler
Facial expression analysis has long been an active research area of computer vision. Traditional methods mainly analyse images for prototypical discrete emotions; as a result, they do not provide an accurate depiction of the complex emotional states in humans. Furthermore, illumination variance remains a challenge for face analysis in the visible light spectrum. To address these issues, we propose using a dimensional model based on valence and arousal to represent a wider range of emotions, in combination with near infra-red (NIR) imagery, which is more robust to illumination changes. Since there are no existing NIR facial expression datasets with valence-arousal labels available, we present two complementary data augmentation methods (face morphing and CycleGAN approach) to create NIR image datasets with dimensional emotion labels from existing categorical and/or visible-light datasets. Our experiments show that these generated NIR datasets are comparable to existing datasets in terms of data quality and baseline prediction performance.
CL · Aug 22, 2024
LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction
Aishik Nagar, Viktor Schlegel, Thanh-Tung Nguyen et al.
Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in this paper, we systematically benchmark LLM performance in Medical Classification and Named Entity Recognition (NER) tasks. We aim to disentangle the contribution of different factors to the performance, particularly the impact of LLMs' task knowledge and reasoning capabilities, their (parametric) domain knowledge, and the addition of external knowledge. To this end, we evaluate various open LLMs - including BioMistral and Llama-2 models - on a diverse set of biomedical datasets, using standard prompting, Chain-of-Thought (CoT) and Self-Consistency-based reasoning as well as Retrieval-Augmented Generation (RAG) with PubMed and Wikipedia corpora. Counterintuitively, our results reveal that standard prompting consistently outperforms more complex techniques across both tasks, laying bare the limitations in the current application of CoT, self-consistency and RAG in the biomedical domain. Our findings suggest that advanced prompting methods developed for knowledge- or reasoning-intensive tasks, such as CoT or RAG, are not easily portable to biomedical tasks where precise structured outputs are required. This highlights the need for more effective integration of external knowledge and reasoning mechanisms in LLMs to enhance their performance in real-world biomedical applications.
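The abstract above does not spell out how self-consistency was implemented. As the technique is commonly described, several outputs are sampled for the same prompt and the most frequent final answer is kept. The sketch below illustrates only that voting step for a classification label; the function name `self_consistent_answer` and the lowercase/strip normalisation are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent answer among independently sampled outputs.

    Hypothetical helper: normalising by strip/lowercase is an assumption.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    # Normalise so that trivially different spellings of a label vote together.
    normalised = [answer.strip().lower() for answer in sampled_answers]
    # most_common(1) yields a single (answer, count) pair for the top answer.
    return Counter(normalised).most_common(1)[0][0]
```

Standard prompting corresponds to passing a single sampled answer, in which case the vote simply returns that answer.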
CL · Sep 8, 2024
Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers?
Neeladri Bhuiya, Viktor Schlegel, Stefan Winkler
State-of-the-art Large Language Models (LLMs) are credited with an increasing number of different capabilities, ranging from reading comprehension through advanced mathematical and reasoning skills to possessing scientific knowledge. In this paper we focus on their multi-hop reasoning capability: the ability to identify and integrate information from multiple textual sources. Given the concerns with the presence of simplifying cues in existing multi-hop reasoning benchmarks, which allow models to circumvent the reasoning requirement, we set out to investigate whether LLMs are prone to exploiting such simplifying cues. We find evidence that they indeed circumvent the requirement to perform multi-hop reasoning, but they do so in more subtle ways than what was reported about their fine-tuned pre-trained language model (PLM) predecessors. Motivated by this finding, we propose a challenging multi-hop reasoning benchmark, by generating seemingly plausible multi-hop reasoning chains, which ultimately lead to incorrect answers. We evaluate multiple open and proprietary state-of-the-art LLMs, and find that their multi-hop reasoning performance is affected, as indicated by up to 45% relative decrease in F1 score when presented with such seemingly plausible alternatives. We conduct a deeper analysis and find evidence that while LLMs tend to ignore misleading lexical cues, misleading reasoning paths indeed present a significant challenge.
CL · Aug 26, 2024
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues
Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel et al.
Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solutions. Employing conventional data augmentation for enhancing the noise robustness of summarization models is not feasible either due to the unavailability of sufficient medical dialogue audio recordings and corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a few available medical dialogue examples with audio recordings. Experimental results show that LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems. This approach addresses the challenges of noisy ASR outputs in critical applications, offering a robust solution to enhance the reliability of clinical dialogue summarization.
AS · Sep 9, 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Andy T. Liu, Yi-Cheng Lin, Haibin Wu et al.
Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget. We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size. Our goal is to make analytical steps toward understanding the training dynamics of speech foundation models. We benchmark SSL objectives in an entirely comparable setting and find that other factors contribute more significantly to the success of SSL. Our results show that slimmer model architectures outperform common small architectures under the same compute and parameter budget. We demonstrate that the size of the pre-training data remains crucial, even with data augmentation during SSL training, as performance suffers when iterating over limited data. Finally, we identify a trade-off between model size and data size, highlighting an optimal model size for a given compute budget.
LG · Oct 26, 2023
MaxEnt Loss: Constrained Maximum Entropy for Calibration under Out-of-Distribution Shift
Dexter Neo, Stefan Winkler, Tsuhan Chen
We present a new loss function that addresses the out-of-distribution (OOD) calibration problem. While many objective functions have been proposed to effectively calibrate models in-distribution, our findings show that they do not always fare well OOD. Based on the Principle of Maximum Entropy, we incorporate helpful statistical constraints observed during training, delivering better model calibration without sacrificing accuracy. We provide theoretical analysis and show empirically that our method works well in practice, achieving state-of-the-art calibration on both synthetic and real-world benchmarks.
ML · Jun 12, 2018 · Code
The Unusual Effectiveness of Averaging in GAN Training
Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler et al.
We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide the -- to our knowledge -- first theoretical arguments in support of EMA. We show that EMA converges to limit cycles around the equilibrium with vanishing amplitude as the discount parameter approaches one for simple bilinear games and also enhances the stability of general GAN training. We establish experimentally that both techniques are strikingly effective in the non-convex-concave GAN setting as well. Both improve inception and FID scores on different architectures and for different GAN objectives. We provide comprehensive experimental results across a range of datasets -- mixture of Gaussians, CIFAR-10, STL-10, CelebA and ImageNet -- to demonstrate its effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images. The code is available at https://github.com/yasinyazici/EMA_GAN.
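The two averaging schemes named in this abstract can be written as running updates over a flat list of parameter values. This is a minimal sketch, not the authors' implementation (which lives in the repository linked above): the function names, the plain-list representation of parameters, and the default discount `beta=0.999` are all assumptions chosen for illustration.

```python
def moving_average(avg, params, step):
    """Uniform time-average after seeing `step` snapshots (step starts at 1)."""
    # Incremental mean: avg_t = avg_{t-1} + (theta_t - avg_{t-1}) / t
    return [a + (p - a) / step for a, p in zip(avg, params)]


def exponential_moving_average(avg, params, beta=0.999):
    """Exponentially discounted average with discount parameter `beta`."""
    # avg_t = beta * avg_{t-1} + (1 - beta) * theta_t
    return [beta * a + (1.0 - beta) * p for a, p in zip(avg, params)]


def average_snapshots(snapshots, beta=0.999):
    """Run both schemes over a sequence of parameter snapshots."""
    ma = [0.0] * len(snapshots[0])
    ema = list(snapshots[0])  # assumption: EMA is initialised from the first snapshot
    for step, params in enumerate(snapshots, start=1):
        ma = moving_average(ma, params, step)
        if step > 1:
            ema = exponential_moving_average(ema, params, beta)
    return ma, ema
```

In a training loop the averaged copy would be updated after each generator step and used only for evaluation, so the averaging does not feed back into the optimisation itself.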
CL · Oct 17, 2024
Representation Learning of Structured Data for Medical Foundation Models
Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu et al.
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.
CL · Dec 21, 2023
Automated Clinical Coding for Outpatient Departments
Viktor Schlegel, Abhinav Ramesh Kashyap, Thanh-Tung Nguyen et al.
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.
CL · Jun 6, 2024
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Anand Subramanian, Viktor Schlegel, Abhinav Ramesh Kashyap et al.
There is active research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that allow LLMs to recall relevant knowledge and combine it with presented information in the clinical and biomedical domain: a fundamental pre-requisite for success on down-stream tasks. Addressing this gap, we use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains. Our multifaceted analysis of the performance of 15 LLMs, further broken down by sub-domain, source of knowledge and model architecture, uncovers success factors such as instruction tuning that lead to improved recall and comprehension. We further show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results, even generalising to unseen specialist sub-domains. We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented context. To foster research and collaboration in this field we share M-QALM, our resources, standardised methodology, and evaluation results, with the research community to facilitate further advancements in clinical knowledge representation learning within language models.
CL · May 27, 2023
A Two-Stage Decoder for Efficient ICD Coding
Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap et al.
Clinical notes in healthcare facilities are tagged with International Classification of Diseases (ICD) codes, a list of classification codes for medical diagnoses and procedures. ICD coding is a challenging multilabel text classification problem due to noisy clinical document inputs and long-tailed label distribution. Recent automated ICD coding efforts improve performance by encoding medical notes and codes with additional data and knowledge bases. However, most of them do not reflect how human coders generate the code: first, the coders select general code categories and then look for specific subcategories that are relevant to a patient's condition. Inspired by this, we propose a two-stage decoding mechanism to predict ICD codes. Our model uses the hierarchical properties of the codes to split the prediction into two steps: first, we predict the parent code, and then we predict the child code based on the previous prediction. Experiments on the public MIMIC-III data set show that our model performs well in single-model settings without external data or knowledge.
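The parent-then-child idea described in this abstract can be illustrated with a small filtering routine. This sketch covers only the hierarchy constraint and is not the paper's neural decoder: it assumes precomputed per-code scores, treats the part of a code before the dot as its parent category, and uses a fixed threshold of 0.5. The names `parent_of` and `two_stage_decode` are hypothetical.

```python
def parent_of(code):
    """Map a full code to its category, e.g. "250.00" -> "250" (assumed convention)."""
    return code.split(".")[0]


def two_stage_decode(parent_scores, child_scores, threshold=0.5):
    """Keep a child code only if it and its parent category both pass the threshold."""
    # Stage 1: select the general categories.
    parents = {p for p, score in parent_scores.items() if score >= threshold}
    # Stage 2: consider only children that belong to a selected category.
    return sorted(
        code
        for code, score in child_scores.items()
        if score >= threshold and parent_of(code) in parents
    )
```

With `parent_scores = {"250": 0.9, "401": 0.2}`, a confident child score for `"401.9"` is discarded because its category was not selected in the first stage.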
CL · May 22, 2023
A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond
Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Viktor Schlegel et al.
Sentence representations are a critical component in NLP applications such as retrieval, question answering, and text classification. They capture the meaning of a sentence, enabling machines to understand and reason over human language. In recent years, significant progress has been made in developing methods for learning sentence representations, including unsupervised, supervised, and transfer learning approaches. However, there has so far been no literature review on sentence representations. In this paper, we provide an overview of the different methods for sentence representation learning, focusing mostly on deep learning models. We provide a systematic organization of the literature, highlighting the key contributions and challenges in this area. Overall, our review highlights the importance of this area in natural language processing, the progress made in sentence representation learning, and the challenges that remain. We conclude with directions for future research, suggesting potential avenues for improving the quality and efficiency of sentence representations.
CV · Jan 13, 2022
Trusted Media Challenge Dataset and User Study
Weiling Chen, Sheng Lun Benjamin Chua, Stefan Winkler et al.
The development of powerful deep learning technologies has brought about some negative effects to both society and individuals. One such issue is the emergence of fake media. To tackle the issue, we have organized the Trusted Media Challenge (TMC) to explore how Artificial Intelligence (AI) technologies could be leveraged to combat fake media. To enable further research, we are releasing the dataset that we had prepared from the TMC challenge, consisting of 4,380 fake and 2,563 real videos, with various video and/or audio manipulation methods employed to produce different types of fake media. All the videos in the TMC dataset are accompanied by audio and have a minimum resolution of 360p. The videos have various durations, backgrounds, and illumination, and may contain perturbations that mimic transmission errors and compression. We have also carried out a user study to demonstrate the quality of the TMC dataset and to compare the performance of humans and AI models. The results showed that the TMC dataset can fool human participants in many cases, and the winning AI models of the Trusted Media Challenge outperformed humans. The TMC dataset is available for research purposes upon request via tmc-dataset@aisingapore.org.
CV · Oct 19, 2021
Detecting Blurred Ground-based Sky/Cloud Images
Mayank Jain, Navya Jain, Yee Hui Lee et al.
Ground-based whole sky imagers (WSIs) are being used by researchers in various fields to study atmospheric events. These ground-based sky cameras capture visible-light images of the sky at regular intervals of time. Owing to the atmospheric interference and camera sensor noise, the captured images often exhibit noise and blur. This may pose a problem in subsequent image processing stages. Therefore, it is important to accurately identify the blurred images. This is a difficult task, as clouds have varying shapes, textures, and soft edges whereas the sky acts as a homogeneous and uniform background. In this paper, we propose an efficient framework that can identify the blurred sky/cloud images. Using a static external marker, our proposed methodology has a detection accuracy of 94%. To the best of our knowledge, our approach is the first of its kind in the automatic identification of blurred images for ground-based sky/cloud images.
CV · Jun 15, 2021
Efficient Facial Expression Analysis For Dimensional Affect Recognition Using Geometric Features
Vassilios Vonikakis, Stefan Winkler
Despite their continued popularity, categorical approaches to affect recognition have limitations, especially in real-life situations. Dimensional models of affect offer important advantages for the recognition of subtle expressions and more fine-grained analysis. We introduce a simple but effective facial expression analysis (FEA) system for dimensional affect, solely based on geometric features and Partial Least Squares (PLS) regression. The system jointly learns to estimate Arousal and Valence ratings from a set of facial images. The proposed approach is robust, efficient, and exhibits comparable performance to contemporary deep learning models, while requiring a fraction of the computational resources.
CV · Mar 4, 2021
Morphset: Augmenting categorical emotion datasets with dimensional affect labels using face morphing
Vassilios Vonikakis, Dexter Neo, Stefan Winkler
Emotion recognition and understanding is a vital component in human-machine interaction. Dimensional models of affect such as those using valence and arousal have advantages over traditional categorical ones due to the complexity of emotional states in humans. However, dimensional emotion annotations are difficult and expensive to collect; therefore, they are not as prevalent in the affective computing community. To address these issues, we propose a method to generate synthetic images from existing categorical emotion datasets using face morphing as well as dimensional labels in the circumplex space with full control over the resulting sample distribution, while achieving augmentation factors of at least 20x.
LG · Jun 25, 2020
Empirical Analysis of Overfitting and Mode Drop in GAN Training
Yasin Yazici, Chuan-Sheng Foo, Stefan Winkler et al.
We examine two key questions in GAN training, namely overfitting and mode drop, from an empirical perspective. We show that when stochasticity is removed from the training procedure, GANs can overfit and exhibit almost no mode drop. Our results shed light on important characteristics of the GAN training procedure. They also provide evidence against prevailing intuitions that GANs do not memorize the training set, and that mode dropping is mainly due to properties of the GAN objective rather than how it is optimized during training.
IV · Dec 16, 2019
Subjective Quality Assessment of Ground-based Camera Images
Lucie Lévêque, Soumyabrata Dev, Murhaf Hossari et al.
Image quality assessment is critical to control and maintain the perceived quality of visual content. Both subjective and objective evaluations can be utilised; however, subjective image quality assessment is currently considered the most reliable approach. Databases containing distorted images and mean opinion scores are needed in the field of atmospheric research with a view to improving the current state-of-the-art methodologies. In this paper, we focus on using ground-based sky camera images to understand atmospheric events. We present a new image quality assessment dataset containing original and distorted nighttime images of sky/cloud from the SWINSEG database. Subjective quality assessment was carried out in controlled conditions, as recommended by the ITU. Statistical analyses of the subjective scores showed the impact of noise type and distortion level on the perceived quality.
IV · Oct 11, 2019
Estimating Solar Irradiance Using Sky Imagers
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Ground-based whole sky cameras are extensively used for localized monitoring of clouds nowadays. They capture hemispherical images of the sky at regular intervals using a fisheye lens. In this paper, we propose a framework for estimating solar irradiance from pictures taken by those imagers. Unlike pyranometers, such sky images contain information about cloud coverage and can be used to derive cloud movement. An accurate estimation of solar irradiance using solely those images is thus a first step towards short-term forecasting of solar energy generation based on cloud movement. We derive and validate our model using pyranometers co-located with our whole sky imagers. We achieve a better performance in estimating solar irradiance and in particular its short-term variations as compared to other related methods using ground-based observations.
AO-PH · Apr 16, 2019
CloudSegNet: A Deep Network for Nychthemeron Cloud Image Segmentation
Soumyabrata Dev, Atul Nautiyal, Yee Hui Lee et al.
We analyze clouds in the earth's atmosphere using ground-based sky cameras. An accurate segmentation of clouds in the captured sky/cloud image is difficult, owing to the fuzzy boundaries of clouds. Several techniques have been proposed that use color as the discriminatory feature for cloud detection. In the existing literature, however, analysis of daytime and nighttime images is considered separately, mainly because of differences in image characteristics and applications. In this paper, we propose a light-weight deep-learning architecture called CloudSegNet. It is the first that integrates daytime and nighttime (also known as nychthemeron) image segmentation in a single framework, and achieves state-of-the-art results on public databases.
HC · Apr 3, 2019
Recognition of Advertisement Emotions with Application to Computational Advertising
Abhinav Shukla, Shruti Shriya Gullapuram, Harish Katti et al.
Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience. However, most computational affect recognition (AR) approaches examine ads via the text modality, and only limited work has been devoted to decoding ad emotions from audiovisual or user cues. This work (1) compiles an affective ad dataset capable of evoking coherent emotions across users; (2) explores the efficacy of content-centric convolutional neural network (CNN) features for AR vis-à-vis handcrafted audio-visual descriptors; (3) examines user-centric ad AR from Electroencephalogram (EEG) responses acquired during ad-viewing, and (4) demonstrates how better affect predictions facilitate effective computational advertising as determined by a study involving 18 users. Experiments reveal that (a) CNN features outperform audiovisual descriptors for content-centric AR; (b) EEG features are able to encode ad-induced emotions better than content-based features; (c) Multi-task learning performs best among a slew of classification algorithms to achieve optimal AR, and (d) Pursuant to (b), EEG features also enable optimized ad insertion onto streamed video, as compared to content-based or manual insertion techniques in terms of ad memorability and overall user experience.
CV · Mar 15, 2019
Multi-label Cloud Segmentation Using a Deep Network
Soumyabrata Dev, Shilpa Manandhar, Yee Hui Lee et al.
Different empirical models have been developed for cloud detection. There is a growing interest in using the ground-based sky/cloud images for this purpose. Several methods exist that perform binary segmentation of clouds. In this paper, we propose to use a deep learning architecture (U-Net) to perform multi-label sky/cloud image segmentation. The proposed approach outperforms recent literature by a large margin.
LG · Feb 9, 2019
Venn GAN: Discovering Commonalities and Particularities of Multiple Distributions
Yasin Yazıcı, Bruno Lecouat, Chuan-Sheng Foo et al.
We propose a GAN design which models multiple distributions effectively and discovers their commonalities and particularities. Each data distribution is modeled with a mixture of $K$ generator distributions. As the generators are partially shared between the modeling of different true data distributions, the shared ones capture the commonality of the distributions, while the non-shared ones capture unique aspects of them. We show the effectiveness of our method on various datasets (MNIST, Fashion MNIST, CIFAR-10, Omniglot, CelebA) with compelling results.
CV · Nov 21, 2018
PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship
Le Zhang, Songyou Peng, Stefan Winkler
Apparent personality and emotion analysis are both central to affective computing. Existing works solve them individually. In this paper we investigate if such high-level affect traits and their relationship can be jointly learned from face images in the wild. To this end, we introduce PersEmoN, an end-to-end trainable and deep Siamese-like network. It consists of two convolutional network branches, one for emotion and the other for apparent personality. Both networks share their bottom feature extraction module and are optimized within a multi-task learning framework. Emotion and personality networks are dedicated to their own annotated dataset. Furthermore, an adversarial-like loss function is employed to promote representation coherence among heterogeneous dataset sources. Based on this, we also explore the emotion-to-apparent-personality relationship. Extensive experiments demonstrate the effectiveness of PersEmoN.
HC · Sep 12, 2018
Investigating the generalizability of EEG-based Cognitive Load Estimation Across Visualizations
Viral Parekh, Maneesh Bilalpur, Sharavan Kumar et al.
We examine whether EEG-based cognitive load (CL) estimation generalizes across character, spatial pattern, bar graph, and pie chart-based visualizations for the n-back task. CL is estimated via two recent approaches: (a) a deep convolutional neural network, and (b) proximal support vector machines. Experiments reveal that CL estimation suffers across visualizations, motivating the need for effective machine learning techniques to benchmark visual interface usability for a given analytic task.
HC · Aug 18, 2018
EEG-based Evaluation of Cognitive Workload Induced by Acoustic Parameters for Data Sonification
Maneesh Bilalpur, Mohan Kankanhalli, Stefan Winkler et al.
Data visualization has been receiving growing attention recently, with ubiquitous smart devices designed to render information in a variety of ways. However, while evaluations of visual tools for their interpretability and intuitiveness have been commonplace, not much research has been devoted to other forms of data rendering, e.g., sonification. This work is the first to automatically estimate the cognitive load induced by different acoustic parameters considered for sonification in prior studies. We examine cognitive load via (a) perceptual data-sound mapping accuracies of users for the different acoustic parameters, (b) cognitive workload impressions explicitly reported by users, and (c) their implicit EEG responses compiled during the mapping task. Our main findings are that (i) low cognitive load-inducing (i.e., more intuitive) acoustic parameters correspond to higher mapping accuracies, (ii) EEG spectral power analysis reveals higher $α$ band power for low cognitive load parameters, implying a congruent relationship between explicit and implicit user responses, and (iii) cognitive load classification with EEG features achieves a peak F1-score of 0.64, confirming that reliable workload estimation is achievable with user EEG data compiled using wearable sensors.
CV · May 2, 2018
A Deep Network for Arousal-Valence Emotion Prediction with Acoustic-Visual Cues
Songyou Peng, Le Zhang, Yutong Ban et al.
In this paper, we comprehensively describe the methodology of our submissions to the One-Minute Gradual-Emotion Behavior Challenge 2018.
CV · Mar 2, 2018
High-Dynamic-Range Imaging for Cloud Segmentation
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Sky/cloud images obtained from ground-based sky-cameras are usually captured using a fish-eye lens with a wide field of view. However, the sky exhibits a large dynamic range in terms of luminance, more than a conventional camera can capture. It is thus difficult to capture the details of an entire scene with a regular camera in a single shot. In most cases, the circumsolar region is over-exposed, and the regions near the horizon are under-exposed. This renders cloud segmentation for such images difficult. In this paper, we propose HDRCloudSeg -- an effective method for cloud segmentation using High-Dynamic-Range (HDR) imaging based on multi-exposure fusion. We describe the HDR image generation process and release a new database to the community for benchmarking. Our proposed approach is the first using HDR radiance maps for cloud segmentation and achieves very good results.
AO-PH · Aug 24, 2017
Study of Clear Sky Models for Singapore
Soumyabrata Dev, Shilpa Manandhar, Yee Hui Lee et al.
The estimation of total solar irradiance falling on the earth's surface is important in the field of solar energy generation and forecasting. Several clear-sky solar radiation models have been developed over the last few decades. Most of these models are based on empirical distributions of various geographical parameters, while a few consider various atmospheric effects in the solar energy estimation. In this paper, we perform a comparative analysis of several popular clear-sky models in the tropical region of Singapore. This is important in countries like Singapore, where the primary focus is on reliable and efficient solar energy generation. We analyze and compare three popular clear-sky models that are widely used in the literature. We validate our solar estimation results using actual solar irradiance measurements obtained from collocated weather stations. Finally, we identify the most reliable clear-sky model for Singapore, based on all clear-sky days in a year.
CV · Jul 18, 2017
Beyond Forward Shortcuts: Fully Convolutional Master-Slave Networks (MSNets) with Backward Skip Connections for Semantic Segmentation
Abrar H. Abdulnabi, Stefan Winkler, Gang Wang
Recent deep CNNs contain forward shortcut connections, i.e., skip connections from low to high layers. Reusing features from lower layers, which have higher resolution (location information), helps higher layers recover lost details and mitigates information degradation. However, during inference the lower layers do not know about high-layer features, although these carry high-level contextual semantics that could help low layers adaptively extract informative features for later layers. In this paper, we study the influence of backward skip connections, which run in the opposite direction to forward shortcuts, i.e., paths from high layers to low layers. To achieve this -- which indeed runs counter to the nature of feed-forward networks -- we propose a new fully convolutional model that consists of a pair of networks. A 'Slave' network is dedicated to providing the backward connections from its top layers to the 'Master' network's bottom layers. The Master network is used to produce the final label predictions. In our experiments we validate the proposed FCN model on the ADE20K (ImageNet scene parsing), PASCAL-Context, and PASCAL VOC 2011 datasets.
CV · May 30, 2017
Nighttime sky/cloud image segmentation
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Imaging the atmosphere using ground-based sky cameras is a popular approach to study various atmospheric phenomena. However, it usually focuses on the daytime. Nighttime sky/cloud images are darker and noisier, and thus harder to analyze. An accurate segmentation of sky/cloud images is already challenging because of the clouds' non-rigid structure and size, and the lower and less stable illumination of the night sky increases the difficulty. Nonetheless, nighttime cloud imaging is essential in certain applications, such as continuous weather analysis and satellite communication. In this paper, we propose a superpixel-based method to segment nighttime sky/cloud images. We also release the first nighttime sky/cloud image segmentation database to the research community. The experimental results show the efficacy of our proposed algorithm for nighttime images.
CV · Apr 19, 2017
Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Ground-based whole sky imagers are popular for monitoring cloud formations, which is necessary for various applications. We present two new Wide Angle High-Resolution Sky Imaging System (WAHRSIS) models, which were designed especially to withstand the hot and humid climate of Singapore. The first uses a fully sealed casing, whose interior temperature is regulated using a Peltier cooler. The second features a double roof design with ventilation grids on the sides, allowing the outside air to flow through the device. Measurements of temperature inside these two devices show their ability to operate in Singapore weather conditions. Unlike our original WAHRSIS model, neither uses a mechanical sun blocker to prevent the direct sunlight from reaching the camera; instead they rely on high-dynamic-range imaging (HDRI) techniques to reduce the glare from the sun.
AO-PH · Mar 15, 2017
Cloud Radiative Effect Study Using Sky Camera
Soumyabrata Dev, Shilpa Manandhar, Feng Yuan et al.
The analysis of clouds in the earth's atmosphere is important for a variety of applications, viz. weather reporting, climate forecasting, and solar energy generation. In this paper, we focus our attention on the impact of clouds on the total solar irradiance reaching the earth's surface. We use a weather station to record the total solar irradiance. Moreover, we employ a collocated ground-based sky camera to automatically compute the instantaneous cloud coverage. We analyze the relationship between the measured solar irradiance and the computed cloud coverage value, and conclude that higher cloud coverage greatly impacts the total solar irradiance. Such studies will immensely help in solar energy generation and forecasting.
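As an illustration of the kind of analysis this abstract describes, the following minimal sketch computes a cloud coverage fraction from a binary cloud mask and correlates coverage with irradiance readings. It is not the authors' pipeline: the function names, the mask convention (1 = cloud), and the sample values are hypothetical, and Pearson correlation is only one assumed way to quantify the relationship.

```python
import numpy as np

def cloud_coverage(mask):
    """Fraction of pixels labelled cloud in a binary mask (assumed: 1 = cloud, 0 = sky)."""
    mask = np.asarray(mask, dtype=bool)
    return float(mask.mean())

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data: coverage per image and the matching irradiance in W/m^2.
coverage_series = [0.10, 0.35, 0.60, 0.85]
irradiance_series = [880.0, 640.0, 410.0, 190.0]
correlation = pearson_correlation(coverage_series, irradiance_series)
```

A strongly negative `correlation` on real data would be consistent with irradiance dropping as coverage rises, but the values above are made up purely to show the calls.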
CV · Jan 17, 2017
Systematic study of color spaces and components for the segmentation of sky/cloud images
Soumyabrata Dev, Yee Hui Lee, Stefan Winkler
Sky/cloud imaging using ground-based Whole Sky Imagers (WSI) is a cost-effective means of understanding cloud cover and weather patterns. The accurate segmentation of clouds in these images is a challenging task, as clouds do not possess any clear structure. Several algorithms using different color models have been proposed in the literature. This paper presents a systematic approach for the selection of color spaces and components for optimal segmentation of sky/cloud images. Using mainly principal component analysis (PCA) and fuzzy clustering for evaluation, we identify the most suitable color components for this task.
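To make the PCA step mentioned above concrete, here is a minimal sketch of PCA over per-pixel color components, showing how much variance each principal component explains and how strongly each input channel loads on it. The function name and the toy pixel values are hypothetical, and the abstract does not say how the authors configure PCA, so this is a generic textbook version rather than their procedure.

```python
import numpy as np

def pca_components(pixels):
    """PCA on an (n_pixels, n_channels) array.

    Returns (explained_variance_ratio, loadings), where each row of
    `loadings` is one principal axis expressed in the input channels.
    """
    data = np.asarray(pixels, dtype=float)
    centered = data - data.mean(axis=0)          # remove the per-channel mean
    _, singular_values, loadings = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    return variance / variance.sum(), loadings

# Hypothetical pixels with three color components per pixel.
toy_pixels = [[0.2, 0.4, 0.9], [0.3, 0.4, 0.8], [0.7, 0.6, 0.7], [0.8, 0.7, 0.6]]
ratio, loadings = pca_components(toy_pixels)
```

Channels with large absolute loadings on the leading components carry most of the variation in the data, which is one common heuristic for ranking candidate color components.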
CV · Nov 3, 2016
Rough Set Based Color Channel Selection
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Color channel selection is essential for accurate segmentation of sky and clouds in images obtained from ground-based sky cameras. Most prior works in cloud segmentation use threshold-based methods on color channels selected in an ad-hoc manner. In this letter, we propose the use of rough sets for color channel selection in visible-light images. Our proposed approach assesses color channels with respect to their contribution to segmentation, and identifies the most effective ones.
CV · Oct 21, 2016
Detecting Rainfall Onset Using Sky Images
Soumyabrata Dev, Shilpa Manandhar, Yee Hui Lee et al.
Ground-based sky cameras (popularly known as Whole Sky Imagers) are increasingly used nowadays for continuous monitoring of the atmosphere. These imagers have higher temporal and spatial resolutions than conventional satellite images. In this paper, we use ground-based sky cameras to detect the onset of rainfall. Their images contain additional information about cloud coverage and movement and are therefore useful for accurate rainfall nowcasting. We validate our results using rain gauge measurements and achieve an accuracy of 89% for correct detection of rainfall onset.
CV · Oct 21, 2016
Short-term prediction of localized cloud motion using ground-based sky imagers
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Fine-scale short-term cloud motion prediction is needed for several applications, including solar energy generation and satellite communications. In tropical regions such as Singapore, clouds are mostly formed by convection; they are very localized, and evolve quickly. We capture hemispherical images of the sky at regular time intervals using ground-based cameras. These provide high-resolution, localized cloud images. We use two successive frames to compute optical flow and predict the future location of clouds. We achieve good prediction accuracy for a lead time of up to 5 minutes.
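The idea of estimating motion from two successive frames and extrapolating it forward can be sketched as follows. This is a deliberately simplified stand-in, not the optical flow method the abstract refers to: it estimates a single global translation by phase correlation instead of a dense flow field, and it assumes periodic image boundaries. All names are hypothetical.

```python
import numpy as np

def estimate_shift(prev_frame, next_frame):
    """Estimate the integer (dy, dx) translation from prev_frame to next_frame
    using phase correlation (one global motion vector, not dense optical flow)."""
    f_prev = np.fft.fft2(prev_frame)
    f_next = np.fft.fft2(next_frame)
    cross_power = f_next * np.conj(f_prev)
    cross_power /= np.abs(cross_power) + 1e-12   # keep only the phase
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    signed = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return int(signed[0]), int(signed[1])

def predict_frame(latest_frame, shift, steps=1):
    """Extrapolate by applying the same shift `steps` more times.
    Assumption: constant motion and periodic wrap-around at the borders."""
    dy, dx = shift
    return np.roll(latest_frame, (dy * steps, dx * steps), axis=(0, 1))

# Synthetic example: a random texture moved by a known offset.
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, (3, -2), axis=(0, 1))
motion = estimate_shift(frame_a, frame_b)
frame_c_predicted = predict_frame(frame_b, motion, steps=1)
```

Real cloud fields deform and move non-uniformly, which is why a per-pixel flow field, as used in the paper, is better suited than a single global vector.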
CV · Jun 12, 2016
Color-based Segmentation of Sky/Cloud Images From Ground-based Cameras
Soumyabrata Dev, Yee Hui Lee, Stefan Winkler
Sky/cloud images captured by ground-based cameras (a.k.a. whole sky imagers) are increasingly used nowadays because of their applications in a number of fields, including climate modeling, weather prediction, renewable energy generation, and satellite communications. Due to the wide variety of cloud types and lighting conditions in such images, accurate and robust segmentation of clouds is challenging. In this paper, we present a supervised segmentation framework for ground-based sky/cloud images based on a systematic analysis of different color spaces and components, using partial least squares (PLS) regression. Unlike other state-of-the-art methods, our proposed approach is entirely learning-based and does not require any manually-defined parameters. In addition, we release the Singapore Whole Sky IMaging SEGmentation Database (SWIMSEG), a large database of annotated sky/cloud images, to the research community.
CV · Jun 9, 2016
Machine Learning Techniques and Applications For Ground-based Image Analysis
Soumyabrata Dev, Bihan Wen, Yee Hui Lee et al.
Ground-based whole sky cameras have opened up new opportunities for monitoring the earth's atmosphere. These cameras are an important complement to satellite images by providing geoscientists with cheaper, faster, and more localized data. The images captured by whole sky imagers can have high spatial and temporal resolution, which is an important pre-requisite for applications such as solar energy modeling, cloud attenuation analysis, local weather prediction, etc. Extracting valuable information from the huge amount of image data by detecting and analyzing the various entities in these images is challenging. However, powerful machine learning techniques have become available to aid with the image analysis. This article provides a detailed walk-through of recent developments in these techniques and their applications in ground-based imaging. We aim to bridge the gap between computer vision and remote sensing with the help of illustrative examples. We demonstrate the advantages of using machine learning techniques in ground-based image analysis via three primary applications -- segmentation, classification, and denoising.
IM · Jun 8, 2016
Estimation of solar irradiance using ground-based whole sky imagers
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Ground-based whole sky imagers (WSIs) can provide localized images of the sky of high temporal and spatial resolution, which permits fine-grained cloud observation. In this paper, we show how images taken by WSIs can be used to estimate solar radiation. Sky cameras are useful here because they provide additional information about cloud movement and coverage, which are otherwise not available from weather station data. Our setup includes ground-based weather stations at the same location as the imagers. We use their measurements to validate our methods.
IM · May 21, 2016
WAHRSIS: A Low-cost, High-resolution Whole Sky Imager With Near-Infrared Capabilities
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee et al.
Cloud imaging using ground-based whole sky imagers is essential for a fine-grained understanding of the effects of cloud formations, which can be useful in many applications. Some such imagers are available commercially, but their cost is relatively high, and their flexibility is limited. Therefore, we built a new daytime Whole Sky Imager (WSI) called Wide Angle High-Resolution Sky Imaging System. The strengths of our new design are its simplicity, low manufacturing cost and high resolution. Our imager captures the entire hemisphere in a single high-resolution picture via a digital camera using a fish-eye lens. The camera was modified to capture light across the visible as well as the near-infrared spectral ranges. This paper describes the design of the device as well as the geometric and radiometric calibration of the imaging system.
HC · Apr 6, 2016
PET: An Eye-tracking Dataset for Animal-centric PASCAL Object Classes
Syed Omer Gilani, Ramanathan Subramanian, Yan Yan et al.
We present the Pascal animal classes Eye Tracking (PET) database. Our database comprises eye movement recordings compiled from forty users for the bird, cat, cow, dog, horse and sheep trainval sets from the VOC 2012 image set. In contrast to recent eye-tracking databases such as \cite{kiwon_cvpr13_gaze,PapadopoulosCKF14}, PET contains eye movements recorded under both the free-viewing and visual search task conditions. While some differences in overall gaze behavior and scanning patterns are observed between the two conditions, a very similar number of fixations is observed on target objects in both. As a utility application, we show how feature pooling around fixated locations enables enhanced (animal) object classification accuracy.