CL Jun 8, 2023
T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification
Inigo Jauregi Unanue, Gholamreza Haffari, Massimo Piccardi
Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero-/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, where the neural machine translator generates "soft" translations to permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc and MultiEURLEX), with the results showing that the proposed approach has significantly improved performance over a competitive baseline.
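The abstract does not spell out how the "soft" translations are formed. One common way to keep a translate-then-classify pipeline differentiable, sketched below purely as an assumption, is to feed the classifier the probability-weighted average of its input embeddings instead of a single argmax token. The function names and the toy embedding table are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(logits, embedding_table):
    """Probability-weighted average of the classifier's input embeddings.

    Unlike picking the argmax token, this expectation is a smooth function
    of the translator's logits, so gradients can flow back through it.
    """
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


# Toy vocabulary of 3 tokens with 2-dimensional embeddings (made-up values).
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A sharply peaked distribution yields almost exactly the embedding of token 1.
peaked = soft_embedding([0.0, 20.0, 0.0], EMBEDDINGS)
# A uniform distribution yields the mean of all embeddings.
uniform = soft_embedding([0.0, 0.0, 0.0], EMBEDDINGS)
```

In a real system the same weighted sum would be computed by an autodiff framework at every decoding position; the pure-Python version only illustrates the arithmetic.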
CL Mar 6, 2022
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi
Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models. However, a standing limitation of these models is that they are trained against limited references and with plain maximum-likelihood objectives. As for many other generative tasks, reinforcement learning (RL) offers the potential to improve the training of MDS models; yet, it requires a carefully-designed reward that can ensure appropriate leverage of both the reference summaries and the input documents. For this reason, in this paper we propose fine-tuning an MDS baseline with a reward that balances a reference-based metric such as ROUGE with coverage of the input documents. To implement the approach, we utilize RELAX (Grathwohl et al., 2018), a contemporary gradient estimator which is both low-variance and unbiased, and we fine-tune the baseline in a few-shot style for both stability and computational efficiency. Experimental results over the Multi-News and WCEP MDS datasets show significant improvements of up to +0.95 pp average ROUGE score and +3.17 pp METEOR score over the baseline, and competitive results with the literature. In addition, they show that the coverage of the input documents is increased, and evenly across all documents.
LG Sep 19, 2022
Traffic incident duration prediction via a deep learning framework for text description encoding
Artur Grigorev, Adriana-Simona Mihaita, Khaled Saleh et al.
Predicting the traffic incident duration is a hard problem to solve due to the stochastic nature of incident occurrence in space and time, a lack of information at the beginning of a reported traffic disruption, and lack of advanced methods in transport engineering to derive insights from past accidents. This paper proposes a new fusion framework for predicting the incident duration from limited information by using an integration of machine learning with traffic flow/speed and incident description as features, encoded via several Deep Learning methods (ANN autoencoder and character-level LSTM-ANN sentiment classifier). The paper constructs a cross-disciplinary modelling approach in transport and data science. The approach improves the incident duration prediction accuracy over the top-performing ML models applied to baseline incident reports. Results show that our proposed method can improve the accuracy by 60% when compared to standard linear or support vector regression models, and a further 7% improvement with respect to the hybrid deep learning auto-encoded GBDT model which seems to outperform all other models. The application area is the city of San Francisco, rich in both traffic incident logs (Countrywide Traffic Accident Data set) and past historical traffic congestion information (5-minute precision measurements from Caltrans Performance Measurement System).
CL May 24
DTO: a Differentiable Training Objective for Effective Counterfactual Story Rewriting
Amelia Girard, Massimo Piccardi
Counterfactual story rewriting is a natural language processing task that requires updating an existing story to reflect a chosen alternative event, yet preserving all the unaffected storyline elements and overall coherence. While large language models have recently made remarkable progress on this task, it still remains challenging since the required modifications are typically very small in size and highly localized. As a consequence, models trained in a conventional manner with the maximum-likelihood training objective tend to overlook these nuances. At the same time, more sophisticated training approaches based on reinforcement learning are notoriously slow and difficult to set up. For these reasons, our paper proposes a novel, differentiable training objective (DTO) that directly optimizes for the requisite counterfactual improvements. In our approach, a transformer model is fine-tuned via end-to-end backpropagation against a fully differentiable loss function that jointly rewards (i) fidelity to the reference rewrite and (ii) semantic consistency with the source narrative. The empirical evaluation on the TimeTravel and ART datasets shows that the proposed DTO approach has been able to surpass a maximum-likelihood baseline and a preference-based approach, and perform competitively against two contemporary large language models in all evaluation metrics. These findings substantiate the effectiveness of task-specific differentiable objectives for nuanced, controlled text-generation tasks.
NI May 18
Enhancing Network Resilience via Graph-Based Anomaly Detection in Sovereign Functions
Xin Hao, Wei Ni, Chenhan Zhang et al.
Sovereign network functions, e.g., routing protocols, are becoming increasingly complex and susceptible to failures arising from protocol configuration anomalies and anomalous configurations. This paper interprets the protocol configuration anomaly detection problem as detection of structural inconsistencies of connected nodes and edges in a bipartite graph that captures both physical network entities and logical protocol states. A graph structural inconsistency detector (GSID) model is proposed to solve the problem efficiently. To handle the heterogeneous nature of protocol configuration parameters, GSID employs an adaptive configuration encoder (ACE) that dynamically selects encoding strategies per parameter to preserve fine-grained numerical discrepancies. To expose the subtle inconsistencies of connected nodes and edges in the bipartite graph, GSID uses an inconsistency dynamic attention (IDA) mechanism that scores edges by drawing asymmetric attentions from both ends, rule compliance from one end and route connectivity from the other. It is demonstrated experimentally that GSID outperforms state-of-the-art baselines by threefold in F1 score and by 23.2% in accuracy. Ablation studies validate the effectiveness of both the ACE and IDA modules. Tests on unseen network scales and real-world network topologies show the superior adaptability of our GSID, compared to the baselines.
NI May 19
Sample-Efficient Misconfiguration Classification for Network Resilience in Wireless Communications
Xin Hao, Chenhan Zhang, Massimo Piccardi et al.
As modern wireless communication networks grow increasingly complex, network outages driven by the inconsistency between dynamic topologies and protocol configurations have become a critical concern. To solve this issue, we mathematically formulate a protocol misconfiguration classification problem as a graph-based learning task and solve it with our proposed EtaGATv2 algorithm, an edge-type-aware graph attention network with dynamic attention. EtaGATv2 addresses two critical challenges: i) it captures non-uniform symptom propagation for protocol misconfiguration classification tasks, where certain network paths and nodes become critical for diagnosis, and ii) it extracts protocol-specific features from heterogeneous routing protocols with distinct message-passing behaviors by utilizing edge-type-aware transformations. Experiments across diverse and real-world topologies demonstrate that EtaGATv2 reaches state-of-the-art performance with 50% of the training samples, making it particularly suitable for networks with dynamic topologies and limited negative-labeled data.
CL Feb 13
ViMedCSS: A Vietnamese Medical Code-Switching Speech Dataset & Benchmark
Tung X. Nguyen, Nhu Vo, Giang-Son Nguyen et al.
Code-switching (CS), in which Vietnamese speech incorporates English words such as drug names or procedures, is a common phenomenon in Vietnamese medical communication. This creates challenges for Automatic Speech Recognition (ASR) systems, especially in low-resource languages like Vietnamese. Most current ASR systems struggle to correctly recognize English medical terms within Vietnamese sentences, and no benchmark addresses this challenge. In this paper, we construct a 34-hour Vietnamese Medical Code-Switching Speech dataset (ViMedCSS) containing 16,576 utterances. Each utterance includes at least one English medical term drawn from a curated bilingual lexicon covering five medical topics. Using this dataset, we evaluate several state-of-the-art ASR models and examine several fine-tuning strategies for improving medical term recognition, in order to identify the most effective approach on this dataset. Experimental results show that Vietnamese-optimized models perform better on general segments, while multilingual pretraining helps capture English insertions. The combination of both approaches yields the best balance between overall and code-switched accuracy. This work provides the first benchmark for Vietnamese medical code-switching and offers insights into effective domain adaptation for low-resource, multilingual ASR systems.
CL Apr 26
Pref-CTRL: Preference Driven LLM Alignment using Representation Editing
Imranul Ashrafi, Inigo Jauregi Unanue, Massimo Piccardi
Test-time alignment methods offer a promising alternative to fine-tuning by steering the outputs of large language models (LLMs) at inference time with lightweight interventions on their internal representations. Recently, a prominent and effective approach, RE-Control (Kong et al., 2024), has proposed leveraging an external value function trained over the LLM's hidden states to guide generation via gradient-based editing. While effective, this method overlooks a key characteristic of alignment tasks, i.e. that they are typically formulated as learning from human preferences between candidate responses. To address this, in this paper we propose a novel preference-based training framework, Pref-CTRL, that uses a multi-objective value function to better reflect the structure of preference data. Our approach has outperformed RE-Control on two benchmark datasets and showed greater generalization on out-of-domain datasets. Our source code is available at https://github.com/UTS-nlPUG/pref-ctrl.
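As a rough illustration of the gradient-based editing idea described above, the sketch below nudges a hidden-state vector in the direction that increases a value function. Everything here is an assumption made for illustration: the value function is a toy logistic scorer with hand-picked weights, and the step size and number of steps are arbitrary. It shows neither the RE-Control nor the Pref-CTRL implementation.

```python
import math


def value(h, w):
    """Toy value function: logistic score of a linear projection of h."""
    z = sum(wi * hi for wi, hi in zip(w, h))
    return 1.0 / (1.0 + math.exp(-z))


def value_grad(h, w):
    """Analytic gradient of the toy value function with respect to h."""
    v = value(h, w)
    return [v * (1.0 - v) * wi for wi in w]


def edit_hidden_state(h, w, step_size=0.5, n_steps=5):
    """Gradient ascent on the value function; the model weights are untouched."""
    h = list(h)
    for _ in range(n_steps):
        g = value_grad(h, w)
        h = [hi + step_size * gi for hi, gi in zip(h, g)]
    return h


# Made-up value-function weights and hidden state.
W = [0.8, -0.5, 0.3]
h0 = [0.1, 0.4, -0.2]
h_edited = edit_hidden_state(h0, W)
```

The edited state scores higher under the toy value function than the original, which is the only property this sketch is meant to show.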
CL Mar 28, 2024
Improving Vietnamese-English Medical Machine Translation
Nhu Vo, Dat Quoc Nguyen, Dung D. Le et al.
Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.
CL Mar 20, 2024
SumTra: A Differentiable Pipeline for Few-Shot Cross-Lingual Summarization
Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi
Cross-lingual summarization (XLS) generates summaries in a language different from that of the input documents (e.g., English to Spanish), allowing speakers of the target language to gain a concise view of their content. In the present day, the predominant approach to this task is to take a well-performing pretrained multilingual language model (LM) and fine-tune it for XLS on the language pairs of interest. However, the scarcity of fine-tuning samples makes this approach challenging in some cases. For this reason, in this paper we propose revisiting the summarize-and-translate pipeline, where the summarization and translation tasks are performed in a sequence. This approach allows reusing the many publicly available resources for monolingual summarization and translation, obtaining a very competitive zero-shot performance. In addition, the proposed pipeline is completely differentiable end-to-end, allowing it to take advantage of few-shot fine-tuning, where available. Experiments over two contemporary and widely adopted XLS datasets (CrossSum and WikiLingua) have shown the remarkable zero-shot performance of the proposed approach, and also its strong few-shot performance compared to an equivalent multilingual LM baseline, which the proposed approach has been able to outperform in many languages with only 10% of the fine-tuning samples.
CL May 20, 2024
A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers
Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba et al.
Text classifiers are vulnerable to adversarial examples -- correctly-classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial examples is to define and solve a combinatorial optimisation problem over a space of allowable transformations. While effective, this approach is slow and limited by the choice of transformations. An alternate approach is to directly generate adversarial examples by fine-tuning a pre-trained language model, as is commonly done for other text-to-text tasks. This approach promises to be much quicker and more expressive, but is relatively unexplored. For this reason, in this work we train an encoder-decoder paraphrase model to generate a diverse range of adversarial examples. For training, we adopt a reinforcement learning algorithm and propose a constraint-enforcing reward that promotes the generation of valid adversarial examples. Experimental results over two text classification datasets show that our model has achieved a higher success rate than the original paraphrase model, and overall has proved more effective than other competitive attacks. Finally, we show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
CL Sep 19, 2025
Multilingual LLM Prompting Strategies for Medical English-Vietnamese Machine Translation
Nhu Vo, Nu-Uyen-Phuong Le, Dung D. Le et al.
Medical English-Vietnamese machine translation (En-Vi MT) is essential for healthcare access and communication in Vietnam, yet Vietnamese remains a low-resource and under-studied language. We systematically evaluate prompting strategies for six multilingual LLMs (0.5B-9B parameters) on the MedEV dataset, comparing zero-shot, few-shot, and dictionary-augmented prompting with Meddict, an English-Vietnamese medical lexicon. Results show that model scale is the primary driver of performance: larger LLMs achieve strong zero-shot results, while few-shot prompting yields only marginal improvements. In contrast, terminology-aware cues and embedding-based example retrieval consistently improve domain-specific translation. These findings underscore both the promise and the current limitations of multilingual LLMs for medical En-Vi MT.
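To make the idea of dictionary-augmented prompting concrete, here is a minimal sketch of how terminology hints might be added to a translation prompt. The prompt wording, the function names and the two-entry lexicon are hypothetical; the abstract does not describe the actual templates or the format of Meddict.

```python
def find_terms(sentence, lexicon):
    """Return the lexicon entries whose English term occurs in the sentence."""
    lowered = sentence.lower()
    return {en: vi for en, vi in lexicon.items() if en.lower() in lowered}


def build_prompt(sentence, lexicon):
    """Assemble a zero-shot prompt, adding term hints when any are found."""
    lines = ["Translate the following medical sentence from English to Vietnamese."]
    hints = find_terms(sentence, lexicon)
    if hints:
        lines.append("Use these term translations:")
        lines.extend(f"- {en}: {vi}" for en, vi in sorted(hints.items()))
    lines.append(f"English: {sentence}")
    lines.append("Vietnamese:")
    return "\n".join(lines)


# Toy stand-in for a bilingual medical lexicon (illustrative entries only).
TOY_LEXICON = {"hypertension": "tăng huyết áp", "insulin": "insulin"}

prompt = build_prompt("The patient has hypertension.", TOY_LEXICON)
```

Simple substring matching is used here for brevity; a practical system would likely need tokenization and handling of inflected or multi-word terms.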
CL Jan 16, 2024
A Generative Adversarial Attack for Multilingual Text Classifiers
Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba et al.
Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is monolingual and cannot be used to target multilingual victim models, a significant limitation given the increased use of these models. For this reason, in this work we propose an approach to fine-tune a multilingual paraphrase model with an adversarial objective so that it becomes able to generate effective adversarial examples against multilingual classifiers. The training objective incorporates a set of pre-trained models to ensure text quality and language consistency of the generated text. In addition, all the models are suitably connected to the generator by vocabulary-mapping matrices, allowing for full end-to-end differentiability of the overall training pipeline. The experimental validation over two multilingual datasets and five languages has shown the effectiveness of the proposed approach compared to existing baselines, particularly in terms of query efficiency. We also provide a detailed analysis of the generated attacks and discuss limitations and opportunities for future research.
CL Jun 8, 2021
RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function to be used within the reinforcement learning approach can play a key role for performance and is still partially unexplored. For this reason, in this paper, we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL-pretrained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
CL Jun 4, 2021
BERTTune: Fine-Tuning Neural Machine Translation with BERTScore
Inigo Jauregi Unanue, Jacob Parnell, Massimo Piccardi
Neural machine translation models are often biased toward the limited translation references seen during training. To amend this form of overfitting, in this paper we propose fine-tuning the models with a novel training objective based on the recently-proposed BERTScore evaluation metric. BERTScore is a scoring function based on contextual embeddings that overcomes the typical limitations of n-gram-based metrics (e.g. synonyms, paraphrases), allowing translations that are different from the references, yet close in the contextual embedding space, to be treated as substantially correct. To be able to use BERTScore as a training objective, we propose three approaches for generating soft predictions, allowing the network to remain completely differentiable end-to-end. Experiments carried out over four diverse language pairs have achieved improvements of up to 0.58 pp (3.28%) in BLEU score and up to 0.76 pp (0.98%) in BERTScore (F_BERT) when fine-tuning a strong baseline.
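For readers unfamiliar with the metric, the following sketch computes the BERTScore F-measure by greedy cosine matching between token embeddings, without the optional idf weighting. The 2-dimensional vectors are toy stand-ins for contextual embeddings (an assumption for illustration), and the sketch covers only the metric itself, not the three soft-prediction approaches proposed in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate, reference):
    """Greedy-matching F1 over token embeddings (no idf weighting).

    Recall matches each reference token to its most similar candidate token;
    precision matches each candidate token to its most similar reference token.
    """
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy "embeddings": one vector per token.
reference = [[1.0, 0.0], [0.0, 1.0]]
identical = [[1.0, 0.0], [0.0, 1.0]]
partial = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first reference token

f_identical = bertscore_f1(identical, reference)
f_partial = bertscore_f1(partial, reference)
```

In the partial case, precision is 1.0 and recall is 0.5, so the F-measure is 2/3; with real contextual embeddings the similarities would rarely be exactly 0 or 1.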
CL Oct 8, 2020
Leveraging Discourse Rewards for Document-Level Neural Machine Translation
Inigo Jauregi Unanue, Nazanin Esmaili, Gholamreza Haffari et al.
Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion (LC) and coherence (COH), by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving 0.63 pp in BLEU score and 0.47 pp in F_BERT.
CL Jul 8, 2020
Learning Neural Textual Representations for Citation Recommendation
Binh Thanh Kieu, Inigo Jauregi Unanue, Son Bao Pham et al.
With the rapid growth of the scientific literature, manually selecting appropriate citations for a paper is becoming increasingly challenging and time-consuming. While several approaches for automated citation recommendation have been proposed in recent years, effective document representations for citation recommendation are still elusive to a large extent. For this reason, in this paper we propose a novel approach to citation recommendation which leverages a deep sequential representation of the documents (Sentence-BERT) cascaded with Siamese and triplet networks in a submodular scoring function. To the best of our knowledge, this is the first approach to combine deep representations and submodular selection for the task of citation recommendation. Experiments have been carried out using a popular benchmark dataset - the ACL Anthology Network corpus - and evaluated against baselines and a state-of-the-art approach using metrics such as the MRR and F1-at-k score. The results show that the proposed approach has been able to outperform all the compared approaches in every measured metric.
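The two evaluation metrics named above can be sketched as follows, using their standard definitions. The paper identifiers and rankings are made up for illustration, and the exact evaluation protocol used in the paper may differ.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def f1_at_k(ranked, relevant, k):
    """F1 between the top-k recommendations and the set of true citations."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant)
    return 2.0 * precision * recall / (precision + recall)


# Two toy queries: ranked recommendations and ground-truth citations.
queries = [
    (["p3", "p1", "p7"], {"p1", "p9"}),  # first relevant item at rank 2
    (["p2", "p4"], {"p2"}),              # first relevant item at rank 1
]

mrr = mean_reciprocal_rank(queries)
f1_first_query = f1_at_k(queries[0][0], queries[0][1], k=2)
```

For these toy queries the reciprocal ranks are 0.5 and 1.0, so the MRR is 0.75; the first query has precision and recall of 0.5 at k=2, giving an F1 of 0.5.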
CL Sep 30, 2019
Regressing Word and Sentence Embeddings for Regularization of Neural Machine Translation
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
In recent years, neural machine translation (NMT) has become the dominant approach in automated translation. However, like many other deep learning approaches, NMT suffers from overfitting when the amount of training data is limited. This is a serious issue for low-resource language pairs and many specialized translation domains that are inherently limited in the amount of available supervised data. For this reason, in this paper we propose regressing word (ReWE) and sentence (ReSE) embeddings at training time as a way to regularize NMT models and improve their generalization. During training, our models are trained to jointly predict categorical (words in the vocabulary) and continuous (word and sentence embeddings) outputs. An extensive set of experiments over four language pairs of variable training set size has shown that ReWE and ReSE can outperform strong state-of-the-art baseline models, with an improvement that is larger for smaller training sets (e.g., up to +5.15 BLEU points in Basque-English translation). Visualizations of the decoder's output space show that the proposed regularizers improve the clustering of unique words, facilitating correct predictions. In a final experiment on unsupervised NMT, we show that ReWE and ReSE are also able to improve the quality of machine translation when no parallel data are available.
CL Apr 4, 2019
ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili et al.
Regularization of neural machine translation is still a significant problem, especially in low-resource settings. To mitigate this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). Such joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences. Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.54 BLEU points, and also a marked improvement over a state-of-the-art system.
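The joint objective described above can be sketched for a single decoding step as a categorical loss plus a weighted regression loss on the predicted embedding. The abstract does not state which regression loss or weight is used, so the cosine distance and the `weight` parameter below are assumptions made for illustration, as are all the numeric values.

```python
import math


def nll(probs, target_index):
    """Negative log-likelihood of the reference word (categorical output)."""
    return -math.log(probs[target_index])


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def joint_loss(probs, target_index, predicted_embedding, target_embedding, weight=1.0):
    """Word-prediction loss plus a weighted embedding-regression penalty."""
    return nll(probs, target_index) + weight * cosine_distance(
        predicted_embedding, target_embedding
    )


# Toy step: 3-word vocabulary, 2-dimensional embeddings (made-up values).
probs = [0.7, 0.2, 0.1]
target_embedding = [1.0, 0.0]

loss_aligned = joint_loss(probs, 0, [1.0, 0.0], target_embedding)
loss_orthogonal = joint_loss(probs, 0, [0.0, 1.0], target_embedding)
```

When the predicted embedding points in the same direction as the target embedding, the penalty vanishes and only the word-prediction loss remains; an orthogonal prediction adds the full `weight` to the loss.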
CL Jul 1, 2018
A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
Automatic post-editing (APE) systems aim to correct the systematic errors made by machine translators. In this paper, we propose a neural APE system that encodes the source (src) and machine-translated (mt) sentences with two separate encoders, but leverages a shared attention mechanism to better understand how the two inputs contribute to the generation of the post-edited (pe) sentences. Our empirical observations have shown that when the mt is incorrect, the attention shifts weight toward tokens in the src sentence to properly edit the incorrect translation. The model has been trained and evaluated on the official data from the WMT16 and WMT17 APE IT domain English-German shared tasks. Additionally, we have used the extra 500K artificial data provided by the shared task. Our system has been able to reproduce the accuracies of systems trained with the same data, while at the same time providing better interpretability.
CL Jun 29, 2017
Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
Background. Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text "feature engineering" and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word "embeddings". Objectives. (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Methods. Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. Results. We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DDI-DrugBank and DDI-MedLine, but not in the 2010 i2b2/VA IRB Revision dataset. Conclusion. We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary.
ML Nov 25, 2016
Bidirectional LSTM-CRF for Clinical Concept Extraction
Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Automated extraction of concepts from patient clinical records is an essential facilitator of clinical research. For this reason, the 2010 i2b2/VA Natural Language Processing Challenges for Clinical Records introduced a concept extraction task aimed at identifying and classifying concepts into predefined categories (i.e., treatments, tests and problems). State-of-the-art concept extraction approaches heavily rely on handcrafted features and domain-specific resources which are hard to collect and define. For this reason, this paper proposes an alternative, streamlined approach: a recurrent neural network (the bidirectional LSTM with CRF decoding) initialized with general-purpose, off-the-shelf word embeddings. The experimental results achieved on the 2010 i2b2/VA reference corpora using the proposed framework outperform all recent methods and rank closely to the best submission from the original 2010 i2b2/VA challenge.
CL Oct 19, 2016
Bidirectional LSTM-CRF for Clinical Concept Extraction
Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Extraction of concepts present in patient clinical records is an essential step in clinical research. The 2010 i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records presented a concept extraction (CE) task, with the aim of identifying concepts (such as treatments, tests, problems) and classifying them into predefined categories. State-of-the-art CE approaches heavily rely on hand-crafted features and domain-specific resources which are hard to collect and tune. For this reason, this paper employs a bidirectional LSTM with CRF decoding, initialized with general-purpose, off-the-shelf word embeddings, for CE. The experimental results achieved on the 2010 i2b2/VA reference standard corpora using the bidirectional LSTM-CRF rank closely with top-ranked systems.
CL Sep 24, 2016
An Investigation of Recurrent Neural Architectures for Drug Name Recognition
Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Drug name recognition (DNR) is an essential step in the Pharmacovigilance (PV) pipeline. DNR aims to find drug name mentions in unstructured biomedical texts and classify them into predefined categories. State-of-the-art DNR approaches heavily rely on hand-crafted features and domain-specific resources which are difficult to collect and tune. For this reason, this paper investigates the effectiveness of contemporary recurrent neural architectures - the Elman and Jordan networks and the bidirectional LSTM with CRF decoding - at performing DNR straight from the text. The experimental results achieved on the authoritative SemEval-2013 Task 9.1 benchmarks show that the bidirectional LSTM-CRF ranks closely to highly-dedicated, hand-crafted systems.
CV Jul 30, 2015
Action recognition in still images by latent superpixel classification
Shaukat Abidi, Massimo Piccardi, Mary-Anne Williams
Action recognition from still images is an important task in computer vision applications such as image annotation, robotic navigation, video surveillance and several others. Existing approaches mainly rely on either bag-of-feature representations or articulated body-part models. However, the relationship between the action and the image segments is still substantially unexplored. For this reason, in this paper we propose to approach action recognition by leveraging an intermediate layer of "superpixels" whose latent classes can act as attributes of the action. In the proposed approach, the action class is predicted by a structural model (learnt by Latent Structural SVM) based on measurements from the image superpixels and their latent classes. Experimental results over the challenging Stanford 40 Actions dataset report a significant average accuracy of 74.06% for the positive class and 88.50% for the negative class, giving evidence of the performance of the proposed approach.
ML Mar 10, 2015
An Adaptive Online HDP-HMM for Segmentation and Classification of Sequential Data
Ava Bargi, Richard Yi Da Xu, Massimo Piccardi
In recent years, the desire and need to understand sequential data has been increasing, with particular interest in sequential contexts such as patient monitoring, understanding daily activities, video surveillance, stock market and the like. Along with the constant flow of data, it is critical to classify and segment the observations on-the-fly, without being limited to a rigid number of classes. In addition, the model needs to be capable of updating its parameters to comply with possible evolutions. This interesting problem, however, is not adequately addressed in the literature since many studies focus on offline classification over a pre-defined class set. In this paper, we propose a principled solution to this gap by introducing an adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying the sequential data over an unlimited number of classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by introducing a learning rate, responsible for balancing the extent to which the model sustains its previous learning (parameters) or adapts to the new streaming observations. Experimental results on several variants of stationary and evolving synthetic data and two video datasets, TUM Assistive Kitchen and collated Weizmann, show remarkable performance in segmentation and classification, particularly for evolutionary sequences with changing distributions and/or containing new, unseen classes.
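The abstract does not give the update rule behind the learning rate. One simple reading, shown below strictly as an assumption, is a convex combination of the previous parameters and the parameters estimated from the latest batch of observations. The function name and the numeric values are hypothetical.

```python
def blend_parameters(previous, estimate, learning_rate):
    """Convex combination of old parameters and a fresh estimate.

    learning_rate = 0.0 keeps the previous parameters unchanged;
    learning_rate = 1.0 discards them in favour of the new estimate.
    """
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in [0, 1]")
    return [
        (1.0 - learning_rate) * old + learning_rate * new
        for old, new in zip(previous, estimate)
    ]


# Toy example: a class mean drifting as new streaming data arrives.
previous_mean = [0.0, 10.0]
batch_mean = [10.0, 0.0]

updated_mean = blend_parameters(previous_mean, batch_mean, learning_rate=0.25)
```

With a learning rate of 0.25 the updated mean moves a quarter of the way toward the new batch estimate, so a small rate favours stability and a large rate favours rapid adaptation.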
ML Jul 2, 2013
A non-parametric conditional factor regression model for high-dimensional input and response
Ava Bargi, Richard Yi Da Xu, Massimo Piccardi
In this paper, we propose a non-parametric conditional factor regression (NCFR) model for domains with high-dimensional input and response. NCFR enhances linear regression in two ways: a) introducing low-dimensional latent factors leading to dimensionality reduction and b) integrating an Indian Buffet Process as a prior for the latent factors to derive unlimited sparse dimensions. Experimental results comparing NCFR to several alternatives give evidence of remarkable prediction performance.