Jiyi Li

h-index15

12papers

920citations

Novelty50%

AI Score40

Ranked #100,655 of 201,326 authors (top 50%)#18,434 in CL (top 57%)

12 Papers

LGSep 28, 2024

A Generalized Model for Multidimensional Intransitivity

Jiuding Duan, Jiyi Li, Yukino Baba et al.

Intransitivity is a critical issue in pairwise preference modeling. It refers to the intransitive pairwise preferences between a group of players or objects that potentially form a cyclic preference chain and has been long discussed in social choice theory in the context of the dominance relationship. However, such multifaceted intransitivity between players and the corresponding player representations in high dimensions is difficult to capture. In this paper, we propose a probabilistic model that jointly learns each player's d-dimensional representation (d>1) and a dataset-specific metric space that systematically captures the distance metric in Rd over the embedding space. Interestingly, by imposing additional constraints in the metric space, our proposed model degenerates to former models used in intransitive representation learning. Moreover, we present an extensive quantitative investigation of the vast existence of intransitive relationships between objects in various real-world benchmark datasets. To our knowledge, this investigation is the first of this type. The predictive performance of our proposed method on different real-world datasets, including social choice, election, and online game datasets, shows that our proposed method outperforms several competing methods in terms of prediction accuracy.

HCFeb 8, 2023

Multiview Representation Learning from Crowdsourced Triplet Comparisons

Xiaotian Lu, Jiyi Li, Koh Takeuchi et al.

Crowdsourcing has been used to collect data at scale in numerous fields. Triplet similarity comparison is a type of crowdsourcing task, in which crowd workers are asked the question ``among three given objects, which two are more similar?'', which is relatively easy for humans to answer. However, the comparison can be sometimes based on multiple views, i.e., different independent attributes such as color and shape. Each view may lead to different results for the same three objects. Although an algorithm was proposed in prior work to produce multiview embeddings, it involves at least two problems: (1) the existing algorithm cannot independently predict multiview embeddings for a new sample, and (2) different people may prefer different views. In this study, we propose an end-to-end inductive deep learning framework to solve the multiview representation learning problem. The results show that our proposed method can obtain multiview embeddings of any object, in which each view corresponds to an independent attribute of the object. We collected two datasets from a crowdsourcing platform to experimentally investigate the performance of our proposed approach compared to conventional baseline methods.

CLMar 15, 2024Code

Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis

Jin Cui, Fumiyo Fukumoto, Xinfeng Wang et al.

Aspect-category-based sentiment analysis (ACSA), which aims to identify aspect categories and predict their sentiments has been intensively studied due to its wide range of NLP applications. Most approaches mainly utilize intrasentential features. However, a review often includes multiple different aspect categories, and some of them do not explicitly appear in the review. Even in a sentence, there is more than one aspect category with its sentiments, and they are entangled intra-sentence, which makes the model fail to discriminately preserve all sentiment characteristics. In this paper, we propose an enhanced coherence-aware network with hierarchical disentanglement (ECAN) for ACSA tasks. Specifically, we explore coherence modeling to capture the contexts across the whole review and to help the implicit aspect and sentiment identification. To address the issue of multiple aspect categories and sentiment entanglement, we propose a hierarchical disentanglement module to extract distinct categories and sentiment features. Extensive experimental and visualization results show that our ECAN effectively decouples multiple categories and sentiments entangled in the coherence representations and achieves state-of-the-art (SOTA) performance. Our codes and data are available online: \url{https://github.com/cuijin-23/ECAN}.

HCMay 17, 2024Code

Evaluating Saliency Explanations in NLP by Crowdsourcing

Xiaotian Lu, Jiyi Li, Zhen Wan et al.

Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution of output, have been proposed to determine the part of the input which a model values most. Despite a considerable body of work on the evaluation of saliency methods, whether the results of various evaluation metrics agree with human cognition remains an open question. In this study, we propose a new human-based method to evaluate saliency methods in NLP by crowdsourcing. We recruited 800 crowd workers and empirically evaluated seven saliency methods on two datasets with the proposed method. We analyzed the performance of saliency methods, compared our results with existing automated evaluation methods, and identified notable differences between NLP and computer vision (CV) fields when using saliency methods. The instance-level data of our crowdsourced experiments and the code to reproduce the explanations are available at https://github.com/xtlu/lreccoling_evaluation.

CLOct 22, 2024

Human-LLM Hybrid Text Answer Aggregation for Crowd Annotations

Jiyi Li

The quality is a crucial issue for crowd annotations. Answer aggregation is an important type of solution. The aggregated answers estimated from multiple crowd answers to the same instance are the eventually collected annotations, rather than the individual crowd answers themselves. Recently, the capability of Large Language Models (LLMs) on data annotation tasks has attracted interest from researchers. Most of the existing studies mainly focus on the average performance of individual crowd workers; several recent works studied the scenarios of aggregation on categorical labels and LLMs used as label creators. However, the scenario of aggregation on text answers and the role of LLMs as aggregators are not yet well-studied. In this paper, we investigate the capability of LLMs as aggregators in the scenario of close-ended crowd text answer aggregation. We propose a human-LLM hybrid text answer aggregation method with a Creator-Aggregator Multi-Stage (CAMS) crowdsourcing framework. We make the experiments based on public crowdsourcing datasets. The results show the effectiveness of our approach based on the collaboration of crowd workers and LLMs.

CLMay 20, 2025

Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales

Jun Cao, Jiyi Li, Ziwei Yang et al.

There has been growing interest in Multimodal Aspect-Based Sentiment Analysis (MABSA) in recent years. Existing methods predominantly rely on pre-trained small language models (SLMs) to collect information related to aspects and sentiments from both image and text, with an aim to align these two modalities. However, small SLMs possess limited capacity and knowledge, often resulting in inaccurate identification of meaning, aspects, sentiments, and their interconnections in textual and visual data. On the other hand, Large language models (LLMs) have shown exceptional capabilities in various tasks by effectively exploring fine-grained information in multimodal data. However, some studies indicate that LLMs still fall short compared to fine-tuned small models in the field of ABSA. Based on these findings, we propose a novel framework, termed LRSA, which combines the decision-making capabilities of SLMs with additional information provided by LLMs for MABSA. Specifically, we inject explanations generated by LLMs as rationales into SLMs and employ a dual cross-attention mechanism for enhancing feature interaction and fusion, thereby augmenting the SLMs' ability to identify aspects and sentiments. We evaluated our method using two baseline models, numerous experiments highlight the superiority of our approach on three widely-used benchmarks, indicating its generalizability and applicability to most pre-trained models for MABSA.

CRJul 31, 2025

Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems

Lijia Liu, Takumi Kondo, Kyohei Atarashi et al.

This paper investigates defenses for LLM-based evaluation systems against prompt injection. We formalize a class of threats called blind attacks, where a candidate answer is crafted independently of the true answer to deceive the evaluator. To counter such attacks, we propose a framework that augments Standard Evaluation (SE) with Counterfactual Evaluation (CFE), which re-evaluates the submission against a deliberately false ground-truth answer. An attack is detected if the system validates an answer under both standard and counterfactual conditions. Experiments show that while standard evaluation is highly vulnerable, our SE+CFE framework significantly improves security by boosting attack detection with minimal performance trade-offs.

CLJan 18, 2024

A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation

Jiyi Li

Whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task is attracting interest recently. Some works verified this issue with the average performance of individual crowd workers and LLM workers on some specific NLP tasks by collecting new datasets. However, on the one hand, existing datasets for the studies of annotation quality in crowdsourcing are not yet utilized in such evaluations, which potentially provide reliable evaluations from a different viewpoint. On the other hand, the quality of these aggregated labels is crucial because, when utilizing crowdsourcing, the estimated labels aggregated from multiple crowd labels to the same instances are the eventually collected labels. Therefore, in this paper, we first investigate which existing crowdsourcing datasets can be used for a comparative study and create a benchmark. We then compare the quality between individual crowd labels and LLM labels and make the evaluations on the aggregated labels. In addition, we propose a Crowd-LLM hybrid label aggregation method and verify the performance. We find that adding LLM labels from good LLMs to existing crowdsourcing datasets can enhance the quality of the aggregated labels of the datasets, which is also higher than the quality of LLM labels themselves.

CLSep 13, 2021

Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification

Zhiwei Zhang, Jiyi Li, Fumiyo Fukumoto et al.

Scientific claim verification can help the researchers to easily find the target scientific papers with the sentence evidence from a large corpus for the given claim. Some existing works propose pipeline models on the three tasks of abstract retrieval, rationale selection and stance prediction. Such works have the problems of error propagation among the modules in the pipeline and lack of sharing valuable information among modules. We thus propose an approach, named as ARSJoint, that jointly learns the modules for the three tasks with a machine reading comprehension framework by including claim information. In addition, we enhance the information exchanges and constraints among tasks by proposing a regularization term between the sentence attention scores of abstract retrieval and the estimated outputs of rational selection. The experimental results on the benchmark dataset SciFact show that our approach outperforms the existing works.

LGJun 29, 2021

Learning from Multiple Annotators by Incorporating Instance Features

Jingzheng Li, Hailong Sun, Jiyi Li et al.

Learning from multiple annotators aims to induce a high-quality classifier from training instances, where each of them is associated with a set of possibly noisy labels provided by multiple annotators under the influence of their varying abilities and own biases. In modeling the probability transition process from latent true labels to observed labels, most existing methods adopt class-level confusion matrices of annotators that observed labels do not depend on the instance features, just determined by the true labels. It may limit the performance that the classifier can achieve. In this work, we propose the noise transition matrix, which incorporates the influence of instance features on annotators' performance based on confusion matrices. Furthermore, we propose a simple yet effective learning framework, which consists of a classifier module and a noise transition matrix module in a unified neural network architecture. Experimental results demonstrate the superiority of our method in comparison with state-of-the-art methods.

HCAug 1, 2020

CrowDEA: Multi-view Idea Prioritization with Crowds

Yukino Baba, Jiyi Li, Hisashi Kashima

Given a set of ideas collected from crowds with regard to an open-ended question, how can we organize and prioritize them in order to determine the preferred ones based on preference comparisons by crowd evaluators? As there are diverse latent criteria for the value of an idea, multiple ideas can be considered as "the best". In addition, evaluators can have different preference criteria, and their comparison results often disagree. In this paper, we propose an analysis method for obtaining a subset of ideas, which we call frontier ideas, that are the best in terms of at least one latent evaluation criterion. We propose an approach, called CrowDEA, which estimates the embeddings of the ideas in the multiple-criteria preference space, the best viewpoint for each idea, and preference criterion for each evaluator, to obtain a set of frontier ideas. Experimental results using real datasets containing numerous ideas or designs demonstrate that the proposed approach can effectively prioritize ideas from multiple viewpoints, thereby detecting frontier ideas. The embeddings of ideas learned by the proposed approach provide a visualization that facilitates observation of the frontier ideas. In addition, the proposed approach prioritizes ideas from a wider variety of viewpoints, whereas the baselines tend to use to the same viewpoints; it can also handle various viewpoints and prioritize ideas in situations where only a limited number of evaluators or labels are available.

HCApr 2, 2014

Map Route Ranking with Weighted Distance using Environmental Factors

Jiyi Li

When users search for the routes between two places using map based services, these services compute and provide the top candidate routes based on shortest geometric distances or ideal time consuming. However, other real factors like physical exertion and practical time consuming will influence user experience, and the environmental factors like steep slope and traffic jam that result in these real factors need to be considered. For example, when users travel on foot or by bicycle, if there are many steep slopes on the routes, it will be difficult or easy to be tired. In this paper, we propose an approach computing weighted distance considering these environmental factors. We rank the candidate route results generated by Google Map using elevation information. We integrate the elevation information in the route results to assist users to make decision. The solution can also be used in other scenarios that need to consider environmental factors.