Mengyi Wei

IT
h-index: 2
5 papers
56 citations
Novelty: 37%
AI Score: 39

5 Papers

AI | Jun 15, 2022
AI Ethics Issues in Real World: Evidence from AI Incident Database

Mengyi Wei, Zhixuan Zhou

The powerful performance of Artificial Intelligence (AI) brings with it prevalent ethical issues. Though governments and corporations have curated multiple AI ethics guidelines to curb unethical behavior of AI, the effect has been limited, probably due to the vagueness of the guidelines. In this paper, we take a closer look at how AI ethics issues arise in the real world, in order to gain a more in-depth and nuanced understanding of different ethical issues and their social impact. Through a content analysis of the AI Incident Database, an effort to prevent repeated real-world AI failures by cataloging incidents, we identify 13 application areas that often see unethical use of AI, led by intelligent service robots, language/vision models, and autonomous driving. Ethical issues appear in 8 different forms, ranging from inappropriate use and racial discrimination to physical safety and unfair algorithms. With this taxonomy of AI ethics issues, we aim to provide AI practitioners with a practical guideline for deploying AI applications ethically.

67.8 | CY | Apr 24
Inclusive Learning Analytics with Embedded Data Comics: A Conceptual Framework for Public Understanding of AI Ethics

Mengyi Wei, Chenyu Zuo, Dongsheng Chen et al.

Public awareness of AI ethics plays a crucial role in fostering the responsible and sustainable development of AI technology. However, finding effective ways to promote public understanding of the ethical risks of AI remains a challenge. Given the complexity of AI ethical issues and the cognitive limitations of the public, this review paper proposes a conceptual framework for inclusive learning analytics with embedded data comics. Data comics help transform complex and abstract AI ethics cases into compelling and relatable stories, fostering public empathy and introspection. More importantly, inclusive learning analytics targets not only people of different demographic attributes, but also different mindsets with inherent cognitive biases. By providing equal and easily accessible channels for AI ethics issues, we aim to encourage the public to reflect on AI ethics incidents from multiple perspectives and develop the habit of continuous learning to adapt to evolving AI technologies and ethical risks.

IT | Jul 10, 2024
Disturbance-based Discretization, Differentiable IDS Channel, and an IDS-Correcting Code for DNA-based Storage

Alan J. X. Guo, Mengyi Wei, Yufan Dai et al.

With recent advancements in next-generation data storage, especially in biological molecule-based storage, insertion, deletion, and substitution (IDS) error-correcting codes have garnered increased attention. However, a universal method for designing tailored IDS-correcting codes across varying channel settings remains underexplored. We present an autoencoder-based approach, THEA-code, aimed at efficiently generating IDS-correcting codes for complex IDS channels. In this work, a disturbance-based discretization is proposed to discretize the features of the autoencoder, and a simulated differentiable IDS channel is developed as a differentiable alternative to IDS operations. These innovations allow the autoencoder to converge, producing channel-customized IDS-correcting codes that perform well across complex IDS channels, particularly in realistic DNA-based storage channels.
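As a point of reference for the abstract above, the following is a minimal sketch of an ordinary, non-differentiable IDS channel over the DNA alphabet. It is not the differentiable channel of THEA-code. The function name `ids_channel`, the per-symbol error probabilities, and the convention that an insertion places a random base before the current one are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"  # 4-ary DNA alphabet


def ids_channel(seq, p_ins=0.01, p_del=0.01, p_sub=0.01, rng=None):
    """Pass `seq` through a toy IDS channel (hypothetical helper).

    For each input symbol, exactly one of four events occurs:
    insertion (a random base is emitted before the symbol),
    deletion (the symbol is dropped), substitution (the symbol is
    replaced by a different base), or correct transmission.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        r = rng.random()
        if r < p_ins:
            out.append(rng.choice(ALPHABET))  # inserted symbol
            out.append(base)                  # original symbol kept
        elif r < p_ins + p_del:
            continue                          # symbol deleted
        elif r < p_ins + p_del + p_sub:
            # substitute with one of the three other bases
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)                  # transmitted unchanged
    return "".join(out)


if __name__ == "__main__":
    print(ids_channel("ACGTACGTACGT", 0.05, 0.05, 0.05, random.Random(0)))
```

Because each step branches on a sampled random number and changes the output length, gradients cannot flow through this function, which is the difficulty a differentiable surrogate is meant to address.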

LG | Dec 13, 2023
Levenshtein Distance Embedding with Poisson Regression for DNA Storage

Xiang Wei, Alan J. X. Guo, Sihan Sun et al.

Efficient computation or approximation of Levenshtein distance, a widely used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, Poisson regression is introduced by assuming that the Levenshtein distance between sequences of fixed length follows a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log-likelihood of the chi-squared distribution and helps remove skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
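The two ingredients named in this abstract can be sketched in a few lines: the exact Levenshtein distance via the standard dynamic program, and the Poisson negative log-likelihood that a regression model would minimize. This is only an illustration under assumptions. The name `poisson_nll` and its `predicted_rate` argument are hypothetical, and how the paper derives the rate from embedding vectors is not stated in the abstract.

```python
import math


def levenshtein(a, b):
    """Exact Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def poisson_nll(predicted_rate, observed_distance):
    """Negative log-likelihood of an integer count under Poisson(predicted_rate).

    -log P(k | lam) = lam - k * log(lam) + log(k!)
    `predicted_rate` is a placeholder for whatever positive quantity a
    model outputs (for example, a function of the embedding distance).
    """
    k = observed_distance
    return predicted_rate - k * math.log(predicted_rate) + math.lgamma(k + 1)


if __name__ == "__main__":
    d = levenshtein("ACGTAC", "AGGTC")
    # The loss is smallest when the predicted rate equals the true distance.
    for rate in (0.5, float(d), 5.0):
        print(rate, poisson_nll(rate, d))
```

For a fixed observed distance k, the loss is minimized at a predicted rate equal to k, so training pushes the model's output toward the true edit distance.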

IT | Dec 20, 2023
DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

Alan J. X. Guo, Sihan Sun, Xiang Wei et al.

With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task. Furthermore, current studies on single-IDS-correcting codes are misaligned with the requirements of applications, which necessitate correcting multiple errors. Compromise solutions have involved shortening codewords to reduce the chance of multiple errors. However, the code rates of existing codes are poor at short lengths, diminishing the overall storage density. In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is used to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correction are developed. While the concept underpinning this approach is straightforward, it bypasses the mathematical challenges typically encountered in code design. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords.
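To make the notion of a single-IDS-correcting code concrete, the sketch below builds a small 4-ary codebook by brute-force greedy search directly in the Levenshtein domain. It is not the DoDo-Code method, which searches in a learned embedding space. It relies on the fact that one edit error is uniquely correctable exactly when all codewords are pairwise at Levenshtein distance 3 or more. The name `greedy_codebook` and the lexicographic search order are assumptions made for illustration.

```python
from itertools import product


def levenshtein(a, b):
    """Exact Levenshtein distance (standard two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def greedy_codebook(length, alphabet="ACGT", min_distance=3):
    """Greedily collect words whose pairwise distance is >= min_distance.

    Candidates are visited in lexicographic order; a word is kept only
    if it is far enough from every codeword accepted so far. The cost
    grows quickly with `length`, so this is for short codewords only.
    """
    codebook = []
    for cand in product(alphabet, repeat=length):
        word = "".join(cand)
        if all(levenshtein(word, c) >= min_distance for c in codebook):
            codebook.append(word)
    return codebook


if __name__ == "__main__":
    code = greedy_codebook(4)
    print(len(code), code)
```

The exhaustive pairwise distance checks are what make this approach impractical beyond very short lengths, which is the kind of cost a distance-preserving embedding is intended to reduce.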