Chi-Hung Chi

CR
h-index: 14
6 papers
33 citations
Novelty: 32%
AI Score: 24

6 Papers

LG · Apr 6, 2023
SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets

Borui Cai, Guangyan Huang, Shuiqiao Yang et al.

Shapelets, which discriminate time series using local features (subsequences), are promising for time series clustering. Existing time series clustering methods may fail to capture representative shapelets because they discover shapelets from a large pool of uninformative subsequences, which results in low clustering accuracy. This paper proposes a Semi-supervised Clustering of Time Series Using Representative Shapelets (SE-Shapelets) method, which uses a small number of labeled and propagated pseudo-labeled time series to help discover representative shapelets, thereby improving clustering accuracy. In SE-Shapelets, we propose two techniques to discover representative shapelets for effective time series clustering: 1) a salient subsequence chain (SSC) that extracts salient subsequences (as candidate shapelets) from a labeled/pseudo-labeled time series, which helps remove massive numbers of uninformative subsequences from the pool; and 2) a linear discriminant selection (LDS) algorithm that identifies shapelets capturing representative local features of time series in different classes, to facilitate clustering. Experiments on UCR time series datasets demonstrate that SE-Shapelets discovers representative shapelets and achieves higher clustering accuracy than comparable semi-supervised time series clustering methods.
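To make the idea of discriminating time series by local subsequences concrete, here is a minimal Python sketch of the standard shapelet distance (the minimum Euclidean distance between a shapelet and any equal-length subsequence of a series) and the resulting shapelet-transform features. It is not the SSC or LDS procedure of SE-Shapelets; the function names and toy data are hypothetical, it assumes NumPy 1.20+ for `sliding_window_view`, and it skips the z-normalization many shapelet methods apply.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between `shapelet` and every
    equal-length subsequence (sliding window) of `series`."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    windows = sliding_window_view(series, m)  # shape: (len(series) - m + 1, m)
    return float(np.linalg.norm(windows - shapelet, axis=1).min())


def shapelet_transform(dataset, shapelets) -> np.ndarray:
    """Map each series to a vector of its distances to each shapelet."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets] for s in dataset])


# Hypothetical toy data: series_a contains the bump, series_b is flat.
series_a = np.array([0, 0, 1, 3, 1, 0, 0], dtype=float)
series_b = np.zeros(7)
bump = np.array([1, 3, 1], dtype=float)
features = shapelet_transform([series_a, series_b], [bump])
print(features)
```

Series that contain a close match to a shapelet get small distances, so the resulting feature vectors can be passed to any standard clustering algorithm such as k-means.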

CR · Nov 4, 2024
Tabular Data Synthesis with Differential Privacy: A Survey

Mengmeng Yang, Chi-Hung Chi, Kwok-Yan Lam et al.

Data sharing is a prerequisite for collaborative innovation, enabling organizations to leverage diverse datasets for deeper insights. In real-world applications such as FinTech and Smart Manufacturing, transactional data, often in tabular form, are generated and analyzed to derive insights. However, such datasets typically contain sensitive personal or business information, raising privacy concerns and regulatory risks. Data synthesis addresses this by generating artificial datasets that preserve the statistical characteristics of real data while removing direct links to individuals. However, attackers can still infer sensitive information using background knowledge. Differential privacy offers a solution by providing provable and quantifiable privacy protection. Consequently, differentially private data synthesis has emerged as a promising approach to privacy-aware data sharing. This paper provides a comprehensive overview of existing differentially private tabular data synthesis methods, highlighting the unique challenges each generation model faces when generating tabular data under differential privacy constraints. We classify the methods into statistical and deep learning-based approaches according to their generation models, and discuss them in both centralized and distributed environments. We evaluate and compare the methods within each category, highlighting their strengths and weaknesses in terms of utility, privacy, and computational complexity. Additionally, we present and discuss various evaluation methods for assessing the quality of synthesized data, and identify research gaps and directions for future research.
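As a minimal illustration of the statistical route to differentially private synthesis, the Python sketch below perturbs a one-column histogram with the Laplace mechanism and samples synthetic values from it. It is a textbook baseline rather than any specific method covered by the survey; `dp_synthesize_column` and the toy data are hypothetical, and the sketch assumes a publicly known category domain and the add/remove-one-record neighboring definition, under which each count has sensitivity 1.

```python
from collections import Counter

import numpy as np


def dp_synthesize_column(values, domain, epsilon, n_samples, rng=None):
    """Sample `n_samples` synthetic values for one categorical column from an
    epsilon-DP noisy histogram built with the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    counts = Counter(values)
    true_hist = np.array([counts.get(v, 0) for v in domain], dtype=float)
    # Adding or removing one record changes exactly one count by 1,
    # so the L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    noisy = true_hist + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(domain))
    # Post-processing: clip negative counts and normalize into probabilities.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    probs = noisy / total if total > 0 else np.full(len(domain), 1.0 / len(domain))
    idx = rng.choice(len(domain), size=n_samples, p=probs)
    return [domain[i] for i in idx]


# Hypothetical toy column; the domain ["A", "B", "C"] is assumed to be public.
real = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
synthetic = dp_synthesize_column(real, ["A", "B", "C"], epsilon=1.0,
                                 n_samples=100, rng=np.random.default_rng(0))
print(Counter(synthetic))
```

Clipping, normalizing, and sampling are post-processing of the noisy counts, so they do not weaken the epsilon-DP guarantee. Realistic tabular synthesizers must also capture correlations between columns, for example through noisy multi-column marginals, and split the privacy budget across them by composition.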

CV · Mar 31, 2024
Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining

Jingyu Wang, Niantai Jing, Ziyao Liu et al.

In copy-move tampering operations, perpetrators often employ techniques such as blurring to conceal tampering traces, which makes it significantly harder to detect object-level targets with intact structures. To address these challenges, this paper proposes an Object-level Copy-Move Forgery Image Detection method based on Inconsistency Mining (IMNet). To obtain complete object-level targets, we customize prototypes for both the source and tampered regions and update them dynamically. Additionally, we extract inconsistent regions between the coarse similar regions obtained through self-correlation calculations and the regions composed of prototypes. The detected inconsistent regions supplement the coarse similar regions to refine pixel-level detection. Experiments on three public datasets validate the effectiveness and robustness of the proposed IMNet.
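The coarse similar regions in copy-move detection typically come from self-correlation over a feature map. A minimal NumPy sketch of that generic step, assuming an (H, W, C) feature map from some backbone, computes cosine similarity between every pair of positions, masks out each position's immediate neighborhood, and thresholds the best remaining match. It is not IMNet's prototype or inconsistency-mining stage; the function names, `exclude_radius`, `threshold`, and the toy feature map are hypothetical choices.

```python
import numpy as np


def self_correlation_map(features: np.ndarray, exclude_radius: int = 2) -> np.ndarray:
    """For each position of an (H, W, C) feature map, return the highest cosine
    similarity to any position outside its (2r+1) x (2r+1) neighborhood."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c).astype(float)
    flat /= np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)
    sim = flat @ flat.T  # (H*W, H*W) pairwise cosine similarities
    rows, cols = np.divmod(np.arange(h * w), w)
    near = (np.abs(rows[:, None] - rows[None, :]) <= exclude_radius) & (
        np.abs(cols[:, None] - cols[None, :]) <= exclude_radius
    )
    sim[near] = -1.0  # ignore trivial self and neighbor matches
    return sim.max(axis=1).reshape(h, w)


def coarse_similar_mask(features: np.ndarray, threshold: float = 0.9,
                        exclude_radius: int = 2) -> np.ndarray:
    """Boolean mask of positions with a strong match elsewhere in the image."""
    return self_correlation_map(features, exclude_radius) >= threshold


# Hypothetical toy feature map with a 2x2 patch copied to another location.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 32))
feats[1:3, 1:3] = feats[5:7, 5:7]
mask = coarse_similar_mask(feats)
print(mask.astype(int))
```

The full similarity matrix has (H*W)^2 entries, so this brute-force form only suits small feature maps; it flags both the source and the copied patch, which is why a later stage is needed to tell them apart and recover complete objects.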

CR · Feb 21, 2022
Analysis of Digital Sovereignty and Identity: From Digitization to Digitalization

Kheng-Leong Tan, Chi-Hung Chi, Kwok-Yan Lam

Advances in emerging technologies have accelerated digital transformation through the pervasive digitalization of the economy and society, driving innovations such as smart cities, Industry 4.0, and FinTech. Unlike digitization, digitalization is a transformation that improves processes by leveraging digital technologies and digitized data. Cyberspace has evolved from a hardware internetworking infrastructure into a virtual environment, transforming how people, businesses, and governments interact and operate. Through this transformation, large amounts of personal data are captured over which individuals have no ownership or control, threatening their privacy. It is therefore necessary for data owners to control the ownership, custody, and utilization of their data, and to protect their digital assets and identity through proper data governance, cybersecurity controls, and privacy protection. This gives rise to the notions of data sovereignty and digital sovereignty, two conceptually related terms with different focuses. This paper first explains these two concepts in terms of their guiding principles and legal and regulatory requirements, and then analyzes and discusses the technical challenges of implementing these requirements. Next, to understand the emerging shift in digital sovereignty toward individuals taking complete control of the security and privacy of their own digital assets, this paper conducts a systematic study and analysis of Self-Sovereign Identity, discusses existing solutions, and points out that an efficient key management system, the scalability and interoperability of solutions, and well-established standards are among the challenges and open problems for wide deployment.

SE · Apr 19, 2016
A Model-based Approach for Effective Service Delivery

Feng-Lin Li, Chi-Hung Chi

With the prevalence of X-as-a-Service (e.g., software as a service, platform as a service, and infrastructure as a service) and users' growing demand for high-quality services, QoS (Quality of Service) assurance is becoming increasingly important to service delivery. Traditional service delivery mainly focuses on function or information provisioning and does not give high priority to quality assurance. In this paper, we tackle the QoS assurance problem systematically, from model to system. We first decompose traditional services into three components, namely software applications, data, and resources, then define models for these three kinds of basic services and propose a set of operations for service publishing and composition. To illustrate our approach, we present a prototype Platform as a Service (PaaS) system, developed in support of our framework, which shows how QoS can be ensured through real-time monitoring and dynamic scaling (up or down).
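Dynamic scaling driven by real-time monitoring can be illustrated with a simple threshold rule. The Python sketch below is a hypothetical policy, not the paper's PaaS prototype: the utilization thresholds, instance bounds, one-instance step size, and monitoring readings are all assumed values.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical thresholds on average utilization (0.0 to 1.0) across instances.
    scale_up_at: float = 0.8
    scale_down_at: float = 0.3
    min_instances: int = 1
    max_instances: int = 10


def next_instance_count(current: int, avg_utilization: float, policy: ScalingPolicy) -> int:
    """Return the instance count to run after one monitoring interval."""
    if avg_utilization > policy.scale_up_at:
        target = current + 1  # scale up: QoS at risk
    elif avg_utilization < policy.scale_down_at:
        target = current - 1  # scale down: release idle resources
    else:
        target = current
    return max(policy.min_instances, min(policy.max_instances, target))


# Hypothetical monitoring readings, one per interval.
policy = ScalingPolicy()
instances = 2
history = []
for reading in [0.85, 0.9, 0.5, 0.2, 0.1]:
    instances = next_instance_count(instances, reading, policy)
    history.append(instances)
print(history)
```

Leaving a gap between the two thresholds keeps the controller from oscillating when utilization hovers near a single cut-off; a QoS-focused deployment might monitor response time instead of utilization.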

SE · Apr 12, 2016
Service Intelligence Oriented Distributed Data Stream Integration

Feng-Lin Li, Chi-Hung Chi, Yue Wang et al.

Software as a service (SaaS) has recently attracted much attention because it makes using software more convenient and cost-effective. At the same time, rising user expectations for high-quality service, such as real-time information or functionality provisioning, bring new challenges: to satisfy such (near) real-time requirements, real-time monitoring and effective processing of streaming data are necessary. However, due to the composition structure and multi-instance property of services, service data streams are often distributed and hard to synchronize and aggregate. We tackle these challenges by (1) proposing systematic association strategies for relating distributed data; (2) introducing a new small window array mechanism for aggregating distributed data; (3) setting window parameters based on the cumulative distribution function (CDF) method; and (4) modeling streaming operators with queuing models for performance evaluation and prediction. Experiments show that our approach achieves good accuracy and completeness with acceptable performance when processing distributed service data streams.
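Items (3) and (4) can be sketched with standard formulas. Assuming window lengths are chosen from the empirical CDF of observed arrival delays, and that each streaming operator is modeled as an M/M/1 queue (one common queuing model; the paper may use others), the Python sketch below picks a window length covering a target fraction of delays and computes steady-state operator metrics. The function names, the 0.95 default coverage, and the sample delays and rates are hypothetical.

```python
import math


def window_from_cdf(delays_ms, coverage=0.95):
    """Smallest observed delay d such that the empirical CDF F(d) >= coverage,
    used here as the window length that waits for most late arrivals."""
    if not delays_ms:
        raise ValueError("need at least one observed delay")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(delays_ms)
    # The empirical CDF at ordered[k] is (k + 1) / n.
    k = math.ceil(coverage * len(ordered)) - 1
    return ordered[max(k, 0)]


def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics for one streaming operator."""
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for a stable queue")
    rho = arrival_rate / service_rate
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),
        "mean_time_in_system": 1 / (service_rate - arrival_rate),
    }


delays = [5, 12, 8, 30, 7, 9, 11, 6, 10, 40]  # hypothetical delays in ms
print(window_from_cdf(delays, coverage=0.5))
print(mm1_metrics(arrival_rate=80, service_rate=100))  # rates in events/s
```

The M/M/1 results satisfy Little's law (mean number in system = arrival rate × mean time in system), and both metrics grow without bound as utilization approaches 1, which is why the sketch rejects arrival rates at or above the service rate.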