LG, Oct 30, 2025
MPRU: Modular Projection-Redistribution Unlearning as Output Filter for Classification Pipelines
Minyi Peng, Darian Gunamardi, Ivan Tjuawinata et al.
Machine unlearning (MU) is a new and promising approach to knowledge removal, and existing MU works typically emphasize theoretical formulations or optimization objectives. However, when deployed in real-world scenarios, such solutions typically face scalability issues and impose practical requirements such as full access to the original dataset and model. In contrast to the existing approaches, we regard classification training as a sequential process in which classes are learned one after another, which we call the \emph{inductive approach}. Unlearning can then be done by reversing the last training sequence. This is implemented by appending a projection-redistribution layer at the end of the model. Such an approach does not require full access to the original dataset or the model, addressing the challenges of existing methods. This enables modular and model-agnostic deployment as an output filter in existing classification pipelines with minimal alterations. We conducted experiments across multiple datasets, including image datasets (CIFAR-10/100 using a CNN-based model) and a tabular dataset (Covertype using a tree-based model). Experimental results show output consistently similar to that of a fully retrained model, with a high reduction in computational cost. This demonstrates the applicability, scalability, and system compatibility of our solution while maintaining output performance in a more practical setting.
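To make the idea of an output filter concrete, here is a minimal sketch. It assumes the simplest possible reading of "projection" and "redistribution": the forgotten class's probability is zeroed out and the remaining mass is renormalized. The function name `forget_class_filter` and this renormalization rule are illustrative assumptions, not the layer described in the paper.

```python
def forget_class_filter(probs, forget_idx):
    """Hypothetical output filter applied to one probability vector.

    Assumption: 'projection' drops the forgotten class, and
    'redistribution' renormalizes the remaining probabilities.
    """
    # Projection: remove the probability mass of the forgotten class.
    kept = [0.0 if i == forget_idx else p for i, p in enumerate(probs)]
    total = sum(kept)
    if total == 0.0:
        # All mass sat on the forgotten class: fall back to a uniform
        # distribution over the remaining classes.
        n_remaining = len(probs) - 1
        return [0.0 if i == forget_idx else 1.0 / n_remaining
                for i in range(len(probs))]
    # Redistribution: rescale so the remaining classes sum to one.
    return [p / total for p in kept]


filtered = forget_class_filter([0.5, 0.3, 0.2], forget_idx=0)
```

Because the filter only touches the classifier's output vector, it can be placed behind a CNN or a tree-based model without access to weights or training data, which is the deployment property the abstract emphasizes.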
LG, Apr 9
A Systematic Framework for Tabular Data Disentanglement
Ivan Tjuawinata, Andre Gunawan, Anh Quan Tran et al.
Tabular data, widely used in various applications such as industrial control systems, finance, and supply chain, often contains complex interrelationships among its attributes. Data disentanglement seeks to transform such data into latent variables with reduced interdependencies, facilitating more effective and efficient processing. Despite the extensive studies on data disentanglement for image, text, and audio data, tabular data disentanglement requires further investigation due to the more intricate attribute interactions typically found in tabular data. Moreover, due to these highly complex interrelationships, directly translating methods from other data domains results in suboptimal disentanglement. Existing tabular data disentanglement methods, such as factor analysis, CT-GAN, and VAE, face limitations including scalability issues, mode collapse, and poor extrapolation. In this paper, we propose a framework that provides a systematic view of tabular data disentanglement and modularizes the process into four core components: data extraction, data modeling, model analysis, and latent representation extrapolation. We believe this work provides a deeper understanding of tabular data disentanglement and existing methods, and lays the foundation for future research on robust, efficient, and scalable data disentanglement techniques. Finally, we demonstrate the framework's applicability through a case study on synthetic tabular data generation, showcasing its potential in the downstream task of data synthesis.
CR, Apr 13, 2021
Fair and Differentially Private Distributed Frequency Estimation
Mengmeng Yang, Ivan Tjuawinata, Kwok-Yan Lam et al.
In order to remain competitive, Internet companies collect and analyse user data for the purpose of improving user experiences. Frequency estimation is a widely used statistical tool which could potentially conflict with the relevant privacy regulations. Privacy-preserving analytic methods based on differential privacy have been proposed, but they require either a large user base or a trusted server; hence, they may give big companies an unfair advantage while handicapping smaller organizations in their growth opportunity. To address this issue, this paper proposes a fair privacy-preserving sampling-based frequency estimation method and provides a relation between its privacy guarantee, output accuracy, and number of participants. We design decentralized privacy-preserving aggregation mechanisms using multi-party computation techniques and establish that, for a limited number of participants and a fixed privacy level, our mechanisms perform better than those based on traditional perturbation methods; hence, they provide smaller companies a fair growth opportunity. We further propose an architectural model that supports weighted aggregation in order to cater for users with different privacy requirements. Compared to unweighted aggregation, our method provides a more accurate estimate. Extensive experiments are conducted to show the effectiveness of the proposed methods.
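For context on the "traditional perturbation methods" the abstract compares against, the following sketch shows k-ary randomized response, a standard local differential privacy baseline for frequency estimation. It is not the paper's sampling-based mechanism; the function names and the toy data are assumptions made for illustration.

```python
import math
import random


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else another value uniformly."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates recovered from the perturbed reports."""
    k, n = len(domain), len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of reporting truthfully
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    return {v: (reports.count(v) / n - q) / (p - q) for v in domain}


rng = random.Random(0)  # fixed seed so the toy run is repeatable
domain = ["a", "b", "c"]
true_values = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
reports = [grr_perturb(v, domain, epsilon=1.0, rng=rng) for v in true_values]
estimates = grr_estimate(reports, domain, epsilon=1.0)
```

The variance of these estimates shrinks only as the number of reports grows, which is why purely local perturbation tends to favor organizations with a large user base.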
CR, Jan 4, 2021
Protecting Big Data Privacy Using Randomized Tensor Network Decomposition and Dispersed Tensor Computation
Jenn-Bing Ong, Wee-Keong Ng, Ivan Tjuawinata et al.
Data privacy is an important issue for organizations and enterprises that wish to securely outsource data storage, sharing, and computation to clouds / fogs. However, data encryption is complicated in terms of key management and distribution, and existing secure computation techniques are expensive in terms of computational / communication cost and therefore do not scale to big data computation. Tensor network decomposition and distributed tensor computation have been widely used in signal processing and machine learning for dimensionality reduction and large-scale optimization. However, the potential of distributed tensor networks for big data privacy preservation has not been considered before, which motivates the current study. Our primary intuition is that tensor network representations are mathematically non-unique, unlinkable, and uninterpretable, and that they naturally support a range of multilinear operations for compressed and distributed / dispersed computation. Therefore, we propose randomized algorithms to decompose big data into randomized tensor network representations and analyze the privacy leakage for 1D to 3D data tensors. The randomness mainly comes from the complex structural information commonly found in big data; randomization is based on controlled perturbation applied to the tensor blocks prior to decomposition. The distributed tensor representations are dispersed across multiple clouds / fogs or servers / devices with metadata privacy, which provides both distributed trust and management to seamlessly secure big data storage, communication, sharing, and computation. Experiments show that the proposed randomization techniques are helpful for big data anonymization and efficient for big data storage and computation.
CR, Dec 26, 2020
Secure Hot Path Crowdsourcing with Local Differential Privacy under Fog Computing Architecture
Mengmeng Yang, Ivan Tjuawinata, Kwok Yan Lam et al.
Crowdsourcing plays an essential role in the Internet of Things (IoT) for data collection, where a group of workers is equipped with Internet-connected geolocated devices to collect sensor data for marketing or research purposes. In this paper, we consider crowdsourcing these workers' hot travel paths. Each worker is required to report their real-time location information, which is sensitive and has to be protected. Encryption-based methods are the most direct way to protect the location, but they are not suitable for resource-limited devices. Local differential privacy, on the other hand, is a strong privacy concept and has been deployed in many software systems. However, local differential privacy needs a large number of participants to ensure the accuracy of the estimation, which is not always available in crowdsourcing. To solve this problem, we propose a trie-based iterative statistic method, which combines additive secret sharing and local differential privacy technologies. The proposed method performs well even with a limited number of participants and without the need for complex computation. Specifically, the proposed method contains three main components: iterative statistics, adaptive sampling, and secure reporting. We theoretically analyze the effectiveness of the proposed method and perform extensive experiments to show that it not only provides a strict privacy guarantee, but also significantly improves performance over existing solutions.
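The additive secret sharing building block mentioned above can be sketched as follows. This is a generic illustration of how several aggregators can learn a sum of reports without any single one seeing an individual report; it is not the paper's trie-based protocol, and the modulus, the function names, and the three-party setup are assumptions.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus, large enough for the toy counts below


def share(value, n_parties):
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recover the secret by adding all shares mod MODULUS."""
    return sum(shares) % MODULUS


def secure_sum(values, n_parties):
    """Each party adds up the shares it receives; only the total is opened."""
    per_party = [0] * n_parties
    for v in values:
        for i, s in enumerate(share(v, n_parties)):
            per_party[i] = (per_party[i] + s) % MODULUS
    return reconstruct(per_party)


# Three workers each report a count; three aggregators learn only the sum.
total = secure_sum([3, 5, 7], n_parties=3)
```

Sharing and reconstruction use only random number generation and modular addition, which is consistent with the abstract's point that the approach avoids complex computation on resource-limited devices.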
CR, Jul 24, 2020
MPC-enabled Privacy-Preserving Neural Network Training against Malicious Attack
Ziyao Liu, Ivan Tjuawinata, Chaoping Xing et al.
The application of secure multiparty computation (MPC) in machine learning, especially privacy-preserving neural network training, has attracted tremendous attention from the research community in recent years. MPC enables several data owners to jointly train a neural network while preserving the data privacy of each participant. However, most previous works focus on the semi-honest threat model, which cannot withstand fraudulent messages sent by malicious participants. In this paper, we propose an approach for constructing efficient $n$-party protocols for secure neural network training that can provide security for all honest participants even when a majority of the parties are malicious. Compared to other designs that provide semi-honest security in a dishonest majority setting, our actively secure neural network training incurs affordable efficiency overheads of around 2X and 2.7X in LAN and WAN settings, respectively. In addition, we propose a scheme that allows additive shares defined over an integer ring $\mathbb{Z}_N$ to be securely converted to additive shares over a finite field $\mathbb{Z}_Q$, which may be of independent interest. Such a conversion scheme is essential for securely and correctly converting shared Beaver triples, which are defined over an integer ring and generated in the preprocessing phase, into triples defined over a field for use in the online phase.
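As background on the Beaver triples mentioned above, the sketch below simulates the standard triple-based multiplication of two additively shared values over a prime field. It covers only the semi-honest online step, run in a single process; the paper's active security checks and its ring-to-field share conversion are not shown. The prime `Q`, the helper names, and the trusted-dealer triple generation are assumptions made for illustration.

```python
import secrets

Q = 2**61 - 1  # assumed prime modulus for the field Z_Q


def share(value, n_parties):
    """Additively share value over Z_Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def open_shares(shares):
    """Reveal a shared value by summing its shares mod Q."""
    return sum(shares) % Q


def beaver_multiply(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Compute shares of x*y from shares of x, y and a triple with c = a*b."""
    # The parties open the masked values d = x - a and e = y - b.
    d = open_shares([(x - a) % Q for x, a in zip(x_sh, a_sh)])
    e = open_shares([(y - b) % Q for y, b in zip(y_sh, b_sh)])
    # Local step: z_i = c_i + d*b_i + e*a_i, and one party adds d*e.
    z_sh = [(c + d * b + e * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % Q
    return z_sh


# Preprocessing, simulated here by a trusted dealer (an assumption).
n = 3
a, b = secrets.randbelow(Q), secrets.randbelow(Q)
a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % Q, n)

product = open_shares(beaver_multiply(share(6, n), share(7, n), a_sh, b_sh, c_sh))
```

Correctness follows from the identity $xy = c + d\,b + e\,a + d\,e$ with $d = x - a$ and $e = y - b$, which holds whenever $c = ab$.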