11.5 · CR · May 15
From Backup Restoration to Minimum Viable Factory Recovery: A Systematization of Ransomware Recovery in Manufacturing Systems
Chun Yin Chiu
Ransomware recovery in critical manufacturing infrastructure is not only a backup-restoration problem. Production capability depends on coupled information-technology, operational-technology, physical-process, quality, logistics, identity, and supplier systems. After ransomware, a plant may rebuild servers yet remain unable to schedule work, authenticate operators, trust engineering workstations, release product, reconnect OT assets, or coordinate suppliers. This paper reframes manufacturing ransomware recovery as a critical-infrastructure continuity and interdependency problem. We conduct a PRISMA-guided multivocal review of academic literature, standards and government guidance, threat frameworks, public incident material, and verified full-text/source-page evidence anchors. The review identifies nine evidence-backed recovery failure modes: dependency blindness, untrusted restore point and backup over-trust, identity trust collapse, lack of proof-of-recovery, unsafe OT reconnection, segmentation assumption failure, capability mismatch, unmanaged degraded operation, and supplier dependency failure. We then introduce Minimum Viable Factory Recovery (MVF Recovery): the smallest safe, trusted, and operationally meaningful production capability that can be resumed under current dependency, evidence, identity, data, network, OT, and supplier constraints. MVF Recovery is an analytical objective rather than a claim of full recovery, implementation, or safety certification. The paper derives a recovery lifecycle and benchmarking directions as secondary outputs. The contribution is an evidence-calibrated foundation for capability-centric ransomware recovery in critical manufacturing infrastructure.
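MVF Recovery depends on knowing which production capabilities are backed by dependencies that are both restored and trusted. The Python sketch below is hypothetical and is not the paper's method. It makes three illustrative assumptions: capability and system names, a dependency map, and a rule that a capability counts as resumable only when every transitive dependency is in a trusted set.

```python
from typing import Dict, Set


def resumable_capabilities(
    capabilities: Set[str],
    depends_on: Dict[str, Set[str]],
    trusted: Set[str],
) -> Set[str]:
    """Return the capabilities whose whole dependency chain is trusted.

    Illustrative rule (an assumption): a capability node only needs its
    dependencies to be usable; every other node (server, identity service,
    PLC, supplier link) must itself be in `trusted` as well.
    Dependency cycles are treated as not recoverable.
    """
    memo: Dict[str, bool] = {}

    def usable(node: str, path: Set[str]) -> bool:
        if node in memo:
            return memo[node]
        if node in path:  # cycle detected on the current path
            return False
        self_ok = node in capabilities or node in trusted
        path.add(node)
        result = self_ok and all(
            usable(dep, path) for dep in depends_on.get(node, set())
        )
        path.discard(node)
        memo[node] = result
        return result

    return {cap for cap in capabilities if usable(cap, set())}


# Hypothetical plant: two production lines sharing one MES.
depends_on = {
    "line_A_production": {"mes", "plc_line_A"},
    "line_B_production": {"mes", "plc_line_B", "supplier_portal"},
    "mes": {"active_directory", "mes_db"},
}
trusted = {"active_directory", "mes_db", "mes", "plc_line_A", "plc_line_B"}
capabilities = {"line_A_production", "line_B_production"}

print(resumable_capabilities(capabilities, depends_on, trusted))
```

In this toy plant, line B stays down because the supplier portal is not yet trusted. This loosely mirrors the supplier dependency failure mode, even though the servers behind line B have been rebuilt.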
4.3 · CR · May 7
AoI-Guided Client Selection for Robust and Timely Federated Intrusion Detection in Cloud-Edge Security Analytics
Chun Yin Chiu
Federated learning (FL) is attractive for cloud-edge intrusion detection because it enables collaborative training over distributed telemetry without centralizing raw logs. In production security analytics pipelines, however, only a subset of clients participates in each round, and heterogeneous bandwidth, stragglers, and dropouts can cause the server to rely on stale client information. This paper studies client participation as a timeliness-aware systems problem using Age of Information (AoI). We compare three lightweight policies for federated intrusion detection: AoI-first, utility-first, and a hybrid AoI+utility rule with a tunable trade-off parameter. Across a CIC-IDS2017 DDoS/PortScan mini subset, NSL-KDD, ToN-IoT, and a synthetic drift benchmark under clean, poisoning, and poisoning-plus-robust-aggregation settings, AoI-aware selection reduces average AoI by about 39–41% and peak AoI by about 70% relative to random sampling while keeping the per-round communication budget fixed. The hybrid policy usually preserves Macro-F1/AUC and provides an interpretable knob for balancing freshness, detection quality, and robustness, although it is not uniformly Pareto-dominant once false positive rate is included. Robustness is evaluated by combining AoI-guided selection with trimmed-mean aggregation under label-flip poisoning; the selection policy itself is not intended as a standalone Byzantine defense. The main practical message is that cloud-edge, privacy-preserving intrusion analytics can improve timeliness through a lightweight scheduling layer without changing the underlying FL participation budget.
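The three selection policies can be seen as one scoring rule with a trade-off parameter. This is a minimal sketch with several assumptions. AoI is counted in rounds since a client last contributed. The scores are max-normalized. The utility value is a placeholder (for example, a recent training loss). The parameter `lam` is a hypothetical name for the tunable knob. The per-round budget `m` stays fixed regardless of policy.

```python
from typing import Dict, List


def _normalize(values: Dict[int, float]) -> Dict[int, float]:
    """Scale non-negative scores into [0, 1] by dividing by the maximum."""
    top = max(values.values())
    return {c: (v / top if top > 0 else 0.0) for c, v in values.items()}


def select_clients(
    aoi: Dict[int, int], utility: Dict[int, float], m: int, lam: float
) -> List[int]:
    """Pick m clients for this round.

    lam = 1.0 reduces to AoI-first, lam = 0.0 to utility-first, and values
    in between give a hybrid AoI+utility rule. Ties are broken by client id.
    """
    a = _normalize({c: float(v) for c, v in aoi.items()})
    u = _normalize(utility)
    score = {c: lam * a[c] + (1.0 - lam) * u[c] for c in aoi}
    return sorted(aoi, key=lambda c: (-score[c], c))[:m]


def update_aoi(aoi: Dict[int, int], selected: List[int]) -> Dict[int, int]:
    """Selected clients deliver fresh updates (age 0); all others age by one round."""
    chosen = set(selected)
    return {c: 0 if c in chosen else age + 1 for c, age in aoi.items()}


# Toy run: 6 clients, budget m = 2, pure AoI-first selection.
aoi = {c: 0 for c in range(6)}
utility = {c: 1.0 for c in range(6)}
history = []
for _ in range(3):
    picked = select_clients(aoi, utility, m=2, lam=1.0)
    history.append(picked)
    aoi = update_aoi(aoi, picked)
print(history)  # [[0, 1], [2, 3], [4, 5]]
```

With `lam = 1.0` the rule cycles through stale clients and never exceeds the budget of two per round. Lowering `lam` lets high-utility clients jump the queue at the cost of freshness.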
1.4 · QM · May 7
An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data
Wing Yi Yu, Chun Yin Chiu
Clinical dietary assessment can generate detailed but high-dimensional nutrient and food-group information that is difficult to translate quickly into counselling priorities. This paper proposes an explainable unsupervised-to-supervised machine learning framework for discovering, reproducing and interpreting dietary patterns using public UK National Diet and Nutrition Survey data. Adult participants aged 19 years and above from NDNS Years 12-15 were represented using 25 energy-adjusted nutrient and food-group features. K-means, Gaussian Mixture Models and Agglomerative Clustering were compared across k = 2-8, with stability and dietetic interpretability used alongside internal validation metrics. The selected K-means k = 4 solution identified four interpretable dietary patterns: high fat/meat and sodium, higher fibre fruit-vegetable micronutrient, high free-sugar snacks and sugary drinks, and dairy/cereal calcium-rich saturated-fat. A supervised surrogate classifier reproduced held-out cluster membership with high test performance (macro-F1 = 0.963), but was interpreted only as an explanatory surrogate rather than as an independent clinical prediction model. SHAP analysis linked predictions to dietetically meaningful drivers, suggesting potential value for dietitian-in-the-loop assessment, counselling prioritisation and follow-up monitoring.
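The unsupervised step can be illustrated with plain Lloyd's K-means in pure Python. This is a minimal sketch on a toy 2-D dataset, not the authors' pipeline. It uses a naive first-k initialisation rather than k-means++ or repeated restarts. It assumes the features are already energy-adjusted and scaled, and it omits the model-selection loop over k = 2–8. The resulting cluster labels are what a supervised surrogate classifier would then be trained to reproduce.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, max_iter: int = 100
) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's algorithm with a naive 'first k points' initialisation."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy data with two obvious groups (stand-ins for dietary feature vectors).
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(pts, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```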
2.2 · CR · May 5
Lightweight Vulnerability Detection from Code Metrics and Token Features
Chun Yin Chiu
Vulnerability detection for C/C++ code increasingly relies on heavy representations such as code graphs and deep models, while many practical workflows still benefit from fast and reproducible ranking baselines for human triage. This preprint studies a lightweight function-level vulnerability triage pipeline that combines sparse token n-grams from raw function text with a small set of inexpensive code metrics, including NLOC, approximate cyclomatic complexity, token count, maximum brace depth, and parameter count. We use TF-IDF token features and a class-weighted logistic regression classifier, avoiding deep learning, transformers, and program graphs. Using the Devign function-level labels, we evaluate random and cross-project settings, including an FFmpeg-to-QEMU transfer experiment. We emphasize precision-recall AUC and Recall@10% as ranking-oriented metrics for skewed or triage-oriented workloads. On the random split, the best combined variant reaches PR-AUC 0.642 and Recall@10% 0.161, while cross-project generalization is substantially harder, with PR-AUC around 0.436. We further report ablations, test-only identifier-renaming robustness, and end-to-end efficiency. The results suggest that simple token and metric features provide a useful transparent baseline, but also expose sensitivity to superficial lexical cues and limited cross-project transfer.
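Two of the simpler pieces can be sketched in a few lines of Python: an approximate maximum-brace-depth metric and Recall@k%. Both encode illustrative assumptions rather than the preprint's exact definitions. The inspection budget is rounded down with a minimum of one function, and braces are counted without parsing strings or comments. The TF-IDF and logistic-regression stages are omitted.

```python
from typing import Sequence


def max_brace_depth(code: str) -> int:
    """Approximate nesting depth by counting braces (ignores strings and comments)."""
    depth = deepest = 0
    for ch in code:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(0, depth - 1)
    return deepest


def recall_at_percent(
    scores: Sequence[float], labels: Sequence[int], percent: int = 10
) -> float:
    """Fraction of all vulnerable functions found in the top `percent`% of the ranking."""
    n = len(scores)
    positives = sum(labels)
    if n == 0 or positives == 0:
        return 0.0
    budget = max(1, (n * percent) // 100)  # inspection budget, rounded down, at least 1
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:budget]) / positives


# Hypothetical classifier scores for 10 functions, 3 of which are vulnerable (label 1).
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05, 0.4, 0.6, 0.7, 0.15]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(recall_at_percent(scores, labels, percent=10))
print(max_brace_depth("int f(int a){ if(a){ return 1; } return 0; }"))
```

Integer arithmetic for the budget avoids floating-point rounding surprises when computing 10% of the function count.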
2.8 · CR · May 5
Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing
Chun Yin Chiu
Blockchain-based IoT data sharing systems increasingly adopt a hybrid architecture in which a permissioned ledger stores tamper-evident metadata while encrypted payloads are placed in content-addressed storage. In such systems, a central security bottleneck is key access control: enforcing dynamic, multi-user authorization for releasing or using bulk-data decryption keys. Existing designs often rely on always-online RBAC or smart-contract gates that return keys to authorized users, reintroducing a trusted online policy enforcement point and weakening auditability. This paper presents a revocation-ready key management layer that replaces online key release with ciphertext key publication: the ledger records metadata of the form (CID, CK, PolicyID, epoch), where CK is a CP-ABE ciphertext encapsulating an AES-GCM key. Users retrieve CK from the ledger and decrypt locally if their attributes satisfy the policy. To support forward revocation and policy evolution without re-encrypting large files, the design introduces an epoch/time-bound attribute and a lightweight CK-rotation protocol that updates only small ciphertext keys and ledger entries. We implement a minimal end-to-end prototype using a local content-addressed store, a hash-chained ledger, and a CP-ABE backend, with the goal of isolating key-management costs rather than benchmarking production blockchain throughput. Experiments on a commodity MacBook show that CP-ABE encryption dominates store latency, with approximately 186 ms for a k=6 mixed-Boolean policy, while ledger and storage operations remain around 1-2 ms. Epoch-based revocation amortizes key update cost under churn, gateway-assisted mode reduces median client-side decryption time by more than 4x under a simulated 4x client slow-down, and ledger growth scales with the number of shared assets rather than the number of readers.
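A hash-chained ledger of (CID, CK, PolicyID, epoch) records, plus an epoch-bumping CK rotation, can be sketched with the standard library alone. This is a hypothetical illustration, not the paper's prototype. CP-ABE is not implemented, and the CK strings stand in for ciphertexts that a real CP-ABE library would produce. A plain SHA-256 hex digest stands in for a real content identifier. The class and function names (`Entry`, `Ledger`, `rotate_ck`) are invented for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    cid: str        # content identifier of the encrypted payload
    ck: str         # placeholder for the CP-ABE ciphertext wrapping the AES-GCM key
    policy_id: str
    epoch: int
    prev_hash: str  # digest of the previous ledger entry

    def digest(self) -> str:
        body = json.dumps([self.cid, self.ck, self.policy_id, self.epoch, self.prev_hash])
        return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def append(self, cid: str, ck: str, policy_id: str, epoch: int) -> Entry:
        prev = self.entries[-1].digest() if self.entries else self.GENESIS
        entry = Entry(cid, ck, policy_id, epoch, prev)
        self.entries.append(entry)
        return entry

    def latest(self, cid: str) -> Optional[Entry]:
        for entry in reversed(self.entries):
            if entry.cid == cid:
                return entry
        return None

    def verify(self) -> bool:
        """Recompute the hash chain; any edited entry breaks the next link."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.digest()
        return True


def content_id(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def rotate_ck(ledger: Ledger, cid: str, new_ck: str) -> Entry:
    """Publish a fresh CK for the next epoch; the bulk ciphertext and its CID are untouched."""
    current = ledger.latest(cid)
    if current is None:
        raise KeyError(cid)
    return ledger.append(cid, new_ck, current.policy_id, current.epoch + 1)


ledger = Ledger()
cid = content_id(b"encrypted sensor batch")
ledger.append(cid, "ck-epoch-0", "policy-1", 0)
rotate_ck(ledger, cid, "ck-epoch-1")
print(ledger.latest(cid).epoch, ledger.verify())
```

Rotation appends one small record per asset and epoch rather than one per reader. This is consistent with the abstract's observation that ledger growth tracks the number of shared assets.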
22.3 · CR · May 5
Towards a Zero-Trust Supply-Chain Assurance Rubric for ORAN RIC Applications
Chun Yin Chiu
Open RAN enables third-party xApps and rApps to be onboarded and updated at operational cadence, creating a software supply chain that spans developers, CI systems, registries, onboarding pipelines, and runtime enforcement points. This preprint proposes a zero-trust supply-chain assurance rubric for O-RAN RIC applications. It makes three contributions: first, an app-centric lifecycle threat model for RIC applications across build, signing, publication, onboarding, runtime, and update or rollback stages; second, a WG11-aligned threat-control-evidence mapping that relates lifecycle threats to O-RAN security baselines and complementary supply-chain evidence; and third, an operator-facing assurance profile that combines secure software development practices, SBOM transparency, and SLSA-style provenance into incremental onboarding levels. Analytical case-study walkthroughs and a minimal evidence-checking workflow illustrate how the rubric can support explicit Accept, Escalate, or Block decisions during RIC app onboarding. The evaluation is intended to assess applicability rather than deployment-scale performance; empirical measurements of operational overhead, decision consistency, and detection coverage are left for future work.
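An Accept, Escalate, or Block outcome can be represented as a small decision function over collected evidence. The evidence fields, the rule ordering, and the treatment of an SLSA-style provenance level as an integer threshold are all hypothetical simplifications chosen for illustration. They are not the rubric's actual criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "Accept"
    ESCALATE = "Escalate"
    BLOCK = "Block"


@dataclass
class Evidence:
    # Hypothetical evidence gathered during RIC app onboarding.
    signature_valid: bool
    sbom_present: bool
    provenance_level: int  # SLSA-style build level; 0 if no provenance is available


def onboarding_decision(ev: Evidence, required_level: int) -> Decision:
    """Illustrative ordering: integrity failures block; missing assurance escalates."""
    if not ev.signature_valid:
        return Decision.BLOCK
    if not ev.sbom_present or ev.provenance_level < required_level:
        return Decision.ESCALATE
    return Decision.ACCEPT


# An operator onboarding level that (hypothetically) requires provenance level 2.
print(onboarding_decision(Evidence(True, True, 2), required_level=2).value)
```

Raising `required_level` mimics moving an operator to a stricter onboarding level without changing the decision logic itself.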
NI · May 26, 2021
Gamers Private Network Performance Forecasting. From Raw Data to the Data Warehouse with Machine Learning and Neural Nets
Albert Wong, Chun Yin Chiu, Gaétan Hains et al.
Gamers Private Network (GPN) is a client/server technology that guarantees online video games a connection that is more reliable and has lower latency than a standard internet connection. GPN users benefit from a stable, high-quality experience in online games that are hosted and played across the world. After transforming a large volume of raw networking data collected by WTFast, we structured the cleaned data into a special-purpose data warehouse and carried out an extensive analysis using machine learning, neural networks, and business intelligence tools. The analyses show that changes in network performance can be predicted and quantified, and they demonstrate the benefits users gain from a GPN when connected to an online game session.