Duan Wu

2 Papers

42.7 HC Mar 21
A 4R-supported circular product-service system for luxury branded events

Ke Ma, Francesca Valsecchi, Yuchen Tan et al.

Temporary luxury branded events run on short cycles and bespoke builds that accelerate material churn. We present a circular phygital product-service system (PSS) that operationalises the circular economy (CE) through a 4R frame (Refuse, Reduce, Reuse, and Recycle) across warehouse-to-event journeys. Developed via a multi-method design inquiry with a tier-1 contractor, the system couples physical touchpoints (reusable fold-flat transit boxes, adjustable racking, standard labels) with digital orchestration (a live digital warehouse, list-based outbound/inbound workflow, and a sustainable materials library). The architecture aligns roles and decisions, protects and identifies assets, and makes reuse the default under luxury brand constraints. By embedding traceable actions and CE-aligned rules into everyday handoffs, the PSS shifts procurement, storage, dispatch, return, and redeployment toward value retention. The contribution is a replicable, practice-ready route from circular intent to operational change in branded environments, advancing responsible retail without compromising speed or aesthetic standards.

CL Jan 26
BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation

Peng Sun, Xiangyu Zhang, Duan Wu

Accurate evaluation of user satisfaction is critical for iterative development of conversational AI. However, for open-ended assistants, traditional A/B testing lacks reliable metrics: explicit feedback is sparse, while implicit metrics are ambiguous. To bridge this gap, we introduce BoRP (Bootstrapped Regression Probing), a scalable framework for high-fidelity satisfaction evaluation. Unlike generative approaches, BoRP leverages the geometric properties of LLM latent space. It employs a polarization-index-based bootstrapping mechanism to automate rubric generation and utilizes Partial Least Squares (PLS) to map hidden states to continuous scores. Experiments on industrial datasets show that BoRP (Qwen3-8B/14B) significantly outperforms generative baselines (even Qwen3-Max) in alignment with human judgments. Furthermore, BoRP reduces inference costs by orders of magnitude, enabling full-scale monitoring and highly sensitive A/B testing via CUPED.
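
The abstract names CUPED as the route to more sensitive A/B tests but does not say how it is applied. As a rough illustration only, the sketch below shows the standard CUPED adjustment, which subtracts the part of a metric that a pre-experiment covariate already explains. The function name `cuped_adjust` and the choice of a per-user pre-period satisfaction score as the covariate are assumptions for this example, not details taken from the paper.

```python
from statistics import fmean, pvariance


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    metric:    in-experiment values, e.g. per-user satisfaction scores (assumed)
    covariate: pre-experiment values for the same users (assumed)
    """
    if len(metric) != len(covariate):
        raise ValueError("metric and covariate must have the same length")

    mean_x = fmean(covariate)
    mean_y = fmean(metric)
    var_x = pvariance(covariate, mu=mean_x)
    if var_x == 0:
        # A constant covariate explains nothing; leave the metric unchanged.
        return list(metric)

    # theta = cov(x, y) / var(x) minimises the variance of the adjusted metric.
    cov_xy = fmean([(x - mean_x) * (y - mean_y)
                    for x, y in zip(covariate, metric)])
    theta = cov_xy / var_x

    # Centering x keeps the mean of the adjusted metric equal to the original.
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]
```

The adjusted values keep the same mean as the raw metric while their variance shrinks in proportion to the squared correlation between metric and covariate, which is what lets a test detect smaller differences at the same sample size. In practice theta is usually estimated once on the pooled treatment and control data and then applied to both arms.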