Kaiming Shen

h-index: 18

Papers: 8

Citations: 281

Novelty: 57%

AI Score: 52

Ranked #35,458 of 201,326 authors (top 18%); #315 in IR (top 14%)

8 Papers

48.2 · IT · Jun 3

Bounded Deep Unfolding for Joint Beamforming and Scheduling in Multi-Cell MIMO Networks

Jiansheng Li, Shuqi Chai, Fan Xu et al.

This paper investigates the joint resource block group (RBG) scheduling and beamforming optimization problem for weighted sum-rate (WSR) maximization in multi-cell multiple-input multiple-output (MIMO) downlink networks. While the Fast Fractional Programming (FastFP) framework provides a reliable model-driven solution, it suffers from conservative continuous beamforming updates and prohibitive computational overhead during the discrete RBG matching phase. To address these bottlenecks, we propose a joint deep unfolding framework comprising two core modules: P-Net and K-Net. For continuous beamforming, P-Net learns an adaptive relaxation factor along the analytical FastFP update direction. By strictly constraining this factor within an ascent-preserving interval, P-Net accelerates the optimization trajectory while rigorously retaining monotonic improvement and stationary-point convergence guarantees. For discrete RBG scheduling, K-Net learns a long-horizon priority policy that guides a low-complexity greedy assignment, effectively preserving the assignment quality while bypassing the high complexity of Hungarian matching. Both networks leverage analytical algorithmic priors and utilize recurrent parameter sharing, enabling flexible inference beyond the training horizon. Extensive simulations demonstrate that the proposed joint framework achieves higher WSR and faster execution times than conventional model-driven baselines, while generalizing robustly across unseen network scales, antenna configurations, and channel conditions without retraining.
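
The greedy assignment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' K-Net: the `priority` matrix below is a hypothetical stand-in for whatever scores a learned policy would produce, and the one-to-one user/RBG pairing is an assumption made for the sake of the example.

```python
def greedy_assign(priority):
    """Greedy one-to-one assignment of users to RBGs.

    priority[u][r] is a hypothetical score for giving RBG r to user u.
    The highest-scoring pair whose user and RBG are both still free is
    taken first, then the next, and so on. Sorting dominates the cost.
    """
    pairs = sorted(
        ((score, u, r)
         for u, row in enumerate(priority)
         for r, score in enumerate(row)),
        key=lambda t: -t[0],
    )
    used_users, used_rbgs = set(), set()
    assignment = {}  # maps RBG index -> user index
    for _, u, r in pairs:
        if u not in used_users and r not in used_rbgs:
            assignment[r] = u
            used_users.add(u)
            used_rbgs.add(r)
    return assignment


# Illustrative scores only (assumed values, not data from the paper).
priority = [[0.9, 0.8],
            [0.85, 0.1]]
print(greedy_assign(priority))  # {0: 0, 1: 1}
```

With these raw scores the greedy result totals 0.9 + 0.1 = 1.0, while the optimal matching (user 0 on RBG 1, user 1 on RBG 0) totals 1.65. That gap is why the quality of the priority scores matters when a greedy pass replaces an exact matching.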

IR · Sep 16, 2023

An Unified Search and Recommendation Foundation Model for Cold-Start Scenario

Yuqi Gong, Xichen Ding, Yehui Su et al.

In modern commercial search engines and recommendation systems, data from multiple domains is available to jointly train the multi-domain model. Traditional methods train multi-domain models in the multi-task setting, with shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of features, labels, and sample distributions of individual tasks. With the development of large language models (LLMs), an LLM can extract global domain-invariant text features that serve both search and recommendation tasks. We propose a novel framework called S&R Multi-Domain Foundation, which uses an LLM to extract domain-invariant features, and Aspect Gating Fusion to merge the ID feature, domain-invariant text features, and task-specific heterogeneous sparse features to obtain the representations of query and item. Additionally, samples from multiple search and recommendation scenarios are trained jointly with a Domain Adaptive Multi-Task module to obtain the multi-domain foundation model. We apply the S&R Multi-Domain Foundation model to cold-start scenarios in a pretrain-finetune manner, which achieves better performance than other SOTA transfer learning methods. The S&R Multi-Domain Foundation model has been successfully deployed in the Alipay Mobile Application's online services, such as content query recommendation and service card recommendation.

95.3 · IT · May 24

Eliminating Blind Spots from Wireless Network by Metasurface: A Blind Approach

Wenhai Lai, Mingxiao Li, Kaiming Shen et al.

Deploying metasurfaces (MTSs) to eliminate wireless blind spots requires jointly determining the physical placement of MTSs and the meta-atom phase shifts. Existing methods typically rely on explicit channel estimation, which incurs prohibitive overhead and is often intractable in real-world networks. To sidestep this bottleneck, we propose RFZero, a channel-state-information (CSI)-free deployment paradigm. Instead of estimating channels, RFZero extracts macro-environmental features from visual photos to guide MTS placement, and leverages reference signal received power (RSRP) feedback for dynamic phase-shift optimization. Most importantly, RFZero operates independently of base stations, thereby enabling seamless plug-and-play implementation. Real-world field tests confirm that RFZero completely eliminates all blind spots in a $100\text{ m}^2$ indoor area using just a pair of $1.5\text{ m}\times 0.9\text{ m}$ MTSs.

CL · Apr 3, 2023

Crossword: A Semantic Approach to Data Compression via Masking

Mingxiao Li, Rui Jin, Liyao Xiang et al.

Traditional methods for data compression are typically based on symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental limit as entropy for lossless compression and as mutual information for lossy compression. However, real-world sources (including text, music, and speech) are often statistically ill-defined because of their close connection to human perception, and thus the model-driven approach can be quite suboptimal. This study focuses on English text and exploits its semantic aspect to further enhance compression efficiency. The main idea stems from the crossword puzzle, observing that hidden words can still be precisely reconstructed so long as some key letters are provided. The proposed masking-based strategy resembles this game. In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer. Our experiments show that the proposed semantic approach can achieve much higher compression efficiency than traditional methods such as Huffman code and UTF-8 code, while preserving the meaning in the target text to a great extent.
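
The encoder side of the masking idea can be sketched in a few lines. The importance scores here are hand-picked placeholders: in the paper they come from a semantic loss, which this sketch does not compute, and the Transformer-based decoder is not shown. The function name, `keep_ratio`, and `mask_token` are all assumptions for illustration.

```python
def mask_minor_words(words, importance, keep_ratio=0.5, mask_token="_"):
    """Keep the most important words and replace the rest with a mask token.

    importance[i] is an assumed, precomputed score for words[i].
    keep_ratio is the fraction of words transmitted unmasked.
    """
    n_keep = max(1, round(len(words) * keep_ratio))
    # Indices sorted from most to least important.
    ranked = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [w if i in keep else mask_token for i, w in enumerate(words)]


words = ["the", "cat", "sat", "on", "the", "mat"]
importance = [0.1, 0.9, 0.7, 0.2, 0.1, 0.8]  # placeholder scores
print(mask_minor_words(words, importance))
# ['_', 'cat', 'sat', '_', '_', 'mat']
```

A decoder would then try to fill each `_` from context. How much can be masked before the meaning degrades depends on the quality of both the importance scores and the recovery model.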

IR · Jul 15, 2024

SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation

Kaiming Shen, Xichen Ding, Zixiang Zheng et al.

The modeling of users' behaviors is crucial in modern recommendation systems. A lot of research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items from the historical sequence. However, training on lifelong sequences in click-through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult due to insufficient learning of ID embeddings, especially when the IDs in the lifelong sequence features do not appear in the samples of the training dataset. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence well. The distributions of the multi-modal embeddings (text, image, and attributes) of a user's interacted items are not properly aligned, and there is divergence across modalities. We also observe that users' search query sequences and item browsing sequences can fully depict users' intents and benefit from each other. To address these challenges, we propose a unified lifelong multi-modal sequence model called SEMINAR (Search Enhanced Multi-Modal Interest Network and Approximate Retrieval). Specifically, a network called Pretraining Search Unit (PSU) learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner with multiple objectives: multi-modal alignment, next query-item pair prediction, query-item relevance prediction, etc. After pretraining, the downstream model restores the pretrained embedding as initialization and finetunes the network. To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy to approximate the exact attention calculati

IR · Feb 5, 2024

Denoising Time Cycle Modeling for Recommendation

Sicong Xie, Qunwei Li, Weidi Xu et al.

Recently, modeling temporal patterns of user-item interactions has attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noise, which limits the performance of target-related time cycle modeling and degrades recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach to denoise user behaviors and select the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over state-of-the-art recommendation methods.

54.9 · IT · Mar 24

Ellipsoidal Manifold Optimization for Distributed Antenna Beamforming

Minhao Zhu, Kaiming Shen

This paper addresses the weighted sum-rate (WSR) maximization problem in a downlink distributed antenna system subject to per-cluster power constraints. This optimization scenario presents significant challenges due to the high dimensionality of beamforming variables in dense antenna deployments and the structural complexity of multiple independent power constraints. To overcome these difficulties, we generalize the low-dimensional subspace property--previously established for sum-power constraints--to the per-cluster power constraint case. We prove that all stationary-point beamformers reside in a reduced subspace spanned by the channel vectors of the corresponding antenna cluster. Leveraging this property, we reformulate the original high-dimensional constrained problem into an unconstrained optimization task over a product of ellipsoidal manifolds, thereby achieving significant dimensionality reduction. We systematically derive the necessary Riemannian geometric structures for this specific manifold, including the tangent space, Riemannian metric, orthogonal projection, retraction, and vector transport. Subsequently, we develop a tailored Riemannian conjugate gradient algorithm to solve the reformulated problem. Numerical simulations demonstrate that the proposed algorithm achieves the same local optima as standard benchmarks, such as the weighted minimum mean square error (WMMSE) method and conventional manifold optimization, but with substantially higher computational efficiency and scalability, particularly as the number of antenna clusters increases.

SP · Aug 4, 2018

Spatial Deep Learning for Wireless Scheduling

Wei Cui, Kaiming Shen, Wei Yu

The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; furthermore, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass the channel estimation and to schedule links efficiently based solely on the geographic locations of the transmitters and the receivers, due to the fact that in many propagation environments, the wireless channel strength is largely a function of the distance dependent path-loss. This is accomplished by unsupervised training over randomly deployed networks, and by using a novel neural network architecture that computes the geographic spatial convolutions of the interfering or interfered neighboring nodes along with subsequent multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Moreover, to provide fairness, this paper proposes a novel scheduling approach that utilizes the sum-rate optimal scheduling algorithm over judiciously chosen subsets of links for maximizing a proportional fairness objective over the network. The proposed approach shows highly competitive and generalizable network utility maximization results.
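
To make the location-only premise concrete, the sketch below evaluates the sum rate of a link schedule from transmitter and receiver coordinates alone, assuming a pure path-loss channel with gain d^(-alpha). It is not the paper's neural network: the exhaustive search is only a tiny-scale reference, and the function names, `alpha`, `noise`, and `power` values are illustrative assumptions.

```python
import itertools
import math


def sum_rate(tx, rx, schedule, alpha=3.0, noise=1e-9, power=1.0):
    """Sum rate in bits/s/Hz of the active links under path-loss-only gains.

    tx[i] and rx[i] are (x, y) positions of link i; schedule[i] is 1 if
    link i transmits. The channel gain between two points is d ** -alpha.
    """
    total = 0.0
    for i, active in enumerate(schedule):
        if not active:
            continue
        signal = power * math.dist(tx[i], rx[i]) ** -alpha
        interference = sum(
            power * math.dist(tx[j], rx[i]) ** -alpha
            for j, other in enumerate(schedule)
            if other and j != i
        )
        total += math.log2(1.0 + signal / (noise + interference))
    return total


def best_schedule(tx, rx):
    """Exhaustive search over all 2^n schedules; feasible only for small n."""
    return max(
        itertools.product((0, 1), repeat=len(tx)),
        key=lambda s: sum_rate(tx, rx, s),
    )


# Two links whose transmitters sit next to each other's receivers.
tx = [(0.0, 0.0), (10.0, 1.0)]
rx = [(10.0, 0.0), (0.0, 1.0)]
print(sum_rate(tx, rx, (1, 1)), sum_rate(tx, rx, (1, 0)))
print(best_schedule(tx, rx))
```

In this layout, activating both links causes strong mutual interference, so the search activates only one of them. The exponential cost of such a search is one reason to look for a learned scheduler that maps locations directly to a schedule.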