CE · May 18
IterSIMP-σ: Evaluating LLM-Assisted Spatial Interventions in Stress-Aware Topology Optimization
Shaoliang Yang, Jun Wang, Yunsheng Wang
This paper studies whether multimodal large language models (LLMs) can serve as inspectable spatial proposal modules for stress-aware topology optimization. IterSIMP-σ keeps the SIMP optimizer as a compliance-minimizing finite-element solver and places a deterministic stress pass, gate evaluator, and hybrid LLM/rule interpreter around it. After each solve, density and von Mises stress fields are rendered; the interpreter proposes ranked spatial interventions; and deterministic safeguards accept, reject, or stop each action. The main action is a soft density seed, where selected elements are initialized at elevated density before the next solve but remain free under the optimality-criteria update. We evaluate the loop on a 16-problem 2D controller-policy benchmark, a six-problem exploratory 3D extension, passive-solid and input ablations, stress-threshold sensitivity, and a fixed-volume attribution study comparing LLM proposals with deterministic max-stress hotspot seeding, random stress-region seeding, and rule-based control. The 2D controller-policy benchmark shows a small retained-compliance difference (1.9% lower geometric mean for the soft-seed LLM), but this diagnostic is not statistically significant (W = 33, two-sided p = 0.382) and is not a fixed-volume feasible-final comparison. In the fixed-volume study, the LLM condition completed 44/48 attempted evaluations; 25/44 completed evaluations produced all-gate-passing retained states. Feasible-final scoring against rule-based control is split 4/4/1, and deterministic exact-hotspot seeding remains competitive. Accepted LLM spatial actions with per-step records have mean normalized seed-to-hotspot distance 0.221. The results support IterSIMP-σ as an inspectable LLM-assisted design-automation framework for spatial interventions, not yet as evidence that LLM visual reasoning improves stress-constrained optimization.
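The soft-seed action and the seed-to-hotspot metric can be sketched in a few lines of NumPy. This sketch makes several assumptions: a 2D grid of element densities, a seed value of 0.8, and the standard optimality-criteria (OC) update with bisection on the volume multiplier. The names `soft_seed`, `oc_update`, and `normalized_seed_distance` are hypothetical, and the distance normalization (seed centroid to the single max-stress element, divided by the grid diagonal) is a guess, since the abstract does not define it. The sketch shows that seeded elements start higher but remain ordinary design variables, so the OC step can still lower them.

```python
import numpy as np

def von_mises_plane_stress(sx, sy, txy):
    """Plane-stress von Mises equivalent stress."""
    return np.sqrt(sx**2 - sx * sy + sy**2 + 3.0 * txy**2)

def soft_seed(rho, seed_mask, seed_value=0.8):
    """Hypothetical soft seed: raise seeded densities to at least seed_value.
    These are only initial values; the elements stay free design variables."""
    seeded = rho.copy()
    seeded[seed_mask] = np.maximum(seeded[seed_mask], seed_value)
    return seeded

def oc_update(rho, dc, dv, volfrac, move=0.2, rho_min=1e-3):
    """Standard OC update; bisection finds the multiplier meeting the volume target."""
    lo, hi = 1e-9, 1e9
    while (hi - lo) / (hi + lo) > 1e-4:
        mid = 0.5 * (lo + hi)
        cand = rho * np.sqrt(-dc / (dv * mid))  # dc < 0 for compliance
        new = np.clip(cand, np.maximum(rho_min, rho - move), np.minimum(1.0, rho + move))
        if new.mean() > volfrac:
            lo = mid  # too much material: raise the multiplier
        else:
            hi = mid
    return new

def normalized_seed_distance(seed_mask, stress):
    """Assumed metric: seed-centroid to max-stress-element distance / grid diagonal."""
    hot = np.array(np.unravel_index(np.argmax(stress), stress.shape), dtype=float)
    centroid = np.argwhere(seed_mask).mean(axis=0)
    return float(np.linalg.norm(centroid - hot) / np.hypot(*stress.shape))
```

With a uniform sensitivity field, the OC step scales every element by the same factor, so the seeded 2×2 patch drops slightly below 0.8 while the mean density returns to the volume target.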
AI · Jul 29, 2024
Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation
Yunsheng Wang, Songhao Chen, Kevin Jin
Knowledge graphs (KGs) are essential in applications such as network alignment, question answering, and recommender systems (RSs) because they offer structured relational data that facilitate the inference of indirect relationships. However, developing KG-based RSs that can process user inputs in natural language faces significant challenges. First, the natural language processing units must handle the ambiguity and variability of human language to interpret user intents accurately. Second, the system must precisely identify entities, such as product names, and link them to their corresponding nodes in the KG. To overcome these challenges, and with support from Lenovo, we developed a novel chatbot called "Prometheus," which integrates a KG with a large language model (LLM) and is specifically designed for recommending computer components. The chatbot accurately decodes user requests and delivers personalized recommendations derived from the KG, so that users' computer-setup needs are precisely understood and addressed.
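The entity-linking step (mapping a free-text product mention to a KG node, then following a relation to recommend components) can be illustrated with a toy example. This is not the Prometheus implementation: the token-overlap (Jaccard) matcher, the 0.5 threshold, the `compatible_with` relation, and the miniature `KG` dictionary are all hypothetical stand-ins chosen for clarity. A production system would use the LLM and a real graph store.

```python
import re

def tokens(text):
    """Lower-case alphanumeric tokens of a mention or node name."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def link_entity(mention, node_names, threshold=0.5):
    """Link a mention to the KG node with the highest Jaccard token overlap."""
    m = tokens(mention)
    best, best_score = None, 0.0
    for name in node_names:
        n = tokens(name)
        union = m | n
        score = len(m & n) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None

# Hypothetical toy KG: node -> {relation: [neighbour nodes]}
KG = {
    "GeForce RTX 4090": {"compatible_with": ["1000 W power supply", "full-tower ATX case"]},
    "GeForce RTX 4060": {"compatible_with": ["550 W power supply", "mid-tower ATX case"]},
    "Ryzen 9 7950X": {"compatible_with": ["AM5 motherboard"]},
}

def recommend(mention, kg, relation="compatible_with"):
    """Link the mention, then return the neighbours reached via `relation`."""
    node = link_entity(mention, kg.keys())
    if node is None:
        return []
    return sorted(kg[node].get(relation, []))
```

For example, `recommend("rtx 4090", KG)` links the mention to `"GeForce RTX 4090"` (overlap 2/3, versus 1/4 for the RTX 4060) and returns its two neighbours; an unrelated mention falls below the threshold and returns an empty list.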
CE · Mar 26
Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization
Shaoliang Yang, Jun Wang, Yunsheng Wang
We present a framework in which a large language model (LLM) acts as an online adaptive controller for SIMP topology optimization, replacing conventional fixed-schedule continuation with real-time, state-conditioned parameter decisions. At every $k$-th iteration, the LLM receives a structured observation (current compliance, grayness index, stagnation counter, checkerboard measure, volume fraction, and budget consumption) and outputs numerical values for the penalization exponent $p$, projection sharpness $\beta$, filter radius $r_{\min}$, and move limit $\delta$ via a Direct Numeric Control interface. A hard grayness gate prevents premature binarization, and a meta-optimization loop uses a second LLM pass to tune the agent's call frequency and gate threshold across runs. We benchmark the agent against four baselines: fixed (no continuation), standard three-field continuation, an expert heuristic, and a schedule-only ablation. The benchmarks comprise three 2-D problems (cantilever, MBB beam, L-bracket) at $120\!\times\!60$ resolution and two 3-D problems (cantilever, MBB beam) at $40\!\times\!20\!\times\!10$ resolution, all run for 300 iterations. A standardized 40-iteration sharpening tail is applied from the best valid snapshot so that compliance differences reflect only the exploration phase. The LLM agent achieves the lowest final compliance on every benchmark, $-5.7\%$ to $-18.1\%$ relative to the fixed baseline, with all solutions fully binary. The schedule-only ablation underperforms the fixed baseline on two of three problems, confirming that the LLM's real-time intervention, not the schedule geometry, drives the gain. Code and reproduction scripts will be released upon publication.
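The observation/action interface can be sketched as follows. Assumptions: the grayness index is taken as the common measure of non-discreteness $4\,\overline{\rho(1-\rho)}$ (0 for a fully binary design, 1 for a uniform 0.5 field); the LLM is assumed to reply with a JSON object keyed `p`, `beta`, `rmin`, `move`; and the clamping `BOUNDS` are illustrative values, not the paper's. `Observation`, `grayness_index`, and `parse_dnc` are hypothetical names, and the grayness gate itself is not modeled here.

```python
import json
import math
from dataclasses import dataclass, asdict

import numpy as np

def grayness_index(rho):
    """Measure of non-discreteness: 0 for a 0/1 design, 1 for all densities at 0.5."""
    return float(4.0 * np.mean(rho * (1.0 - rho)))

@dataclass
class Observation:
    compliance: float
    grayness: float
    stagnation: int
    checkerboard: float
    volume_fraction: float
    budget_used: float

    def to_prompt(self):
        """Serialize the state for the LLM as compact JSON."""
        return json.dumps(asdict(self))

# Illustrative safe ranges for each controlled parameter (assumed, not from the paper).
BOUNDS = {"p": (1.0, 5.0), "beta": (1.0, 64.0), "rmin": (1.1, 4.0), "move": (0.01, 0.3)}

def parse_dnc(reply, previous):
    """Parse a numeric reply; keep previous values for anything missing or invalid."""
    try:
        proposal = json.loads(reply)
    except json.JSONDecodeError:
        return dict(previous)
    if not isinstance(proposal, dict):
        return dict(previous)
    out = dict(previous)
    for key, (lo, hi) in BOUNDS.items():
        try:
            value = float(proposal[key])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isfinite(value):
            out[key] = min(max(value, lo), hi)  # clamp into the safe range
    return out
```

Falling back to the previous parameters on a malformed reply keeps the optimizer running even when the LLM output cannot be parsed.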
CE · Apr 20
Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels
Shaoliang Yang, Jun Wang, Yunsheng Wang
The matrix-free gather-batched-GEMM-scatter pattern eliminates global stiffness assembly for three-dimensional SIMP topology optimization, but the conventional three-stage implementation forces avoidable DRAM traffic between stages. We present a single fused CUDA kernel, implemented through CuPy's runtime compilation interface, that performs gather, per-element stiffness multiplication, and scatter accumulation in one pass. On a single RTX 4090 (24 GB), the fused path reaches a problem-size-dependent 4.6-7.3x end-to-end SIMP wall-time speedup across 216k-4.9M cantilever elements and 4.4x on the 499,125-element torsion benchmark. Against the same-precision FP32 three-stage baseline, the fused path still yields 2.3-4.6x on cantilever and 2.8x on torsion. Isolated CUDA-event measurements of the cantilever operator reach 8.9-13.8x per matvec call, while separate instrumented board-power traces at 216k and 1M elements show 3.2-4.9x lower energy than matched FP64 runs. A separate bridge stress test shows the same FP32-versus-FP64 three-stage trend under one distributed-load case; direct fused-kernel bridge benchmarks are not reported. We also evaluate a BF16 WMMA variant. A separate PyTorch BF16 GEMM proxy on matching tensor shapes yields 14.3x, but direct condition-number estimates of 6.1e5-2.3e6 across 64k-512k uniform-density test states imply BF16 conditioning products of 2.4e3-9.1e3, far above the 256 threshold, and BF16 iterative refinement stagnated at both tested inner tolerances.
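The three stages that the fused kernel combines can be written as an unfused NumPy reference, which is also a convenient correctness oracle for a GPU kernel. This sketch assumes a 1D bar mesh with two DOFs per element and a unit element stiffness `KE` so the check against an assembled matrix stays small; a 3D hexahedral mesh would use a 24×24 `KE` and a 24-column `edof` in exactly the same way. The helper names and the SIMP parameters (`p=3`, `emin=1e-9`) are illustrative assumptions. The CUDA kernel itself is not reproduced here.

```python
import numpy as np

KE = np.array([[1.0, -1.0], [-1.0, 1.0]])  # unit 2-node bar element stiffness

def bar_mesh(n_el):
    """Element-to-DOF map for a chain of n_el two-node bar elements."""
    return np.stack([np.arange(n_el), np.arange(1, n_el + 1)], axis=1)

def simp_scale(rho, p=3.0, e0=1.0, emin=1e-9):
    """Modified SIMP interpolation of the element modulus."""
    return emin + rho**p * (e0 - emin)

def matvec_matrix_free(u, edof, scale, ke=KE):
    ue = u[edof]                         # stage 1, gather: (n_el, ndof_e)
    ye = scale[:, None] * (ue @ ke.T)    # stage 2, batched element GEMM
    y = np.zeros_like(u)
    np.add.at(y, edof, ye)               # stage 3, scatter-add (handles shared DOFs)
    return y

def assemble(n_dof, edof, scale, ke=KE):
    """Dense global stiffness, used only to check the matrix-free product."""
    K = np.zeros((n_dof, n_dof))
    for e, dofs in enumerate(edof):
        K[np.ix_(dofs, dofs)] += scale[e] * ke
    return K
```

`np.add.at` is used instead of `y[edof] += ye` because the buffered form silently drops contributions when neighbouring elements share a DOF. On the GPU, the same conflict is what forces atomic adds in the scatter stage.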
CE · Mar 27
AutoSiMP: Autonomous Topology Optimization from Natural Language via LLM-Driven Problem Configuration and Adaptive Solver Control
Shaoliang Yang, Jun Wang, Yunsheng Wang
We present AutoSiMP, an autonomous pipeline that transforms a natural-language structural problem description into a validated, binary topology without manual configuration. The pipeline comprises five modules: (1) an LLM-based configurator that parses a plain-English prompt into a validated specification of geometry, supports, loads, passive regions, and mesh parameters; (2) a boundary-condition generator producing solver-ready DOF arrays, force vectors, and passive-element masks; (3) a three-field SIMP solver with Heaviside projection and pluggable continuation control; (4) an eight-check structural evaluator (connectivity, compliance, grayness, volume fraction, convergence, plus three informational quality metrics); and (5) a closed-loop retry mechanism. We evaluate on three axes. Configuration accuracy: across 10 diverse problems, the configurator produces valid specifications in all cases, with a median compliance penalty of $+0.3\%$ versus expert ground truth. Controller comparison: on 17 benchmarks with six controllers sharing an identical sharpening tail, the LLM controller achieves the lowest median compliance but only a $76.5\%$ pass rate, while the deterministic schedule achieves a $100\%$ pass rate at only $+1.5\%$ higher compliance. End-to-end reliability: with the schedule controller, all LLM-configured problems pass every quality check on the first attempt, so no retries were needed. Among the systems surveyed in this work (Table 1), AutoSiMP is the first to close the full loop from natural-language problem description to validated structural topology. The complete codebase, all specifications, and an interactive web demo will be released upon journal acceptance.
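A configurator that "parses a plain-English prompt into a validated specification" needs a schema check plus a feedback loop. The sketch below is a hypothetical illustration: the five-field schema (`nelx`, `nely`, `volfrac`, `supports`, `loads`), the `llm` callable, and the error-feedback wording are assumptions rather than AutoSiMP's actual specification format. It also retries at the configuration stage, whereas the paper's retry module (5) acts on evaluator results.

```python
import json

def validate_spec(spec):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(spec, dict):
        return ["spec is not a JSON object"]
    errors = []
    for key in ("nelx", "nely"):
        v = spec.get(key)
        if not isinstance(v, int) or isinstance(v, bool) or v <= 0:
            errors.append(f"{key} must be a positive integer")
    vf = spec.get("volfrac")
    if not isinstance(vf, (int, float)) or isinstance(vf, bool) or not 0.0 < vf < 1.0:
        errors.append("volfrac must lie in (0, 1)")
    for key in ("supports", "loads"):
        if not isinstance(spec.get(key), list) or not spec.get(key):
            errors.append(f"{key} must be a non-empty list")
    return errors

def configure_with_retry(prompt, llm, max_attempts=3):
    """Ask `llm` (any str -> str callable) for a spec; feed validation errors back."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        reply = llm(prompt + feedback)
        try:
            spec = json.loads(reply)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            errors = validate_spec(spec)
            if not errors:
                return spec, attempt
        feedback = "\nFix these problems: " + "; ".join(errors)
    raise RuntimeError(f"no valid spec after {max_attempts} attempts: {errors}")
```

Returning the attempt count makes it easy to report statistics like the paper's "first attempt" pass rate.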
CE · Apr 29
A Matrix-Free Galerkin Multigrid Solver and Failure-Mode Screen for Single-GPU 3D SIMP Linear Systems
Shaoliang Yang, Jun Wang, Yunsheng Wang
Large 3D SIMP studies require repeated elasticity solves for density-dependent operators whose finest matrices are expensive to assemble and whose conditioning degrades under high contrast. We study this linear-solver layer rather than claiming end-to-end optimization acceleration. The solver builds a matrix-free Galerkin geometric multigrid (GMG) hierarchy around a fused fine operator: the finest level remains matrix-free, the first coarse level is assembled by local Galerkin aggregation, and deeper levels use sparse Galerkin products. The practical default is FP32-GMG; BF16 is evaluated as a guarded mixed-precision variant and diagnostic stress test, not as the main speed mechanism. In a 27-case heterogeneous cantilever sweep, pass rates under a 200-iteration budget are 7/9, 4/9, and 1/9 at 64k, 216k, and 512k elements; mean iteration counts over converged cases only are about 112, 134, and 146. On uniform ρ = 0.5, p = 3 solves, FP32-GMG gives 1.62x, 1.75x, and 3.12x wall-time ratios relative to the capped flat Jacobi-PCG baseline at the same sizes; that baseline does not converge and reaches the 200-iteration cap in all timed trials. BF16-GMG is not faster than FP32-GMG. In 18 fixed-seed heterogeneous BF16 validation cases, 7/18 converge, matching the FP64 count, and 11 cases that pass the spectral screen still fail to converge within the 500-iteration cap; the screen is therefore diagnostic rather than a convergence certificate. The largest reported solve is a 1M-element uniform-modulus system solved in 1.50 ± 0.58 s with an 8.66 GiB hierarchy-allocation delta during setup (not a peak-memory trace); this point is reported as uniform scaling, not as evidence of heterogeneous robustness. The contribution is therefore a bounded single-GPU solver result built on an inherited Level 0 matrix-free operator: a Galerkin GMG hierarchy, direct BF16 guard evidence, and an explicit failure-mode screen for structured 3D SIMP linear systems.
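The Galerkin coarse-operator idea, $A_c = P^{\mathsf T} A P$, can be checked on the smallest possible example. This sketch assumes a dense 1D Poisson matrix, linear-interpolation prolongation, weighted-Jacobi smoothing ($\omega = 2/3$), and an exact coarse solve. It is a textbook two-grid cycle, not the paper's matrix-free 3D hierarchy with local aggregation and sparse products. For this model problem, the Galerkin product reproduces the coarse-grid stencil scaled by 1/2, which the test verifies.

```python
import numpy as np

def poisson_1d(n):
    """Tridiagonal [-1, 2, -1] matrix (Dirichlet boundaries, unscaled)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation_1d(n_coarse):
    """Linear interpolation from n_coarse points to 2 * n_coarse + 1 fine points."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def galerkin_coarse(A, P):
    """Galerkin coarse operator A_c = P^T A P."""
    return P.T @ A @ P

def two_grid_cycle(A, Ac, P, b, x, omega=2.0 / 3.0, nu=1):
    """Pre-smooth, exact coarse-grid correction, post-smooth."""
    d = np.diag(A)
    for _ in range(nu):
        x = x + omega * (b - A @ x) / d       # weighted Jacobi
    r = b - A @ x
    x = x + P @ np.linalg.solve(Ac, P.T @ r)  # restrict, solve, prolong
    for _ in range(nu):
        x = x + omega * (b - A @ x) / d
    return x
```

Because $A_c$ is formed algebraically from $A$ and $P$, the same construction carries over to density-dependent SIMP operators, where a rediscretized coarse operator would not capture the heterogeneous stiffness.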
CV · Jan 6, 2020
CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description
Deyu Yin, Qian Zhang, Jingbin Liu et al.
As an important technology in 3D mapping, autonomous driving, and robot navigation, LiDAR odometry remains a challenging task. Appropriate data structures and unsupervised deep learning are key to achieving a high-performance LiDAR odometry solution that is easy to adjust. Using a compact, 2D-structured spherical ring projection model and a voxel model that preserves the original shape of the input data, we propose a fully unsupervised Convolutional Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using a 2D CAE and extracts features from a multi-resolution voxel model using a 3D CAE. We make several key contributions: 1) experiments on the KITTI dataset show that our interest points capture more local detail, improving the matching success rate in unstructured scenarios, and that our features outperform the state of the art by more than 50% in matching inlier ratio; 2) we propose a keyframe selection method based on matching-pair transfer, an odometry refinement method for keyframes based on extended interest points from spherical rings, and a backward pose update method. Odometry refinement experiments verify the feasibility and effectiveness of the proposed methods.
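A spherical projection turns an unordered LiDAR scan into a 2D grid that a 2D convolutional network can consume. The sketch below is a generic spherical range-image projection, assuming a 64×1024 grid and a vertical field of view from +3° to −25° (roughly that of a 64-beam sensor). The paper's spherical ring model may index rows by laser ring instead of by elevation angle, so treat this as an illustration of the idea rather than CAE-LO's exact model. When several points fall in one pixel, the nearest range is kept.

```python
import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (N, 3) points to an H x W range image.

    Points at the origin are dropped; the returned rows/cols refer to the kept points.
    """
    points = points[np.linalg.norm(points, axis=1) > 0]
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)          # azimuth in (-pi, pi]
    pitch = np.arcsin(z / r)        # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = 0.5 * (1.0 - yaw / np.pi) * W                     # column coordinate
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H  # row coordinate
    cols = np.clip(np.floor(u), 0, W - 1).astype(int)
    rows = np.clip(np.floor(v), 0, H - 1).astype(int)
    image = np.full((H, W), np.inf)
    np.minimum.at(image, (rows, cols), r)  # keep the closest return per pixel
    image[np.isinf(image)] = 0.0           # empty pixels
    return image, rows, cols
```

`np.minimum.at` resolves pixel collisions deterministically; plain fancy-index assignment would not guarantee which of several colliding points is written last.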