Shiyu Tan

CV
4papers
28citations
Novelty61%
AI Score43

4 Papers

OPTICSDec 13, 2022
Foveated Thermal Computational Imaging in the Wild Using All-Silicon Meta-Optics

Vishwanath Saragadam, Zheyi Han, Vivek Boominathan et al. · cmu

Foveated imaging provides a better tradeoff between situational awareness (field of view) and resolution and is critical in long-wavelength infrared regimes because of the size, weight, power, and cost of thermal sensors. We demonstrate computational foveated imaging by exploiting the ability of a meta-optical frontend to discriminate between different polarization states and a computational backend to reconstruct the captured image/video. The frontend is a three-element optic: the first element which we call the "foveal" element is a metalens that focuses s-polarized light at a distance of $f_1$ without affecting the p-polarized light; the second element which we call the "perifoveal" element is another metalens that focuses p-polarized light at a distance of $f_2$ without affecting the s-polarized light. The third element is a freely rotating polarizer that dynamically changes the mixing ratios between the two polarization states. Both the foveal element (focal length = 150mm; diameter = 75mm), and the perifoveal element (focal length = 25mm; diameter = 25mm) were fabricated as polarization-sensitive, all-silicon, meta surfaces resulting in a large-aperture, 1:6 foveal expansion, thermal imaging capability. A computational backend then utilizes a deep image prior to separate the resultant multiplexed image or video into a foveated image consisting of a high-resolution center and a lower-resolution large field of view context. We build a first-of-its-kind prototype system and demonstrate 12 frames per second real-time, thermal, foveated image, and video capture in the wild.

38.5CVMay 13
Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

Shiyu Tan, Zixuan Zhao, Hao Gao et al.

Boundary Representation (BRep) is the standard format for Computer-Aided Design (CAD), yet reconstructing high-quality BReps from single-view images remains challenging due to the complexity of topological constraints and operation sequences. We present Img2CADSeq, a multi-stage pipeline that overcomes these limitations by encoding CAD sequences into a three-level hierarchical codebook. Guided by an importance prioritization, this strategy values profiles over details, compressing long sequences into a stable discrete latent space. To bridge the modality gap, we leverage a coarse-to-fine point cloud intermediate, aligning 2D visual features with 3D CAD sequences via contrastive learning to condition a VQ-Diffusion model. Supported by newly introduced CAD-220K and PrintCAD datasets, our approach ensures robust industrial domain adaptation. Extensive experiments demonstrate that Img2CADSeq significantly outperforms state-of-the-art methods, producing standard STEP files that can be directly used in commercial CAD software.

CVJul 26, 2024
IOVS4NeRF:Incremental Optimal View Selection for Large-Scale NeRFs

Jingpeng Xie, Shiyu Tan, Yuanlei Wang et al.

Large-scale Neural Radiance Fields (NeRF) reconstructions are typically hindered by the requirement for extensive image datasets and substantial computational resources. This paper introduces IOVS4NeRF, a framework that employs an uncertainty-guided incremental optimal view selection strategy adaptable to various NeRF implementations. Specifically, by leveraging a hybrid uncertainty model that combines rendering and positional uncertainties, the proposed method calculates the most informative view from among the candidates, thereby enabling incremental optimization of scene reconstruction. Our detailed experiments demonstrate that IOVS4NeRF achieves high-fidelity NeRF reconstruction with minimal computational resources, making it suitable for large-scale scene applications.

CVApr 9, 2021
CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo

Shiyu Tan, Yicheng Wu, Shoou-I Yu et al.

Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR) -- due to the conflicting impact of aperture size on both these variables. Inspired by the extended depth of field cameras, we propose a novel end-to-end learning-based technique to overcome this limitation, by introducing a phase mask at the aperture plane of the cameras in a stereo imaging system. The phase mask creates a depth-dependent point spread function, allowing us to recover sharp image texture and stereo correspondence over a significantly extended depth of field (EDOF) than conventional stereo. The phase mask pattern, the EDOF image reconstruction, and the stereo disparity estimation are all trained together using an end-to-end learned deep neural network. We perform theoretical analysis and characterization of the proposed approach and show a 6x increase in volume that can be imaged in simulation. We also build an experimental prototype and validate the approach using real-world results acquired using this prototype system.