Wei Yan

h-index: 21

28 papers

299 citations

Novelty: 38%

AI Score: 49

Ranked #48,914 of 201,326 authors (top 24%); #262 in HC (top 9%)

28 Papers

CL | Apr 12, 2022

Trigger-GNN: A Trigger-Based Graph Neural Network for Nested Named Entity Recognition

Yuan Sui, Fanyang Bu, Yingting Hu et al.

Nested named entity recognition (NER) aims to identify entity boundaries and recognize the categories of named entities in complex hierarchical sentences. Previous work has used character-level, word-level, or lexicon-level models; however, such work ignores the role of complementary annotations. In this paper, we propose a trigger-based graph neural network (Trigger-GNN) for nested NER. It obtains complementary annotation embeddings through entity trigger encoding and semantic matching, and tackles nested entities using an efficient graph message-passing architecture in aggregation-update mode. We posit that using entity triggers as external annotations adds complementary supervision signals over whole sentences, which helps the model learn and generalize more efficiently and cost-effectively. Experiments show that Trigger-GNN consistently outperforms the baselines on four public NER datasets and that it can effectively alleviate the nested NER problem.
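
The aggregation-update mode of graph message passing can be illustrated with a generic sketch. This is not Trigger-GNN's architecture: the mean aggregation, the `tanh` update, the weight matrices `W_self`/`W_neigh`, and the toy graph (standing in for token and trigger nodes) are all assumptions chosen for illustration.

```python
import numpy as np

def message_passing_step(h, edges, W_self, W_neigh):
    """One aggregation-update round over an undirected graph.

    h: (num_nodes, dim) node features; edges: list of (u, v) pairs.
    """
    num_nodes = h.shape[0]
    agg = np.zeros_like(h)
    degree = np.zeros(num_nodes)
    # Aggregation: each node collects its neighbors' features as messages.
    for u, v in edges:
        agg[u] += h[v]
        agg[v] += h[u]
        degree[u] += 1
        degree[v] += 1
    # Mean aggregation; isolated nodes receive a zero message.
    agg = agg / np.maximum(degree, 1)[:, None]
    # Update: combine each node's own state with its aggregated message.
    return np.tanh(h @ W_self + agg @ W_neigh)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 8))          # e.g. three token nodes and one trigger node
edges = [(0, 1), (1, 2), (2, 3)]      # hypothetical token/trigger connections
W_self, W_neigh = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
h1 = message_passing_step(h0, edges, W_self, W_neigh)
```

Stacking several such rounds lets information travel several hops, which is the usual motivation for using graph message passing over nested spans.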

CV | Jan 13 | Code

WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

Zishan Shu, Juntong Wu, Wei Yan et al.

Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency (from low-frequency global layout to high-frequency edges and textures) is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency-time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in O(N log N) time, far lower than the quadratic cost of attention. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to 1.6x higher throughput and 30% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a modeling bias complementary to heat-based methods, effectively capturing both global coherence and the high-frequency details essential for rich visual semantics. Code is available at: https://github.com/ZishanShu/WaveFormer.
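
To see why a wave-equation solution can be frequency-time decoupled and cost O(N log N), consider the textbook damped wave equation $u_{tt} + \alpha u_t = c^2 \nabla^2 u$ on a periodic grid: in the Fourier domain each spatial frequency obeys its own damped oscillator, which has a closed-form solution. The sketch below solves exactly that; it is not the authors' WPO, and the periodic boundary, the scalar `c` and `alpha`, and the zero default initial velocity are assumptions.

```python
import numpy as np

def wave_propagate(u0, t, c=1.0, alpha=0.1, v0=None):
    """Evolve a 2D field under u_tt + alpha * u_t = c**2 * laplacian(u).

    Periodic boundaries; solved exactly per Fourier mode, so the cost is the
    O(N log N) of the FFT/inverse FFT pair.
    """
    if v0 is None:
        v0 = np.zeros_like(u0)                 # assumed zero initial velocity
    h, w = u0.shape
    ky = 2 * np.pi * np.fft.fftfreq(h)
    kx = 2 * np.pi * np.fft.fftfreq(w)
    # Natural frequency of each spatial mode: omega0 = c * |k|.
    omega0 = c * np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)
    # Damped frequency; the complex sqrt also covers overdamped low-frequency modes.
    wd = np.sqrt(omega0 ** 2 - (alpha / 2) ** 2 + 0j)
    tiny = np.abs(wd) < 1e-12
    safe_wd = np.where(tiny, 1.0, wd)
    # sin(wd * t) / wd, replaced by its limit t where wd is (numerically) zero.
    sin_term = np.where(tiny, t, np.sin(wd * t) / safe_wd)
    u0_hat, v0_hat = np.fft.fft2(u0), np.fft.fft2(v0)
    # Every frequency evolves independently in time (frequency-time decoupling).
    u_hat = np.exp(-alpha * t / 2) * (
        u0_hat * np.cos(wd * t) + (v0_hat + alpha / 2 * u0_hat) * sin_term
    )
    return np.real(np.fft.ifft2(u_hat))

n = 32
x = np.arange(n)
u0 = np.tile(np.cos(2 * np.pi * x / n), (n, 1))   # a single horizontal frequency
u3 = wave_propagate(u0, t=3.0, alpha=0.0)         # undamped: amplitude times cos(k * t)
```

In this toy setting, higher frequencies oscillate faster while `alpha` damps all underdamped modes at the same exponential rate, which is one concrete way frequency and propagation time interact explicitly.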

IT | May 14

The Construction of Near-optimal Universal Coding of Integers

Wei Yan, Yunghsiang S. Han

The Universal Coding of Integers (UCI) is suitable for discrete memoryless sources with unknown probability distributions and countably infinite alphabets. A UCI is a class of prefix codes for which the ratio of the average codeword length to $\max\{1,H(P)\}$ is within a constant expansion factor $C_{\mathcal{C}}$ for any decreasing probability distribution $P$, where $H(P)$ is the entropy of $P$. For any UCI code $\mathcal{C}$, \emph{the minimum expansion factor} $C_{\mathcal{C}}^{*}$ is defined as the infimum of the set of expansion factors of $\mathcal{C}$. Each $\mathcal{C}$ has a unique corresponding $C_{\mathcal{C}}^{*}$, and the smaller $C_{\mathcal{C}}^{*}$ is, the better the compression performance of $\mathcal{C}$. The class of UCIs $\mathcal{C}$ (or a family $\{\mathcal{C}_i\}_{i=1}^{\infty}$) that achieves the smallest $C_{\mathcal{C}}^{*}$ is defined as the \emph{optimal UCI}. The best previous result is that the range of $C_{\mathcal{C}}^{*}$ for the optimal UCI is $2\leq C_{\mathcal{C}}^{*}\leq 2.5$. In this paper, we prove a tighter probability inequality for decreasing distributions, which serves as a new tool for studying the properties of UCIs. On the basis of this inequality, we prove that there exists a class of near-optimal UCIs, called the $\nu$ code, achieving $C_{\nu}=2.0386$. This narrows the range of the minimum expansion factor for the optimal UCI to $2\leq C_{\mathcal{C}}^{*}\leq 2.0386$. We show that the $\nu$ code is currently optimal in terms of the minimum expansion factor. In addition, we propose a new proof showing that the minimum expansion factor of the optimal UCI is lower bounded by $2$.
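
To make the expansion-factor definition concrete, the sketch below evaluates the ratio $E[L]/\max\{1,H(P)\}$ for one decreasing distribution. It uses the classical Elias gamma code, a well-known UCI, as a stand-in; the $\nu$ code is not reproduced, and the geometric distribution and its truncation length are assumptions. Because an expansion factor must bound this ratio for every decreasing $P$, a single distribution only yields a lower bound on a code's minimum expansion factor.

```python
import math

def elias_gamma(i):
    """Elias gamma codeword for a positive integer i."""
    bits = bin(i)[2:]                       # binary representation, leading 1 included
    return "0" * (len(bits) - 1) + bits     # unary length prefix, then the bits

def expansion_ratio(probs, code):
    """Average codeword length divided by max(1, H(P)) for a decreasing P."""
    avg_len = sum(p * len(code(i)) for i, p in enumerate(probs, start=1))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return avg_len / max(1.0, entropy)

# Geometric distribution p_i = 2**-i, truncated where the tail is negligible.
probs = [2.0 ** -i for i in range(1, 60)]
ratio = expansion_ratio(probs, elias_gamma)
```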

CV | Mar 20, 2023

DIME-Net: Neural Network-Based Dynamic Intrinsic Parameter Rectification for Cameras with Optical Image Stabilization System

Shu-Hao Yeh, Shuangyu Xie, Di Wang et al.

The Optical Image Stabilization (OIS) system in mobile devices reduces image blurring by steering the lens to compensate for hand jitter. However, OIS dynamically changes the intrinsic camera parameters (i.e., the $\mathrm{K}$ matrix), which hinders accurate camera pose estimation and 3D reconstruction. Here we propose a novel neural network-based approach that estimates the $\mathrm{K}$ matrix in real time so that pose estimation or scene reconstruction can run at the camera's native resolution for the highest accuracy on mobile devices. Our network takes gratified projection model discrepancy features and 3D point positions as inputs and employs a Multi-Layer Perceptron (MLP) to approximate the $f_{\mathrm{K}}$ manifold. We also design a unique training scheme for this network by introducing a Back-propagated PnP (BPnP) layer so that reprojection error can be used as the loss function. Training uses precise calibration patterns to capture an accurate $f_{\mathrm{K}}$ manifold, but the trained network can be used anywhere. We name the proposed Dynamic Intrinsic Manifold Estimation network DIME-Net, and we implemented and tested it on three different mobile devices. In all cases, DIME-Net reduces reprojection error by at least $64\%$, indicating that our design is successful.
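
The loss DIME-Net trains on can be illustrated with the standard pinhole model: 3D points are projected with an intrinsic matrix $\mathrm{K}$ and compared with observed pixels. The sketch below only computes a mean reprojection error for given `K`, rotation `R`, and translation `t`; the numeric intrinsics are hypothetical, the OIS effect is modeled (as an assumption) as a pure principal-point shift, and neither the MLP nor the BPnP layer is shown.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t            # world -> camera frame
    uvw = cam @ K.T                      # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]      # perspective division

def reprojection_error(points_3d, pixels, K, R, t):
    """Mean Euclidean distance (in pixels) between projected and observed points."""
    return np.linalg.norm(project(points_3d, K, R, t) - pixels, axis=1).mean()

K_true = np.array([[1500.0, 0.0, 960.0],
                   [0.0, 1500.0, 540.0],
                   [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
rng = np.random.default_rng(1)
points = rng.uniform(-0.5, 0.5, size=(50, 3))
pixels = project(points, K_true, R, t)        # "observed" pixels under the true K

# Simulated lens shift: the principal point moves, so a stale K mis-projects.
K_stale = K_true.copy()
K_stale[0, 2] += 8.0
err_true = reprojection_error(points, pixels, K_true, R, t)     # 0.0
err_stale = reprojection_error(points, pixels, K_stale, R, t)   # about 8 px
```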

IT | Apr 19

About Optimal Prefix Codes over Countably Infinite Alphabets: Probabilistic Intervals for the Codeword Lengths Assignment

Hongyang Liu, Wei Yan

For discrete memoryless sources with a countably infinite alphabet, we prove that for any positive integer $k$, there exists a corresponding probability interval such that if the largest symbol probability $p_{1}$ falls in this interval, the optimal code length for that symbol equals $k$. Furthermore, for infinite sources, we provide a criterion to determine the probability distributions whose optimal code length assignment follows the pattern $l^{best}_{i}=i$ for $i\ge 1$. Compared with the existing result for anti-uniform sources, the proposed criterion requires less information for verification.
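
Optimal code lengths for a finite source can be computed with Huffman's algorithm, which the sketch below uses to show the pattern $l_i = i$ on a five-symbol dyadic distribution. This is only a finite illustration under that assumption: in a finite code the two least likely symbols always share a length, so the pattern holds for all but the last symbol, and the paper's probability intervals and criterion for countably infinite sources are not implemented.

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return optimal (Huffman) codeword lengths for a finite distribution."""
    tie = count()                               # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths

print(huffman_lengths([0.5, 0.25, 0.125, 0.0625, 0.0625]))  # [1, 2, 3, 4, 4]
```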

HC | Aug 1, 2023

Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design

Jaechang Ko, John Ajibefun, Wei Yan

This paper introduces a new architectural design framework that combines generative AI tools, including ChatGPT and Veras, with parametric modeling and Building Information Modeling (BIM) to enhance the design process. The study explores the potential of ChatGPT and generative AI in 3D architectural design, beyond their use in text and 2D image generation. The proposed framework promotes collaboration between architects and AI, facilitating quick exploration of design ideas and producing context-sensitive, creative design generation. By integrating ChatGPT for scripting and Veras for generating design ideas with widely used parametric modeling and BIM tools, the framework provides architects with an intuitive and powerful method to convey design intent, leading to more efficient, creative, and collaborative design processes.

CV | Jan 3, 2023

Deep Learning from Parametrically Generated Virtual Buildings for Real-World Object Recognition

Mohammad Alawadhi, Wei Yan

We study the use of parametric building information modeling (BIM) to automatically generate training data for artificial neural networks (ANNs) that recognize building objects in photos. Teaching artificial intelligence (AI) machines to detect building objects in images is the foundation for AI-assisted semantic 3D reconstruction of existing buildings. However, acquiring training data is challenging because such data are typically human-annotated, unless a computer can generate high-quality data to train itself for a given task. In that vein, we trained ANNs solely on realistic computer-generated images of 3D BIM models that were parametrically and automatically generated using the BIMGenE program. The trained ANNs demonstrated generalizability and good semantic segmentation on a test case as well as on arbitrary photos of buildings outside the range of the training data, which is significant for the future of training AI with generated data to solve real-world architectural problems.

LG | Dec 2, 2020 | Code

Partially Shared Semi-supervised Deep Matrix Factorization with Multi-view Data

Haonan Huang, Naiyao Liang, Wei Yan et al.

Since much real-world data can be described from multiple views, multi-view learning has attracted considerable attention. Various methods, typically based on matrix factorization models, have been proposed and successfully applied to multi-view learning. Recently, these models have been extended to deep structures to exploit the hierarchical information of multi-view data, but view-specific features and label information are seldom considered. To address these concerns, we present a partially shared semi-supervised deep matrix factorization model (PSDMF). By integrating a partially shared deep decomposition structure, graph regularization, and a semi-supervised regression model, PSDMF can learn a compact and discriminative representation by eliminating the effects of uncorrelated information. In addition, we develop an efficient iterative updating algorithm for PSDMF. Extensive experiments on five benchmark datasets demonstrate that PSDMF achieves better performance than state-of-the-art multi-view learning approaches. The MATLAB source code is available at https://github.com/libertyhhn/PartiallySharedDMF.
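
As background for the matrix factorization models mentioned above, the sketch below runs the classical multiplicative updates for single-layer nonnegative matrix factorization (Lee and Seung) on one view. It is a swapped-in baseline, not PSDMF: there are no deep layers, shared or view-specific factors, graph regularization, or label regression, and the rank and random data are arbitrary assumptions.

```python
import numpy as np

def nmf(X, rank, iters=200, eps=1e-10, seed=0):
    """Factorize nonnegative X (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    errors = []
    for _ in range(iters):
        # Multiplicative updates keep entries nonnegative and do not increase
        # the Frobenius reconstruction error (eps only guards against division by 0).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

X = np.random.default_rng(42).random((30, 20))   # stand-in for one data view
W, H, errors = nmf(X, rank=5)
```

Deep variants stack such factorizations ($X \approx W_1 W_2 \cdots W_L H$), which is the structure PSDMF builds on.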

HC | Nov 27, 2023

Multi-3D-Models Registration-Based Augmented Reality (AR) Instructions for Assembly

Seda Tuzun Canadinc, Wei Yan

This paper introduces a novel, markerless, step-by-step, in-situ 3D Augmented Reality (AR) instruction method and its application, BRICKxAR (Multi 3D Models/M3D), for small-parts assembly. BRICKxAR (M3D) realistically visualizes rendered 3D assembly parts at the assembly location of the physical assembly model (Figure 1). The user controls the assembly process through a user interface. BRICKxAR (M3D) uses deep learning-trained 3D model-based registration. Object recognition and tracking become challenging as the assembly model changes at each step. Additionally, not every part in a 3D assembly may be visible to the camera during assembly. BRICKxAR (M3D) combines multiple assembly phases with a step count to address these challenges: using fewer phases simplifies the complex assembly process, while the step count facilitates accurate object recognition and precise visualization of each step. Testing, a heuristic evaluation of the BRICKxAR (M3D) prototype, and a qualitative analysis were conducted with users and experts in visualization and human-computer interaction. By providing robust 3D AR instructions and allowing the handling of the assembly model, BRICKxAR (M3D) has the potential to be used at scales ranging from manufacturing assembly to construction.
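
One way to read the combination of assembly phases and a step count is as a lookup: the step counter selects which phase's 3D model the tracker registers against and which part is rendered next. The sketch below is a hypothetical illustration of that idea; the phase boundaries, model and part names, and the `AssemblyGuide` class are invented for the example and are not BRICKxAR (M3D)'s implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class AssemblyGuide:
    phase_starts: list    # first step index of each phase, ascending, starting at 0
    phase_models: list    # hypothetical 3D model used for registration in each phase
    parts: list           # part to visualize at each step
    step: int = 0

    def current_phase(self):
        # Index of the last phase whose start step is <= the current step.
        return bisect_right(self.phase_starts, self.step) - 1

    def registration_model(self):
        return self.phase_models[self.current_phase()]

    def next_step(self):
        # The user advances through the UI; stop at the final step.
        self.step = min(self.step + 1, len(self.parts) - 1)

guide = AssemblyGuide(phase_starts=[0, 3],
                      phase_models=["base_model", "upper_model"],
                      parts=["brick_a", "brick_b", "brick_c", "brick_d", "brick_e"])
```

Fewer phases means fewer models to track, while the step index still pins down exactly which part to show.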

ML | Jul 26, 2025

Predicting Parkinson's Disease Progression Using Statistical and Neural Mixed Effects Models: A Comparative Study on Longitudinal Biomarkers

Ran Tong, Lanruo Wang, Tong Wang et al.

Predicting Parkinson's Disease (PD) progression is crucial, and voice biomarkers offer a non-invasive method for tracking symptom severity (UPDRS scores) through telemonitoring. Analyzing this longitudinal data is challenging due to within-subject correlations and complex, nonlinear, patient-specific progression patterns. This study benchmarks linear mixed-effects models (LMMs) against two advanced hybrid approaches: the Generalized Neural Network Mixed Model (GNMM) (Mandel 2021), which embeds a neural network within a generalized linear mixed model (GLMM) structure, and the Neural Mixed Effects (NME) model (Wortwein 2023), which allows nonlinear subject-specific parameters throughout the network. Using the Oxford Parkinson's telemonitoring voice dataset, we evaluate these models' performance in predicting Total UPDRS to offer practical guidance for PD research and clinical applications.
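
Within-subject correlation is what a mixed-effects model captures through subject-level random effects. As a minimal sketch, assuming a balanced random-intercept model $y_{ij} = \mu + u_i + e_{ij}$ with known $\mu$ and known variance components (all numbers below are hypothetical, not from the Oxford dataset), the best linear unbiased predictor shrinks each subject's mean toward the population mean: $\hat{u}_i = \frac{n_i \sigma_u^2}{n_i \sigma_u^2 + \sigma_e^2}(\bar{y}_i - \mu)$.

```python
import numpy as np

def random_intercept_blup(y_by_subject, mu, var_u, var_e):
    """Predict subject-specific means under y_ij = mu + u_i + e_ij.

    var_u: between-subject variance; var_e: within-subject noise variance.
    """
    preds = {}
    for subject, y in y_by_subject.items():
        n = len(y)
        # Shrinkage weight: closer to 1 with more visits or larger var_u.
        weight = n * var_u / (n * var_u + var_e)
        preds[subject] = mu + weight * (np.mean(y) - mu)
    return preds

scores = {"s1": [30.0, 32.0, 31.0], "s2": [20.0]}   # hypothetical UPDRS values
preds = random_intercept_blup(scores, mu=25.0, var_u=16.0, var_e=4.0)
```

Subjects with few visits are pulled more strongly toward the population mean; in practice $\mu$ and the variances are estimated, and GNMM/NME replace the linear fixed part with a neural network.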

LG | Mar 10, 2024

Revisiting Edge Perturbation for Graph Neural Network in Graph Data Augmentation and Attack

Xin Liu, Yuxiang Zhang, Meng Wu et al.

Edge perturbation is a basic method for modifying graph structures. Based on their effects on the performance of graph neural networks (GNNs), edge perturbation methods fall into two categories: graph data augmentation and attack. Surprisingly, both categories employ the same operations, yet yield opposite effects on GNNs' accuracy. A distinct boundary between these methods has never been clearly defined. Consequently, inappropriate perturbations may lead to undesirable outcomes, necessitating precise adjustments to achieve the desired effects. The questions "why does edge perturbation have a two-faced effect?" and "what makes edge perturbation flexible and effective?" therefore remain unanswered. In this paper, we answer these questions by proposing a unified formulation and establishing a clear boundary between the two categories of edge perturbation methods. Specifically, we conduct experiments to elucidate the differences and similarities between these methods and theoretically unify their workflows by casting them as a single optimization problem. We then devise the Edge Priority Detector (EPD) to generate a novel priority metric that bridges these methods in the workflow. Experiments show that EPD can perform augmentation or attack flexibly and achieves comparable or superior performance to its counterparts with less time overhead.
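
Both augmentation and attack apply the same primitive: flip a budgeted set of entries in the adjacency matrix. The sketch below shows that shared operation driven by a per-pair priority score; the random scores are a placeholder assumption, not the paper's Edge Priority Detector, and whether the flips help or hurt a GNN depends entirely on how the scores are produced.

```python
import numpy as np

def flip_top_edges(adj, scores, budget):
    """Flip the `budget` highest-scoring node pairs of a symmetric 0/1 adjacency.

    scores: (n, n) priority for perturbing each pair; only i < j is used.
    Flipping adds an absent edge or removes an existing one.
    """
    n = adj.shape[0]
    iu, ju = np.triu_indices(n, k=1)               # each undirected pair once
    order = np.argsort(scores[iu, ju])[::-1][:budget]
    perturbed = adj.copy()
    rows, cols = iu[order], ju[order]
    perturbed[rows, cols] = 1 - perturbed[rows, cols]
    perturbed[cols, rows] = perturbed[rows, cols]  # keep the graph undirected
    return perturbed

rng = np.random.default_rng(0)
A = np.triu((rng.random((6, 6)) < 0.4).astype(int), k=1)
A = A + A.T                                        # random undirected graph
scores = rng.random((6, 6))                        # placeholder priority scores
A_pert = flip_top_edges(A, scores, budget=3)
```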

LG | May 10, 2024

Disttack: Graph Adversarial Attacks Toward Distributed GNN Training

Yuxiang Zhang, Xin Liu, Meng Wu et al.

Graph Neural Networks (GNNs) have emerged as powerful models for graph learning. Distributing the training process across multiple computing nodes is the most promising solution to the challenges posed by ever-growing real-world graphs. However, current adversarial attack methods on GNNs neglect the characteristics and applications of the distributed scenario, leading to suboptimal performance and inefficiency when attacking distributed GNN training. In this study, we introduce Disttack, the first framework of adversarial attacks on distributed GNN training, which leverages the frequent gradient updates in a distributed system. Specifically, Disttack corrupts distributed GNN training by injecting adversarial attacks into a single computing node. The attacked subgraphs are precisely perturbed to induce abnormal gradient ascent in backpropagation, disrupting gradient synchronization between computing nodes and thus causing a significant performance decline in the trained GNN. We evaluate Disttack on four large real-world graphs by attacking five widely adopted GNNs. Compared with the state-of-the-art attack method, Disttack amplifies model accuracy degradation by 2.75$\times$ and achieves a 17.33$\times$ speedup on average while maintaining unnoticeability.
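
A single compromised node has leverage because of gradient synchronization: in data-parallel training, workers' gradients are typically averaged (an all-reduce) before each update, so one node's corrupted gradient enters every replica's step. The sketch below simulates that averaging with NumPy; the worker count and the scale-and-negate corruption are illustrative assumptions that stand in for, and are much cruder than, Disttack's subgraph perturbations.

```python
import numpy as np

def synchronized_gradient(worker_grads):
    """All-reduce style synchronization: every worker applies the mean gradient."""
    return np.mean(worker_grads, axis=0)

rng = np.random.default_rng(0)
true_grad = rng.normal(size=10)
# Four honest workers compute gradients close to the true one.
honest = [true_grad + 0.01 * rng.normal(size=10) for _ in range(4)]
clean = synchronized_gradient(honest)

# One compromised worker pushes in the ascent direction (illustrative only).
attacked = honest[:3] + [-5.0 * true_grad]
poisoned = synchronized_gradient(attacked)

# A negative inner product with the true gradient means a descent step on the
# poisoned average increases the loss, to first order.
alignment = float(np.dot(poisoned, true_grad))
```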

CV | Mar 24, 2024

Semantic Is Enough: Only Semantic Information For NeRF Reconstruction

Ruibo Wang, Song Zhang, Ping Huang et al.

Recent research that combines implicit 3D representations with semantic information, such as Semantic-NeRF, has shown that NeRF models can perform excellently in rendering 3D structures with semantic labels. This research aims to extend the Semantic Neural Radiance Fields (Semantic-NeRF) model by focusing solely on semantic output and removing the RGB output component. We reformulate the model and its training procedure to use only the cross-entropy loss between the model's semantic output and the ground-truth semantic images, removing the colour data traditionally used in the original Semantic-NeRF approach. We then conduct a series of identical experiments using the original and the modified Semantic-NeRF models. Our primary objective is to observe the impact of this modification on Semantic-NeRF's performance in tasks such as scene understanding, object detection, and segmentation. The results offer valuable insights into this new way of rendering scenes and provide an avenue for further research and development in semantic-focused 3D scene understanding.
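
With the RGB head removed, the only training signal is the cross-entropy between rendered semantics and ground truth. The sketch below assumes, following Semantic-NeRF, that per-sample semantic logits along a ray are composited with the standard volume-rendering weights before the softmax; the densities, sample spacing, and class count are made-up values.

```python
import numpy as np

def render_weights(sigmas, deltas):
    """Standard NeRF compositing weights w_i = T_i * alpha_i along one ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    return trans * alphas

def semantic_ce_loss(sigmas, deltas, logits, label):
    """Cross-entropy between the ray's rendered semantic logits and a class label."""
    w = render_weights(sigmas, deltas)
    ray_logits = w @ logits                        # (num_classes,)
    # Numerically stable log-softmax.
    shifted = ray_logits - ray_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

sigmas = np.array([0.0, 0.5, 5.0, 0.2])            # densities at 4 samples
deltas = np.full(4, 0.25)                          # spacing between samples
logits = np.array([[0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0],
                   [0.0, 3.0, 0.0],                # dense sample votes for class 1
                   [0.0, 0.0, 1.0]])
loss_right = semantic_ce_loss(sigmas, deltas, logits, label=1)
loss_wrong = semantic_ce_loss(sigmas, deltas, logits, label=2)
```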

AI | Nov 9, 2024

AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality

Uttamasha Monjoree, Wei Yan

Spatial intelligence is important in Architecture, Construction, Science, Technology, Engineering, and Mathematics (STEM), and Medicine. Understanding three-dimensional (3D) spatial rotations can involve verbal descriptions and visual or interactive examples illustrating how objects change orientation in 3D space. Recent studies show that Artificial Intelligence (AI) models with language and vision capabilities still face limitations in spatial reasoning. In this paper, we study generative AI's ability to understand object rotations using its image and language processing features. We examined the spatial intelligence of the GPT-4 model with vision in understanding the spatial rotation process using diagrams based on the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). Next, we added a layer of coordinate system axes to the Revised PSVT:R to study variations in GPT-4's performance. We also examined GPT-4's understanding of 3D rotations in Augmented Reality (AR) scenes that visualize spatial rotations of an object in 3D space, and observed that GPT-4's accuracy in understanding the rotations increased when supplementary textual information depicting the rotation process or mathematical representations of the rotation (e.g., matrices) was added. The results indicate that while GPT-4, a major current generative AI model, lacks an understanding of the spatial rotation process, it has the potential to understand it with additional information that can be provided by methods such as AR. By combining AI's potential in spatial intelligence with AR's interactive visualization abilities, we expect to offer enhanced guidance for students' spatial learning activities. Such spatial guidance can benefit the understanding of spatial transformations and additionally support processes like assembly, fabrication, and manufacturing.
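
The mathematical representations of rotation referred to above are typically rotation matrices. As a small worked example, Rodrigues' formula builds the matrix for a rotation by angle θ about a unit axis; the axis and angle in the usage lines are arbitrary choices, not items from the Revised PSVT:R.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """Rotation about `axis` by `angle_deg` (right-hand rule), via Rodrigues' formula."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    theta = np.radians(angle_deg)
    # Cross-product (skew-symmetric) matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = rotation_matrix([0, 0, 1], 90)      # quarter turn about the z-axis
p = R @ np.array([1.0, 0.0, 0.0])       # approximately [0, 1, 0]
```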

IV | Dec 16, 2024

Point Cloud-Assisted Neural Image Compression

Ziqun Li, Qi Zhang, Xiaofeng Huang et al.

Highly efficient image compression is a critical requirement. In several scenarios where multiple data modalities are captured by different sensors, the auxiliary information from other modalities is not fully leveraged by existing image-only codecs, leading to suboptimal compression efficiency. In this paper, we improve image compression performance with the assistance of point clouds, which are widely adopted in autonomous driving. We first unify the data representation of both modalities to facilitate data processing. Then, we propose the point cloud-assisted neural image codec (PCA-NIC), which uses high-dimensional point cloud information to better preserve image texture and structure. We further introduce a multi-modal feature fusion transform module (MMFFT) to capture more representative image features and to remove redundant information, across channels and modalities, that is not relevant to the image content. Our work is the first to improve image compression performance using point clouds, and it achieves state-of-the-art performance.
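
A common way to place a point cloud and a camera image in a shared representation is to project the points into the image plane as a sparse depth map. The sketch below assumes the points are already in the camera frame and that a pinhole intrinsic matrix `K` is known; it illustrates the general idea only and is not PCA-NIC's preprocessing.

```python
import numpy as np

def points_to_depth_map(points_cam, K, height, width):
    """Rasterize (N, 3) camera-frame points into a sparse depth image.

    Pixels with no point stay 0; when several points hit a pixel, keep the nearest.
    """
    z = points_cam[:, 2]
    valid = z > 0                                  # keep points in front of the camera
    pts, z = points_cam[valid], z[valid]
    uvw = pts @ K.T
    u = np.round(uvw[:, 0] / z).astype(int)
    v = np.round(uvw[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # np.minimum.at acts as a z-buffer, also for repeated pixel indices.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0],     # projects to the principal point (32, 24)
                [0.0, 0.0, 3.0],     # same pixel, closer: wins the z-buffer
                [1.0, 0.0, 10.0]])   # projects to (42, 24)
depth = points_to_depth_map(pts, K, height=48, width=64)
```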

CY | Jun 28, 2024

An Approach to Detect Abnormal Submissions for CodeWorkout Dataset

Alex Hicks, Yang Shi, Arun-Balajiee Lekshmi-Narayanan et al.

Students' interactions while solving problems in learning environments (i.e., log data) are often used to support student learning. For example, researchers use log data to develop systems that provide students with personalized problem recommendations based on their knowledge level. However, anomalies in students' log data, such as cheating to solve programming problems, could introduce hidden bias into the data. As a result, these systems may provide inaccurate problem recommendations and therefore defeat their purpose. Classical cheating detection methods, such as MOSS, can be used to detect code plagiarism. However, these methods cannot detect other abnormal events, such as a student gaming the system with multiple attempts of similar solutions to a particular programming problem. This paper presents a preliminary study analyzing log data with anomalies. The goal of our work is to account for abnormal instances when modeling personalized recommendations in programming learning environments.
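
The gaming pattern described above, many near-identical attempts at one problem, can be flagged with a simple similarity heuristic. The sketch below compares consecutive submissions with Python's `difflib.SequenceMatcher`; the 0.9 similarity threshold and the run length of 3 are arbitrary assumptions, and this is not the method evaluated in the paper.

```python
from difflib import SequenceMatcher

def flag_similar_attempts(submissions, threshold=0.9, min_run=3):
    """Return True if a student's ordered submissions to one problem contain a
    chain of `min_run` or more attempts, each nearly identical to the previous."""
    run = 1
    for prev, cur in zip(submissions, submissions[1:]):
        similarity = SequenceMatcher(None, prev, cur).ratio()
        run = run + 1 if similarity >= threshold else 1
        if run >= min_run:
            return True
    return False

attempts = ["return x+1;", "return x+2;", "return x+3;", "return x+4;"]
distinct = ["int s = 0;", "for (int i = 0; i < n; i++) s += a[i];", "return s;"]
```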

HC | Jun 9, 2024

Text2VP: Generative AI for Visual Programming and Parametric Modeling

Guangxi Feng, Wei Yan

The integration of generative artificial intelligence (AI) into architectural design has advanced significantly, enabling the generation of text, images, and 3D models. However, prior AI applications lack support for text-to-parametric models, which are essential for generating and optimizing diverse parametric design options. This study introduces Text-to-Visual Programming (Text2VP) GPT, a novel generative AI derived from GPT-4.1, designed to automate graph-based visual programming workflows, parameters, and their interconnections. Text2VP leverages detailed documentation, specific instructions, and example-driven few-shot learning to reflect user intentions accurately and facilitate interactive parameter adjustments. Testing demonstrates Text2VP's capability to generate functional parametric models, although more complex models show higher error rates. This research highlights generative AI's potential in visual programming and parametric modeling, laying the groundwork for future improvements in handling complex modeling tasks. Ultimately, Text2VP aims to enable designers to easily create and modify parametric models without extensive training in specialized platforms like Grasshopper.
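
In graph-based visual programming (as in Grasshopper), a parametric model is a directed graph of components whose inputs are parameters or other components' outputs, evaluated in dependency order. The sketch below represents such a graph as plain Python data and evaluates it with `graphlib.TopologicalSorter` (Python 3.9+); the component names and operations are hypothetical and do not reflect Text2VP's output format.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each node is (function, list of input node names).
graph = {
    "width":  (lambda: 4.0, []),                          # parameter (slider)
    "depth":  (lambda: 3.0, []),                          # parameter (slider)
    "floors": (lambda: 5, []),                            # parameter (slider)
    "area":   (lambda w, d: w * d, ["width", "depth"]),
    "gfa":    (lambda a, n: a * n, ["area", "floors"]),   # gross floor area
}

def evaluate(graph):
    """Evaluate every node after its inputs, like a visual-programming solver."""
    deps = {name: inputs for name, (_, inputs) in graph.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        func, inputs = graph[name]
        values[name] = func(*(values[i] for i in inputs))
    return values

results = evaluate(graph)
```

Changing a parameter node and re-evaluating propagates the change downstream, which is the interactive parameter adjustment the abstract describes.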

HC | Jan 24, 2022

BIM LOD + Virtual Reality -- Using Game Engine for Visualization in Architectural & Construction Education

Hassan Anifowose, Wei Yan, Manish Dixit

Architectural education faces limitations due to its tactile approach to learning in classrooms with only 2-D and 3-D tools. At a higher level, virtual reality offers the potential to deliver more information to individuals undergoing design learning. This paper investigates a hypothesis that lays the groundwork for new research in Building Information Modeling (BIM) and Virtual Reality (VR). The hypothesis is intended to determine best practices for content creation and virtual interaction with tactile objects, which can potentially improve learning in architectural and construction education with a less costly approach and easier access to well-known buildings. We explored this hypothesis in a step-by-step game design demonstration in VR, showcasing the exploration of the Farnsworth House and reproducing its assemblage across game levels of different difficulty that correspond to varying BIM levels of development (LODs). The game design prototype also provides an entry point and learning style for users with or without formal architectural or construction education who seek to understand design tectonics within diverse or cross-disciplinary study cases. This paper shows that developing geometric abstract concepts of design pedagogy, using varying LODs for game content and levels while using newly developed features such as snap-to-grid, snap-to-position, and snap-to-angle to improve user engagement during assemblage, may provide deeper learning objectives for architectural precedent study.
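
Snap-to-grid, snap-to-position, and snap-to-angle can each be expressed as rounding to the nearest allowed value or target. The sketch below uses a 0.5 m grid, a 15° angle step, and a 0.1 m capture radius; these values and the function names are assumptions for illustration, not the prototype's game-engine settings.

```python
import math

def snap_to_grid(x, step=0.5):
    """Round a coordinate to the nearest grid line (Python rounds exact ties to even)."""
    return round(x / step) * step

def snap_to_angle(angle_deg, step=15.0):
    """Round a rotation to the nearest multiple of `step`, wrapped to [0, 360)."""
    return (round(angle_deg / step) * step) % 360.0

def snap_to_position(pos, target, radius=0.1):
    """Snap a part onto its target slot once it is within `radius`."""
    return target if math.dist(pos, target) <= radius else pos
```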

HC | Sep 2, 2021

Learning Geometric Transformations for Parametric Design: An Augmented Reality (AR)-Powered Approach

Zohreh Shaghaghian, Heather Burte, Dezhen Song et al.

Despite the remarkable development of parametric modeling methods for architectural design, a significant problem still exists: the lack of knowledge and skill regarding the professional implementation of parametric design in architectural modeling. Considering the numerous advantages of digital/parametric modeling in rapid prototyping and simulation, most instructors encourage students to use digital modeling even from the early stages of design; however, an appropriate context to learn the basics of digital design thinking is rarely provided in architectural pedagogy. This paper presents an educational tool, specifically an Augmented Reality (AR) intervention, to help students understand the fundamental concepts of parametric modeling before diving into complex parametric modeling platforms. The goal of the AR intervention is to illustrate geometric transformations and the associated math functions so that students learn the mathematical logic behind the algorithmic thinking of parametric modeling. We have developed BRICKxAR_T, an educational AR prototype intended to help students learn geometric transformations in an immersive spatial AR environment. A LEGO set is used within the AR intervention as a physical manipulative to support physical interaction and improve spatial skill through body gesture.
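
The math behind geometric transformations in parametric tools can be shown with 2D homogeneous coordinates, where translation, rotation, and scaling are 3×3 matrices and applying them in sequence is matrix multiplication. The sketch below is an illustrative example, not part of BRICKxAR_T; it highlights that transformation order matters.

```python
import numpy as np

def translate(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def rotate(deg):
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def scale(sx, sy):
    return np.diag([sx, sy, 1.0])

corner = np.array([1.0, 0.0, 1.0])        # point (1, 0) in homogeneous form

# Matrices apply right to left: rotate first, then translate ...
a = translate(2, 0) @ rotate(90) @ corner   # approximately (2, 1)
# ... versus translate first, then rotate.
b = rotate(90) @ translate(2, 0) @ corner   # approximately (0, 3)
c = scale(2, 3) @ corner                    # (2, 0)
```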

LG | Aug 29, 2021

Convolutional versus Dense Neural Networks: Comparing the Two Neural Networks Performance in Predicting Building Operational Energy Use Based on the Building Shape

Farnaz Nazari, Wei Yan

A building's self-shading shape substantially affects the amount of direct sunlight the building receives and contributes significantly to its operational energy use, in addition to other major contributing variables such as materials and window-to-wall ratios. Deep learning has the potential to assist designers and engineers by efficiently predicting building energy performance. This paper assesses the applicability of two neural network structures, the Dense Neural Network (DNN) and the Convolutional Neural Network (CNN), for predicting building operational energy use with respect to building shape. The comparison shows that the DNN model surpasses the CNN model in performance, simplicity, and computation time. However, the image-based CNN has the benefit of using architectural graphics that facilitate design communication.

HC | Jun 7, 2021

Towards Learning Geometric Transformations through Play: An AR-powered approach

Zohreh Shaghaghian, Wei Yan, Dezhen Song

Despite the extensive development of architectural parametric platforms, parametric design is often interpreted as an architectural style rather than a computational method. Moreover, there is still a lack of knowledge and skill regarding the technical application of parametric design in architectural modelling. Students often dive into complex digital modelling without a competent pedagogical context in which to learn algorithmic thinking and the corresponding logic behind digital and parametric modelling. Insufficient skills and superficial knowledge often result in using the modelling software through trial and error, without taking full advantage of what it has to offer. This study explores geometric transformations, as the fundamental functions of parametric modelling, to anchor the learning of its essential components. Students need to understand the differences between variables, parameters, and functions, and their relations. Fologram, an Augmented Reality tool, is used in this study to learn geometric transformations and their components in an intuitive way. A LEGO set is used as an editable physical model to improve spatial skill through hand movement alongside instant feedback in the physical environment.

LG | May 10, 2021

BIM Hyperreality: Data Synthesis Using BIM and Hyperrealistic Rendering for Deep Learning

Mohammad Alawadhi, Wei Yan

Deep learning is expected to offer new opportunities and a new paradigm for the field of architecture. One such opportunity is teaching neural networks to visually understand architectural elements from the built environment. However, the availability of large training datasets is one of the biggest limitations of neural networks. Also, the vast majority of training data for visual recognition tasks is annotated by humans. To resolve this bottleneck, we present a concept of a hybrid system using both building information modeling (BIM) and hyperrealistic (photorealistic) rendering to synthesize datasets for training a neural network for building object recognition in photos. To generate our training dataset, BIMrAI, we used an existing BIM model and a corresponding photorealistically rendered model of the same building. We created methods for using renderings to train a deep learning model, trained a generative adversarial network (GAN) model using these methods, and tested the output model on real-world photos. For the specific case study presented in this paper, our results show that a neural network trained with synthetic data (i.e., photorealistic renderings and BIM-based semantic labels) can be used to identify building objects from photos without using photos in the training data. Future work can enhance the presented methods using available BIM models and renderings for more generalized mapping and description of photographed built environments.

LG | Jun 10, 2020

AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System

Pengyu Zhao, Kecheng Xiao, Yuanxing Zhang et al.

Recently, deep learning models have been widely deployed in industrial recommender systems and have boosted recommendation quality. Despite this remarkable success, the design of task-aware recommender systems usually requires manual feature engineering and architecture engineering from domain experts. To relieve these human efforts, we explore the potential of neural architecture search (NAS) and introduce AMEIR for Automatic behavior Modeling, interaction Exploration and multi-layer perceptron (MLP) Investigation in the Recommender system. The core contributions of AMEIR are the three-stage search space and the tailored three-step search pipeline. Specifically, AMEIR divides the complete recommendation model into three stages (behavior modeling, interaction exploration, and MLP aggregation) and introduces a novel search space containing three tailored subspaces that cover most existing methods and thus allow for searching better models. To find the ideal architecture efficiently and effectively, AMEIR performs one-shot random search progressively over the three stages and assembles the search results as the final outcome. Further analysis reveals that AMEIR's search space covers most representative recommendation models, which demonstrates the universality of our design. Extensive experiments over various scenarios show that AMEIR outperforms competitive baselines of elaborate manual design and leading, algorithmically complex NAS methods with lower model complexity and comparable time cost, indicating the efficacy, efficiency, and robustness of the proposed method.
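
The progressive, stage-by-stage idea can be sketched as a greedy random search that fixes the best choice for one stage before sampling the next. Everything below is a placeholder assumption: the three toy subspaces, the sample count, and `proxy_score`, which stands in for evaluating a candidate with one-shot (weight-sharing) training and returns a deterministic fake score instead.

```python
import random

# Hypothetical, heavily simplified subspaces for the three stages.
SEARCH_SPACE = {
    "behavior_modeling": ["mean_pool", "attention", "gru"],
    "interaction": ["concat", "dot", "bilinear"],
    "mlp": [(64,), (128, 64), (256, 128, 64)],
}

def proxy_score(arch):
    """Placeholder for validating a candidate with shared one-shot weights."""
    rng = random.Random(str(sorted(arch.items())))   # deterministic fake score
    return rng.random()

def progressive_random_search(space, samples_per_stage=5, seed=0):
    rng = random.Random(seed)
    fixed = {}
    for stage, options in space.items():             # stages searched in order
        best_choice, best_score = None, float("-inf")
        for _ in range(samples_per_stage):
            choice = rng.choice(options)
            score = proxy_score({**fixed, stage: choice})
            if score > best_score:
                best_choice, best_score = choice, score
        fixed[stage] = best_choice                    # freeze before the next stage
    return fixed

arch = progressive_random_search(SEARCH_SPACE)
```

Searching stages in sequence keeps the number of evaluated candidates additive across stages rather than multiplicative.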

DBJan 19, 2020

SQLFlow: A Bridge between SQL and Machine Learning

Yi Wang, Yang Yang, Weiguo Zhu et al.

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online microservices and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their own SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implemented the extension with a newly devised collaborative parsing algorithm. SQLFlow is efficient and expressive across a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; and data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.

CVDec 28, 2019

Application of Deep Learning in Generating Desired Design Options: Experiments Using Synthetic Training Dataset

Zohreh Shaghaghian, Wei Yan

Most design methods follow a forward framework, asking for the primary specifications of a building to generate an output or assess its performance. However, architects often pursue specific objectives while being uncertain of the proper design parameters. Deep learning (DL) algorithms provide an intelligent workflow in which the system can learn from sequential training experiments. This study applies a method using DL algorithms to generate desired design options. First, an object recognition problem is investigated: the model predicts the labels of unseen sample images based on a training dataset consisting of different types of synthetic 2D shapes. Then, a generative DL algorithm is trained to generate new shapes for given labels. In the next step, the algorithm is trained to generate a window/wall pattern for a desired light/shadow performance based on the spatial daylight autonomy (sDA) metric. The experiments show promising results both in predicting unseen sample shapes and in generating new design options.
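The abstract uses sDA as the performance target without defining it. A minimal sketch, assuming the commonly used sDA300/50% definition (the share of analysis points that receive at least 300 lux for at least 50% of occupied hours) and an hourly illuminance grid as input; the paper's exact simulation setup is not described here.

```python
import numpy as np


def spatial_daylight_autonomy(illuminance, threshold_lux=300.0, time_fraction=0.5):
    """Return sDA in percent.

    illuminance: array of shape (occupied_hours, analysis_points), in lux.
    Assumes the sDA300/50% convention by default.
    """
    illuminance = np.asarray(illuminance, dtype=float)
    if illuminance.ndim != 2:
        raise ValueError("expected a (hours, points) array")
    # Fraction of occupied hours during which each point meets the threshold.
    hours_met = (illuminance >= threshold_lux).mean(axis=0)
    # A point counts as daylit if it meets the threshold often enough.
    daylit = hours_met >= time_fraction
    # sDA is the percentage of analysis points that are daylit.
    return float(100.0 * daylit.mean())


if __name__ == "__main__":
    # Hypothetical 4 occupied hours x 3 analysis points.
    grid = [
        [500, 100, 320],
        [400, 200, 290],
        [350, 250, 310],
        [50, 300, 100],
    ]
    # Points 0 and 2 are daylit, point 1 is not.
    print(round(spatial_daylight_autonomy(grid), 1))  # -> 66.7
```

A generative model targeting a given sDA would be trained to produce window/wall patterns whose simulated grids score near that value; the scoring step itself reduces to a threshold-and-count like the one above.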

HCJul 29, 2019

Augmented Reality Applied to LEGO Construction: AR-based Building Instructions with High Accuracy & Precision and Realistic Object-Hand Occlusions

Wei Yan

BRICKxAR is a novel Augmented Reality (AR) instruction method for construction toys such as LEGO. With BRICKxAR, physical LEGO construction is guided by virtual bricks. Compared with the state of the art, the accuracy of the virtual-physical model alignment is significantly improved through a new design of marker-based registration, which achieves an average error of less than 1 mm throughout the model. Realistic object occlusion reveals the true spatial relationship between physical and virtual bricks. Detection and occlusion of the LEGO player's hands visualize the correct spatial relationship between real hands and virtual bricks and allow virtual bricks to be "grasped" by real hands. Together, these features make AR instructions possible for small-parts assembly, as validated through a working AR prototype for constructing the LEGO Arc de Triomphe, quantitative measurements of registration and occlusion accuracy, and a heuristic evaluation of the AR instruction features.
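The abstract reports an average registration error below 1 mm but does not say how it is computed. One common way to express such a figure, assumed here rather than taken from the paper, is the mean Euclidean distance between corresponding virtual and physical reference points measured in millimetres; the point pairs below are hypothetical.

```python
import numpy as np


def mean_registration_error_mm(virtual_pts, physical_pts):
    """Mean Euclidean distance (mm) between corresponding 3D points.

    Assumes row i of each array refers to the same reference point.
    """
    v = np.asarray(virtual_pts, dtype=float)
    p = np.asarray(physical_pts, dtype=float)
    if v.shape != p.shape or v.ndim != 2 or v.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matching shape")
    # Per-point distance, then the average over all measured points.
    return float(np.linalg.norm(v - p, axis=1).mean())


if __name__ == "__main__":
    virtual = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
    physical = [[0.3, 0.4, 0.0], [10.0, 0.0, 0.6]]
    # Per-point errors are 0.5 mm and 0.6 mm, so the mean is about 0.55 mm.
    print(mean_registration_error_mm(virtual, physical))
```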

CVApr 18, 2019

Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds

Wei Yan, Yiting Shao, Shan Liu et al.

Point clouds are a fundamental 3D representation widely used in real-world applications such as autonomous driving. As a newly developed media format characterized by complexity and irregularity, point clouds create a need for compression algorithms that are more flexible than existing codecs. Recently, autoencoders (AEs) have shown their effectiveness in many visual analysis tasks as well as in image compression, which inspires us to employ them for point cloud compression. In this paper, we propose a general autoencoder-based architecture for lossy geometry compression of point clouds. To the best of our knowledge, it is the first autoencoder-based geometry compression codec that directly takes point clouds as input rather than voxel grids or collections of images. Compared with handcrafted codecs, this approach adapts much more quickly to previously unseen media content and media formats while achieving competitive performance. Our architecture consists of a PointNet-based encoder, a uniform quantizer, an entropy estimation block, and a nonlinear synthesis transformation module. In lossy geometry compression of point clouds, results show that the proposed method outperforms the test model for categories 1 and 3 (TMC13) published by the MPEG-3DG group at its 125th meeting, achieving an average BD-rate gain of 73.15%.
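The quantizer and entropy-estimation blocks named above can be illustrated in a simplified form. This sketch is not the paper's code: it uses the additive-uniform-noise proxy that is widely used when training learned codecs, plain rounding at test time, and, in place of a learned entropy model, the empirical Shannon entropy of the quantized symbols as a rough bit-count estimate.

```python
import numpy as np


def uniform_quantize(latent, step=1.0, training=False, rng=None):
    """Uniform scalar quantization of a latent code."""
    latent = np.asarray(latent, dtype=float)
    if training:
        # Differentiable proxy often used during training: add noise in [-step/2, step/2).
        if rng is None:
            rng = np.random.default_rng(0)
        return latent + rng.uniform(-step / 2, step / 2, size=latent.shape)
    # At test time, snap each value to the nearest multiple of `step`.
    return step * np.round(latent / step)


def empirical_bits(symbols):
    """Estimated total code length in bits from the empirical symbol distribution.

    A simple stand-in for a learned entropy model (an assumption of this sketch).
    """
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    probs = counts / counts.sum()
    return float(-(counts * np.log2(probs)).sum())


if __name__ == "__main__":
    latent = [0.2, 0.9, 1.1, 0.4, 1.6]
    q = uniform_quantize(latent)
    print(q)  # -> [0. 1. 1. 0. 2.]
    print(round(empirical_bits(q), 2))  # about 7.61 bits for 5 symbols
```

During training, the bit estimate would be added to a geometry distortion term to form a rate-distortion loss; the noise proxy is what keeps that loss differentiable with respect to the encoder.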

CRFeb 25, 2019

DRAMNet: Authentication based on Physical Unique Features of DRAM Using Deep Convolutional Neural Networks

Nima Karimian, Fatemeh Tehranipoor, Nikolaos Anagnostopoulos et al.

There is increasing interest in the development of autonomous vehicles (AVs). However, two types of attack challenges that can affect AVs have yet to be resolved: sensor attacks and vehicle access attacks. To the best of our knowledge, this paper is the first to propose an authentication scheme based on the unique power-up features of DRAM using a deep convolutional neural network (CNN), which can be used to implement secure access control for autonomous vehicles. Our approach consists of two parts. First, we convert raw power-up sequence data from DRAM cells into a two-dimensional (2D) format to generate a DRAM image structure. Second, we apply a deep CNN to the DRAM images to extract unique features from each memory and classify them for authentication. To evaluate the proposed approach, we use data from three commercial off-the-shelf (COTS) DRAMs collected under various environmental and other conditions (high/low temperature, high/low supply voltage, and aging effects). Our results show that the proposed authentication method, "DRAMNet", achieves 98.63% accuracy and 98.49% precision. Compared with other state-of-the-art CNN architectures, such as AlexNet and VGGNet, DRAMNet performs as well or better.
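The first step, turning a 1D power-up sequence into a 2D "DRAM image", can be sketched as follows. The abstract does not give the layout, so the row-major ordering, the chosen `width`, and the zero padding of the last row are all hypothetical choices for illustration.

```python
import numpy as np


def power_up_to_image(bits, width):
    """Reshape a 1D sequence of DRAM power-up bits into a 2D array.

    Assumes row-major layout; the final row is zero-padded if needed.
    """
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.ndim != 1 or width <= 0:
        raise ValueError("expected a 1D bit sequence and a positive width")
    height = -(-bits.size // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: bits.size] = bits
    return padded.reshape(height, width)


if __name__ == "__main__":
    image = power_up_to_image([1, 0, 1, 1, 0, 0, 1], width=3)
    print(image)
    # -> [[1 0 1]
    #     [1 0 0]
    #     [1 0 0]]
    # A CNN would typically consume this as float input with a channel axis.
    cnn_input = image.astype(np.float32)[np.newaxis, :, :]
    print(cnn_input.shape)  # -> (1, 3, 3)
```

Keeping the cell order intact in the 2D layout preserves the spatial neighborhood of cells, which is the kind of local structure a convolutional network can exploit.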