Mohammad Amin Shabani

h-index7

4papers

162citations

Novelty44%

AI Score27

Ranked #157,983 of 194,257 authors (top 81%)#51,214 in CV (top 87%)

4 Papers

25.4CVNov 23, 2022Code

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Mohammad Amin Shabani, Sepidehsadat Hosseini, Yasutaka Furukawa

The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at the core, which controls the attention masks based on the input graph-constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on RPLAN dataset. The proposed approach makes significant improvements in all the metrics against the state-of-the-art with significant margins, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room. A project website with supplementary video and document is here https://aminshabani.github.io/housediffusion.

18.4AINov 24, 2022

PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving

Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust et al.

This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system "PuzzleFusion" takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. A surprising discovery of the paper is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements: 1) 2D Voronoi jigsaw dataset, a synthetic one where pieces are generated by Voronoi diagram of 2D pointset; and 2) MagicPlan dataset, a real one offered by MagicPlan from its production pipeline, where pieces are room layouts constructed by augmented reality App by real-estate consumers. The qualitative and quantitative evaluations demonstrate that our approach outperforms the competing methods by significant margins in all the tasks.

1.8CVJul 8, 2019

Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Frame Work to Solve Face Related Tasksks, Multi Task Learning, Semi-Supervised Learning

Sepidehsadat Hosseini, Mohammad Amin Shabani, Nam Ik Cho

We propose a new semi-supervised learning method on face-related tasks based on Multi-Task Learning (MTL) and data distillation. The proposed method exploits multiple datasets with different labels for different-but-related tasks such as simultaneous age, gender, race, facial expression estimation. Specifically, when there are only a few well-labeled data for a specific task among the multiple related ones, we exploit the labels of other related tasks in different domains. Our approach is composed of (1) a new MTL method which can deal with weakly labeled datasets and perform several tasks simultaneously, and (2) an MTL-based data distillation framework which enables network generalization for the training and test data from different domains. Experiments show that the proposed multi-task system performs each task better than the baseline single task. It is also demonstrated that using different domain datasets along with the main dataset can enhance network generalization and overcome the domain differences between datasets. Also, comparing data distillation both on the baseline and MTL framework, the latter shows more accurate predictions on unlabeled data from different domains. Furthermore, by proposing a new learning-rate optimization method, our proposed network is able to dynamically tune its learning rate.

5.7HCSep 12, 2017

A low cost non-wearable gaze detection system based on infrared image processing

Ehsan Arbabi, Mohammad Shabani, Ali Yarigholi

Human eye gaze detection plays an important role in various fields, including human-computer interaction, virtual reality and cognitive science. Although different relatively accurate systems of eye tracking and gaze detection exist, they are usually either too expensive to be bought for low cost applications or too complex to be implemented easily. In this article, we propose a non-wearable system for eye tracking and gaze detection with low complexity and cost. The proposed system provides a medium accuracy which makes it suitable for general applications in which low cost and easy implementation is more important than achieving very precise gaze detection. The proposed method includes pupil and marker detection using infrared image processing, and gaze evaluation using an interpolation-based strategy. The interpolation-based strategy exploits the positions of the detected pupils and markers in a target captured image and also in some previously captured training images for estimating the position of a point that the user is gazing at. The proposed system has been evaluated by three users in two different lighting conditions. The experimental results show that the accuracy of this low cost system can be between 90% and 100% for finding major gazing directions.