CVJun 13, 2022
Efficient Human-in-the-loop System for Guiding DNNs AttentionYi He, Xi Yang, Chia-Ming Chang et al.
Attention guidance is an approach to addressing dataset bias in deep learning, where the model relies on incorrect features to make decisions. Focusing on image classification tasks, we propose an efficient human-in-the-loop system to interactively direct the attention of classifiers to the regions specified by users, thereby reducing the influence of co-occurrence bias and improving the transferability and interpretability of a DNN. Previous approaches for attention guidance require the preparation of pixel-level annotations and are not designed as interactive systems. We present a new interactive method to allow users to annotate images with simple clicks, and study a novel active learning strategy to significantly reduce the number of annotations. We conducted both a numerical evaluation and a user study to evaluate the proposed system on multiple datasets. Compared to the existing non-active-learning approach which usually relies on huge amounts of polygon-based segmentation masks to fine-tune or train the DNNs, our system can save lots of labor and money and obtain a fine-tuned network that works better even when the dataset is biased. The experiment results indicate that the proposed system is efficient, reasonable, and reliable.
CVMar 29, 2022
AutoPoly: Predicting a Polygonal Mesh Construction Sequence from a Silhouette ImageI-Chao Shen, Yu Ju Chen, Oliver van Kaick et al.
Polygonal modeling is a core task of content creation in Computer Graphics. The complexity of modeling, in terms of the number and the order of operations and time required to execute them makes it challenging to learn and execute. Our goal is to automatically derive a polygonal modeling sequence for a given target. Then, one can learn polygonal modeling by observing the resulting sequence and also expedite the modeling process by starting from the auto-generated result. As a starting point for building a system for 3D modeling in the future, we tackle the 2D shape modeling problem and present AutoPoly, a hybrid method that generates a polygonal mesh construction sequence from a silhouette image. The key idea of our method is the use of the Monte Carlo tree search (MCTS) algorithm and differentiable rendering to separately predict sequential topological actions and geometric actions. Our hybrid method can alter topology, whereas the recently proposed inverse shape estimation methods using differentiable rendering can only handle a fixed topology. Our novel reward function encourages MCTS to select topological actions that lead to a simpler shape without self-intersection. We further designed two deep learning-based methods to improve the expansion and simulation steps in the MCTS search process: an $n$-step "future action prediction" network (nFAP-Net) to generate candidates for potential topological actions, and a shape warping network (WarpNet) to predict polygonal shapes given the predicted rendered images and topological actions. We demonstrate the efficiency of our method on 2D polygonal shapes of multiple man-made object categories.
66.7GRMay 25
Garment Particles: A 2D--3D Symmetric Garment Representation for Generation and EditingKiyohiro Nakayama, I-chao Shen, Ruofan Liu et al.
Practical garment design spans two modes: intuitive creation from high-level intent, such as a reference image or text description, and complex low-level editing across 2D sewing patterns and 3D draped geometry, which requires professional training to navigate their complex interdependencies. Yet existing frameworks address only part of this challenge, offering either garment generation from casual inputs or direct editing on sewing patterns. To support both ends of the spectrum, we propose Garment Particles, a 5D point-cloud representation that jointly encodes 2D sewing patterns and 3D geometry. This representation enables Garment Particles Flow (GPF), a rectified flow framework that supports intuitive generation from high-level inputs (text, images, sketches) and various editing operations on 2D sewing patterns and 3D geometries via diffusion posterior sampling. Finally, we introduce Particles-to-Pattern Flow that converts generated garment particles into curved-based patterns for simulation. We validate our model's generation ability on multiple datasets, achieving state-of-the-art garment generation results against competitive baselines. Our model also enables many garment editing scenarios, including garment interpolation, sewing pattern editing, point-cloud- and silhouette-conditioned garment generation. Our project website is at https://garment-particles.github.io .
HCFeb 2
See2Refine: Vision-Language Feedback Improves LLM-Based eHMI Action DesignersDing Xia, Xinyue Gui, Mark Colley et al.
Automated vehicles lack natural communication channels with other road users, making external Human-Machine Interfaces (eHMIs) essential for conveying intent and maintaining trust in shared environments. However, most eHMI studies rely on developer-crafted message-action pairs, which are difficult to adapt to diverse and dynamic traffic contexts. A promising alternative is to use Large Language Models (LLMs) as action designers that generate context-conditioned eHMI actions, yet such designers lack perceptual verification and typically depend on fixed prompts or costly human-annotated feedback for improvement. We present See2Refine, a human-free, closed-loop framework that uses vision-language model (VLM) perceptual evaluation as automated visual feedback to improve an LLM-based eHMI action designer. Given a driving context and a candidate eHMI action, the VLM evaluates the perceived appropriateness of the action, and this feedback is used to iteratively revise the designer's outputs, enabling systematic refinement without human supervision. We evaluate our framework across three eHMI modalities (lightbar, eyes, and arm) and multiple LLM model sizes. Across settings, our framework consistently outperforms prompt-only LLM designers and manually specified baselines in both VLM-based metrics and human-subject evaluations. Results further indicate that the improvements generalize across modalities and that VLM evaluations are well aligned with human preferences, supporting the robustness and effectiveness of See2Refine for scalable action design.
CVNov 29, 2023
MicroGlam: Microscopic Skin Image Dataset with CosmeticsToby Chong, Alina Chadwick, I-chao Shen et al.
In this paper, we present a cosmetic-specific skin image dataset. It consists of skin images from $45$ patches ($5$ skin patches each from $9$ participants) of size $8mm^*8mm$ under three cosmetic products (i.e., foundation, blusher, and highlighter). We designed a novel capturing device inspired by Light Stage. Using the device, we captured over $600$ images of each skin patch under diverse lighting conditions in $30$ seconds. We repeated the process for the same skin patch under three cosmetic products. Finally, we demonstrate the viability of the dataset with an image-to-image translation-based pipeline for cosmetic rendering and compared our data-driven approach to an existing cosmetic rendering method.
CVNov 28, 2025Code
Cascaded Robust Rectification for Arbitrary Document ImagesChaoyun Wang, Quanxin Huang, I-Chao Shen et al.
Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric rectification quality from the layout analysis errors of OCR engines, and masked AD/AAD (AD-M/AAD-M) tailored for accurately evaluating geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method establishes new state-of-the-art performance on multiple challenging benchmarks, yielding a substantial reduction of 14.1\%--34.7\% in the AAD metric and demonstrating superior efficacy in real-world applications. The code will be publicly available at https://github.com/chaoyunwang/ArbDR.
CVFeb 24
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian SplattingShuangkang Fang, I-Chao Shen, Xuanyang Zhang et al.
Recent 3D Gaussian Splatting (3DGS) Dropout methods address overfitting under sparse-view conditions by randomly nullifying Gaussian opacities. However, we identify a neighbor compensation effect in these approaches: dropped Gaussians are often compensated by their neighbors, weakening the intended regularization. Moreover, these methods overlook the contribution of high-degree spherical harmonic coefficients (SH) to overfitting. To address these issues, we propose DropAnSH-GS, a novel anchor-based Dropout strategy. Rather than dropping Gaussians independently, our method randomly selects certain Gaussians as anchors and simultaneously removes their spatial neighbors. This effectively disrupts local redundancies near anchors and encourages the model to learn more robust, globally informed representations. Furthermore, we extend the Dropout to color attributes by randomly dropping higher-degree SH to concentrate appearance information in lower-degree SH. This strategy further mitigates overfitting and enables flexible post-training model compression via SH truncation. Experimental results demonstrate that DropAnSH-GS substantially outperforms existing Dropout methods with negligible computational overhead, and can be readily integrated into various 3DGS variants to enhance their performances. Project Website: https://sk-fun.fun/DropAnSH-GS
CVJul 20, 2025Code
Axis-Aligned Document DewarpingChaoyun Wang, I-Chao Shen, Takeo Igarashi et al.
Document dewarping is crucial for many applications. However, existing learning-based methods rely heavily on supervised regression with annotated data without fully leveraging the inherent geometric properties of physical documents. Our key insight is that a well-dewarped document is defined by its axis-aligned feature lines. This property aligns with the inherent axis-aligned nature of the discrete grid geometry in planar documents. Harnessing this property, we introduce three synergistic contributions: for the training phase, we propose an axis-aligned geometric constraint to enhance document dewarping; for the inference phase, we propose an axis alignment preprocessing strategy to reduce the dewarping difficulty; and for the evaluation phase, we introduce a new metric, Axis-Aligned Distortion (AAD), that not only incorporates geometric meaning and aligns with human visual perception but also demonstrates greater robustness. As a result, our method achieves state-of-the-art performance on multiple existing benchmarks, improving the AAD metric by 18.2% to 34.5%. The code is publicly available at https://github.com/chaoyunwang/AADD.
IVMar 2, 2020Code
IntrA: 3D Intracranial Aneurysm Dataset for Deep LearningXi Yang, Ding Xia, Taichi Kin et al.
Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clipping operation in medicine and other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: https://github.com/intra3d2019/IntrA.
HCDec 31, 2024
Proactive Conversational Agents with Inner ThoughtsXingyu Bruce Liu, Shitao Fang, Weiyan Shi et al.
One of the long-standing aspirations in conversational AI is to allow them to autonomously take initiatives in conversations, i.e., being proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
CVMar 11, 2024
FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font ApplicationsYuki Tatsukawa, I-Chao Shen, Anran Qi et al.
Acquiring the desired font for various design tasks can be challenging and requires professional typographic knowledge. While previous font retrieval or generation works have alleviated some of these difficulties, they often lack support for multiple languages and semantic attributes beyond the training data domains. To solve this problem, we present FontCLIP: a model that connects the semantic understanding of a large vision-language model with typographical knowledge. We integrate typography-specific knowledge into the comprehensive vision-language knowledge of a pretrained CLIP model through a novel finetuning approach. We propose to use a compound descriptive prompt that encapsulates adaptively sampled attributes from a font attribute dataset focusing on Roman alphabet characters. FontCLIP's semantic typographic latent space demonstrates two unprecedented generalization abilities. First, FontCLIP generalizes to different languages including Chinese, Japanese, and Korean (CJK), capturing the typographical features of fonts across different languages, even though it was only finetuned using fonts of Roman characters. Second, FontCLIP can recognize the semantic attributes that are not presented in the training data. FontCLIP's dual-modality and generalization abilities enable multilingual and cross-lingual font retrieval and letter shape optimization, reducing the burden of obtaining desired fonts.
GRAug 2, 2025
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D MeshShuangkang Fang, I-Chao Shen, Yufeng Wang et al.
We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses key limitations in existing methods, including the limited dataset scale when catering to LLMs' token length and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. This enables the creation of a large-scale dataset with 1500k+ samples, almost 50 times larger than previous methods, which aligns better with the LLM scaling law principles. Furthermore, we propose inferring face connectivity from vertices and local mesh assembly training strategies, significantly enhancing the LLMs' ability to capture mesh topology and spatial structures. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its great potential in processing text-serialized 3D meshes.
CVJul 31, 2025
NeRF Is a Valuable Assistant for 3D Gaussian SplattingShuangkang Fang, I-Chao Shen, Takeo Igarashi et al.
We introduce NeRF-GS, a novel framework that jointly optimizes Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). This framework leverages the inherent continuous spatial representation of NeRF to mitigate several limitations of 3DGS, including sensitivity to Gaussian initialization, limited spatial awareness, and weak inter-Gaussian correlations, thereby enhancing its performance. In NeRF-GS, we revisit the design of 3DGS and progressively align its spatial features with NeRF, enabling both representations to be optimized within the same scene through shared 3D spatial information. We further address the formal distinctions between the two approaches by optimizing residual vectors for both implicit features and Gaussian positions to enhance the personalized capabilities of 3DGS. Experimental results on benchmark datasets show that NeRF-GS surpasses existing methods and achieves state-of-the-art performance. This outcome confirms that NeRF and 3DGS are complementary rather than competing, offering new insights into hybrid approaches that combine 3DGS and NeRF for efficient 3D scene representation.
GRJun 12, 2025
Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-OnZaiqiang Wu, Yechen Li, Jingyuan Liu et al.
Existing image-based virtual try-on methods are often limited to the front view and lack real-time performance. While per-garment virtual try-on methods have tackled these issues by capturing per-garment datasets and training per-garment neural networks, they still encounter practical limitations: (1) the robotic mannequin used to capture per-garment datasets is prohibitively expensive for widespread adoption and fails to accurately replicate natural human body deformation; (2) the synthesized garments often misalign with the human body. To address these challenges, we propose a low-barrier approach for collecting per-garment datasets using real human bodies, eliminating the necessity for a customized robotic mannequin. We also introduce a hybrid person representation that enhances the existing intermediate representation with a simplified DensePose map. This ensures accurate alignment of synthesized garment images with the human body and enables human-garment interaction without the need for customized wearable devices. We performed qualitative and quantitative evaluations against other state-of-the-art image-based virtual try-on methods and conducted ablation studies to demonstrate the superiority of our method regarding image quality and temporal consistency. Finally, our user study results indicated that most participants found our virtual try-on system helpful for making garment purchasing decisions.
68.5HCMar 12
Exploring the Role of User Comments Throughout the Stages of Video-Based Task-LearningNayoung Kim, Yotam Sechayk, Zhongyi Zhou et al.
Learning tasks through videos is a dynamic way to acquire skills by witnessing entire processes. However, compared to in-person demonstrations, videos may omit tacit knowledge, including subtle details and contextual nuances. Users' unique circumstances, like missing ingredients in a recipe, may also require adaptation beyond the video content. To fill these gaps, many users turn to the comment section, seeking additional guidance and interactions with creators or peers to personalize their experience. Despite their importance, there is limited understanding of how users engage with and apply comments in task-learning scenarios. In our study, we explore the role of comments in video-based task-learning through interviews with 14 users, and co-watching sessions with eight. Our findings show that while comments are critical for learning, they are poorly integrated into all stages of the learning process. Based on our findings, we outline design opportunities to better utilize comments in video-based task-learning.
GRJun 14, 2025
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting GarmentsZaiqiang Wu, I-Chao Shen, Takeo Igarashi
Per-garment virtual try-on methods collect garment-specific datasets and train networks tailored to each garment to achieve superior results. However, these approaches often struggle with loose-fitting garments due to two key limitations: (1) They rely on human body semantic maps to align garments with the body, but these maps become unreliable when body contours are obscured by loose-fitting garments, resulting in degraded outcomes; (2) They train garment synthesis networks on a per-frame basis without utilizing temporal information, leading to noticeable jittering artifacts. To address the first limitation, we propose a two-stage approach for robust semantic map estimation. First, we extract a garment-invariant representation from the raw input image. This representation is then passed through an auxiliary network to estimate the semantic map. This enhances the robustness of semantic map estimation under loose-fitting garments during garment-specific dataset generation. To address the second limitation, we introduce a recurrent garment synthesis framework that incorporates temporal dependencies to improve frame-to-frame coherence while maintaining real-time performance. We conducted qualitative and quantitative evaluations to demonstrate that our method outperforms existing approaches in both image quality and temporal coherence. Ablation studies further validate the effectiveness of the garment-invariant representation and the recurrent synthesis framework.
GRSep 10, 2021
Per Garment Capture and Synthesis for Real-time Virtual Try-onToby Chong, I-Chao Shen, Nobuyuki Umetani et al.
Virtual try-on is a promising application of computer graphics and human computer interaction that can have a profound real-world impact especially during this pandemic. Existing image-based works try to synthesize a try-on image from a single image of a target garment, but it inherently limits the ability to react to possible interactions. It is difficult to reproduce the change of wrinkles caused by pose and body size change, as well as pulling and stretching of the garment by hand. In this paper, we propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images. Our workflow is composed of two parts: garment capturing and clothed person image synthesis. We designed an actuated mannequin and an efficient capturing process that collects the detailed deformations of the target garments under diverse body sizes and poses. Furthermore, we proposed to use a custom-designed measurement garment, and we captured paired images of the measurement garment and the target garments. We then learn a mapping between the measurement garment and the target garments using deep image-to-image translation. The customer can then try on the target garments interactively during online shopping.
HCJul 27, 2021
Guided Optimization for Image Processing PipelinesYuka Ikarashi, Jonathan Ragan-Kelley, Tsukasa Fukusato et al.
Writing high-performance image processing code is challenging and labor-intensive. The Halide programming language simplifies this task by decoupling high-level algorithms from "schedules" which optimize their implementation. However, even with this abstraction, it is still challenging for Halide programmers to understand complicated scheduling strategies and productively write valid, optimized schedules. To address this, we propose a programming support method called "guided optimization." Guided optimization provides programmers a set of valid optimization options and interactive feedback about their current choices, which enables them to comprehend and efficiently optimize image processing code without the time-consuming trial-and-error process of traditional text editors. We implemented a proof-of-concept system, Roly-poly, which integrates guided optimization, program visualization, and schedule cost estimation to support the comprehension and development of efficient Halide image processing code. We conducted a user study with novice Halide programmers and confirmed that Roly-poly and its guided optimization was informative, increased productivity, and resulted in higher-performing schedules in less time.
HCMay 27, 2021
A Survey on Interactive Reinforcement Learning: Design Principles and Open ChallengesChristian Arzate Cruz, Takeo Igarashi
Interactive reinforcement learning (RL) has been successfully used in various applications in different fields, which has also motivated HCI researchers to contribute in this area. In this paper, we survey interactive RL to empower human-computer interaction (HCI) researchers with the technical background in RL needed to design new interaction techniques and propose new applications. We elucidate the roles played by HCI researchers in interactive RL, identifying ideas and promising research directions. Furthermore, we propose generic design principles that will provide researchers with a guide to effectively implement interactive RL applications.
HCMay 27, 2021
MarioMix: Creating Aligned Playstyles for Bots with Interactive Reinforcement LearningChristian Arzate Cruz, Takeo Igarashi
In this paper, we propose a generic framework that enables game developers without knowledge of machine learning to create bot behaviors with playstyles that align with their preferences. Our framework is based on interactive reinforcement learning (RL), and we used it to create a behavior authoring tool called MarioMix. This tool enables non-experts to create bots with varied playstyles for the game titled Super Mario Bros. The main interaction procedure of MarioMix consists of presenting short clips of gameplay displaying precomputed bots with different playstyles to end-users. Then, end-users can select the bot with the playstyle that behaves as intended. We evaluated MarioMix by incorporating input from game designers working in the industry.
HCMay 27, 2021
Interactive Explanations: Diagnosis and Repair of Reinforcement Learning Based Agent BehaviorsChristian Arzate Cruz, Takeo Igarashi
Reinforcement learning techniques successfully generate convincing agent behaviors, but it is still difficult to tailor the behavior to align with a user's specific preferences. What is missing is a communication method for the system to explain the behavior and for the user to repair it. In this paper, we present a novel interaction method that uses interactive explanations using templates of natural language as a communication method. The main advantage of this interaction method is that it enables a two-way communication channel between users and the agent; the bot can explain its thinking procedure to the users, and the users can communicate their behavior preferences to the bot using the same interactive explanations. In this manner, the thinking procedure of the bot is transparent, and users can provide corrections to the bot that include a suggested action to take, a goal to achieve, and the reasons behind these decisions. We tested our proposed method in a clone of the video game named \textit{Super Mario Bros.}, and the results demonstrate that our interactive explanation approach is effective at diagnosing and repairing bot behaviors.
CVApr 9, 2021
Brain Surface Reconstruction from MRI Images Based on Segmentation Networks Applying Signed Distance MapsHeng Fang, Xi Yang, Taichi Kin et al.
Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull stripping methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an additional Laplacian loss to ensure that the prediction results retain shape information. We validated our newly proposed method by conducting experiments on our brain magnetic resonance imaging dataset (111 patients). The evaluation results demonstrate that our approach achieves comparable dice scores and also reduces the Hausdorff distance and average symmetric surface distance, thus producing more stable and smooth brain isosurfaces.
HCMar 8, 2021
Exploring a Makeup Support System for Transgender Passing based on Automatic Gender RecognitionToby Chong, Nolwenn Maudet, Katsuki Harima et al.
How to handle gender with machine learning is a controversial topic. A growing critical body of research brought attention to the numerous issues transgender communities face with the adoption of current automatic gender recognition (AGR) systems. In contrast, we explore how such technologies could potentially be appropriated to support transgender practices and needs, especially in non-Western contexts like Japan. We designed a virtual makeup probe to assist transgender individuals with passing, that is to be perceived as the gender they identify as. To understand how such an application might support expressing transgender individuals gender identity or not, we interviewed 15 individuals in Tokyo and found that in the right context and under strict conditions, AGR based systems could assist transgender passing.
SDOct 7, 2020
Generative Melody Composition with Human-in-the-Loop Bayesian OptimizationYijun Zhou, Yuki Koyama, Masataka Goto et al.
Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative approach; the system generates candidate melodies to evaluate, and the user evaluates them and provides preferential feedback (i.e., picking the best melody among the candidates) to the system. This process is iteratively performed based on BO techniques until the user finds the desired melody. We conducted a pilot study using our prototype system, suggesting the potential of this approach.
IVJun 29, 2020
A Two-step Surface-based 3D Deep Learning Pipeline for Segmentation of Intracranial AneurysmsXi Yang, Ding Xia, Taichi Kin et al.
The exact shape of intracranial aneurysms is critical in medical diagnosis and surgical planning. While voxel-based deep learning frameworks have been proposed for this segmentation task, their performance remains limited. In this study, we offer a two-step surface-based deep learning pipeline that achieves significantly higher performance. Our proposed model takes a surface model of entire principal brain arteries containing aneurysms as input and returns aneurysms surfaces as output. A user first generates a surface model by manually specifying multiple thresholds for time-of-flight magnetic resonance angiography images. The system then samples small surface fragments from the entire brain arteries and classifies the surface fragments according to whether aneurysms are present using a point-based deep learning network (PointNet++). Finally, the system applies surface segmentation (SO-Net) to surface fragments containing aneurysms. We conduct a direct comparison of segmentation performance by counting voxels between the proposed surface-based framework and the existing voxel-based method, in which our framework achieves a much higher dice similarity coefficient score (72%) than the prior approach (46%).
GRFeb 20, 2020
Computational Design with CrowdsYuki Koyama, Takeo Igarashi
Computational design is aimed at supporting or automating design processes using computational techniques. However, some classes of design tasks involve criteria that are difficult to handle only with computers. For example, visual design tasks seeking to fulfill aesthetic goals are difficult to handle purely with computers. One promising approach is to leverage human computation; that is, to incorporate human input into the computation process. Crowdsourcing platforms provide a convenient way to integrate such human computation into a working system. In this chapter, we discuss such computational design with crowds in the domain of parameter tweaking tasks in visual design. Parameter tweaking is often performed to maximize the aesthetic quality of designed objects. Computational design powered by crowds can solve this maximization problem by leveraging human computation. We discuss the opportunities and challenges of computational design with crowds with two illustrative examples: (1) estimating the objective function (specifically, preference learning from crowds' pairwise comparisons) to facilitate interactive design exploration by a designer and (2) directly searching for the optimal parameter setting that maximizes the objective function (specifically, crowds-in-the-loop Bayesian optimization).
GRJun 24, 2019
Interactive Optimization of Generative Image Modeling using Sequential Subspace Search and Content-based GuidanceToby Chong Long Hin, I-Chao Shen, Issei Sato et al.
Generative image modeling techniques such as GAN demonstrate highly convincing image generation result. However, user interaction is often necessary to obtain the desired results. Existing attempts add interactivity but require either tailored architectures or extra data. We present a human-in-the-optimization method that allows users to directly explore and search the latent vector space of generative image modeling. Our system provides multiple candidates by sampling the latent vector space, and the user selects the best blending weights within the subspace using multiple sliders. In addition, the user can express their intention through image editing tools. The system samples latent vectors based on inputs and presents new candidates to the user iteratively. An advantage of our formulation is that one can apply our method to arbitrary pre-trained model without developing specialized architecture or data. We demonstrate our method with various generative image modeling applications, and show superior performance in a comparative user study with prior art iGAN.
CVMay 2, 2019
Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation MappingXi Yang, Bojian Wu, Issei Sato et al.
Deep neural networks (DNNs) have a high accuracy on image classification tasks. However, DNNs trained by such dataset with co-occurrence bias may rely on wrong features while making decisions for classification. It will greatly affect the transferability of pre-trained DNNs. In this paper, we propose an interactive method to direct classifiers paying attentions to the regions that are manually specified by the users, in order to mitigate the influence of co-occurrence bias. We test on CelebA dataset, the pre-trained AlexNet is fine-tuned to focus on the specific facial attributes based on the results of Grad-CAM.
HCMar 16, 2016
TapDrag: An Alternative Dragging Technique on Medium-Sized MultiTouch Displays Reducing Skin Irritation and Arm FatigueLasse Farnung Laursen, Hsiang-Ting Chen, Paulo Silva et al.
Medium-sized touch displays, sized 30 to 50 inches, are becoming more affordable and more widely available. Prolonged use of such displays can result in arm fatigue or skin irritation, especially when multiple long distance drags are involved. To address this issue, we present TapDrag, an alternative dragging technique that complements traditional dragging with a simple tapping gesture on both ends of the intended dragging path. Our experimental evaluation suggests that TapDrag is a viable alternative to traditional dragging with faster task completion times for long distances. Qualitative user feedback indicates that TapDrag helps prevent skin irritation. A reduction in arm fatigue remains unconfirmed.