Brian Kenji Iwana

h-index18

32papers

1,100citations

Novelty42%

AI Score44

Ranked #50,038 of 194,257 authors (top 26%)#17,484 in CV (top 30%)

32 Papers

11.0CVJun 16, 2023Code

FETNet: Feature Erasing and Transferring Network for Scene Text Removal

Guangtao Lyu, Kun Liu, Anna Zhu et al.

The scene text removal (STR) task aims to remove text regions and recover the background smoothly in images for private information protection. Most existing STR methods adopt encoder-decoder-based CNNs, with direct copies of the features in the skip connections. However, the encoded features contain both text texture and structure information. The insufficient utilization of text features hampers the performance of background reconstruction in text removal regions. To tackle these problems, we propose a novel Feature Erasing and Transferring (FET) mechanism to reconfigure the encoded features for STR in this paper. In FET, a Feature Erasing Module (FEM) is designed to erase text features. An attention module is responsible for generating the feature similarity guidance. The Feature Transferring Module (FTM) is introduced to transfer the corresponding features in different layers based on the attention guidance. With this mechanism, a one-stage, end-to-end trainable network called FETNet is constructed for scene text removal. In addition, to facilitate research on both scene text removal and segmentation tasks, we introduce a novel dataset, Flickr-ST, with multi-category annotations. A sufficient number of experiments and ablation studies are conducted on the public datasets and Flickr-ST. Our proposed method achieves state-of-the-art performance using most metrics, with remarkably higher quality scene text removal results. The source code of our work is available at: \href{https://github.com/GuangtaoLyu/FETNet}{https://github.com/GuangtaoLyu/FETNet.

3.9CVSep 13, 2023Code

Deep Attentive Time Warping

Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan et al.

Similarity measures for time series are important problems for time series classification. To handle the nonlinear time distortions, Dynamic Time Warping (DTW) has been widely used. However, DTW is not learnable and suffers from a trade-off between robustness against time distortion and discriminative power. In this paper, we propose a neural network model for task-adaptive time warping. Specifically, we use the attention model, called the bipartite attention model, to develop an explicit time warping mechanism with greater distortion invariance. Unlike other learnable models using DTW for warping, our model predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task. We also propose to induce pre-training of our model by DTW to improve the discriminative power. Extensive experiments demonstrate the superior effectiveness of our model over DTW and its state-of-the-art performance in online signature verification.

3.3GRApr 27, 2023

Contour Completion by Transformers and Its Application to Vector Font Data

Yusuke Nagata, Brian Kenji Iwana, Seiichi Uchida

In documents and graphics, contours are a popular format to describe specific shapes. For example, in the True Type Font (TTF) file format, contours describe vector outlines of typeface shapes. Each contour is often defined as a sequence of points. In this paper, we tackle the contour completion task. In this task, the input is a contour sequence with missing points, and the output is a generated completed contour. This task is more difficult than image completion because, for images, the missing pixels are indicated. Since there is no such indication in the contour completion task, we must solve the problem of missing part detection and completion simultaneously. We propose a Transformer-based method to solve this problem and show the results of the typeface contour completion.

1.8LGDec 13, 2022Code

On Mini-Batch Training with Varying Length Time Series

Brian Kenji Iwana

In real-world time series recognition applications, it is possible to have data with varying length patterns. However, when using artificial neural networks (ANN), it is standard practice to use fixed-sized mini-batches. To do this, time series data with varying lengths are typically normalized so that all the patterns are the same length. Normally, this is done using zero padding or truncation without much consideration. We propose a novel method of normalizing the lengths of the time series in a dataset by exploiting the dynamic matching ability of Dynamic Time Warping (DTW). In this way, the time series lengths in a dataset can be set to a fixed size while maintaining features typical to the dataset. In the experiments, all 11 datasets with varying length time series from the 2018 UCR Time Series Archive are used. We evaluate the proposed method by comparing it with 18 other length normalization methods on a Convolutional Neural Network (CNN), a Long-Short Term Memory network (LSTM), and a Bidirectional LSTM (BLSTM).

2.8CVApr 27, 2023Code

Vision Conformer: Incorporating Convolutions into Vision Transformer Layers

Brian Kenji Iwana, Akihiro Kusuda

Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are split into patches and used as tokens. One issue with ViT is the lack of inductive bias toward image structures. Because ViT was adapted for image data from language modeling, the network does not explicitly handle issues such as local translations, pixel information, and information loss in the structures and features shared by multiple patches. Conversely, Convolutional Neural Networks (CNN) incorporate this information. Thus, in this paper, we propose the use of convolutional layers within ViT. Specifically, we propose a model called a Vision Conformer (ViC) which replaces the Multi-Layer Perceptron (MLP) in a ViT layer with a CNN. In addition, to use the CNN, we proposed to reconstruct the image data after the self-attention in a reverse embedding layer. Through the evaluation, we demonstrate that the proposed convolutions help improve the classification ability of ViT.

2.6LGSep 30, 2024Code

Model Selection with a Shapelet-based Distance Measure for Multi-source Transfer Learning in Time Series Classification

Jiseok Lee, Brian Kenji Iwana

Transfer learning is a common practice that alleviates the need for extensive data to train neural networks. It is performed by pre-training a model using a source dataset and fine-tuning it for a target task. However, not every source dataset is appropriate for each target dataset, especially for time series. In this paper, we propose a novel method of selecting and using multiple datasets for transfer learning for time series classification. Specifically, our method combines multiple datasets as one source dataset for pre-training neural networks. Furthermore, for selecting multiple sources, our method measures the transferability of datasets based on shapelet discovery for effective source selection. While traditional transferability measures require considerable time for pre-training all the possible sources for source selection of each possible architecture, our method can be repeatedly used for every possible architecture with a single simple computation. Using the proposed method, we demonstrate that it is possible to increase the performance of temporal convolutional neural networks (CNN) on time series datasets.

6.8CVSep 2, 2023Code

Few shot font generation via transferring similarity guided global style and quantization local style

Wei Pan, Anna Zhu, Xinyu Zhou et al.

Automatic few-shot font generation (AFFG), aiming at generating new fonts with only a few glyph references, reduces the labor cost of manually designing fonts. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts. So, many component-based approaches are proposed to tackle this problem. The issue with component-based approaches is that they usually require special pre-defined glyph components, e.g., strokes and radicals, which is infeasible for AFFG of different languages. In this paper, we present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations. We calculate the similarity scores of the target character and the referenced samples by measuring the distance along the corresponding channels from the content features, and assigning them as the weights for aggregating the global style features. To better capture the local styles, a cross-attention-based style transfer module is adopted to transfer the styles of reference glyphs to the components, where the components are self-learned discrete latent codes through vector quantization without manual definition. With these designs, our AFFG method could obtain a complete set of component-level style representations, and also control the global glyph characteristics. The experimental results reflect the effectiveness and generalization of the proposed method on different linguistic scripts, and also show its superiority when compared with other state-of-the-art methods. The source code can be found at https://github.com/awei669/VQ-Font.

11.5LGApr 19, 2020Code

Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher

Brian Kenji Iwana, Seiichi Uchida

Neural networks have become a powerful tool in pattern recognition and part of their success is due to generalization from using large datasets. However, unlike other domains, time series classification datasets are often small. In order to address this problem, we propose a novel time series data augmentation called guided warping. While many data augmentation methods are based on random transformations, guided warping exploits the element alignment properties of Dynamic Time Warping (DTW) and shapeDTW, a high-level DTW method based on shape descriptors, to deterministically warp sample patterns. In this way, the time series are mixed by warping the features of a sample pattern to match the time steps of a reference pattern. Furthermore, we introduce a discriminative teacher in order to serve as a directed reference for the guided warping. We evaluate the method on all 85 datasets in the 2015 UCR Time Series Archive with a deep convolutional neural network (CNN) and a recurrent neural network (RNN). The code with an easy to use implementation can be found at https://github.com/uchidalab/time_series_augmentation .

3.7CVFeb 26, 2024

What Text Design Characterizes Book Genres?

Daichi Haraguchi, Brian Kenji Iwana, Seiichi Uchida

This study analyzes the relationship between non-verbal information (e.g., genres) and text design (e.g., font style, character color, etc.) through the classification of book genres using text design on book covers. Text images have both semantic information about the word itself and other information (non-semantic information or visual design), such as font style, character color, etc. When we read a word printed on some materials, we receive impressions or other information from both the word itself and the visual design. Basically, we can understand verbal information only from semantic information, i.e., the words themselves; however, we can consider that text design is helpful for understanding other additional information (i.e., non-verbal information), such as impressions, genre, etc. To investigate the effect of text design, we analyze text design using words printed on book covers and their genres in two scenarios. First, we attempted to understand the importance of visual design for determining the genre (i.e., non-verbal information) of books by analyzing the differences in the relationship between semantic information/visual design and genres. In the experiment, we found that semantic information is sufficient to determine the genre; however, text design is helpful in adding more discriminative features for book genres. Second, we investigated the effect of each text design on book genres. As a result, we found that each text design characterizes some book genres. For example, font style is useful to add more discriminative features for genres of ``Mystery, Thriller \& Suspense'' and ``Christian books \& Bibles.''

3.6CVSep 18, 2025

Domain Adaptation for Ulcerative Colitis Severity Estimation Using Patient-Level Diagnoses

Takamasa Yamaguchi, Brian Kenji Iwana, Ryoma Bise et al.

The development of methods to estimate the severity of Ulcerative Colitis (UC) is of significant importance. However, these methods often suffer from domain shifts caused by differences in imaging devices and clinical settings across hospitals. Although several domain adaptation methods have been proposed to address domain shift, they still struggle with the lack of supervision in the target domain or the high cost of annotation. To overcome these challenges, we propose a novel Weakly Supervised Domain Adaptation method that leverages patient-level diagnostic results, which are routinely recorded in UC diagnosis, as weak supervision in the target domain. The proposed method aligns class-wise distributions across domains using Shared Aggregation Tokens and a Max-Severity Triplet Loss, which leverages the characteristic that patient-level diagnoses are determined by the most severe region within each patient. Experimental results demonstrate that our method outperforms comparative DA approaches, improving UC severity estimation in a domain-shifted setting.

4.1LGJun 28, 2025

Towards Time Series Generation Conditioned on Unstructured Natural Language

Jaeyun Woo, Jiseok Lee, Brian Kenji Iwana

Generative Artificial Intelligence (AI) has rapidly become a powerful tool, capable of generating various types of data, such as images and text. However, despite the significant advancement of generative AI, time series generative AI remains underdeveloped, even though the application of time series is essential in finance, climate, and numerous fields. In this research, we propose a novel method of generating time series conditioned on unstructured natural language descriptions. We use a diffusion model combined with a language model to generate time series from the text. Through the proposed method, we demonstrate that time series generation based on natural language is possible. The proposed method can provide various applications such as custom forecasting, time series manipulation, data augmentation, and transfer learning. Furthermore, we construct and propose a new public dataset for time series generation, consisting of 63,010 time series-description pairs.

1.6LGNov 5, 2021Code

Dynamic Data Augmentation with Gating Networks for Time Series Recognition

Daisuke Oba, Shinnosuke Matsuo, Brian Kenji Iwana

Data augmentation is a technique to improve the generalization ability of machine learning methods by increasing the size of the dataset. However, since every augmentation method is not equally effective for every dataset, you need to select an appropriate method carefully. We propose a neural network that dynamically selects the best combination of data augmentation methods using a mutually beneficial gating network and a feature consistency loss. The gating network is able to control how much of each data augmentation is used for the representation within the network. The feature consistency loss gives a constraint that augmented features from the same input should be in similar. In experiments, we demonstrate the effectiveness of the proposed method on the 12 largest time-series datasets from 2018 UCR Time Series Archive and reveal the relationships between the data augmentation methods through analysis of the proposed method.

2.6CVJun 29, 2021

Using Robust Regression to Find Font Usage Trends

Kaigen Tsuji, Seiichi Uchida, Brian Kenji Iwana

Fonts have had trends throughout their history, not only in when they were invented but also in their usage and popularity. In this paper, we attempt to specifically find the trends in font usage using robust regression on a large collection of text images. We utilize movie posters as the source of fonts for this task because movie posters can represent time periods by using their release date. In addition, movie posters are documents that are carefully designed and represent a wide range of fonts. To understand the relationship between the fonts of movie posters and time, we use a regression Convolutional Neural Network (CNN) to estimate the release year of a movie using an isolated title text image. Due to the difficulty of the task, we propose to use of a hybrid training regimen that uses a combination of Mean Squared Error (MSE) and Tukey's biweight loss. Furthermore, we perform a thorough analysis on the trends of fonts through time.

2.6CVMay 24, 2021Code

Towards Book Cover Design via Layout Graphs

Wensheng Zhang, Yan Zheng, Taiga Miyazono et al.

Book covers are intentionally designed and provide an introduction to a book. However, they typically require professional skills to design and produce the cover images. Thus, we propose a generative neural network that can produce book covers based on an easy-to-use layout graph. The layout graph contains objects such as text, natural scene objects, and solid color spaces. This layout graph is embedded using a graph convolutional neural network and then used with a mask proposal generator and a bounding-box generator and filled using an object proposal generator. Next, the objects are compiled into a single image and the entire network is trained using a combination of adversarial training, perceptual training, and reconstruction. Finally, a Style Retention Network (SRNet) is used to transfer the learned font style onto the desired text. Using the proposed method allows for easily controlled and unique book covers.

4.7CVMay 19, 2021Code

Font Style that Fits an Image -- Font Generation Based on Image Context

Taiga Miyazono, Brian Kenji Iwana, Daichi Haraguchi et al.

When fonts are used on documents, they are intentionally selected by designers. For example, when designing a book cover, the typography of the text is an important factor in the overall feel of the book. In addition, it needs to be an appropriate font for the rest of the book cover. Thus, we propose a method of generating a book title image based on its context within a book cover. We propose an end-to-end neural network that inputs the book cover, a target location mask, and a desired book title and outputs stylized text suitable for the cover. The proposed network uses a combination of a multi-input encoder-decoder, a text skeleton prediction network, a perception network, and an adversarial discriminator. We demonstrate that the proposed method can effectively produce desirable and appropriate book cover text through quantitative and qualitative results.

3.7CVMar 28, 2021

Attention to Warp: Deep Metric Learning for Multivariate Time Series

Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan et al.

Deep time series metric learning is challenging due to the difficult trade-off between temporal invariance to nonlinear distortion and discriminative power in identifying non-matching sequences. This paper proposes a novel neural network-based approach for robust yet discriminative time series classification and verification. This approach adapts a parameterized attention model to time warping for greater and more adaptive temporal invariance. It is robust against not only local but also large global distortions, so that even matching pairs that do not satisfy the monotonicity, continuity, and boundary conditions can still be successfully identified. Learning of this model is further guided by dynamic time warping to impose temporal constraints for stabilized training and higher discriminative power. It can learn to augment the inter-class variation through warping, so that similar but different classes can be effectively distinguished. We experimentally demonstrate the superiority of the proposed approach over previous non-parametric and deep models by combining it with a deep online signature verification framework, after confirming its promising behavior in single-letter handwriting classification on the Unipen dataset.

2.6CVMar 8, 2021

Self-Augmented Multi-Modal Feature Embedding

Shinnosuke Matsuo, Seiichi Uchida, Brian Kenji Iwana

Oftentimes, patterns can be represented through different modalities. For example, leaf data can be in the form of images or contours. Handwritten characters can also be either online or offline. To exploit this fact, we propose the use of self-augmentation and combine it with multi-modal feature embedding. In order to take advantage of the complementary information from the different modalities, the self-augmented multi-modal feature embedding employs a shared feature space. Through experimental results on classification with online handwriting and leaf images, we demonstrate that the proposed method can create effective embeddings.

1.2CVSep 23, 2020

What is the Reward for Handwriting? -- Handwriting Generation by Imitation Learning

Keisuke Kanda, Brian Kenji Iwana, Seiichi Uchida

Analyzing the handwriting generation process is an important issue and has been tackled by various generation models, such as kinematics based models and stochastic models. In this study, we use a reinforcement learning (RL) framework to realize handwriting generation with the careful future planning ability. In fact, the handwriting process of human beings is also supported by their future planning ability; for example, the ability is necessary to generate a closed trajectory like '0' because any shortsighted model, such as a Markovian model, cannot generate it. For the algorithm, we employ generative adversarial imitation learning (GAIL). Typical RL algorithms require the manual definition of the reward function, which is very crucial to control the generation process. In contrast, GAIL trains the reward function along with the other modules of the framework. In other words, through GAIL, we can understand the reward of the handwriting generation process from handwriting examples. Our experimental results qualitatively and quantitatively show that the learned reward catches the trends in handwriting generation and thus GAIL is well suited for the acquisition of handwriting behavior.

29.0LGJul 31, 2020Code

An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks

Brian Kenji Iwana, Seiichi Uchida

In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families in time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages and disadvantages, and recommendations of each data augmentation method. This survey aims to help in the selection of time series data augmentation for neural network applications.

10.1CVJul 16, 2020

Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology

Hiroki Tokunaga, Brian Kenji Iwana, Yuki Teramoto et al.

We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of "cell detection" (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining. First, we train a co-detection CNN that detects cells in successive frames by using weak-labels. Our key assumption is that the co-detection CNN implicitly learns association in addition to detection. To obtain the association information, we propose a backward-and-forward propagation method that analyzes the correspondence of cell positions in the detection maps output of the co-detection CNN. Experiments demonstrated that the proposed method can match positions by analyzing the co-detection CNN. Even though the method uses only weak supervision, the performance of our method was almost the same as the state-of-the-art supervised method.

1.2CVJun 22, 2020

On the Ability of a CNN to Realize Image-to-Image Language Conversion

Kohei Baba, Seiichi Uchida, Brian Kenji Iwana

The purpose of this paper is to reveal the ability that Convolutional Neural Networks (CNN) have on the novel task of image-to-image language conversion. We propose a new network to tackle this task by converting images of Korean Hangul characters directly into images of the phonetic Latin character equivalent. The conversion rules between Hangul and the phonetic symbols are not explicitly provided. The results of the proposed network show that it is possible to perform image-to-image language conversion. Moreover, it shows that it can grasp the structural features of Hangul even from limited learning data. In addition, it introduces a new network to use when the input and output have significantly different features.

5.0CVApr 18, 2020

Effect of Text Color on Word Embeddings

Masaya Ikoma, Brian Kenji Iwana, Seiichi Uchida

In natural scenes and documents, we can find the correlation between a text and its color. For instance, the word, "hot", is often printed in red, while "cold" is often in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g. word2vec) have been extremely successful, they often represent antonyms as similar since they are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from the book cover images and analyze the correlation between the color and meaning of the word. Second, we try to retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after re-training, we aim to understand the kind of words that have positive or negative effects in their word embeddings when incorporating text color information.

4.2CVJan 24, 2020

Character-independent font identification

Daichi Haraguchi, Shota Harada, Brian Kenji Iwana et al.

There are a countless number of fonts with various shapes and styles. In addition, there are many fonts that only have subtle differences in features. Due to this, font identification is a difficult task. In this paper, we propose a method of determining if any two characters are from the same font or not. This is difficult due to the difference between fonts typically being smaller than the difference between alphabet classes. Additionally, the proposed method can be used with fonts regardless of whether they exist in the training or not. In order to accomplish this, we use a Convolutional Neural Network (CNN) trained with various font image pairs. In the experiment, the network is trained on image pairs of various fonts. We then evaluate the model on a different set of fonts that are unseen by the network. The evaluation is performed with an accuracy of 92.27%. Moreover, we analyzed the relationship between character classes and font identification accuracy.

1.2CVJan 21, 2020

Neural Style Difference Transfer and Its Application to Font Generation

Gantugs Atarsaikhan, Brian Kenji Iwana, Seiichi Uchida

Designing fonts requires a great deal of time and effort. It requires professional skills, such as sketching, vectorizing, and image editing. Additionally, each letter has to be designed individually. In this paper, we will introduce a method to create fonts automatically. In our proposed method, the difference of font styles between two different fonts is found and transferred to another font using neural style transfer. Neural style transfer is a method of stylizing the contents of an image with the styles of another image. We proposed a novel neural style difference and content difference loss for the neural style transfer. With these losses, new fonts can be generated by adding or removing font styles from a font. We provided experimental results with various combinations of input fonts and discussed limitations and future development for the proposed method.

17.7CVAug 6, 2019Code

Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation

Brian Kenji Iwana, Ryohei Kuroki, Seiichi Uchida

Convolutional Neural Networks (CNN) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class discriminate extension to Deep Taylor Decomposition (DTD) using the gradient of softmax to back propagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions on input images which contribute to a target object's classification. We show that the proposed method excels at discriminating the target objects class from the other possible objects in the images. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in the understanding of the decision process of CNNs.

6.5CVJun 14, 2019

Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders

Taichi Sumi, Brian Kenji Iwana, Hideaki Hayashi et al.

This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously. In this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third, space sharing loss is used to encourage the modalities to share the same latent space by calculating the distance between the latent variables. Through the proposed method mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis.

9.1LGMay 26, 2019Code

ProbAct: A Probabilistic Activation Function for Deep Neural Networks

Kumar Shridhar, Joonho Lee, Hideaki Hayashi et al.

Activation functions play an important role in training artificial neural networks. The majority of currently used activation functions are deterministic in nature, with their fixed input-output relationship. In this work, we propose a novel probabilistic activation function, called ProbAct. ProbAct is decomposed into a mean and variance and the output value is sampled from the formed distribution, making ProbAct a stochastic activation function. The values of mean and variances can be fixed using known functions or trained for each element. In the trainable ProbAct, the mean and the variance of the activation distribution is trained within the back-propagation framework alongside other parameters. We show that the stochastic perturbation induced through ProbAct acts as a viable generalization technique for feature augmentation. In our experiments, we compare ProbAct with well-known activation functions on classification tasks on different modalities: Images(CIFAR-10, CIFAR-100, and STL-10) and Text (Large Movie Review). We show that ProbAct increases the classification accuracy by +2-3% compared to ReLU or other conventional activation functions on both original datasets and when datasets are reduced to 50% and 25% of the original size. Finally, we show that ProbAct learns an ensemble of models by itself that can be used to estimate the uncertainties associated with the prediction and provides robustness to noisy inputs.

5.8CVAug 25, 2018

How do Convolutional Neural Networks Learn Design?

Shailza Jolly, Brian Kenji Iwana, Ryohei Kuroki et al.

In this paper, we aim to understand the design principles in book cover images which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres which convey important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand these visual clues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) on the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.

1.7CVMar 2, 2018

Constrained Neural Style Transfer for Decorated Logo Generation

Gantugs Atarsaikhan, Brian Kenji Iwana, Seiichi Uchida

Making decorated logos requires image editing skills, without sufficient skills, it could be a time-consuming task. While there are many on-line web services to make new logos, they have limited designs and duplicates can be made. We propose using neural style transfer with clip art and text for the creation of new and genuine logos. We introduce a new loss function based on distance transform of the input image, which allows the preservation of the silhouettes of text and objects. The proposed method constrains style transfer only around the designated area. We demonstrate the characteristics of proposed method. Finally, we show the results of logo generation with various input images.

3.8CVDec 18, 2017

Dynamic Weight Alignment for Temporal Convolutional Neural Networks

Brian Kenji Iwana, Seiichi Uchida

In this paper, we propose a method of improving temporal Convolutional Neural Networks (CNN) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNN convolutions linearly match the shared weights to a window of the input. However, it is possible that there exists a more optimal alignment of weights. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to the input of the convolutional layer. Specifically, the dynamic alignment overcomes issues such as temporal distortion by finding the minimal distance matching of the weights and the inputs under constraints. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.

3.8CVDec 25, 2016

Globally Optimal Object Tracking with Fully Convolutional Networks

Jinho Lee, Brian Kenji Iwana, Shouta Ide et al.

Tracking is one of the most important but still difficult tasks in computer vision and pattern recognition. The main difficulties in the tracking field are appearance variation and occlusion. Most traditional tracking methods set the parameters or templates to track target objects in advance and should be modified accordingly. Thus, we propose a new and robust tracking method using a Fully Convolutional Network (FCN) to obtain an object probability map and Dynamic Programming (DP) to seek the globally optimal path through all frames of video. Our proposed method solves the object appearance variation problem with the use of a FCN and deals with occlusion by DP. We show that our method is effective in tracking various single objects through video frames.

15.4CVOct 28, 2016Code

Judging a Book By its Cover

Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed et al.

Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn underlying design rules set by the designer to define a genre. Using machine learning, we can bring the large amount of resources available to the book cover design process. In addition, we present a new challenging dataset that can be used for many pattern recognition tasks.