Koichi Kise

h-index25

12papers

310citations

Novelty42%

AI Score40

Ranked #73,599 of 194,257 authors (top 38%)#24,995 in CV (top 42%)

12 Papers

7.7HCMar 12

Prediction of Grade, Gender, and Academic Performance of Children and Teenagers from Handwriting Using the Sigma-Lognormal Model

Adrian Iste, Kazuki Nishizawa, Chisa Tanaka et al.

Digital handwriting acquisition enables the capture of detailed temporal and kinematic signals reflecting the motor processes underlying writing behavior. While handwriting analysis has been extensively explored in clinical or adult populations, its potential for studying developmental and educational characteristics in children remains less investigated. In this work, we examine whether handwriting dynamics encode information related to student characteristics using a large-scale online dataset collected from Japanese students from elementary school to junior high school. We systematically compare three families of handwriting-derived features: basic statistical descriptors of kinematic signals, entropy-based measures of variability, and parameters obtained from the sigma-lognormal model. Although the dataset contains dense stroke-level recordings, features are aggregated at the student level to enable a controlled comparison between representations. These features are evaluated across three prediction tasks: grade prediction, gender classification, and academic performance classification, using Linear or Logistic Regression and Random Forest models under consistent experimental settings. The results show that handwriting dynamics contain measurable signals related to developmental stage and individual differences, especially for the grade prediction task. These findings highlight the potential of kinematic handwriting analysis and confirm that through their development, children's handwriting evolves toward a lognormal motor organization.

3.0CVMay 4, 2016Code

A Generic Method for Automatic Ground Truth Generation of Camera-captured Documents

Sheraz Ahmed, Muhammad Imran Malik, Muhammad Zeshan Afzal et al.

The contribution of this paper is fourfold. The first contribution is a novel, generic method for automatic ground truth generation of camera-captured document images (books, magazines, articles, invoices, etc.). It enables us to build large-scale (i.e., millions of images) labeled camera-captured/scanned documents datasets, without any human intervention. The method is generic, language independent and can be used for generation of labeled documents datasets (both scanned and cameracaptured) in any cursive and non-cursive language, e.g., English, Russian, Arabic, Urdu, etc. To assess the effectiveness of the presented method, two different datasets in English and Russian are generated using the presented method. Evaluation of samples from the two datasets shows that 99:98% of the images were correctly labeled. The second contribution is a large dataset (called C3Wi) of camera-captured characters and words images, comprising 1 million word images (10 million character images), captured in a real camera-based acquisition. This dataset can be used for training as well as testing of character recognition systems on camera-captured documents. The third contribution is a novel method for the recognition of cameracaptured document images. The proposed method is based on Long Short-Term Memory and outperforms the state-of-the-art methods for camera based OCRs. As a fourth contribution, various benchmark tests are performed to uncover the behavior of commercial (ABBYY), open source (Tesseract), and the presented camera-based OCR using the presented C3Wi dataset. Evaluation results reveal that the existing OCRs, which already get very high accuracies on scanned documents, have limited performance on camera-captured document images; where ABBYY has an accuracy of 75%, Tesseract an accuracy of 50.22%, while the presented character recognition system has an accuracy of 95.10%.

6.9IRFeb 12, 2024

Had enough of experts? Quantitative knowledge retrieval from large language models

David Selby, Kai Spriestersbach, Yuichiro Iwashita et al.

Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences, however their utility for quantitative information retrieval is less well understood. Here we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid two data analysis tasks: elicitation of prior distributions for Bayesian models and imputation of missing data. We introduce a framework that leverages LLMs to enhance Bayesian workflows by eliciting expert-like prior knowledge and imputing missing data. Tested on diverse datasets, this approach can improve predictive accuracy and reduce data requirements, offering significant potential in healthcare, environmental science and engineering applications. We discuss the implications and challenges of treating LLMs as 'experts'.

6.4HCFeb 15, 2021

Confidence-Aware Learning Assistant

Shoya Ishimaru, Takanori Maruichi, Andreas Dengel et al.

Not only correctness but also self-confidence play an important role in improving the quality of knowledge. Undesirable situations such as confident incorrect and unconfident correct knowledge prevent learners from revising their knowledge because it is not always easy for them to perceive the situations. To solve this problem, we propose a system that estimates self-confidence while solving multiple-choice questions by eye tracking and gives feedback about which question should be reviewed carefully. We report the results of three studies measuring its effectiveness. (1) On a well-controlled dataset with 10 participants, our approach detected confidence and unconfidence with 81% and 79% average precision. (2) With the help of 20 participants, we observed that correct answer rates of questions were increased by 14% and 17% by giving feedback about correct answers without confidence and incorrect answers with confidence, respectively. (3) We conducted a large-scale data recording in a private school (72 high school students solved 14,302 questions) to investigate effective features and the number of required training samples.

5.8HCDec 7, 2020

Self-supervised Deep Learning for Reading Activity Classification

Md. Rabiul Islam, Shuji Sakamoto, Yoshihiro Yamada et al.

Reading analysis can give important information about a user's confidence and habits and can be used to construct feedback to improve a user's reading behavior. A lack of labeled data inhibits the effective application of fully-supervised Deep Learning (DL) for automatic reading analysis. In this paper, we propose a self-supervised DL method for reading analysis and evaluate it on two classification tasks. We first evaluate the proposed self-supervised DL method on a four-class classification task on reading detection using electrooculography (EOG) glasses datasets, followed by an evaluation of a two-class classification task of confidence estimation on answers of multiple-choice questions (MCQs) using eye-tracking datasets. Fully-supervised DL and support vector machines (SVMs) are used to compare the performance of the proposed self-supervised DL method. The results show that the proposed self-supervised DL method is superior to the fully-supervised DL and SVM for both tasks, especially when training data is scarce. This result indicates that the proposed self-supervised DL method is the superior choice for reading analysis tasks. The results of this study are important for informing the design and implementation of automatic reading analysis platforms.

2.3CVAug 28, 2020

Distortion-Adaptive Grape Bunch Counting for Omnidirectional Images

Ryota Akai, Yuzuko Utsumi, Yuka Miwa et al.

This paper proposes the first object counting method for omnidirectional images. Because conventional object counting methods cannot handle the distortion of omnidirectional images, we propose to process them using stereographic projection, which enables conventional methods to obtain a good approximation of the density function. However, the images obtained by stereographic projection are still distorted. Hence, to manage this distortion, we propose two methods. One is a new data augmentation method designed for the stereographic projection of omnidirectional images. The other is a distortion-adaptive Gaussian kernel that generates a density map ground truth while taking into account the distortion of stereographic projection. Using the counting of grape bunches as a case study, we constructed an original grape-bunch image dataset consisting of omnidirectional images and conducted experiments to evaluate the proposed method. The results show that the proposed method performs better than a direct application of the conventional method, improving mean absolute error by 14.7% and mean squared error by 10.5%.

3.9CVNov 8, 2018

Facial Landmark Detection for Manga Images

Marco Stricker, Olivier Augereau, Koichi Kise et al.

The topic of facial landmark detection has been widely covered for pictures of human faces, but it is still a challenge for drawings. Indeed, the proportions and symmetry of standard human faces are not always used for comics or mangas. The personal style of the author, the limitation of colors, etc. makes the landmark detection on faces in drawings a difficult task. Detecting the landmarks on manga images will be useful to provide new services for easily editing the character faces, estimating the character emotions, or generating automatically some animations such as lip or eye movements. This paper contains two main contributions: 1) a new landmark annotation model for manga faces, and 2) a deep learning approach to detect these landmarks. We use the "Deep Alignment Network", a multi stage architecture where the first stage makes an initial estimation which gets refined in further stages. The first results show that the proposed method succeed to accurately find the landmarks in more than 80% of the cases.

9.7MMApr 16, 2018

A survey of comics research in computer science

Olivier Augereau, Motoi Iwata, Koichi Kise

Graphical novels such as comics and mangas are well known all over the world. The digital transition started to change the way people are reading comics, more and more on smartphones and tablets and less and less on paper. In the recent years, a wide variety of research about comics has been proposed and might change the way comics are created, distributed and read in future years. Early work focuses on low level document image analysis: indeed comic books are complex, they contains text, drawings, balloon, panels, onomatopoeia, etc. Different fields of computer science covered research about user interaction and content generation such as multimedia, artificial intelligence, human-computer interaction, etc. with different sets of values. We propose in this paper to review the previous research about comics in computer science, to state what have been done and to give some insights about the main outlooks.

30.9CVFeb 7, 2018Code

ShakeDrop Regularization for Deep Residual Learning

Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba et al.

Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.

4.4CVJan 20, 2017

Automatic Generation of Typographic Font from a Small Font Subset

Tomo Miyazaki, Tatsunori Tsuchiya, Yoshihiro Sugaya et al.

This paper addresses the automatic generation of a typographic font from a subset of characters. Specifically, we use a subset of a typographic font to extrapolate additional characters. Consequently, we obtain a complete font containing a number of characters sufficient for daily use. The automated generation of Japanese fonts is in high demand because a Japanese font requires over 1,000 characters. Unfortunately, professional typographers create most fonts, resulting in significant financial and time investments for font generation. The proposed method can be a great aid for font creation because designers do not need to create the majority of the characters for a new font. The proposed method uses strokes from given samples for font generation. The strokes, from which we construct characters, are extracted by exploiting a character skeleton dataset. This study makes three main contributions: a novel method of extracting strokes from characters, which is applicable to both standard fonts and their variations; a fully automated approach for constructing characters; and a selection method for sample characters. We demonstrate our proposed method by generating 2,965 characters in 47 fonts. Objective and subjective evaluations verify that the generated characters are similar to handmade characters.

9.4CVDec 5, 2016

Deep Pyramidal Residual Networks with Separated Stochastic Depth

Yoshihiro Yamada, Masakazu Iwamura, Koichi Kise

On general object recognition, Deep Convolutional Neural Networks (DCNNs) achieve high accuracy. In particular, ResNet and its improvements have broken the lowest error rate records. In this paper, we propose a method to successfully combine two ResNet improvements, ResDrop and PyramidNet. We confirmed that the proposed network outperformed the conventional methods; on CIFAR-100, the proposed network achieved an error rate of 16.18% in contrast to PiramidNet achieving that of 18.29% and ResNeXt 17.31%.

1.1CVMar 29, 2016

Scalable Solution for Approximate Nearest Subspace Search

Masakazu Iwamura, Masataka Konishi, Koichi Kise

Finding the nearest subspace is a fundamental problem and influential to many applications. In particular, a scalable solution that is fast and accurate for a large problem has a great impact. The existing methods for the problem are, however, useless in a large-scale problem with a large number of subspaces and high dimensionality of the feature space. A cause is that they are designed based on the traditional idea to represent a subspace by a single point. In this paper, we propose a scalable solution for the approximate nearest subspace search (ANSS) problem. Intuitively, the proposed method represents a subspace by multiple points unlike the existing methods. This makes a large-scale ANSS problem tractable. In the experiment with 3036 subspaces in the 1024-dimensional space, we confirmed that the proposed method was 7.3 times faster than the previous state-of-the-art without loss of accuracy.