AIJul 7, 2022
Word Embedding for Social Sciences: An Interdisciplinary SurveyAkira Matsui, Emilio Ferrara
To extract essential information from complex data, computer scientists have been developing machine learning models that learn low-dimensional representation mode. From such advances in machine learning research, not only computer scientists but also social scientists have benefited and advanced their research because human behavior or social phenomena lies in complex data. However, this emerging trend is not well documented because different social science fields rarely cover each other's work, resulting in fragmented knowledge in the literature. To document this emerging trend, we survey recent studies that apply word embedding techniques to human behavior mining. We built a taxonomy to illustrate the methods and procedures used in the surveyed papers, aiding social science researchers in contextualizing their research within the literature on word embedding applications. This survey also conducts a simple experiment to warn that common similarity measurements used in the literature could yield different results even if they return consistent results at an aggregate level.
HCJun 20, 2022
Characterizing Human Actions in the Digital Platform by Temporal ContextAkira Matsui, Emilio Ferrara
Recent advances in digital platforms generate rich, high-dimensional logs of human behavior, and machine learning models have helped social scientists explain knowledge accumulation, communication, and information diffusion. Such models, however, almost always treat behavior as sequences of actions, abstracting the inter-temporal information among actions. To close this gap, we introduce a two-scale Action-Timing Context(ATC) framework that jointly embeds each action and its time interval. ATC obtains low-dimensional representations of actions and characterizes them with inter-temporal information. We provide three applications of ATC to real-world datasets and demonstrate that the method offers a unified view of human behavior. The presented qualitative findings demonstrate that explicitly modeling inter-temporal context is essential for a comprehensive, interpretable understanding of human activity on digital platforms.
CYDec 14, 2024
Hybrid Forecasting of Geopolitical EventsDaniel M. Benjamin, Fred Morstatter, Ali E. Abbas et al. · stanford
Sound decision-making relies on accurate prediction for tangible outcomes ranging from military conflict to disease outbreaks. To improve crowdsourced forecasting accuracy, we developed SAGE, a hybrid forecasting system that combines human and machine generated forecasts. The system provides a platform where users can interact with machine models and thus anchor their judgments on an objective benchmark. The system also aggregates human and machine forecasts weighting both for propinquity and based on assessed skill while adjusting for overconfidence. We present results from the Hybrid Forecasting Competition (HFC) - larger than comparable forecasting tournaments - including 1085 users forecasting 398 real-world forecasting problems over eight months. Our main result is that the hybrid system generated more accurate forecasts compared to a human-only baseline which had no machine generated predictions. We found that skilled forecasters who had access to machine-generated forecasts outperformed those who only viewed historical data. We also demonstrated the inclusion of machine-generated forecasts in our aggregation algorithms improved performance, both in terms of accuracy and scalability. This suggests that hybrid forecasting systems, which potentially require fewer human resources, can be a viable approach for maintaining a competitive level of accuracy over a larger number of forecasting questions.
AISep 16, 2025
Data-driven Methods of Extracting Text Structure and Information TransferShinichi Honna, Taichi Murayama, Akira Matsui
The Anna Karenina Principle (AKP) holds that success requires satisfying a small set of essential conditions, whereas failure takes diverse forms. We test AKP, its reverse, and two further patterns described as ordered and noisy across novels, online encyclopedias, research papers, and movies. Texts are represented as sequences of functional blocks, and convergence is assessed in transition order and position. Results show that structural principles vary by medium: novels follow reverse AKP in order, Wikipedia combines AKP with ordered patterns, academic papers display reverse AKP in order but remain noisy in position, and movies diverge by genre. Success therefore depends on structural constraints that are specific to each medium, while failure assumes different shapes across domains.
CVMay 13, 2025
Disruptive Transformation of Artworks in Master-Disciple Relationships: The Case of Ukiyo-e ArtworksHonna Shinichi, Akira Matsui
Artwork research has long relied on human sensibility and subjective judgment, but recent developments in machine learning have enabled the quantitative assessment of features that humans could not discover. In Western paintings, comprehensive analyses have been conducted from various perspectives in conjunction with large databases, but such extensive analysis has not been sufficiently conducted for Eastern paintings. Then, we focus on Ukiyo-e, a traditional Japanese art form, as a case study of Eastern paintings, and conduct a quantitative analysis of creativity in works of art using 11,000 high-resolution images. This involves using the concept of calculating creativity from networks to analyze both the creativity of the artwork and that of the artists. As a result, In terms of Ukiyo-e as a whole, it was found that the creativity of its appearance has declined with the maturation of culture, but in terms of style, it has become more segmented with the maturation of culture and has maintained a high level of creativity. This not only provides new insights into the study of Ukiyo-e but also shows how Ukiyo-e has evolved within the ongoing cultural history, playing a culturally significant role in the analysis of Eastern art.
IRFeb 23, 2022
A Real-World Implementation of Unbiased Lift-based Bidding SystemDaisuke Moriwaki, Yuta Hayakawa, Akira Matsui et al.
In display ad auctions of Real-Time Bid-ding (RTB), a typical Demand-Side Platform (DSP)bids based on the predicted probability of click and conversion right after an ad impression. Recent studies find such a strategy is suboptimal and propose a better bidding strategy named lift-based bidding.Lift-based bidding simply bids the price according to the lift effect of the ad impression and achieves maximization of target metrics such as sales. Despiteits superiority, lift-based bidding has not yet been widely accepted in the advertising industry. For one reason, lift-based bidding is less profitable for DSP providers under the current billing rule. Second, thepractical usefulness of lift-based bidding is not widely understood in the online advertising industry due to the lack of a comprehensive investigation of its impact.We here propose a practically-implementable lift-based bidding system that perfectly fits the current billing rules. We conduct extensive experiments usinga real-world advertising campaign and examine the performance under various settings. We find that lift-based bidding, especially unbiased lift-based bidding is most profitable for both DSP providers and advertisers. Our ablation study highlights that lift-based bidding has a good property for currently dominant first price auctions. The results will motivate the online
LGOct 18, 2020
Online-to-Offline Advertisements as Field ExperimentsAkira Matsui, Daisuke Moriwaki
Online advertisements have become one of today's most widely used tools for enhancing businesses partly because of their compatibility with A/B testing. A/B testing allows sellers to find effective advertisement strategies such as ad creatives or segmentations. Even though several studies propose a technique to maximize the effect of an advertisement, there is insufficient comprehension of the customers' offline shopping behavior invited by the online advertisements. Herein, we study the difference in offline behavior between customers who received online advertisements and regular customers (i.e., the customers visits the target shop voluntary), and the duration of this difference. We analyzed approximately three thousand users' offline behavior with their 23.5 million location records through 31 A/B testings. We first demonstrate the externality that customers with advertisements traverse larger areas than those without advertisements, and this spatial difference lasts several days after their shopping day. We then find a long-run effect of this externality of advertising that a certain portion of the customers invited to the offline shops revisit these shops. Finally, based on this revisit effect findings, we utilize a causal machine learning model to propose a marketing strategy to maximize the revisit ratio. Our results suggest that advertisements draw customers who have different behavior traits from regular customers. This study's findings demonstrate that a simple analysis may underrate the effects of advertisements on businesses, and an analysis considering externality can attract potentially valuable customers.
HCSep 4, 2020
Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting PlatformsAkira Matsui, Emilio Ferrara, Fred Morstatter et al.
Crowdwork often entails tackling cognitively-demanding and time-consuming tasks. Crowdsourcing can be used for complex annotation tasks, from medical imaging to geospatial data, and such data powers sensitive applications, such as health diagnostics or autonomous driving. However, the existence and prevalence of underperforming crowdworkers is well-recognized, and can pose a threat to the validity of crowdsourcing. In this study, we propose the use of a computational framework to identify clusters of underperforming workers using clickstream trajectories. We focus on crowdsourced geopolitical forecasting. The framework can reveal different types of underperformers, such as workers with forecasts whose accuracy is far from the consensus of the crowd, those who provide low-quality explanations for their forecasts, and those who simply copy-paste their forecasts from other users. Our study suggests that clickstream clustering and analysis are fundamental tools to diagnose the performance of crowdworkers in platforms leveraging the wisdom of crowds.
LGJul 8, 2020
Unbiased Lift-based Bidding SystemDaisuke Moriwaki, Yuta Hayakawa, Isshu Munemasa et al.
Conventional bidding strategies for online display ad auction heavily relies on observed performance indicators such as clicks or conversions. A bidding strategy naively pursuing these easily observable metrics, however, fails to optimize the profitability of the advertisers. Rather, the bidding strategy that leads to the maximum revenue is a strategy pursuing the performance lift of showing ads to a specific user. Therefore, it is essential to predict the lift-effect of showing ads to each user on their target variables from observed log data. However, there is a difficulty in predicting the lift-effect, as the training data gathered by a past bidding strategy may have a strong bias towards the winning impressions. In this study, we develop Unbiased Lift-based Bidding System, which maximizes the advertisers' profit by accurately predicting the lift-effect from biased log data. Our system is the first to enable high-performing lift-based bidding strategy by theoretically alleviating the inherent bias in the log. Real-world, large-scale A/B testing successfully demonstrates the superiority and practicability of the proposed system.
LGApr 28, 2020
Detecting multi-timescale consumption patterns from receipt data: A non-negative tensor factorization approachAkira Matsui, Teruyoshi Kobayashi, Daisuke Moriwaki et al.
Understanding consumer behavior is an important task, not only for developing marketing strategies but also for the management of economic policies. Detecting consumption patterns, however, is a high-dimensional problem in which various factors that would affect consumers' behavior need to be considered, such as consumers' demographics, circadian rhythm, seasonal cycles, etc. Here, we develop a method to extract multi-timescale expenditure patterns of consumers from a large dataset of scanned receipts. We use a non-negative tensor factorization (NTF) to detect intra- and inter-week consumption patterns at one time. The proposed method allows us to characterize consumers based on their consumption patterns that are correlated over different timescales.