Takayuki Mizuno

SI
7 papers
72 citations
Novelty 31%
AI Score 39

7 Papers

33.0 SI May 27
Snippet-Driven Supply Chain Discovery with LLMs: Scaling Visibility in China

Hiroto Fukada, Takayuki Mizuno

Financial and economic research often relies on structured supply-chain disclosures and commercial databases. In China, supplier-customer disclosure is typically limited to major partners of listed firms, leaving unlisted firms and long-tail inter-firm links poorly captured in structured data. Public web evidence can partly fill this gap through corporate, government, and trade-media disclosures; however, full-text web mining at scale is costly because pages are often inaccessible or expensive to process with large language models (LLMs). We propose a snippet-driven method for constructing a supply chain knowledge graph (SCKG), with firms as nodes and inter-firm relationships as edges. Web search snippets are query-biased summaries returned with search results. We use them as a scalable first-pass evidence layer for LLM-based relationship extraction. We evaluate the pipeline in terms of extraction efficiency and coverage. For extraction efficiency, exhaustive full-text chunking discovers 19.8× more unique relationships than snippets, but requires 251.2× more input tokens and yields higher redundancy. For coverage, we use 130,685 Chinese firms as search seeds, covering Shanghai/Shenzhen-listed firms and large unlisted firms as of 2024. In the listed-firm subset, the resulting SCKG covers 7.2× more firms and 9.3× more relationships than the CSMAR disclosure-based benchmark, while revealing heavy-tailed degree patterns. Retained provenance metadata make the SCKG an auditable complement to disclosure-based databases.
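
The two efficiency ratios in this abstract can be combined into a single per-token figure. The following is a small worked calculation, not something the abstract reports directly: it assumes both ratios are measured on the same evaluation set, and the function name is hypothetical.

```python
# Ratios quoted in the abstract (full-text chunking relative to snippets).
RELATIONSHIP_RATIO = 19.8   # unique relationships: full text / snippets
TOKEN_RATIO = 251.2         # LLM input tokens: full text / snippets


def snippet_yield_advantage(relationship_ratio: float, token_ratio: float) -> float:
    """Return how many times more unique relationships per input token
    snippets deliver compared with full-text chunking.

    Full text yields (relationship_ratio * R) relationships for
    (token_ratio * T) tokens, while snippets yield R for T, so the
    per-token advantage of snippets is token_ratio / relationship_ratio.
    """
    return token_ratio / relationship_ratio


advantage = snippet_yield_advantage(RELATIONSHIP_RATIO, TOKEN_RATIO)
print(f"Snippets find about {advantage:.1f}x more unique relationships per input token")
```

Under that assumption, snippets recover fewer relationships in total but roughly an order of magnitude more per token, which is the sense in which they serve as a first-pass evidence layer.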

LG Aug 14, 2023
Generating Individual Trajectories Using GPT-2 Trained from Scratch on Encoded Spatiotemporal Data

Taizo Horikomi, Shouji Fujimoto, Atushi Ishikawa et al.

Following Mizuno, Fujimoto, and Ishikawa's research (Front. Phys. 2022), we convert geographical coordinates expressed in latitude and longitude into distinctive location tokens that represent positions across varied spatial scales. We encode an individual daily trajectory as a sequence of tokens by adding unique time interval tokens to the location tokens. We train GPT-2, an autoregressive language model architecture, from scratch on these token sequences, which gives us a deep learning model that sequentially generates an individual daily trajectory. Environmental factors such as meteorological conditions and individual attributes such as gender and age are represented by unique special tokens, and by training the GPT-2 architecture on these tokens together with the trajectories, we can generate trajectories that are influenced by both environmental factors and individual attributes.
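
A minimal sketch of how a trajectory might be turned into such a token sequence is given below. The abstract does not specify the encoding, so everything here is an assumption for illustration: the quadtree-style subdivision, the token spellings (`<L0_3>`, `<DT_30>`), and the function names are hypothetical and are not the authors' scheme.

```python
def location_tokens(lat, lon, levels=4):
    """Encode a coordinate as one token per spatial scale (assumed scheme).

    Each level halves the current bounding box in both directions and
    records which of the four quadrants contains the point, so earlier
    tokens are coarse and later tokens are fine.
    """
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    tokens = []
    for level in range(levels):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # northern half
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:      # eastern half
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        tokens.append(f"<L{level}_{quadrant}>")
    return tokens


def encode_trajectory(points, condition_tokens=(), levels=4):
    """Build a token sequence for one daily trajectory.

    points: list of (minutes_since_midnight, lat, lon), sorted by time.
    condition_tokens: special tokens for weather or individual attributes
    (hypothetical spellings), placed at the start of the sequence.
    """
    sequence = list(condition_tokens)
    previous_minutes = None
    for minutes, lat, lon in points:
        if previous_minutes is not None:
            # Time interval token between consecutive observations.
            sequence.append(f"<DT_{minutes - previous_minutes}>")
        sequence.extend(location_tokens(lat, lon, levels))
        previous_minutes = minutes
    return sequence


day = [(480, 35.68, 139.76), (510, 35.66, 139.70)]
print(encode_trajectory(day, condition_tokens=("<RAIN>", "<AGE_30S>"), levels=3))
```

A sequence like this could then be fed to any autoregressive language model trained from scratch with a vocabulary made of these tokens.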

26.8 DL Mar 15
Researcher Population Pyramids: Tracking Demographic and Gender Trajectories Across Countries

Kazuki Nakajima, Takayuki Mizuno

The sustainability of the academic ecosystem relies on researcher demographics and gender balance, yet assessing these dynamics in a timely manner for policy is challenging. Here, we propose a researcher population pyramid framework for tracking demographic and gender trajectories across countries using publication data. We provide a timely snapshot of historical and present demographics and gender balance across 58 countries, revealing three contrasting patterns among research systems: Emerging systems (e.g., Arab countries) exhibit high researcher inflows with widening gender gaps in cumulative productivity; Mature systems (e.g., the United States) show modest inflows with narrowing gender gaps; and Rigid systems (e.g., Japan) lag in both. Furthermore, by simulating future scenarios, the framework makes potential trajectories visible. If 2023 demographic patterns persist, Arab countries' systems could resemble mature or even rigid ones by 2050. Our framework provides a robust diagnostic tool for policymakers worldwide to foster sustainable talent pipelines and gender equality in academia.

LG Jul 13, 2024
Generating In-store Customer Journeys from Scratch with GPT Architectures

Taizo Horikomi, Takayuki Mizuno

We propose a method that simultaneously generates customer trajectories and purchasing behaviors in retail stores using a Transformer-based deep learning architecture. Utilizing customer trajectory data, layout diagrams, and retail scanner data obtained from a retail store, we trained a GPT-2 architecture from scratch to generate indoor trajectories and purchase actions. Additionally, we explored the effectiveness of fine-tuning the pre-trained model with data from another store. Results demonstrate that our method reproduces in-store trajectories and purchase behaviors more accurately than LSTM and SVM models, with fine-tuning significantly reducing the required training data.

SI Sep 8, 2020
Nondiagonal Mixture of Dirichlet Network Distributions for Analyzing a Stock Ownership Network

Wenning Zhang, Ryohei Hisano, Takaaki Ohnishi et al.

Block modeling is widely used in studies on complex networks. The cornerstone model is the stochastic block model (SBM), widely used over the past decades. However, the SBM is limited in analyzing complex networks as the model is, in essence, a random graph model that cannot reproduce the basic properties of many complex networks, such as sparsity and heavy-tailed degree distribution. In this paper, we provide an edge exchangeable block model that incorporates such basic features and simultaneously infers the latent block structure of a given complex network. Our model is a Bayesian nonparametric model that flexibly estimates the number of blocks and takes into account the possibility of unseen nodes. Using one synthetic dataset and one real-world stock ownership dataset, we show that our model outperforms state-of-the-art SBMs for held-out link prediction tasks.
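
To make the limitation concrete, here is a minimal sketch of the classic SBM generative process that the abstract contrasts against. It is not the paper's edge exchangeable model; the function names and the example parameters are assumptions chosen for illustration.

```python
import random


def sample_sbm(block_sizes, edge_probs, seed=0):
    """Sample an undirected graph from a classic stochastic block model.

    block_sizes: number of nodes in each block.
    edge_probs[a][b]: probability of an edge between a node in block a
    and a node in block b (assumed symmetric).
    Every node pair is linked independently, which is why degrees within
    a block follow a binomial distribution rather than a heavy tail.
    """
    rng = random.Random(seed)
    labels = [block for block, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


def degree_sequence(n, edges):
    """Count the number of edges incident to each of the n nodes."""
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# Two assortative blocks: dense inside, sparse across (illustrative values).
labels, edges = sample_sbm([50, 50], [[0.20, 0.02], [0.02, 0.20]], seed=1)
degrees = degree_sequence(len(labels), edges)
print("min degree:", min(degrees), "max degree:", max(degrees))
```

Because each pair is linked independently with a fixed probability, node degrees within a block concentrate around their mean, so a sampler like this does not produce the hub-dominated degree distributions seen in networks such as stock ownership graphs.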

SI Nov 9, 2018
Prediction of ESG Compliance using a Heterogeneous Information Network

Ryohei Hisano, Didier Sornette, Takayuki Mizuno

Negative screening is one method to avoid interactions with inappropriate entities. For example, financial institutions keep investment exclusion lists of inappropriate firms that have environmental, social, and governance (ESG) problems. They create their investment exclusion lists by gathering information from various news sources to keep their portfolios profitable as well as green. International organizations also maintain smart sanctions lists that are used to prohibit trade with entities that are involved in illegal activities. In the present paper, we focus on the prediction of investment exclusion lists in the finance domain. We construct a vast heterogeneous information network that covers the necessary information surrounding each firm. It is assembled from seven professionally curated datasets and two open datasets and contains approximately 50 million nodes and 400 million edges in total. Exploiting these vast datasets and motivated by how professional investigators and journalists undertake their daily investigations, we propose a model that can learn to predict firms that are more likely to be added to an investment exclusion list in the near future. Our approach is tested using the negative news investment exclusion list data of more than 35,000 firms worldwide from January 2012 to May 2018. Compared with state-of-the-art methods, both with and without the network, we show that the predictive accuracy is substantially improved when using the vast information stored in the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data to monitor dominant firms on a global scale for better risk management and more socially responsible investment.

ML Oct 23, 2012
High quality topic extraction from business news explains abnormal financial market volatility

Ryohei Hisano, Didier Sornette, Takayuki Mizuno et al.

Understanding the mutual relationships between information flows and social activity in society today is one of the cornerstones of the social sciences. In financial economics, the key issue in this regard is understanding and quantifying how news of all possible types (geopolitical, environmental, social, financial, economic, etc.) affects trading and the pricing of firms in organized stock markets. In this article, we seek to address this issue by performing an analysis of more than 24 million news records provided by Thomson Reuters and of their relationship with trading activity for 206 major stocks in the S&P US stock index. We show that the whole landscape of news that affects stock price movements can be automatically summarized via simple regularized regressions between trading activity and news information pieces decomposed, with the help of simple topic modeling techniques, into their "thematic" features. Using these methods, we are able to estimate and quantify the impacts of news on trading. We introduce network-based visualization techniques to represent the whole landscape of news information associated with a basket of stocks. The examination of the words that are representative of the topic distributions confirms that our method is able to extract the significant pieces of information influencing the stock market. Our results show that one of the most puzzling stylized facts in financial economics, namely that at certain times trading volumes appear to be "abnormally large," can be partially explained by the flow of news. In this sense, our results prove that there is no "excess trading" when restricting to times when news is genuinely novel and provides relevant financial information.