LG IR Oct 30, 2022

A Pipeline for Analysing Grant Applications

arXiv:2210.16843v1, 2 citations, h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of evaluating innovation in grant applications for research funding agencies, but it is incremental as it applies existing methods to a new dataset.

The paper analyses grant applications to assess innovation: it applies data mining models to predict innovation scores and to characterise the vocabulary of innovative proposals. The resulting model is a Random Forest classifier over a modified TF-IDF encoding, and the experimental results indicate that the approach is feasible.

Data mining techniques can transform massive amounts of unstructured data into quantitative data that quickly reveal insights, trends, and patterns behind the original data. In this paper, a data mining model is applied to analyse the 2019 grant applications submitted to an Australian Government research funding agency, to investigate whether the grant schemes successfully identify innovative project proposals, as intended. The grant applications are peer-reviewed research proposals that include specific "innovation and creativity" (IC) scores assigned by reviewers. In addition to predicting the IC score for each research proposal, we are particularly interested in understanding the vocabulary of innovative proposals. To this end, various data mining models and feature encoding algorithms are studied and explored. As a result, we propose the best-performing model: a Random Forest (RF) classifier over documents encoded with features denoting the presence or absence of unigrams. Specifically, the unigram terms are encoded by a modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithm that implements only the IDF part of TF-IDF. Besides the proposed model, this paper also presents a rigorous experimental pipeline for analysing grant applications, and the experimental results demonstrate its feasibility.
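
The IDF-only encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact IDF formula or tokenisation, so the smoothed formula `ln((1 + N) / (1 + df)) + 1`, whitespace tokenisation, the helper names `fit_idf` and `encode`, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Build a sorted unigram vocabulary and an IDF weight per term.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1. The paper's
    exact formula is not stated in the abstract.
    """
    # Each document contributes a term at most once (document frequency).
    tokenized = [set(doc.lower().split()) for doc in docs]
    df = Counter(term for terms in tokenized for term in terms)
    n_docs = len(docs)
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def encode(doc, vocab, idf):
    """Encode a document by presence/absence of each unigram.

    A present term gets its IDF weight, an absent term gets 0.0.
    Term frequency within the document is deliberately ignored.
    """
    present = set(doc.lower().split())
    return [idf[t] if t in present else 0.0 for t in vocab]


# Hypothetical toy corpus, not data from the paper.
docs = [
    "novel method for protein folding",
    "novel sensor for water quality",
    "survey of water policy",
]
vocab, idf = fit_idf(docs)
vectors = [encode(d, vocab, idf) for d in docs]
```

Because only presence matters, a document that repeats a term many times receives the same weight for that term as one that mentions it once. Vectors of this form could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, together with the reviewer-assigned IC scores as labels.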
