"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces
This work addresses the problem of retrieving information from web interfaces on behalf of users. The contribution is incremental: it decomposes a known task into a more atomic operation.
The study investigated whether large language models (LLMs) can identify important information in web interfaces for user queries, finding that while LLMs show reasonable performance, there is substantial room for improvement.
Large language models (LLMs) trained on corpora that include a large amount of code exhibit a remarkable ability to understand HTML. As web interfaces are primarily constructed using HTML, we design an in-depth study of how LLMs can be used to retrieve and locate the elements of a web interface that are important for a user-given query (i.e., a task description). In contrast with prior works, which primarily focused on autonomous web navigation, we decompose the problem into a more atomic operation: can LLMs identify the important information in a web page for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up work on overcoming the current challenges in this domain.
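To make the atomic operation concrete, the following is a minimal sketch of how the task could be framed as input to an LLM: candidate UI elements are extracted from the HTML, numbered, and placed in a prompt alongside the user query. This is an illustration under stated assumptions, not the setup used in the study; the tag list, the prompt wording, and the names `CandidateCollector`, `extract_candidates`, and `build_prompt` are all hypothetical. No model is called here.

```python
from html.parser import HTMLParser

# Tags treated as candidate UI elements (illustrative choice, not from the paper).
CANDIDATE_TAGS = {"a", "button", "input", "select", "label", "h1", "h2", "p", "li"}
VOID_TAGS = {"input"}  # elements without a closing tag or inner text


class CandidateCollector(HTMLParser):
    """Collects candidate elements with their attributes and inner text."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # dicts with keys: tag, attrs, text
        self._open = []       # indices of candidates that are still open

    def handle_starttag(self, tag, attrs):
        if tag in CANDIDATE_TAGS:
            self.candidates.append({"tag": tag, "attrs": dict(attrs), "text": ""})
            if tag not in VOID_TAGS:
                self._open.append(len(self.candidates) - 1)

    def handle_endtag(self, tag):
        if self._open and self.candidates[self._open[-1]]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open candidate element.
        if self._open:
            cand = self.candidates[self._open[-1]]
            cand["text"] = (cand["text"] + " " + data.strip()).strip()


def extract_candidates(html):
    collector = CandidateCollector()
    collector.feed(html)
    return collector.candidates


def build_prompt(query, candidates):
    """Builds a hypothetical prompt asking which numbered elements matter."""
    lines = []
    for i, cand in enumerate(candidates):
        label = cand["text"] or cand["attrs"].get("placeholder") or ""
        lines.append(f"[{i}] <{cand['tag']}> {label}".rstrip())
    return (
        f"User query: {query}\n"
        "UI elements:\n" + "\n".join(lines) + "\n"
        "Which element indices are important for this query?"
    )


html = '<div><h1>Flights</h1><input placeholder="From"><button>Search</button></div>'
candidates = extract_candidates(html)
prompt = build_prompt("Search for a flight", candidates)
```

The resulting `prompt` string would then be sent to a model of choice, and the returned indices compared against annotated important elements. Numbering the elements is one possible design; it keeps the model's answer easy to parse and to score.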