Evaluating Large Language Models for Causal Modeling
This work addresses the challenge of integrating domain knowledge into causal modeling, a problem relevant to researchers and practitioners in causal data science. Its contribution is incremental, however, because it builds on existing LLM capabilities applied to specific tasks.
The paper tackles the problem of transforming causal domain knowledge into a representation aligned with causal data science guidelines. It introduces two tasks: distilling domain knowledge into causal variables and detecting interaction entities using LLMs. It finds that LLMs such as GPT-4-turbo and Llama3-70b are better at distilling knowledge into causal variables than sparse expert models such as Mixtral-8x22b, whereas Mixtral-8x22b is more effective at identifying interaction entities, with performance depending on the domain.
In this paper, we consider the process of transforming causal domain knowledge into a representation that aligns more closely with guidelines from causal data science. To this end, we introduce two novel tasks: distilling causal domain knowledge into causal variables and detecting interaction entities using LLMs. We find that contemporary LLMs are helpful tools for conducting causal modeling tasks in collaboration with human experts, as they can provide a wider perspective. Specifically, LLMs such as GPT-4-turbo and Llama3-70b are better at distilling causal domain knowledge into causal variables than sparse expert models such as Mixtral-8x22b. In contrast, sparse expert models such as Mixtral-8x22b stand out as the most effective at identifying interaction entities. Finally, we highlight that the performance of the chosen LLM for causal modeling depends on the domain in which the entities are generated.