Towards Practical Multi-label Causal Discovery in High-Dimensional Event Sequences via One-Shot Graph Aggregation
This work addresses the challenge of multi-label causal discovery in sparse, high-dimensional event sequences for domains such as healthcare and vehicle diagnostics; it represents an incremental advance.
The paper tackles causal discovery in high-dimensional event sequences with multiple outcome labels. It introduces CARGO, which uses two pretrained causal Transformers and one-shot graph aggregation to efficiently reconstruct the global Markov boundaries of labels. Results on an automotive fault dataset with over 29,100 event types and 474 labels demonstrate its ability to perform structured reasoning at scale.
Understanding causality in event sequences, where outcome labels such as diseases or system failures arise from preceding events like symptoms or error codes, is critical, yet it remains an unsolved challenge across domains such as healthcare and vehicle diagnostics. We introduce CARGO, a scalable multi-label causal discovery method for sparse, high-dimensional event sequences comprising thousands of unique event types. Using two pretrained causal Transformers as domain-specific foundation models for event sequences, CARGO infers one-shot causal graphs per sequence, in parallel, and aggregates them using an adaptive frequency fusion to reconstruct the global Markov boundaries of labels. This two-stage approach enables efficient probabilistic reasoning at scale while bypassing the intractable cost of full-dataset conditional independence testing. Our results on a challenging real-world automotive fault prediction dataset with over 29,100 unique event types and 474 imbalanced labels demonstrate CARGO's ability to perform structured reasoning.
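The aggregation stage can be illustrated with a minimal sketch. The abstract does not specify how the adaptive frequency fusion is computed, so the code below substitutes a plain fixed-threshold frequency vote as a stand-in; it is not CARGO's actual rule. The function name `aggregate_boundaries`, the parameters `min_support` and `min_freq`, and the input layout (one dict per sequence, mapping each label to the set of events in its locally inferred boundary) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def aggregate_boundaries(per_sequence_graphs, min_support=2, min_freq=0.5):
    """Fuse per-sequence label boundaries into one global estimate per label.

    Hypothetical stand-in for an adaptive frequency fusion: an event is kept
    in a label's global boundary if it appears in at least `min_support`
    local graphs for that label AND in at least a `min_freq` fraction of them.
    """
    label_counts = Counter()            # label -> number of sequences containing it
    edge_counts = defaultdict(Counter)  # label -> event -> number of local graphs

    for graph in per_sequence_graphs:
        for label, events in graph.items():
            label_counts[label] += 1
            for event in set(events):
                edge_counts[label][event] += 1

    global_boundaries = {}
    for label, n_sequences in label_counts.items():
        global_boundaries[label] = {
            event
            for event, count in edge_counts[label].items()
            if count >= min_support and count / n_sequences >= min_freq
        }
    return global_boundaries


# Toy input: three sequences, each with its own one-shot local graph.
local_graphs = [
    {"fault_A": {"e1", "e2"}},
    {"fault_A": {"e1"}},
    {"fault_A": {"e1", "e3"}, "fault_B": {"e4"}},
]
fused = aggregate_boundaries(local_graphs)
```

In this toy input, only `e1` recurs across the local graphs for `fault_A`, so it is the only event retained; `fault_B` appears in a single sequence and falls below the support threshold. A count-based vote like this needs only one pass over the per-sequence graphs, which is in the spirit of avoiding full-dataset conditional independence tests, although the real method presumably adapts its thresholds to label frequency.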