Benchmark time series data sets for PyTorch -- the torchtime package
This work addresses reproducibility problems and data processing errors that affect researchers modeling EHR time series. Its contribution is incremental, since it focuses on data processing tools rather than new models.
The paper tackles the lack of standardized data processing for Electronic Health Record (EHR) time series models. It introduces torchtime, a Python package that provides reproducible implementations of benchmark data sets from PhysioNet and the UEA & UCR time series classification repository, with the aim of simplifying data access and enabling fair comparisons between models.
The development of models for Electronic Health Record data is an area of active research featuring a small number of public benchmark data sets. Researchers typically write custom data processing code, which hinders reproducibility and can introduce errors. The Python package torchtime provides reproducible implementations of commonly used PhysioNet and UEA & UCR time series classification repository data sets for PyTorch. Features are provided for working with irregularly sampled and partially observed time series of unequal length. The package aims to simplify access to PhysioNet data and enable fair comparisons of models in this exciting area of research.
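
To make the handling of partially observed series of unequal length concrete, the following is a minimal sketch of one common approach: pad every series to the longest length and record a missing data mask alongside the original lengths. It does not use or reproduce the torchtime API. The function name `pad_with_mask`, the choice of NaN as the padding value, and the toy data are all assumptions made for illustration, and plain Python lists are used so the sketch runs without dependencies.

```python
import math


def pad_with_mask(series, pad_value=math.nan):
    """Pad variable-length series to a common length.

    Hypothetical helper for illustration only, not part of torchtime.
    Returns the padded series, a mask (1 = observed, 0 = missing or padding)
    and the original length of each series.
    """
    max_len = max(len(s) for s in series)
    padded, mask, lengths = [], [], []
    for s in series:
        lengths.append(len(s))
        # Extend the series with the padding value up to the longest length.
        row = list(s) + [pad_value] * (max_len - len(s))
        padded.append(row)
        # NaN marks both unobserved values and padding.
        mask.append([0 if math.isnan(v) else 1 for v in row])
    return padded, mask, lengths


# Toy data: three series of unequal length with some unobserved values.
series = [
    [0.5, math.nan, 1.2, 0.9],
    [2.0, 2.1],
    [math.nan, 3.3, 3.1],
]
padded, mask, lengths = pad_with_mask(series)
```

Keeping the original lengths lets a model ignore padded positions, while the mask distinguishes observed values from missing ones. In a PyTorch workflow the same structures would typically be held as tensors.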