Informative Missingness to Generate Irregular Clinical Time Series
2026-06-17
Hadi Mehdizavareh, Gabriele Santangelo, Giovanna Nicora, Simon Lebech Cichosz, Arianna Dagliati, Arijit Khan, Riccardo Bellazzi
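arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. This missingness reflects both clinicians' decisions and patient physiology, so it is important to model it directly rather than treat it as a preprocessing artifact. Here, we propose a diffusion-based approach for generating clinical time series that jointly models laboratory values and their observation patterns, using the public Data Analytics Challenge on Missing data Imputation (DACMI) benchmark derived from MIMIC-III. To preserve realistic sampling, we align chart times to 4-hour intervals and segment admissions into 7-day windows, producing trajectories that pair each laboratory value with a corresponding observation indicator. Standard transformations and normalization are applied to stabilize training. Our approach extends the TimeDiff framework, learning continuous laboratory values and discrete missingness patterns through complementary diffusion objectives. Experiments show that the generated data closely match real patient trajectories, both in individual lab distributions and in joint value-missingness embeddings, demonstrating that diffusion models can capture clinically meaningful dependencies between patient physiology and clinicians' testing behavior under MNAR-like (missing-not-at-random) missingness. These preliminary results indicate that our model can serve as an initial component toward developing clinical foundation models. By producing synthetic priors that preserve key physiology-missingness relationships, this work motivates the subsequent training of Prior-Data Fitted Networks capable of leveraging informative missingness, which we will investigate in extended work.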
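The preprocessing described in the abstract (4-hour alignment, 7-day windows, and a value paired with an observation indicator) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the function name `grid_admission`, the input layout (hours since admission, value, integer lab index), the zero fill for unobserved cells, and the "latest result in a bin wins" rule are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

BIN_HOURS = 4                           # grid resolution stated in the abstract
WINDOW_DAYS = 7                         # window length stated in the abstract
STEPS = WINDOW_DAYS * 24 // BIN_HOURS   # 42 grid steps per window


def grid_admission(hours, values, lab_ids, n_labs):
    """Bin one admission's lab results onto a 4-hour grid, split into 7-day windows.

    hours   : hours since admission for each result (assumed non-negative)
    values  : measured value for each result
    lab_ids : integer lab index in [0, n_labs) for each result
    Returns (x, m), each of shape (n_windows, STEPS, n_labs):
    x holds values (0.0 where unobserved), m is 1 where a value was observed.
    """
    hours = np.asarray(hours, dtype=float)
    values = np.asarray(values, dtype=float)
    lab_ids = np.asarray(lab_ids, dtype=int)

    # Index of the 4-hour bin each result falls into.
    bins = np.floor(hours / BIN_HOURS).astype(int)
    n_windows = int(bins.max()) // STEPS + 1 if bins.size else 1

    x = np.zeros((n_windows * STEPS, n_labs))
    m = np.zeros((n_windows * STEPS, n_labs), dtype=np.int8)

    # Visit results in time order so the latest one in a bin wins (an assumption).
    for i in np.argsort(hours, kind="stable"):
        x[bins[i], lab_ids[i]] = values[i]
        m[bins[i], lab_ids[i]] = 1

    # Cut the long grid into consecutive 7-day windows.
    return (x.reshape(n_windows, STEPS, n_labs),
            m.reshape(n_windows, STEPS, n_labs))


# Toy admission: two labs, four results, one of them after day 7.
x, m = grid_admission(
    hours=[1.0, 3.5, 9.0, 170.0],
    values=[10.0, 12.0, 7.0, 5.0],
    lab_ids=[0, 0, 1, 0],
    n_labs=2,
)
```

Keeping the mask `m` as a separate channel, instead of encoding missingness as NaN inside `x`, is what would let a generative model treat the continuous values and the discrete observation pattern with separate objectives, as the abstract describes.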