DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
arxiv_cs_lg·Apr 3, 2026, 08:01 PM·9

Summary

DISCO-TAB is a framework for the privacy-preserving synthesis of complex clinical data. It addresses the limitations of traditional generative LLMs in capturing intricate dependencies and class imbalances in Electronic Health Records (EHR). It orchestrates a fine-tuned LLM with a multi-objective discriminator system, optimized via Reinforcement Learning.

Unlike prior methods, DISCO-TAB evaluates synthesis at four granularities (token, sentence, feature, and row) and integrates Automated Constraint Discovery and Inverse-Frequency Reward Shaping to preserve latent medical logic and prevent minority-class collapse. The reported validation shows state-of-the-art performance, with up to a 38.2% improvement in downstream clinical classifier utility over GAN and Diffusion baselines, while maintaining high statistical fidelity and robust resistance to membership inference attacks. The work is presented as a new standard for trustworthy, utility-preserving synthetic tabular data in sensitive healthcare applications.
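To make the reward-shaping idea concrete, here is a minimal sketch of how scores at the four granularities could be combined and then scaled by an inverse-frequency class weight. The summary does not give the paper's actual formula, so everything here is an assumption for illustration: the function names, the equal weighting of the four levels, and the normalization of class weights to a mean of 1.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by 1 / frequency, normalized so the mean weight is 1.

    Assumed form; the paper's exact shaping function is not stated here.
    """
    counts = Counter(labels)
    total = len(labels)
    raw = {cls: total / n for cls, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {cls: w / mean_raw for cls, w in raw.items()}


def shaped_reward(level_scores, level_weights, row_label, class_weights):
    """Weighted sum of per-granularity scores, scaled by the row's class weight."""
    base = sum(level_weights[k] * level_scores[k] for k in level_weights)
    return base * class_weights[row_label]


# Toy imbalanced label set: 90 majority rows, 10 minority rows.
labels = ["neg"] * 90 + ["pos"] * 10
class_weights = inverse_frequency_weights(labels)

# Hypothetical discriminator scores in [0, 1] at the four granularities.
level_weights = {"token": 0.25, "sentence": 0.25, "feature": 0.25, "row": 0.25}
scores = {"token": 0.8, "sentence": 0.6, "feature": 0.7, "row": 0.9}

r_neg = shaped_reward(scores, level_weights, "neg", class_weights)
r_pos = shaped_reward(scores, level_weights, "pos", class_weights)
print(f"majority reward={r_neg:.3f}, minority reward={r_pos:.3f}")
```

With a 90:10 class split, a minority-class row earns nine times the reward of a majority-class row for identical discriminator scores, which is the pressure that discourages the generator from collapsing onto the majority class.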

Technical Impact

  • Synthetic Data Generation: Reports a new state of the art (SOTA) for privacy-preserving synthetic tabular data, particularly in healthcare. This could make sensitive clinical datasets more available for research and development and speed up the creation of clinical decision support systems.

  • LLM Application Expansion: Extends Large Language Models (LLMs) beyond text generation to the synthesis of structured tabular data. This highlights the potential for LLMs to handle diverse data modalities when integrated with other AI paradigms.

  • Advanced Reinforcement Learning: Applies Reinforcement Learning (RL) techniques, including hierarchical feedback, multi-objective optimization, and reward shaping, to a complex data generation problem. This offers data scientists and ML engineers a reference point for using RL on similar challenges.

  • Robust Privacy-Preserving AI: Balances data privacy with utility. Its strong resistance to membership inference attacks, combined with improved downstream classifier performance, makes it well suited to AI development in highly regulated industries.

  • Addressing Data Imbalance: Automated Constraint Discovery preserves latent medical logic without manual rule writing, and Inverse-Frequency Reward Shaping counters the common problem of minority-class collapse. Both matter for the quality and clinical validity of synthetic data generated from imbalanced datasets.
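As an illustration of what constraint discovery on tabular data can look like, the sketch below mines simple "if column A has value x, column B always has value y" rules from real rows and then flags synthetic rows that break them. This is a deliberately simple stand-in, not the paper's Automated Constraint Discovery method, which the summary does not describe; the function names, the `min_support` threshold, and the toy columns are all hypothetical.

```python
from itertools import permutations


def discover_implications(rows, min_support=2):
    """Find rules (col_a, val_a, col_b, val_b): col_a == val_a implies col_b == val_b.

    A rule is kept only if val_a appears at least `min_support` times
    and always co-occurs with the same single value of col_b.
    """
    columns = list(rows[0].keys())
    rules = []
    for col_a, col_b in permutations(columns, 2):
        seen = {}     # val_a -> set of co-occurring values of col_b
        support = {}  # val_a -> number of rows containing it
        for row in rows:
            seen.setdefault(row[col_a], set()).add(row[col_b])
            support[row[col_a]] = support.get(row[col_a], 0) + 1
        for val_a, vals_b in seen.items():
            if len(vals_b) == 1 and support[val_a] >= min_support:
                rules.append((col_a, val_a, col_b, next(iter(vals_b))))
    return rules


def violations(row, rules):
    """Return the rules whose premise holds for `row` but whose conclusion does not."""
    return [r for r in rules if row[r[0]] == r[1] and row[r[2]] != r[3]]


# Toy "real" records (hypothetical columns, not from any dataset).
real = [
    {"sex": "M", "pregnant": "no", "diabetic": "yes"},
    {"sex": "M", "pregnant": "no", "diabetic": "no"},
    {"sex": "F", "pregnant": "yes", "diabetic": "no"},
    {"sex": "F", "pregnant": "no", "diabetic": "yes"},
    {"sex": "F", "pregnant": "yes", "diabetic": "yes"},
]
rules = discover_implications(real)

bad_row = {"sex": "M", "pregnant": "yes", "diabetic": "no"}
good_row = {"sex": "F", "pregnant": "yes", "diabetic": "no"}
print(rules)
print(len(violations(bad_row, rules)), len(violations(good_row, rules)))
```

In a reward-driven generator, a count of violated rules like this could feed a penalty term at the row level, so that logically impossible records are discouraged without anyone hand-writing the rules. On small samples such mining will also pick up coincidental rules, which is why a support threshold is included.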

DISCO-TAB · Generative Large Language Models (LLMs) · Reinforcement Learning · GAN · Diffusion · Electronic Health Records (EHR)