The realm of healthcare and life sciences is undergoing a transformative shift, fueled by the advent and integration of data-driven technologies. At the forefront of this revolution is the burgeoning use of synthetic data, a groundbreaking development poised to redefine the landscape of medical research, AI development, and patient privacy.
The Emergence of Synthetic Data
Synthetic data is to real-world data as synthetic fiber (like nylon) is to natural fiber (like hemp). Throughout history, humans have created synthetic materials to achieve goals and develop new products that improve our lives. Synthetic fibers are used in clothing, rope, industrial equipment, automobiles, and more. The ability to create synthetic fiber made possible numerous products that we now consider essential.
Synthetic data could have a similar impact in healthcare. Synthetic data are generated from real-world data by a data synthesizer. Synthesizers may use different methods, but the goal is the same: synthetic data that have the same statistical and correlative properties as the original data while remaining completely independent of the real-world records (1, 2).
Notably, synthetic data do not contain any personally identifiable information, which protects personal privacy and supports compliance with privacy regulations such as the EU’s General Data Protection Regulation (GDPR). The use of high-fidelity synthetic data for data augmentation is an area of growing interest in data science. One application is generating virtual patient cohorts, such as digital twins, to estimate counterfactuals in in silico trials, allowing for better prediction of treatment outcomes and more personalised medicine (3).
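To make the idea of a synthesizer concrete, the following is a minimal sketch and not a description of any specific product or of the methods in the cited references. It assumes a toy dataset with two hypothetical variables (age and systolic blood pressure) and assumes a simple bivariate Gaussian model; production synthesizers typically rely on more sophisticated methods. The sketch fits means and covariances to the "real" records, then draws new records from the fitted model, so the synthetic records share the correlation structure of the originals without copying any of them.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    # Sample covariance (n - 1 denominator).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def fit_synthesizer(records):
    """Learn the means and the 2x2 covariance matrix of the real records."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": covariance(xs, xs),
        "var_y": covariance(ys, ys),
        "cov_xy": covariance(xs, ys),
    }


def generate(model, n, rng):
    """Draw n synthetic records from the fitted bivariate Gaussian."""
    # Cholesky factor of [[var_x, cov_xy], [cov_xy, var_y]].
    l11 = math.sqrt(model["var_x"])
    l21 = model["cov_xy"] / l11
    l22 = math.sqrt(model["var_y"] - l21 ** 2)
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + l11 * z1
        y = model["mean_y"] + l21 * z1 + l22 * z2
        records.append((x, y))
    return records


# Hypothetical stand-in for real-world data: (age, systolic blood pressure).
# These values are simulated for illustration only; they are not patient data.
rng = random.Random(42)
real = []
for _ in range(2000):
    age = rng.gauss(55, 12)
    real.append((age, 100 + 0.6 * age + rng.gauss(0, 8)))

model = fit_synthesizer(real)
synthetic = generate(model, 2000, rng)

real_corr = correlation([r[0] for r in real], [r[1] for r in real])
synthetic_corr = correlation([s[0] for s in synthetic], [s[1] for s in synthetic])
print(f"real correlation:      {real_corr:.3f}")
print(f"synthetic correlation: {synthetic_corr:.3f}")
```

A Gaussian model only reproduces means, variances, and linear correlations. Skewed distributions, categorical variables, and rare patient profiles would need a richer model, which is one reason fidelity checks against the source data matter.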
Synthetic Data in Clinical Trials and Healthcare
In clinical trials and healthcare, synthetic data offer a balance of data quality and accuracy with privacy protection. By enabling meaningful analysis without exposing sensitive details, they preserve individual privacy while advancing medical research. Furthermore, synthetic data from clinical trials could become a commodity exchanged between researchers without the cumbersome legal agreements otherwise needed to ensure personal privacy.
One use of synthetic data in clinical trials is the synthetic control arm. In one study, a synthetic control arm built from real-world data was used to estimate the comparative effectiveness of lurbinectedin versus the historical standard of care for relapsed small cell lung cancer in the post-platinum setting (4). The study was able to evaluate the efficacy of the treatment without the ethical and logistical constraints of enrolling a comparable control group of patients receiving the historical standard of care. A synthetic control arm was also used to evaluate lisocabtagene maraleucel for the treatment of relapsed/refractory large B-cell lymphoma, a hematological cancer (5). These examples demonstrate that synthetic data can be used to accelerate clinical development.
Market Growth and Privacy Enhancement
The synthetic data generation market has seen significant growth, reaching USD 163.8 million in 2022, with a projected compound annual growth rate (CAGR) of 35.0% from 2023 to 2030, driven by AI integration (6). This expansion is attributed to the ability of synthetic data to mimic the statistical properties of real datasets without compromising personal privacy. The technology, akin to creating a completely anonymized version of a detailed photograph, supports compliance with privacy laws and protects participant anonymity in clinical research.
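As a rough illustration of what a 35.0% CAGR implies, the sketch below compounds the 2022 figure forward year by year. It assumes that 2022 is the base year and that growth is constant through 2030; the function name and the year-by-year table are illustrative and do not come from the cited report. Under these assumptions the implied 2030 market size is on the order of USD 1.8 billion, which is a derived number rather than one the source states.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1 + cagr) ** years


base_2022 = 163.8  # USD million, figure quoted in the text
cagr = 0.35        # 35.0% per year, figure quoted in the text

# Assumption: 2022 is year zero and the same rate applies every year to 2030.
projection = {
    year: project_market_size(base_2022, cagr, year - 2022)
    for year in range(2022, 2031)
}

for year, value in projection.items():
    print(f"{year}: USD {value:,.1f} million")
```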
Highlighting synthetic data as a pivotal privacy-enhancing tool, Simmons & Simmons note its capacity to foster innovation while adhering to strict privacy regulations. This positions synthetic data as crucial for businesses aiming to utilize large data volumes securely, aligning market growth with the increasing demand for robust data protection in the face of tightening privacy laws.
Challenges and Regulatory Considerations
The emergence of synthetic data will present new challenges for regulatory bodies such as the European Medicines Agency (EMA) and the US Food and Drug Administration (FDA) as they start to receive requests for marketing authorization that include synthetic data. The European Data Protection Supervisor advocates for privacy assurance assessments to guarantee the non-personal nature of synthetic data (7).
The recognition of in silico methodologies, including synthetic data, by the EMA and FDA underscores their significance in complementing traditional research methods (8,9,10). Yet concerns remain about the ability of synthetic data to capture small subgroups, outlier profiles, and other aspects of real-world data (11). Efforts to compare synthetic data with real-world data must continue in order to support the ongoing use of synthetic data in clinical development activities.
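One simple way to probe the subgroup concern is to compare how often each subgroup appears in the real and synthetic datasets. The sketch below is only an assumption about what such a check could look like, not a regulatory or validated method; the subgroup labels and counts are hypothetical. It reports the absolute gap in each subgroup's share and lists any subgroup present in the real data but absent from the synthetic data.

```python
from collections import Counter


def subgroup_shares(labels):
    """Return the fraction of records that fall into each subgroup."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def subgroup_gaps(real_labels, synthetic_labels):
    """Absolute difference in share for every subgroup seen in the real data."""
    real = subgroup_shares(real_labels)
    synthetic = subgroup_shares(synthetic_labels)
    return {group: abs(share - synthetic.get(group, 0.0)) for group, share in real.items()}


def missing_subgroups(real_labels, synthetic_labels):
    """Subgroups present in the real data but never generated."""
    return sorted(set(real_labels) - set(synthetic_labels))


# Hypothetical labels for illustration: the synthesizer has dropped the rare subgroup.
real_labels = ["common"] * 950 + ["uncommon"] * 45 + ["rare"] * 5
synthetic_labels = ["common"] * 960 + ["uncommon"] * 40

gaps = subgroup_gaps(real_labels, synthetic_labels)
missing = missing_subgroups(real_labels, synthetic_labels)

for group, gap in gaps.items():
    print(f"{group}: share differs by {gap:.3f}")
print("missing from synthetic data:", missing)
```

Note that the gap for the rare subgroup is numerically small even though the subgroup has vanished entirely, which is why an explicit check for missing subgroups is useful alongside share comparisons.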
InSilicoTrials and Synthetic Data
InSilicoTrials is helping sponsors integrate synthetic data in drug development programs. We help companies leverage synthetic data to support clinical trials in rare diseases, develop synthetic control arms, discuss use cases with regulatory agencies, and more. At InSilicoTrials, our vision is to incorporate synthetic data to accelerate drug development, reduce, refine, and replace clinical trials where possible, and improve the safety of medical products.
In conclusion, synthetic data may play a key role in the digital transformation of the healthcare and life sciences sector. Beyond offering a pragmatic solution to privacy concerns, they open new avenues for market opportunities, research, and addressing bias. Together, these support a safe and ethical framework for medical research in which privacy and technological progress coexist.
References
1. Bange, V., Nwosu, C. and Griffiths, H. (2023) ‘How synthetic data can increase privacy-prioritised data sharing among businesses’, Connect on Tech [Preprint]. Available at: https://connectontech.com/how-synthetic-data-can-increase-privacy-prioritised-data-sharing-among-businesses/.
2. Bordukova, M. et al. (2024) ‘Generative artificial intelligence empowers digital twins in drug discovery and clinical trials’, Expert Opinion on Drug Discovery, 19(1), pp. 33–42. Available at: https://doi.org/10.1080/17460441.2023.2273839.
3. Chen, R.J. et al. (2021) ‘Synthetic data in machine learning for medicine and healthcare’, Nature Biomedical Engineering, 5(6), pp. 493–497. Available at: https://doi.org/10.1038/s41551-021-00751-8.
4. Boyne, D.J. et al. (2023) ‘Comparative Effectiveness of Lurbinectedin for the Treatment of Relapsed Small Cell Lung Cancer in the Post-Platinum Setting: A Real-World Canadian Synthetic Control Arm Analysis’, Targeted Oncology, 18(5), pp. 697–705. Available at: https://doi.org/10.1007/s11523-023-00995-1.
5. Van Le, H. et al. (2023) ‘Use of a real-world synthetic control arm for direct comparison of lisocabtagene maraleucel and conventional therapy in relapsed/refractory large B-cell lymphoma’, Leukemia & Lymphoma, 64(3), pp. 573–585. Available at: https://doi.org/10.1080/10428194.2022.2160200.
6. Simmons & Simmons (2023) ‘The Revolution in the Data-Driven Healthcare and Life Sciences Market’.
7. European Council and European Parliament (2023) ‘Provisional agreement on AI Act’.
8. EMA (2023) Reflection paper on the use of Artificial Intelligence (AI) in the medicinal product lifecycle. EMA/CHMP/CVMP/83833/2023. European Union: Committee for Medicinal Products for Veterinary Use (CVMP). European Medicines Agency.
9. European Commission (2023) ‘European Health Data Space’. Available at: https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space_en.
10. FDA (2023) Using Artificial Intelligence and Machine Learning in the Development of Drug and Biological Products. Docket No. FDA-2023-N-0743; Document Number: 2023-09985. United States: U.S. Department of Health and Human Services, Food and Drug Administration, pp. 30313–30314.
11. Jordon, J. et al. (2022) Synthetic Data – what, why and how? Technical Report. The Alan Turing Institute and The Royal Society. Available at: https://www.turing.ac.uk.