Large Language Models (LLMs) such as GPT-4 have reshaped numerous fields by demonstrating strong capabilities in tasks traditionally reserved for specialized models. Recent research has highlighted the potential of LLMs to generate high-quality synthetic datasets, such as consumer interviews, an approach termed "Silicon Sampling" (Kraft et al., 2024). This raises an intriguing question: Could LLMs also excel at generating realistic synthetic financial transaction data, an area currently dominated by methods such as Generative Adversarial Networks (GANs) and Diffusion Models (Xu et al., 2019; Sattarov et al., 2023)?
Synthetic data has become essential in financial services for overcoming data scarcity, regulatory constraints, and privacy concerns (Assefa, 2020; Hilal et al., 2022). LLM-based approaches to synthetic transaction data could make generation faster, more flexible, and potentially more privacy-preserving than existing methods.
If you are interested, please reach out via email to mahei.li@unisg.ch. I look forward to discussing your thesis ideas and supporting you in achieving exceptional results.