Final Theses

Digital Business, Digital Transformation, Service Engineering, Service Management

Can LLMs Do Everything? Exploring the Potential of LLMs for Synthetic Financial Transaction Data Generation

Situation

Large Language Models (LLMs) such as GPT-4 have reshaped numerous fields by performing well on tasks traditionally reserved for specialized models. Recent research has highlighted the potential of LLMs for generating high-quality synthetic datasets, such as consumer interviews, in what is termed "Silicon Sampling" (Kraft et al., 2024). This raises an intriguing question: Could LLMs also excel at generating realistic synthetic financial transaction data, an area currently dominated by methods such as Generative Adversarial Networks (GANs) and Diffusion Models (Xu et al., 2019; Sattarov et al., 2023)?

Synthetic data has become essential in financial services to overcome challenges of data scarcity, regulatory constraints, and privacy protection (Assefa, 2020; Hilal et al., 2022). Exploring LLM-based approaches for synthetic transaction data could lead to transformative outcomes by enabling faster, more flexible, and potentially more privacy-preserving methods of synthetic data generation.

Potential Research Directions

  • Assessing the Potential of LLMs: Investigate methods for employing LLMs to generate realistic synthetic financial transaction data, inspired by recent advancements in silicon sampling techniques.
  • Benchmarking Against Existing Methods: Evaluate whether LLM-generated financial transaction data can match or exceed the quality, diversity, and utility of data produced by current state-of-the-art synthetic data generation methods (e.g., GANs, Diffusion Models).
  • Impact on Financial Applications: Quantitatively compare how effectively models trained on LLM-generated data perform on critical financial tasks such as fraud detection, credit scoring, and risk assessment.
  • Evaluating Efficiency and Practicality: Study the computational efficiency, cost-effectiveness, and scalability of LLM-based synthetic data methods compared to established alternatives.
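
As an illustration of what a benchmarking step might look like, the sketch below compares the distribution of one numeric column (for example, transaction amounts) in a real and a synthetic dataset using the two-sample Kolmogorov-Smirnov statistic. This is only one simple fidelity check among many possible ones, not a prescribed evaluation protocol; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")

    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)

    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the gap at every distinct observation.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


if __name__ == "__main__":
    # Hypothetical transaction amounts, for illustration only.
    real_amounts = [12.5, 40.0, 40.0, 99.9, 250.0, 1200.0]
    synthetic_amounts = [10.0, 35.0, 45.0, 110.0, 300.0, 900.0]
    print(f"KS statistic: {ks_statistic(real_amounts, synthetic_amounts):.3f}")
```

A lower value suggests that the synthetic column tracks the real one more closely. A full benchmark would additionally need to cover categorical fields, cross-column dependencies, and downstream task performance.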
     

What We Expect

  • Strong analytical skills, curiosity, and willingness to think outside the box and apply existing LLMs in new contexts.
  • Commitment to a structured and result-driven research approach, aiming for completion within approximately six months.
     

What We Offer

  • Close supervision, regular feedback sessions, and direct access to expert guidance.
  • Access to computational resources, relevant datasets, and collaboration opportunities with industry experts.
  • Opportunity to contribute to groundbreaking research at the forefront of synthetic data generation with potential for publication in high-impact journals.

Application

If you are interested, please reach out via email to mahei.li@unisg.ch. I look forward to discussing your thesis ideas and supporting you in achieving exceptional results.

Persons

Dr. Mahei Li
