LLM Direct Preference Optimization Using Synthetic Data: Domain Specific Model Training and Benchmarking

Bohdan Pavlyshenko, Ivan Bulka

Ivan Franko National University of Lviv

Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their use in specialized domains such as finance is complicated by limited domain-specific understanding and the use of specialized terminology. This study investigates the adaptation of LLMs, focusing on Meta-Llama-3-8B-Instruct, to advanced financial NLP tasks through a combination of Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA and QLoRA, and Direct Preference Optimization (DPO). Using the open-source Sujet-Finance-Instruct-177k dataset, which covers six core financial NLP tasks, we demonstrate that PEFT approaches improve performance on tasks such as sentiment analysis and topic classification, while showing limited effectiveness on complex generative tasks such as question answering. To address this gap, we introduce DPO using synthetically generated preference pairs, enabling supervised alignment based on human-like feedback. Experimental results show that DPO enhances the model’s performance on the challenging question-answering task, as evidenced by increased LLM-based evaluation scores. Our findings highlight that while PEFT methods offer efficient domain adaptation, augmenting them with supervised preference optimization is crucial for optimal performance in financial applications.
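To make the described setup concrete, below is a minimal, illustrative sketch of how DPO training with a LoRA adapter on synthetic preference pairs can be configured using the Hugging Face TRL and PEFT libraries. It is not the authors' exact pipeline: the preference-pair file name, hyperparameters, and LoRA target modules are assumptions chosen for illustration only.

# Minimal sketch (assumed configuration, not the reported experimental setup):
# LoRA-based Direct Preference Optimization of Meta-Llama-3-8B-Instruct
# on synthetically generated preference pairs, using Hugging Face TRL + PEFT.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Synthetic preference pairs: each record contains "prompt", "chosen", "rejected".
# The file name is a placeholder for the synthetically generated data.
dataset = load_dataset("json", data_files="synthetic_preference_pairs.json", split="train")

# LoRA adapter configuration (rank, alpha, and target modules are illustrative).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# DPO training arguments; beta controls the strength of the preference penalty.
training_args = DPOConfig(
    output_dir="llama3-8b-finance-dpo",
    beta=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,    # older TRL versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()

In this sketch the reference model required by the DPO loss is obtained implicitly: because training updates only the LoRA adapter, disabling the adapter recovers the frozen base model, so no separate reference copy needs to be loaded.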
