dDPO Articles - Boardor

Comprehensive Analysis of LoRA, QLoRA, RLHF, PPO, DPO, and Flash Attention

2025-05-03 by boardor

Large models are developing rapidly, and the past year alone has seen significant technical iteration: LoRA, QLoRA, AdaLoRA, ZeroQuant, Flash Attention, KTO, distillation techniques, incremental model learning, data processing, and a steady stream of new open-source models to understand. Almost every day brings new developments. As algorithm engineers, do you feel like your learning …

Detailed Explanation of the Zephyr Model

2025-04-15 by boardor

Zephyr uses dDPO on AI Feedback (AIF) preference data to significantly improve intent alignment, following steps similar to InstructGPT. Training method: Distilled Supervised Fine-Tuning (dSFT). Starting from the original LLM, the model is first trained to respond to user prompts, traditionally done …