Tina: A Lightweight Reasoning Model Based on LoRA

How can we achieve strong reasoning capabilities in language models at low cost? Driven by this fundamental question, we propose Tina, a family of lightweight reasoning models built with high cost efficiency. Tina demonstrates that substantial gains in reasoning performance can be achieved even with minimal resources. This is accomplished through the … Read more
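Since the teaser attributes Tina's cost efficiency to LoRA-style low-rank adaptation, a minimal PyTorch sketch may help make the idea concrete: the pretrained weight matrix is frozen and only a small low-rank update is trained. The rank r=16 and scaling alpha=32 below are illustrative assumptions, not Tina's actual hyperparameters.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base layer plus a trainable low-rank update: y = W x + (alpha/r) * B A x
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # Only A and B are trained; B starts at zero so the update is initially a no-op
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: wrapping one projection layer. The trainable parameter count drops
# from in_features * out_features to r * (in_features + out_features).
layer = LoRALinear(nn.Linear(4096, 4096))
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])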

A Review of LLM Meta-Thinking via Multi-Agent Reinforcement Learning

Imagine asking an LLM, “Did Aristotle use a laptop?” It might earnestly fabricate a justification: “There was already wireless internet in ancient Greece…” This phenomenon of confidently spouting nonsense is what we call LLM “hallucination.” Paper: Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey Link: https://arxiv.org/pdf/2504.14520 The paper points out that meta-thinking, the ability … Read more

A Detailed Explanation of the Smooth Policy Iteration (SPI) Architecture for Adversarial Reinforcement Learning

It is well known that the max operator (or min operator) is a core component of the Bellman equation, and solving it efficiently is a thread running through reinforcement learning algorithms, including mainstream Actor-Critic methods such as PPO, TRPO, DDPG, DSAC, and DACER. Readers familiar with algorithm design may wonder: why is the max operator … Read more
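For context, the hard max the teaser refers to appears in the Bellman optimality equation, and one standard way to smooth it is a log-sum-exp (softmax) relaxation. The sketch below uses that generic relaxation with a temperature \tau, which is an assumption for illustration and not necessarily the exact smoothing operator SPI adopts.

% Bellman optimality equation: a hard, non-smooth max over actions.
V^{*}(s) = \max_{a \in \mathcal{A}} \left[ r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \left[ V^{*}(s') \right] \right]

% A generic smooth relaxation via log-sum-exp with temperature \tau > 0;
% it is differentiable everywhere and recovers the hard max as \tau \to 0.
V_{\tau}(s) = \tau \log \sum_{a \in \mathcal{A}} \exp\!\left( \frac{1}{\tau} \left[ r(s,a) + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \left[ V_{\tau}(s') \right] \right] \right)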