Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • 2512.01374 • Published 8 days ago • 80
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing • Paper • 2507.20352 • Published Jul 27
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning • Paper • 2506.01939 • Published Jun 2 • 187
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability • Paper • 2505.24147 • Published May 30
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models • Paper • 2506.05176 • Published Jun 5 • 74
Rethinking Data Selection at Scale: Random Selection is Almost All You Need • Paper • 2410.09335 • Published Oct 12, 2024 • 16
Language Models can Self-Lengthen to Generate Long Texts • Paper • 2410.23933 • Published Oct 31, 2024 • 18
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published Jan 2 • 52
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining • Paper • 2003.13198 • Published Mar 30, 2020
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts • Paper • 2305.14688 • Published May 24, 2023