-
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
Temporal Preference Optimization for Long-Form Video Understanding
Paper • 2501.13919 • Published • 23 -
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper • 2503.07365 • Published • 61
Collections
Discover the best community collections!
Collections including paper arxiv:2503.07365
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 40 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 63
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 429 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 58 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Paper • 2412.18279 • Published -
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published • 15 -
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published • 39
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 147 -
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Paper • 2501.01245 • Published • 5 -
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Paper • 2501.00599 • Published • 47 -
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Paper • 2501.08326 • Published • 33
-
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper • 2503.04130 • Published • 96 -
Temporal Preference Optimization for Long-Form Video Understanding
Paper • 2501.13919 • Published • 23 -
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper • 2503.07365 • Published • 61
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Paper • 2412.18279 • Published -
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published • 15 -
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published • 39
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 40 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 63
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 429 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 58 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 147 -
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Paper • 2501.01245 • Published • 5 -
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Paper • 2501.00599 • Published • 47 -
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Paper • 2501.08326 • Published • 33