- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
  Paper • 2402.15506 • Published • 18
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
  Paper • 2404.03648 • Published • 30
- Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
  Paper • 2405.19893 • Published • 33
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
  Paper • 2405.19888 • Published • 7
Collections including paper arxiv:2509.08755
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 75
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 105
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 497
- Multi-Agent Tool-Integrated Policy Optimization
  Paper • 2510.04678 • Published • 30

- AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
  Paper • 2509.08755 • Published • 56
- The Majority is not always right: RL training for solution aggregation
  Paper • 2509.06870 • Published • 16
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
  Paper • 2509.03646 • Published • 30

- PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
  Paper • 2508.21104 • Published • 35
- FNet: Mixing Tokens with Fourier Transforms
  Paper • 2105.03824 • Published • 1
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
  Paper • 2509.02479 • Published • 83
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28

- Provable Benefits of In-Tool Learning for Large Language Models
  Paper • 2508.20755 • Published • 11
- MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
  Paper • 2508.20453 • Published • 63
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
  Paper • 2508.20931 • Published • 15
- AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
  Paper • 2509.08755 • Published • 56

- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 48
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 140
- Video Reasoning without Training
  Paper • 2510.17045 • Published • 7
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266

- FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
  Paper • 2311.13743 • Published • 1
- QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
  Paper • 2509.09995 • Published • 14
- TradingAgents: Multi-Agents LLM Financial Trading Framework
  Paper • 2412.20138 • Published • 14
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
  Paper • 2509.09677 • Published • 34