RL4Reasoning

community

Activity Feed

Recent Activity

PeterV09

authored a paper about 2 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

yuzhen17

authored a paper about 2 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

yuzhen17

authored a paper 7 months ago

Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning

Paper • 2505.22203 • Published May 28 • 6

Junteng

authored 2 papers 7 months ago

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26 • 68

On the Perception Bottleneck of VLMs for Chart Understanding

Paper • 2503.18435 • Published Mar 24 • 1

yuzhen17

authored a paper 7 months ago

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Paper • 2505.15612 • Published May 21 • 34

PeterV09

authored 4 papers 7 months ago

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 42

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24 • 31

Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

Paper • 2505.05464 • Published May 8 • 11

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Paper • 2505.15612 • Published May 21 • 34

PeterV09

updated a model 7 months ago

hkust-nlp/Laser-DE-L4096-7B

8B • Updated May 14 • 18

PeterV09

published a model 7 months ago

hkust-nlp/Laser-DE-L4096-7B

8B • Updated May 14 • 18

PeterV09

updated a model 7 months ago

hkust-nlp/Laser-D-L4096-7B

8B • Updated May 14 • 21

PeterV09

published a model 7 months ago

hkust-nlp/Laser-D-L4096-7B

8B • Updated May 14 • 21

PeterV09

published a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-8192-rtl-cliphigh-hf-1.5B-2_deepscaler_-390

Updated Apr 22

PeterV09

updated a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-rtl-dynamic-m-e-l4096-cliphigh-hf-1.5B-4_deepscaler_-220

2B • Updated Apr 22 • 6

PeterV09

published a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-rtl-dynamic-m-e-l4096-cliphigh-hf-1.5B-4_deepscaler_-220

2B • Updated Apr 22 • 6

PeterV09

updated a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-rtl-dynamic-m-e-cliphigh-hf-1.5B-4_deepscaler_-390

2B • Updated Apr 22 • 5

PeterV09

published a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-rtl-dynamic-m-e-cliphigh-hf-1.5B-4_deepscaler_-390

2B • Updated Apr 22 • 5

PeterV09

updated a model 8 months ago

RL4Reasoning/verl-grpo-lr-deepscaler-bsz128-16384-2048-rtl-cliphigh-hf-1.5B-4_deepscaler_-340

2B • Updated Apr 22 • 7

Team members 3