Devin Thang

winvswon78

devininthelab

AI & ML interests

None yet

Recent Activity

upvoted a paper 17 days ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

commented on an article 28 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

updated a dataset about 1 month ago

winvswon78/objaverse_1k_human_furniture

Organizations

upvoted a paper 17 days ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published 21 days ago • 91

commented on Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment 28 days ago

Same question

updated a dataset about 1 month ago

winvswon78/objaverse_1k_human_furniture

Viewer • Updated Nov 10 • 2k • 25

published a dataset about 1 month ago

winvswon78/objaverse_1k_human_furniture

Viewer • Updated Nov 10 • 2k • 25

updated a dataset about 1 month ago

winvswon78/objaverse_furniture_human_single_mask

Preview • Updated Nov 10 • 21

published a dataset about 1 month ago

winvswon78/objaverse_furniture_human_single_mask

Preview • Updated Nov 10 • 21

updated a dataset about 1 month ago

winvswon78/objaverse_human_furniture_2k

Viewer • Updated Nov 9 • 15.8k • 24

published a dataset about 1 month ago

winvswon78/objaverse_human_furniture_2k

Viewer • Updated Nov 9 • 15.8k • 24

updated a dataset about 1 month ago

winvswon78/t2v_epipolar_dpo

Updated Nov 8 • 21

published a dataset about 1 month ago

winvswon78/t2v_epipolar_dpo

Updated Nov 8 • 21

upvoted an article 3 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11

updated a model 3 months ago

winvswon78/Qwen2.5-Math-1.5B-GRPO

Updated Sep 19

published 2 models 3 months ago

winvswon78/Qwen2.5-Math-1.5B-GRPO

Updated Sep 19

winvswon78/Qwen2-0.5B-GRPO-test

Updated Sep 19

upvoted an article 3 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 255

New activity in lmms-lab/EMMA 3 months ago

[bot] Conversion to Parquet

#1 opened 5 months ago by parquet-converter

upvoted a paper 4 months ago

Reconstructing 4D Spatial Intelligence: A Survey

Paper • 2507.21045 • Published Jul 28 • 35

updated a dataset 4 months ago

winvswon78/emma_stone

Viewer • Updated Aug 5 • 64 • 70

published a dataset 4 months ago

winvswon78/emma_stone

Viewer • Updated Aug 5 • 64 • 70

updated a dataset 5 months ago

lmms-lab/EMMA

Viewer • Updated Jul 25 • 5.58k • 38 • 1
