🧠 GREAM: Generative Reasoning Recommendation Model
Paper: Generative Reasoning Recommendation via LLMs, 2025.
Authors: Minjie Hong*, Zetong Zhou*, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, Zhou Zhao†
Repository: https://github.com/Indolent-Kawhi/GRRM
HF Papers Link: https://huggingface.co/papers/2510.20815
🧩 Model Summary
GREAM (Generative Reasoning Recommendation Model) is a large language model (LLM)-based generative reasoning recommender designed to unify understanding, reasoning, and prediction for recommendation tasks.
It introduces a reasoning-enhanced reinforcement learning framework with verifiable rewards that supports both high-throughput direct recommendation and interpretable, reasoning-based output.
Key Features
- Collaborative-Semantic Alignment: Fuses textual (titles, descriptions, reviews) and behavioral signals to align linguistic and collaborative semantics.
- Reasoning Curriculum Activation: Builds synthetic Chain-of-Thought (CoT) data and trains on it with a curriculum to develop causal reasoning for recommendations.
- Sparse-Regularized Group Policy Optimization (SRPO): Enables stable RL fine-tuning using Residual-Sensitive Verifiable Rewards and Bonus-Calibrated Group Advantage Estimation for sparse feedback.
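To see why sparse feedback is hard for group-based policy optimization, here is a minimal sketch of plain group-normalized advantages (the generic GRPO-style baseline). It is not SRPO: this card does not give the formulas for Residual-Sensitive Verifiable Rewards or Bonus-Calibrated Group Advantage Estimation, and the function name `group_normalized_advantages` is hypothetical.

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-6):
    """Generic group-relative advantage (assumption: NOT the SRPO estimator).

    Each sampled response in a group is scored against the group's mean
    reward and scaled by the group's standard deviation.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Sparse feedback: no sampled response hit the target item,
# so every advantage is zero and the group gives no learning signal.
print(group_normalized_advantages([0, 0, 0, 0]))

# One hit in the group: the hit gets a positive advantage, the rest negative.
print(group_normalized_advantages([1, 0, 0, 0]))
```

When every response in a group receives the same reward, the advantages collapse to zero. This is presumably the failure mode that SRPO's reward and advantage design targets.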
🧠 Model Architecture
| Component | Description |
|---|---|
| Backbone | Qwen3-4B-Instruct |
| Indexing | Residual Quantization (RQ-KMeans, 5 levels, 256 values per level) |
| Training Phases | ① Collaborative-Semantic Alignment → ② Reasoning Curriculum Activation → ③ SRPO Reinforcement Learning |
| Inference Modes | Direct Sequence Recommendation (low-latency item generation); Sequential Reasoning Recommendation (interpretable CoT reasoning chains) |
| RL Framework | Verl + SGLang backend |
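The Indexing row can be illustrated with a minimal residual-quantization sketch. It assumes item embeddings are already available as a NumPy array. The helper names `kmeans`, `rq_kmeans`, and `reconstruct` are hypothetical, and the repository's actual RQ-KMeans implementation may differ in initialization, iteration count, and how it handles items that share a code. Each level clusters the residual left by the previous level, so every item ends up with one code per level.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, nearest-centroid assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (requires len(x) >= k).
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(1)

def rq_kmeans(embeddings, levels=5, codebook_size=256):
    """Residual quantization: fit one k-means codebook per level on the residuals."""
    residual = np.asarray(embeddings, dtype=np.float64).copy()
    codebooks, codes = [], []
    for level in range(levels):
        centroids, assign = kmeans(residual, codebook_size, seed=level)
        codebooks.append(centroids)
        codes.append(assign)
        # Subtract the chosen centroid; the next level quantizes what is left.
        residual = residual - centroids[assign]
    return codebooks, np.stack(codes, axis=1)  # codes: (n_items, levels)

def reconstruct(codebooks, codes):
    """Approximate each embedding as the sum of its selected centroids."""
    return sum(cb[codes[:, level]] for level, cb in enumerate(codebooks))
```

With the settings listed in the table (`levels=5`, `codebook_size=256`), each item maps to a tuple of five integers in the range 0 to 255. The table does not say how those tuples are rendered as tokens for the backbone.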
Training Data
| Data Type | Source | Description |
|---|---|---|
| D_align | Amazon Review Datasets (Beauty, Sports, Instruments) | Sequential, semantic reconstruction, and preference understanding tasks |
| D_reason | Synthetic CoT data generated via GPT-5 / Qwen3-30B / Llama-3.1 | Multi-step reasoning sequences with `<think>...</think>` and `<answer>...</answer>` supervision |
| Text Sources | Item titles, descriptions, and high-quality reviews | Combined and rewritten to form dense item semantics |
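Since the reasoning data is supervised with `<think>...</think>` and `<answer>...</answer>` spans, output in the reasoning mode can be split into a rationale and a prediction. Here is a minimal sketch, assuming the model emits at most one span of each kind. The function name `parse_reasoning_output` and the sample item identifier are hypothetical, and the exact format of the answer content is not specified in this card.

```python
import re

# Non-greedy matches; DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def parse_reasoning_output(text):
    """Return the first reasoning span and the first answer span, or None if absent."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }

# Hypothetical generation; "item_123" is a placeholder, not a real item code.
sample = "<think>The user recently bought guitar strings.</think><answer>item_123</answer>"
print(parse_reasoning_output(sample))
```

Returning `None` for a missing span lets a caller tell direct-mode output, which has no tags, apart from malformed reasoning output.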
Evaluation
Datasets
- Amazon-Beauty
- Amazon-Sports & Outdoors
- Amazon-Musical Instruments
Citation
@misc{hong2025generativereasoningrecommendationllms,
title={Generative Reasoning Recommendation via LLMs},
author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
year={2025},
eprint={2510.20815},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2510.20815},
}