Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.18095

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 100
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 104

Unified Multimodal Model

A curated list for Multimodal Model Generation papers.

OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 78
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 115
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published Feb 8 • 21
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5 • 80
OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 78
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

Paper • 2506.17202 • Published Jun 20 • 10
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66

Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published Apr 10 • 30
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Paper • 2504.08003 • Published Apr 9 • 49
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30
Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published Apr 16 • 9

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1 • 92.3k • 22k • 93
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Paper • 2507.04590 • Published Jul 7 • 16
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published Aug 30 • 17

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 97
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 54
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models

Paper • 2506.19103 • Published Jun 23 • 42

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 55
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 52
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1 • 92.3k • 22k • 93
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 100
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 104

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Paper • 2507.04590 • Published Jul 7 • 16
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Paper • 2509.00428 • Published Aug 30 • 17

Unified Multimodal Model

A curated list for Multimodal Model Generation papers.

OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 78
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 115
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published Feb 8 • 21
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 97
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 54
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66
Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models

Paper • 2506.19103 • Published Jun 23 • 42

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5 • 80
OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 78
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

Paper • 2506.17202 • Published Jun 20 • 10
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22 • 66

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published Apr 10 • 30
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Paper • 2504.08003 • Published Apr 9 • 49
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 30
Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published Apr 16 • 9

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 55
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 52
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs