- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

Collections including paper arxiv:2506.18095

- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
  Paper • 2506.07044 • Published • 114
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
  Paper • 2506.09513 • Published • 100
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 104

- OmniGen2: Exploration to Advanced Multimodal Generation
  Paper • 2506.18871 • Published • 78
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 115
- Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
  Paper • 2502.05415 • Published • 21
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
  Paper • 2408.12528 • Published • 51

- Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
  Paper • 2505.02567 • Published • 80
- OmniGen2: Exploration to Advanced Multimodal Generation
  Paper • 2506.18871 • Published • 78
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
  Paper • 2506.17202 • Published • 10
- ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
  Paper • 2506.18095 • Published • 66

- Scaling Laws for Native Multimodal Models
  Paper • 2504.07951 • Published • 30
- Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
  Paper • 2504.08003 • Published • 49
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
  Paper • 2504.11468 • Published • 30
- Towards Learning to Complete Anything in Lidar
  Paper • 2504.12264 • Published • 9

- ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
  Paper • 2506.18095 • Published • 66
- FreedomIntelligence/ShareGPT-4o-Image
  Viewer • Updated • 92.3k • 22k • 93
- FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
  Paper • 2506.20920 • Published • 75

- ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
  Paper • 2506.18095 • Published • 66
- VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
  Paper • 2507.04590 • Published • 16
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
  Paper • 2509.00428 • Published • 17

- MMaDA: Multimodal Large Diffusion Language Models
  Paper • 2505.15809 • Published • 97
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
  Paper • 2505.15045 • Published • 54
- ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
  Paper • 2506.18095 • Published • 66
- Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models
  Paper • 2506.19103 • Published • 42

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 9
- SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
  Paper • 2504.08600 • Published • 32
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
  Paper • 2503.23157 • Published • 10
- AI Agents: Evolution, Architecture, and Real-World Applications
  Paper • 2503.12687 • Published • 2

- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
  Paper • 2412.14475 • Published • 55
- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 52
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- WavePulse: Real-time Content Analytics of Radio Livestreams
  Paper • 2412.17998 • Published • 11