Collections
Discover the best community collections!
Collections including paper arxiv:2405.17247

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
  Paper • 2401.06209 • Published
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- CogVLM: Visual Expert for Pretrained Language Models
  Paper • 2311.03079 • Published • 28
- Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
  Paper • 2309.01262 • Published

- LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
  Paper • 2408.13467 • Published • 25
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54

- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 52
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 101
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
  Paper • 2408.12528 • Published • 51

- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 43
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 158
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
  Paper • 2409.18124 • Published • 33
- LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
  Paper • 2409.18125 • Published • 34
- Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
  Paper • 2410.11795 • Published • 18
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90

- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 20
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 39
- PALO: A Polyglot Large Multimodal Model for 5B People
  Paper • 2402.14818 • Published • 24

- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
  Paper • 2407.03320 • Published • 95
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 133