Collections
Collections including paper arxiv:2205.13147
- XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
  Paper • 2404.15420 • Published • 11
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 126
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 45

- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 34
- How Far Can We Go with Practical Function-Level Program Repair?
  Paper • 2404.12833 • Published • 7
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
  Paper • 2404.18796 • Published • 71

- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 86
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 75
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27

- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
  Paper • 2306.00989 • Published • 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Matryoshka Representation Learning
  Paper • 2205.13147 • Published • 24

- All you need is a good init
  Paper • 1511.06422 • Published • 1
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
  Paper • 2404.14507 • Published • 23
- Efficient Transformer Encoders for Mask2Former-style models
  Paper • 2404.15244 • Published • 1
- Deep Residual Learning for Image Recognition
  Paper • 1512.03385 • Published • 9

- GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
  Paper • 2112.07577 • Published
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
  Paper • 2104.06979 • Published
- Text Embeddings by Weakly-Supervised Contrastive Pre-training
  Paper • 2212.03533 • Published • 1
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
  Paper • 2104.08821 • Published