Update README.md

README.md CHANGED

@@ -16,7 +16,7 @@ pipeline_tag: text-generation
 
 # QwerkyLlamaMambaHybrid
 
-This is a hybrid Mamba-Transformer model based on the Llama 3.2 architecture, distilled from Llama 3.3 70B into a 8B parameter model using Qwerky's proprietary distillation method. The model uses MAMBA layers interleaved with attention layers for efficient sequence modeling. The results are a
+This is a hybrid Mamba-Transformer model based on the Llama 3.2 architecture, distilled from Llama 3.3 70B into an 8B parameter model using Qwerky's proprietary distillation method. The model uses Mamba layers interleaved with attention layers for efficient sequence modeling. The result is an 8B parameter model comparable in quality to Llama 3.2 8B while running as fast as or faster than the Llama 3.2 3B model.
 
 **Model Developer**: Qwerky AI
 
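
For intuition only, here is a minimal PyTorch sketch of what "Mamba layers interleaved with attention layers" can look like. Everything in it is an assumption for illustration: `MambaBlock` is a placeholder (a real hybrid would use a selective state-space layer such as `mamba_ssm.Mamba`), and the 1-in-4 interleave ratio and layer sizes are not Qwerky's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Standard pre-norm self-attention block (causal masking omitted for brevity)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out  # residual connection

class MambaBlock(nn.Module):
    """Placeholder token mixer. A real hybrid would use a selective state-space
    layer (e.g. mamba_ssm.Mamba), which scales linearly with sequence length
    instead of attention's quadratic cost."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mix(self.norm(x))  # residual connection

class HybridStack(nn.Module):
    """Interleaves Mamba-style blocks with periodic attention blocks.
    The 1-in-4 ratio and default sizes are illustrative assumptions."""
    def __init__(self, n_layers: int = 8, d_model: int = 512, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else MambaBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 512)    # (batch, sequence, d_model)
print(HybridStack()(x).shape)  # torch.Size([2, 16, 512])
```

The usual motivation for this layout is that the few attention layers preserve precise token-to-token retrieval while the linear-time Mamba layers keep per-token compute and KV-cache memory low, which is consistent with the claim that the 8B hybrid can run at small-model speeds.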