marcus-daily
committed on
Commit · 16c8130
1 Parent(s): 633cfd0
Smart Turn v3.1
- README.md +18 -8
- benchmarks/smart-turn-v3.0.md +61 -0
- benchmarks/smart-turn-v3.1-cpu.md +61 -0
- benchmarks/smart-turn-v3.1-gpu.md +61 -0
- smart-turn-v3.1-cpu.onnx +3 -0
- smart-turn-v3.1-gpu.onnx +3 -0
README.md
CHANGED
@@ -6,29 +6,39 @@ tags:
 - semantic-vad
 - multilingual
 datasets:
-- pipecat-ai/smart-turn-data-v3-train
-- pipecat-ai/smart-turn-data-v3-test
+- pipecat-ai/smart-turn-data-v3.1-train
+- pipecat-ai/smart-turn-data-v3.1-test
 ---

-# Smart Turn v3
+# Smart Turn v3.x

-**Smart Turn
+**Smart Turn** is an open-source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.

 ## Links

 * [Blog post: Smart Turn v3](https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/)
-* [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code
-* [Datasets](https://
+* [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code, and more information
+* [Datasets](https://huggingface.co/pipecat-ai/datasets)


 ## Model architecture

 * Backbone: Whisper Tiny encoder
 * Head: shallow linear classifier
-* Params:
-* Checkpoint: 8 MB ONNX
+* Params: 8M
+* Checkpoint: 8 MB ONNX (int8 quantized), 32MB ONNX (unquantized)


 ## How to use

 Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
+
+
+## Thanks
+
+Thank you to the following organisations for contributing audio datasets:
+
+- [Liva AI](https://www.theliva.ai/)
+- [Midcentury](https://www.midcentury.xyz/)
+- [MundoAI](https://mundoai.world/)
+
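The two checkpoint sizes in the README change line up with the stated parameter count. A minimal sketch of that arithmetic, assuming the unquantized export stores weights as 32-bit floats (the README only says "unquantized") and ignoring graph metadata; `estimated_size_mb` is a hypothetical helper, not part of the repo:

```python
# Rough checkpoint-size estimate from parameter count and bytes per weight.
# Assumption: "unquantized" means float32 (4 bytes per weight).
BYTES_PER_WEIGHT = {"int8": 1, "float32": 4}

NUM_PARAMS = 8_000_000  # "Params: 8M" from the README


def estimated_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1_000_000


int8_mb = estimated_size_mb(NUM_PARAMS, "int8")
fp32_mb = estimated_size_mb(NUM_PARAMS, "float32")
print(f"int8: ~{int8_mb:.0f} MB, float32: ~{fp32_mb:.0f} MB")
```

Under that assumption the estimate matches the "8 MB" and "32MB" figures in the README to within rounding.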
benchmarks/smart-turn-v3.0.md
ADDED
@@ -0,0 +1,61 @@
# Endpointing Model Benchmark Report

**Model:** `/data/smart-turn-v3.0.onnx`

**Generated:** 2025-12-03 16:04:09 UTC

## Accuracy Results

**Total Samples:** 31,473

**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese

**Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2

### Overall Performance
| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|--------|--------------|--------------|---------------------|---------------------|
| Overall | 31,473 | 91.60 | 4.68 | 3.72 |

### Performance by Language
| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|----------|--------------|--------------|---------------------|---------------------|
| 🇹🇷 Turkish | 966 | 97.10 | 1.66 | 1.24 |
| 🇯🇵 Japanese | 834 | 96.88 | 1.92 | 1.20 |
| 🇰🇷 Korean | 890 | 96.74 | 1.12 | 2.13 |
| 🇩🇪 German | 1,322 | 96.22 | 2.42 | 1.36 |
| 🇫🇷 French | 1,253 | 96.17 | 1.52 | 2.31 |
| 🇳🇱 Dutch | 1,401 | 96.15 | 2.00 | 1.86 |
| 🇵🇹 Portuguese | 1,398 | 95.42 | 2.79 | 1.79 |
| 🇮🇹 Italian | 782 | 94.88 | 3.07 | 2.05 |
| 🇫🇮 Finnish | 1,010 | 94.85 | 3.17 | 1.98 |
| 🇮🇩 Indonesian | 971 | 94.54 | 4.12 | 1.34 |
| 🇺🇦 Ukrainian | 929 | 94.51 | 2.80 | 2.69 |
| 🇵🇱 Polish | 976 | 94.47 | 2.87 | 2.66 |
| 🇳🇴 Norwegian | 1,014 | 93.98 | 3.55 | 2.47 |
| 🇷🇺 Russian | 1,470 | 93.54 | 3.33 | 3.13 |
| 🇮🇳 Hindi | 1,295 | 93.36 | 4.40 | 2.24 |
| 🇩🇰 Danish | 779 | 93.07 | 4.88 | 2.05 |
| 🇸🇦 Arabic | 947 | 88.60 | 6.97 | 4.44 |
| 🇨🇳 Chinese | 945 | 88.57 | 4.76 | 6.67 |
| 🇬🇧 🇺🇸 English | 7,722 | 88.31 | 6.00 | 5.70 |
| 🇮🇳 Marathi | 774 | 87.47 | 8.53 | 4.01 |
| 🇪🇸 Spanish | 1,791 | 86.71 | 4.69 | 8.60 |
| 🇧🇩 Bengali | 1,000 | 84.10 | 10.90 | 5.00 |
| 🇻🇳 Vietnamese | 1,004 | 81.57 | 14.94 | 3.49 |

### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| rime_2 | 396 | 99.75 | 0.00 | 0.25 |
| human_5 | 402 | 96.27 | 1.00 | 2.74 |
| chirp3_1 | 16,300 | 94.53 | 2.93 | 2.53 |
| orpheus_endfiller_1 | 182 | 94.51 | 0.00 | 5.49 |
| orpheus_grammar_1 | 163 | 92.64 | 3.68 | 3.68 |
| orpheus_midfiller_1 | 140 | 91.43 | 3.57 | 5.00 |
| human_convcollector_1 | 90 | 91.11 | 3.33 | 5.56 |
| chirp3_2 | 8,428 | 90.27 | 6.68 | 3.05 |
| midcentury_1 | 1,044 | 85.44 | 11.78 | 2.78 |
| liva_1 | 3,832 | 84.68 | 6.92 | 8.40 |
| mundo_1 | 496 | 72.78 | 5.24 | 21.98 |

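The overall row of the v3.0 report can be cross-checked against the per-dataset rows: the overall accuracy should be the sample-weighted mean of the dataset accuracies, and accuracy, false positives and false negatives should sum to 100%. A small sketch of that check, with the rows copied from the report; the helper name `weighted_accuracy` is illustrative only and not taken from the benchmark tooling:

```python
# (dataset, sample_count, accuracy_percent) rows from the v3.0 report.
ROWS = [
    ("rime_2", 396, 99.75),
    ("human_5", 402, 96.27),
    ("chirp3_1", 16_300, 94.53),
    ("orpheus_endfiller_1", 182, 94.51),
    ("orpheus_grammar_1", 163, 92.64),
    ("orpheus_midfiller_1", 140, 91.43),
    ("human_convcollector_1", 90, 91.11),
    ("chirp3_2", 8_428, 90.27),
    ("midcentury_1", 1_044, 85.44),
    ("liva_1", 3_832, 84.68),
    ("mundo_1", 496, 72.78),
]


def weighted_accuracy(rows):
    """Sample-weighted mean of the per-dataset accuracy percentages."""
    total = sum(count for _, count, _ in rows)
    return sum(count * acc for _, count, acc in rows) / total


total_samples = sum(count for _, count, _ in ROWS)
overall = weighted_accuracy(ROWS)

# Overall row: accuracy + false positives + false negatives.
error_budget = 91.60 + 4.68 + 3.72

print(f"samples={total_samples}, weighted accuracy={overall:.2f}%")
print(f"accuracy + FP + FN = {error_budget:.2f}%")
```

Because the per-dataset percentages are already rounded to two decimals, the weighted mean can only be expected to agree with the reported 91.60 to within rounding error.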
benchmarks/smart-turn-v3.1-cpu.md
ADDED
@@ -0,0 +1,61 @@
# Endpointing Model Benchmark Report

**Model:** `/data/smart-turn-v3.1-cpu.onnx`

**Generated:** 2025-12-03 16:13:05 UTC

## Accuracy Results

**Total Samples:** 31,473

**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese

**Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2

### Overall Performance
| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|--------|--------------|--------------|---------------------|---------------------|
| Overall | 31,473 | 93.02 | 3.07 | 3.91 |

### Performance by Language
| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|----------|--------------|--------------|---------------------|---------------------|
| 🇯🇵 Japanese | 834 | 97.36 | 1.44 | 1.20 |
| 🇰🇷 Korean | 890 | 96.97 | 1.01 | 2.02 |
| 🇹🇷 Turkish | 966 | 96.27 | 2.07 | 1.66 |
| 🇳🇱 Dutch | 1,401 | 95.93 | 1.57 | 2.50 |
| 🇫🇷 French | 1,253 | 95.45 | 1.84 | 2.71 |
| 🇩🇪 German | 1,322 | 95.39 | 2.27 | 2.34 |
| 🇮🇹 Italian | 782 | 95.14 | 2.30 | 2.56 |
| 🇵🇱 Polish | 976 | 95.08 | 3.07 | 1.84 |
| 🇫🇮 Finnish | 1,010 | 94.85 | 2.48 | 2.67 |
| 🇵🇹 Portuguese | 1,398 | 94.71 | 1.57 | 3.72 |
| 🇬🇧 🇺🇸 English | 7,722 | 94.69 | 2.24 | 3.07 |
| 🇮🇩 Indonesian | 971 | 94.34 | 2.47 | 3.19 |
| 🇩🇰 Danish | 779 | 94.09 | 3.21 | 2.70 |
| 🇷🇺 Russian | 1,470 | 93.61 | 2.93 | 3.47 |
| 🇺🇦 Ukrainian | 929 | 93.33 | 3.01 | 3.66 |
| 🇳🇴 Norwegian | 1,014 | 93.00 | 3.06 | 3.94 |
| 🇮🇳 Hindi | 1,295 | 92.90 | 4.09 | 3.01 |
| 🇪🇸 Spanish | 1,791 | 90.12 | 4.97 | 4.91 |
| 🇸🇦 Arabic | 947 | 88.60 | 5.60 | 5.81 |
| 🇮🇳 Marathi | 774 | 87.34 | 7.24 | 5.43 |
| 🇨🇳 Chinese | 945 | 85.93 | 3.49 | 10.58 |
| 🇧🇩 Bengali | 1,000 | 84.90 | 8.10 | 7.00 |
| 🇻🇳 Vietnamese | 1,004 | 77.39 | 6.47 | 16.14 |

### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| midcentury_1 | 1,044 | 99.04 | 0.10 | 0.86 |
| orpheus_endfiller_1 | 182 | 98.90 | 0.00 | 1.10 |
| rime_2 | 396 | 98.23 | 0.25 | 1.52 |
| human_5 | 402 | 97.76 | 0.50 | 1.74 |
| liva_1 | 3,832 | 94.18 | 2.71 | 3.11 |
| chirp3_1 | 16,300 | 94.07 | 2.55 | 3.38 |
| chirp3_2 | 8,428 | 89.68 | 4.60 | 5.72 |
| orpheus_grammar_1 | 163 | 89.57 | 6.13 | 4.29 |
| human_convcollector_1 | 90 | 88.89 | 2.22 | 8.89 |
| orpheus_midfiller_1 | 140 | 88.57 | 2.86 | 8.57 |
| mundo_1 | 496 | 86.69 | 7.66 | 5.65 |

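The report tables are plain Markdown, so they can be read back into records for further analysis. The following is a rough sketch of one way to do that, using three rows from the v3.1-cpu dataset table; `parse_table` is a hypothetical helper written for this example and is not part of the benchmark tooling:

```python
def parse_table(markdown: str) -> list[dict[str, str]]:
    """Turn a simple pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header row and the |---| separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


TABLE = """
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| midcentury_1 | 1,044 | 99.04 | 0.10 | 0.86 |
| chirp3_2 | 8,428 | 89.68 | 4.60 | 5.72 |
| mundo_1 | 496 | 86.69 | 7.66 | 5.65 |
"""

rows = parse_table(TABLE)

# Convert the false-negative percentage back into an approximate sample count.
fn_counts = {}
for row in rows:
    samples = int(row["Sample Count"].replace(",", ""))
    fn_pct = float(row["False Negatives (%)"])
    fn_counts[row["Dataset"]] = round(samples * fn_pct / 100)

for name, count in fn_counts.items():
    print(f"{name}: ~{count} false negatives")
```

The counts are approximate because the published percentages are rounded to two decimals.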
benchmarks/smart-turn-v3.1-gpu.md
ADDED
@@ -0,0 +1,61 @@
# Endpointing Model Benchmark Report

**Model:** `/data/smart-turn-v3.1-gpu.onnx`

**Generated:** 2025-12-03 16:21:25 UTC

## Accuracy Results

**Total Samples:** 31,473

**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese

**Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2

### Overall Performance
| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|--------|--------------|--------------|---------------------|---------------------|
| Overall | 31,473 | 93.98 | 3.21 | 2.81 |

### Performance by Language
| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|----------|--------------|--------------|---------------------|---------------------|
| 🇯🇵 Japanese | 834 | 98.08 | 0.84 | 1.08 |
| 🇰🇷 Korean | 890 | 97.98 | 0.79 | 1.24 |
| 🇹🇷 Turkish | 966 | 97.52 | 1.24 | 1.24 |
| 🇳🇱 Dutch | 1,401 | 96.79 | 1.57 | 1.64 |
| 🇩🇪 German | 1,322 | 96.37 | 2.42 | 1.21 |
| 🇫🇷 French | 1,253 | 96.09 | 1.68 | 2.23 |
| 🇵🇹 Portuguese | 1,398 | 95.99 | 2.15 | 1.86 |
| 🇫🇮 Finnish | 1,010 | 95.74 | 2.18 | 2.08 |
| 🇵🇱 Polish | 976 | 95.59 | 2.56 | 1.84 |
| 🇮🇩 Indonesian | 971 | 95.57 | 2.47 | 1.96 |
| 🇬🇧 🇺🇸 English | 7,722 | 95.55 | 2.55 | 1.90 |
| 🇮🇹 Italian | 782 | 95.52 | 2.81 | 1.66 |
| 🇷🇺 Russian | 1,470 | 94.15 | 2.93 | 2.93 |
| 🇩🇰 Danish | 779 | 94.09 | 3.34 | 2.57 |
| 🇺🇦 Ukrainian | 929 | 93.97 | 2.58 | 3.44 |
| 🇳🇴 Norwegian | 1,014 | 93.69 | 3.25 | 3.06 |
| 🇮🇳 Hindi | 1,295 | 93.36 | 3.17 | 3.47 |
| 🇪🇸 Spanish | 1,791 | 90.95 | 5.75 | 3.29 |
| 🇨🇳 Chinese | 945 | 88.99 | 4.34 | 6.67 |
| 🇸🇦 Arabic | 947 | 88.91 | 6.44 | 4.65 |
| 🇮🇳 Marathi | 774 | 88.24 | 6.20 | 5.56 |
| 🇧🇩 Bengali | 1,000 | 85.10 | 7.10 | 7.80 |
| 🇻🇳 Vietnamese | 1,004 | 81.87 | 9.86 | 8.27 |

### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| midcentury_1 | 1,044 | 99.52 | 0.10 | 0.38 |
| human_5 | 402 | 98.51 | 0.50 | 1.00 |
| orpheus_endfiller_1 | 182 | 98.35 | 0.00 | 1.65 |
| rime_2 | 396 | 97.98 | 0.25 | 1.77 |
| liva_1 | 3,832 | 95.12 | 2.97 | 1.91 |
| chirp3_1 | 16,300 | 95.10 | 2.58 | 2.32 |
| orpheus_grammar_1 | 163 | 93.87 | 4.29 | 1.84 |
| chirp3_2 | 8,428 | 90.76 | 4.84 | 4.40 |
| human_convcollector_1 | 90 | 88.89 | 6.67 | 4.44 |
| orpheus_midfiller_1 | 140 | 87.86 | 5.00 | 7.14 |
| mundo_1 | 496 | 85.69 | 8.87 | 5.44 |

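With all three reports covering the same 31,473 samples, per-language accuracy can be compared directly across model versions. A minimal sketch for a handful of languages, with values copied from the three reports in this commit; the `ACCURACY` mapping and `delta` helper are illustrative names only:

```python
# Accuracy (%) per language, copied from the v3.0, v3.1-cpu and v3.1-gpu reports.
ACCURACY = {
    "English": {"v3.0": 88.31, "v3.1-cpu": 94.69, "v3.1-gpu": 95.55},
    "Spanish": {"v3.0": 86.71, "v3.1-cpu": 90.12, "v3.1-gpu": 90.95},
    "Chinese": {"v3.0": 88.57, "v3.1-cpu": 85.93, "v3.1-gpu": 88.99},
    "Vietnamese": {"v3.0": 81.57, "v3.1-cpu": 77.39, "v3.1-gpu": 81.87},
}


def delta(language: str, new: str, old: str = "v3.0") -> float:
    """Accuracy change in percentage points between two model versions."""
    scores = ACCURACY[language]
    return round(scores[new] - scores[old], 2)


# Languages from this subset where the int8 CPU model scores below v3.0.
cpu_regressions = [lang for lang in ACCURACY if delta(lang, "v3.1-cpu") < 0]

for lang in ACCURACY:
    print(f"{lang}: cpu {delta(lang, 'v3.1-cpu'):+.2f}, gpu {delta(lang, 'v3.1-gpu'):+.2f}")
print("CPU regressions vs v3.0:", cpu_regressions)
```

The subset is chosen only to show both directions of change; extending `ACCURACY` to all 23 languages would use the same pattern.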
smart-turn-v3.1-cpu.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb68d55c2d542ce79e44b12013bfd571e90df8594ab096d757198e851b0c6594
size 8679180
smart-turn-v3.1-gpu.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a32f7445d5076029472b6c9f7a71005df576ea19d5f929021200f535b962af84
size 32411198
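The two `.onnx` entries in this commit are Git LFS pointer files, not the model weights themselves: each holds a spec version, a SHA-256 object id and the byte size of the real file. A short sketch of reading one, using the CPU pointer from this commit; `parse_lfs_pointer` is a hypothetical helper, and the megabyte conversion assumes 10**6 bytes per MB:

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


CPU_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fb68d55c2d542ce79e44b12013bfd571e90df8594ab096d757198e851b0c6594
size 8679180
"""

info = parse_lfs_pointer(CPU_POINTER)
algorithm, _, digest = info["oid"].partition(":")
size_mb = int(info["size"]) / 1_000_000  # assumption: 1 MB = 10**6 bytes

print(f"{algorithm} digest of {len(digest)} hex chars, {size_mb:.2f} MB")
```

The same helper applied to the GPU pointer gives a size of 32411198 bytes, which is consistent with the "32MB ONNX (unquantized)" line in the README.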