Voice Activity Detection
ONNX
speech-processing
semantic-vad
multilingual
marcus-daily committed on
Commit 16c8130 · 1 Parent(s): 633cfd0

Smart Turn v3.1

README.md CHANGED
@@ -6,29 +6,39 @@ tags:
  - semantic-vad
  - multilingual
  datasets:
- - pipecat-ai/smart-turn-data-v3-train
- - pipecat-ai/smart-turn-data-v3-test
+ - pipecat-ai/smart-turn-data-v3.1-train
+ - pipecat-ai/smart-turn-data-v3.1-test
  ---
 
- # Smart Turn v3
+ # Smart Turn v3.x
 
- **Smart Turn v3** is an open-source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.
+ **Smart Turn** is an open-source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.
 
  ## Links
 
  * [Blog post: Smart Turn v3](https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/)
- * [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code
- * [Datasets](https://github.com/pipecat-ai/datasets) with training and inference code
+ * [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code, and more information
+ * [Datasets](https://huggingface.co/pipecat-ai/datasets)
 
 
  ## Model architecture
 
  * Backbone: Whisper Tiny encoder
  * Head: shallow linear classifier
- * Params: 8 M (int8)
- * Checkpoint: 8 MB ONNX
+ * Params: 8M
+ * Checkpoint: 8 MB ONNX (int8 quantized), 32MB ONNX (unquantized)
 
 
  ## How to use
 
  Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
+
+
+ ## Thanks
+
+ Thank you to the following organisations for contributing audio datasets:
+
+ - [Liva AI](https://www.theliva.ai/)
+ - [Midcentury](https://www.midcentury.xyz/)
+ - [MundoAI](https://mundoai.world/)
+
benchmarks/smart-turn-v3.0.md ADDED
@@ -0,0 +1,61 @@
+ # Endpointing Model Benchmark Report
+
+ **Model:** `/data/smart-turn-v3.0.onnx`
+
+ **Generated:** 2025-12-03 16:04:09 UTC
+
+ ## Accuracy Results
+
+ **Total Samples:** 31,473
+
+ **Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
+
+ **Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
+
+ ### Overall Performance
+ | Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |--------|--------------|--------------|---------------------|---------------------|
+ | Overall | 31,473 | 91.60 | 4.68 | 3.72 |
+
+ ### Performance by Language
+ | Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |----------|--------------|--------------|---------------------|---------------------|
+ | 🇹🇷 Turkish | 966 | 97.10 | 1.66 | 1.24 |
+ | 🇯🇵 Japanese | 834 | 96.88 | 1.92 | 1.20 |
+ | 🇰🇷 Korean | 890 | 96.74 | 1.12 | 2.13 |
+ | 🇩🇪 German | 1,322 | 96.22 | 2.42 | 1.36 |
+ | 🇫🇷 French | 1,253 | 96.17 | 1.52 | 2.31 |
+ | 🇳🇱 Dutch | 1,401 | 96.15 | 2.00 | 1.86 |
+ | 🇵🇹 Portuguese | 1,398 | 95.42 | 2.79 | 1.79 |
+ | 🇮🇹 Italian | 782 | 94.88 | 3.07 | 2.05 |
+ | 🇫🇮 Finnish | 1,010 | 94.85 | 3.17 | 1.98 |
+ | 🇮🇩 Indonesian | 971 | 94.54 | 4.12 | 1.34 |
+ | 🇺🇦 Ukrainian | 929 | 94.51 | 2.80 | 2.69 |
+ | 🇵🇱 Polish | 976 | 94.47 | 2.87 | 2.66 |
+ | 🇳🇴 Norwegian | 1,014 | 93.98 | 3.55 | 2.47 |
+ | 🇷🇺 Russian | 1,470 | 93.54 | 3.33 | 3.13 |
+ | 🇮🇳 Hindi | 1,295 | 93.36 | 4.40 | 2.24 |
+ | 🇩🇰 Danish | 779 | 93.07 | 4.88 | 2.05 |
+ | 🇸🇦 Arabic | 947 | 88.60 | 6.97 | 4.44 |
+ | 🇨🇳 Chinese | 945 | 88.57 | 4.76 | 6.67 |
+ | 🇬🇧 🇺🇸 English | 7,722 | 88.31 | 6.00 | 5.70 |
+ | 🇮🇳 Marathi | 774 | 87.47 | 8.53 | 4.01 |
+ | 🇪🇸 Spanish | 1,791 | 86.71 | 4.69 | 8.60 |
+ | 🇧🇩 Bengali | 1,000 | 84.10 | 10.90 | 5.00 |
+ | 🇻🇳 Vietnamese | 1,004 | 81.57 | 14.94 | 3.49 |
+
+ ### Performance by Dataset
+ | Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |---------|--------------|--------------|---------------------|---------------------|
+ | rime_2 | 396 | 99.75 | 0.00 | 0.25 |
+ | human_5 | 402 | 96.27 | 1.00 | 2.74 |
+ | chirp3_1 | 16,300 | 94.53 | 2.93 | 2.53 |
+ | orpheus_endfiller_1 | 182 | 94.51 | 0.00 | 5.49 |
+ | orpheus_grammar_1 | 163 | 92.64 | 3.68 | 3.68 |
+ | orpheus_midfiller_1 | 140 | 91.43 | 3.57 | 5.00 |
+ | human_convcollector_1 | 90 | 91.11 | 3.33 | 5.56 |
+ | chirp3_2 | 8,428 | 90.27 | 6.68 | 3.05 |
+ | midcentury_1 | 1,044 | 85.44 | 11.78 | 2.78 |
+ | liva_1 | 3,832 | 84.68 | 6.92 | 8.40 |
+ | mundo_1 | 496 | 72.78 | 5.24 | 21.98 |
+
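In each row of the v3.0 report, accuracy, false positives and false negatives sum to roughly 100%, which suggests all three are fractions of the same sample count. A minimal Python sketch under that assumption, using the Overall row's figures (the helper name `implied_counts` is illustrative and not part of the benchmark tooling), checks the sum and converts the rates back into approximate sample counts:

```python
def implied_counts(samples: int, accuracy: float, fp: float, fn: float) -> dict:
    """Convert percentage rates into approximate sample counts.

    Assumes all three rates are percentages of the same `samples` total.
    """
    total_pct = accuracy + fp + fn
    # The report rounds to two decimals, so allow a small tolerance.
    if abs(total_pct - 100.0) > 0.05:
        raise ValueError(f"rates sum to {total_pct:.2f}%, expected about 100%")
    return {
        "correct": round(samples * accuracy / 100),
        "false_positives": round(samples * fp / 100),
        "false_negatives": round(samples * fn / 100),
    }


# Overall row of the v3.0 report: 31,473 samples, 91.60 / 4.68 / 3.72.
overall_v30 = implied_counts(31_473, 91.60, 4.68, 3.72)
print(overall_v30)
```

Because the published rates are rounded, the counts are estimates rather than exact confusion-matrix entries.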
benchmarks/smart-turn-v3.1-cpu.md ADDED
@@ -0,0 +1,61 @@
+ # Endpointing Model Benchmark Report
+
+ **Model:** `/data/smart-turn-v3.1-cpu.onnx`
+
+ **Generated:** 2025-12-03 16:13:05 UTC
+
+ ## Accuracy Results
+
+ **Total Samples:** 31,473
+
+ **Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
+
+ **Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
+
+ ### Overall Performance
+ | Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |--------|--------------|--------------|---------------------|---------------------|
+ | Overall | 31,473 | 93.02 | 3.07 | 3.91 |
+
+ ### Performance by Language
+ | Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |----------|--------------|--------------|---------------------|---------------------|
+ | 🇯🇵 Japanese | 834 | 97.36 | 1.44 | 1.20 |
+ | 🇰🇷 Korean | 890 | 96.97 | 1.01 | 2.02 |
+ | 🇹🇷 Turkish | 966 | 96.27 | 2.07 | 1.66 |
+ | 🇳🇱 Dutch | 1,401 | 95.93 | 1.57 | 2.50 |
+ | 🇫🇷 French | 1,253 | 95.45 | 1.84 | 2.71 |
+ | 🇩🇪 German | 1,322 | 95.39 | 2.27 | 2.34 |
+ | 🇮🇹 Italian | 782 | 95.14 | 2.30 | 2.56 |
+ | 🇵🇱 Polish | 976 | 95.08 | 3.07 | 1.84 |
+ | 🇫🇮 Finnish | 1,010 | 94.85 | 2.48 | 2.67 |
+ | 🇵🇹 Portuguese | 1,398 | 94.71 | 1.57 | 3.72 |
+ | 🇬🇧 🇺🇸 English | 7,722 | 94.69 | 2.24 | 3.07 |
+ | 🇮🇩 Indonesian | 971 | 94.34 | 2.47 | 3.19 |
+ | 🇩🇰 Danish | 779 | 94.09 | 3.21 | 2.70 |
+ | 🇷🇺 Russian | 1,470 | 93.61 | 2.93 | 3.47 |
+ | 🇺🇦 Ukrainian | 929 | 93.33 | 3.01 | 3.66 |
+ | 🇳🇴 Norwegian | 1,014 | 93.00 | 3.06 | 3.94 |
+ | 🇮🇳 Hindi | 1,295 | 92.90 | 4.09 | 3.01 |
+ | 🇪🇸 Spanish | 1,791 | 90.12 | 4.97 | 4.91 |
+ | 🇸🇦 Arabic | 947 | 88.60 | 5.60 | 5.81 |
+ | 🇮🇳 Marathi | 774 | 87.34 | 7.24 | 5.43 |
+ | 🇨🇳 Chinese | 945 | 85.93 | 3.49 | 10.58 |
+ | 🇧🇩 Bengali | 1,000 | 84.90 | 8.10 | 7.00 |
+ | 🇻🇳 Vietnamese | 1,004 | 77.39 | 6.47 | 16.14 |
+
+ ### Performance by Dataset
+ | Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |---------|--------------|--------------|---------------------|---------------------|
+ | midcentury_1 | 1,044 | 99.04 | 0.10 | 0.86 |
+ | orpheus_endfiller_1 | 182 | 98.90 | 0.00 | 1.10 |
+ | rime_2 | 396 | 98.23 | 0.25 | 1.52 |
+ | human_5 | 402 | 97.76 | 0.50 | 1.74 |
+ | liva_1 | 3,832 | 94.18 | 2.71 | 3.11 |
+ | chirp3_1 | 16,300 | 94.07 | 2.55 | 3.38 |
+ | chirp3_2 | 8,428 | 89.68 | 4.60 | 5.72 |
+ | orpheus_grammar_1 | 163 | 89.57 | 6.13 | 4.29 |
+ | human_convcollector_1 | 90 | 88.89 | 2.22 | 8.89 |
+ | orpheus_midfiller_1 | 140 | 88.57 | 2.86 | 8.57 |
+ | mundo_1 | 496 | 86.69 | 7.66 | 5.65 |
+
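Both reports cover the same 31,473 samples, so per-language rows can be compared directly. The sketch below (Python; the accuracy values are copied from the v3.0 and v3.1-cpu language tables for four languages, and the function name `accuracy_deltas` is illustrative) computes the change in percentage points. It shows that the overall gain is not uniform across languages:

```python
# Accuracy (%) copied from the v3.0 and v3.1-cpu "Performance by Language" tables.
V30 = {"English": 88.31, "Spanish": 86.71, "Chinese": 88.57, "Vietnamese": 81.57}
V31_CPU = {"English": 94.69, "Spanish": 90.12, "Chinese": 85.93, "Vietnamese": 77.39}


def accuracy_deltas(old: dict, new: dict) -> dict:
    """Return the accuracy change in percentage points for each language in `old`."""
    return {lang: round(new[lang] - old[lang], 2) for lang in old}


deltas = accuracy_deltas(V30, V31_CPU)

# List the largest improvements first.
for lang, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<12}{delta:+.2f} pp")
```

For this selection, English and Spanish improve while Chinese and Vietnamese regress on the quantized CPU model, so checking the rows for the target language is more informative than the overall figure alone.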
benchmarks/smart-turn-v3.1-gpu.md ADDED
@@ -0,0 +1,61 @@
+ # Endpointing Model Benchmark Report
+
+ **Model:** `/data/smart-turn-v3.1-gpu.onnx`
+
+ **Generated:** 2025-12-03 16:21:25 UTC
+
+ ## Accuracy Results
+
+ **Total Samples:** 31,473
+
+ **Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
+
+ **Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
+
+ ### Overall Performance
+ | Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |--------|--------------|--------------|---------------------|---------------------|
+ | Overall | 31,473 | 93.98 | 3.21 | 2.81 |
+
+ ### Performance by Language
+ | Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |----------|--------------|--------------|---------------------|---------------------|
+ | 🇯🇵 Japanese | 834 | 98.08 | 0.84 | 1.08 |
+ | 🇰🇷 Korean | 890 | 97.98 | 0.79 | 1.24 |
+ | 🇹🇷 Turkish | 966 | 97.52 | 1.24 | 1.24 |
+ | 🇳🇱 Dutch | 1,401 | 96.79 | 1.57 | 1.64 |
+ | 🇩🇪 German | 1,322 | 96.37 | 2.42 | 1.21 |
+ | 🇫🇷 French | 1,253 | 96.09 | 1.68 | 2.23 |
+ | 🇵🇹 Portuguese | 1,398 | 95.99 | 2.15 | 1.86 |
+ | 🇫🇮 Finnish | 1,010 | 95.74 | 2.18 | 2.08 |
+ | 🇵🇱 Polish | 976 | 95.59 | 2.56 | 1.84 |
+ | 🇮🇩 Indonesian | 971 | 95.57 | 2.47 | 1.96 |
+ | 🇬🇧 🇺🇸 English | 7,722 | 95.55 | 2.55 | 1.90 |
+ | 🇮🇹 Italian | 782 | 95.52 | 2.81 | 1.66 |
+ | 🇷🇺 Russian | 1,470 | 94.15 | 2.93 | 2.93 |
+ | 🇩🇰 Danish | 779 | 94.09 | 3.34 | 2.57 |
+ | 🇺🇦 Ukrainian | 929 | 93.97 | 2.58 | 3.44 |
+ | 🇳🇴 Norwegian | 1,014 | 93.69 | 3.25 | 3.06 |
+ | 🇮🇳 Hindi | 1,295 | 93.36 | 3.17 | 3.47 |
+ | 🇪🇸 Spanish | 1,791 | 90.95 | 5.75 | 3.29 |
+ | 🇨🇳 Chinese | 945 | 88.99 | 4.34 | 6.67 |
+ | 🇸🇦 Arabic | 947 | 88.91 | 6.44 | 4.65 |
+ | 🇮🇳 Marathi | 774 | 88.24 | 6.20 | 5.56 |
+ | 🇧🇩 Bengali | 1,000 | 85.10 | 7.10 | 7.80 |
+ | 🇻🇳 Vietnamese | 1,004 | 81.87 | 9.86 | 8.27 |
+
+ ### Performance by Dataset
+ | Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
+ |---------|--------------|--------------|---------------------|---------------------|
+ | midcentury_1 | 1,044 | 99.52 | 0.10 | 0.38 |
+ | human_5 | 402 | 98.51 | 0.50 | 1.00 |
+ | orpheus_endfiller_1 | 182 | 98.35 | 0.00 | 1.65 |
+ | rime_2 | 396 | 97.98 | 0.25 | 1.77 |
+ | liva_1 | 3,832 | 95.12 | 2.97 | 1.91 |
+ | chirp3_1 | 16,300 | 95.10 | 2.58 | 2.32 |
+ | orpheus_grammar_1 | 163 | 93.87 | 4.29 | 1.84 |
+ | chirp3_2 | 8,428 | 90.76 | 4.84 | 4.40 |
+ | human_convcollector_1 | 90 | 88.89 | 6.67 | 4.44 |
+ | orpheus_midfiller_1 | 140 | 87.86 | 5.00 | 7.14 |
+ | mundo_1 | 496 | 85.69 | 8.87 | 5.44 |
+
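The Overall row should equal the sample-weighted mean of the per-dataset rows, since the eleven datasets add up to the 31,473 total. A short Python sketch (figures copied from the v3.1-gpu dataset table; `weighted_accuracy` is an illustrative helper, not part of the benchmark script) checks this. It also shows that the two `chirp3_*` sets, at about 79% of the samples, dominate the headline number:

```python
# (sample count, accuracy %) per dataset, from the v3.1-gpu report.
GPU_DATASETS = {
    "midcentury_1": (1_044, 99.52),
    "human_5": (402, 98.51),
    "orpheus_endfiller_1": (182, 98.35),
    "rime_2": (396, 97.98),
    "liva_1": (3_832, 95.12),
    "chirp3_1": (16_300, 95.10),
    "orpheus_grammar_1": (163, 93.87),
    "chirp3_2": (8_428, 90.76),
    "human_convcollector_1": (90, 88.89),
    "orpheus_midfiller_1": (140, 87.86),
    "mundo_1": (496, 85.69),
}


def weighted_accuracy(rows: dict) -> tuple:
    """Return (total samples, sample-weighted mean accuracy in %)."""
    total = sum(count for count, _ in rows.values())
    weighted = sum(count * acc for count, acc in rows.values()) / total
    return total, weighted


total_samples, overall_gpu = weighted_accuracy(GPU_DATASETS)
chirp_share = (16_300 + 8_428) / total_samples
print(total_samples, round(overall_gpu, 2), f"{chirp_share:.0%}")
```

The result agrees with the reported 93.98% to within rounding of the per-dataset figures.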
smart-turn-v3.1-cpu.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb68d55c2d542ce79e44b12013bfd571e90df8594ab096d757198e851b0c6594
+ size 8679180
smart-turn-v3.1-gpu.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a32f7445d5076029472b6c9f7a71005df576ea19d5f929021200f535b962af84
+ size 32411198
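The two `.onnx` entries are Git LFS pointer files: each records the SHA-256 digest (`oid`) and byte size of the real model file rather than the weights themselves. A downloaded checkpoint can be checked against its pointer with a sketch like the following (Python standard library only; the helper names and the local file path in the usage note are assumptions, not part of the repository):

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer contents for smart-turn-v3.1-cpu.onnx, as added in this commit.
CPU_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fb68d55c2d542ce79e44b12013bfd571e90df8594ab096d757198e851b0c6594
size 8679180
"""
pointer = parse_lfs_pointer(CPU_POINTER)
```

Assuming the checkpoint was saved locally as `smart-turn-v3.1-cpu.onnx`, `matches_pointer(Path("smart-turn-v3.1-cpu.onnx"), pointer)` would report whether the download is intact. The recorded sizes (8,679,180 and 32,411,198 bytes) line up with the README's 8 MB quantized and 32 MB unquantized checkpoints.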