
Latest commit 16c8130 by marcus-daily: "Smart Turn v3.1" (8 days ago; file size 1.25 kB)

metadata

pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
  - speech-processing
  - semantic-vad
  - multilingual
datasets:
  - pipecat-ai/smart-turn-data-v3.1-train
  - pipecat-ai/smart-turn-data-v3.1-test

Smart Turn v3.x

Smart Turn is an open‑source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.
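The model returns a score rather than a decision, so the calling code decides how to act on it. The sketch below is a minimal, hypothetical illustration of that wiring. It assumes 16 kHz mono float samples, an 8-second scoring window left-padded with silence, and a 0.5 threshold. None of these values are specified in this card. `score_fn` is a stand-in for actually running the model. The point it illustrates is that the decision is made from raw audio, with no transcript involved.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono input
WINDOW_SECONDS = 8       # assumption: model scores the most recent 8 s of audio
THRESHOLD = 0.5          # assumption: probability cut-off for "turn complete"


def last_window(samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE_HZ,
                seconds: int = WINDOW_SECONDS) -> List[float]:
    """Keep the most recent `seconds` of audio, left-padding with silence if shorter."""
    n = sample_rate * seconds
    tail = list(samples[-n:])
    return [0.0] * (n - len(tail)) + tail


def is_turn_complete(samples: Sequence[float],
                     score_fn: Callable[[List[float]], float],
                     threshold: float = THRESHOLD) -> bool:
    """Score the raw-waveform window (no transcript) and compare it to the threshold.

    `score_fn` is a hypothetical stand-in for model inference that returns a
    probability in [0, 1] that the speaker has finished their turn.
    """
    return score_fn(last_window(samples)) >= threshold
```

Under these assumptions, a caller would pass in its buffered audio and a function that runs the model. For example, `is_turn_complete(buffer, lambda window: 0.9)` returns `True`.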

Model architecture

Backbone: Whisper Tiny encoder
Head: shallow linear classifier
Params: 8M
Checkpoint: 8 MB ONNX (int8 quantized), 32 MB ONNX (unquantized)
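As a rough illustration of the layout above, the sketch below shows an encoder output reduced to one vector and scored by a linear head. It also shows how the two checkpoint sizes follow from the parameter count. Several choices here are assumptions rather than details from this card: mean pooling over encoder frames, a single 384 → 1 linear layer with a sigmoid, and "MB" meaning 1,000,000 bytes. The 384 is the Whisper Tiny encoder width. The function names are hypothetical.

```python
import math
from typing import List, Sequence

HIDDEN_DIM = 384  # Whisper Tiny encoder width (d_model)


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average encoder output frames into one fixed-size vector (assumed pooling)."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def linear_head(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One linear layer plus a sigmoid -> probability that the turn is complete."""
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def checkpoint_size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in decimal MB, ignoring graph/metadata overhead."""
    return params * bytes_per_param / 1_000_000


# Under the single-layer assumption, the head adds only weights + bias,
# which is negligible next to the ~8M encoder parameters.
HEAD_PARAMS = HIDDEN_DIM * 1 + 1

PARAM_COUNT = 8_000_000                       # rounded figure quoted above
fp32_mb = checkpoint_size_mb(PARAM_COUNT, 4)  # 4 bytes per float32 weight
int8_mb = checkpoint_size_mb(PARAM_COUNT, 1)  # 1 byte per int8 weight
```

The arithmetic lines up with the quoted figures. 8M parameters × 4 bytes ≈ 32 MB unquantized, and 8M × 1 byte ≈ 8 MB after int8 quantization. Real ONNX files also store graph structure and quantization scales, so their on-disk sizes will differ slightly.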

How to use

Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.

Thanks

Thank you to the following organisations for contributing audio datasets:
