Supertone has released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech (TTS) system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. Supertone positions it as a fast, accurate, multilingual TTS engine that runs entirely on-device.
What Changed from v2 to v3
Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages. Version 2 supported English, Korean, Spanish, Portuguese, and French. Version 3 adds Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Estonian, Finnish, Hindi, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, for a total of 31 ISO language codes. There is also a special na fallback for text whose language is unknown or outside the supported set.
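As a rough illustration of how an application might use the na fallback, the sketch below maps an arbitrary language tag to one of the 31 codes shown in the language grid later in this article, or to na otherwise. The helper name resolve_lang and the handling of region-tagged inputs such as pt-BR are assumptions made for this example; they are not part of the Supertonic SDK.

```python
# The 31 ISO codes listed in this article; "na" is the documented fallback.
SUPPORTED_LANGS = frozenset({
    "en", "ko", "ja", "ar", "bg", "cs", "da", "de", "el", "es", "et",
    "fi", "fr", "hi", "hr", "hu", "id", "it", "lt", "lv", "nl", "pl",
    "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
})
FALLBACK_LANG = "na"


def resolve_lang(code):
    """Return a code to pass as lang=, or "na" when the input is unknown.

    Hypothetical helper for illustration; not part of the Supertonic SDK.
    """
    if not code:
        return FALLBACK_LANG
    # Keep only the base tag of inputs such as "pt-BR" or "en_US" (assumption).
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED_LANGS else FALLBACK_LANG
```

The result can then be passed as the lang argument of the synthesis call, so unsupported text is routed to the fallback explicitly instead of being sent with a code the model does not list.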
The model grows modestly to accommodate the added languages. At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than open TTS systems in the 0.7B to 2B parameter class, which is a practical advantage for download size, startup time, and on-device inference. The public ONNX assets total 404 MB on disk. Separately, Supertone recently launched Voice Builder, which lets developers create custom, edge-native TTS models from their own voice recordings.
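A quick back-of-the-envelope calculation puts these numbers in context. The sketch below divides the stated footprint by the stated parameter count and, for comparison, estimates the weight size of 0.7B and 2B parameter models. The article does not state the weight precision of any of these models, so the 4-bytes-per-parameter figure (32-bit floats) and the reading of "MB" as decimal megabytes are assumptions.

```python
PARAMS = 99e6          # "about 99M parameters" from the article
ASSET_BYTES = 404e6    # "404 MB", read here as decimal megabytes (assumption)

# Average storage per parameter across the public ONNX assets.
bytes_per_param = ASSET_BYTES / PARAMS


def fp32_size_gb(num_params):
    """Approximate weight size in GB if every parameter were a 32-bit float."""
    return num_params * 4 / 1e9


print(f"{bytes_per_param:.2f} bytes per parameter")
print(f"0.7B model at fp32: ~{fp32_size_gb(0.7e9):.1f} GB")
print(f"2B model at fp32:   ~{fp32_size_gb(2e9):.1f} GB")
```

The ratio comes out at roughly 4 bytes per parameter, which is about what 32-bit float weights would need, although that is an inference rather than something the release confirms.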
Expressive Tags
Expressive tag support is new in v3. Supertonic 3 supports simple expression tags such as <laugh>, <breath>, and <sigh>, which let you embed prosodic cues directly in the input text without a separate preprocessing step or a separate expressiveness model. For engineers building voice interfaces or accessibility tools, this means breathing pauses or laughter can be specified inline in the text payload.
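Because tags travel inline with the text, a typo such as <laff> would reach the model unnoticed. The sketch below is one hypothetical way to lint input before synthesis. It only knows the three tags named in this article, and the model may well accept others, so treat KNOWN_TAGS as an assumption rather than a complete list.

```python
import re

# Tags named in the article; the real model may accept more (assumption).
KNOWN_TAGS = {"laugh", "breath", "sigh"}
TAG_PATTERN = re.compile(r"<([A-Za-z_]+)>")


def find_unknown_tags(text):
    """Return tag names that appear in text but are not in KNOWN_TAGS."""
    return [name for name in TAG_PATTERN.findall(text)
            if name.lower() not in KNOWN_TAGS]
```

An empty list means every tag in the payload is one of the three documented ones; anything else is worth logging before the text is sent to the synthesizer.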
Architecture and Runtime
The underlying architecture carries over from prior versions: a speech autoencoder that encodes waveforms into continuous latent representations, a flow-matching text-to-latent module that maps text to audio features, and a duration predictor that controls timing. Flow matching is a generative modeling technique that learns a vector field transporting a simple distribution to a target distribution. Because it can produce good samples in fewer steps than typical diffusion samplers, Supertonic can produce usable output in just 2 inference steps. To further refine output, v3 integrates Length-Aware Rotary Position Embedding (LARoPE) for better text-speech alignment and uses a Self-Purifying Flow Matching technique during training to stay robust against noisy data labels.
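To give a feel for why a flow-matching sampler can work with very few steps, here is a toy sketch of fixed-step Euler integration of a velocity field from t=0 to t=1. It is not Supertonic's sampler: the velocity function is a hand-written stand-in for a trained network, and the target is a single fixed vector rather than a speech latent. With an ideal straight-line velocity field, two Euler steps already land on the target.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with equal Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t never reaches 1.0, so the toy field below stays finite
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


TARGET = [1.0, -2.0, 0.5]  # toy stand-in for a latent vector


def toy_velocity(x: Vector, t: float) -> Vector:
    # Ideal velocity for a straight path that ends at TARGET when t = 1.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


sample = euler_sample(toy_velocity, [0.0, 0.0, 0.0], steps=2)
print(sample)
```

A trained model only approximates such a field, so real systems trade step count against quality; the point of the toy is that straighter paths tolerate coarser integration.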
On runtime efficiency, Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
Reading Accuracy
Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. WER (Word Error Rate) and CER (Character Error Rate) are standard intelligibility metrics for TTS: you synthesize a passage, run ASR over the output, and compare the transcription to the original text. CER is used for languages without clear word boundaries; the others use WER.
The system’s efficiency also shows on very constrained edge hardware: it achieves an average real-time factor (RTF, synthesis time divided by audio duration) of 0.3 on an Onyx Boox Go 6, an E-ink e-reader, in airplane mode. The ecosystem has also expanded to include Flutter (with macOS support), .NET 9, and Go, and the web implementation uses onnxruntime-web for pure client-side execution.
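The metrics mentioned here are simple to compute once an ASR transcript is available. The sketch below uses a plain Levenshtein edit distance for WER and CER and adds a one-line RTF helper. The text cleanup (lowercasing, whitespace splitting, dropping spaces for CER) is an assumption for the example; published evaluations usually apply their own language-specific normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, transcript):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, transcript.lower().split()) / len(ref_words)


def cer(reference, transcript):
    """Character error rate, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(transcript.replace(" ", ""))) / len(ref_chars)


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF below 1.0 means synthesis is faster than playback."""
    return synthesis_seconds / audio_seconds
```

Under this definition, an RTF of 0.3 corresponds to roughly 3 seconds of compute for 10 seconds of audio.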
Text Normalization
A differentiating property carried forward from v2 is built-in text normalization. Supertonic handles complex surface forms without any preprocessing pipeline or phonetic annotations: financial expressions like $5.2M, phone numbers with area codes and extensions like (212) 555-0142 ext. 402, time and date formats like 4:45 PM on Wed, Apr 3, 2024, and technical units like 2.3h and 30kph. The four competing systems evaluated were ElevenLabs Flash v2.5, OpenAI TTS-1, Gemini 2.5 Flash TTS, and Microsoft. The financial expression “$5.2M” must read as “five point two million dollars,” and “$450K” as “four hundred fifty thousand dollars”; all four competing systems failed this category. The technical unit “2.3h” must read as “two point three hours” and “30kph” as “thirty kilometers per hour”; all four competitors failed this category as well.
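A check of this kind can be automated by transcribing the synthesized audio and looking for the expected spoken form. The sketch below covers only the comparison step, using the four readings quoted above. The function names are hypothetical, and the transcript is assumed to come from whatever ASR system you already use.

```python
import re

# Surface forms and the spoken forms the article says they must produce.
EXPECTED_READINGS = {
    "$5.2M": "five point two million dollars",
    "$450K": "four hundred fifty thousand dollars",
    "2.3h": "two point three hours",
    "30kph": "thirty kilometers per hour",
}


def _canonical(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def reads_correctly(surface_form, transcript):
    """True if the ASR transcript contains the expected spoken form."""
    expected = _canonical(EXPECTED_READINGS[surface_form])
    return expected in _canonical(transcript)
```

A substring match like this is deliberately lenient; a stricter harness could score the whole sentence with WER instead.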
https://github.com/supertone-inc/supertonic
Getting Started
Install the Python SDK with pip install supertonic. On first run, the SDK downloads the model assets from Hugging Face automatically. A minimal example:
```python
from supertonic import TTS

# Download the ONNX assets on first run, then load them locally
tts = TTS(auto_download=True)

# Pick a preset voice
style = tts.get_voice_style(voice_name="M1")

text = "A gentle breeze moved through the open window while everyone listened to the story."

# Synthesize English speech and write it to a WAV file
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```
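For more than one sentence, the same three calls can be wrapped in a loop. The helper below is a hypothetical sketch that relies only on the methods shown in the minimal example (get_voice_style, synthesize, save_audio) and assumes, as that example's format string suggests, that the returned duration behaves like a number. The function name and the file naming scheme are made up for illustration.

```python
def synthesize_lines(tts, lines, voice_name="M1", lang="en", prefix="line"):
    """Synthesize each line to its own WAV file and return (path, duration) pairs.

    `tts` must expose get_voice_style, synthesize, and save_audio with the
    call shapes used in the minimal example. Hypothetical helper, not SDK code.
    """
    style = tts.get_voice_style(voice_name=voice_name)  # load the voice once
    results = []
    for index, text in enumerate(lines):
        wav, duration = tts.synthesize(text, voice_style=style, lang=lang)
        path = f"{prefix}_{index:03d}.wav"
        tts.save_audio(wav, path)
        results.append((path, duration))
    return results
```

With the SDK installed, calling synthesize_lines(TTS(auto_download=True), ["First sentence.", "Second sentence."]) would be expected to write line_000.wav and line_001.wav. Passing the TTS object in as an argument also makes the loop easy to exercise with a stub in tests.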
Marktechpost’s Visual Explainer
Supertonic 3 — Developer Guide
Overview
Supertonic 3: On-Device TTS, Now in 31 Languages
Supertonic 3 is a lightweight, open-weight text-to-speech system by Supertone Inc. It runs entirely via ONNX Runtime on your device — no cloud, no API call, no data leaving your machine. v3 expands from 5 to 31 languages, adds expressive tags, reduces reading failures, and stays compatible with the v2 ONNX interface.
31 languages
~99M parameters
404 MB ONNX assets
MIT code license
What’s New in v3
Four Core Improvements Over Supertonic 2
Version 3 is a focused upgrade — same inference contract, meaningfully better output.
31 languages: Expanded from the 5-language v2 release (en, ko, es, pt, fr). v3 adds Japanese, Arabic, German, Hindi, Russian, Turkish, Vietnamese, and 19 more ISO codes, plus a special na fallback for unknown languages.
More stable reading — Fewer repeat and skip failures, especially on short and long utterances. This was a known limitation in v2 that v3 directly addresses.
Expression tags — Supports <laugh>, <breath>, and <sigh> inline in text, without any separate preprocessing or external model.
Higher speaker similarity — Improved similarity across the shared-language set compared with Supertonic 2. Voices are more consistent across languages.
Installation
Get Running in Under a Minute
Install the Python SDK via pip. On first run, model assets are downloaded automatically from Hugging Face — no manual setup required.
pip install supertonic
1. Install the SDK: run pip install supertonic in your Python environment (Python 3.8+).
2. First run, auto download: on first use, TTS(auto_download=True) fetches the ONNX model assets (~404 MB) from Supertone/supertonic-3 on Hugging Face. Requires Git LFS.
3. All inference runs on-device: after the initial download, no internet connection is needed. All synthesis happens locally via ONNX Runtime.
Quick Start
Basic Python Usage
The SDK auto-downloads model assets on first run. Specify a voice, pass your text with a language code, and save the WAV output.
Python:
```python
from supertonic import TTS

# Auto-downloads ONNX assets on first run
tts = TTS(auto_download=True)

# Select a preset voice (M1 to M5 male, F1 to F5 female)
style = tts.get_voice_style(voice_name="M1")
text = "A gentle breeze moved through the open window."

# synthesize() returns (wav_array, duration_in_seconds)
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")
```
Python, with expression tags (reuses tts and style from above):
```python
text = "I can't believe it <laugh> that actually worked!"
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
```
Languages
31 Supported Languages + na Fallback
All 31 languages share the same model architecture and ONNX inference pipeline. Use the na code for text whose language is unknown or outside the supported set.
en English
ko Korean
ja Japanese
ar Arabic
bg Bulgarian
cs Czech
da Danish
de German
el Greek
es Spanish
et Estonian
fi Finnish
fr French
hi Hindi
hr Croatian
hu Hungarian
id Indonesian
it Italian
lt Lithuanian
lv Latvian
nl Dutch
pl Polish
pt Portuguese
ro Romanian
ru Russian
sk Slovak
sl Slovenian
sv Swedish
tr Turkish
uk Ukrainian
vi Vietnamese
Text Normalization
Handles Complex Inputs Without Pre-Processing
Supertonic 3 reads financial expressions, dates, phone numbers, and technical units correctly out of the box — no G2P module or phonetic annotations required. Below: Supertonic vs. four major commercial/open-source systems.
| Category | Input example | Supertonic 3 | ElevenLabs / OpenAI / Gemini / Microsoft |
| --- | --- | --- | --- |
| Financial expression | $5.2M / $450K | ✓ | ✗ All four failed |
| Time & date | 4:45 PM, Wed Apr 3 | ✓ | ✗ All four failed |
| Phone number | (212) 555-0142 ext. 402 | ✓ | ✗ All four failed |
| Technical unit | 2.3h at 30kph | ✓ | ✗ All four failed |
Deployment & Resources
Runs Everywhere — 11 Platforms, No GPU Required
The public ONNX assets run on CPU in fixed-voice mode with no GPU dependency. Browser support is via WebGPU and WASM through onnxruntime-web. Audio output is 16-bit WAV; batch inference is supported.
Python: ONNX Runtime
Node.js: Server-side JS
Browser: WebGPU / WASM
Java: JVM
C++: High-perf
C#: .NET
Go: Go runtime
Swift / iOS: Native
Rust: Systems
Flutter: Cross-platform
Code license: MIT
Model license: OpenRAIL-M
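The deployment slide notes that audio output is 16-bit WAV. In Python the SDK's save_audio call already handles this, but if you end up holding raw float samples (for example in a custom pipeline), the standard library can write 16-bit PCM directly. The sketch below is generic and not Supertonic-specific: the function name is hypothetical, samples are assumed to be mono floats in the range -1.0 to 1.0, and the 44100 Hz rate in the usage lines is a placeholder, since the article does not state the model's sample rate.

```python
import io
import struct
import wave


def write_pcm16_wav(samples, sample_rate, target):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV.

    `target` may be a file path or a writable, seekable binary file object.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Usage: write four samples to an in-memory buffer (placeholder sample rate).
buffer = io.BytesIO()
write_pcm16_wav([0.0, 0.5, -1.0, 2.0], sample_rate=44100, target=buffer)
```

Out-of-range values such as the 2.0 above are clipped to full scale rather than wrapping around, which avoids loud artifacts in the written file.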