smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Whisper, Parakeet, Voxtral, Granite Speech, and Audio Flamingo 3
Audio AI has had a breakout year. Automatic speech recognition has gotten dramatically better with models like OpenAI’s Whisper variants, NVIDIA’s Parakeet, and Mistral’s Voxtral. Audio understanding stepped forward with models like NVIDIA’s Audio Flamingo 3. Dialogue-grade text-to-speech arrived via Nari Labs’ Dia-1.6B. And Meta shipped the Perception Encoder Audiovisual (PE-AV), a multimodal encoder capable […]
