Three new Kitten TTS models – smallest less than 25MB

Kitten TTS (https://github.com/KittenML/KittenTTS) is an open-source series of tiny, expressive text-to-speech models for on-device applications. We had a thread here last year (https://news.ycombinator.com/item?id=44807868).

Today we're releasing three new models with 80M, 40M and 14M parameters. The largest model (80M) has the highest quality. The 14M variant sets a new state of the art (SOTA) in expressivity among similarly sized models, despite being under 25MB. This release is a major upgrade over the previous one and supports English text-to-speech in eight voices: four male and four female. Here's a short demo: https://www.youtube.com/watch?v=ge3u5qblqZA

Most models are quantized to int8 + fp16 and use ONNX as the runtime. They are designed to run anywhere, e.g., Raspberry Pi, low-end smartphones, wearables and browsers. No GPU required! This release aims to bridge the gap between on-device and cloud models for TTS applications. A multilingual model release is coming soon.

On-device AI is bottlenecked by one thing: a lack of tiny models that actually perform. Our goal is to open-source more models for running production-ready voice agents and apps entirely on-device. We would love your feedback!
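As a rough sanity check on the size claims, weight storage is approximately parameter count × bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a measurement of the released files: it ignores ONNX graph metadata and quantization scales, and the int8/fp16 split it models (`int8_fraction`) is a hypothetical parameter, since the post only says the models are quantized to int8 + fp16.

```python
# Back-of-the-envelope weight-size estimate: parameters x bytes per parameter.
# Ignores ONNX graph metadata, quantization scales/zero points and other overhead,
# so real files will be somewhat larger than these numbers.
from typing import Optional

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts from the release announcement.
MODELS = {"80M": 80_000_000, "40M": 40_000_000, "14M": 14_000_000}


def estimated_size_mb(num_params: int, dtype: str) -> float:
    """Weight size in megabytes (1 MB = 10**6 bytes) if every weight uses `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def mixed_size_mb(num_params: int, int8_fraction: float) -> float:
    """Weight size when `int8_fraction` of weights are int8 and the rest are fp16.

    The split is hypothetical; the actual per-layer split is not published.
    """
    bytes_per_param = int8_fraction * 1 + (1 - int8_fraction) * 2
    return num_params * bytes_per_param / 1_000_000


def min_int8_fraction(num_params: int, budget_mb: float) -> Optional[float]:
    """Smallest int8 fraction (rest fp16) that keeps the weights within `budget_mb`.

    Solves num_params * (2 - f) / 1e6 <= budget_mb for f.
    Returns None if even all-int8 weights exceed the budget.
    """
    f = 2 - budget_mb * 1_000_000 / num_params
    if f > 1:
        return None
    return max(f, 0.0)


if __name__ == "__main__":
    for name, n in MODELS.items():
        row = ", ".join(f"{d}: {estimated_size_mb(n, d):.0f} MB" for d in BYTES_PER_PARAM)
        print(f"{name} params -> {row}")
    frac = min_int8_fraction(MODELS["14M"], 25.0)
    print(f"14M model fits in 25 MB if at least {frac:.0%} of weights are int8")
```

For the 14M model this gives 14 MB at pure int8 and 28 MB at pure fp16, so any mix in which roughly a fifth or more of the weights are int8 lands under 25 MB, consistent with the size stated in the post. The 40M and 80M models sit above that budget at either precision.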

  • AI Agent
  • Android
  • Web App

AI Summary

Kitten TTS offers a series of open-source, on-device text-to-speech models, with the smallest variant under 25MB and achieving state-of-the-art expressivity for its size. These models are designed for low-resource environments and support eight English voices.

Ideal for

Developers of on-device AI applications, mobile app developers, and developers working with embedded systems (e.g., Raspberry Pi)

Why it matters

Kitten TTS provides highly efficient and expressive text-to-speech capabilities that can run entirely on-device without requiring a GPU.

Key features

  • Offers three text-to-speech models with 80M, 40M, and 14M parameters.
  • Smallest model is under 25MB, achieving SOTA expressivity for its size.
  • Supports English text-to-speech with eight distinct voices (four male, four female).
  • Most models are quantized to int8 + fp16 for efficiency and run on ONNX.
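The post says the models are quantized to int8 + fp16 but does not describe the scheme. As a generic illustration of what int8 weight quantization does (not the method Kitten TTS actually uses), the sketch below stores each weight as an 8-bit integer plus one shared float scale per tensor, which is where the 4x saving over fp32 (and 2x over fp16) comes from. The function names are hypothetical.

```python
# Generic symmetric, per-tensor int8 quantization (illustrative only).
# This is NOT Kitten TTS's pipeline; production tools typically add per-channel
# scales, zero points and calibration.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats in [-max|w|, max|w|] onto integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; the error per weight is at most scale / 2."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, worst)
```

Each restored weight is within half a quantization step (scale / 2) of the original. Toolchains such as ONNX Runtime's quantization utilities refine this with per-channel scales and calibration data; which layers Kitten TTS keeps in fp16 is not stated in the post.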

Use cases

  • A mobile game developer integrates the 14M parameter Kitten TTS model into their game to voice in-game character dialogue entirely on the user's device, keeping latency low and the game playable offline.
  • A wearable device manufacturer uses the quantized Kitten TTS models to speak notifications and voice responses on their smartwatches, so users can get information hands-free without needing a constant internet connection.
  • An independent author narrates an audiobook with the 80M parameter Kitten TTS model, the highest-quality variant in the release, generating the audio locally without a GPU.