NeMo TTS API

Model Classes

MagpieTTS (Codec-based TTS)

MagpieTTS is an end-to-end TTS model that generates audio codec codes from a transcript and an optional context, which can be either audio or text. It supports multiple architectures (e.g., multi-encoder context and decoder context) and can be used for standard, long-form, and streaming inference.
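The data flow described above (transcript plus optional context in, frames of codec indices out) can be sketched in a few lines of plain Python. This is a toy illustration only and not the NeMo API: the function names `tokenize` and `toy_generate_codes`, the constants, and the integer "state" that stands in for a neural decoder are all hypothetical, chosen just to make the shape of the inputs and outputs visible.

```python
from typing import List, Optional, Sequence

# All constants below are hypothetical values for illustration.
NUM_CODEBOOKS = 4      # codec indices emitted per audio frame
CODEBOOK_SIZE = 16     # number of entries in each codebook
FRAMES_PER_TOKEN = 2   # fixed frame budget per text token (a real model decides this itself)


def tokenize(text: str) -> List[int]:
    """Map each character to an integer id (stand-in for a real tokenizer)."""
    return [ord(ch) % 256 for ch in text]


def toy_generate_codes(
    transcript: str,
    context_text: Optional[str] = None,
    context_codes: Optional[Sequence[Sequence[int]]] = None,
) -> List[List[int]]:
    """Emit frames of codec indices one after another.

    A real model would run a neural decoder here; this toy version folds
    the conditioning into a running integer state so the flow is visible.
    """
    text_ids = tokenize(transcript)

    # Fold the optional context (text or audio codes) into the initial state.
    state = 0
    if context_text is not None:
        state += sum(tokenize(context_text))
    if context_codes is not None:
        state += sum(sum(frame) for frame in context_codes)

    frames: List[List[int]] = []
    for token_id in text_ids:
        for _ in range(FRAMES_PER_TOKEN):
            # Each new frame depends on the current token and the previous state.
            state = (state * 31 + token_id + 7) % 1_000_003
            frame = [(state + k * 13) % CODEBOOK_SIZE for k in range(NUM_CODEBOOKS)]
            frames.append(frame)
    return frames


codes = toy_generate_codes("hi", context_text="calm voice")
print(len(codes), len(codes[0]))  # number of frames, codebooks per frame
```

In an actual codec-based pipeline, the resulting frames would be passed to an audio codec decoder to obtain a waveform; that step is omitted here because it depends on the specific codec model.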

Mel-Spectrogram Generators

Vocoders

Codecs

Base Classes

The classes below form the base of the TTS pipeline.

Dataset Processing Classes