A library for generating speech from text. Easily integrate AI-powered text-to-speech into your applications.
Choose your platform and synthesize your first voice clip.
Install the ailia Voice Python package together with the librosa and soundfile helpers used by the sample.
pip3 install ailia_voice librosa soundfile
View on PyPI
Download example_ailia_voice.py from ailia-models and run it. It clones a target voice from a short reference clip and writes output.wav. Models are downloaded into ./models/ automatically.
wget https://raw.githubusercontent.com/ailia-ai/ailia-models/master/audio_processing/gpt-sovits/example_ailia_voice.py
python3 example_ailia_voice.py
ailia Voice runs on desktop and mobile platforms with CPU and GPU acceleration.
Voice synthesis capabilities are provided across the C, C#, and Python APIs.
Minimal examples for synthesizing speech in your own application.
import librosa
import soundfile
import ailia_voice

# "reference.wav" is a placeholder; supply a short clip of the target speaker
ref_audio, rate = librosa.load("reference.wav", sr=None)
ref_text = "リファレンス音声の書き起こし。"  # transcript of the clip
voice = ailia_voice.GPTSoVITS()
voice.initialize_model(model_path="./models/")
voice.set_reference_audio(
    ref_text, ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA, ref_audio, rate,
)
buf, sr = voice.synthesize_voice("こんにちは。", ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA)
soundfile.write("output.wav", buf, sr)
Common questions about ailia Voice.
Two families: Tacotron2 (English baseline) and GPT-SoVITS (zero-shot voice cloning).
GPT-SoVITS comes in four versions (v1, v2, v3, and v2-pro), each with Japanese, English, and Chinese variants. Pick a sample model with the CLI argument (tacotron2, gpt-sovits, gpt-sovits-v2-en, etc.); a sample invocation follows the version list.
v1: lightest and fastest, no Japanese accent support.
v2: adds Japanese pitch / accent and playback-speed control. Good real-time default.
v3: highest audio quality (CFM + DiT + BigVGAN diffusion), but slower.
v2-pro: combines v3's text analysis with v2's fast vocoder plus speaker verification embeddings — recommended for the best quality/speed balance.
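For example, a run against the v2 English variant might look like the line below. The --model flag name is an assumption for illustration; check python3 example_ailia_voice.py --help for the actual argument.
python3 example_ailia_voice.py --model gpt-sovits-v2-en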
GPT-SoVITS clones the voice characteristics of a target speaker from about 10 seconds of clean reference audio plus the matching transcript. Pass both to set_reference_audio() before calling synthesize_voice(); a sketch follows below.
Tacotron2 does not require reference audio — it speaks in a fixed voice.
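As a minimal sketch of the cloning flow, reusing the voice object from the quick-start example (the file name and transcript below are placeholders), the reference clip can be loaded and trimmed to roughly 10 seconds with librosa:
import librosa

# Placeholder path; use ~10 s of clean audio from the target speaker
ref_audio, rate = librosa.load("speaker.wav", sr=None, duration=10.0)
voice.set_reference_audio(
    "Transcript matching speaker.wav.",  # must match the recording
    ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN, ref_audio, rate,
)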
ailia Voice integrates OpenJTalk for Japanese phoneme conversion. To override pronunciations, prepare a userdic.csv in MeCab format (a trailing accent field such as 0/5 means accent type 0 over 5 morae) and convert it to a binary .dic with pyopenjtalk:
import pyopenjtalk
# Compile the CSV user dictionary into OpenJTalk's binary format
pyopenjtalk.mecab_dict_index("userdic.csv", "userdic.dic")
Then pass user_dict_path to initialize_model() (Python) or call ailiaVoiceSetUserDictionary (C). A standard user dictionary for v3 is also available.
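In Python, wiring the compiled dictionary in could look like the sketch below; userdic.dic is a placeholder path, and user_dict_path is the parameter named above.
import ailia_voice

voice = ailia_voice.GPTSoVITS()
# Register the user dictionary when the model is initialized
voice.initialize_model(model_path="./models/", user_dict_path="userdic.dic")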
Japanese, English, and Chinese, selected via the AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA / _EN / _ZH constants passed to set_reference_audio() and synthesize_voice().
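For example, assuming an English reference was registered as in the cloning sketch above, English output only requires swapping the constant (the sample text is illustrative):
buf, sr = voice.synthesize_voice("Hello, world.", ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN)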
The C++ binding requires ailia.lic next to the runtime libraries:
Windows: same folder as ailia.dll (or in cpp/ for the sample).
macOS: ~/Library/SHALO/
Linux: ~/.shalo/
Python, Unity, Flutter, and JNI bindings auto-download the license on first run, so this only applies to the native C++ binding.
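On Linux, for instance, installing the license file is a single copy, assuming ailia.lic sits in the current directory:
mkdir -p ~/.shalo && cp ailia.lic ~/.shalo/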
On macOS / iOS, Metal is used automatically. On Windows / Linux, install CUDA Toolkit and cuDNN. See the CUDA Toolkit / cuDNN Installation Guide for detailed instructions.
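Device selection from Python is sketched below under an assumption: other ailia SDK bindings pick the compute environment via an env_id argument, and GPTSoVITS is assumed here to follow the same convention. Verify the exact parameter against the ailia Voice API reference.
import ailia_voice

# Assumption: an env_id argument selects the compute device (GPU vs. CPU),
# matching the convention of other ailia SDK bindings.
voice = ailia_voice.GPTSoVITS(env_id=1)  # hypothetical GPU environment id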
Yes, after the first run. Model weights are downloaded into the directory passed to initialize_model(model_path=...) on first use, and the evaluation license is fetched automatically. Subsequent runs work without an internet connection.
An evaluation license is downloaded automatically at runtime, suitable for development and trial. For commercial deployment, request a production license. See the ailia license terms.