ailia LLM

A library for running local LLMs. It loads GGUF model files and makes it easy to add chat functionality to your application.

Getting Started

Choose your platform and run your first local chat completion.

1. Install

Install the ailia LLM Python package from PyPI.

pip3 install ailia_llm
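
To verify the installation, import the module (a quick sanity check; it exits silently on success):

python3 -c "import ailia_llm"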
2. Run a Sample

Download example_ailia_llm.py from ailia-models and run it. The script downloads the Gemma 3 4B GGUF file on first run and streams a chat completion. For a multimodal (vision) variant, fetch example_ailia_llm_mtmd.py from the same folder instead.

wget https://raw.githubusercontent.com/ailia-ai/ailia-models/master/large_language_model/gemma3/example_ailia_llm.py
python3 example_ailia_llm.py
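
For the multimodal sample, the commands follow the same pattern (assuming the same GitHub folder as the script above):

wget https://raw.githubusercontent.com/ailia-ai/ailia-models/master/large_language_model/gemma3/example_ailia_llm_mtmd.py
python3 example_ailia_llm_mtmd.py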

System Requirements

ailia LLM runs local language models on desktop and mobile platforms. Memory requirements scale with model size and quantization.

Operating Systems

  • Windows 10 / 11
  • macOS 11 or later
  • Linux (Ubuntu 20.04+)
  • iOS 13+ / Android 7+

Languages & Compilers

  • Python 3.6+, Dart / Flutter 3.19+
  • C++17 (VS 2019+ / Xcode 14.2+ / clang)
  • C# / Unity 2021.3.10f1+
  • GPU: Metal (iOS / macOS), Vulkan (Windows)

Model Format

  • GGUF (llama.cpp compatible)
  • Llama / Gemma / Mistral / Qwen
  • Phi / DeepSeek and more
  • Q4 / Q5 / Q8 quantization

Memory Guidance

  • 2B Q4: ~2 GB RAM
  • 7B Q4: ~5 GB RAM
  • 13B Q4: ~9 GB RAM
  • Streaming token-by-token output

Use the API in Your Project

Minimal examples for streaming a chat completion in your own application.

import ailia_llm

# Load a GGUF model from disk.
model = ailia_llm.AiliaLLM()
model.open("gemma-2-2b-it-Q4_K_M.gguf")

# generate() yields delta strings as each token is decoded.
messages = [{"role": "user", "content": "What is your name?"}]
for delta in model.generate(messages):
    print(delta, end="")

API Reference by Platform

  • Python
  • C++
  • Unity
  • Flutter

FAQ

Common questions about ailia LLM.

What model formats are supported?

ailia LLM loads GGUF files, the format used by llama.cpp. You can convert Hugging Face checkpoints to GGUF with the convert_hf_to_gguf.py script bundled with llama.cpp, or download pre-converted GGUF weights from Hugging Face.
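
For example, converting a local checkpoint (a sketch; assumes a llama.cpp checkout with its Python requirements installed, and uses llama.cpp's llama-quantize tool for the final Q4 step; ./my-hf-model is a placeholder path):

# convert the HF checkpoint to an f16 GGUF
python3 llama.cpp/convert_hf_to_gguf.py ./my-hf-model --outfile my-model-f16.gguf --outtype f16
# quantize to Q4_K_M for lower memory use
llama.cpp/llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M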

Which model architectures are supported?

Llama, Gemma, Mistral, Qwen, Phi, DeepSeek, and other architectures supported by llama.cpp. Compatibility tracks the upstream llama.cpp project.

How do I stream tokens as they are generated?

model.generate(messages) returns an iterator that yields delta strings as the model decodes each token. Iterate over it and append to a buffer (or print directly) for streaming UX.
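
For example, building on the snippet in "Use the API in Your Project" (a minimal sketch; model and messages are the objects created there):

chunks = []
for delta in model.generate(messages):
    print(delta, end="", flush=True)  # render tokens as they arrive
    chunks.append(delta)              # accumulate the full reply
reply = "".join(chunks)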

How much memory do I need?

Memory usage roughly equals the GGUF file size plus the KV cache and intermediate tensors. As a rule of thumb at Q4 quantization: 2B models ≈ 2 GB, 7B ≈ 5 GB, 13B ≈ 9 GB. Use smaller models or more aggressive quantization (Q4 → Q3) on memory-constrained devices.
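
As a back-of-envelope calculation (illustrative constants, not measured values; Q4_K_M averages roughly 4.5 bits per weight):

def estimate_ram_gb(params_billion, bits_per_weight=4.5, overhead=0.25):
    # weights (≈ GGUF file size) + KV cache/activations + fixed runtime cost
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead) + 0.5

print(round(estimate_ram_gb(7), 1))  # ≈ 5 GB, matching the rule of thumb above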

Does it support GPU acceleration?

Yes. ailia LLM uses Metal on iOS and macOS, and Vulkan on Windows. (Unlike ailia SDK / Speech / Voice, ailia LLM does not require cuDNN.) Inference falls back to CPU when no GPU is available.

Where do I place the license file when using C++?

The C++ binding requires ailia.lic next to the runtime libraries:

  • Windows: same folder as ailia.dll (or in cpp/ for the sample)
  • macOS: ~/Library/SHALO/
  • Linux: ~/.shalo/
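
For example, on Linux (a minimal sketch using the path above):

mkdir -p ~/.shalo && cp ailia.lic ~/.shalo/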

Python, Unity, Flutter, and JNI bindings auto-download the license on first run, so this only applies to the native C++ binding.

How does licensing work?

An evaluation license is downloaded automatically at runtime, suitable for development and trial. For commercial deployment, request a production license. See the ailia license terms.

Materials