ailia_llm package

Classes

class ailia_llm.AiliaLLM

Bases: object

Main class for ailia LLM inference.

Provides text generation using large language models via the ailia LLM native backend. Supports both text-only and multimodal (vision/audio) generation.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> for delta in llm.generate([{"role": "user", "content": "Hello!"}]):
...     print(delta, end="", flush=True)
__init__()

Create an ailia LLM instance.

Allocates a native LLM context. Call open() to load a model before generating text.

Raises:

AiliaLLMError – If the native context creation fails.
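
Examples

Create an instance, then load a model before generating:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")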

context_full()

Check whether the context window is full.

Returns:

True if the context window was exhausted during the last generate() or generate_multimodal() call; False otherwise.

Return type:

bool

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> if llm.context_full():
...     print("Warning: context window was full.")
generate(prompts, top_k=40, top_p=0.9, temp=0.4, dist=1234)

Generate text from chat-style prompts.

This is a unified generator function that automatically detects whether the prompts contain media_data and routes to the appropriate internal API. It yields text fragments (delta tokens) as they are produced by the model.

Parameters:
  • prompts (list[dict]) –

    A list of message dictionaries, each containing:
    • "role" (str): The role (e.g., “system”, “user”, “assistant”).

    • "content" (str): The text content of the message.

    • "media_data" (list[dict], optional): A list of media entries, each containing:

      • "media_type" (str): Type of media (e.g., “image”).

      • "file_path" (str, optional): Path to the media file.

      • "data" (bytes, optional): Raw media data.

      • "width" (int, optional): Media width in pixels.

      • "height" (int, optional): Media height in pixels.

  • top_k (int, optional, default=40) – Top-k sampling parameter. Limits the token candidates to the top-k most probable tokens.

  • top_p (float, optional, default=0.9) – Top-p (nucleus) sampling parameter. Limits the token candidates to those within the cumulative probability p.

  • temp (float, optional, default=0.4) – Sampling temperature. Higher values produce more random output.

  • dist (int, optional, default=1234) – Random seed for reproducible generation.

Yields:

str – Delta text fragments as they are generated by the model. Concatenating all yielded strings produces the full response.

Raises:
  • AiliaLLMError – If prompt setting or generation fails.

  • RuntimeError – If media_data is provided but the multimodal projector has not been loaded.

Notes

If the context window becomes full during generation, the generator stops early. Use context_full() to check whether the context was exhausted.

For multimodal generation with media_data, you must first call open_multimodal_projector() to load the projector file.

Examples

Text-only generation:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> messages = [
...     {"role": "system", "content": "You are a helpful assistant."},
...     {"role": "user", "content": "What is Python?"}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)

Multimodal generation with images:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> messages = [
...     {"role": "user", "content": "Describe this image: <__media__>",
...      "media_data": [{"media_type": "image", "file_path": "photo.jpg"}]}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)
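
Generation with explicit sampling parameters (illustrative values; the defaults are top_k=40, top_p=0.9, temp=0.4, dist=1234):

>>> for delta in llm.generate(messages, top_k=50, top_p=0.95, temp=0.7, dist=42):
...     print(delta, end="", flush=True)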
generated_token_count()

Get the token count of the most recently generated output.

Returns:

Number of tokens generated in the last generate() or generate_multimodal() call.

Return type:

int

Raises:

AiliaLLMError – If the token count retrieval fails.

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.generated_token_count()
128
get_multimodal_capabilities()

Query the multimodal capabilities of the loaded model.

Returns:

A dictionary with the following keys:
  • "vision" (bool): Whether the model supports image input.

  • "audio" (bool): Whether the model supports audio input.

Return type:

dict

Raises:

AiliaLLMError – If the capability query fails.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> caps = llm.get_multimodal_capabilities()
>>> caps["vision"]
True
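
Checking a specific capability before sending media (a small sketch based on the documented keys):

>>> caps = llm.get_multimodal_capabilities()
>>> if not caps["audio"]:
...     print("The loaded model/projector does not accept audio input.")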
open(model_path, n_ctx=0)

Open and load a language model file.

Parameters:
  • model_path (str) – Path to the GGUF model file.

  • n_ctx (int, optional, default=0) – Context window size (number of tokens). When 0, the default context size of the model is used.

Raises:

AiliaLLMError – If the model file cannot be opened or loaded.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("path/to/model.gguf")
>>> llm.open("path/to/model.gguf", n_ctx=4096)
open_multimodal_projector(mmproj_path)

Open a multimodal projector file for vision or audio support.

Must be called after open() to enable multimodal capabilities such as image or audio understanding.

Parameters:

mmproj_path (str) – Path to the multimodal projector file (e.g., mmproj-model.gguf).

Raises:

AiliaLLMError – If the projector file cannot be opened or loaded.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
prompt_token_count()

Get the token count of the most recently set prompt.

Returns:

Number of tokens in the prompt that was last passed to generate() or generate_multimodal().

Return type:

int

Raises:

AiliaLLMError – If the token count retrieval fails.

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.prompt_token_count()
42
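
The prompt and generated counts can be added to estimate how many tokens the last call consumed (the values continue the examples above):

>>> llm.prompt_token_count() + llm.generated_token_count()
170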
token_count(text)

Count the number of tokens in the given text.

Parameters:

text (str) – Input text to tokenize and count.

Returns:

Number of tokens in the text.

Return type:

int

Raises:

AiliaLLMError – If the token counting operation fails.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.token_count("Hello, world!")
4
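
A sketch of using token_count() to check a prompt against the context size passed to open(); document_text is a hypothetical variable holding the input text, and any chat-template overhead added by generate() is not included in this count:

>>> llm.open("model.gguf", n_ctx=4096)
>>> prompt = "Summarize this document: " + document_text
>>> if llm.token_count(prompt) > 4096:
...     print("Prompt exceeds the configured context window.")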