ailia_llm package

Classes

class ailia_llm.AiliaLLM

Bases: object

Main class for ailia LLM inference.

Provides text generation using large language models via the ailia LLM native backend. Supports both text-only and multimodal (vision/audio) generation.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> for delta in llm.generate([{"role": "user", "content": "Hello!"}]):
...     print(delta, end="", flush=True)
__init__()

Create an ailia LLM instance.

Allocates a native LLM context. Call open() to load a model before generating text.

Raises:

AiliaLLMError – If the native context creation fails.
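
Examples

Create an instance, then load a model before generating:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")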

context_full()

Check whether the context window is full.

Returns:

True if the context window was exhausted during the last generate() or generate_multimodal() call; False otherwise.

Return type:

bool

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> if llm.context_full():
...     print("Warning: context window was full.")
generate(prompts, top_k=40, top_p=0.9, temp=0.4, dist=1234)

Generate text from chat-style prompts.

This is a unified generator function that automatically detects whether the prompts contain media_data and routes to the appropriate internal API. It yields text fragments (delta tokens) as they are produced by the model.

Parameters:
  • prompts (list[dict]) –

    A list of message dictionaries, each containing:
    • "role" (str): The role (e.g., “system”, “user”, “assistant”).

    • "content" (str): The text content of the message.

    • "media_data" (list[dict], optional): A list of media entries, each containing:

      • "media_type" (str): Type of media (e.g., “image”).

      • "file_path" (str, optional): Path to the media file.

      • "data" (bytes, optional): Raw media data.

      • "width" (int, optional): Media width in pixels.

      • "height" (int, optional): Media height in pixels.

  • top_k (int, optional, default=40) – Top-k sampling parameter. Limits the token candidates to the top-k most probable tokens.

  • top_p (float, optional, default=0.9) – Top-p (nucleus) sampling parameter. Limits the token candidates to those within the cumulative probability p.

  • temp (float, optional, default=0.4) – Sampling temperature. Higher values produce more random output.

  • dist (int, optional, default=1234) – Random seed for reproducible generation.

Yields:

str – Delta text fragments as they are generated by the model. Concatenating all yielded strings produces the full response.

Raises:
  • AiliaLLMError – If prompt setting or generation fails.

  • RuntimeError – If media_data is provided but the multimodal projector has not been loaded.

Notes

If the context window becomes full during generation, the generator stops early. Use context_full() to check whether the context was exhausted.

For multimodal generation with media_data, you must first call open_multimodal_projector() to load the projector file.

Examples

Text-only generation:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> messages = [
...     {"role": "system", "content": "You are a helpful assistant."},
...     {"role": "user", "content": "What is Python?"}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)

Multimodal generation with images:

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> messages = [
...     {"role": "user", "content": "Describe this image: <__media__>",
...      "media_data": [{"media_type": "image", "file_path": "photo.jpg"}]}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)
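
Generation with explicit sampling parameters (illustrative values; the defaults are top_k=40, top_p=0.9, temp=0.4, dist=1234):

>>> for delta in llm.generate(messages, top_k=50, top_p=0.95, temp=0.7, dist=42):
...     print(delta, end="", flush=True)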
generated_token_count()

Get the token count of the most recently generated output.

Returns:

Number of tokens generated in the last generate() or generate_multimodal() call.

Return type:

int

Raises:

AiliaLLMError – If the token count retrieval fails.

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.generated_token_count()
128
get_multimodal_capabilities()

Query the multimodal capabilities of the loaded model.

Returns:

A dictionary with the following keys:
  • "vision" (bool): Whether the model supports image input.

  • "audio" (bool): Whether the model supports audio input.

Return type:

dict

Raises:

AiliaLLMError – If the capability query fails.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> caps = llm.get_multimodal_capabilities()
>>> caps["vision"]
True
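
Checking a specific capability before sending media (a small sketch based on the documented keys):

>>> caps = llm.get_multimodal_capabilities()
>>> if not caps["audio"]:
...     print("The loaded model/projector does not accept audio input.")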
open(model_path, n_ctx=0)

Open and load a language model file.

Parameters:
  • model_path (str) – Path to the GGUF model file.

  • n_ctx (int, optional, default=0) – Context window size (number of tokens). When 0, the default context size of the model is used.

Raises:

AiliaLLMError – If the model file cannot be opened or loaded.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("path/to/model.gguf")
>>> llm.open("path/to/model.gguf", n_ctx=4096)
open_multimodal_projector(mmproj_path)

Open a multimodal projector file for vision or audio support.

Must be called after open() to enable multimodal capabilities such as image or audio understanding.

Parameters:

mmproj_path (str) – Path to the multimodal projector file (e.g., mmproj-model.gguf).

Raises:

AiliaLLMError – If the projector file cannot be opened or loaded.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
prompt_token_count()

Get the token count of the most recently set prompt.

Returns:

Number of tokens in the prompt that was last passed to generate() or generate_multimodal().

Return type:

int

Raises:

AiliaLLMError – If the token count retrieval fails.

Examples

>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.prompt_token_count()
42
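
The prompt and generated counts can be added to estimate how many tokens the last call consumed (the values continue the examples above):

>>> llm.prompt_token_count() + llm.generated_token_count()
170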
token_count(text)

Count the number of tokens in the given text.

Parameters:

text (str) – Input text to tokenize and count.

Returns:

Number of tokens in the text.

Return type:

int

Raises:

AiliaLLMError – If the token counting operation fails.

Examples

>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.token_count("Hello, world!")
4
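
A sketch of using token_count() to check a prompt against the context size passed to open(); document_text is a hypothetical variable holding the input text, and any chat-template overhead added by generate() is not included in this count:

>>> llm.open("model.gguf", n_ctx=4096)
>>> prompt = "Summarize this document: " + document_text
>>> if llm.token_count(prompt) > 4096:
...     print("Prompt exceeds the configured context window.")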