ailia_llm package¶
Classes¶
- class ailia_llm.AiliaLLM¶
Bases: object
Main class for ailia LLM inference.
Provides text generation using large language models via the ailia LLM native backend. Supports both text-only and multimodal (vision/audio) generation.
Examples
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> for delta in llm.generate([{"role": "user", "content": "Hello!"}]):
...     print(delta, end="", flush=True)
- __init__()¶
Create an ailia LLM instance.
Allocates a native LLM context. Call open() to load a model before generating text.
- Raises:
AiliaLLMError – If the native context creation fails.
- context_full()¶
Check whether the context window is full.
- Returns:
True if the context window was exhausted during the last generate() or generate_multimodal() call.
- Return type:
bool
Examples
>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> if llm.context_full():
...     print("Warning: context window was full.")
- generate(prompts, top_k=40, top_p=0.9, temp=0.4, dist=1234)¶
Generate text from chat-style prompts.
This is a unified generator function that automatically detects whether the prompts contain media_data and routes to the appropriate internal API. It yields text fragments (delta tokens) as they are produced by the model.
- Parameters:
prompts (list[dict]) –
- A list of message dictionaries, each containing:
  - "role" (str): The role (e.g., “system”, “user”, “assistant”).
  - "content" (str): The text content of the message.
  - "media_data" (list[dict], optional): A list of media entries, each containing:
    - "media_type" (str): Type of media (e.g., “image”).
    - "file_path" (str, optional): Path to the media file.
    - "data" (bytes, optional): Raw media data.
    - "width" (int, optional): Media width in pixels.
    - "height" (int, optional): Media height in pixels.
top_k (int, optional, default=40) – Top-k sampling parameter. Limits the token candidates to the top-k most probable tokens.
top_p (float, optional, default=0.9) – Top-p (nucleus) sampling parameter. Limits the token candidates to those within the cumulative probability p.
temp (float, optional, default=0.4) – Sampling temperature. Higher values produce more random output.
dist (int, optional, default=1234) – Random seed for reproducible generation.
- Yields:
str – Delta text fragments as they are generated by the model. Concatenating all yielded strings produces the full response.
- Raises:
AiliaLLMError – If prompt setting or generation fails.
RuntimeError – If media_data is provided but multimodal projector is not loaded.
Notes
If the context window becomes full during generation, the generator stops early. Use context_full() to check whether the context was exhausted.
For multimodal generation with media_data, you must first call open_multimodal_projector() to load the projector file.
Examples
Text-only generation:
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> messages = [
...     {"role": "system", "content": "You are a helpful assistant."},
...     {"role": "user", "content": "What is Python?"}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)
Multimodal generation with images:
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> messages = [
...     {"role": "user", "content": "Describe this image: <__media__>",
...      "media_data": [{"media_type": "image", "file_path": "photo.jpg"}]}
... ]
>>> for delta in llm.generate(messages):
...     print(delta, end="", flush=True)
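As noted under Yields above, concatenating all the delta fragments reproduces the full response. A minimal sketch of that pattern, using a hypothetical `collect_response` helper (not part of ailia_llm):

```python
# Hypothetical helper (not part of ailia_llm): join the streamed delta
# fragments yielded by generate() into the complete response text.
def collect_response(deltas):
    """Concatenate an iterable of delta strings into one response."""
    return "".join(deltas)

# With a loaded model, usage might look like:
# import ailia_llm
# llm = ailia_llm.AiliaLLM()
# llm.open("model.gguf")
# text = collect_response(llm.generate([{"role": "user", "content": "Hello!"}]))
```

Because generate() is a generator, collecting this way trades streaming output for having the whole string available at once.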
- generated_token_count()¶
Get the token count of the most recently generated output.
- Returns:
Number of tokens generated in the last generate() or generate_multimodal() call.
- Return type:
int
- Raises:
AiliaLLMError – If the token count retrieval fails.
Examples
>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.generated_token_count()
128
- get_multimodal_capabilities()¶
Query the multimodal capabilities of the loaded model.
- Returns:
- A dictionary with the following keys:
  - "vision" (bool): Whether the model supports image input.
  - "audio" (bool): Whether the model supports audio input.
- Return type:
dict
- Raises:
AiliaLLMError – If the capability query fails.
Examples
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
>>> caps = llm.get_multimodal_capabilities()
>>> caps["vision"]
True
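One way to use the capability dictionary is to validate a prompt's media entries before calling generate(). The following is a sketch with a hypothetical `media_supported` helper (not part of ailia_llm); the mapping from "media_type" values to capability keys is an assumption based on the types documented above:

```python
# Assumed mapping from media_data "media_type" values to capability keys.
MEDIA_TYPE_TO_CAPABILITY = {"image": "vision", "audio": "audio"}

def media_supported(messages, caps):
    """Return True if every media entry's type is enabled in `caps`.

    `caps` is the dict returned by get_multimodal_capabilities().
    Messages without media_data always pass.
    """
    for message in messages:
        for media in message.get("media_data", []):
            cap = MEDIA_TYPE_TO_CAPABILITY.get(media.get("media_type"))
            if cap is None or not caps.get(cap, False):
                return False
    return True
```

Checking up front avoids the RuntimeError path documented for generate() when unsupported media is supplied.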
- open(model_path, n_ctx=0)¶
Open and load a language model file.
- Parameters:
model_path (str) – Path to the GGUF model file.
n_ctx (int, optional, default=0) – Context window size (number of tokens). When 0, the default context size of the model is used.
- Raises:
AiliaLLMError – If the model file cannot be opened or loaded.
Examples
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("path/to/model.gguf")
>>> llm.open("path/to/model.gguf", n_ctx=4096)
- open_multimodal_projector(mmproj_path)¶
Open a multimodal projector file for vision or audio support.
Must be called after open() to enable multimodal capabilities such as image or audio understanding.
- Parameters:
mmproj_path (str) – Path to the multimodal projector file (e.g., mmproj-model.gguf).
- Raises:
AiliaLLMError – If the projector file cannot be opened or loaded.
Examples
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.open_multimodal_projector("mmproj-model.gguf")
- prompt_token_count()¶
Get the token count of the most recently set prompt.
- Returns:
Number of tokens in the prompt that was last passed to generate() or generate_multimodal().
- Return type:
int
- Raises:
AiliaLLMError – If the token count retrieval fails.
Examples
>>> for delta in llm.generate(messages):
...     print(delta, end="")
>>> llm.prompt_token_count()
42
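Together with generated_token_count(), this makes it possible to estimate how much of the context window the last call consumed. A sketch with a hypothetical `context_usage` helper (not part of ailia_llm):

```python
# Hypothetical helper (not part of ailia_llm): estimate what fraction of an
# n_ctx-token window the last call used, from the two token-count queries.
def context_usage(prompt_tokens, generated_tokens, n_ctx):
    """Fraction of an n_ctx-token window consumed by the last call."""
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    return (prompt_tokens + generated_tokens) / n_ctx

# After a generate() call, usage might look like:
# usage = context_usage(llm.prompt_token_count(),
#                       llm.generated_token_count(), 4096)
```

A ratio approaching 1.0 suggests the next call is likely to trigger the early-stop behavior reported by context_full().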
- token_count(text)¶
Count the number of tokens in the given text.
- Parameters:
text (str) – Input text to tokenize and count.
- Returns:
Number of tokens in the text.
- Return type:
int
- Raises:
AiliaLLMError – If the token counting operation fails.
Examples
>>> import ailia_llm
>>> llm = ailia_llm.AiliaLLM()
>>> llm.open("model.gguf")
>>> llm.token_count("Hello, world!")
4
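token_count() can be used before generation to check whether a message list is likely to fit in the context window. The sketch below uses a hypothetical `fits_in_context` helper (not part of ailia_llm); it takes any counting function with the same signature as token_count, and the `reserve` margin for generated output is an arbitrary assumption:

```python
# Hypothetical helper (not part of ailia_llm): approximate pre-flight check
# that the prompt plus a reserve for output fits within n_ctx tokens.
def fits_in_context(messages, count_fn, n_ctx, reserve=256):
    """True if total content tokens plus `reserve` fit within n_ctx.

    Note: chat-template framing tokens are not counted here, so treat
    the result as an approximation, not an exact bound.
    """
    used = sum(count_fn(m["content"]) for m in messages)
    return used + reserve <= n_ctx
```

With a live model, `count_fn` would be `llm.token_count`; any callable mapping a string to an int works for testing.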