ailia TFLite Runtime

Getting Started

Choose your platform and run your first TFLite inference.

Install

Install the ailia TFLite Python package from PyPI.

pip3 install ailia_tflite

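If you have not installed Python or git yet, see the Python environment setup guide (Windows / Mac / Linux) first.

View on PyPI

Run a sample

ailia-models-tflite ships an inference script for each model. Clone the repository once, install the shared dependencies, then cd into a model's folder and run its script. Pass -v 0 to use a webcam as input, or -i image.png to process a still image. Running python3 launcher.py from the repository root opens a GUI for browsing the quantized TFLite models.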

git clone https://github.com/ailia-ai/ailia-models-tflite.git
cd ailia-models-tflite
pip3 install -r requirements.txt
cd object_detection/yolox
python3 yolox.py -v 0

On Windows, use python instead of python3.

Sample repository

Clone the samples

Clone the C++ sample repository and initialize its submodules. The ailia-tflite-cpp binding is included as a submodule.

git clone https://github.com/ailia-ai/ailia-models-cpp.git
cd ailia-models-cpp
git submodule init
git submodule update

On macOS only, remove the quarantine attribute from the dylib.

./xattr.sh

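Sample repository / yolox_tflite sample

Build and run

Obtain a one-month evaluation license, install CMake and OpenCV, build, and then run the YOLOX TFLite sample. yolox_tflite.sh downloads the TFLite model automatically before running inference.

# Get an evaluation license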
cd ailia_tflite
python3 download_license.py
cd ..

# macOS
brew install cmake opencv
# Linux: apt install cmake libopencv-dev
# Windows: install CMake and Visual Studio,
#   then set OpenCV_DIR to your OpenCV build path

cmake .
cmake --build .
cd object_detection/yolox_tflite
./yolox_tflite.sh    # use yolox_tflite.bat on Windows

C API reference

Install via UPM

In Unity (2021.3.10f1 or later), open Window > Package Manager, click + > Add package from git URL, and enter the binding URL below.

https://github.com/ailia-ai/ailia-tflite-unity.git

Unity API reference

Run a sample

Clone ailia-models-unity, open it in Unity Editor (2021.3.10f1 or later), and play ObjectDetection/ObjectDetectionSample.unity. The YOLOX TFLite route is delegated to AiliaTFLiteYoloxSample.cs and runs with NNAPI / mobile inference.

git clone https://github.com/ailia-ai/ailia-models-unity.git

AiliaTFLiteYoloxSample.cs

Clone the binding

To integrate it into your own project, clone the JNI binding repository and add it to your Android Studio project.

git clone https://github.com/ailia-ai/ailia-tflite-jni.git

Binding

Run a sample

Clone ailia-models-kotlin with its submodules and open it in Android Studio. Then run the TFLite object detection sample on a connected device.

git clone https://github.com/ailia-ai/ailia-models-kotlin.git
cd ailia-models-kotlin
git submodule update --init --recursive

AiliaTFLiteObjectDetectionSample.kt

Use the API in your project

A minimal sample that loads a TFLite model and runs inference. The Python API has the same shape as tflite_runtime.interpreter, so existing code works as-is once you swap the import.

import ailia_tflite
import numpy as np

interpreter = ailia_tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

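# Dummy input that matches the shape and dtype the model declares
# (float32 for float models, int8 for quantized ones)
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])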
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)
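Before feeding real data, it can help to print what each tensor expects. The helper below is a minimal sketch, not part of ailia_tflite: describe_tensors is a hypothetical name, and it assumes the dictionaries returned by get_input_details() / get_output_details() carry the index, name, shape, dtype, and quantization keys used by tflite_runtime. It runs here on a made-up stand-in list so that it works without a model file.

```python
def describe_tensors(details):
    """Return one summary line per tensor description dictionary."""
    lines = []
    for d in details:
        shape = tuple(int(n) for n in d.get("shape", ()))
        dtype = d.get("dtype")
        # NumPy scalar types and Python types both expose __name__
        dtype_name = getattr(dtype, "__name__", str(dtype))
        scale, zero_point = d.get("quantization", (0.0, 0))
        lines.append(
            f"#{d.get('index')} {d.get('name', '?')}: shape={shape} "
            f"dtype={dtype_name} scale={scale} zero_point={zero_point}"
        )
    return lines


# Stand-in for interpreter.get_input_details(); the values are made up.
example_details = [
    {"index": 0, "name": "input", "shape": [1, 224, 224, 3],
     "dtype": float, "quantization": (0.0, 0)},
]
for line in describe_tensors(example_details):
    print(line)
```

With a real interpreter, pass interpreter.get_input_details() or interpreter.get_output_details() instead of example_details.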

#include "ailia_tflite.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Load model.tflite into a buffer (error checks omitted for brevity)
FILE *f = fopen("model.tflite", "rb");
fseek(f, 0, SEEK_END);
size_t len = ftell(f);
fseek(f, 0, SEEK_SET);
void *model = malloc(len);
fread(model, 1, len, f);
fclose(f);

struct AILIATFLiteInstance *interp = NULL;
ailiaTFLiteCreate(&interp, model, len,
                  NULL, NULL, NULL, NULL,    // default allocators
                  AILIA_TFLITE_ENV_REFERENCE,
                  AILIA_TFLITE_MEMORY_MODE_DEFAULT,
                  AILIA_TFLITE_FLAG_NONE);
ailiaTFLiteAllocateTensors(interp);

// Quantize float → int8 and write into the first input tensor.
// input_float and in_count (the input tensor's element count) are supplied by the caller.
int32_t in_idx;
ailiaTFLiteGetInputTensorIndex(interp, &in_idx, 0);
float in_scale; int64_t in_zp;
ailiaTFLiteGetTensorQuantizationScale(interp, &in_scale, in_idx);
ailiaTFLiteGetTensorQuantizationZeroPoint(interp, &in_zp, in_idx);
int8_t *in_buf = NULL;
ailiaTFLiteGetTensorBuffer(interp, (void**)&in_buf, in_idx);
for (size_t i = 0; i < in_count; i++) {
    int32_t q = (int32_t)lroundf(input_float[i] / in_scale) + (int32_t)in_zp;
    in_buf[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q);
}

ailiaTFLitePredict(interp);

// Dequantize int8 → float from the first output tensor.
// output_float and out_count (the output tensor's element count) are supplied by the caller.
int32_t out_idx;
ailiaTFLiteGetOutputTensorIndex(interp, &out_idx, 0);
float out_scale; int64_t out_zp;
ailiaTFLiteGetTensorQuantizationScale(interp, &out_scale, out_idx);
ailiaTFLiteGetTensorQuantizationZeroPoint(interp, &out_zp, out_idx);
const int8_t *out_buf = NULL;
ailiaTFLiteGetTensorBuffer(interp, (void**)&out_buf, out_idx);
for (size_t i = 0; i < out_count; i++) {
    output_float[i] = (out_buf[i] - (int32_t)out_zp) * out_scale;
}

ailiaTFLiteDestroy(interp);
free(model);

using ailiaTFLite;

var interpreter = new AiliaTFLiteModel();
interpreter.OpenFile("model.tflite");
interpreter.AllocateTensors();

var input = new float[1 * 224 * 224 * 3];
interpreter.SetInputTensorData(0, input);
interpreter.Predict();

var output = interpreter.GetOutputTensorData<float>(0);

val tflite = AiliaTFLite()
tflite.open(modelData, AiliaTFLite.AILIA_TFLITE_ENV_REFERENCE)
tflite.allocateTensors()

val inputIdx = tflite.getInputTensorIndex(0)
val outputIdx = tflite.getOutputTensorIndex(0)
tflite.setTensorData(inputIdx, inputBuffer)
tflite.predict()

val output = tflite.getTensorData(outputIdx)
tflite.close()

FAQ

Frequently asked questions about ailia TFLite Runtime.

Is it really a drop-in replacement for TensorFlow Lite?

Yes. The Python ailia_tflite.Interpreter class provides the same interface as tflite_runtime.interpreter.Interpreter (the same constructor, and the same allocate_tensors() / set_tensor() / invoke() / get_tensor() methods). Existing TFLite Python scripts usually work after only swapping the import.
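One way to make the import swap tolerant of either runtime is to resolve the Interpreter class at startup. This is a sketch under assumptions: resolve_interpreter is a hypothetical helper, not part of either package, and it only relies on the two module paths named above.

```python
import importlib

# Module path and attribute name of each candidate, in order of preference.
DEFAULT_CANDIDATES = (
    ("ailia_tflite", "Interpreter"),
    ("tflite_runtime.interpreter", "Interpreter"),
)


def resolve_interpreter(candidates=DEFAULT_CANDIDATES):
    """Return the first importable Interpreter class, or None if none is installed."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this runtime is not installed, try the next one
        return getattr(module, attr)
    return None


Interpreter = resolve_interpreter()
if Interpreter is None:
    print("No TFLite-compatible runtime found")
else:
    print("Using", Interpreter.__module__)
```

The rest of a script can then call Interpreter(model_path="model.tflite") without caring which package supplied the class.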

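How is it better than the official TFLite?

In two main ways: fast inference on PCs through Intel MKL, and deployment to NonOS / RTOS embedded targets through a lightweight C99 implementation. On Android it can also drive the on-device NPU via NNAPI.

Pass --tflite to an ailia-models-tflite sample to compare against the official runtime's behavior.

Does it support quantized models?

Yes. INT8-quantized TFLite models are supported out of the box. Quantization is recommended for embedded and NPU targets. The models in ailia-models-tflite are also built mainly around quantized versions.

How do I quantize a Float tensor to Int8?

The quantization parameters of each tensor are available through ailiaTFLiteGetTensorQuantizationScale and ailiaTFLiteGetTensorQuantizationZeroPoint.

Float → Int8: q = round(f / scale) + zero_point, then clamp q to [-128, 127]

Int8 → Float: f = (q − zero_point) × scale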
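The two formulas can be sketched in plain Python as follows. The function names are illustrative, not part of any ailia API. Rounding is done half away from zero to mirror lroundf in the C sample, because Python's built-in round() rounds ties to even.

```python
import math

INT8_MIN, INT8_MAX = -128, 127


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C's lroundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_int8(values, scale, zero_point):
    """Float -> Int8: q = round(f / scale) + zero_point, clamped to the int8 range."""
    out = []
    for f in values:
        q = round_half_away(f / scale) + zero_point
        out.append(max(INT8_MIN, min(INT8_MAX, q)))
    return out


def dequantize_int8(values, scale, zero_point):
    """Int8 -> Float: f = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in values]


# Example parameters chosen only for illustration.
q = quantize_int8([1.0, 0.25, 100.0], scale=0.5, zero_point=-3)
print(q, dequantize_int8(q, scale=0.5, zero_point=-3))
```

Values that fall outside the representable range saturate at -128 or 127, so a quantize and dequantize round trip only recovers inputs within that range, up to rounding.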

By default, a TFLite tensor has a single scale and a single zero_point for the whole tensor. Weights (such as Conv weights) instead have one scale per channel (per-axis quantization), and their zero_point is fixed to 0. The channel count and the quantized axis can be checked with ailiaTFLiteGetTensorQuantizationCount and ailiaTFLiteGetTensorQuantizationQuantizedDimension.
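As an illustration of per-axis quantization, the sketch below dequantizes a small weight array whose quantized axis is axis 0, using one scale per channel and a zero_point of 0. The function name and the nested-list layout are assumptions made for the example. A real weight tensor would be indexed along whichever axis the runtime reports.

```python
def dequantize_per_axis(weights, scales):
    """Dequantize int8 weights laid out as one list per channel (axis 0).

    Each channel uses its own scale; zero_point is 0 for per-axis weights,
    so the formula reduces to f = q * scale.
    """
    if len(weights) != len(scales):
        raise ValueError("need exactly one scale per channel")
    return [[q * s for q in channel] for channel, s in zip(weights, scales)]


# Two channels with made-up values and scales.
print(dequantize_per_axis([[2, -4], [10, 0]], [0.5, 0.25]))
```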

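For details, see the TensorFlow Lite 8-bit quantization specification.

Can I use it on microcontrollers?

The C99 core is designed for NonOS / RTOS and small-footprint deployments. Support for a specific MCU depends on the available memory and the toolchain. Contact ailia for details on embedded porting.

How do I switch backends (CPU / NPU / MKL)?

Specify env_id (and optionally flags / num_threads) when constructing ailia_tflite.Interpreter. The default is CPU, where MKL is used. To use the NPU on Android, you must specify AILIA_TFLITE_ENV_NNAPI (=1).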
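A small sketch of how the backend choice could be kept in one place. build_interpreter_kwargs is a hypothetical helper; it only uses the keyword names (env_id, num_threads) and the NNAPI value (1) stated above, and it omits env_id entirely when the default backend is wanted. The exact constructor signature should be checked against the Python API reference.

```python
AILIA_TFLITE_ENV_NNAPI = 1  # value quoted in the text above


def build_interpreter_kwargs(model_path, use_nnapi=False, num_threads=None):
    """Collect constructor keyword arguments for the chosen backend."""
    kwargs = {"model_path": model_path}
    if use_nnapi:
        kwargs["env_id"] = AILIA_TFLITE_ENV_NNAPI  # Android NPU via NNAPI
    if num_threads is not None:
        kwargs["num_threads"] = num_threads
    return kwargs


print(build_interpreter_kwargs("model.tflite"))
print(build_interpreter_kwargs("model.tflite", use_nnapi=True, num_threads=4))

# With the package installed, the result would be passed on like this:
# import ailia_tflite
# interpreter = ailia_tflite.Interpreter(**build_interpreter_kwargs("model.tflite"))
```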

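How is licensing handled?

An evaluation license is downloaded automatically at runtime and can be used for development and evaluation. For commercial distribution, including redistribution in embedded products, apply for a product license. See the ailia license terms for details.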
