I tried gemma-2b-it-gpu-int4 and gemma-2b-it-cpu-int4 on my phone. I'd like to test a gemma-7b-it-gpu-int4 as well: the 2b models were extremely snappy, and since MLC-LLM could handle a 7b Llama 2 on this phone, I assume Gemma 7b will fit too.
https://developers.google.com/mediapipe/solutions/genai/llm_inference#models offers four 2b Gemma variants out of the box, downloadable from Kaggle:
- gemma-2b-it-cpu-int4: Gemma 4-bit model with CPU compatibility.
- gemma-2b-it-cpu-int8: Gemma 8-bit model with CPU compatibility.
- gemma-2b-it-gpu-int4: Gemma 4-bit model with GPU compatibility.
- gemma-2b-it-gpu-int8: Gemma 8-bit model with GPU compatibility.
https://developers.google.com/mediapipe/solutions/genai/llm_inference#convert-model documents a converter, but its model_type parameter only accepts {"PHI_2", "FALCON_RW_1B", "STABLELM_4E1T_3B", "GEMMA_2B"}. There is no 7b option.
What do I do for 7b Gemma? It looks like I could start from the PyTorch Gemma 7b checkpoint, but then what should the converter parameters be, especially model_type, so that I end up with a GPU or CPU model with 4-bit quantized weights and 16-bit floating-point precision?
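For reference, this is roughly what the documented conversion call looks like for the supported 2b case (a sketch based on the convert-model section linked above; the paths, backend choice, and checkpoint format are my assumptions, not something I have verified for 7b):

```python
from mediapipe.tasks.python.genai import converter

# Sketch of the documented converter invocation for the 2b checkpoint.
# Paths are placeholders; model_type currently has no GEMMA_7B value,
# which is exactly the gap I'm asking about.
config = converter.ConversionConfig(
    input_ckpt="/path/to/gemma-2b-it/",        # directory with the downloaded checkpoint (assumed safetensors)
    ckpt_format="safetensors",
    model_type="GEMMA_2B",                     # only the 2b variant is listed
    backend="gpu",                             # or "cpu"
    output_dir="/tmp/gemma-2b-it-intermediate/",
    combine_file_only=False,
    vocab_model_file="/path/to/gemma-2b-it/",  # directory containing the tokenizer files
    output_tflite_file="/tmp/gemma-2b-it-gpu-int4.bin",
)
converter.convert_checkpoint(config)
```

Is there an equivalent set of parameters (or a different model_type) that would make this work for the 7b checkpoint?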