I'm trying to run Gemma-2b on my computer with 16GB of RAM, but it fails with the error below:
2024-03-19 16:23:38.722556: I external/local_tsl/tsl/framework/bfc_allocator.cc:1112] Sum Total of in-use chunks: 4.78GiB
2024-03-19 16:23:38.722568: I external/local_tsl/tsl/framework/bfc_allocator.cc:1114] Total bytes in pool: 8589934592 memory_limit_: 16352886784 available bytes: 7762952192 curr_region_allocation_bytes_: 8589934592
2024-03-19 16:23:38.722589: I external/local_tsl/tsl/framework/bfc_allocator.cc:1119] Stats:
Limit: 16352886784
InUse: 5129633792
MaxInUse: 6157238272
NumAllocs: 509
MaxAllocSize: 1073741824
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2024-03-19 16:23:38.722623: W external/local_tsl/tsl/framework/bfc_allocator.cc:499] ************************_______________________________________*************************************
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Out of memory while trying to allocate 11685516680 bytes. [Op:__inference_generate_step_12720]
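The numbers in the log work out like this (quick arithmetic on the values above, just to show the scale of the failing request):

# Values copied from the allocator log above.
limit = 16352886784      # memory_limit_
in_use = 5129633792      # InUse
request = 11685516680    # the allocation that fails in generate_step

gib = 1024 ** 3
print(f'limit   = {limit / gib:.2f} GiB')             # ~15.23 GiB
print(f'in use  = {in_use / gib:.2f} GiB')            # ~4.78 GiB
print(f'free    = {(limit - in_use) / gib:.2f} GiB')  # ~10.45 GiB
print(f'request = {request / gib:.2f} GiB')           # ~10.88 GiB

So the single allocation made during generate is larger than what is left under the allocator's limit.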
Also, I have around 160GB of swap:
$ swapon
NAME TYPE SIZE USED PRIO
/dev/nvme0n1p7 partition 60G 0B -2
/media/.../ram file 100G 768K 0
$ sysctl vm.swappiness
vm.swappiness = 100
I have more than enough swap to cover the memory the model needs, and I'm running it on the CPU, not the GPU. The strange part is that when I run the same code on my Mac with 16GB of RAM, it uses the swap and runs smoothly. Why can't Ubuntu manage memory as efficiently as macOS here? What can I do to make Ubuntu use the swap partition to run this code? On the Mac I even ran Gemma-7b, which used about 50GB of swap, without any trouble.
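I don't know yet whether the kernel overcommit settings matter here, but for reference this is how I check them (a small sketch; it just reads the proc files directly):

# Print the kernel overcommit settings, in case they are part of the picture.
from pathlib import Path

for name in ('overcommit_memory', 'overcommit_ratio'):
    value = Path('/proc/sys/vm', name).read_text().strip()
    print(f'vm.{name} = {value}')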
Here is the code that loads the downloaded model and runs it offline to see the result:
import json

import tensorflow as tf

tf.keras.backend.set_floatx('float16')

from keras_nlp.src.models import GemmaCausalLM
from keras_nlp.src.backend import keras


def read_config(path):
    with open(path) as config_file:
        return json.load(config_file)


if __name__ == '__main__':
    gemma_path = '/opt/gemma_2b'
    config_path = f'{gemma_path}/config.json'
    tokenizer_path = f'{gemma_path}/tokenizer.json'

    config = read_config(config_path)
    tokenizer_config = read_config(tokenizer_path)

    # Look up the registered model class from the config (not used further below).
    cls = keras.saving.get_registered_object(config["registered_name"])

    # Rebuild the backbone from its serialized config and load the local weights.
    backbone = keras.saving.deserialize_keras_object(config)
    backbone.load_weights(f'{gemma_path}/model.weights.h5')

    # Rebuild the tokenizer and point it at the local vocabulary assets.
    tokenizer = keras.saving.deserialize_keras_object(tokenizer_config)
    tokenizer.load_assets(f'{gemma_path}/assets/tokenizer')

    preprocessor = GemmaCausalLM.preprocessor_cls(tokenizer=tokenizer)
    gemma_lm = GemmaCausalLM(backbone=backbone, preprocessor=preprocessor)

    # The ResourceExhaustedError above is raised inside this call.
    response = gemma_lm.generate(["Keras is a"], max_length=10)
    print(response)
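To compare how the two machines behave while generate() runs, I log RAM and swap usage in the background with something like this (a rough sketch; it assumes the extra psutil package, which the script above doesn't need):

# Background logger for RAM/swap usage; start it right before calling generate().
# Assumes psutil is installed (pip install psutil) -- not required by the script above.
import threading
import time

import psutil

def log_memory(interval=5):
    gib = 1024 ** 3
    while True:
        ram = psutil.virtual_memory()
        swap = psutil.swap_memory()
        print(f'RAM {ram.used / gib:.1f}/{ram.total / gib:.1f} GiB, '
              f'swap {swap.used / gib:.1f}/{swap.total / gib:.1f} GiB')
        time.sleep(interval)

threading.Thread(target=log_memory, daemon=True).start()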
Thanks.