Ubuntu does not use swap as much as needed


I'm trying to run Gemma-2b on my computer with 16 GB of RAM, but it fails with the error below:

2024-03-19 16:23:38.722556: I external/local_tsl/tsl/framework/bfc_allocator.cc:1112] Sum Total of in-use chunks: 4.78GiB
2024-03-19 16:23:38.722568: I external/local_tsl/tsl/framework/bfc_allocator.cc:1114] Total bytes in pool: 8589934592 memory_limit_: 16352886784 available bytes: 7762952192 curr_region_allocation_bytes_: 8589934592
2024-03-19 16:23:38.722589: I external/local_tsl/tsl/framework/bfc_allocator.cc:1119] Stats: 
Limit:                     16352886784
InUse:                      5129633792
MaxInUse:                   6157238272
NumAllocs:                         509
MaxAllocSize:               1073741824
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2024-03-19 16:23:38.722623: W external/local_tsl/tsl/framework/bfc_allocator.cc:499] ************************_______________________________________*************************************
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Out of memory while trying to allocate 11685516680 bytes. [Op:__inference_generate_step_12720]

Also, I have around 160GB of swap.

$ swapon
NAME                     TYPE      SIZE USED PRIO
/dev/nvme0n1p7           partition  60G   0B   -2
/media/.../ram           file      100G 768K    0

$ sysctl vm.swappiness                
vm.swappiness = 100
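To double-check that the kernel really sees this swap from inside Python, I can read `/proc/meminfo` directly. This is a small stdlib-only sketch; the `parse_meminfo` helper is my own, not part of any library:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # first field is the value in kB
    return info

# Read the live values (Linux only); these should mirror `swapon` above.
try:
    with open('/proc/meminfo') as f:
        live = parse_meminfo(f.read())
    print('SwapTotal: %d kB' % live['SwapTotal'])
    print('SwapFree:  %d kB' % live['SwapFree'])
except FileNotFoundError:
    pass  # not on Linux
```

And indeed the totals match what `swapon` reports, so the kernel itself is aware of the swap.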

I have more than enough swap to cover the required memory, and I'm running the model on the CPU, not the GPU. The strange part is that when I run the same code on my Mac with 16 GB of RAM, it spills into swap and runs smoothly; I even ran Gemma-7b on the Mac with about 50 GB of swap in use without any trouble. Why can't Ubuntu manage memory here the way macOS does? What can I do to make Ubuntu use the swap partition so this code can run?
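For a rough sense of scale, here is a back-of-envelope estimate of the weight memory. The parameter count is my assumption (Gemma-2b has on the order of 2.5 billion parameters), not something from the logs:

```python
# Back-of-envelope weight-memory estimate; the parameter count is an assumption.
params = 2.5e9               # approx. parameter count for Gemma-2b (assumed)
bytes_fp32 = params * 4      # float32: 4 bytes per parameter
bytes_fp16 = params * 2      # float16: 2 bytes per parameter

print('float32 weights: %.1f GiB' % (bytes_fp32 / 2**30))
print('float16 weights: %.1f GiB' % (bytes_fp16 / 2**30))
```

So even a single float32 copy of the weights is roughly the same ballpark as the ~11.7 GB allocation that fails above, which is why I expected swap to absorb the overflow.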

Here is the code that loads the downloaded model and runs it offline to produce a result:

import json

import tensorflow as tf
from keras_nlp.src.backend import keras
from keras_nlp.src.models import GemmaCausalLM

# Use half precision to cut the memory footprint of the weights in half.
tf.keras.backend.set_floatx('float16')


def read_config(path):
    with open(path) as config_file:
        return json.load(config_file)


if __name__ == '__main__':
    gemma_path = '/opt/gemma_2b'
    config_path = f'{gemma_path}/config.json'
    tokenizer_path = f'{gemma_path}/tokenizer.json'

    config = read_config(config_path)
    tokenizer_config = read_config(tokenizer_path)

    # Rebuild the backbone from its serialized config, then load the weights.
    backbone = keras.saving.deserialize_keras_object(config)
    backbone.load_weights(f'{gemma_path}/model.weights.h5')

    # Rebuild the tokenizer and load its vocabulary assets.
    tokenizer = keras.saving.deserialize_keras_object(tokenizer_config)
    tokenizer.load_assets(f'{gemma_path}/assets/tokenizer')

    preprocessor = GemmaCausalLM.preprocessor_cls(tokenizer=tokenizer)
    gemma_lm = GemmaCausalLM(backbone=backbone, preprocessor=preprocessor)

    response = gemma_lm.generate(["Keras is a"], max_length=10)
    print(response)

Thanks.
