I am getting a bus error when trying to initialize the "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ" model from Hugging Face:
self.model = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
    quantization="awq",
    dtype="auto",
    tensor_parallel_size=tensor_parallel_size,
)
The error I get is:
ERROR 2024-02-16T21:46:33.751635551Z *** SIGBUS received at time=1708119993 on cpu 67 ***
ERROR 2024-02-16T21:46:33.754929304Z PC: @ 0x7e9291287a37 (unknown) ncclShmOpen()
ERROR 2024-02-16T21:46:33.755156517Z @ 0x7e9456f02520 3456 (unknown)
ERROR 2024-02-16T21:46:33.756768941Z @ 0x74352d6c63636e2f (unknown) (unknown)
ERROR 2024-02-16T21:46:33.756790876Z [2024-02-16 21:46:33,756 E 1 6357] logging.cc:361: *** SIGBUS received at time=1708119993 on cpu 67 ***
ERROR 2024-02-16T21:46:33.756800651Z [2024-02-16 21:46:33,756 E 1 6357] logging.cc:361: PC: @ 0x7e9291287a37 (unknown) ncclShmOpen()
ERROR 2024-02-16T21:46:33.758085489Z [2024-02-16 21:46:33,758 E 1 6357] logging.cc:361: @ 0x7e9456f02520 3456 (unknown)
ERROR 2024-02-16T21:46:33.759702920Z [2024-02-16 21:46:33,759 E 1 6357] logging.cc:361: @ 0x74352d6c63636e2f (unknown) (unknown)
I have tried with 2/4/8 NVIDIA L4 GPUs.
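Since the crash is inside ncclShmOpen(), one thing I checked (my assumption is that NCCL is failing to map its shared-memory segment, e.g. because the container's /dev/shm is too small) is how much shared memory is actually available inside the container. A quick sketch, with the path and threshold being my own illustrative choices:

```python
import os
import shutil

def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the shared-memory mount, or None if it doesn't exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free

free = shm_free_bytes()
if free is None:
    print("no /dev/shm mount found")
else:
    # Docker's default /dev/shm is 64 MB, which is far too small for
    # multi-GPU NCCL shared-memory transport.
    print(f"/dev/shm free: {free / 1024**3:.2f} GiB")
```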
Dockerfile:
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 AS builder
...
# install deps/poetry/etc.
# install project deps and other deps I don't need locally:
RUN poetry add vllm \
    accelerate \
    deepspeed \
    auto-gptq \
    optimum \
    peft \
    transformers \
    flax==0.8.0 \
    torch==2.1.2 \
    tensorflow \
    bitsandbytes \
    autoawq
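For context on how I could run this image: since Docker defaults /dev/shm to 64 MB, raising it via --shm-size is one thing I could try (this is my guess at a remedy, not something confirmed for this crash; the image name and 16g value below are illustrative):

```shell
# --shm-size raises Docker's 64 MB /dev/shm default so NCCL's
# shared-memory transport has room for the 8-GPU case.
docker run --gpus all --shm-size=16g my-vllm-image
```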
Also, this log line might help:
Initializing an LLM engine with config: model='TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ', tokenizer='TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, seed=0)
Thanks!