Inconsistent completions for identical prompts and parameters with llama-cpp-python and ctransformers


I've been comparing various LangChain-compatible Llama 2 runtimes, using a LangChain LLM chain with the following parameter overrides:

# llama.cpp:
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",

    n_ctx = 2048,
    max_tokens = 2048,
    temperature = 0.85,
    top_k = 40,
    top_p = 0.95,
    repeat_penalty = 1.1,
    seed = 112358,

# ctransformers:
    model="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",

    config={
        "context_length": 2048,
        "max_new_tokens": 2048,
        "temperature": 0.85,
        "top_k": 40,
        "top_p": 0.95,
        "repetition_penalty" :1.1,
        "seed" : 112358
    },
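For reference, the overrides above are passed to the LangChain wrappers roughly like this (a minimal sketch, assuming the LlamaCpp and CTransformers classes from langchain.llms; the variable names are placeholders, the parameter values are the ones listed above):

from langchain.llms import LlamaCpp, CTransformers

# llama.cpp runtime via llama-cpp-python (sketch)
llama_llm = LlamaCpp(
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    n_ctx=2048,
    max_tokens=2048,
    temperature=0.85,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
    seed=112358,
)

# same model file through the ctransformers runtime (sketch)
ctransformers_llm = CTransformers(
    model="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    config={
        "context_length": 2048,
        "max_new_tokens": 2048,
        "temperature": 0.85,
        "top_k": 40,
        "top_p": 0.95,
        "repetition_penalty": 1.1,
        "seed": 112358,
    },
)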

The model is derived from the original codellama-7b-instruct, using the conversion and quantization methods suggested for llama.cpp.

The system and user prompts are identical in both cases, and the prompt template is the one from the CodeLlama paper:

template = """<s>[INST] <<SYS>>
{system}
<</SYS>>

{user} [/INST]"""

system = """You are very helpful coding assistant who can write complete and correct programs in various programming languages, expecially in java and scala."""

The ctransformers-based completion is adequate, but the llama.cpp completion is qualitatively bad: often incomplete, repetitive, and sometimes stuck in a repetition loop.

Apart from these overrides, I have verified that, as far as I can tell, the defaults are the same for both implementations.

What else can I check to make llama.cpp behave the same, since llama.cpp is the runtime I'm more interested in using?
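
One thing I still intend to try is bypassing LangChain entirely and calling llama-cpp-python directly with the same model file and sampling settings, to see whether the degradation comes from the wrapper or from the runtime itself. A sketch of what I have in mind (not yet run; the user prompt is a placeholder):

from llama_cpp import Llama

# direct llama-cpp-python call, same model file and sampling settings
llm = Llama(
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    n_ctx=2048,
    seed=112358,
)

full_prompt = template.format(system=system, user="Write a Scala function that reverses a linked list.")
out = llm(
    full_prompt,
    max_tokens=2048,
    temperature=0.85,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
)
print(out["choices"][0]["text"])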
