Inconsistent completions for identical prompts and parameters with llama-cpp-python and ctransformers


I've been comparing various LangChain-compatible Llama 2 runtimes, using a LangChain LLM chain with the following parameter overrides:

# llama.cpp:
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",

    n_ctx = 2048,
    max_tokens = 2048,
    temperature = 0.85,
    top_k = 40,
    top_p = 0.95,
    repeat_penalty = 1.1,
    seed = 112358,

# ctransformers:
    model="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",

    config={
        "context_length": 2048,
        "max_new_tokens": 2048,
        "temperature": 0.85,
        "top_k": 40,
        "top_p": 0.95,
        "repetition_penalty" :1.1,
        "seed" : 112358
    },
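For reference, the overrides above are passed to the LangChain wrappers roughly like this (a minimal sketch, assuming the LlamaCpp and CTransformers classes from langchain.llms; the variable names are placeholders, the parameter values are the ones listed above):

from langchain.llms import LlamaCpp, CTransformers

# llama.cpp runtime via llama-cpp-python (sketch)
llama_llm = LlamaCpp(
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    n_ctx=2048,
    max_tokens=2048,
    temperature=0.85,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
    seed=112358,
)

# same model file through the ctransformers runtime (sketch)
ctransformers_llm = CTransformers(
    model="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    config={
        "context_length": 2048,
        "max_new_tokens": 2048,
        "temperature": 0.85,
        "top_k": 40,
        "top_p": 0.95,
        "repetition_penalty": 1.1,
        "seed": 112358,
    },
)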

The model is derived from the original codellama-7b-instruct, using the conversion and quantization methods suggested for llama.cpp.

The system and user prompts are identical in both cases, and the prompt template is the one from the CodeLlama paper:

template = """<s>[INST] <<SYS>>
{system}
<</SYS>>

{user} [/INST]"""

system = """You are very helpful coding assistant who can write complete and correct programs in various programming languages, expecially in java and scala."""

The ctransformers-based completion is adequate, but the llama.cpp completion is qualitatively bad: often incomplete, repetitive, and sometimes stuck in a repetition loop.

Apart from these overrides, I have verified that, as far as I can tell, the defaults are the same for both implementations.

What else can I check to make llama.cpp behave the same, since llama.cpp is the runtime I'm more interested in using?
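
One thing I still intend to try is bypassing LangChain entirely and calling llama-cpp-python directly with the same model file and sampling settings, to see whether the degradation comes from the wrapper or from the runtime itself. A sketch of what I have in mind (not yet run; the user prompt is a placeholder):

from llama_cpp import Llama

# direct llama-cpp-python call, same model file and sampling settings
llm = Llama(
    model_path="../llama.cpp/models/generated/codellama-instruct-7b.ggufv3.Q5_K_M.bin",
    n_ctx=2048,
    seed=112358,
)

full_prompt = template.format(system=system, user="Write a Scala function that reverses a linked list.")
out = llm(
    full_prompt,
    max_tokens=2048,
    temperature=0.85,
    top_k=40,
    top_p=0.95,
    repeat_penalty=1.1,
)
print(out["choices"][0]["text"])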
