Fine-tuning Llama 2 7B with Instruction-Tuning Data

By: Kevin Vegda
Posted: April 14, 2024

I recently carried out instruction-tuning on Llama 2 7B with parameter-efficient fine-tuning (PEFT) methods like QLoRA. Here is what I learned.

Fine-tuning LLMs is not an easy task, especially considering the hardware you need to do it. Most models above the 3-billion-parameter mark won't even fit on consumer hardware and require dedicated AI GPUs like the NVIDIA A100 (80 GB). That is what I used to fine-tune the Llama 2 7B model, which is openly available on Hugging Face, with instruction-tuning data generated by GPT-4.

The result is Llama 2 (7B) fine-tuned on 52k instruction-tuning examples produced with GPT-4. The data is from here. The authors used GPT-4 to generate the 52k examples because the original data used for the Alpaca model was generated with GPT-3.5.
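
For reference, here is a minimal sketch of loading the data with the datasets library. The Hub ID below is an assumption: the GPT-4 Alpaca data is mirrored in several places on the Hub, so substitute whichever copy you are using.

```python
from datasets import load_dataset

# Assumed Hub ID for a mirror of the GPT-4 Alpaca data;
# substitute the copy you are actually using.
data = load_dataset("vicgalle/alpaca-gpt4", split="train")

print(len(data))       # ~52k examples
print(data[0].keys())  # typically instruction / input / output fields
```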

Training the last few layers

The first thing I did was train the last 8 (of 32) layers of the model while keeping the rest frozen. This is a common way to fine-tune a model for particular tasks (instruction-following, question-answering, summarization, etc.). It can be fairly parameter-heavy, though: even on the A100 it took ~1.5 hours of training. A rough sketch of the layer-freezing setup follows.
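
Here is a minimal sketch of freezing everything except the last 8 decoder layers, assuming the Hugging Face transformers API; the checkpoint name and dtype are assumptions, and the gated Llama 2 weights require access approval.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoint; requires gated access on the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
)

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last 8 of the 32 decoder layers, plus the final
# norm and LM head so those weights are updated as well.
for layer in model.model.layers[-8:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in model.model.norm.parameters():
    param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Even with only a quarter of the layers unfrozen, that is still well over a billion trainable parameters, which is why this approach is slow and memory-hungry compared to the PEFT methods below.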

PEFT

I followed this with training using the PEFT methods available in Hugging Face's peft library, which integrates with Transformers and supports LoRA as well as QLoRA.
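
As a sketch, here is how a QLoRA setup typically looks with peft and bitsandbytes: the base model is loaded in 4-bit NF4 quantization and small LoRA adapters are trained on top. The hyperparameters and target modules below are illustrative assumptions, not the exact values from my run.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 to cut memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed checkpoint, as above
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections.
# r, alpha, dropout, and target_modules are illustrative choices.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With a setup like this, print_trainable_parameters() typically reports well under 1% of the 7B weights as trainable, which is what makes the run fit comfortably on a single A100.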

The model can be found on Hugging Face.

The GPT-4-based eval can be seen here.