
Really? You can't even fine-tune a quantized 8B model on such a machine? That's a bummer.


You can't really fine-tune quantized models as-is. Gradient updates require floating-point calculations at decent precision.
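To see why, here's a toy PyTorch sketch (the numbers are illustrative, not from any real model): a typical gradient step is much smaller than the quantization step size, so applying it directly to quantized weights just rounds away to nothing.

    import torch

    w = torch.tensor([0.5])           # original FP weight
    scale = 0.1                       # quantization step size
    q = torch.round(w / scale)        # quantized value: 5
    step = 1e-3                       # typical gradient update magnitude
    q_new = torch.round((q * scale - step) / scale)
    print(q.item(), q_new.item())     # 5.0 5.0 -- the update vanishes

This is why training keeps a floating-point copy of the weights and quantization is normally applied only for inference.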


I see... I'm a little out of my depth, but I did see that Meta has documentation on fine-tuning with LoRA and QLoRA, and at a glance it sounds like it's doable on consumer hardware. Are you familiar with those? Am I misunderstanding? https://llama.meta.com/docs/how-to-guides/fine-tuning/


Both still compute FP16 gradients; in QLoRA the base weights stay frozen in 4-bit and only the small adapter matrices are trained in floating point. It comes back to the original point: a Mac is just going to be too slow, even on an 8B model.
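For reference, a minimal QLoRA-style sketch using Hugging Face transformers, peft, and bitsandbytes (the specific model name is a stand-in, not from the Meta docs): the base weights load in 4-bit, but the LoRA adapters and their gradients stay in BF16.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # Base model loads with 4-bit quantized weights; matmuls dequantize
    # to BF16 on the fly.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B",  # stand-in for "an 8B model"
        quantization_config=bnb,
    )

    # Only the small LoRA adapter matrices get gradients, in floating point.
    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only a fraction of a percent trains

Even with the trainable parameter count this small, backprop still has to run through the full frozen base model in BF16, which is why it stays heavy on consumer hardware.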



