Too bad larger LLMs require ungodly amounts of VRAM, or self-hosting would be a good alternative. LLMs have their uses, but they’re not useful enough to put up with ads.
You can get by surprisingly well on 20B-parameter models using a Mac with decent RAM, or even 8B-parameter models that fit on most high-end (e.g. 16 GB) cards. Depends on your use case, but I almost exclusively use smaller local models.
I have an RTX 4060 low-profile in my 2U server and am limited to 8 GB of VRAM for anything I self-host at the moment. I may consider a larger chassis with a better GPU in the future.
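For anyone wondering what that looks like in practice, here's a minimal sketch using llama-cpp-python with a 4-bit quantized 8B GGUF model, which weighs in around 4-5 GB and leaves headroom for the KV cache on an 8 GB card. The model path and parameters below are placeholders, not a specific recommendation; swap in whatever quantized build you actually run.

    # Minimal sketch: run a quantized 8B model fully on an 8 GB GPU.
    # Assumes llama-cpp-python is installed with GPU support and you have
    # a GGUF file on disk (the path below is hypothetical).
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3.1-8b-instruct-Q4_K_M.gguf",  # hypothetical path
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=4096,       # context window; larger values grow the KV cache in VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Give me a one-paragraph summary of RAID levels."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

The same idea works with Ollama or LM Studio if you'd rather not manage GGUF files yourself; the point is just that a 4-bit 8B model is comfortably inside an 8 GB budget.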