You might also try https://github.com/Mozilla-Ocho/llamafile, which may have better CPU-only performance than Ollama. You do have to grab the .gguf files yourself (unless you use one of their prebuilt llamafiles, in which case the model comes bundled with the binary!), but once that's done it's really easy to use and has decent performance.
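If you do grab .gguf files yourself, a quick sanity check before pointing llamafile at one is to look at the file header. Here's a minimal Python sketch. It assumes the GGUF v2+ little-endian layout: a 4-byte `GGUF` magic, then a uint32 version, a uint64 tensor count and a uint64 metadata key/value count. The function names are just illustrative and are not part of llamafile:

```python
import struct

HEADER_SIZE = 24  # 4-byte magic + uint32 version + uint64 tensor count + uint64 KV count


def parse_gguf_header(raw: bytes) -> dict:
    """Decode the fixed-size GGUF header (assumes v2+ little-endian layout)."""
    if len(raw) < HEADER_SIZE or raw[:4] != b"GGUF":
        raise ValueError("does not look like a GGUF file")
    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 KV count
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:HEADER_SIZE])
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}


def read_gguf_header(path: str) -> dict:
    """Read only the first few bytes, so this is cheap even on multi-GB model files."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

Usage would be something like `read_gguf_header("model.gguf")` (hypothetical filename). A truncated download or an HTML error page saved as `.gguf` fails the magic check right away.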
Be careful running this on work machines – it will get flagged by CrowdStrike Falcon and probably other EDR tools. In my case the first time I tried it, I just saw “Killed” and then got a DM from SecOps within two minutes.
The irony: preventing and killing something that's actually useful, while we let CrowdCrap hum along consuming tons of memory and bottlenecking I/O so it can do its snake-oil things...
Nah, it's nothing to do with LLMs; it's just that llamafile's method is very similar to malware: basically zip up an executable, concatenate it with some other stuff, throw it in /tmp and execute it under a randomly generated, high-entropy name.
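For anyone curious what "an executable with a zip concatenated onto it" looks like in practice, here's a toy Python sketch. It is not llamafile's actual build process: the stub bytes are a placeholder rather than a real binary, and `weights.gguf` is a made-up member name. It works because zip readers locate the central directory from the end of the file, so arbitrary bytes can sit in front of the archive:

```python
import io
import zipfile


def build_polyglot(stub: bytes, files: dict) -> bytes:
    """Append a zip archive to an arbitrary byte prefix (stand-in for an executable)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return stub + buf.getvalue()


def read_member(blob: bytes, name: str) -> bytes:
    """Read one member back; zipfile tolerates the prepended stub bytes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


stub = b"\x7fELF-placeholder-not-a-real-binary"
blob = build_polyglot(stub, {"weights.gguf": b"fake model bytes"})
print(blob.startswith(stub))                # the file still "starts like" the stub
print(read_member(blob, "weights.gguf"))    # and is also a readable zip
```

Self-extracting archives have long used the same trailing-archive layout.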
(That said, after I explained it to SecOps they did tell me I would need to “consult legal” if I wanted to use a local LLM, but I’ll give them the benefit of the doubt there…)
Good callout. In my context, yes, I do want it listening for use by other machines on its subnets, and I deliberately set that option (including using the IPv6 form), but most people are probably better off binding to loopback. Thanks!
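The loopback-vs-everything distinction comes down to which address the server's listening socket binds to. Here's a minimal Python sketch with plain sockets. This is not llamafile's own option handling, and `open_listener` is a hypothetical helper:

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Bind a TCP listener; port 0 asks the OS for any free ephemeral port."""
    # Pick the address family from the literal: "::" / "::1" are IPv6, else IPv4.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


# Loopback only: reachable from this machine, not from the rest of the subnet.
local = open_listener("127.0.0.1")
print(local.getsockname())
local.close()
```

Passing `"::"` instead would create an IPv6 wildcard listener reachable from other hosts; on many Linux systems it also accepts IPv4 connections unless `IPV6_V6ONLY` is set.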
For reference, this is how I run it:
But you can also just run that ExecStart command directly, and it works.