sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

Anand Philip

I'm using llama.cpp and LlamaIndex to run a query-engine chatbot. The text it needs to query is very short, yet each generation takes about 2 minutes. I'm running a 7B-parameter GGUF model locally. Is this normal?

I tried this with a bigger GPU and all the layers offloaded to the GPU, and it was spiffy. Solved.
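
For anyone hitting the same slowdown, here is a minimal sketch of what "all the layers offloaded to the GPU" might look like. It assumes the llama-cpp-python bindings (the post does not say which bindings were used) built with GPU support, where `n_gpu_layers=-1` requests offloading every layer. The helper names and the model path are hypothetical.

```python
from typing import Any, Dict


def gpu_offload_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Build keyword arguments for llama_cpp.Llama with GPU offloading.

    Assumption: llama-cpp-python is the binding in use; -1 asks it to
    offload all layers, while a positive number offloads only that many.
    """
    return {
        "model_path": model_path,      # path to a local GGUF file (hypothetical)
        "n_gpu_layers": n_gpu_layers,  # -1 = offload every layer to the GPU
        "verbose": True,               # keep the load log visible to check offloading
    }


def load_model(model_path: str):
    """Load the model with every layer offloaded (needs a GPU-enabled build)."""
    # Imported lazily so gpu_offload_kwargs works without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama(**gpu_offload_kwargs(model_path))
```

Calling `load_model("models/example-7b.gguf")` would then load the model with full offload, provided it fits in GPU memory. If it does not fit, passing a smaller positive `n_gpu_layers` keeps the remaining layers on the CPU, which is slower but still usually faster than no offload at all.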