I'm using llama.cpp and LlamaIndex to run a query-engine chatbot. The text it needs to query over is very short, yet each generation takes about 2 minutes. I'm running a 7B-parameter GGUF model locally. Is this normal? #LlamaIndex #chatbots #help
I tried this again on a bigger GPU with all the layers offloaded to it, and it was spiffy. Solved.
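For anyone hitting the same slowdown: a minimal sketch of what "offload all the layers" looks like, assuming the `llama-index-llms-llama-cpp` integration, a GPU-enabled build of `llama-cpp-python`, and a hypothetical local GGUF path. The key is `n_gpu_layers=-1`; the default of 0 runs everything on the CPU, which explains multi-minute generations even for a 7B model.

```python
# Sketch: running a GGUF model through LlamaIndex's llama.cpp wrapper
# with every layer offloaded to the GPU. Assumes llama-cpp-python was
# built with CUDA/Metal support; the model path below is hypothetical.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="./models/my-7b-model.Q4_K_M.gguf",  # hypothetical local GGUF
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    # n_gpu_layers=-1 tells llama.cpp to offload all layers to the GPU;
    # leaving it at the default (0) keeps inference fully on the CPU.
    model_kwargs={"n_gpu_layers": -1},
    verbose=True,
)

response = llm.complete("Summarize the document in one sentence.")
print(response.text)
```

With `verbose=True`, the llama.cpp load logs should report how many layers were actually offloaded, which is a quick way to confirm the GPU is being used at all.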