Graphs in llama.cpp #11039
I'll try to answer, and hopefully others can correct me if I'm wrong about anything here. The two computation graphs are for the … The worst case is the one where the prefill prompt completely fills the batch with the maximum number of tokens. Regarding the KV cache, I think this is because if a K-shift or a defragmentation is needed, the scheduler is reset and a new graph is built (…)
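Below is a toy Python sketch of the worst-case idea. It assumes the scratch memory a forward pass needs grows monotonically with the number of tokens in the batch. If the buffer is sized once for a full batch, any smaller batch (for example a single decode token) fits without reallocating. `compute_buffer_bytes` and `ToyScheduler` are hypothetical names and not llama.cpp or ggml APIs. The size formula is also a deliberate simplification.

```python
from dataclasses import dataclass


# Hypothetical toy model (not ggml): bytes of intermediate activations needed
# for a batch of n_tokens. Real graphs have many more tensors; this only
# captures that activation memory scales with the batch size.
def compute_buffer_bytes(n_tokens: int, n_embd: int, n_ff: int, bytes_per_elem: int = 4) -> int:
    return n_tokens * (n_embd + n_ff) * bytes_per_elem


@dataclass
class ToyScheduler:
    """Hypothetical scheduler that sizes its buffer from a worst-case graph."""
    n_embd: int
    n_ff: int
    reserved_bytes: int = 0
    n_reallocs: int = 0

    def reserve(self, n_tokens_max: int) -> None:
        # Size the buffer once for the worst case: every slot of the batch filled.
        self.reserved_bytes = compute_buffer_bytes(n_tokens_max, self.n_embd, self.n_ff)

    def run(self, n_tokens: int) -> None:
        needed = compute_buffer_bytes(n_tokens, self.n_embd, self.n_ff)
        if needed > self.reserved_bytes:
            # Only reached if a batch exceeds the size that was reserved.
            self.reserved_bytes = needed
            self.n_reallocs += 1


# Usage: reserve for a 512-token prefill, then run smaller and equal batches.
sched = ToyScheduler(n_embd=4096, n_ff=11008)
sched.reserve(512)
for n in (1, 37, 512):
    sched.run(n)
```

Because every batch in the usage loop is at most the reserved size, `n_reallocs` stays at zero. That is the point of reserving for the worst case up front.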
Hi,
I'm trying to debug and understand the llama.cpp codebase, and I have a couple of questions regarding graph creation.