
add block_size arg to chat api #2986

Draft · wants to merge 3 commits into main
Conversation

JackWeiw (Contributor) commented on Jan 3, 2025

Motivation

Since the camb device backend only supports block_size=16 for the paged_attention-related ops, we need a way to control block_size.

This PR adds a cache_block_seq_len argument to the lmdeploy chat CLI.

After this PR, we can run lmdeploy chat /Shanghai_AI_Laboratory/internlm2_5-7b --backend pytorch --device camb --cache-block-seq-len 16 to chat on a camb device.
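
For reference, the same setting can also be expressed through the Python API rather than the CLI. This is a minimal sketch assuming the standard lmdeploy pipeline entry point and the device_type / block_size fields of PytorchEngineConfig; camb as a device_type is still WIP, as discussed below.

    from lmdeploy import pipeline
    from lmdeploy.messages import PytorchEngineConfig

    # block_size=16 matches the only block size supported by the camb
    # paged_attention kernels mentioned above.
    engine_config = PytorchEngineConfig(device_type='camb', block_size=16)
    pipe = pipeline('/Shanghai_AI_Laboratory/internlm2_5-7b',
                    backend_config=engine_config)
    print(pipe(['Hello, who are you?']))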

jinminxi104 marked this pull request as draft on January 6, 2025
grimoire requested a review from RunningLeon on January 7, 2025
@@ -256,6 +259,12 @@ def chat(args):
    if backend == 'pytorch':
        from lmdeploy.messages import PytorchEngineConfig
        from lmdeploy.pytorch.chat import run_chat
        block_size = 64  # default block size
        if args.device == 'camb':
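
(The hunk is cut short here by the inline review comment that follows. A plausible continuation of the camb branch, assuming the new cache_block_seq_len argument feeds the override before it is handed to PytorchEngineConfig — an illustration, not the verbatim diff:)

            # Hypothetical continuation: honor the CLI override; camb kernels
            # currently only support a block size of 16.
            block_size = args.cache_block_seq_len or 16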
Collaborator
hi, camb is not in the choices of --device:

choices: List[str] = ['cuda', 'ascend', 'maca']):

Contributor Author

Yes, camb support is WIP and will be merged as part of the dlinfer backend in the future; we can apply this PR after the main camb work is merged.

Collaborator
Got it
