Refactor async engine & turbomind IO #2968

Status: Open · wants to merge 40 commits into main

Conversation

@lzhangzz (Collaborator) commented on Dec 27, 2024

TODO

  • output logits
  • output logprobs
  • output hidden states
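
For context, the three pending outputs above already have placeholders on the Response object (logprobs, logits, last_hidden_state, as seen in the reproduction later in this thread). A rough sketch of how they might be requested once implemented; the GenerationConfig field names used here (logprobs, output_logits, output_last_hidden_state) are assumptions, not a confirmed API:

    from lmdeploy import pipeline
    from lmdeploy.messages import GenerationConfig

    # Hypothetical usage sketch; the field names below are assumptions, not the final API.
    pipe = pipeline('internlm2_5-7b-chat')  # model used elsewhere in this thread
    gen_config = GenerationConfig(
        max_new_tokens=32,
        logprobs=5,                              # assumed: top-5 logprobs per generated token
        output_logits='generation',              # assumed field
        output_last_hidden_state='generation')   # assumed field
    resp = pipe(['Hello'], gen_config=gen_config)
    print(resp[0].logprobs, resp[0].logits, resp[0].last_hidden_state)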

@zhyncs (Collaborator) commented on Dec 27, 2024

It’s amazing!

@lvhan028 (Collaborator) commented:
FINALLY!!

prompts = [prompt for prompt, _, _ in requests]
gen_configs = [
    GenerationConfig(temperature=temperature,
                     top_p=top_p,
                     top_k=top_k,
                     ignore_eos=True,
                     do_sample=True,

Review comment (Collaborator): why do_sample True?

                     sequence_start=True,
                     sequence_end=True,
                     stream_output=stream_output)
try:

Review comment (Collaborator): I am not sure if it works for pytorch engine
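
For context on the do_sample question: in lmdeploy's GenerationConfig, do_sample=False means greedy decoding and the sampling parameters are ignored, while do_sample=True enables random sampling so temperature/top_p/top_k take effect; the benchmark presumably sets it so the sampling path is actually exercised. A minimal sketch of the two configurations:

    from lmdeploy.messages import GenerationConfig

    # greedy decoding: sampling parameters are effectively ignored
    greedy_cfg = GenerationConfig(max_new_tokens=64, do_sample=False)

    # random sampling: temperature / top_p / top_k take effect
    sampling_cfg = GenerationConfig(max_new_tokens=64,
                                    temperature=0.8,
                                    top_p=0.95,
                                    top_k=50,
                                    do_sample=True)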

@lvhan028 (Collaborator) commented on Jan 6, 2025

stop_words should be trimmed
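
To illustrate what trimming means here: a matched stop word should be stripped from the tail of the returned text instead of being echoed back to the caller. A minimal, illustrative sketch, not lmdeploy's actual implementation:

    def trim_stop_words(text: str, stop_words: list) -> str:
        """Strip a trailing stop word from generated text, if present."""
        for sw in stop_words or []:
            if sw and text.endswith(sw):
                return text[:-len(sw)]
        return text

    # e.g. trim_stop_words('Hello!<|im_end|>', ['<|im_end|>']) -> 'Hello!'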

@lvhan028 (Collaborator) commented on Jan 6, 2025

When the backend is the PyTorch engine, the pipeline cannot be destroyed successfully.

from lmdeploy import pipeline, PytorchEngineConfig

model_path = 'internlm2_5-7b-chat'
engine_config = PytorchEngineConfig()
pipe = pipeline(model_path, backend_config=engine_config, log_level='INFO')
response = pipe('hi')

@lvhan028 (Collaborator) commented on Jan 7, 2025

I will fix get_ppl and CSV saving in profile_throughput in another PR.

@lvhan028 removed the WIP label on Jan 9, 2025
@zhulinJulia24 (Collaborator) commented:

@lzhangzz The number of output tokens doubles when ignore_eos is true.

from lmdeploy import pipeline
from lmdeploy.messages import (GenerationConfig, PytorchEngineConfig,
                               TurbomindEngineConfig)

# pipeline construction was not shown in the original comment;
# model_path is a placeholder for the chat model being tested
pipe = pipeline(model_path)

gen_config = GenerationConfig(ignore_eos=True, max_new_tokens=10)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'], gen_config=gen_config)
print(response)

I suppose the generate_token_len should be 10, but it is actually 20. The output is:

[Response(text='你好!我是书生·浦语,由你好!我是书生·浦语,由', generate_token_len=20, input_token_len=108, finish_reason='length', token_ids=[77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620, 77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620], logprobs=None, logits=None, last_hidden_state=None, index=0), Response(text='上海,作为中国最大的城市之一,不仅是中国上海,作为中国最大的城市之一,不仅是中国', generate_token_len=20, input_token_len=105, finish_reason='length', token_ids=[68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543, 68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543], logprobs=None, logits=None, last_hidden_state=None, index=1)]
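
A small check that captures the expected behavior, assuming the same pipe and gen_config as in the snippet above: generate_token_len should not exceed max_new_tokens, and the token_ids should not be a back-to-back duplicate.

    for r in response:
        # with max_new_tokens=10, at most 10 tokens should be generated
        assert r.generate_token_len <= gen_config.max_new_tokens, r.generate_token_len
        # the doubled output above repeats the same token_ids twice in a row
        half = len(r.token_ids) // 2
        assert r.token_ids[:half] != r.token_ids[half:], 'output appears duplicated'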
