Refactor async engine & turbomind IO #2968

Status: Open · wants to merge 40 commits into main

Conversation

@lzhangzz (Collaborator) commented on Dec 27, 2024

TODO

  • output logits
  • output logprobs
  • output hidden states
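
For context, the three pending outputs above already have placeholders on the Response object (logprobs, logits, last_hidden_state, as seen in the reproduction later in this thread). A rough sketch of how they might be requested once implemented; the GenerationConfig field names used here (logprobs, output_logits, output_last_hidden_state) are assumptions, not a confirmed API:

    from lmdeploy import pipeline
    from lmdeploy.messages import GenerationConfig

    # Hypothetical usage sketch; the field names below are assumptions, not the final API.
    pipe = pipeline('internlm2_5-7b-chat')  # model used elsewhere in this thread
    gen_config = GenerationConfig(
        max_new_tokens=32,
        logprobs=5,                              # assumed: top-5 logprobs per generated token
        output_logits='generation',              # assumed field
        output_last_hidden_state='generation')   # assumed field
    resp = pipe(['Hello'], gen_config=gen_config)
    print(resp[0].logprobs, resp[0].logits, resp[0].last_hidden_state)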

@zhyncs (Collaborator) commented on Dec 27, 2024

It’s amazing!

@lvhan028 (Collaborator) commented:
FINALLY!!

prompts = [prompt for prompt, _, _ in requests]
gen_configs = [
    GenerationConfig(temperature=temperature,
                     top_p=top_p,
                     top_k=top_k,
                     ignore_eos=True,
                     do_sample=True,

Review comment (Collaborator): why do_sample True?

                     sequence_start=True,
                     sequence_end=True,
                     stream_output=stream_output)
try:

Review comment (Collaborator): I am not sure if it works for pytorch engine
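
For context on the do_sample question: in lmdeploy's GenerationConfig, do_sample=False means greedy decoding and the sampling parameters are ignored, while do_sample=True enables random sampling so temperature/top_p/top_k take effect; the benchmark presumably sets it so the sampling path is actually exercised. A minimal sketch of the two configurations:

    from lmdeploy.messages import GenerationConfig

    # greedy decoding: sampling parameters are effectively ignored
    greedy_cfg = GenerationConfig(max_new_tokens=64, do_sample=False)

    # random sampling: temperature / top_p / top_k take effect
    sampling_cfg = GenerationConfig(max_new_tokens=64,
                                    temperature=0.8,
                                    top_p=0.95,
                                    top_k=50,
                                    do_sample=True)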

@lvhan028 (Collaborator) commented on Jan 6, 2025

stop_words should be trimmed
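
To illustrate what trimming means here: a matched stop word should be stripped from the tail of the returned text instead of being echoed back to the caller. A minimal, illustrative sketch, not lmdeploy's actual implementation:

    def trim_stop_words(text: str, stop_words: list) -> str:
        """Strip a trailing stop word from generated text, if present."""
        for sw in stop_words or []:
            if sw and text.endswith(sw):
                return text[:-len(sw)]
        return text

    # e.g. trim_stop_words('Hello!<|im_end|>', ['<|im_end|>']) -> 'Hello!'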

@lvhan028 (Collaborator) commented on Jan 6, 2025

When the backend is the PyTorch engine, the pipeline cannot be destroyed successfully.

from lmdeploy import pipeline, PytorchEngineConfig

model_path = 'internlm2_5-7b-chat'
engine_config = PytorchEngineConfig()
pipe = pipeline(model_path, backend_config=engine_config, log_level='INFO')
response = pipe('hi')

@lvhan028 (Collaborator) commented on Jan 7, 2025

I will fix get_ppl and CSV saving in profile_throughput in another PR.

@lvhan028 removed the WIP label on Jan 9, 2025
@zhulinJulia24 (Collaborator) commented:

@lzhangzz The number of output tokens doubles when ignore_eos is true.

from lmdeploy import pipeline
from lmdeploy.messages import (GenerationConfig, PytorchEngineConfig,
                               TurbomindEngineConfig)

# pipeline construction was not shown in the original comment;
# model_path is a placeholder for the chat model being tested
pipe = pipeline(model_path)

gen_config = GenerationConfig(ignore_eos=True, max_new_tokens=10)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'], gen_config=gen_config)
print(response)

I suppose the generate_token_len should be 10, but it is actually 20. The output is:

[Response(text='你好!我是书生·浦语,由你好!我是书生·浦语,由', generate_token_len=20, input_token_len=108, finish_reason='length', token_ids=[77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620, 77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620], logprobs=None, logits=None, last_hidden_state=None, index=0), Response(text='上海,作为中国最大的城市之一,不仅是中国上海,作为中国最大的城市之一,不仅是中国', generate_token_len=20, input_token_len=105, finish_reason='length', token_ids=[68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543, 68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543], logprobs=None, logits=None, last_hidden_state=None, index=1)]
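
A small check that captures the expected behavior, assuming the same pipe and gen_config as in the snippet above: generate_token_len should not exceed max_new_tokens, and the token_ids should not be a back-to-back duplicate.

    for r in response:
        # with max_new_tokens=10, at most 10 tokens should be generated
        assert r.generate_token_len <= gen_config.max_new_tokens, r.generate_token_len
        # the doubled output above repeats the same token_ids twice in a row
        half = len(r.token_ids) // 2
        assert r.token_ids[:half] != r.token_ids[half:], 'output appears duplicated'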
