
[WIP] Phi3poc #2301

Open · wants to merge 11 commits into base: master
Conversation

JessicaXYWang
Contributor

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Briefly describe the changes included in this Pull Request.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following the steps below.
  1. Find the corresponding markdown file for your new feature in the website/docs/documentation folder.
    Make sure you choose the correct class (estimator/transformer) and namespace.
  2. Follow the pattern in the markdown file and add another section for your new API, including pyspark, scala (and potentially .NET) samples.
  3. Make sure the DocTable points to the correct API link.
  4. Navigate to the website folder and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
  6. Make sure the WebsiteSamplesTests job passes in the pipeline.

@JessicaXYWang
Contributor Author

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.48%. Comparing base (17e06b2) to head (6efa59c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2301   +/-   ##
=======================================
  Coverage   84.48%   84.48%           
=======================================
  Files         328      328           
  Lines       16799    16799           
  Branches     1498     1498           
=======================================
+ Hits        14192    14193    +1     
+ Misses       2607     2606    -1     



import shutil
import sys

class Peekable:
Collaborator
nit: _PeekableIterator

return self._cache[:n]


class ModelParam:
Collaborator
nit: _ModelParam

return self.param


class ModelConfig:
Collaborator
nit: _ModelConfig

self.config.update(kwargs)


def camel_to_snake(text):
Collaborator
there might already be one in the library to use
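For reference, a minimal regex-based version of such a helper; this is a common idiom rather than necessarily what the PR implements, and it does not special-case acronym runs:

```python
import re

def camel_to_snake(text):
    # Insert an underscore before each capital letter (except at the
    # start of the string), then lowercase the whole thing.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", text).lower()
```

e.g. `camel_to_snake("modelParam")` yields `"model_param"`.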

"output column",
typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
Collaborator
Maybe explain difference between model params and other params (you can just link to other docs if easier)

typeConverter=TypeConverters.toString,
)
modelParam = Param(Params._dummy(), "modelParam", "Model Parameters")
modelConfig = Param(Params._dummy(), "modelConfig", "Model configuration")
Collaborator
Maybe explain difference between model config and other params (you can just link to other docs if easier)
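One way the docs could frame the distinction the reviewer asks for. The split below is an illustrative assumption, not the PR's actual defaults: `modelParam` would hold generation-time arguments forwarded to `model.generate()`, while `modelConfig` would hold load-time arguments forwarded to `from_pretrained()`:

```python
# Illustrative split (values are assumptions, not the PR's defaults):
# modelParam  -> generation-time kwargs, passed to model.generate()
# modelConfig -> load-time kwargs, passed to AutoModel*.from_pretrained()
model_param = {"max_new_tokens": 256, "do_sample": False}
model_config = {"trust_remote_code": True, "low_cpu_mem_usage": True}
```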

useFabricLakehouse = Param(
Params._dummy(),
"useFabricLakehouse",
"Use FabricLakehouse",
Collaborator
@mhamilton723 mhamilton723 Jan 2, 2025

If this is for a local cache then you might be able to make the verbiage generic, like useLocalCache

deviceMap = Param(
Params._dummy(),
"deviceMap",
"device map",
Collaborator
might need to explain a bit more about this param and what it takes

torchDtype = Param(
Params._dummy(),
"torchDtype",
"torch dtype",
Collaborator
likewise here
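A sketch of the kind of documentation both comments ask for. The accepted values mirror HuggingFace's `from_pretrained` kwargs (`device_map`, `torch_dtype`); the helper name and validation here are invented for illustration:

```python
def build_load_kwargs(device_map="auto", torch_dtype="auto"):
    """Illustrative doc for the two params (helper name is hypothetical).

    device_map:  "auto", "cpu", an explicit device like "cuda:0", or a
                 per-module placement dict, as accepted by from_pretrained.
    torch_dtype: "auto", "float16", "bfloat16", or "float32".
    """
    allowed = {"auto", "float16", "bfloat16", "float32"}
    if isinstance(torch_dtype, str) and torch_dtype not in allowed:
        raise ValueError(f"unsupported torch_dtype: {torch_dtype}")
    return {"device_map": device_map, "torch_dtype": torch_dtype}
```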

def load_model(self):
"""
Loads model and tokenizer either from Fabric Lakehouse or the HuggingFace Hub,
depending on the 'useFabricLakehouse' param.
Collaborator
if you name it more generically place that name here

"Use FabricLakehouse",
typeConverter=TypeConverters.toBoolean,
)
lakehousePath = Param(
Collaborator

might be able to get rid of the earlier param and just check if this is None
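A sketch of the reviewer's suggestion, where a non-None path replaces the separate boolean flag (function and attribute names here are assumptions, not the PR's API):

```python
def resolve_local_path(lakehouse_path, model_name):
    """Treat a non-None lakehouse_path as opting in to the local cache,
    removing the need for a separate useFabricLakehouse boolean."""
    if lakehouse_path is not None:
        return f"{lakehouse_path}/{model_name}"
    return None  # fall back to downloading from the HuggingFace hub
```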

Comment on lines +163 to +166
if self.getUseFabricLakehouse():
local_path = (
self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}"
)
Collaborator
switch to just use cachePath, and then in our docs we'll say this is a good place to store things


if self.getUseFabricLakehouse():
local_path = (
self.getLakehousePath() or f"/lakehouse/default/Files/{model_name}"
Collaborator
nit: hf_cache

def __init__(
self,
base_cache_dir="./cache",
base_url="https://mmlspark.blob.core.windows.net/huggingface/",
Collaborator
let's use

%sh
azcopy cp https://mmlspark.blob.core.windows.net/huggingface/blah /lakehouse/blah

Collaborator

You can also put a little explanation of this in the markdown cell, noting it's just for a speedup; otherwise the model will download from the HuggingFace hub


def _predict_single_chat(self, prompt, model, tokenizer):
param = self.getModelParam().get_param()
chat = [{"role": "user", "content": prompt}]
Collaborator
if the prompt is a list, then assume it is already in the "chat" structure


def _predict_single_chat(self, prompt, model, tokenizer):
param = self.getModelParam().get_param()
chat = [{"role": "user", "content": prompt}]
Collaborator
@mhamilton723 mhamilton723 Jan 2, 2025

Suggested change
- chat = [{"role": "user", "content": prompt}]
+ if isinstance(prompt, list):
+     chat = prompt
+ else:
+     chat = [{"role": "user", "content": prompt}]
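The suggested behavior, pulled out as a runnable sketch (the standalone function name is illustrative; in the PR this logic would live inside `_predict_single_chat`):

```python
def normalize_chat(prompt):
    # A list is assumed to already be chat-structured messages
    # ({"role": ..., "content": ...} dicts); anything else is wrapped
    # as a single user turn.
    if isinstance(prompt, list):
        return prompt
    return [{"role": "user", "content": prompt}]
```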

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants