Support model providers other than OpenAI and Azure #657

Open
natoverse opened this issue Jul 22, 2024 · 7 comments · May be fixed by #1575
Labels: community_support (Issue handled by community members), enhancement (New feature or request)

Comments

@natoverse
Collaborator

natoverse commented Jul 22, 2024

Right now GraphRAG only natively supports models hosted by OpenAI and Azure. Many users would like to run additional models, including alternate APIs, SLMs, or models running locally. As a research team with limited bandwidth it is unlikely we will add native support for more model providers in the near future. Our focus is on memory structures and algorithms to improve LLM information retrieval, and we've got a lot of experiments in the queue!

There are alternative options to achieve extensibility, and many GraphRAG users have had luck extending the library. So far we've seen this most commonly with Ollama, which runs on localhost and supports a very wide variety of models. This approach depends on Ollama supporting the standard OpenAI API for chat completion and embeddings so it can proxy our API calls, and it looks like this is working for a lot of folks (though it may require some hacking).
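To make the proxy idea concrete: a client that speaks the OpenAI API only needs a different base URL to reach Ollama. The sketch below uses only the Python standard library to build (not send) requests for the two endpoints mentioned above, chat completions and embeddings. It assumes Ollama's default OpenAI-compatible address `http://localhost:11434/v1`; the model names `llama3` and `nomic-embed-text` and the helper functions are hypothetical examples, not part of GraphRAG.

```python
import json
import urllib.request

# Assumed default address of Ollama's OpenAI-compatible API; adjust for your setup.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_request(base_url: str, path: str, payload: dict, api_key: str = "ollama") -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI wire format."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-style clients expect one to be set.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat_request(model: str, prompt: str, base_url: str = OLLAMA_BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: request for the /chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return build_request(base_url, "/chat/completions", payload)


def embedding_request(model: str, texts: list[str], base_url: str = OLLAMA_BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: request for the /embeddings endpoint."""
    return build_request(base_url, "/embeddings", {"model": model, "input": texts})
```

With a local server running, `urllib.request.urlopen(chat_request("llama3", "Hello"))` would send the request. Only the base URL differs from a call to OpenAI itself, which is why proxying works when the server implements both endpoints faithfully.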

Please note: while we are excited to see GraphRAG used with more models, our team will not have time to help diagnose issues. We'll do our best to route bug reports to existing conversations that might be helpful. For the most part you should expect that if you file a bug related to running an alternate solution, we'll link to this issue, a relevant conversation if we're aware of one, and then we'll close the bug.

Here is a general discussion regarding OSS LLMs: #321.

And a couple of popular Ollama-related issues: #339 and #345. We'll link to others in the comments when relevant.

Have a look at issues tagged with the community_support label as well.

@natoverse added the enhancement label (New feature or request) on Jul 22, 2024
@natoverse pinned this issue on Jul 22, 2024
@natoverse added the community_support label (Issue handled by community members) on Jul 22, 2024
@natoverse
Collaborator Author

@Mxk-1 has found chunking settings that help resolve issues with create_base_entity_graph with Ollama:

The chunk splitting in the provided settings.yaml may not be suitable for a model served with Ollama: the chunks can be either too large or too small, which leads to errors in the model's responses. The original paper mentioned using GPT-4o, while the model I deployed locally is Gemma2:9b via Ollama, and these two models differ in size and performance.

Additionally, since the pipeline relies on prompt-based Q&A over the text, the prompt itself takes up part of the model's context length. By adjusting chunk_size, I was able to run the experiment successfully. If you encounter this issue, try increasing or decreasing chunk_size. If you have a better solution, feel free to discuss it with me.
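The trade-off described above can be made concrete: the prompt template, the chunk, and the model's answer all have to fit in the model's context window. Below is a minimal sketch of token-window chunking with overlap plus a simple budget check. It assumes whitespace-separated words as a stand-in for real tokenizer tokens, and the context window, prompt size, and reserved output size are hypothetical numbers; the function names are illustrative, not GraphRAG APIs.

```python
def split_into_chunks(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size; each window repeats `overlap` tokens of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def max_chunk_size(context_window: int, prompt_tokens: int, reserved_output_tokens: int) -> int:
    """Largest chunk that still leaves room for the prompt template and the model's answer."""
    return max(context_window - prompt_tokens - reserved_output_tokens, 0)


# Hypothetical budget: an 8,192-token window, a ~3,000-token extraction prompt,
# and 2,000 tokens reserved for the model's response.
budget = max_chunk_size(context_window=8192, prompt_tokens=3000, reserved_output_tokens=2000)

words = "the quick brown fox jumps over the lazy dog".split()
chunks = split_into_chunks(words, chunk_size=4, overlap=1)
```

Here `budget` is 3192, so a configured chunk size above that would overflow this hypothetical window. Too-small chunks have the opposite cost: less context per extraction call and more calls overall. Measure your own prompt and model limits rather than relying on these numbers.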

@MortalHappiness

I collected several comments scattered across different issues and created a monkey patch script along with working settings for Ollama. It has been tested on version 0.3.2 and works properly. I’m sharing it for those who might need it: https://gist.github.com/MortalHappiness/7030bbe96c4bece8a07ea9057ba18b86.

I’m not sure if it’s appropriate to comment here, so if the reviewers think it’s not, I'll delete this comment and post it in a more suitable place. Thank you in advance!

@satyaloka93

Please consider simply allowing any OpenAI-compatible endpoint (vLLM/llama-server) for both the LLM and the embedding model. I can get it to work for the regular LLM, but not for embeddings (nomic-embed-text) via llama-server. Please don't shoehorn this into Ollama only; that would just create another niche constraint on usability.

@MortalHappiness does your monkey patch work with other locally hosted OpenAI-compatible endpoints besides Ollama?
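One way to narrow down embedding failures with a server such as llama-server is to check whether its response has the shape an OpenAI client expects. The sketch below assumes the OpenAI embeddings response format (a top-level `data` list of objects carrying `index` and `embedding`); the function name and the sample payload are hypothetical, and this is not how GraphRAG itself parses responses.

```python
import json


def extract_embeddings(response_body: str, expected_count: int) -> list[list[float]]:
    """Return vectors ordered by `index`, raising if the body is not OpenAI-shaped."""
    payload = json.loads(response_body)
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list):
        raise ValueError("missing top-level 'data' list; not an OpenAI-style embeddings response")
    if len(data) != expected_count:
        raise ValueError(f"expected {expected_count} embeddings, got {len(data)}")
    # Order by index so each vector lines up with its input text.
    items = sorted(data, key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    dims = {len(vector) for vector in vectors}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return vectors


# Hypothetical response for two inputs, with 3-dimensional vectors for brevity.
sample = json.dumps({
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    ],
    "model": "nomic-embed-text",
})
vectors = extract_embeddings(sample, expected_count=2)
```

If a server's raw output fails a check like this (for example, it returns a bare vector instead of a `data` list), an OpenAI-style client will likely fail as well, which points at the server's compatibility layer rather than the model.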

@ishaan-jaff

Hi @natoverse @MortalHappiness @satyaloka93, I made a PR to add LiteLLM, which provides support for Ollama, Vertex AI, Gemini, Anthropic, and Bedrock (100+ LLMs):

#1575

It adds support for the LLMs mentioned above using LiteLLM (https://github.com/BerriAI/litellm/).
LiteLLM is a lightweight package that simplifies LLM API calls: you can use any LLM as a drop-in replacement for gpt-4o.

Example

```python
import os

from litellm import completion

# Set provider API keys as environment variables (placeholder values)
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# OpenAI call: the "openai/" prefix selects the provider
response = completion(model="openai/gpt-4o", messages=messages)
print(response)

# Anthropic call: same function and message format, different provider prefix
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)
```

Response (OpenAI Format)

```json
{
    "id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
    "created": 1734366691,
    "model": "claude-3-sonnet-20240229",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 43,
        "prompt_tokens": 13,
        "total_tokens": 56,
        "completion_tokens_details": null,
        "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": 0
        },
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
    }
}
```
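Because the response follows the OpenAI chat-completion format, downstream code can read it the same way regardless of provider. A minimal sketch, assuming the response has been obtained as plain JSON like the one above; the helper names are hypothetical, and it does not rely on any particular LiteLLM response-object method.

```python
import json

# Trimmed copy of the OpenAI-format response shown above.
RAW_RESPONSE = """
{
  "id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
  "model": "claude-3-sonnet-20240229",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
        "role": "assistant"
      }
    }
  ],
  "usage": {"completion_tokens": 43, "prompt_tokens": 13, "total_tokens": 56}
}
"""


def first_reply(response: dict) -> str:
    """Text of the first choice, as in any OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def token_usage(response: dict) -> tuple[int, int, int]:
    """(prompt, completion, total) token counts from the `usage` block."""
    usage = response["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]


response = json.loads(RAW_RESPONSE)
reply = first_reply(response)
prompt_tokens, completion_tokens, total_tokens = token_usage(response)
```

The usage block is internally consistent (13 prompt tokens + 43 completion tokens = 56 total), so token accounting written against OpenAI responses can be reused unchanged for other providers that return this shape.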

5 participants