adding documentation on how to call local hf models

2025-07-21 04:50:39 +08:00 · 2023-09-08 09:59:44 -07:00
parent e90c9e5853
commit 1b098aea13
2 changed files with 28 additions and 0 deletions
--- a/Usage.md
+++ b/Usage.md
@@ -169,6 +169,31 @@ in the configuration.toml

 #### Huggingface

+**Local**  
+You can run Huggingface models locally through either [vLLM](https://docs.litellm.ai/docs/providers/vllm) or [Ollama](https://docs.litellm.ai/docs/providers/ollama).
+
+For example, to use a new Huggingface model locally via Ollama, set:
+```
+[__init__.py]
+MAX_TOKENS = {
+    "model-name-on-ollama": <max_tokens>
+}
+# e.g.
+MAX_TOKENS = {
+    ...,
+    "llama2": 4096
+}
+
+
+[config] # in configuration.toml
+model = "ollama/llama2"
+
+[ollama] # in .secrets.toml
+api_base = ... # the base url of your Ollama server, e.g. http://localhost:11434
+```
+
+**Inference Endpoints**
+
 To use a new model with Huggingface Inference Endpoints, for example, set:
 ```
 [__init__.py]
--- a/pr_agent/settings/.secrets_template.toml
+++ b/pr_agent/settings/.secrets_template.toml
@@ -29,6 +29,9 @@ key = "" # Optional, uncomment if you want to use Replicate. Acquire through htt
 key = "" # Optional, uncomment if you want to use Huggingface Inference API. Acquire through https://huggingface.co/docs/api-inference/quicktour
 api_base = "" # the base url for your huggingface inference endpoint 

+[ollama]
+api_base = "" # the base url of your Ollama server, e.g. http://localhost:11434
+
 [github]
 # ---- Set the following only for deployment type == "user"
 user_token = ""  # A GitHub personal access token with 'repo' scope.