Open WebUI notes ...
Open WebUI installer: https://github.com/freeload101/SCRIPTS/blob/master/Bash/OpenWebUI_Fast.bash
Older Professor Synapse prompt you can use: https://raw.githubusercontent.com/freeload101/SCRIPTS/refs/heads/master/Prof%20Synapse%20Old.txt
Fabric prompts you can import into Open WebUI!!! (https://github.com/danielmiessler/fabric/tree/main/patterns): https://github.com/freeload101/SCRIPTS/blob/master/MISC/Fabric_Prompts_Open_WebUI_OpenWebUI_20241112.json
Example Windows scheduled-task (AT) XML to start Kokoro on boot and keep it from dying: https://github.com/freeload101/SCRIPTS/blob/master/MISC/StartKokoro.xml
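If you'd rather register that task from the command line than the Task Scheduler GUI, something like this should work from an elevated prompt (the task name is arbitrary):
schtasks /Create /XML StartKokoro.xml /TN "StartKokoro"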
Open WebUI RAG fail sauce ... https://youtu.be/CfnLrTcnPtY
Open registration
Model list / order
NAME                                                 ID            SIZE    MODIFIED
hf.co/mradermacher/L3-8B-Stheno-v3.2-i1-GGUF:Q4_K_S  017d7a278e7e  4.7 GB  2 days ago
qwen2.5:32b                                          9f13ba1299af  19 GB   3 days ago
deepsex:latest                                       c83a52741a8a  20 GB   3 days ago
HammerAI/openhermes-2.5-mistral:latest               d98003b83e17  4.4 GB  2 weeks ago
Sweaterdog/Andy-3.5:latest                           d3d9dc04b65a  4.7 GB  2 weeks ago
nomic-embed-text:latest                              0a109f422b47  274 MB  2 weeks ago
deepseek-r1:32b                                      38056bbcbb2d  19 GB   4 weeks ago
psyfighter2:latest                                   c1b3d5e5be73  7.9 GB  2 months ago
CognitiveComputations/dolphin-llama3.1:latest        ed9503dedda9  4.7 GB  2 months ago
Disable Arena models
Documents (WIP): RAG is not good.
Discord notes:
https://discord.com/channels/1170866489302188073/1340112218808909875
Abhi Chaturvedi: @(Operat0r) try this. To reduce latency and improve accuracy, modify the .env file:
# Enable RAG
ENABLE_RAG=true
# Use hybrid mode (retrieval + reranking for better context)
RAG_MODE=hybrid
# Reduce the number of retrieved documents (default: 5)
RETRIEVAL_TOP_K=3
# Use a fast embedding model (instead of OpenAI's Ada-002); faster and lightweight
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Optimize the vector database
VECTOR_DB_TYPE=chroma
CHROMA_DB_IMPL=hnsw  # faster search
CHROMA_DB_PATH=/root/open-webui/backend/data/vector_db
# Optimize backend performance: increase Uvicorn worker count (improves concurrency)
UVICORN_WORKERS=4
# Increase FastAPI request timeout (prevents RAG failures)
FASTAPI_TIMEOUT=60
# Optimize database connection pool (for better query performance)
SQLALCHEMY_POOL_SIZE=10
JamesK: So probably the first thing to do is increase the top K value in admin -> settings -> documents, or you could try the new "full context mode" for RAG documents. You may also need to increase the context size on the model, but that will make it slower, so you probably don't want to do that unless you start seeing the "truncating input" warnings.
JamesK: Ah, I see. The RAG didn't work great for you in this prompt. There are three hits and the first two are duplicates, so there isn't much data for the model to work with.
[9:12 PM] JamesK: context section: I see a message warning that you are using the default 2048 context length, but not the message saying you've hit that limit (from my logs the warning looks like level=WARN source=runner.go:126 msg="truncating input prompt" limit=32768 prompt=33434 numKeep=5).
[6:06 AM] JamesK: If you set the env var OLLAMA_DEBUG=1 before running ollama serve, it will dump the full prompt being sent to the model; that should let you confirm what the RAG has put in the prompt.
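A minimal sketch of that debug run (assuming a stock Ollama install; stop any already-running ollama service first so the port is free):
OLLAMA_DEBUG=1 ollama serve
The log output will then include the fully assembled prompt, RAG fragments included.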
JamesK: Watch the console output from ollama and check for warnings about overflowing the context. If you have the default 2k context you may need to increase it until the warnings go away.
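One way to raise the context window in Ollama is to derive a variant of a model with a larger num_ctx (a sketch; the base model and the 8192 value are just examples from the list above, size it to your VRAM):
# Modelfile
FROM qwen2.5:32b
PARAMETER num_ctx 8192
Then build and use it: ollama create qwen2.5-32b-8k -f Modelfile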
[8:58 PM] JamesK: But also, if you're using the default RAG, it chunks the input into small fragments, then matches the fragments against your prompt and only inserts a few fragments into the context, not the entire document. So it's easily possible for the information you want to not be present.
Auto updates
echo '0,12 */4 * * * root docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui' >> /etc/crontab
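That entry runs Watchtower at minutes 0 and 12 of every fourth hour to pull a newer open-webui image and restart the container (note /etc/crontab needs the user field, hence root). To test the same update path by hand first:
docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui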
Search
Red note for API keys:
1. Go to Google Developers, use Programmable Search Engine, and log on or create an account.
2. Go to the control panel and click the Add button.
3. Enter a search engine name, set the other properties to suit your needs, verify you're not a robot, and click the Create button.
4. Generate the API key and get the Search engine ID (available after the engine is created).
5. With the API key and Search engine ID, open the Open WebUI Admin panel, click the Settings tab, and then click Web Search.
6. Enable Web search and set Web Search Engine to google_pse.
7. Fill Google PSE API Key with the API key and Google PSE Engine Id with the Search engine ID (from step 4).
8. Click Save.
Note: you have to enable Web search in the prompt field, using the plus (+) button. Search the web ;-)
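Before wiring the values into Open WebUI, you can sanity-check the key and engine ID against Google's Custom Search JSON API (YOUR_API_KEY and YOUR_ENGINE_ID are placeholders for the values from step 4):
curl 'https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=test'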
Kokoro / Open WebUI
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file
apt update
apt upgrade
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
apt install docker.io -y
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2
Open WebUI TTS settings: API base URL http://localhost:8880/v1, voice af_bella
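A quick smoke test of the endpoint before pointing Open WebUI at it (assuming Kokoro-FastAPI's OpenAI-compatible /v1/audio/speech route; writes the audio to test.mp3):
curl -X POST http://localhost:8880/v1/audio/speech -H "Content-Type: application/json" -d '{"model": "kokoro", "voice": "af_bella", "input": "Hello from Kokoro"}' -o test.mp3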
Import Fabric prompts
https://raw.githubusercontent.com/freeload101/Python/46317dee34ebb83b01c800ce70b0506352ae2f3c/Fabric_Prompts_Open_WebUI_OpenWebUI.py