Open WebUI notes ...
Open WebUI installer: https://github.com/freeload101/SCRIPTS/blob/master/Bash/OpenWebUI_Fast.bash
Older Professor Synapse prompt you can use: https://raw.githubusercontent.com/freeload101/SCRIPTS/refs/heads/master/Prof%20Synapse%20Old.txt
Fabric prompts you can import into Open WebUI!!! (https://github.com/danielmiessler/fabric/tree/main/patterns): https://github.com/freeload101/SCRIPTS/blob/master/MISC/Fabric_Prompts_Open_WebUI_OpenWebUI_20241112.json
Example Windows scheduled-task (AT) XML to start Kokoro on boot and keep it from dying: https://github.com/freeload101/SCRIPTS/blob/master/MISC/StartKokoro.xml
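If you'd rather register that task from the command line than the Task Scheduler GUI, something like this should work from an elevated prompt (the task name is arbitrary):
schtasks /Create /XML StartKokoro.xml /TN "StartKokoro"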
Open WebUI RAG fail sauce ... https://youtu.be/CfnLrTcnPtY
Open registration
Model list / order
NAME                                                 ID            SIZE    MODIFIED
hf.co/mradermacher/L3-8B-Stheno-v3.2-i1-GGUF:Q4_K_S  017d7a278e7e  4.7 GB  2 days ago
qwen2.5:32b                                          9f13ba1299af  19 GB   3 days ago
deepsex:latest                                       c83a52741a8a  20 GB   3 days ago
HammerAI/openhermes-2.5-mistral:latest               d98003b83e17  4.4 GB  2 weeks ago
Sweaterdog/Andy-3.5:latest                           d3d9dc04b65a  4.7 GB  2 weeks ago
nomic-embed-text:latest                              0a109f422b47  274 MB  2 weeks ago
deepseek-r1:32b                                      38056bbcbb2d  19 GB   4 weeks ago
psyfighter2:latest                                   c1b3d5e5be73  7.9 GB  2 months ago
CognitiveComputations/dolphin-llama3.1:latest        ed9503dedda9  4.7 GB  2 months ago
Disable Arena models
Documents (WIP): RAG is not good.
Discord notes:
https://discord.com/channels/1170866489302188073/1340112218808909875
Abhi Chaturvedi: @(Operat0r) try this. To reduce latency and improve accuracy, modify the .env file:
# Enable RAG
ENABLE_RAG=true
# Use hybrid mode (retrieval + reranking for better context)
RAG_MODE=hybrid
# Reduce the number of retrieved documents (default: 5)
RETRIEVAL_TOP_K=3
# Use a fast embedding model (instead of OpenAI's Ada-002); faster and lightweight
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Optimize the vector database
VECTOR_DB_TYPE=chroma
CHROMA_DB_IMPL=hnsw  # faster search
CHROMA_DB_PATH=/root/open-webui/backend/data/vector_db
# Optimize backend performance: increase Uvicorn worker count (improves concurrency)
UVICORN_WORKERS=4
# Increase FastAPI request timeout (prevents RAG failures)
FASTAPI_TIMEOUT=60
# Optimize database connection pool (for better query performance)
SQLALCHEMY_POOL_SIZE=10
JamesK: So probably the first thing to do is increase the top K value in admin -> settings -> documents, or you could try the new "full context mode" for RAG documents. You may also need to increase the context size on the model, but that will make it slower, so you probably don't want to do that unless you start seeing the "truncating input" warnings.
JamesK: Ah, I see. The RAG didn't work great for you in this prompt. There are three hits and the first two are duplicates, so there isn't much data for the model to work with.
[9:12 PM] JamesK: context section: I see a message warning that you are using the default 2048 context length, but not the message saying you've hit that limit (from my logs the warning looks like level=WARN source=runner.go:126 msg="truncating input prompt" limit=32768 prompt=33434 numKeep=5).
[6:06 AM] JamesK: If you set the env var OLLAMA_DEBUG=1 before running ollama serve, it will dump the full prompt being sent to the model; that should let you confirm what the RAG has put in the prompt.
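A minimal sketch of that debug run (assuming a stock Ollama install; stop any already-running ollama service first so the port is free):
OLLAMA_DEBUG=1 ollama serve
The log output will then include the fully assembled prompt, RAG fragments included.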
JamesK: Watch the console output from ollama and check for warnings about overflowing the context. If you have the default 2k context you may need to increase it until the warnings go away.
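One way to raise the context window in Ollama is to derive a variant of a model with a larger num_ctx (a sketch; the base model and the 8192 value are just examples from the list above, size it to your VRAM):
# Modelfile
FROM qwen2.5:32b
PARAMETER num_ctx 8192
Then build and use it: ollama create qwen2.5-32b-8k -f Modelfile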
[8:58 PM] JamesK: But also, if you're using the default RAG, it chunks the input into small fragments, then matches the fragments against your prompt and only inserts a few fragments into the context, not the entire document. So it's easily possible for the information you want to not be present.
Auto updates
echo '0,12 */4 * * * root docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui' >> /etc/crontab
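That entry runs Watchtower at minutes 0 and 12 of every fourth hour to pull a newer open-webui image and restart the container (note /etc/crontab needs the user field, hence root). To test the same update path by hand first:
docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui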
Search
Red note for API keys:
1. Go to Google Developers, use Programmable Search Engine, and log on or create an account.
2. Go to the control panel and click the Add button.
3. Enter a search engine name, set the other properties to suit your needs, verify you're not a robot, and click the Create button.
4. Generate the API key and get the Search engine ID (available after the engine is created).
5. With the API key and Search engine ID, open the Open WebUI Admin panel, click the Settings tab, and then click Web Search.
6. Enable Web search and set Web Search Engine to google_pse.
7. Fill Google PSE API Key with the API key and Google PSE Engine Id with the Search engine ID (from step 4).
8. Click Save.
Note: you have to enable Web search in the prompt field, using the plus (+) button. Search the web ;-)
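Before wiring the values into Open WebUI, you can sanity-check the key and engine ID against Google's Custom Search JSON API (YOUR_API_KEY and YOUR_ENGINE_ID are placeholders for the values from step 4):
curl 'https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=test'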
Kokoro / Open WebUI
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
https://github.com/remsky/Kokoro-FastAPI?tab=readme-ov-file
apt update
apt upgrade
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
apt install docker.io -y
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2
Open WebUI TTS settings: API base URL http://localhost:8880/v1, voice af_bella
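A quick smoke test of the endpoint before pointing Open WebUI at it (assuming Kokoro-FastAPI's OpenAI-compatible /v1/audio/speech route; writes the audio to test.mp3):
curl -X POST http://localhost:8880/v1/audio/speech -H "Content-Type: application/json" -d '{"model": "kokoro", "voice": "af_bella", "input": "Hello from Kokoro"}' -o test.mp3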
Import Fabric prompts
https://raw.githubusercontent.com/freeload101/Python/46317dee34ebb83b01c800ce70b0506352ae2f3c/Fabric_Prompts_Open_WebUI_OpenWebUI.py