Having a model running on-premise is only half the job. The other half — the one that decides whether the AI actually gets used — is the UI and integration. In practice there are two entry points: (1) a chat UI where staff ask directly (most commonly Open WebUI), and (2) an OpenAI-compatible API to wire the AI into your existing helpdesk, CRM and internal software. Part 6/8 walks through building both, plus SSO/RBAC authentication and a few real integration scenarios — all still inside your infrastructure boundary.
Quick summary
- Chat UI: Open WebUI gives staff a ChatGPT-like experience — conversation history, model picker, sharing — running internally, connected to the serving layer from the Serving article.
- API integration: an OpenAI-compatible endpoint (/v1/chat/completions) so code in your helpdesk/CRM calls the AI like it would call OpenAI, but the data stays on-site.
- Auth & permissions: SSO for single sign-on, RBAC to control who can see/query which model or documents, scoped by department.
- Scenarios: a process-lookup assistant, email/document summarization, department document Q&A — most of which need a RAG layer behind them.
- Principle: integration is where AI meets real workflows — start from one clear flow, measure, then expand.
A chat UI for staff — Open WebUI
For most employees, "internal AI" is the chat box they type questions into. Open WebUI is a widely used open-source interface for this: a ChatGPT-like experience that runs entirely on your own servers and connects directly to the serving layer (Ollama, vLLM…) you set up in the Serving article.
What staff get out of the box:
- Conversation history: each person has their own chat threads, saved for reference — still inside your infrastructure.
- Model picker: switch between the models being served (e.g. a fast model for short Q&A, a larger one for harder tasks).
- Sharing & collaboration: share a useful conversation with a colleague, or create shared prompt templates for the whole team.
- Attachments & document Q&A: with RAG enabled, users can upload documents and ask about their contents.
Open WebUI is typically deployed via Docker behind an internal reverse proxy. Because it talks to the backend over the OpenAI-compatible API standard, you can point it at any serving layer — which is also the bridge to the API integration below.
Integration via API — an OpenAI-compatible endpoint
The chat UI serves humans; the API serves software. What makes integrating internal AI easy is that most open-source serving tools expose an endpoint compatible with OpenAI's API: the same path (/v1/chat/completions) and the same request/response format. Code that already calls OpenAI can therefore switch to your internal AI by changing the base URL, the API key and the model name, with almost no change to the logic.
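To make the "same path, same format" point concrete, here is a minimal sketch that builds such a request with only the Python standard library. The helper name build_chat_request, the host internal-ai.local and the key value are illustrative assumptions; the path and JSON fields follow the OpenAI chat-completions format described above. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical internal host and key; swap in your own values.
req = build_chat_request("http://internal-ai.local/v1", "example-key", "qwen2.5:7b", "Hello")
print(req.full_url)  # http://internal-ai.local/v1/chat/completions
```

Pointing the same helper at a different backend only means passing a different base_url, api_key and model; the payload shape does not change.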
A few common embedding scenarios:
- Helpdesk / ticketing: when a new ticket is created, call the API to suggest a category, a summary, or a draft reply for the support agent.
- CRM: a "summarize customer history" or "draft a follow-up email" button right inside the customer record screen.
- Internal software: add a chat widget to your ERP/portal, calling the API with the user's context and (via RAG) department documents.
- Background automation: scheduled jobs that summarize reports, extract data from text, or classify content — no human sitting in a chat needed.
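As one way to picture the helpdesk scenario, the sketch below asks the model for a ticket category and validates the reply before using it. Everything named here is an assumption for illustration: the category list, the function names, and the complete callable, which stands in for whatever function sends messages to your endpoint and returns the reply text.

```python
from typing import Callable

CATEGORIES = ["billing", "technical", "account", "other"]  # hypothetical categories


def build_triage_messages(ticket_text: str) -> list[dict]:
    """Chat messages that ask the model for exactly one category word."""
    system = (
        "You classify support tickets. Reply with exactly one word from: "
        + ", ".join(CATEGORIES)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": ticket_text},
    ]


def suggest_category(ticket_text: str, complete: Callable[[list[dict]], str]) -> str:
    """Ask the model for a category; fall back to 'other' on anything unexpected."""
    raw = complete(build_triage_messages(ticket_text))
    label = raw.strip().lower()
    return label if label in CATEGORIES else "other"


# Stand-in for the real API call, so the sketch runs without a server.
def fake_complete(messages: list[dict]) -> str:
    return " Billing\n"


print(suggest_category("I was charged twice this month", fake_complete))  # billing
```

Validating the reply against a fixed list keeps a free-form model answer from writing an unknown category into the ticketing system; the agent still makes the final call.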
Because the endpoint follows the OpenAI standard, the official SDKs and libraries (Python, Node.js…) are all reusable — you only reconfigure the connection point. That's why integrating internal AI is often much faster than expected.
Authentication & permissions
Once internal AI has many users and touches sensitive documents, authentication and permissions stop being optional. Three layers to set up from the start:
- SSO (single sign-on): let staff use their existing company account (via OIDC/SAML or LDAP) instead of a separate password — convenient, and easy to revoke when someone leaves.
- RBAC (role-based access control): define who can see/query what — which model, which documents, which features. For example, only HR can reach the internal-policy Q&A assistant.
- Department scoping: combine RBAC with data partitioning at the RAG layer — each department only retrieves from its own document store, preventing cross-leaks.
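Department scoping at the RAG layer can be sketched as a filter applied before any matching happens. The store below is a tiny in-memory stand-in with made-up contents, and the names Chunk, STORE and retrieve are hypothetical; a real deployment would apply the same filter inside its vector database or search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    department: str
    source: str
    text: str


# Tiny in-memory stand-in for a document store; contents are made up.
STORE = [
    Chunk("hr", "leave-policy.pdf", "Leave requests need manager approval."),
    Chunk("finance", "expense-rules.pdf", "Expenses need a receipt and approval."),
]


def retrieve(query: str, user_departments: set[str]) -> list[Chunk]:
    """Filter by department first, then match; out-of-scope chunks are never candidates."""
    allowed = [c for c in STORE if c.department in user_departments]
    terms = query.lower().split()
    return [c for c in allowed if any(t in c.text.lower() for t in terms)]


hr_hits = retrieve("approval", {"hr"})
print([c.source for c in hr_hits])  # ['leave-policy.pdf']
```

Both documents mention "approval", but an HR-only user gets back only the HR one, because the finance chunk is dropped before matching rather than hidden afterwards.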
Open WebUI supports multiple users, roles and OAuth/OIDC integration; for API integrations, permissions are usually enforced at a gateway layer before the request reaches serving. Blocking outbound connections and other defense layers are covered in the Internal AI security system article.
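A gateway check of this kind can be sketched as a single function that runs before a request is forwarded to serving. This is an assumption-heavy illustration, not how any particular gateway product works: the key table, the caller names and the model name hr-policy-assistant are all hypothetical.

```python
import hmac

# Hypothetical gateway config: API key -> (caller name, models it may use).
API_KEYS = {
    "helpdesk-key-example": ("helpdesk", {"qwen2.5:7b"}),
    "hr-portal-key-example": ("hr-portal", {"qwen2.5:7b", "hr-policy-assistant"}),
}


def authorize(auth_header: str, model: str) -> tuple[int, str]:
    """Return an HTTP-style (status, reason) before anything is forwarded to serving."""
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer" or not token:
        return 401, "missing bearer token"
    for known_key, (caller, models) in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences
        if hmac.compare_digest(token.encode(), known_key.encode()):
            if model in models:
                return 200, f"{caller} may use {model}"
            return 403, f"{caller} may not use {model}"
    return 401, "unknown key"


print(authorize("Bearer helpdesk-key-example", "hr-policy-assistant"))
# (403, 'helpdesk may not use hr-policy-assistant')
```

In practice the keys would live in a secrets store rather than in code, and the same decision point is a natural place to log who called which model.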
A few real integration scenarios
Integration pays off when it's tied to a specific job. Three scenarios teams usually roll out first:
- Process/policy lookup assistant: staff ask "how does the leave-request process work?" and get an answer with citations from internal documents — instead of digging through a handbook or asking a colleague.
- Email/document summarization: a "summarize" button right inside the inbox or document store, condensing long text into key points for a fast read.
- Department document Q&A: each department gets a "Q&A corner" over its own document store (contracts, technical docs, operating guides), scoped so data doesn't mix between departments.
The common thread: almost every useful scenario needs a RAG layer behind it so the AI answers from your real documents instead of making things up. The quality of those answers is then measured and tightened with evals/guardrails in the Evaluation & tuning article.
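What "answering from your real documents" looks like at the prompt level can be sketched as follows. The function name, the prompt wording and the sample passage are assumptions for illustration; the passages would come from whatever retriever your RAG layer uses.

```python
def build_grounded_messages(question: str, passages: list[tuple[str, str]]) -> list[dict]:
    """passages is a list of (source_name, text) pairs returned by your retriever."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(passages, start=1)
    )
    system = (
        "Answer only from the numbered passages below and cite them like [1]. "
        "If the passages do not contain the answer, say you do not know.\n\n"
        + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Made-up passage for illustration.
messages = build_grounded_messages(
    "How does the leave-request process work?",
    [("hr-handbook.pdf", "Submit the leave form at least three days in advance.")],
)
print(messages[0]["content"])
```

Numbering the passages gives the model something concrete to cite, and the explicit "say you do not know" instruction is a common, though not foolproof, way to discourage invented answers.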
Because the serving layer exposes an OpenAI-compatible endpoint, a quick test from the command line is a single request:
# call the internal AI's OpenAI-compatible endpoint directly
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INTERNAL_AI_KEY" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Summarize the leave-request process"}]
  }'
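The request above returns a JSON body in the OpenAI chat-completions shape, with the reply text under choices[0].message.content. The sketch below parses a trimmed sample; the sample body and its answer text are made up for illustration and are not real model output.

```python
import json

# Trimmed, made-up example of a chat-completions response body.
sample_response = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "qwen2.5:7b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Submit the form, then wait for manager approval."},
      "finish_reason": "stop"
    }
  ]
}
"""


def extract_answer(response_text: str) -> str:
    """Return the assistant text from the first choice."""
    body = json.loads(response_text)
    return body["choices"][0]["message"]["content"]


answer = extract_answer(sample_response)
print(answer)  # Submit the form, then wait for manager approval.
```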
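Embedding in an internal app (a minimal sketch, using the OpenAI Python SDK pointed at your server):
# only change base URL + key (and the model name); logic stays the same
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://internal-ai.local/v1",  # on-premise endpoint
    api_key=os.environ["INTERNAL_AI_KEY"],   # same variable as the curl example
)
user_question = "Summarize the leave-request process"  # in practice, comes from your app
resp = client.chat.completions.create(
    model="qwen2.5:7b",
    messages=[{"role": "user", "content": user_question}],
)
answer = resp.choices[0].message.content  # the assistant's reply text
# wire answer into your helpdesk/CRM; add RAG upstream for citations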
The Namtech view
Namtech treats UI and integration as the step that "puts AI in real users' hands". We typically start with Open WebUI for a pilot group so staff get comfortable, while exposing an OpenAI-compatible endpoint so the internal dev team can wire AI into existing software without learning a new API. SSO and RBAC are enabled from day one — because with internal AI, convenience and access control have to go together. The principle stays the same: start from one clear flow, measure the impact, then scale across the organization.
Frequently asked questions
Why use an "OpenAI-compatible" endpoint instead of a custom API?
Because it lets you reuse all your existing SDKs, libraries and code written for OpenAI — you only change the base URL and API key to run against internal AI. That drastically cuts integration effort and avoids hard lock-in to any single provider.
Is Open WebUI mandatory, or can I build my own UI?
Not mandatory. Open WebUI is a fast, full-featured option (history, model picker, sharing, users/permissions). If you need a branded UI or deep embedding into an internal portal, you can build your own app and call the same API endpoint.
Does integrating AI into a helpdesk/CRM make data leave the organization?
No, if deployed correctly: requests go to your on-premise endpoint instead of a public API, and the inference path has no outbound connections. Defense layers are detailed in the Security article.
How do I stop one department from reading another department's documents?
Combine RBAC (who can access which feature/model) with data partitioning at the RAG layer (each department only searches its own document store). Configured correctly, a department's question is only answered from documents that department is allowed to see.
Want internal AI without starting from zero?
Namtech deploys private internal AI platforms — open-source models running 100% on your own infrastructure, data never leaving the organization.
Book a free consultation
Note: This is a general guide, updated 02/07/2026; tools and models change fast, so verify the latest versions when you deploy.