LLM-Chat¶

LLM-Chat is a web-based application for the dialog-based use of large language models. It combines free-text chat, document analysis, and image processing in a single interface and connects to a language-model server through an OpenAI-compatible endpoint. A central technical feature is its model-profile system, through which sampling behavior and reasoning control are adapted to the language-model family in use.

At a glance¶

Address general questions in dialog, generate texts, translate, or revise them.
Load documents into the conversation context and keep them available across multiple sessions.
Attach images, diagrams, or screenshots directly to a message for visual analysis.
Run several parallel chat sessions with separate histories and switch between them by topic.
Export the chat history as a formatted Word document.
Choose the system prompt from predefined role profiles or edit it freely.
Apply prepared prompt templates for recurring tasks (summary, analysis, translation, comparison) with one click.

Highlights¶

Compared to a direct call to a language model or a minimal chat interface, LLM-Chat adds mechanisms for processing larger sets of documents, for the uniform handling of mixed inputs, and for adaptation to different model families — with the goal of producing reproducible and context-appropriate answers.

Model profile system: Sampling parameters, reasoning control, and vLLM-specific extra parameters are stored as profiles per model family. Profile selection happens explicitly through a configuration variable or automatically based on the model name; switching models requires no code changes.
Pre-configured model profiles: Profiles exist for the model families Qwen3, Qwen3.5, Kimi, and GLM; further models are supported through a default profile. Each profile distinguishes between a reasoning mode and an instruct mode with its own parameter set.
Connection to an LLM endpoint: The connection uses the OpenAI Chat Completions API to a language-model server addressed via configuration. External services, tracking providers, or font hosts are not contacted.
Unified upload mechanism: Images and documents are accepted through the same input channel and handled automatically by file type — images as visual content of the respective message, documents as a persistent part of the session context.
Multi-stage document extraction: For each format, a lightweight, format-specific extractor is tried first; only on failure does processing fall back to a universal layout parser. This noticeably reduces processing time, particularly for larger sets of documents.
Global document context: Loaded documents remain available across all chat sessions and are removed neither when switching sessions nor when starting a new session.
Token budget management: A configurable token budget is reserved for documents; usage is displayed continuously, and uploads exceeding the limit are rejected in a controlled way without corrupting the chat state.
Reasoning filtering during streaming: When <think> markers appear in the model output, those segments are filtered transparently and replaced with a compact status indicator; the final answer remains free of internal intermediate steps.
Word export with markdown rendering: The chat history can be exported as a Word document that preserves headings, lists, tables, code blocks, blockquotes, and inline formatting from the model's responses.
Local processing: All data is processed on the application's own server; the interface does not load external resources such as CDNs or font hosts.