LLM-Chat
LLM-Chat is a web-based application for the dialog-based use of large language models. It combines free-text chat, document analysis, and image processing in a single interface and connects to a language-model server through an OpenAI-compatible endpoint. A central technical feature is its model-profile system, through which sampling behavior and reasoning control are adapted to the language-model family in use.
At a glance
- Ask general questions in a conversation, and generate, translate, or revise text.
- Load documents into the conversation context and keep them available across multiple sessions.
- Attach images, diagrams, or screenshots directly to a message for visual analysis.
- Run several parallel chat sessions with separate histories and switch between them by topic.
- Export the chat history as a formatted Word document.
- Choose the system prompt from predefined role profiles or edit it freely.
- Apply prepared prompt templates for recurring tasks (summary, analysis, translation, comparison) with one click.
Highlights
Compared to a direct call to a language model or a minimal chat interface, LLM-Chat adds mechanisms for processing larger sets of documents, for the uniform handling of mixed inputs, and for adaptation to different model families — with the goal of producing reproducible and context-appropriate answers.
- Model profile system: Sampling parameters, reasoning control, and vLLM-specific extra parameters are stored as profiles per model family. Profile selection happens explicitly through a configuration variable or automatically based on the model name; switching models requires no code changes.
- Pre-configured model profiles: Profiles exist for the model families Qwen3, Qwen3.5, Kimi, and GLM; further models are supported through a default profile. Each profile distinguishes between a reasoning mode and an instruct mode with its own parameter set.
- Connection to an LLM endpoint: The application talks to a language-model server through the OpenAI Chat Completions API; the server address is set in the configuration. No external services, tracking providers, or font hosts are contacted.
- Unified upload mechanism: Images and documents are accepted through the same input channel and handled automatically by file type — images as visual content of the respective message, documents as a persistent part of the session context.
- Multi-stage document extraction: For each format, a lightweight, format-specific extractor is tried first; only on failure does processing fall back to a universal layout parser. This noticeably reduces processing time, particularly for larger sets of documents.
- Global document context: Loaded documents remain available across all chat sessions; they are not removed when switching sessions or when starting a new one.
- Token budget management: A configurable token budget is reserved for documents; usage is displayed continuously, and uploads exceeding the limit are rejected in a controlled way without corrupting the chat state.
- Reasoning filtering during streaming: When `<think>` markers appear in the model output, those segments are filtered transparently and replaced with a compact status indicator; the final answer remains free of internal intermediate steps.
- Word export with markdown rendering: The chat history can be exported as a Word document that preserves headings, lists, tables, code blocks, blockquotes, and inline formatting from the model's responses.
- Local processing: All data is processed on the application's own server; the interface does not load external resources such as CDNs or font hosts.
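
The profile selection described under "Model profile system" could look roughly like the following sketch. It is not LLM-Chat's actual code: the class names, the `resolve_profile` function, the substring matching rule, and all parameter values are assumptions made for illustration. Only the family names and the split into a reasoning and an instruct parameter set come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ModeParams:
    temperature: float
    top_p: float
    # Server-specific extra request fields (hypothetical container).
    extra_body: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ModelProfile:
    name: str
    reasoning: ModeParams
    instruct: ModeParams


def _profile(name: str) -> ModelProfile:
    # Placeholder numbers, identical for every family; real profiles would differ.
    return ModelProfile(
        name=name,
        reasoning=ModeParams(temperature=0.6, top_p=0.95),
        instruct=ModeParams(temperature=0.7, top_p=0.8),
    )


PROFILES = {key: _profile(key) for key in ("qwen3", "qwen3.5", "kimi", "glm")}
DEFAULT_PROFILE = _profile("default")


def resolve_profile(model_name: str, override: Optional[str] = None) -> ModelProfile:
    """Pick a profile explicitly (override) or by matching the model name."""
    if override:
        # Explicit choice, e.g. read from a configuration variable.
        return PROFILES.get(override.lower(), DEFAULT_PROFILE)
    lowered = model_name.lower()
    # Try longer keys first so that "qwen3.5" is not shadowed by "qwen3".
    for key in sorted(PROFILES, key=len, reverse=True):
        if key in lowered:
            return PROFILES[key]
    return DEFAULT_PROFILE


# Hypothetical model name, used only to show the lookup.
profile = resolve_profile("example-org/Qwen3.5-demo")
params = profile.reasoning  # or profile.instruct
```

Checking longer keys first is one simple way to keep closely named families apart; a real implementation might use explicit patterns per family instead.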
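
The "try the lightweight extractor first, fall back on failure" idea from the multi-stage document extraction item can be sketched as below. The function name, the extractor registry, and the choice to treat empty output as a failure are assumptions for this example; the extractors themselves are passed in as plain callables, so no particular parsing library is implied.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

Extractor = Callable[[Path], str]


def extract_text(
    path: Path,
    fast_extractors: Dict[str, Extractor],
    fallback: Extractor,
) -> Tuple[str, str]:
    """Return (text, stage), where stage is "fast" or "fallback"."""
    fast = fast_extractors.get(path.suffix.lower())
    if fast is not None:
        try:
            text = fast(path)
            # Assumption: empty output counts as a failed extraction.
            if text.strip():
                return text, "fast"
        except Exception:
            # Any error in the lightweight stage triggers the fallback.
            pass
    return fallback(path), "fallback"


# Stand-in extractors that do not touch the file system.
def fake_txt(path: Path) -> str:
    return "plain text content"


def broken_pdf(path: Path) -> str:
    raise ValueError("cannot parse")


def fake_layout_parser(path: Path) -> str:
    return "layout parser content"


registry = {".txt": fake_txt, ".pdf": broken_pdf}
txt_result = extract_text(Path("notes.txt"), registry, fake_layout_parser)
pdf_result = extract_text(Path("scan.pdf"), registry, fake_layout_parser)
```

Returning the stage alongside the text makes it easy to log how often the slower parser is actually needed.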
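
A minimal sketch of the token budget behavior, assuming a check-before-commit design: the cost of a document is computed first, and the document is stored only if it fits, so a rejected upload leaves the existing state untouched. The class and exception names are hypothetical, and the four-characters-per-token estimate is a rough stand-in for a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a document would not fit into the remaining budget."""


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


@dataclass
class DocumentContext:
    budget: int
    documents: Dict[str, int] = field(default_factory=dict)  # name -> token cost

    @property
    def used(self) -> int:
        return sum(self.documents.values())

    def add(self, name: str, text: str) -> int:
        """Store a document if it fits; return the remaining budget."""
        cost = estimate_tokens(text)
        # Re-uploading an existing name replaces its previous cost.
        used_by_others = self.used - self.documents.get(name, 0)
        if used_by_others + cost > self.budget:
            # Nothing has been modified yet, so the state stays consistent.
            raise BudgetExceeded(f"{name}: needs {cost}, budget is {self.budget}")
        self.documents[name] = cost
        return self.budget - self.used


ctx = DocumentContext(budget=100)
remaining = ctx.add("a.txt", "x" * 200)  # estimated at 50 tokens
try:
    ctx.add("b.txt", "x" * 400)  # estimated at 100 tokens, does not fit
    rejected = False
except BudgetExceeded:
    rejected = True
```

The `used` property is what a continuously updated usage display could read from.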
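
Filtering reasoning segments from a token stream needs to cope with markers that are split across chunks. The generator below is one way to do that, written as an illustration rather than as LLM-Chat's implementation; the function names and the status text are assumptions. It holds back any trailing characters that could still become a marker and emits the rest immediately.

```python
from typing import Iterable, Iterator

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"
STATUS = "[thinking...]"  # hypothetical status indicator text


def _held_back(buffer: str, tag: str) -> int:
    """Length of the longest buffer suffix that is a proper prefix of tag."""
    for size in range(min(len(buffer), len(tag) - 1), 0, -1):
        if buffer.endswith(tag[:size]):
            return size
    return 0


def filter_reasoning(chunks: Iterable[str]) -> Iterator[str]:
    buffer = ""
    inside = False  # True while between <think> and </think>
    for chunk in chunks:
        buffer += chunk
        while True:
            tag = CLOSE_TAG if inside else OPEN_TAG
            pos = buffer.find(tag)
            if pos == -1:
                break
            if not inside:
                if pos:
                    yield buffer[:pos]  # visible text before the marker
                yield STATUS  # replaces the whole reasoning segment
            buffer = buffer[pos + len(tag):]
            inside = not inside
        # No complete marker left; keep only a possible partial marker.
        keep = _held_back(buffer, CLOSE_TAG if inside else OPEN_TAG)
        visible = buffer[: len(buffer) - keep]
        buffer = buffer[len(buffer) - keep:]
        if visible and not inside:
            yield visible  # reasoning text is dropped, answer text is emitted
    if buffer and not inside:
        yield buffer  # flush held-back characters that never became a marker


stream = ["Hello ", "<thi", "nk>secret", " stuff</th", "ink>", " world"]
answer = "".join(filter_reasoning(stream))
```

In this example `answer` contains the visible text with the status indicator in place of the reasoning segment; if the stream ends inside an unclosed segment, the remaining reasoning text is simply discarded.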