LLM-Chat

LLM-Chat is a web application for conversational use of large language models. It combines free-text chat, document analysis, and image processing in a single interface and connects to a language-model server through an OpenAI-compatible endpoint. A central technical feature is its model profile system, which adapts sampling behavior and reasoning control to the model family in use.

At a glance

  • Ask general questions in a conversation, and generate, translate, or revise texts.
  • Load documents into the conversation context and keep them available across multiple sessions.
  • Attach images, diagrams, or screenshots directly to a message for visual analysis.
  • Run several parallel chat sessions with separate histories and switch between them by topic.
  • Export the chat history as a formatted Word document.
  • Choose the system prompt from predefined role profiles or edit it freely.
  • Apply prepared prompt templates for recurring tasks (summary, analysis, translation, comparison) with one click.

Highlights

Compared with calling a language model directly or using a minimal chat interface, LLM-Chat adds mechanisms for processing larger sets of documents, for handling mixed inputs uniformly, and for adapting to different model families. The goal is to produce reproducible, context-appropriate answers.

  • Model profile system: Sampling parameters, reasoning control, and vLLM-specific extra parameters are stored as profiles per model family. Profile selection happens explicitly through a configuration variable or automatically based on the model name; switching models requires no code changes.
  • Pre-configured model profiles: Profiles exist for the model families Qwen3, Qwen3.5, Kimi, and GLM; further models are supported through a default profile. Each profile distinguishes between a reasoning mode and an instruct mode with its own parameter set.
  • Connection to an LLM endpoint: LLM-Chat communicates through the OpenAI Chat Completions API with a language-model server whose address is set in the configuration. No external services, tracking providers, or font hosts are contacted.
  • Unified upload mechanism: Images and documents are accepted through the same input channel and handled automatically by file type: images become visual content of the message they are attached to, and documents become a persistent part of the conversation context.
  • Multi-stage document extraction: For each format, a lightweight, format-specific extractor is tried first; only on failure does processing fall back to a universal layout parser. Because the universal parser runs only when needed, processing time drops noticeably, particularly for larger sets of documents.
  • Global document context: Loaded documents remain available across all chat sessions; they are not removed when switching sessions or when starting a new one.
  • Token budget management: A configurable token budget is reserved for documents; usage is displayed continuously, and uploads exceeding the limit are rejected in a controlled way without corrupting the chat state.
  • Reasoning filtering during streaming: When <think> markers appear in the model output, those segments are filtered transparently and replaced with a compact status indicator; the final answer remains free of internal intermediate steps.
  • Word export with markdown rendering: The chat history can be exported as a Word document that preserves headings, lists, tables, code blocks, blockquotes, and inline formatting from the model's responses.
  • Local processing: All data is processed on the application's own server; the interface does not load external resources such as CDNs or font hosts.
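
The profile selection described under "Model profile system" could be sketched roughly as follows. This is an illustration under stated assumptions, not LLM-Chat's actual implementation: the names ModelProfile, PROFILES, and resolve_profile, the substring-matching rule, and every parameter value are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-family profile; all values below are placeholders."""
    name: str
    reasoning: Dict[str, float] = field(default_factory=dict)   # sampling in reasoning mode
    instruct: Dict[str, float] = field(default_factory=dict)    # sampling in instruct mode
    extra_body: Dict[str, object] = field(default_factory=dict)  # server-specific extras

    def params(self, reasoning_mode: bool) -> Dict[str, float]:
        """Return a copy of the parameter set for the requested mode."""
        return dict(self.reasoning if reasoning_mode else self.instruct)


# Placeholder numbers, not recommended settings for any real model.
PROFILES: Dict[str, ModelProfile] = {
    "qwen3": ModelProfile("qwen3", {"temperature": 0.6}, {"temperature": 0.7}),
    "qwen3.5": ModelProfile("qwen3.5", {"temperature": 0.6}, {"temperature": 0.7}),
    "kimi": ModelProfile("kimi", {"temperature": 1.0}, {"temperature": 0.6}),
    "glm": ModelProfile("glm", {"temperature": 1.0}, {"temperature": 0.7}),
}
DEFAULT_PROFILE = ModelProfile("default", {"temperature": 0.7}, {"temperature": 0.7})


def resolve_profile(model_name: str, explicit: Optional[str] = None) -> ModelProfile:
    """Pick a profile explicitly (configuration value) or from the model name."""
    if explicit:
        key = explicit.lower()
        if key not in PROFILES:
            raise ValueError(f"unknown profile: {explicit!r}")
        return PROFILES[key]
    lowered = model_name.lower()
    # Try longer keys first so that "qwen3.5" is not shadowed by "qwen3".
    for key in sorted(PROFILES, key=len, reverse=True):
        if key in lowered:
            return PROFILES[key]
    return DEFAULT_PROFILE
```

With a lookup of this kind, switching models only changes which profile is returned; the calling code stays the same.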
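
The two-stage strategy described under "Multi-stage document extraction" can be illustrated with a small dispatcher. The sketch below is an assumption-laden example rather than the application's code: extract_text, the extractor mapping keyed by file suffix, and the rule that an empty result counts as a failure are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

# An extractor takes a file path and returns the extracted plain text.
Extractor = Callable[[Path], str]


def extract_text(path: Path, fast: Dict[str, Extractor], fallback: Extractor) -> str:
    """Try the format-specific extractor first, then the universal parser."""
    extractor = fast.get(path.suffix.lower())
    if extractor is not None:
        try:
            text = extractor(path)
            # Assumption: an empty or whitespace-only result is treated as failure.
            if text.strip():
                return text
        except Exception:
            # Any error in the fast path leads to the fallback below.
            pass
    return fallback(path)
```

Passing the extractors in as plain callables keeps the dispatcher independent of any particular parsing library.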
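
The behavior described under "Token budget management" amounts to checking the projected usage before any state is changed. A minimal sketch follows, assuming a hypothetical DocumentBudget class and a crude four-characters-per-token estimate; a real deployment would more likely count tokens with the model's own tokenizer.

```python
from typing import Callable, Dict


def estimate_tokens(text: str) -> int:
    """Crude placeholder estimate: roughly four characters per token."""
    return max(1, len(text) // 4) if text else 0


class DocumentBudget:
    """Tracks the token cost of loaded documents against a fixed limit."""

    def __init__(self, limit: int, counter: Callable[[str], int] = estimate_tokens) -> None:
        self.limit = limit
        self._counter = counter
        self._docs: Dict[str, int] = {}

    @property
    def used(self) -> int:
        """Tokens currently consumed by all loaded documents."""
        return sum(self._docs.values())

    def try_add(self, name: str, text: str) -> bool:
        """Add a document if it fits; on rejection the state is left unchanged."""
        cost = self._counter(text)
        # Re-uploading a document with the same name replaces its old cost.
        projected = self.used - self._docs.get(name, 0) + cost
        if projected > self.limit:
            return False
        self._docs[name] = cost
        return True
```

Because the check happens before the dictionary is modified, a rejected upload cannot leave the budget in a half-updated state.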
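
The filtering described under "Reasoning filtering during streaming" has to cope with markers that are split across stream chunks. One possible approach is sketched below; filter_reasoning, the status text, and the buffering strategy are assumptions made for illustration and are not taken from LLM-Chat itself.

```python
from typing import Iterable, Iterator

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"
STATUS = "[thinking...]"  # hypothetical stand-in for the compact status indicator


def _partial_suffix_len(text: str, tag: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of tag."""
    for n in range(min(len(text), len(tag) - 1), 0, -1):
        if tag.startswith(text[-n:]):
            return n
    return 0


def filter_reasoning(chunks: Iterable[str]) -> Iterator[str]:
    """Yield visible text only; each reasoning segment becomes one STATUS item."""
    buffer = ""
    in_think = False
    for chunk in chunks:
        buffer += chunk
        while True:
            tag = CLOSE_TAG if in_think else OPEN_TAG
            idx = buffer.find(tag)
            if idx == -1:
                break
            if not in_think:
                if idx:
                    yield buffer[:idx]  # visible text before the opening marker
                yield STATUS
            buffer = buffer[idx + len(tag):]
            in_think = not in_think
        # Hold back a trailing fragment that might be the start of the next marker.
        tag = CLOSE_TAG if in_think else OPEN_TAG
        keep = _partial_suffix_len(buffer, tag)
        cut = len(buffer) - keep
        safe, buffer = buffer[:cut], buffer[cut:]
        if safe and not in_think:
            yield safe
    # Flush whatever visible text is left once the stream ends.
    if buffer and not in_think:
        yield buffer
```

For example, feeding the chunks "Hello <th", "ink>secret ste", "ps</thi", and "nk> world" yields "Hello ", the status indicator, and " world"; the reasoning text never reaches the output.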