Features¶

LLM-Chat provides a free-text chat with optional enrichment by documents and images. Inputs are accepted through a unified upload mechanism and processed automatically by file type. Loaded documents remain available across all sessions; images are sent as visual content of the respective message. Predefined role profiles, prompt templates, and a Word export round out the feature set.

Use cases¶

Summarize and critically reflect on documents: Longer texts are loaded and questioned in dialog about main statements, lines of argument, and gaps.
Extract data and references from documents: Specific numbers, facts, references, or citations are pulled from loaded documents and returned in structured form.
Explain images and diagrams: Diagrams, screenshots, or photographed blackboards are attached to a message and interpreted in the context of the question asked.
Translate and revise texts: Texts are translated between German and English or revised for style, grammar, and clarity.
Compare multiple documents: Several documents are loaded in parallel and used for structured comparisons — for instance, to contrast concepts, methods, or sources.
Continue research across sessions: Research can be split across several topical sessions while the document pool remains available across sessions.

At a glance¶

Connection to an OpenAI-compatible LLM endpoint with pre-configured profiles for Qwen3, Qwen3.5, Kimi, and GLM, plus a default profile for further models.
Support for 14 document formats (PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, HTML, HTM, TXT, MD, RST, CSV, RTF, PY) and 4 image formats (JPG, PNG, GIF, WEBP).
Streaming responses with reasoning-tag filtering and a stop function during generation.
Management of multiple chat sessions with auto-generated titles; loaded documents are available across sessions.
Four role profiles for the system prompt and 13 thematic prompt templates, of which four are document-specific.
Token-budget display, deduplication on document upload, and controlled rejection when limits are exceeded.
Word export of the chat history with full markdown and LaTeX rendering.

Chat and dialog control¶

The chat is the core of the application. Responses are streamed so that output becomes visible immediately. For models using <think> notation, internal reasoning segments are removed from the output and replaced during processing by a compact status indicator.

Streaming and stop function: Model responses are output token by token; an ongoing generation can be aborted at any time.
System prompt with role profiles: Four predefined profiles (factual assistant, tutor, inspirational, empty) are selectable and freely editable afterwards.
Prompt templates: Clickable templates for summarization, analysis, simple explanation, translation (DE/EN), style revision, comparison, brainstorming, and step-by-step instructions. When documents are loaded, additional document-specific templates appear (What is it about?, Numbers & facts, Critical reflection, Literature & sources).
Auto title generation: After the first message of a session, a short title in the form "Action word: Topic" is generated.
Markdown and LaTeX display: Responses are rendered with markdown formatting; LaTeX formulas are rendered inline ( $…$ ) and as blocks ($$…$$).
Date hint: The current date is automatically included as part of the system prompt.

Data connection¶

LLM-Chat connects to a single external data source: a language-model server.

OpenAI-compatible LLM endpoint: The application addresses a server reachable through the OpenAI Chat Completions API (such as a vLLM or compatible instance). Connection parameters such as base URL, API key, and model name are configurable through environment variables. Pre-configured model profiles exist for Qwen3, Qwen3.5, Kimi, and GLM; further models are supported through a default profile. Each profile maintains sampling parameters (temperature, top-p, top-k, min-p, penalty values) and model-specific reasoning switches (chat_template_kwargs), selected by model name or explicitly through a configuration variable.

Document processing¶

Documents are extracted to plain text on the server and added to the system prompt as a persistent part of the session context. Processing follows a two-stage strategy.

Fast extractors: For each format, a lightweight, format-specific extractor is tried first — pdfminer.six for PDF, python-docx for DOCX (including tables), python-pptx for PPTX (including slide structure and tables), openpyxl for XLSX, a dedicated HTML parser for HTML/HTM, and direct reading for plain text formats (TXT, MD, RST, PY, CSV, RTF).
Universal fallback: If the fast extractor fails or no fast extractor exists for the format (DOC, PPT, XLS), processing falls back to unstructured with format-specific partitioners. For PDF, configuration allows switching between a fast variant and a high-resolution layout analysis.
Cleanup: After extraction, repeated empty lines and whitespace are reduced, page numbers are optionally removed, and overly short paragraphs are filtered out; headings are recognized and kept.
Token estimation: For each document the token requirement is approximated (about three characters per token for German) and checked against the available context budget.

Image processing¶

Images are prepared for multimodal input to the language model.

Validation and conversion: Format and file size are checked (default limit 20 MB). EXIF orientation is corrected, and RGBA/LA/P modes are converted to RGB against a white background.
Scaling: Images are scaled to a maximum edge length when needed (default 2048 px).
Transmission: Images are passed as base64-encoded JPEG data inside the image_url structure expected by the OpenAI API.
Multiple images per message: Several images can be passed to the model together within a single message, for example for comparative analyses.

Session and context management¶

Multiple chat sessions: Up to 20 sessions are kept per browser tab and can be switched in the sidebar.
Global document context: Loaded documents are available regardless of the active session; switching sessions or starting a new chat does not remove them.
Per-tab state: Each browser tab gets its own application instance; reloading the page resets the state.
Persistent images in history: Processed images are cached for display in the chat history so they remain visible after a session is reopened.

Quality assurance¶

Deduplication: When a document with an identical filename is loaded again, it replaces the existing one; the token budget is adjusted accordingly.
Token-limit check: Before adding any document, the remaining budget is checked. If the limit would be exceeded, the document is rejected in a controlled way and the user is notified.
Document-count limit: A configurable upper limit (default 25 documents) prevents unlimited accumulation of context sources.
Reasoning filtering: <think> segments are reliably detected even during character-by-character streaming (including partial tag prefixes) and removed from the final output.
Robust encoding handling: For text files, multiple character sets (UTF-8, Latin-1, CP1252) are tried in order to avoid read errors.
Input validation against XSS: Data originating from the user (filenames, IDs) is HTML-encoded for display in the sidebar.
Meaningful error messages: Connection, model, or load errors of the language-model server are returned as clearly worded, German-language hints.

Export¶

Word export: The chat history of the current session is exported as a .docx file. The markdown renderer covers headings (H1–H4), bold/italic/strikethrough text, inline and block code, ordered and unordered lists with nesting, tables, blockquotes, horizontal rules, and hyperlinks.