Features¶

Talk to Your Documents provides a browser-based workspace in which documents are uploaded, automatically prepared, and then queried via chat. The functional scope is organised around document processing, language-model integration with source references, session and export functions, and mechanisms for ensuring the quality of the returned responses.

Use cases¶

Structured summaries. A coherent summary with a clear structure is produced from one or more documents — at varying levels of detail on demand (bullet list, short overview, chapter-by-chapter description).
Extraction of structured content. Specific figures, data, facts, mathematical formulas, recommendations, or conclusions can be pulled from the documents directly, without manually searching through the original file.
Capture of references and citations. Bibliographic references, source citations, and cross-references within a document are listed in full — useful when preparing scholarly work.
Comparison of multiple documents. Content from different versions or thematically related documents is contrasted and checked for differences, contradictions, or inconsistencies.
Critical reflection. On demand, the model produces a critical examination of the content and identifies positive and negative aspects; multi-step approaches can be controlled via the corresponding prompts.
Detail research. Specific individual questions are answered directly from the document content, with a reference to the relevant passage.

At a glance¶

Processing of up to 25 documents simultaneously within the same session context
Support for common office document formats and structured text formats (PDF, Word, Excel, PowerPoint, plain text, Markdown, HTML, CSV, RTF)
Integration with OpenAI-API-compatible language models (external or locally hosted)
Source references with paragraph and page anchors, optional toggle
Streamed output with cancellation, real-time display of token consumption and context utilisation
Export of chat history as a formatted Word document
Configuration via CLI arguments, environment variables, or Docker Compose

Document processing¶

Uploaded documents are run through a unified processing pipeline before being passed to the language model.

PDF. Extraction of body text, headings, and page information. In addition, interactive form fields (AcroForm), which are typically ignored by standard PDF parsers, are read out. Page numbers are captured for later source references.
Word, Excel, PowerPoint. Reading of Word documents (.docx, .doc), Excel workbooks (.xlsx, .xls), and PowerPoint slides (.pptx, .ppt), including table and list structures.
Structured text formats. Direct processing of Markdown, HTML, plain text, RTF, CSV, and source code files.
Cleaning. Recurring headers, footers, page numbers, watermarks, and typographic separator lines are detected via rule-based patterns and frequency analysis, and removed.
Deduplication. Identical content is detected via hash comparison; similar content is detected via fuzzy matching with a configurable similarity threshold (default 0.95) and removed. In addition, structural repetitions (such as recurring table headers) are reduced.
Structural conversion. The cleaned content is converted to uniform Markdown, preserving tables, lists, headings, and code blocks where applicable.

Connectors and data sources¶

The application accesses only content that a user actively uploads. External services are contacted exclusively for language-model inference.

Local file upload. Upload through the web interface from the user's file system; up to 25 files per session. Supported extensions: .pdf, .docx, .doc, .xlsx, .xls, .pptx, .ppt, .txt, .md, .rst, .html, .htm, .csv, .rtf, .py.
Language-model backend (external). The application communicates with a configurable inference endpoint via the OpenAI API protocol. Both commercial providers such as the OpenAI API and locally hosted servers, e.g. Ollama or vLLM, can be used. Endpoint URL, model name, and API key are freely configurable.

Source references and traceability¶

Before being handed to the language model, every sufficiently long paragraph is assigned a unique reference ID of the form [P1], [P2], [P3] … These markers are inserted into the document text and passed to the model via a system prompt. With source display enabled, responses contain the references inline; the application then lists the cited paragraphs with a preview (up to 300 characters), document name, and — where available — page number. The source display can be toggled per request.

Session and context management¶

Token control. When a document is added, its token consumption is estimated and checked against the context limit (250,000 tokens by default, with a safety margin). Requests that would exceed the context window are rejected.
Session isolation. Each browser session receives its own UUID; documents and chat histories of different sessions are kept separate.
Sessionless storage. Uploaded content and chat histories are kept exclusively in the server's memory and discarded once the session ends. Persistence is created only through the explicitly user-initiated Word export.

Chat and output¶

Streaming. Responses are streamed token by token and updated continuously in the interface.
Cancellation. A response in progress can be cancelled at any time via a stop button.
Mathematical notation. LaTeX expressions are rendered inline and as blocks directly in the chat view.
Example prompts. A built-in collection of typical tasks (summarisation, critical reflection, version comparison, extraction of figures or formulas) eases initial use.
Word export. The full chat history of a session can be exported as a .docx file; source references are included.

Configuration¶

All settings — language-model endpoint, model name, token limits, maximum number of documents, cleaning and deduplication options, default for source display, server port, and sub-path deployment behind a reverse proxy — can be controlled via environment variables or command-line parameters.