
Talk to Your Documents

Talk to Your Documents is a locally deployable web application for querying multiple documents at once in natural language. Instead of relying on vector search and chunking, the application passes the full document content to a language model with a large context window (up to 250,000 tokens). An upstream processing pipeline cleans, deduplicates, and structures the content; a reference system tags every sufficiently long paragraph with a unique source identifier so that answers can be traced back to their sources.
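The reference system can be illustrated with a minimal sketch. It is not the application's actual implementation: the names `Passage`, `tag_paragraphs`, and `resolve_references`, the blank-line paragraph split, and the 40-character minimum length are all assumptions made for illustration. Only the `[P1]`, `[P2]`, … marker format and the resolution to document name and page number are taken from the description on this page.

```python
import re
from dataclasses import dataclass

MIN_PARAGRAPH_CHARS = 40  # assumed threshold; the real cut-off is not documented


@dataclass(frozen=True)
class Passage:
    ref_id: str
    document: str
    page: int
    text: str


def tag_paragraphs(pages_by_document, min_chars=MIN_PARAGRAPH_CHARS):
    """Prefix each sufficiently long paragraph with a [P<n>] marker.

    pages_by_document maps a document name to a list of page texts.
    Returns the tagged context string and an index from ID to Passage.
    """
    index = {}
    blocks = []
    counter = 0
    for document, pages in pages_by_document.items():
        for page_number, page_text in enumerate(pages, start=1):
            # Hypothetical rule: paragraphs are separated by blank lines.
            for paragraph in re.split(r"\n\s*\n", page_text):
                paragraph = paragraph.strip()
                if not paragraph:
                    continue
                if len(paragraph) < min_chars:
                    blocks.append(paragraph)  # kept in context, but untagged
                    continue
                counter += 1
                ref_id = f"P{counter}"
                index[ref_id] = Passage(ref_id, document, page_number, paragraph)
                blocks.append(f"[{ref_id}] {paragraph}")
    return "\n\n".join(blocks), index


def resolve_references(answer, index):
    """Return the passages cited in a model answer, in order of first mention."""
    cited = []
    for ref_id in re.findall(r"\[(P\d+)\]", answer):
        # Unknown IDs (for example, invented by the model) are skipped.
        if ref_id in index and ref_id not in cited:
            cited.append(ref_id)
    return [index[ref_id] for ref_id in cited]
```

In this sketch the tagged context is what would be sent to the model, while the index stays on the application side and is used afterwards to turn markers in the answer into document names and page numbers.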

At a glance

  • Query up to 25 documents of various formats simultaneously, with no manual preparation required
  • Trace specific statements back to their sources — answers include paragraph and page references
  • Compare content across multiple documents and check for inconsistencies
  • Work securely — documents and chat histories remain only in session memory and are discarded once the session ends
  • Export chat sessions as a formatted Word document
  • Monitor context utilisation and token consumption at any time
  • Choose the language model endpoint freely (commercial API or locally hosted model)

Highlights

Compared to a direct prompt to a language model or a simple script, the application addresses the factors that determine the quality of document-based queries in practice: completeness of the captured content, cleanness of the material handed to the model, and traceability of the responses.

  • Long context instead of RAG. Rather than splitting documents into chunks and applying semantic pre-selection (retrieval), the full document content is passed to the language model in a context window of up to 250,000 tokens. This eliminates the risk of relevant passages being filtered out by an upstream selection step.
  • Multi-stage processing pipeline. Before being handed to the model, every document passes through a chain of steps: text extraction; removal of headers, footers, page numbers, and watermarks; exact and fuzzy deduplication; and structure-preserving Markdown conversion. This reduces noise in the context and improves the quality of responses.
  • Traceable source references. Every sufficiently long paragraph is assigned a unique reference ID ([P1], [P2], …). The model is instructed to include these markers in its responses; the application then resolves them to text passages with the document name and page number.
  • Multi-document processing. Up to 25 documents are managed within a single session. Their content can be analysed jointly and inconsistencies between documents can be uncovered explicitly.
  • Broad format coverage. A unified extraction layer handles PDF (including interactive form fields), Word, Excel, PowerPoint, plain text, Markdown, HTML, CSV, and further formats. Manual pre-conversion is not required.
  • Privacy through session-only processing. Uploaded documents and chat histories are kept only in the memory of the active session and discarded when it ends. No data is persisted on the server; persistent copies are created only through the user-initiated Word export.
  • Pluggable language model backend. Any inference endpoint compatible with the OpenAI API can be used — commercial providers as well as locally hosted models (e.g. via Ollama or vLLM). The selection is made through configuration parameters and requires no code changes.
  • Streaming and cancellation. Responses are streamed token by token and can be cancelled manually during generation, which is particularly useful for long answers.
  • Context monitoring. Token consumption and context-window utilisation are visible in the interface at all times; a rule-based check blocks further uploads once the limit is reached.
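
The upload gate described under context monitoring might look roughly like the following sketch. The 250,000-token limit and the 25-document cap come from this page; everything else is an assumption for illustration, including the names `ContextBudget`, `try_add`, and `estimate_tokens`, the four-characters-per-token estimate, and the choice to reject an upload that would exceed the limit rather than one that arrives after it is reached.

```python
from dataclasses import dataclass, field

CONTEXT_LIMIT_TOKENS = 250_000  # context window size stated on this page
MAX_DOCUMENTS = 25              # per-session document cap stated on this page


def estimate_tokens(text: str) -> int:
    """Crude assumption: about 4 characters per token, rounded up.

    A real deployment would use the tokenizer of the configured model.
    """
    return (len(text) + 3) // 4


@dataclass
class ContextBudget:
    limit: int = CONTEXT_LIMIT_TOKENS
    max_documents: int = MAX_DOCUMENTS
    used: int = 0
    documents: dict = field(default_factory=dict)  # name -> token cost

    @property
    def utilisation(self) -> float:
        """Fraction of the context window currently in use (0.0 to 1.0)."""
        return self.used / self.limit

    def try_add(self, name: str, text: str) -> bool:
        """Register a document if it fits; return False to block the upload."""
        if len(self.documents) >= self.max_documents:
            return False
        cost = estimate_tokens(text)
        if self.used + cost > self.limit:
            return False
        self.documents[name] = cost
        self.used += cost
        return True
```

A caller would invoke `try_add` after extraction and cleaning, since the cleaned text is what occupies the context window, and would display `used` and `utilisation` in the interface.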