Talk to Your Documents¶

Talk to Your Documents is a locally deployable web application for querying multiple documents at once in natural language. Instead of relying on vector search and chunking, the application passes the full document content to a language model with a large context window (up to 250,000 tokens). An upstream processing pipeline cleans, deduplicates, and structures the content; a downstream reference system tags every paragraph with a unique source identifier.

At a glance¶

Query up to 25 documents of various formats simultaneously, with no manual preparation required
Trace specific statements back to their sources — answers include paragraph and page references
Compare content across multiple documents and check for inconsistencies
Work securely — documents and chat histories remain only in session memory and are discarded once the session ends
Export chat sessions as a formatted Word document
Monitor context utilisation and token consumption at any time
Choose the language model endpoint freely (commercial API or locally hosted model)

Highlights¶

Compared to a direct prompt to a language model or a simple script, the application addresses the factors that determine the quality of document-based queries in practice: completeness of the captured content, cleanness of the material handed to the model, and traceability of the responses.

Long context instead of RAG. Rather than splitting documents into chunks and applying semantic pre-selection (retrieval), the full document content is passed to the language model in a context window of up to 250,000 tokens. This eliminates the risk of relevant passages being filtered out by an upstream selection step.
Multi-stage processing pipeline. Before being handed to the model, every document is run through a chain of extraction, removal of headers, footers, page numbers and watermarks, exact and fuzzy deduplication, and structure-preserving Markdown conversion. This reduces noise in the context and improves the quality of responses.
Traceable source references. Every sufficiently long paragraph is assigned a unique reference ID ([P1], [P2], …). The model is instructed to include these markers in its responses; the application then resolves them to text passages with the document name and page number.
Multi-document processing. Up to 25 documents are managed within a single session. Their content can be analysed jointly and inconsistencies between documents can be uncovered explicitly.
Broad format coverage. A unified extraction layer handles PDF (including interactive form fields), Word, Excel, PowerPoint, plain text, Markdown, HTML, CSV, and further formats. Manual pre-conversion is not required.
Privacy by sessionless processing. Uploaded documents and chat histories are kept only in the memory of the active session and discarded when it ends. No data remains on the server; persistent copies are created only through the user-initiated Word export.
Pluggable language model backend. Any inference endpoint compatible with the OpenAI API can be used — commercial providers as well as locally hosted models (e.g. via Ollama or vLLM). The selection is made through configuration parameters and requires no code changes.
Streaming and cancellation. Responses are streamed token by token and can be cancelled manually during generation, which is particularly useful for long answers.
Context monitoring. Token consumption and context-window utilisation are visible in the interface at all times; further uploads are blocked rule-based once the limit is reached.