Talk to Documents - User Documentation

Purpose

Talk to Documents is a local tool for analysing documents through dialogue. It lets you ask questions about uploaded documents and receive precise answers with references to the source passages.

The tool is built on large language models with a context window of 250,000 tokens. Unlike conventional retrieval-based approaches, documents are not split into small fragments or stored in a vector database. Instead, the complete content of the documents is passed directly to the language model, which preserves the overall context and enables more precise answers.
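The difference from fragment-based retrieval can be illustrated with a minimal sketch. It assumes a simple prompt layout, a hypothetical `build_prompt` helper and a rough estimate of four characters per token; the tool's actual prompt format and tokenizer are not documented here.

```python
from __future__ import annotations

# Hypothetical sketch of the "full context" principle: every document is sent
# to the model in full, with no chunking and no vector database.
CONTEXT_LIMIT = 250_000  # tokens, as stated in this documentation


def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); a real tool would use the model's own tokenizer."""
    return (len(text) + 3) // 4


def build_prompt(documents: dict[str, str], question: str) -> str:
    """Concatenate the complete text of every document, followed by the question."""
    sections = [f"=== Document: {name} ===\n{text}" for name, text in documents.items()]
    sections.append(f"=== Question ===\n{question}")
    prompt = "\n\n".join(sections)
    if estimate_tokens(prompt) > CONTEXT_LIMIT:
        raise ValueError(f"Prompt exceeds the context limit of {CONTEXT_LIMIT:,} tokens")
    return prompt
```

Because nothing is filtered out before the model sees it, the only hard constraint in this sketch is the size check; a retrieval-based system would instead select a handful of matching fragments at this point.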

Features

The tool offers the following core functions:

Multi-document processing: Simultaneous analysis of up to 10 documents with automatic management of the available context
Format diversity: Support for PDF, Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt), text files (.txt, .md, .rst), HTML, CSV and RTF
Intelligent content preparation: Automatic removal of page numbers, headers and footers, and duplicate content
Source references: Answers can be accompanied by specific references to the corresponding text passages in the original documents
Streaming output: Answers appear in real time as they are generated, so there is no need to wait for the complete response
Word export: Save the entire chat history, including source references, as a formatted Word document
Context monitoring: Clear display of token usage and remaining capacity
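The content preparation step can be approximated with simple heuristics. The following sketch is an assumption, not the tool's documented algorithm: it treats lines consisting only of a page number as page numbers, treats a first or last line that recurs on at least half of the pages as a running header or footer, and drops paragraphs that repeat verbatim. The function name `clean_pages` is hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# A line that holds nothing but a page number, e.g. "12", "- 4 -" or "Page 3 of 10".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?-?\s*\d+\s*-?\s*(?:(?:/|of)\s*\d+)?\s*$", re.IGNORECASE)


def clean_pages(pages: list[str]) -> str:
    """Remove page numbers, running headers/footers and duplicate paragraphs (heuristic sketch)."""
    # Header/footer candidates: the first and last non-empty line of each page.
    edge_counts = Counter()
    for page in pages:
        lines = [line.strip() for line in page.splitlines() if line.strip()]
        if lines:
            edge_counts.update({lines[0], lines[-1]})  # a set, so one-line pages count once
    # Treat a candidate as a running header/footer if it recurs on at least half the pages.
    threshold = max(2, (len(pages) + 1) // 2)
    running = {line for line, count in edge_counts.items() if count >= threshold}

    seen = set()
    kept = []
    for page in pages:
        for paragraph in re.split(r"\n\s*\n", page):
            lines = [line.strip() for line in paragraph.splitlines()]
            lines = [line for line in lines
                     if line and line not in running and not PAGE_NUMBER.match(line)]
            text = " ".join(lines)
            if text and text not in seen:  # skip verbatim duplicate paragraphs
                seen.add(text)
                kept.append(text)
    return "\n\n".join(kept)
```

Heuristics like these can misfire, for example by removing a line that consists only of a year such as "2024".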

Usage

Step 1: Upload documents

Click the ‘Upload documents’ area and select up to 10 files in any of the supported formats listed above. Note that the combined size of all documents must not exceed the context limit of 250,000 tokens.
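A selection can be checked against the file limit and the supported formats before anything is processed. In this sketch the extension set is derived from the list of supported formats (with HTML assumed to use `.html`), and `validate_selection` is a hypothetical helper. The 250,000-token limit cannot be checked at this point, because the token count is only known once the text has been extracted.

```python
from __future__ import annotations

from pathlib import Path

MAX_DOCUMENTS = 10
# Extensions taken from the list of supported formats; ".html" for HTML is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
    ".txt", ".md", ".rst", ".html", ".csv", ".rtf",
}


def validate_selection(filenames: list[str]) -> list[str]:
    """Return human-readable problems with a file selection; an empty list means it is acceptable."""
    problems = []
    if len(filenames) > MAX_DOCUMENTS:
        problems.append(f"{len(filenames)} files selected; at most {MAX_DOCUMENTS} are allowed")
    for name in filenames:
        # Compare case-insensitively so that "Report.PDF" is accepted.
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"Unsupported format: {name}")
    return problems
```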

Step 2: Start processing

After selecting the files, click ‘Process documents’. The system extracts the text content, removes redundant elements and adds reference markers. The ‘Context overview’ area then shows:

A list of the processed documents with the respective number of tokens
The total number of tokens for all loaded documents
The percentage utilisation of the available context
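The figures in the ‘Context overview’ follow directly from the per-document token counts. A minimal sketch, assuming the counts have already been produced by a tokenizer; `format_overview` is a hypothetical name and the output layout is an assumption.

```python
from __future__ import annotations

CONTEXT_LIMIT = 250_000  # tokens


def format_overview(token_counts: dict[str, int], limit: int = CONTEXT_LIMIT) -> str:
    """Return one line per document plus a total line with the share of the context used."""
    lines = [f"{name}: {count:,} tokens" for name, count in token_counts.items()]
    total = sum(token_counts.values())
    utilisation = 100 * total / limit  # percentage of the context window
    lines.append(f"Total: {total:,} tokens ({utilisation:.0f}% utilisation, {limit - total:,} remaining)")
    return "\n".join(lines)
```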

Step 3: Ask questions

Enter your question in the input field and click ‘Send’ or press Enter. The answer appears in the chat window while it is still being generated.

Step 4: Use source references

Activate the ‘With source references’ option in the settings to receive detailed references at the end of each answer. These show you:

Which document the information comes from
The specific text passage with context
The page number, if the document format has pages
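The reference markers added during processing can be modelled as consecutive paragraph labels that point back to a document, a page and the original text. The `[P42]` style is taken from the application example in this documentation, but numbering paragraphs consecutively across all documents is an assumption; `Passage`, `add_markers` and `resolve_references` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Passage:
    document: str
    page: int | None  # None for formats without pages, e.g. plain text
    text: str


def add_markers(documents: dict[str, list[tuple[int | None, str]]]) -> tuple[str, dict[str, Passage]]:
    """Label every paragraph [P1], [P2], ... across all documents and index the labels."""
    index: dict[str, Passage] = {}
    labelled = []
    for name, paragraphs in documents.items():
        for page, text in paragraphs:
            marker = f"P{len(index) + 1}"
            index[marker] = Passage(name, page, text)
            labelled.append(f"[{marker}] {text}")
    return "\n\n".join(labelled), index


MARKER = re.compile(r"\[(P\d+)\]")


def resolve_references(answer: str, index: dict[str, Passage]) -> list[tuple[str, Passage]]:
    """Return the passages cited in an answer, in order of first mention, ignoring unknown markers."""
    cited = dict.fromkeys(MARKER.findall(answer))  # de-duplicate while keeping order
    return [(marker, index[marker]) for marker in cited if marker in index]
```

Markers that appear in an answer but are missing from the index are dropped, so a reference the model invents cannot be displayed as if it came from a document.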

Important controls

Stop button: Interrupts the current response generation
Delete chat: Resets the conversation history but retains the loaded documents
Word export: Saves the entire chat as a .docx file
Copy button: Copies individual responses to the clipboard
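Streaming output and the stop button fit together: the answer arrives as a sequence of fragments, and stopping means no longer consuming that sequence. The sketch below simulates this with a plain iterator and a `threading.Event` standing in for the stop button; the tool's client code and model API are not described here, so `stream_answer` is hypothetical.

```python
import threading
from typing import Iterable, Iterator


def stream_answer(fragments: Iterable[str], stop: threading.Event) -> Iterator[str]:
    """Pass fragments through as they arrive; end early once `stop` has been set."""
    for fragment in fragments:
        if stop.is_set():  # the user pressed "Stop"
            break
        yield fragment


# Usage: simulate a user pressing "Stop" after two fragments have been displayed.
stop_button = threading.Event()
shown = []
for piece in stream_answer(iter(["The ", "answer ", "is ", "incomplete."]), stop_button):
    shown.append(piece)
    if len(shown) == 2:
        stop_button.set()
```

Using an event rather than a plain flag means the stop signal can safely be set from another thread, such as a UI handler.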

Special notes

The 250,000-token limit covers both the documents and the conversation history, so it can be reached with very large documents or long conversations. When this happens, a warning is displayed; reduce the number of documents or start a new session.
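Because the conversation history shares the limit with the documents, each new question has less room than the previous one. A minimal sketch of the check, assuming token counts are already known; `check_context` is hypothetical, and the `answer_reserve` parameter (room kept free for the answer being generated) is an assumption based on the way many model APIs count output against the same window. Its default value is arbitrary.

```python
from __future__ import annotations

CONTEXT_LIMIT = 250_000  # tokens, shared by documents and conversation history


def check_context(document_tokens: int, history_tokens: list[int], question_tokens: int,
                  answer_reserve: int = 4_000, limit: int = CONTEXT_LIMIT) -> tuple[bool, int]:
    """Return (fits, remaining): whether the next request fits and how many tokens are left."""
    used = document_tokens + sum(history_tokens) + question_tokens + answer_reserve
    remaining = limit - used
    return remaining >= 0, remaining


# Usage: large documents plus a few earlier turns push the next question over the limit.
fits, remaining = check_context(240_000, [5_000, 2_000], 500)
if not fits:
    print(f"Warning: context limit exceeded by {-remaining:,} tokens")
```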

Scanned PDF documents without extractable text are currently not supported. Text recognition (OCR) is planned for a future version.

Application example

Scenario: You have received three scientific papers on a research topic and want to compare their key findings.

Procedure:

You upload all three PDF files at the same time
After processing, you will see the following in the overview: Document A (45,000 tokens), Document B (38,000 tokens), Document C (52,000 tokens) – Total: 135,000 tokens (54% utilisation)
You ask the question: ‘What methods are used in the three studies and where are there methodological differences?’
The system generates a structured answer with references such as [P42], [P127], [P203]
In the references, you will see the exact passages from each document to which the statements refer
You export the entire chat as a Word document for your research documentation

Result: You receive a precise comparative analysis with traceable sources without having to work through the documents manually.
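The figures in this example can be checked by hand. The remaining capacity is what is left for questions, answers and the growing conversation history, since all of them count against the same 250,000-token limit.

```latex
\begin{aligned}
T_{\text{docs}} &= 45{,}000 + 38{,}000 + 52{,}000 = 135{,}000 \ \text{tokens} \\
\text{Utilisation} &= \frac{T_{\text{docs}}}{250{,}000} = \frac{135{,}000}{250{,}000} = 0.54 = 54\,\% \\
T_{\text{remaining}} &= 250{,}000 - T_{\text{docs}} = 115{,}000 \ \text{tokens}
\end{aligned}
```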

Recommendations for efficient use

Formulate precise questions: The more specific your question, the more targeted the answer
Use the sample prompts: The ‘Frequent prompts’ section contains tried-and-tested questions for various tasks
Activate source references: They let you verify the information and make the answers traceable
Be aware of the context limit: Start by loading only the documents that are really relevant
Structure complex analyses: For extensive tasks, ask several questions that build on each other instead of one very long query
Export important results: Save valuable analyses as Word documents
Use the stop button: If an answer is unsatisfactory, cancel the generation and rephrase the question

System limitations

The tool is subject to the following restrictions:

Number of documents: A maximum of 10 documents can be processed at the same time
Context size: The total size of all documents and the conversation history is limited to 250,000 tokens
Scanned documents: PDF files without extractable text (pure image documents) cannot currently be processed
Session persistence: All data is lost when the browser is closed; there is no automatic saving
Language model dependency: The quality of the answers depends on the language model used
No real-time access: The tool cannot retrieve current information from the internet; it works exclusively with the uploaded documents
Local processing: All operations apart from the language model calls take place locally; an internet connection is needed only to reach the LLM API

Summary

Talk to Documents transforms static documents into an interactive knowledge base. The tool combines the advantages of large context windows in modern language models with intelligent document preparation and precise source references.

For you as a user, this means that you stay in control of the analysis, can explore relationships within the material and receive answers you can trace back to their sources. The tool does not replace your professional expertise; it helps you extract information efficiently from extensive document collections.