User documentation: Document Translation System#

Purpose#

The Document Translation System is a web-based tool for translating texts and documents. It uses locally operated large language models via standardised interfaces and enables the processing of extensive documents while retaining their original structure and formatting.

The system works in several stages: documents are first converted into a uniform format, divided into translatable segments, translated in parallel and then merged again. This preserves context across segments while keeping processing fast.
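The stages described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names are hypothetical, segments are simply split on blank lines, and the LLM call is replaced by a placeholder that tags each segment.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(text: str) -> list[str]:
    """Split on blank lines so that each paragraph becomes one segment."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def translate_segment(segment: str) -> str:
    """Placeholder for the LLM call (assumption); it only tags the segment."""
    return f"[translated] {segment}"


def translate_document(text: str, max_workers: int = 4) -> str:
    """Split, translate concurrently, then merge in the original order."""
    segments = split_into_segments(text)
    # executor.map yields results in input order, so the merged output keeps
    # the original segment order even though the calls run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        translated = list(executor.map(translate_segment, segments))
    return "\n\n".join(translated)
```

Because the results are collected in input order, the merge step needs no extra bookkeeping to restore the document sequence.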

Range of functions#

The system offers the following core functions:

  • Text translation: Direct translation of entered text with configurable translation styles (professional, academic, simple, technical)
  • Document processing: Processing of structured documents in .pdf, .docx, .txt and .md formats with automatic format recognition
  • Automatic language recognition: Identification of the source language without manual selection
  • Glossary management: Definition of consistent translations for technical terminology
  • Intelligent chunking: Division of large documents into context-preserving translation units
  • Parallel processing: Simultaneous translation of multiple segments for faster processing
  • Comparison view: Side-by-side display of original and translation with statistics
  • Multi-format export: Output in Markdown, Word, HTML, PDF or JSON with optional metadata
  • Progress tracking: Real-time display of processing status

Operation#

System requirements and start#

Before first use, ensure that an OpenAI-compatible LLM server is running; Ollama with a suitable model is recommended. Configure the connection parameters in config/config.yaml, then start the application with python -m src.app and open the web interface at http://localhost:7860.
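To confirm outside the web interface that the server is reachable, a small script along the following lines may help. It is a sketch under assumptions: it expects the server to expose the OpenAI-style /models endpoint, and the base URL in the usage note is the usual default for a local Ollama installation, which may differ from your setup. The function names are illustrative and not part of the application.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_models(base_url: str, fetch: Callable[[str], dict] = fetch_json) -> list[str]:
    """Return the model ids reported by an OpenAI-compatible /models endpoint."""
    payload = fetch(base_url.rstrip("/") + "/models")
    return [entry["id"] for entry in payload.get("data", [])]


def server_is_ready(base_url: str, fetch: Callable[[str], dict] = fetch_json) -> bool:
    """True if the server answers and reports at least one model."""
    try:
        return len(list_models(base_url, fetch)) > 0
    except (OSError, ValueError, KeyError):
        # Connection errors are OSError subclasses, malformed JSON raises
        # ValueError, and an unexpected payload shape raises KeyError.
        return False
```

For a default local Ollama installation, the call would typically be server_is_ready("http://localhost:11434/v1"). The fetch parameter exists only so that the check can be exercised without a running server.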

Translation process#

Step 1: Connection check. Click on ‘Test Connection’ to verify the connection to the LLM server. A successful connection is a prerequisite for all translation processes.

Step 2: Prepare input. Choose between text input or document upload. For documents, activate the relevant processing options (table detection, heading detection, formatting preservation). Click on ‘Process Input’ to prepare the document.

Step 3: Set translation parameters. Select the source and target languages. ‘Auto-detect’ is available for the source language. Define the desired translation style and, if necessary, add glossary entries in the format source term, target term.

Step 4: Perform translation. Click on ‘Translate’. For large documents, progress is displayed in real time. The translation appears in the output field on the right.

Step 5: Export. Switch to the Export tab, select the desired output format and optional metadata. Download the finished file.
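The glossary format from Step 3 (source term, target term) can be illustrated with a short parsing sketch. It assumes one entry per line and splits on the first comma only; how the application itself parses entries and passes them to the model is not documented here, so the function names and the instruction wording are hypothetical.

```python
def parse_glossary(raw: str) -> dict[str, str]:
    """Parse lines of 'source term,target term' into a mapping."""
    glossary: dict[str, str] = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines between entries
        # Split on the first comma only, so target terms may contain commas.
        source, separator, target = line.partition(",")
        if not separator or not source.strip() or not target.strip():
            raise ValueError(f"Invalid glossary entry: {line!r}")
        glossary[source.strip()] = target.strip()
    return glossary


def glossary_instructions(glossary: dict[str, str]) -> str:
    """Format the glossary as instruction lines (hypothetical wording)."""
    return "\n".join(
        f'Translate "{source}" as "{target}".' for source, target in glossary.items()
    )
```

Checking entries before starting a long translation makes it easier to spot a missing comma or an empty target term early.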

Important controls#

  • Input Type Toggle: Switch between text and document mode
  • Processing Options: Control document analysis (only for document uploads)
  • Style Selector: Determine the translation style
  • Advanced Settings: Adjust chunk size and parallelisation
  • Comparison Tab: Analyse translation quality through direct comparison

Special notes#

The system processes documents up to 20 MB in size. Legacy formats such as .doc must be converted to .docx before uploading. Documents are split into token-based chunks with a configurable size limit (default: 2000 tokens per chunk). Processing speed depends on the performance of the LLM and the selected degree of parallelisation.
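The effect of the chunk limit can be pictured with a simplified sketch. It assumes that tokens are approximated by whitespace-separated words and that chunks are built by packing whole paragraphs greedily; the system's own tokeniser and chunking strategy may differ, and the function names are illustrative.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def chunk_paragraphs(text: str, max_tokens: int = 2000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of up to max_tokens.

    A single paragraph longer than the limit is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for paragraph in (part.strip() for part in text.split("\n\n")):
        if not paragraph:
            continue
        tokens = count_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs intact is one simple way to preserve context within a chunk; a larger limit yields fewer chunks that each carry more surrounding context.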

Application example#

Initial situation: You have a 15-page technical report in English in .docx format and need a German translation that retains all formatting, tables and technical terms.

Procedure:

  1. After successfully checking the connection, select ‘Document’ as the input type.
  2. Upload the file and activate ‘Extract Tables’, ‘Detect Headers’ and ‘Preserve Formatting’.
  3. After processing, select ‘English’ as the source language and ‘German’ as the target language.
  4. Select the ‘Technical’ style, which suits a technical report.
  5. Add project-specific terminology as a glossary (e.g. ‘Machine Learning,Maschinelles Lernen’).
  6. Start the translation and monitor the progress (approx. 47 chunks for 15 pages).
  7. Check the translation quality in the Comparison tab by comparing critical sections.
  8. Export the result as a Word document with metadata.

Result: After approximately 3 to 5 minutes of processing, you receive a German document with identical formatting and consistently translated technical terminology.

Recommendations for efficient use#

  • Use glossaries consistently for technical or subject-specific texts to ensure terminological consistency
  • Select the translation style appropriate for the document type: ‘Academic’ for scientific papers, ‘Professional’ for business communication
  • Increase the chunk size for documents with high contextual dependency (e.g. philosophical texts)
  • Enable all processing options for structured documents with tables and lists
  • Use the comparison view for quality control in critical translations
  • For large documents, increase the number of parallel workers, up to a maximum of 10 (depending on hardware)
  • Export to JSON for further automated processing of translation data
  • Check the LLM connection regularly during longer work sessions

System limitations#

The Document Translation System cannot perform the following tasks:

  • Translation of image-based PDFs without OCR pre-processing
  • Processing of password-protected documents
  • Translation of content in images or scanned documents
  • Preservation of complex layout elements such as multi-column typesetting or text frames
  • Real-time translation or streaming translation during input
  • Cloud-based translation (requires local LLM server)
  • Automatic quality assessment or correction suggestions
  • Translation of program code while preserving functionality
  • Processing of documents larger than 20 MB

The maximum number of tokens per LLM request is technically limited, and the translation quality depends directly on the performance of the language model used.
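Some of these limits can be checked before uploading. The following sketch covers only the file-level rules stated in this documentation (supported formats, the .doc conversion requirement and the 20 MB limit). It interprets 20 MB as 20 × 1024 × 1024 bytes, which is an assumption, and the function name is illustrative rather than part of the application.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB, read here as 20 MiB (assumption)
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: list[str] = []
    suffix = Path(filename).suffix.lower()
    if suffix == ".doc":
        problems.append("Convert legacy .doc files to .docx before uploading.")
    elif suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"Unsupported format: {suffix or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("File exceeds the 20 MB limit.")
    return problems
```

The check cannot detect password protection or image-only PDFs; those cases still have to be verified by opening the document.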

Summary#

The Document Translation System is a professional tool for the structure-preserving translation of large documents using locally operated language models. It is suitable for users who want to retain control over their data while benefiting from advanced translation features such as intelligent chunking and glossary management.

Your role as a user is to configure the translation parameters appropriately, provide relevant glossaries and check the quality of the results. The system supports you with automated processing and parallel translation, but the final responsibility for the accuracy of the content remains with you.