Features¶

The feature set covers the ingestion of texts and documents in multiple input formats, their translation with selectable style and address form, quality-assurance mechanisms for structure and terminology, and export into several output formats. Operation takes place via a two-column web interface with original and translation side by side.

Use cases¶

Translation of academic texts — Longer essays, reports or manuscripts are rendered into another language in academic style; tables, footnotes and structural elements are preserved.
Administrative documents and business correspondence — Letters, policies or forms from administrative contexts can be translated into German in the formal Sie form, or from German into other languages.
Translation into German for international students — Materials in English, Spanish or French can be rendered into German, either in the formal Sie form (for official documents) or in the informal du form (for tutorials and informal communication).
Technical documentation — Manuals and technical descriptions are translated in the "technical" mode; established English specialist terms can be retained unchanged on request, and code blocks are preserved structurally.
Translation into plain language — Complex source texts can be simplified during translation, so that content is available in a more accessible form.
Ad-hoc text translations — Shorter texts can be entered directly into the input field, without the need to create a file.

At a glance¶

Input formats: PDF, DOCX, TXT, Markdown, and direct text input
Export formats: Word (.docx), Markdown (.md), HTML
Twelve supported languages plus automatic language detection
Two translation modes: style-preserving or style-adapting with five target styles
Formal/informal (Sie/du) control for German target translations
Glossary support for user-defined term pairs
Structure preservation for tables, code blocks, lists and headings
Side-by-side view of original and translation with progress indicator

Input and output¶

Inputs are provided either through a file upload (drag-and-drop) or via a text field. The following input formats are supported:

PDF: Processed via pymupdf4llm, which produces an LLM-oriented Markdown extraction; if the library is not available, the application falls back to PyMuPDF. Tables, headings and lists are recognised during conversion.
DOCX (Word): Structured extraction via python-docx, including paragraph formatting, tables, lists and headings.
Markdown (.md) and plain text (.txt): Used as the Markdown source without further conversion; structural elements are derived directly from the Markdown markup.
Direct text input: Texts can be entered into a text field in the user interface and follow the same processing path as file inputs.

The following formats are available on the output side:

Word (.docx): Structured Word file with a configurable default font and the document structure preserved (headings, lists, tables).
Markdown (.md): Pure Markdown output, optionally with metadata about the translation run.
HTML: Output as an HTML document; usable as a print- and PDF-ready alternative.

LLM connection¶

The translation service is accessed via an LLM endpoint using the OpenAI Chat Completions format. The application is therefore compatible with locally or centrally hosted model servers — for example Ollama, vLLM, LM Studio, Text Generation Web UI, and the OpenAI API itself. Endpoint URL, model name, temperature, token limits, timeout and retry behaviour are freely configurable.

Translation modes and style control¶

The application offers two basic modes:

Direct translation: The style, tone and level of formality of the original are carried over to the target language as far as possible.
Style-adapted translation: The target text is additionally adapted to a style chosen by the user. The available styles are academic, professional (Sie), professional (du), plain language and technical.

For German as the target language, the chosen style also affects the address form (Sie or du). The underlying translation prompt is stored as a text file and contains detailed instructions on accuracy, structure preservation, address form, cultural adaptation, and the handling of proper names, URLs, numbers and acronyms.

Quality assurance¶

Several mechanisms safeguard the quality and completeness of long translations:

Token-aware chunking: Texts are split into sections whose length is calibrated to the language-specific characters-per-token ratio. Split points follow sentence and paragraph boundaries so that sentences are not cut.
Table-aware handling: Markdown tables are recognised as cohesive units; their structure and column count are validated before and after translation. Headers and data rows are translated separately and then reassembled faithfully.
Code-block preservation: Code blocks are recognised as such and treated separately during translation; only comments and identifiers are translated where appropriate, while the code itself remains unchanged.
Context propagation between sections: When translating a section, the LLM receives summaries of the most recently translated sections as well as the current heading as context. This keeps terminology, references and style consistent across the entire document.
Glossary application: Stored term pairs are passed to the LLM with every translation request and applied consistently throughout.
Retry logic with backoff: If a single translation request fails, it is retried with exponentially increasing delays before the overall run is aborted. Partially successful runs return as much of the translation as possible.
Rate limiting: A token-bucket mechanism limits the number of concurrent requests to the LLM endpoint, avoiding overload and throttling responses.
Validation of the Markdown output: The reassembled translation is checked for consistent structure before it is exported.
Progress indicator and status feedback: During the run, the current position, the type of section being processed (text, table, code block) and an estimated remaining time are displayed.

User interface¶

The user interface uses a two-column layout, with the original view on the left and the translation view on the right. The language selection includes a swap function that exchanges source and target language with a single click. The selected languages, mode and style are mirrored in URL parameters, so a configuration can be shared via a link or saved as a bookmark.