Architecture

The architecture is layered, with clearly separated responsibilities for the user interface, document processing, chunking, translation engine and LLM connection. Processing is fully asynchronous: individual translation requests are sent to the LLM endpoint in parallel and coordinated through a rate limiter. Its defining trait is a unified pipeline: whatever the input format, a conversion layer first turns the content into structured Markdown, so that all downstream steps operate on the same representation.

At a glance

Layered architecture: user interface → document processing → chunking → translation engine → LLM client
Unified Markdown intermediate representation for all input formats
Asynchronous execution via asyncio with a configurable number of workers
Rate limiter (token bucket) to coordinate requests to the LLM endpoint
Table- and code-block-aware chunking with boundary detection at sentence and paragraph borders
Context propagation between consecutive chunk translations
YAML-based configuration with Pydantic validation; standalone operation as a single Python process

Components and workflow

Layers and their roles

The application consists of the following main layers:

User interface (src/ui, src/app.py): Gradio-based web interface with input, output and control areas, complemented by export handlers and URL parameter tracking.
Document processing (src/document_processing): Format-specific processors (PDF, Word, text/Markdown) and a Markdown converter that maps the various inputs into a uniform Markdown representation.
Chunking (src/chunking): Token-based segmentation with boundary detection, table awareness and code-block detection.
Translation engine (src/translation): Table-aware engine with parallel execution, retry logic and per-section context management.
LLM client (src/llm_client.py): Asynchronous HTTP client for OpenAI-compatible endpoints with thread-safe session management and exponential backoff.
Configuration (src/config.py, config/): Pydantic-validated settings loaded from YAML files for LLM, chunking, parallelism, languages and export.

Data flow

```mermaid
flowchart TB
    User["User"] --> UI["User interface<br/>(Gradio)"]

    UI --> Dispatch{"Input?"}
    Dispatch -->|"File"| DocProc["Document processing"]
    Dispatch -->|"Text"| Direct["Direct ingestion"]

    DocProc --> PDF["PDF processor<br/>(pymupdf4llm / PyMuPDF)"]
    DocProc --> DOCX["Word processor<br/>(python-docx)"]
    DocProc --> MDTXT["Markdown / text<br/>processor"]

    PDF --> MDConv["Markdown converter"]
    DOCX --> MDConv
    MDTXT --> MDConv
    Direct --> MDConv

    MDConv --> Chunker["Chunker<br/>(token- and table-aware)"]

    Chunker --> Engine["Translation engine"]
    Engine --> Context["Context management<br/>(glossary, prior context, style)"]
    Engine --> RateLimiter["Rate limiter"]
    RateLimiter --> LLMClient["LLM client"]
    LLMClient --> LLM["OpenAI-compatible<br/>LLM endpoint"]
    LLM --> LLMClient
    LLMClient --> Engine

    Engine --> Reassembly["Reassembly<br/>(structure preservation, table reconstruction)"]
    Reassembly --> Export["Export handler"]
    Export --> Word["Word"]
    Export --> MD["Markdown"]
    Export --> HTML["HTML"]
    Word --> User
    MD --> User
    HTML --> User

    Config["YAML configuration"] -.-> Engine
    Config -.-> Chunker
    Config -.-> LLMClient
```

Workflow explanation

The user supplies inputs either as a file or as text. For file inputs, a factory selects the appropriate processor (PDF, DOCX or Markdown/text) based on the file extension. The processor produces a Markdown representation of the document; PDF inputs are preferentially processed via pymupdf4llm, which yields LLM-oriented Markdown output. Text inputs go directly into the Markdown converter.
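The extension-based processor selection can be sketched as a small lookup table. This is only an illustration: the function names (`process_pdf`, `process_docx`, `process_text`, `get_processor`) and the exact set of extensions are assumptions, and the processor bodies are placeholders rather than real extraction code.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical processors; each one would return the document as Markdown.
def process_pdf(path: Path) -> str:
    return f"<!-- Markdown extracted from PDF {path.name} -->"

def process_docx(path: Path) -> str:
    return f"<!-- Markdown extracted from Word file {path.name} -->"

def process_text(path: Path) -> str:
    return f"<!-- Markdown passed through from {path.name} -->"

# Extension -> processor mapping (the supported extensions are an assumption).
PROCESSORS: Dict[str, Callable[[Path], str]] = {
    ".pdf": process_pdf,
    ".docx": process_docx,
    ".md": process_text,
    ".txt": process_text,
}

def get_processor(path: Path) -> Callable[[Path], str]:
    """Select a processor by file extension, ignoring case."""
    suffix = path.suffix.lower()
    try:
        return PROCESSORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

A caller would then write something like `markdown = get_processor(path)(path)`. Keeping the mapping in one dictionary means a new format only needs one new entry.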

The Markdown representation is then passed to the chunker, which splits the content into token-based sections. Tables are treated as cohesive units; boundary detection ensures that split points follow sentence and paragraph borders. The result is a list of typed chunks (text, table, code block) with token counts and metadata.
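A simplified sketch of such a chunker follows. It assumes paragraph-level boundaries only (sentence-level splitting of oversized paragraphs is left out) and uses a whitespace word count as a stand-in for a real tokenizer such as tiktoken. The names `Chunk`, `split_blocks`, `classify` and `chunk_markdown` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker

@dataclass
class Chunk:
    kind: str      # "text", "table" or "code"
    content: str
    tokens: int

def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())

def split_blocks(markdown: str) -> List[str]:
    """Split on blank lines, but keep fenced code blocks in one piece."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if line.strip() == "" and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks

def classify(block: str) -> str:
    lines = block.splitlines()
    if lines[0].lstrip().startswith(FENCE):
        return "code"
    if all(line.lstrip().startswith("|") for line in lines):
        return "table"
    return "text"

def chunk_markdown(markdown: str, max_tokens: int) -> List[Chunk]:
    """Pack paragraphs up to max_tokens; tables and code blocks stay whole."""
    chunks: List[Chunk] = []
    buffer: List[str] = []
    buffer_tokens = 0

    def flush() -> None:
        nonlocal buffer, buffer_tokens
        if buffer:
            text = "\n\n".join(buffer)
            chunks.append(Chunk("text", text, count_tokens(text)))
            buffer, buffer_tokens = [], 0

    for block in split_blocks(markdown):
        kind, tokens = classify(block), count_tokens(block)
        if kind != "text":
            flush()                      # never merge text into a table/code chunk
            chunks.append(Chunk(kind, block, tokens))
        else:
            if buffer and buffer_tokens + tokens > max_tokens:
                flush()                  # split at a paragraph boundary
            buffer.append(block)
            buffer_tokens += tokens
    flush()
    return chunks
```

In this sketch a table or code block always becomes its own chunk, even if it exceeds `max_tokens`; how the real chunker treats oversized tables is not stated in the text.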

The translation engine orchestrates parallel translation of the chunks. Each chunk is paired with its own context object containing the glossary, the style, the address form, the current heading and summaries of preceding chunks. Requests are paced by a token-bucket rate limiter and sent by the LLM client to the OpenAI-compatible endpoint. Table chunks follow a specialised path: headers and data rows are translated separately and then reassembled into the original table structure. A failed request is retried with exponential backoff.
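The interplay of workers and the token-bucket rate limiter might look roughly like the following. This is a minimal sketch using only asyncio; `TokenBucket`, `translate_all` and `fake_llm_call` are hypothetical names, and the fake call merely stands in for the HTTP request to the endpoint.

```python
import asyncio
import time
from typing import List

class TokenBucket:
    """Async token bucket: `rate` tokens are refilled per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Waiters queue on the lock, so tokens are handed out in arrival order.
        async with self._lock:
            while True:
                now = time.monotonic()
                elapsed = now - self.updated
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep until roughly one full token has been refilled.
                await asyncio.sleep((1 - self.tokens) / self.rate)

async def fake_llm_call(chunk: str) -> str:
    await asyncio.sleep(0.01)       # placeholder for the HTTP round trip
    return chunk.upper()            # placeholder for the translation

async def translate_all(chunks: List[str], workers: int = 4,
                        rate: float = 5.0, burst: int = 5) -> List[str]:
    bucket = TokenBucket(rate, burst)        # created inside the running loop
    semaphore = asyncio.Semaphore(workers)   # caps concurrent requests

    async def translate(chunk: str) -> str:
        async with semaphore:
            await bucket.acquire()
            return await fake_llm_call(chunk)

    # gather returns results in input order, whatever the completion order.
    return list(await asyncio.gather(*(translate(c) for c in chunks)))
```

The semaphore bounds how many requests are in flight, while the bucket bounds how quickly new ones may start; the two limits are independent and would both come from configuration.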

Once all chunks have been translated, the engine reassembles them in their original order and passes the result to the export layer. The export layer renders the chosen output format (Word, Markdown or HTML) and provides it for download.
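As an illustration of reassembly, including the table path in which headers and data rows are translated separately, consider the sketch below. It assumes simple pipe tables without escaped pipes inside cells, and the helper names (`split_row`, `translate_table`, `reassemble`) are hypothetical.

```python
from typing import Callable, List, Tuple

def split_row(line: str) -> List[str]:
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]

def format_row(cells: List[str]) -> str:
    return "| " + " | ".join(cells) + " |"

def translate_table(table_md: str, translate: Callable[[str], str]) -> str:
    """Translate header and data cells; the separator row is copied unchanged."""
    lines = table_md.strip().splitlines()
    header, separator, rows = lines[0], lines[1], lines[2:]
    out = [format_row([translate(c) for c in split_row(header)]), separator]
    out += [format_row([translate(c) for c in split_row(r)]) for r in rows]
    return "\n".join(out)

def reassemble(results: List[Tuple[int, str]]) -> str:
    """Join translated chunks by original index; results may arrive out of order."""
    ordered = sorted(results, key=lambda item: item[0])
    return "\n\n".join(text for _, text in ordered)
```

Because only cell contents pass through `translate`, the column count and alignment row cannot be altered by the model in this sketch.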

Concurrency and robustness

The pipeline uses a single event loop per server process. The LLM client manages a pool of HTTP sessions with configurable size and keepalive duration and monitors the availability of the event loop; reusing sessions reduces connection overhead. A rate limiter caps the number of requests per second to the LLM endpoint. Timeouts, transient connection drops and rate-limit responses trigger retries with exponential backoff. If a chunk fails permanently, it is marked as "failed"; the remaining chunks are still returned, so that as much of the translation as possible is preserved.
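The retry and partial-failure behaviour can be sketched with plain asyncio. The technology overview lists tenacity for retries, so this hand-written loop is only an illustration of the idea, not the project's code. `ChunkResult`, `call_with_backoff` and `translate_chunks` are hypothetical names, and the set of retried exceptions is an assumption.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional

@dataclass
class ChunkResult:
    index: int
    status: str                   # "ok" or "failed"
    text: Optional[str] = None
    error: Optional[str] = None

async def call_with_backoff(func: Callable[[], Awaitable[str]], *,
                            attempts: int = 4, base_delay: float = 0.5,
                            max_delay: float = 8.0) -> str:
    """Retry transient errors, doubling the delay after each failed attempt."""
    for attempt in range(attempts):
        try:
            return await func()
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise                          # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter avoids synchronized retries across workers.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("attempts must be at least 1")

async def translate_chunks(chunks: List[str],
                           translate: Callable[[str], Awaitable[str]],
                           **retry_opts: float) -> List[ChunkResult]:
    async def one(index: int, chunk: str) -> ChunkResult:
        try:
            text = await call_with_backoff(lambda: translate(chunk), **retry_opts)
            return ChunkResult(index, "ok", text=text)
        except Exception as exc:
            # A permanently failing chunk is recorded, not propagated.
            return ChunkResult(index, "failed", error=repr(exc))

    return list(await asyncio.gather(*(one(i, c) for i, c in enumerate(chunks))))
```

Catching the exception inside `one` is what keeps a single bad chunk from cancelling the whole batch: every chunk yields a result object, and the caller decides how to present the failed ones.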

Configuration and operation

All settings are stored in YAML files (config/config.yaml, config/languages.yaml) and validated at runtime via Pydantic models. They cover the LLM endpoint, model, token limits, chunk sizes, worker count, rate limit, languages, available styles, logging and export options. Environment variables can override individual values. The application is started as an ordinary Python process (python main.py); the Gradio user interface is then available on a configurable port.
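A minimal sketch of YAML loading with Pydantic (v2) validation and one environment override is shown below. The field names, the example values and the variable `APP_LLM_MODEL` are all assumptions made for illustration; the actual schema and override names are not given in the text.

```python
import os
from typing import Mapping

import yaml
from pydantic import BaseModel, Field, ValidationError

class LLMSettings(BaseModel):
    endpoint: str
    model: str
    max_tokens: int = Field(gt=0)

class ParallelSettings(BaseModel):
    workers: int = Field(default=4, ge=1)
    requests_per_second: float = Field(default=5.0, gt=0)

class AppConfig(BaseModel):
    llm: LLMSettings
    parallel: ParallelSettings = Field(default_factory=ParallelSettings)

# Hypothetical file content; a real config/config.yaml may look different.
EXAMPLE_YAML = """
llm:
  endpoint: http://localhost:8000/v1
  model: example-model
  max_tokens: 4096
parallel:
  workers: 8
"""

def load_config(yaml_text: str, env: Mapping[str, str] = os.environ) -> AppConfig:
    data = yaml.safe_load(yaml_text) or {}
    # Hypothetical override: an environment variable replaces the YAML value.
    if "APP_LLM_MODEL" in env:
        data.setdefault("llm", {})["model"] = env["APP_LLM_MODEL"]
    # Raises pydantic.ValidationError if a value is missing or out of range.
    return AppConfig.model_validate(data)
```

Validating once at load time means a typo such as a negative worker count fails immediately with a `ValidationError`, instead of surfacing later inside the translation pipeline.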

Technology overview

User interface: Gradio (≥ 6.0)
Asynchronous HTTP client: aiohttp
PDF processing: pymupdf4llm (preferred), PyMuPDF (fallback)
Word processing: python-docx
Markdown and HTML processing: markdown2, beautifulsoup4, lxml
Configuration and validation: Pydantic (≥ 2.0), pyyaml
Retry logic: tenacity
Token counting: tiktoken
Language detection: langdetect
System monitoring: psutil
Language and runtime: Python (≥ 3.8), asyncio-based