
Architecture

The architecture follows a layered structure with clearly separated responsibilities for the user interface, document processing, chunking, translation engine and LLM connection. Processing is fully asynchronous: individual translation requests are sent to the LLM endpoint in parallel and paced by a rate limiter. A defining trait is the unified pipeline. Whatever the input format, a conversion layer first maps the content to a structured Markdown representation, so that all downstream steps operate on the same representation.

At a glance

  • Layered architecture: user interface → document processing → chunking → translation engine → LLM client
  • Unified Markdown intermediate representation for all input formats
  • Asynchronous execution via asyncio with a configurable number of workers
  • Rate limiter (token bucket) to coordinate requests to the LLM endpoint
  • Table- and code-block-aware chunking that places split points at sentence and paragraph boundaries
  • Context propagation between consecutive chunk translations
  • YAML-based configuration with Pydantic validation; standalone operation as a single Python process

Components and workflow

Layers and their roles

The application consists of the following main layers:

  • User interface (src/ui, src/app.py): Gradio-based web interface with input, output and control areas, complemented by export handlers and URL parameter tracking.
  • Document processing (src/document_processing): Format-specific processors (PDF, Word, text/Markdown) and a Markdown converter that maps the various inputs into a uniform Markdown representation.
  • Chunking (src/chunking): Token-based segmentation with boundary detection, table awareness and code-block detection.
  • Translation engine (src/translation): Table-aware engine with parallel execution, retry logic and per-section context management.
  • LLM client (src/llm_client.py): Asynchronous HTTP client for OpenAI-compatible endpoints with thread-safe session management and exponential backoff.
  • Configuration (src/config.py, config/): Pydantic-validated settings loaded from YAML files for LLM, chunking, parallelism, languages and export.

Data flow

flowchart TB
    User["User"] --> UI["User interface<br/>(Gradio)"]

    UI --> Dispatch{"Input?"}
    Dispatch -->|"File"| DocProc["Document processing"]
    Dispatch -->|"Text"| Direct["Direct ingestion"]

    DocProc --> PDF["PDF processor<br/>(pymupdf4llm / PyMuPDF)"]
    DocProc --> DOCX["Word processor<br/>(python-docx)"]
    DocProc --> MDTXT["Markdown / text<br/>processor"]

    PDF --> MDConv["Markdown converter"]
    DOCX --> MDConv
    MDTXT --> MDConv
    Direct --> MDConv

    MDConv --> Chunker["Chunker<br/>(token- and table-aware)"]

    Chunker --> Engine["Translation engine"]
    Engine --> Context["Context management<br/>(glossary, prior context, style)"]
    Engine --> RateLimiter["Rate limiter"]
    RateLimiter --> LLMClient["LLM client"]
    LLMClient --> LLM["OpenAI-compatible<br/>LLM endpoint"]
    LLM --> LLMClient
    LLMClient --> Engine

    Engine --> Reassembly["Reassembly<br/>(structure preservation, table reconstruction)"]
    Reassembly --> Export["Export handler"]
    Export --> Word["Word"]
    Export --> MD["Markdown"]
    Export --> HTML["HTML"]
    Word --> User
    MD --> User
    HTML --> User

    Config["YAML configuration"] -.-> Engine
    Config -.-> Chunker
    Config -.-> LLMClient

Workflow explanation

The user supplies input either as a file or as text. For file inputs, a factory selects the appropriate processor (PDF, DOCX or Markdown/text) based on the file extension. Each processor produces a Markdown representation of the document; PDF inputs are preferably processed with pymupdf4llm, which yields Markdown output tailored to LLMs, with PyMuPDF as the fallback. Text inputs go directly to the Markdown converter.
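An extension-based factory like the one described can be sketched as a small registry. This is a minimal sketch, assuming each processor is a plain function that takes a path and returns Markdown; the names `get_processor`, `_PROCESSORS` and the `process_*` functions are hypothetical, and the PDF and Word bodies are placeholders rather than real pymupdf4llm or python-docx calls.

```python
from pathlib import Path
from typing import Callable, Dict

# A processor takes a file path and returns the document as Markdown.
Processor = Callable[[Path], str]


def process_pdf(path: Path) -> str:
    # Placeholder body; the project prefers pymupdf4llm with PyMuPDF as fallback.
    return f"<!-- Markdown converted from PDF {path.name} -->"


def process_docx(path: Path) -> str:
    # Placeholder body; a real version would walk the python-docx document.
    return f"<!-- Markdown converted from Word {path.name} -->"


def process_text(path: Path) -> str:
    # Markdown and plain text are already usable as-is.
    return path.read_text(encoding="utf-8")


# Hypothetical registry mapping lower-case extensions to processors.
_PROCESSORS: Dict[str, Processor] = {
    ".pdf": process_pdf,
    ".docx": process_docx,
    ".md": process_text,
    ".markdown": process_text,
    ".txt": process_text,
}


def get_processor(path: Path) -> Processor:
    """Return the processor registered for the file's extension (case-insensitive)."""
    try:
        return _PROCESSORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix!r}") from None
```

Usage would be `get_processor(path)(path)`. Keeping the mapping in one dictionary means supporting a new format is a one-line registration rather than another branch in an `if`/`elif` chain.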

The Markdown representation is then passed to the chunker, which splits the content into token-based sections. Tables are treated as cohesive units, and boundary detection ensures that split points fall on sentence and paragraph boundaries. The result is a list of typed chunks (text, table, code block) with token counts and metadata.
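The chunking strategy can be illustrated with a simplified sketch. It assumes that tables are runs of lines starting with `|`, that fenced code blocks are also kept whole, and that a whitespace word count stands in for the tiktoken-based counter; `Chunk`, `split_blocks` and `chunk_markdown` are hypothetical names. Paragraphs are packed up to `max_tokens`, and only a paragraph that is too large on its own is split at sentence ends.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

FENCE = "`" * 3  # Markdown code-fence marker
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


@dataclass
class Chunk:
    kind: str  # "text", "table" or "code"
    content: str
    tokens: int
    metadata: dict = field(default_factory=dict)


def approx_tokens(text: str) -> int:
    # Stand-in for a tiktoken encoder: counts whitespace-separated words.
    return len(text.split())


def split_blocks(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (kind, text) blocks: fenced code, tables, paragraphs."""
    blocks: List[Tuple[str, str]] = []
    buf: List[str] = []
    kind = ""
    in_code = False

    def flush() -> None:
        nonlocal buf, kind
        if buf:
            blocks.append((kind, "\n".join(buf)))
        buf, kind = [], ""

    for line in markdown.splitlines():
        stripped = line.strip()
        if in_code:  # inside a fence: keep everything up to the closing fence
            buf.append(line)
            if stripped.startswith(FENCE):
                in_code = False
                flush()
        elif stripped.startswith(FENCE):
            flush()
            kind, in_code = "code", True
            buf.append(line)
        elif stripped.startswith("|"):  # consecutive pipe rows form one table
            if kind != "table":
                flush()
                kind = "table"
            buf.append(line)
        elif not stripped:  # a blank line ends a paragraph or table
            flush()
        else:
            if kind != "text":
                flush()
                kind = "text"
            buf.append(line)
    flush()
    return blocks


def chunk_markdown(markdown: str, max_tokens: int = 500,
                   count: Callable[[str], int] = approx_tokens) -> List[Chunk]:
    chunks: List[Chunk] = []
    current: List[str] = []
    current_tokens = 0

    def emit() -> None:
        nonlocal current, current_tokens
        if current:
            chunks.append(Chunk("text", "\n\n".join(current), current_tokens))
        current, current_tokens = [], 0

    for kind, text in split_blocks(markdown):
        n = count(text)
        if kind in ("table", "code"):  # never split tables or code blocks
            emit()
            chunks.append(Chunk(kind, text, n))
        elif current_tokens + n <= max_tokens:  # paragraph fits: keep packing
            current.append(text)
            current_tokens += n
        elif n <= max_tokens:  # start a new chunk at the paragraph boundary
            emit()
            current, current_tokens = [text], n
        else:  # oversized paragraph: fall back to sentence boundaries
            emit()
            piece: List[str] = []
            piece_tokens = 0
            for sentence in _SENTENCE_END.split(text):
                s = count(sentence)
                if piece and piece_tokens + s > max_tokens:
                    chunks.append(Chunk("text", " ".join(piece), piece_tokens))
                    piece, piece_tokens = [], 0
                piece.append(sentence)
                piece_tokens += s
            if piece:
                chunks.append(Chunk("text", " ".join(piece), piece_tokens))
    emit()
    for i, chunk in enumerate(chunks):
        chunk.metadata["index"] = i  # original position, used for reassembly
    return chunks
```

A table that exceeds `max_tokens` still becomes a single chunk in this sketch; the engine's separate header/row path is what keeps such chunks manageable.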

The translation engine orchestrates parallel translation of the chunks. Each chunk is paired with its own context object containing the glossary, the style, the address form, the current heading and summaries of preceding chunks. The requests are paced through a token-bucket rate limiter and sent by the LLM client to the OpenAI-compatible endpoint. Table chunks follow a specialised path: headers and data rows are translated separately and then reassembled into the original table structure. If a request fails, it is retried with exponential backoff.
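The per-chunk context object and its use in a prompt might look like the following sketch. The field names, the prompt wording and the helpers `context_for` and `build_prompt` are assumptions for illustration; the document does not say how summaries of preceding chunks are produced, so they are simply passed in.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class TranslationContext:
    """Hypothetical per-chunk context; the field names are illustrative."""
    target_language: str
    style: str = "neutral"
    address_form: str = "formal"
    heading: str = ""
    glossary: Dict[str, str] = field(default_factory=dict)
    prior_summaries: List[str] = field(default_factory=list)


def context_for(base: TranslationContext, index: int,
                headings: List[str], summaries: List[str]) -> TranslationContext:
    # Each chunk gets its own copy: the heading in effect at that chunk
    # plus summaries of all chunks before it.
    return replace(base, heading=headings[index],
                   glossary=dict(base.glossary),
                   prior_summaries=list(summaries[:index]))


def build_prompt(ctx: TranslationContext, chunk_text: str, max_prior: int = 2) -> str:
    lines = [
        f"Translate the following Markdown into {ctx.target_language}.",
        f"Style: {ctx.style}. Address form: {ctx.address_form}.",
        "Keep all Markdown structure unchanged.",
    ]
    if ctx.heading:
        lines.append(f"Current section: {ctx.heading}")
    # Only pass glossary entries whose source term occurs in this chunk.
    hits = {s: t for s, t in ctx.glossary.items() if s.lower() in chunk_text.lower()}
    if hits:
        lines.append("Glossary:")
        lines += [f"- {s} -> {t}" for s, t in sorted(hits.items())]
    if ctx.prior_summaries and max_prior > 0:
        lines.append("Preceding context:")
        lines += [f"- {s}" for s in ctx.prior_summaries[-max_prior:]]
    lines += ["", "Text:", chunk_text]
    return "\n".join(lines)
```

Giving every chunk its own copy of the context is what makes parallel execution safe: no request mutates state that another in-flight request reads.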

Once all chunks have been translated, the engine reassembles them in their original order and passes the result to the export layer. The export layer renders the chosen output format (Word, Markdown or HTML) and provides it for download.
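Parallel translation with ordered reassembly can be sketched with asyncio alone. In this sketch, which is an assumption about how the engine could be structured rather than its actual code, an `asyncio.Semaphore` caps concurrency at the configured worker count and `asyncio.gather` returns results in argument order regardless of completion order. `ChunkResult`, `translate_all`, `reassemble`, and the choice to fall back to the source text for a failed chunk are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ChunkResult:
    index: int
    source: str
    translation: Optional[str]
    status: str  # "ok" or "failed"


async def translate_all(chunks: List[str],
                        translate: Callable[[str], Awaitable[str]],
                        workers: int = 4) -> List[ChunkResult]:
    semaphore = asyncio.Semaphore(workers)  # at most `workers` requests in flight

    async def run(i: int, text: str) -> ChunkResult:
        async with semaphore:
            try:
                return ChunkResult(i, text, await translate(text), "ok")
            except Exception:
                # A permanent failure only affects this chunk.
                return ChunkResult(i, text, None, "failed")

    # gather returns results in argument order, not completion order.
    return list(await asyncio.gather(*(run(i, c) for i, c in enumerate(chunks))))


def reassemble(results: List[ChunkResult]) -> str:
    # Hypothetical policy: keep the source text for failed chunks so the
    # output document stays complete.
    ordered = sorted(results, key=lambda r: r.index)
    return "\n\n".join(r.translation if r.status == "ok" and r.translation is not None
                       else r.source for r in ordered)
```

Because each result carries its index, reassembly does not depend on which request finishes first.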

Concurrency and robustness

The pipeline uses a single event loop per server process. The LLM client manages a pool of HTTP sessions with configurable size and keepalive duration and monitors the availability of the event loop. Reusing sessions reduces connection overhead. A rate limiter caps the number of requests per second to the LLM endpoint. In case of errors (timeouts, transient connection drops, rate-limit responses), retry logic with exponential backoff takes effect. If a chunk fails permanently, it is marked as "failed"; the remaining chunks are still returned, so that as much of the translation as possible is preserved.
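A token bucket and a backoff wrapper can be combined as in the following sketch. It is a hand-rolled illustration, not the project's code: the project lists tenacity for retries, and the names `TokenBucket`, `with_backoff` and `limited_call` are hypothetical. A real client would also have to map aiohttp errors and HTTP 429 responses onto the exception types in `retry_on`.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request takes one."""

    def __init__(self, rate: float, capacity: int = 1) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the loop

    async def acquire(self) -> None:
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # waiters are served one at a time
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep just long enough for the next token to accumulate.
                await asyncio.sleep((1 - self.tokens) / self.rate)


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError, ConnectionError),
) -> T:
    """Retry `call` on transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # give up: the caller marks the chunk as failed
            delay = min(max_delay, base * 2 ** attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("attempts must be >= 1")


async def limited_call(bucket: TokenBucket, call: Callable[[], Awaitable[T]]) -> T:
    async def attempt() -> T:
        await bucket.acquire()  # every attempt, including retries, consumes a token
        return await call()
    return await with_backoff(attempt)
```

Taking a token on every attempt, retries included, keeps a burst of retries from exceeding the configured request rate.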

Configuration and operation

All settings — LLM endpoint, model, token limits, chunk sizes, worker count, rate limit, languages, available styles, logging and export options — are stored in YAML files (config/config.yaml, config/languages.yaml) and validated at runtime via Pydantic models. Environment variables can override individual values. The application is started as an ordinary Python process (python main.py); the Gradio user interface is then available on a configurable port.
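The load, override and validate sequence can be sketched without third-party dependencies. The project itself uses Pydantic models; this sketch deliberately swaps in a stdlib dataclass with `__post_init__` checks so it stays self-contained. The field names, the `TRANSLATOR_LLM_` environment prefix and `load_llm_settings` are hypothetical, and `raw` stands for one section of the YAML file as returned by `yaml.safe_load`.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Dict, Mapping, Optional


@dataclass
class LLMSettings:
    """Hypothetical settings section; a stdlib stand-in for a Pydantic model."""
    endpoint: str
    model: str
    max_tokens: int = 4096
    workers: int = 4
    requests_per_second: float = 2.0

    def __post_init__(self) -> None:
        # Checks that Pydantic field constraints would normally express.
        if not self.endpoint.startswith(("http://", "https://")):
            raise ValueError("endpoint must be an http(s) URL")
        if self.max_tokens < 1 or self.workers < 1:
            raise ValueError("max_tokens and workers must be positive")
        if self.requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")


def load_llm_settings(
    raw: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    prefix: str = "TRANSLATOR_LLM_",
) -> LLMSettings:
    """Merge a parsed YAML section with environment overrides, then validate."""
    env = os.environ if env is None else env
    values: Dict[str, Any] = dict(raw)
    for f in fields(LLMSettings):
        key = prefix + f.name.upper()  # e.g. TRANSLATOR_LLM_WORKERS
        if key in env:
            values[f.name] = env[key]  # the environment wins over the YAML file
    # Coerce strings from the environment to the declared field types.
    # (Without `from __future__ import annotations`, f.type is the class itself.)
    typed = {f.name: f.type(values[f.name]) for f in fields(LLMSettings) if f.name in values}
    return LLMSettings(**typed)
```

Validating once at startup means a malformed value fails immediately with a clear message instead of surfacing later as a failed translation request.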

Technology overview

  • User interface: Gradio (≥ 6.0)
  • Asynchronous HTTP client: aiohttp
  • PDF processing: pymupdf4llm (preferred), PyMuPDF (fallback)
  • Word processing: python-docx
  • Markdown and HTML processing: markdown2, beautifulsoup4, lxml
  • Configuration and validation: Pydantic (≥ 2.0), pyyaml
  • Retry logic: tenacity
  • Token counting: tiktoken
  • Language detection: langdetect
  • System monitoring: psutil
  • Language and runtime: Python (≥ 3.8), asyncio-based