Architecture¶

Plone-Migration is built as a containerised web application with clearly separated areas of responsibility. A UI layer serves the browser front end, a domain layer processes Plone data and produces structured Markdown representations, an API layer encapsulates the connection to the external LLM interface, and an export layer generates Markdown and Word documents. Processing is synchronous per request; longer operations are accompanied by progress messages. Deployment runs as a Docker container behind a reverse proxy.

At a glance¶

Four-layer structure: UI, domain logic (Plone processing), LLM API connection, export
Sequential, two-stage LLM processing: content analysis → revision suggestions with the analysis as context
External connection exclusively via an OpenAI-compatible Chat Completions API
In-memory processing with temporary files for downloads; no persistent storage
Container operation (Docker) behind a reverse proxy with a configurable root path
Configuration entirely via environment variables (endpoint, model, timeouts, server binding)
Optional components (Word export) with graceful-degradation behaviour

Components and workflow¶

Processing follows a linear data flow from file upload through structuring to LLM-supported preparation and export. Each stage stores its results in an internal state, so that downstream steps (e.g. multiple export formats) are possible without reprocessing.

flowchart TB
  Browser[Browser]

  subgraph App[Application container]
    UI[UI layer<br/>Gradio Blocks]
    Domain[Domain logic<br/>Plone parser, hierarchy, Markdown generation]
    LLMClient[LLM client<br/>retry, timeout, chunking]
    Exporter[Export layer<br/>Markdown, ZIP, Word]
    State[(In-memory state<br/>pages, hierarchy, results)]
  end

  LLM[OpenAI-compatible<br/>LLM API]
  Files[(Temporary<br/>download files)]

  Browser -- HTTP/reverse proxy --> UI
  UI --> Domain
  Domain --> State
  State --> Exporter
  UI --> LLMClient
  LLMClient -- Chat Completions --> LLM
  LLM -- response --> LLMClient
  LLMClient --> State
  Exporter --> Files
  Files --> UI
  UI --> Browser

UI layer¶

The UI is implemented with a web framework that provides input fields, buttons, file outputs, and Markdown previews directly in the browser. It forwards user input to the domain logic and renders intermediate results from the application state. Longer operations are signalled in the interface via a progress callback.

Domain layer (Plone processing)¶

The domain layer parses the uploaded Plone JSON export, normalises fields, and builds the page hierarchy via Plone UIDs. From the HTML content of the pages a unified Markdown representation is generated; embedded Base64 images are replaced by placeholders. The layer provides both a consolidated overall view and per-page representations. Paths and file names are sanitised for cross-platform use (umlaut substitution, length limitation, conflict resolution via suffixes).

LLM connection¶

The LLM client encapsulates communication with the external Chat Completions API. It holds a dedicated prompt for each stage (content analysis, revision suggestions) and performs two sequential calls: the first call analyses the Markdown; the second call returns the analysis result as context together with the original content to the model in order to generate concrete revision suggestions. Very large content is truncated before submission; a helper for splitting into chunks is available for future extensions. For transient errors (HTTP 429, timeouts, connection failures) requests are retried with exponential backoff up to a configurable number of attempts.

Export layer¶

The export layer turns the application state into the user-facing downloadable artefacts. Markdown exports are generated directly from the cached content; ZIP exports mirror the Plone hierarchy as a directory tree and add a _structure.md file with overviews and statistics. The Word export is optional: when the dependency is available, .docx files with proper metadata are produced; if it is missing, the Markdown path remains usable without disruption. All result files are placed in temporary directories and managed by the web framework.

State management and concurrency¶

Processed pages, the reconstructed hierarchy, the aggregated Markdown content, and the most recent LLM results are kept in process-local state. Persistence beyond the lifetime of a container is not provided; a restart clears the state. Processing per request is synchronous, but is embedded in the framework such that the UI continues to receive status messages during longer operations.

Configuration and deployment¶

The application runs as a Docker container. Build and runtime parameters (LLM endpoint, model name, API key, timeout, retry count, server binding, root path) are set via environment variables. For operation behind a reverse proxy, an example configuration for Nginx is provided which, among other things, raises the upload limit to 100 MB and forwards WebSocket upgrades for the live-updating UI.

Robustness¶

Robustness is addressed at several layers: retries with exponential backoff in the LLM connection, sanitisation and length limitation of paths, conflict resolution for duplicate file names, truncation of overlong inputs with a notice, and graceful degradation for optional components.

Technology overview¶

Language and runtime: Python 3.13
Web framework: Gradio (UI, file handling, progress callbacks)
HTTP client: requests
Document generation: python-docx (Word), BeautifulSoup4 and lxml (HTML parsing)
External interface: OpenAI-compatible Chat Completions API
Containerisation: Docker, Docker Compose
Reverse proxy: Nginx (example configuration included)