Architecture

PPT-Werkstatt is built as a containerised application with clear layer separation. A web UI accepts input, an orchestration layer drives four asynchronous pipelines, a set of specialised agents handles the domain-specific tasks, and a client layer talks to external model services through OpenAI-compatible REST endpoints.

At a glance

Container-based deployment, placed behind a reverse proxy under a configurable path.
Web UI built on a Python web framework; chat-centred layout with auxiliary slide and inventory overviews.
Session-based in-memory state without persistence; the state is serialised between UI events.
Asynchronous pipelines with an event stream for UI updates; a stop mechanism allows running operations to be cancelled.
Four pipelines: inventarisation, briefing, plan generation, detail editing.
Thirteen domain agents, each with its own prompt and a narrow scope.
Connection to external model services through a unified async client.

Layers and components

The application is split into five conceptual layers:

UI layer — Web interface with chat area, file upload, slide and inventory overviews, preview, and action bar. It holds no state of its own; the application state is serialised between UI events.
Orchestration layer — Four asynchronous pipelines bundle the workflows: inventarisation of uploaded documents, briefing dialogue, multi-phase plan generation of the slide deck, and detail editing per chat instruction.
Agent layer — Independent agents, each with a narrowly defined responsibility: inventory analyzer, intent router, intent QA, briefing agent, structure planner, layout advisor, content extractor, content balancer, icon picker, grafix selector, layout polisher, homogenizer, and validator.
Data model layer — Inventory of typed entries with embedding vectors, layout registry, application state with slides, scope, versions, and chat history.
Client layer — Adapters to four external model services (primary LLM, fast LLM, embedder, reranker), document readers for the import formats, renderers, and exporters for the target formats.
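To make the data model layer more concrete, here is a minimal sketch using dataclasses. The entry types are the ones named in the workflow description; every class and field name (`InventoryEntry`, `Slide`, `AppState`, `source_document`, and so on) is a hypothetical choice for illustration and not the project's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryType(str, Enum):
    """Entry types named in the workflow description."""
    TEXT = "text"
    TABLE = "table"
    LIST = "list"
    DECISION = "decision"
    QUOTE = "quote"
    METRIC = "metric"
    HEADING = "heading"


@dataclass
class InventoryEntry:
    # All field names are assumptions made for this sketch.
    entry_id: str
    type: EntryType
    content: str
    source_document: str
    embedding: list[float] = field(default_factory=list)


@dataclass
class Slide:
    title: str
    layout_id: str
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class AppState:
    inventory: list[InventoryEntry] = field(default_factory=list)
    slides: list[Slide] = field(default_factory=list)
    chat_history: list[dict[str, str]] = field(default_factory=list)


entry = InventoryEntry(
    entry_id="doc1-007",
    type=EntryType.METRIC,
    content="Response time fell by 18 %.",
    source_document="report.pdf",
    embedding=[0.12, -0.40, 0.88],
)
state = AppState(inventory=[entry])
```

Plain dataclasses keep such a state easy to serialise between UI events, which is the property the UI layer relies on.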

Workflow

flowchart TD
    Upload[Document upload] --> InvPipe
    Briefing[Briefing dialogue] --> PlanPipe

    subgraph InvPipe [Inventarisation pipeline]
        IA[Inventory analyzer]
    end

    InvPipe --> Inventory[(Inventory)]
    Inventory --> PlanPipe

    subgraph PlanPipe [Plan pipeline]
        direction TB
        Struct[Structure planner] --> LayoutA[Layout advisor]
        LayoutA --> Content[Content extractor parallel]
        Content --> Validator[Validator three-stage]
        Validator -. Correction loop .-> Content
        Validator --> Homog[Homogenizer]
    end

    PlanPipe --> Deck[(Slide deck)]
    Deck --> DetailPipe
    DetailPipe --> Deck

    subgraph DetailPipe [Detail pipeline]
        direction TB
        Router[Intent router] --> QA[Intent QA]
        QA --> Dispatcher[Dispatcher]
        Dispatcher --> Mini[Mini-pipelines]
    end

    Deck --> Export[Export PPTX, DOCX, Markdown]

    PlanPipe -.-> Versions[(Version store)]
    DetailPipe -.-> Versions

    subgraph Models [External model services]
        direction LR
        Primary[Primary LLM]
        Fast[Fast LLM]
        Embed[Embedder]
        Rerank[Reranker]
    end

    InvPipe -.-> Models
    PlanPipe -.-> Models
    DetailPipe -.-> Models

Workflow explanation

Uploaded documents first run through the inventarisation pipeline: the inventory analyzer (running on the fast LLM) segments each document into typed entries — text, table, list, decision, quote, metric, heading. Each entry is assigned a semantic vector through the embedder. The result is an in-memory inventory with a semantic search structure.
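The semantic search over the inventory can be pictured as a nearest-neighbour lookup by cosine similarity. The sketch below is dependency-free on purpose and uses tiny hand-written vectors; the function names, entry identifiers, and the choice of cosine similarity are assumptions for illustration, since the text does not specify the actual search structure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], inventory: dict[str, list[float]], top_n: int = 3) -> list[str]:
    """Return the ids of the top_n entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), entry_id) for entry_id, vec in inventory.items()]
    scored.sort(reverse=True)
    return [entry_id for _, entry_id in scored[:top_n]]


# Hypothetical three-dimensional embeddings; real embedder vectors are much longer.
inventory = {
    "metric-1": [0.9, 0.1, 0.0],
    "quote-1": [0.0, 1.0, 0.0],
    "table-1": [0.7, 0.7, 0.0],
}
hits = search([1.0, 0.0, 0.0], inventory, top_n=2)
print(hits)  # ['metric-1', 'table-1']
```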

In parallel, the briefing dialogue clarifies the framing parameters of the presentation with the user. Once these are complete, the plan pipeline runs in five phases:

The structure planner drafts a coarse slide structure from briefing and inventory (slide titles, sequence, slide roles).
The layout advisor assigns each slide a suitable layout from the catalogue.
The content extractor fills the slots of each slide from the assigned material; multiple slides are processed in parallel. If material coverage is low, a second pass is triggered with additional retrieval.
The validator checks each slide in three stages (rule, embedding, LLM check). Flagged slides enter a correction loop with up to two iterations.
The homogenizer smooths out inconsistencies in tone and terminology across the whole deck.
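The validator's correction loop from phase four can be sketched as follows. The cap of two correction iterations comes from the text; the function names, the dictionary-based slide, and the decision to stop at the first stage that reports issues are assumptions made for this example.

```python
from typing import Callable

Slide = dict[str, str]
Check = Callable[[Slide], list[str]]

MAX_CORRECTIONS = 2  # "up to two iterations" per the description above


def run_stages(slide: Slide, stages: list[Check]) -> list[str]:
    """Run the stages in order and return the issues of the first stage that flags.

    Assumption: cheaper stages run first and later stages are skipped
    while an earlier one still reports problems.
    """
    for stage in stages:
        issues = stage(slide)
        if issues:
            return issues
    return []


def validate_slide(
    slide: Slide,
    stages: list[Check],
    correct: Callable[[Slide, list[str]], Slide],
) -> tuple[Slide, list[str], int]:
    """Validate a slide, applying at most MAX_CORRECTIONS correction passes."""
    corrections = 0
    issues = run_stages(slide, stages)
    while issues and corrections < MAX_CORRECTIONS:
        slide = correct(slide, issues)
        corrections += 1
        issues = run_stages(slide, stages)
    return slide, issues, corrections


# Stand-ins for the rule, embedding, and LLM checks.
def rule_check(slide: Slide) -> list[str]:
    return ["title too long"] if len(slide["title"]) > 20 else []


def embedding_check(slide: Slide) -> list[str]:
    return []


def llm_check(slide: Slide) -> list[str]:
    return []


def shorten_title(slide: Slide, issues: list[str]) -> Slide:
    return {**slide, "title": slide["title"][:20]}


fixed, remaining, used = validate_slide(
    {"title": "A very long slide title that overflows"},
    [rule_check, embedding_check, llm_check],
    shorten_title,
)
print(remaining, used)  # [] 1
```

A slide that still has issues after the second correction is returned together with those issues, so the caller can decide how to surface them.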

Optionally, an icon picker and a grafix selector add icons and graphical diagram layouts.

After the plan pipeline, the detail pipeline handles chat instructions. The intent router classifies each instruction into one of 28 action types. When confidence is low, multiple actions are detected, or a context conflict is found, the intent QA layer steps in and either accepts the classification, corrects it, or triggers a clarification question. The dispatcher then hands over to a mini-pipeline that, depending on the action type, modifies only the affected slide or the deck as a whole. Before any modifying operation, a version snapshot is stored; the last 15 states can be restored through undo.
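The snapshot-and-undo behaviour can be illustrated with a bounded deque. The limit of 15 states is taken from the text; the `VersionStore` class, its method names, and the dictionary-based deck are hypothetical and only meant to show the mechanism.

```python
import copy
from collections import deque

MAX_VERSIONS = 15  # "the last 15 states" per the description above


class VersionStore:
    """Keeps the most recent deck snapshots; older ones are dropped automatically."""

    def __init__(self, max_versions: int = MAX_VERSIONS) -> None:
        self._snapshots: deque[dict] = deque(maxlen=max_versions)

    def snapshot(self, deck: dict) -> None:
        # Deep copy so later edits to the live deck do not alter the snapshot.
        self._snapshots.append(copy.deepcopy(deck))

    def undo(self, current: dict) -> dict:
        # Return the latest snapshot, or the current deck if nothing is stored.
        return self._snapshots.pop() if self._snapshots else current

    def __len__(self) -> int:
        return len(self._snapshots)


store = VersionStore()
deck = {"slides": ["Intro"]}
for i in range(20):
    store.snapshot(deck)              # snapshot before the modifying operation
    deck["slides"].append(f"Slide {i}")

kept = len(store)                     # only the last 15 snapshots survive
restored = store.undo(deck)           # state before "Slide 19" was added
print(kept, restored["slides"][-1])   # 15 Slide 18
```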

Role of the AI components

The primary LLM (with thinking enabled) serves the quality-critical agents: routing, QA, briefing, structure planning, layout recommendation, homogenisation, validation.
The fast LLM (with thinking disabled) serves bulk and pattern-matching tasks: inventory analysis, parallel slot filling, content balancing, icon and diagram selection.
The embedder provides vectors for inventory entries, slide content, and queries — the basis for retrieval, semantic scope, and material coverage.
The reranker orders the embedder hits more precisely: a top-N selection is reduced to a final top-K ranking.
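The two-stage retrieval (embedder shortlist, then reranker) can be sketched as two successive sorts. The scoring functions below are toy stand-ins based on word overlap, not calls to BGE-M3 or the reranker; `retrieve_and_rerank`, its parameters, and the sample texts are assumptions for illustration.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def retrieve_and_rerank(
    query: str,
    candidates: list[str],
    embed_score: Scorer,
    rerank_score: Scorer,
    top_n: int,
    top_k: int,
) -> list[str]:
    """Shortlist top_n candidates by the coarse score, then keep top_k by the fine score."""
    shortlist = sorted(candidates, key=lambda c: embed_score(query, c), reverse=True)[:top_n]
    return sorted(shortlist, key=lambda c: rerank_score(query, c), reverse=True)[:top_k]


def overlap(query: str, text: str) -> float:
    """Toy stand-in for embedding similarity: number of shared words."""
    return float(len(set(query.split()) & set(text.split())))


def density(query: str, text: str) -> float:
    """Toy stand-in for the reranker: shared words relative to text length."""
    words = text.split()
    return overlap(query, text) / len(words) if words else 0.0


candidates = [
    "budget risks for 2024",
    "budget overview and timeline",
    "team offsite photos",
    "a long note that mentions budget and also risks in passing",
]
result = retrieve_and_rerank("budget risks", candidates, overlap, density, top_n=3, top_k=2)
print(result)  # ['budget risks for 2024', 'budget overview and timeline']
```

In this toy run the coarse score ranks the long note second, and the finer score demotes it, which is the kind of correction a reranker is meant to provide.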

The split into two LLM roles with different speed and reasoning effort is intentional: heavyweight reasoning goes to the tasks that benefit from it, while high-frequency bulk tasks run quickly and in parallel.

Concurrency, robustness, configuration

The pipelines are implemented as asynchronous generators; events are streamed to the UI so that progress is visible and operations can be cancelled through a stop action.
Slot filling in the plan pipeline runs in parallel; the number of concurrently processed slides is configurable.
The model clients include retry logic; a per-endpoint cap on concurrent requests guards against overload.
Sessions are in-memory and discarded on container restart — a deliberate data-protection property.
Configuration is supplied via environment variables (endpoints, model names, pipeline behaviour); a self-test verifies the reachability of all model services.
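The combination of an asynchronous generator, streamed events, a stop action, and a configurable degree of parallelism can be sketched with `asyncio` alone. The event dictionaries, the function names, and the use of an `asyncio.Event` as the stop signal are assumptions for this example; the model call is replaced by a short sleep.

```python
import asyncio
from collections.abc import AsyncIterator


async def fill_slide(index: int, limiter: asyncio.Semaphore) -> str:
    # The semaphore caps how many slides are processed at the same time.
    async with limiter:
        await asyncio.sleep(0.01)  # stand-in for a model request
        return f"slide {index} filled"


async def plan_pipeline(
    slide_count: int, stop: asyncio.Event, max_parallel: int = 2
) -> AsyncIterator[dict]:
    """Yield progress events as slides finish; end early once stop is set."""
    limiter = asyncio.Semaphore(max_parallel)
    tasks = [asyncio.create_task(fill_slide(i, limiter)) for i in range(slide_count)]
    try:
        for finished in asyncio.as_completed(tasks):
            if stop.is_set():
                yield {"type": "stopped"}
                return
            yield {"type": "progress", "message": await finished}
        yield {"type": "done"}
    finally:
        # Cancel whatever is still running; finished tasks are unaffected.
        for task in tasks:
            task.cancel()


async def main() -> list[dict]:
    stop = asyncio.Event()
    events = []
    async for event in plan_pipeline(slide_count=4, stop=stop):
        events.append(event)
        if len(events) == 2:
            stop.set()  # simulates the user pressing the stop button
    return events


events = asyncio.run(main())
print([event["type"] for event in events])  # ['progress', 'progress', 'stopped']
```

Checking the stop signal between events keeps cancellation cooperative: the pipeline ends at a well-defined point instead of being interrupted in the middle of an update.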

Deployment

The application ships as a Docker container (base: Python 3.12 with system libraries for SVG rendering and document processing). It is placed behind a reverse proxy under a configurable path. The external model services are institution-internal endpoints addressed through OpenAI-compatible REST interfaces.
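A deployment of this kind is typically parameterised through environment variables. The sketch below shows one way to read a path prefix and four service endpoints and to fail early when something is missing. All variable names (`APP_ROOT_PATH`, `PRIMARY_LLM_BASE_URL`, and so on), the example URLs, and the model names are hypothetical; the project's real variable names are not given in the text.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable prefixes, one per external model service.
SERVICE_PREFIXES = ("PRIMARY_LLM", "FAST_LLM", "EMBEDDER", "RERANKER")


@dataclass(frozen=True)
class ServiceConfig:
    base_url: str  # OpenAI-compatible endpoint
    model: str


@dataclass(frozen=True)
class AppConfig:
    root_path: str  # path prefix behind the reverse proxy
    services: dict[str, ServiceConfig]


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read the configuration and report all missing variables at once."""
    missing = [
        f"{prefix}_{suffix}"
        for prefix in SERVICE_PREFIXES
        for suffix in ("BASE_URL", "MODEL")
        if not env.get(f"{prefix}_{suffix}")
    ]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    services = {
        prefix: ServiceConfig(
            base_url=env[f"{prefix}_BASE_URL"].rstrip("/"),
            model=env[f"{prefix}_MODEL"],
        )
        for prefix in SERVICE_PREFIXES
    }
    return AppConfig(root_path=env.get("APP_ROOT_PATH", "/"), services=services)


example_env = {"APP_ROOT_PATH": "/ppt-werkstatt"}
for prefix in SERVICE_PREFIXES:
    example_env[f"{prefix}_BASE_URL"] = f"https://{prefix.lower().replace('_', '-')}.example.internal/v1/"
    example_env[f"{prefix}_MODEL"] = f"{prefix.lower()}-model"

config = load_config(example_env)
print(config.services["EMBEDDER"].base_url)  # https://embedder.example.internal/v1
```

A reachability self-test would then iterate over `config.services` and send a small request to each endpoint; that part is left out here because it needs network access.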

Technology overview

Web UI: Gradio 6.
Model integration: OpenAI-compatible async client (openai SDK), extended with model-specific thinking parameters for the Qwen, Kimi, GLM, and Gemma families.
Document processing: python-pptx, python-docx, pypdf.
Numerics and vectors: NumPy.
SVG and bitmap rendering: cairosvg for diagram graphics (with Cairo/Pango as system dependencies).
Data models: pydantic and dataclasses.
Configuration: python-dotenv.
Icon set: Bootstrap Icons (MIT licence).
Diagram templates: an integrated external template project for diagram graphics (Fabric.js JSON format).
Models in use: Kimi K2.5 as the primary LLM, Qwen 3 (or a comparable model) as the fast LLM, BGE-M3 as the embedder, BGE-Reranker-v2-M3 as the reranker.