Architecture
PPT-Werkstatt is built as a containerised application with clear layer separation. A web UI accepts input, an orchestration layer drives four asynchronous pipelines, a set of specialised agents handles the domain-specific tasks, and a client layer talks to external model services through OpenAI-compatible REST endpoints.
At a glance
- Container-based deployment, placed behind a reverse proxy under a configurable path.
- Web UI built with Gradio; chat-centred layout with auxiliary slide and inventory overviews.
- Session-based in-memory state without persistence; the state is serialised between UI events.
- Asynchronous pipelines with an event stream for UI updates; a stop mechanism allows running operations to be cancelled.
- Four pipelines: inventarisation, briefing, plan generation, detail editing.
- Thirteen domain agents, each with its own prompt and a narrow scope.
- Connection to external model services through a unified async client.
Layers and components
The application is split into five conceptual layers:
- UI layer — Web interface with chat area, file upload, slide and inventory overviews, preview, and action bar. It holds no state of its own; the application state is serialised between UI events.
- Orchestration layer — Four asynchronous pipelines bundle the workflows: inventarisation of uploaded documents, briefing dialogue, multi-phase plan generation of the slide deck, and detail editing per chat instruction.
- Agent layer — Independent agents, each with a narrowly defined responsibility: inventory analyzer, intent router, intent QA, briefing agent, structure planner, layout advisor, content extractor, content balancer, icon picker, grafix selector, layout polisher, homogenizer, and validator.
- Data model layer — Inventory of typed entries with embedding vectors, layout registry, application state with slides, scope, versions, and chat history.
- Client layer — Adapters to four external model services (primary LLM, fast LLM, embedder, reranker), document readers for the import formats, renderers, and exporters for the target formats.
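The data model layer can be pictured as a handful of plain records. The following is a minimal sketch, assuming Python dataclasses; the field names (`entry_id`, `source`, `scope` as a list of slide indices) are hypothetical, the version store is left out, and JSON is only one possible way to serialise the state between UI events, since the actual format is not specified here.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Literal

# Entry kinds as listed in the workflow explanation.
EntryKind = Literal["text", "table", "list", "decision", "quote", "metric", "heading"]


@dataclass
class InventoryEntry:
    entry_id: str
    kind: EntryKind
    content: str
    source: str  # hypothetical: name of the uploaded document
    embedding: list[float] = field(default_factory=list)


@dataclass
class Slide:
    title: str
    layout: str  # key into the layout registry
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class AppState:
    inventory: list[InventoryEntry] = field(default_factory=list)
    slides: list[Slide] = field(default_factory=list)
    scope: list[int] = field(default_factory=list)  # hypothetical: indices of slides in focus
    chat_history: list[dict[str, str]] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the whole state (asdict also converts the nested dataclasses)."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> AppState:
        """Rebuild the state, including the nested records, from its JSON form."""
        data = json.loads(raw)
        return cls(
            inventory=[InventoryEntry(**e) for e in data["inventory"]],
            slides=[Slide(**s) for s in data["slides"]],
            scope=data["scope"],
            chat_history=data["chat_history"],
        )
```

Because the state holds no open handles or clients, a full round trip through `to_json` and `from_json` is lossless, which is what keeps the UI layer itself stateless.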
Workflow
```mermaid
flowchart TD
    Upload[Document upload] --> InvPipe
    Briefing[Briefing dialogue] --> PlanPipe
    subgraph InvPipe [Inventarisation pipeline]
        IA[Inventory analyzer]
    end
    InvPipe --> Inventory[(Inventory)]
    Inventory --> PlanPipe
    subgraph PlanPipe [Plan pipeline]
        direction TB
        Struct[Structure planner] --> LayoutA[Layout advisor]
        LayoutA --> Content[Content extractor parallel]
        Content --> Validator[Validator three-stage]
        Validator -. Correction loop .-> Content
        Validator --> Homog[Homogenizer]
    end
    PlanPipe --> Deck[(Slide deck)]
    Deck --> DetailPipe
    DetailPipe --> Deck
    subgraph DetailPipe [Detail pipeline]
        direction TB
        Router[Intent router] --> QA[Intent QA]
        QA --> Dispatcher[Dispatcher]
        Dispatcher --> Mini[Mini-pipelines]
    end
    Deck --> Export[Export PPTX, DOCX, Markdown]
    PlanPipe -.-> Versions[(Version store)]
    DetailPipe -.-> Versions
    subgraph Models [External model services]
        direction LR
        Primary[Primary LLM]
        Fast[Fast LLM]
        Embed[Embedder]
        Rerank[Reranker]
    end
    InvPipe -.-> Models
    PlanPipe -.-> Models
    DetailPipe -.-> Models
```
Workflow explanation
Uploaded documents first run through the inventarisation pipeline: the inventory analyzer (running on the fast LLM) segments each document into typed entries — text, table, list, decision, quote, metric, heading. Each entry is assigned a semantic vector through the embedder. The result is an in-memory inventory with a semantic search structure.
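A minimal sketch of such a semantic search structure, assuming cosine similarity as the ranking measure and using plain Python lists instead of NumPy so the example stays dependency-free. The class and method names are hypothetical, and the two-dimensional vectors in the usage stand in for real embedder output.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str
    kind: str  # one of the typed entry kinds, e.g. "text" or "table"
    vector: list[float]


def normalise(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


class InventoryIndex:
    """In-memory semantic index. Vectors are normalised once on insert,
    so a plain dot product equals cosine similarity at query time."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def add(self, entry_id: str, kind: str, vector: list[float]) -> None:
        self._entries.append(Entry(entry_id, kind, normalise(vector)))

    def search(
        self, query: list[float], top_n: int = 5, kind: str | None = None
    ) -> list[tuple[str, float]]:
        """Return the top_n (entry_id, similarity) pairs, optionally filtered by kind."""
        q = normalise(query)
        scored = [
            (e.entry_id, sum(a * b for a, b in zip(q, e.vector)))
            for e in self._entries
            if kind is None or e.kind == kind
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_n]
```

Usage: after `idx.add("a", "text", [1.0, 0.0])` and `idx.add("c", "text", [1.0, 1.0])`, a query close to the first axis ranks `"a"` ahead of `"c"`. Normalising at insert time moves the cost out of the query path, which matters when many slides query the same inventory in parallel.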
In parallel, the briefing dialogue clarifies the framing parameters of the presentation with the user. Once both the inventory and the briefing are complete, the plan pipeline runs in five phases:
- The structure planner drafts a coarse slide structure from briefing and inventory (slide titles, sequence, slide roles).
- The layout advisor assigns each slide a suitable layout from the catalogue.
- The content extractor fills the slots of each slide from the assigned material; multiple slides are processed in parallel. If material coverage is low, a second pass is triggered with additional retrieval.
- The validator checks each slide in three stages (rule, embedding, LLM check). Flagged slides enter a correction loop with up to two iterations.
- The homogenizer smooths out inconsistencies in tone and terminology across the whole deck.
Optionally, an icon picker and a grafix selector add icons and graphical diagram layouts.
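The content-extraction and validation phases can be sketched with `asyncio` as follows. This is an illustration under assumptions, not the actual implementation: the extractor is injected as a callable (in the real system it would be a fast-LLM call), the embedding and LLM stages are placeholders, the stages are assumed to short-circuit on the first finding, the 400-character limit is invented, and the coverage-triggered second retrieval pass is not modelled. The semaphore plays the role of the configurable number of concurrently processed slides.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

MAX_CORRECTIONS = 2  # "correction loop with up to two iterations"
MAX_CHARS = 400      # hypothetical rule-stage limit

# (slide title, validator findings) -> slide text; stands in for the content extractor.
Extractor = Callable[[str, list[str]], Awaitable[str]]


@dataclass
class SlideResult:
    title: str
    text: str
    issues: list[str]
    corrections: int


async def rule_stage(text: str) -> list[str]:
    # Cheap deterministic checks run first.
    if not text.strip():
        return ["empty slot"]
    return ["text too long"] if len(text) > MAX_CHARS else []


async def embedding_stage(text: str) -> list[str]:
    return []  # placeholder: would compare slide and source-material embeddings


async def llm_stage(text: str) -> list[str]:
    return []  # placeholder: would ask the primary LLM for a verdict


async def validate(text: str) -> list[str]:
    for stage in (rule_stage, embedding_stage, llm_stage):
        issues = await stage(text)
        if issues:
            return issues  # short-circuit: the costlier stages are skipped
    return []


async def build_slide(title: str, extract: Extractor, sem: asyncio.Semaphore) -> SlideResult:
    async with sem:  # caps how many slides are filled concurrently
        text = await extract(title, [])
        issues = await validate(text)
        corrections = 0
        while issues and corrections < MAX_CORRECTIONS:
            text = await extract(title, issues)  # feed the findings back to the extractor
            issues = await validate(text)
            corrections += 1
        return SlideResult(title, text, issues, corrections)


async def plan_slides(titles: list[str], extract: Extractor, max_parallel: int = 4) -> list[SlideResult]:
    sem = asyncio.Semaphore(max_parallel)
    # gather preserves the input order, so results line up with the slide sequence.
    return list(await asyncio.gather(*(build_slide(t, extract, sem) for t in titles)))
```

A slide that is still flagged after two corrections is returned with its remaining `issues`, so a later step (or the user) can decide what to do with it instead of the loop running indefinitely.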
After the plan pipeline, the detail pipeline handles chat instructions. The intent router classifies each instruction into one of 28 action types. When confidence is low, multiple actions are detected, or a context conflict is found, the intent QA layer steps in: it accepts or corrects the classification, or asks the user a clarification question. The dispatcher then hands over to a mini-pipeline that, depending on the action type, modifies either only the affected slide or the deck as a whole. A version snapshot is stored before every modifying operation, and undo can step back through the last 15 states.
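The QA trigger conditions and the bounded undo history can be sketched as follows. The confidence threshold of 0.7 and the action names in the usage are hypothetical (neither is documented here); the snapshot limit of 15 comes from the text, and a `deque` with `maxlen` drops the oldest snapshot automatically once the limit is reached.

```python
from __future__ import annotations

import copy
from collections import deque
from collections.abc import Callable
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value


@dataclass
class RoutedIntent:
    actions: list[str]  # one or more of the 28 action types
    confidence: float
    context_conflict: bool = False


def needs_qa(intent: RoutedIntent) -> bool:
    """The three triggers for handing the classification to intent QA."""
    return (
        intent.confidence < CONFIDENCE_THRESHOLD
        or len(intent.actions) > 1
        or intent.context_conflict
    )


class VersionStore:
    """Bounded undo history holding deep copies of earlier deck states."""

    def __init__(self, limit: int = 15) -> None:
        self._snapshots: deque[list] = deque(maxlen=limit)

    def snapshot(self, deck: list) -> None:
        self._snapshots.append(copy.deepcopy(deck))

    def undo(self) -> list | None:
        """Return the most recent snapshot, or None if the history is empty."""
        return self._snapshots.pop() if self._snapshots else None

    def __len__(self) -> int:
        return len(self._snapshots)


def apply_edit(deck: list, store: VersionStore, edit: Callable[[list], list]) -> list:
    """Snapshot first, then run the (hypothetical) mini-pipeline edit."""
    store.snapshot(deck)
    return edit(deck)
```

Deep copies keep each snapshot independent of later in-place edits to the deck; the cost is bounded by the fixed history length.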
Role of the AI components
- The primary LLM (with thinking enabled) serves the quality-critical agents: routing, QA, briefing, structure planning, layout recommendation, homogenization, validation.
- The fast LLM (with thinking disabled) serves bulk and pattern-matching tasks: inventory analysis, parallel slot filling, content balancing, icon and diagram selection.
- The embedder provides vectors for inventory entries, slide content, and queries — the basis for retrieval, semantic scope, and material coverage.
- The reranker orders the embedder hits more precisely: a top-N selection is reduced to a final top-K ranking.
The split into two LLM roles with different speed and reasoning effort is intentional: heavyweight reasoning goes to the tasks that benefit from it, while high-frequency bulk tasks run quickly and in parallel.
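The embedder-then-reranker step from the list above can be sketched as a second-stage sort over the embedder's top-N hits. BGE-Reranker-v2-M3 is a cross-encoder that scores query/passage pairs, but the HTTP interface of the internal endpoint is not described here, so the scoring function is injected; the word-overlap scorer below is a toy stand-in for illustration only.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

# (query, passages) -> one relevance score per passage; stands in for the reranker endpoint.
RerankFn = Callable[[str, Sequence[str]], list[float]]


def rerank_top_k(query: str, hits: list[tuple[str, str]], score: RerankFn, top_k: int = 3) -> list[str]:
    """Reduce the embedder's top-N (entry_id, text) hits to a top-K list of entry ids."""
    if not hits:
        return []
    scores = score(query, [text for _, text in hits])  # one batched reranker call
    order = sorted(range(len(hits)), key=lambda i: scores[i], reverse=True)
    return [hits[i][0] for i in order[:top_k]]


def overlap_score(query: str, passages: Sequence[str]) -> list[float]:
    """Toy scorer: number of shared lower-cased words."""
    q = set(query.lower().split())
    return [float(len(q & set(p.lower().split()))) for p in passages]
```

Usage: `rerank_top_k("budget risks", hits, overlap_score, top_k=2)`. Keeping N larger than K lets the cheap vector search cast a wide net while the more expensive pairwise scoring only runs on a short candidate list.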
Concurrency, robustness, configuration
- The pipelines are implemented as asynchronous generators; events are streamed to the UI so that progress is visible and operations can be cancelled through a stop action.
- Slot filling in the plan pipeline runs in parallel; the number of concurrently processed slides is configurable.
- The model clients include retry logic; a per-endpoint cap on concurrent requests guards against overload.
- Sessions are in-memory and discarded on container restart — a deliberate data-protection property.
- Configuration is supplied via environment variables (endpoints, model names, pipeline behaviour); a self-test verifies the reachability of all model services.
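The streaming, stop, retry, and per-endpoint cap mechanisms can be combined in a small sketch. Everything here is an assumption for illustration: the event dictionaries, the choice of `ConnectionError` and `TimeoutError` as retryable errors, the exponential backoff with jitter, and the names `EndpointGuard` and `run_pipeline` are all hypothetical. The pipeline is an async generator that yields one event per step and checks an `asyncio.Event` before each step, so a stop action takes effect between steps.

```python
from __future__ import annotations

import asyncio
import random
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class EndpointGuard:
    """Per-endpoint cap on concurrent requests plus retry with exponential backoff."""

    def __init__(self, max_concurrent: int = 4, retries: int = 3, base_delay: float = 0.5) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self._retries = retries
        self._base_delay = base_delay

    async def call(self, fn: Callable[[], Awaitable[T]]) -> T:
        attempt = 0
        while True:
            async with self._sem:
                try:
                    return await fn()
                except (ConnectionError, TimeoutError):
                    if attempt >= self._retries:
                        raise  # give up after the configured number of retries
            # Back off outside the semaphore so waiting does not hold a slot.
            await asyncio.sleep(self._base_delay * 2**attempt * (1 + random.random()))
            attempt += 1


async def run_pipeline(
    steps: list[str],
    guard: EndpointGuard,
    work: Callable[[str], Awaitable[str]],
    stop: asyncio.Event,
) -> AsyncIterator[dict]:
    """Yield one event per step; stop early once the stop event is set."""
    yield {"type": "started", "steps": len(steps)}
    for step in steps:
        if stop.is_set():
            yield {"type": "cancelled", "at": step}
            return
        result = await guard.call(lambda: work(step))
        yield {"type": "progress", "step": step, "result": result}
    yield {"type": "done"}
```

The UI side would iterate with `async for event in run_pipeline(...)` and render each event as it arrives; setting `stop` from the stop action is enough to end the generator cleanly.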
Deployment
The application ships as a Docker container (base: Python 3.12 with system libraries for SVG rendering and document processing). It is placed behind a reverse proxy under a configurable path. The external model services are institution-internal endpoints addressed through OpenAI-compatible REST interfaces.
Technology overview
- Web UI: Gradio 6.
- Model integration: OpenAI-compatible async client (`openai` SDK), extended with model-specific thinking parameters for the Qwen, Kimi, GLM, and Gemma families.
- Document processing: `python-pptx`, `python-docx`, `pypdf`.
- Numerics and vectors: NumPy.
- SVG and bitmap rendering: `cairosvg` for diagram graphics (with Cairo/Pango as system dependencies).
- Data models: `pydantic` and `dataclasses`.
- Configuration: `python-dotenv`.
- Icon set: Bootstrap Icons (MIT licence).
- Diagram templates: an integrated external template project for diagram graphics (Fabric.js JSON format).
- Models in use: Kimi K2.5 as the primary LLM, Qwen 3 (or a comparable model) as the fast LLM, BGE-M3 as the embedder, BGE-Reranker-v2-M3 as the reranker.
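How the environment-based configuration and the per-role thinking switch might fit together is sketched below. The environment variable names and model strings are hypothetical; in the application, `python-dotenv`'s `load_dotenv()` would populate `os.environ` beforehand. Only the Qwen entry is filled in, assuming the endpoints run on vLLM or SGLang, which forward `chat_template_kwargs` to the Qwen 3 chat template's `enable_thinking` flag. The keys for the Kimi, GLM, and Gemma families are deliberately not reproduced. The resulting dictionary is meant to be unpacked into `AsyncOpenAI().chat.completions.create(**kwargs)`, where `extra_body` carries non-standard fields.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRole:
    base_url: str
    model: str
    thinking: bool


def load_roles(env: dict[str, str] | None = None) -> dict[str, ModelRole]:
    """Read endpoint and model name per LLM role (variable names are hypothetical)."""
    env = dict(os.environ) if env is None else env
    return {
        "primary": ModelRole(env["PRIMARY_LLM_URL"], env["PRIMARY_LLM_MODEL"], thinking=True),
        "fast": ModelRole(env["FAST_LLM_URL"], env["FAST_LLM_MODEL"], thinking=False),
    }


def thinking_extra_body(model: str, thinking: bool) -> dict:
    """Family-specific request fields that switch reasoning on or off."""
    if "qwen" in model.lower():
        # Assumes a vLLM/SGLang server passing chat_template_kwargs to the template.
        return {"chat_template_kwargs": {"enable_thinking": thinking}}
    return {}  # Kimi, GLM, Gemma: family-specific keys not reproduced in this sketch


def request_kwargs(role: ModelRole, messages: list[dict[str, str]]) -> dict:
    """Keyword arguments for chat.completions.create(**kwargs)."""
    kwargs: dict = {"model": role.model, "messages": messages}
    extra = thinking_extra_body(role.model, role.thinking)
    if extra:
        kwargs["extra_body"] = extra
    return kwargs
```

Keeping the family switch in one function means adding a new model family touches a single place, while the agents only ever ask for the "primary" or "fast" role.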