Architecture
KI-Umfrage is implemented as a monolithic Python application with a web UI, deployed in a single container. The internal structure follows a clear separation between the user interface, the orchestration of the survey flow, the LLM pipeline, and an underlying configuration and persistence layer. Calls to the language model are asynchronous, protected by timeouts and retry logic, and shielded against failure by deterministic fallback paths.
At a glance
- Layered architecture: UI layer (tab-based), orchestration layer (conversation manager), pipeline layer (survey agent with three LLM stages), infrastructure layer (LLM client, configuration, persistence).
- Asynchronous LLM calls with configurable timeout, retry logic, and JSON mode for structured responses.
- Typed data structures for all pipeline transitions; validation of LLM outputs against allowed value ranges.
- Configuration via YAML files, with optional overrides via environment variables.
- Persistence of sessions, conversation histories, and final answers as JSON; logging to a dedicated log file.
- Containerized with Docker; a single web port exposed externally, health check included.
- Multi-level fallback mechanisms at the prompt, pipeline, and processing levels.
Architecture overview
The application is divided into four layers:
User interface. The UI is built with Gradio and consists of six tabs (playground, prompt engineering, batch testing, question editor, session demo, performance). UI code is consolidated in a central GradioInterface class; handler functions mediate between UI events and the orchestration layer.
Orchestration. The ConversationManager starts sessions, holds the state of the current question, and forwards user answers to the survey agent. A session bundles all interactions of a survey and is the basis for later persistence.
Pipeline (survey agent). The agent encapsulates three independent LLM stages — answer evaluation, follow-up generation, and answer structuring. Each stage calls the LLM client with its own prompt template, parses the JSON result, validates it, and returns a typed result object. On error or timeout, a rule-based replacement takes over.
Infrastructure. The LLMClient encapsulates the OpenAI-compatible HTTP communication including timeouts, retries, and JSON parsing. The ConfigLoader reads YAML configuration and survey definitions and applies environment overrides. Results are stored as JSON files; logs are written via Loguru.
Workflow
flowchart TD
User[User]
UI[Gradio Tabs]
CM[Conversation Manager]
Agent[Survey Agent]
Eval[Stage 1: Evaluation]
Follow[Stage 2: Follow-up]
Struct[Stage 3: Structuring]
Prompts[Prompt Templates]
LLM[LLM Client]
Fallback[Rule-based Fallback]
LLMAPI[OpenAI-compatible API]
Config[YAML Configuration]
Store[JSON Result Store]
Log[Log File]
User --> UI
UI --> CM
CM --> Agent
Agent --> Eval
Eval -->|low clarity score| Follow
Eval -->|sufficiently clear| Struct
Follow --> CM
CM -->|next answer| Agent
Agent --> Struct
Struct --> CM
CM --> Store
Eval -.-> Prompts
Follow -.-> Prompts
Struct -.-> Prompts
Prompts --> LLM
LLM --> LLMAPI
LLM -.->|timeout / error| Fallback
Fallback --> Agent
Config --> CM
Config --> Agent
Config --> LLM
Agent --> Log
LLM --> Log
The flow begins with a question that the conversation manager hands to the survey agent. Stage 1 (evaluation) calls the LLM client with the evaluation prompt and receives a clarity score, reasoning, and problem types. If the score is below the configured threshold and the maximum follow-up depth has not yet been reached, stage 2 generates a concrete follow-up. This follow-up is passed back through the conversation manager into the agent as a new answer round. As soon as an answer is judged sufficiently clear or the follow-up depth has been exhausted, stage 3 condenses the entire conversation into a structured final answer with a main category, specific items, and a confidence value.
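The control flow described above can be sketched as a small loop. This is an illustration under stated assumptions, not the project's actual code: `run_question`, its parameters, and the default threshold and depth values are hypothetical, the three stages are passed in as plain callables, and the sketch is synchronous for brevity even though the application itself calls the LLM asynchronously.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    """Result of stage 1 (field names are assumptions)."""
    clarity_score: float
    reasoning: str
    problem_types: list[str]


def run_question(
    first_answer: str,
    evaluate: Callable[[list[str]], Evaluation],
    make_follow_up: Callable[[list[str], Evaluation], str],
    ask_user: Callable[[str], str],
    structure: Callable[[list[str]], dict],
    clarity_threshold: float = 0.7,  # hypothetical default
    max_follow_ups: int = 2,         # hypothetical default
) -> dict:
    """Evaluate, ask follow-ups while the answer is unclear, then structure."""
    answers = [first_answer]
    follow_ups_asked = 0
    while True:
        evaluation = evaluate(answers)                      # stage 1
        clear_enough = evaluation.clarity_score >= clarity_threshold
        if clear_enough or follow_ups_asked >= max_follow_ups:
            break
        question = make_follow_up(answers, evaluation)      # stage 2
        answers.append(ask_user(question))                  # new answer round
        follow_ups_asked += 1
    return structure(answers)                               # stage 3
```

Stage 3 runs on both exits of the loop, which matches the two edges into the structuring stage in the flowchart: one for a sufficiently clear answer and one for an exhausted follow-up depth.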
Role of the LLM in the pipeline
The language model is used in each pipeline stage for a clearly bounded task: evaluation, follow-up generation, and structuring. All calls use a low temperature to keep results as reproducible as possible and are run in JSON mode so the outputs can be mapped directly into typed data objects. The application maintains a strict separation between model output and business logic: values are validated against allowed ranges (clarity score clamped to 0–1, problem types matched against a whitelist), and invalid or missing fields are replaced with defaults so that downstream processing is never blocked.
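A minimal sketch of this kind of validation follows. The field names, the default score, and the contents of `ALLOWED_PROBLEM_TYPES` are assumptions chosen for illustration, and standard-library dataclasses are used in place of Pydantic so the example runs without extra dependencies.

```python
import json
import math
from dataclasses import dataclass, field

# Hypothetical whitelist; the real set of problem types is not given here.
ALLOWED_PROBLEM_TYPES = frozenset({"too_vague", "too_short", "off_topic"})


@dataclass
class Evaluation:
    clarity_score: float
    reasoning: str = ""
    problem_types: list[str] = field(default_factory=list)


def parse_evaluation(raw_json: str, default_score: float = 0.5) -> Evaluation:
    """Turn raw model output into a valid Evaluation without ever raising."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # Missing or non-numeric scores fall back to the default.
    try:
        score = float(data.get("clarity_score", default_score))
    except (TypeError, ValueError):
        score = default_score
    if math.isnan(score):
        score = default_score
    score = min(1.0, max(0.0, score))  # clamp to the range 0..1

    reasoning = data.get("reasoning")
    if not isinstance(reasoning, str):
        reasoning = ""

    raw_types = data.get("problem_types")
    if not isinstance(raw_types, list):
        raw_types = []
    # Keep only whitelisted string entries and silently drop anything else.
    problem_types = [
        t for t in raw_types if isinstance(t, str) and t in ALLOWED_PROBLEM_TYPES
    ]
    return Evaluation(score, reasoning, problem_types)
```

Because the function never raises, a malformed response degrades to neutral defaults instead of interrupting the survey flow.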
Concurrency and robustness
All LLM calls are asynchronous and bounded by a per-call configurable timeout. A retry policy absorbs transient endpoint errors. If a stage ultimately fails, rule-based fallbacks take over the processing — for example, a heuristic clarity score based on word count and known indicator terms, or a pre-built follow-up question. As a result, the user-facing flow remains uninterrupted even with an unstable LLM connection. Performance metrics are recorded per operation and phase and presented in the UI as aggregated statistics.
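The combination of timeout, retry, and rule-based fallback can be sketched with plain `asyncio`. Everything named here is an assumption for illustration: `call_with_fallback`, its defaults, the exponential backoff, the set of exceptions treated as transient, and the `VAGUE_TERMS` list with its weights are not taken from the project. A real client would likely also catch the error types raised by its HTTP or SDK layer.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical indicator terms that suggest a vague answer.
VAGUE_TERMS = ("maybe", "somehow", "stuff", "don't know")


def heuristic_clarity(answer: str) -> float:
    """Rule-based clarity score from word count and indicator terms."""
    words = answer.lower().split()
    score = min(len(words) / 20.0, 1.0)  # longer answers score higher
    text = " ".join(words)
    penalty = 0.2 * sum(term in text for term in VAGUE_TERMS)
    return max(0.0, min(1.0, score - penalty))


async def call_with_fallback(
    llm_call: Callable[[], Awaitable[T]],
    fallback: Callable[[], T],
    timeout_s: float = 10.0,
    max_attempts: int = 3,
    backoff_s: float = 0.5,
) -> T:
    """Try the LLM call with a timeout per attempt; use the fallback if all fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(llm_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            # Assumed transient: wait with exponential backoff, then retry.
            if attempt < max_attempts:
                await asyncio.sleep(backoff_s * 2 ** (attempt - 1))
    return fallback()
```

A stage could then wrap its model call as `await call_with_fallback(ask_model, lambda: heuristic_clarity(answer))`, where `ask_model` is a hypothetical coroutine function that performs the actual request.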
Configuration and deployment
Configuration is split into two YAML files: a central file for the LLM connection, evaluation thresholds, logging, and persistence; a second one for the survey definition itself. Environment variables with the SURVEY_ prefix can override individual values at runtime. The application ships as a Docker container that exposes a single HTTP port (Gradio, 7860) and runs a periodic health check against the root path.
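How prefixed environment variables might be merged over the loaded YAML can be sketched as follows. Only the `SURVEY_` prefix comes from the description above. The double-underscore separator for nested keys, the lower-casing of names, the JSON-based value parsing, and the function name `apply_env_overrides` are assumptions; a plain dictionary stands in for the parsed YAML file.

```python
import copy
import json
from typing import Any, Mapping

PREFIX = "SURVEY_"


def apply_env_overrides(
    config: dict[str, Any], environ: Mapping[str, str]
) -> dict[str, Any]:
    """Return a copy of config with SURVEY_* variables applied on top."""
    result = copy.deepcopy(config)
    for name, raw in environ.items():
        if not name.startswith(PREFIX) or name == PREFIX:
            continue
        # Assumed convention: SURVEY_LLM__TIMEOUT -> config["llm"]["timeout"]
        path = name[len(PREFIX):].lower().split("__")
        try:
            value = json.loads(raw)  # numbers, booleans, null
        except json.JSONDecodeError:
            value = raw              # anything else stays a string
        node = result
        for key in path[:-1]:
            child = node.get(key)
            if not isinstance(child, dict):
                child = node[key] = {}
            node = child
        node[path[-1]] = value
    return result
```

In the application this would be called with `os.environ` after the YAML file has been read. Returning a copy leaves the file-based values available for comparison or logging.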
Technology overview
- Language and runtime: Python 3.11.
- Web UI: Gradio (tab layout, chatbot component, live updates).
- Data models and validation: Pydantic.
- LLM connectivity: OpenAI Python SDK against an OpenAI-compatible chat-completion API; asynchronous calls via asyncio.
- Configuration: YAML (PyYAML), python-dotenv for environment variables.
- Logging: Loguru, with rotation and retention.
- Containerization: Docker; single container, health check on the web port.
- Persistence: JSON files for sessions and final answers, log file for runtime information.