Architecture

KI-Umfrage is implemented as a monolithic Python application with a web UI, deployed in a single container. The internal structure follows a clear separation between the user interface, the orchestration of the survey flow, the LLM pipeline, and an underlying configuration and persistence layer. Calls to the language model are asynchronous, protected by timeouts and retry logic, and shielded against failure by deterministic fallback paths.

At a glance

  • Layered architecture: UI layer (tab-based), orchestration layer (conversation manager), pipeline layer (survey agent with three LLM stages), infrastructure layer (LLM client, configuration, persistence).
  • Asynchronous LLM calls with configurable timeout, retry logic, and JSON mode for structured responses.
  • Typed data structures for all pipeline transitions; validation of LLM outputs against allowed value ranges.
  • Configuration via YAML files, with optional overrides via environment variables.
  • Persistence of sessions, conversation histories, and final answers as JSON; logging to a dedicated log file.
  • Containerized with Docker; a single web port exposed externally, health check included.
  • Multi-level fallback mechanisms at the prompt, pipeline, and processing levels.

Architecture overview

The application is divided into four layers:

User interface. The UI is built with Gradio and consists of six tabs (playground, prompt engineering, batch testing, question editor, session demo, performance). UI code is consolidated in a central GradioInterface class; handler functions mediate between UI events and the orchestration layer.

Orchestration. The ConversationManager starts sessions, holds the state of the current question, and forwards user answers to the survey agent. A session bundles all interactions of a survey and is the basis for later persistence.

Pipeline (survey agent). The agent encapsulates three independent LLM stages — answer evaluation, follow-up generation, and answer structuring. Each stage calls the LLM client with its own prompt template, parses the JSON result, validates it, and returns a typed result object. On error or timeout, a rule-based replacement takes over.

Infrastructure. The LLMClient encapsulates the OpenAI-compatible HTTP communication including timeouts, retries, and JSON parsing. The ConfigLoader reads YAML configuration and survey definitions and applies environment overrides. Results are stored as JSON files; logs are written via Loguru.
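
The JSON persistence mentioned above could look roughly like the following sketch. The `Session` dataclass, its field names, the `save_session` / `load_session` helpers, and the one-file-per-session layout are all illustrative assumptions, not the application's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Session:
    # Hypothetical session record; the field names are assumptions.
    session_id: str
    history: list[dict] = field(default_factory=list)        # conversation turns
    final_answers: list[dict] = field(default_factory=list)  # structured results


def save_session(session: Session, directory: Path) -> Path:
    """Write one session to <directory>/<session_id>.json and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session.session_id}.json"
    # ensure_ascii=False keeps non-ASCII characters (e.g. umlauts) readable on disk.
    payload = json.dumps(asdict(session), ensure_ascii=False, indent=2)
    path.write_text(payload, encoding="utf-8")
    return path


def load_session(path: Path) -> Session:
    """Read a session file back into a Session object."""
    return Session(**json.loads(path.read_text(encoding="utf-8")))
```

Keeping one file per session makes a later round trip (save, then load) easy to verify and avoids rewriting a growing shared file on every answer.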

Workflow

flowchart TD
    User[User]
    UI[Gradio Tabs]
    CM[Conversation Manager]
    Agent[Survey Agent]
    Eval[Stage 1: Evaluation]
    Follow[Stage 2: Follow-up]
    Struct[Stage 3: Structuring]
    Prompts[Prompt Templates]
    LLM[LLM Client]
    Fallback[Rule-based Fallback]
    LLMAPI[OpenAI-compatible API]
    Config[YAML Configuration]
    Store[JSON Result Store]
    Log[Log File]

    User --> UI
    UI --> CM
    CM --> Agent
    Agent --> Eval
    Eval -->|low clarity score| Follow
    Eval -->|sufficiently clear| Struct
    Follow --> CM
    CM -->|next answer| Agent
    Agent --> Struct
    Struct --> CM
    CM --> Store

    Eval -.-> Prompts
    Follow -.-> Prompts
    Struct -.-> Prompts
    Prompts --> LLM
    LLM --> LLMAPI
    LLM -.->|timeout / error| Fallback
    Fallback --> Agent

    Config --> CM
    Config --> Agent
    Config --> LLM
    Agent --> Log
    LLM --> Log

The flow begins when the conversation manager hands the current question and the user's answer to the survey agent. Stage 1 (evaluation) calls the LLM client with the evaluation prompt and receives a clarity score, reasoning, and problem types. If the score is below the configured threshold and the maximum follow-up depth has not yet been reached, stage 2 generates a concrete follow-up question. The conversation manager presents this question to the user and feeds the reply back into the agent as a new answer round. As soon as an answer is judged sufficiently clear or the follow-up depth has been exhausted, stage 3 condenses the entire conversation into a structured final answer with a main category, specific items, and a confidence value.
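
The control flow of this loop can be sketched as follows. This is a simplified, synchronous illustration: `run_question`, the `Evaluation` dataclass, and the default values for `threshold` and `max_follow_ups` are hypothetical, and the three stages are passed in as plain callables rather than real LLM calls. The `ask_follow_up` callable stands for stage 2 plus the round trip to the user and returns the user's reply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    clarity_score: float
    reasoning: str
    problem_types: list[str]


def run_question(
    first_answer: str,
    evaluate: Callable[[list[str]], Evaluation],   # stage 1
    ask_follow_up: Callable[[Evaluation], str],    # stage 2 + user reply
    structure: Callable[[list[str]], dict],        # stage 3
    threshold: float = 0.7,      # assumed default, configurable in practice
    max_follow_ups: int = 2,     # assumed default, configurable in practice
) -> dict:
    answers = [first_answer]
    follow_ups = 0
    while True:
        evaluation = evaluate(answers)
        # Stop when the answer is clear enough or the follow-up depth is exhausted.
        if evaluation.clarity_score >= threshold or follow_ups >= max_follow_ups:
            break
        answers.append(ask_follow_up(evaluation))
        follow_ups += 1
    # Stage 3 always runs, so every question ends with a structured answer.
    return structure(answers)
```

Because the structuring step sits outside the loop, a question always ends in a structured answer, even when no reply ever reaches the threshold.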

Role of the LLM in the pipeline

The language model is used in each pipeline stage for a clearly bounded task: evaluation, follow-up generation, and structuring. All calls use a low temperature to keep results as reproducible as possible and run in JSON mode so that the outputs can be mapped directly into typed data objects. The application maintains a strict separation between model output and business logic: values are validated against allowed ranges (clarity score clamped to 0–1, problem types matched against a whitelist), and invalid or missing fields are replaced without blocking downstream processing.
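
The validation step can be illustrated with a small sketch. It uses standard-library dataclasses to stay dependency-free, whereas the application itself relies on Pydantic. The `parse_evaluation` function, the JSON field names, the whitelist entries, and the choice of defaults for missing fields are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical whitelist; the real problem types are not listed in this document.
ALLOWED_PROBLEM_TYPES = {"too_vague", "off_topic", "incomplete"}


@dataclass
class EvaluationResult:
    clarity_score: float = 0.0
    reasoning: str = ""
    problem_types: list[str] = field(default_factory=list)


def parse_evaluation(raw: str) -> EvaluationResult:
    """Map raw model output to a typed result without ever raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return EvaluationResult()  # unparseable output: fall back to defaults
    if not isinstance(data, dict):
        return EvaluationResult()

    try:
        score = float(data.get("clarity_score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    score = min(1.0, max(0.0, score))  # clamp to the range 0..1

    reasoning = data.get("reasoning")
    raw_types = data.get("problem_types")
    types = (
        [t for t in raw_types if isinstance(t, str) and t in ALLOWED_PROBLEM_TYPES]
        if isinstance(raw_types, list)
        else []
    )
    return EvaluationResult(score, reasoning if isinstance(reasoning, str) else "", types)
```

Treating a missing or malformed score as 0.0 is one possible policy: it errs toward asking a follow-up rather than accepting an answer on the strength of a broken model response.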

Concurrency and robustness

All LLM calls are asynchronous and bounded by a per-call configurable timeout. A retry policy absorbs transient endpoint errors. If a stage ultimately fails, rule-based fallbacks take over the processing — for example, a heuristic clarity score based on word count and known indicator terms, or a pre-built follow-up question. As a result, the user-facing flow remains uninterrupted even with an unstable LLM connection. Performance metrics are recorded per operation and phase and presented in the UI as aggregated statistics.
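
A minimal sketch of this timeout, retry, and fallback pattern, assuming a hypothetical `call_llm` coroutine that returns a clarity score. The indicator terms, the scoring formula, and the default timeout, attempt count, and backoff values are illustrative and not taken from the application.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical indicator terms that suggest a vague answer.
VAGUE_TERMS = {"somehow", "maybe", "stuff", "things"}


def heuristic_clarity(answer: str) -> float:
    """Rule-based score from word count and indicator terms (assumed formula)."""
    words = answer.lower().split()
    if not words:
        return 0.0
    score = min(len(words) / 20, 1.0)  # longer answers count as clearer
    vague_hits = sum(1 for w in words if w.strip(".,!?") in VAGUE_TERMS)
    return max(0.0, score - 0.2 * vague_hits)


async def clarity_with_fallback(
    answer: str,
    call_llm: Callable[[str], Awaitable[float]],
    timeout_s: float = 10.0,
    max_attempts: int = 3,
    backoff_s: float = 0.5,
) -> float:
    for attempt in range(max_attempts):
        try:
            # wait_for cancels the call and raises a timeout error once timeout_s elapses.
            return await asyncio.wait_for(call_llm(answer), timeout=timeout_s)
        except Exception:
            # Broad catch for brevity; a real client would retry only transient errors.
            if attempt < max_attempts - 1:
                await asyncio.sleep(backoff_s * 2**attempt)  # exponential backoff
    # All attempts failed: use the deterministic heuristic instead.
    return heuristic_clarity(answer)
```

The caller always receives a score, so the survey flow does not need its own error handling for an unreachable endpoint.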

Configuration and deployment

Configuration is split into two YAML files: a central file for the LLM connection, evaluation thresholds, logging, and persistence, and a second file for the survey definition itself. Environment variables with the SURVEY_ prefix can override individual values at runtime. The application ships as a Docker container that exposes a single HTTP port (Gradio, 7860) and runs a periodic health check against the root path.
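
One way such environment overrides could be applied to an already loaded configuration dictionary is sketched below. Only the SURVEY_ prefix comes from the description above. The `apply_env_overrides` function, the double-underscore convention for nested keys (for example `SURVEY_LLM__TIMEOUT`), and the JSON-based value parsing are assumptions.

```python
import copy
import json
import os
from typing import Any, Mapping

PREFIX = "SURVEY_"


def _parse(value: str) -> Any:
    """Interpret numbers and booleans as JSON; keep everything else as a string."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def apply_env_overrides(
    config: dict[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, Any]:
    """Return a copy of config with SURVEY_* variables applied on top."""
    result = copy.deepcopy(config)  # never mutate the loaded YAML data
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Assumed convention: SURVEY_LLM__TIMEOUT -> config["llm"]["timeout"]
        keys = name[len(PREFIX):].lower().split("__")
        node = result
        for key in keys[:-1]:
            child = node.get(key)
            if not isinstance(child, dict):
                child = node[key] = {}
            node = child
        node[keys[-1]] = _parse(value)
    return result
```

With this convention, setting `SURVEY_LLM__TIMEOUT=30` in the container environment would change only the timeout and leave the rest of the `llm` section as defined in the YAML file.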

Technology overview

  • Language and runtime: Python 3.11.
  • Web UI: Gradio (tab layout, chatbot component, live updates).
  • Data models and validation: Pydantic.
  • LLM connectivity: OpenAI Python SDK against an OpenAI-compatible chat-completion API; asynchronous calls via asyncio.
  • Configuration: YAML (PyYAML), python-dotenv for environment variables.
  • Logging: Loguru, with rotation and retention.
  • Containerization: Docker; single container, health check on the web port.
  • Persistence: JSON files for sessions and final answers, log file for runtime information.