Architecture¶

Bildgenerierung is implemented as a containerized web application that provides its own frontend and delegates all image generation tasks to an external, OpenAI-compatible inference service. The application itself holds no persistent state and is limited to input collection, call coordination, image preprocessing, and the preparation of results for display and download.

At a glance¶

Gradio-based frontend in a slim Docker container
Synchronous calls to an OpenAI-compatible inference service over HTTPS
Clear separation of interface, application logic, image preprocessing, and HTTP communication
Dynamic endpoint selection depending on the presence of reference images (image generation API or chat completion API)
Configuration exclusively via environment variables (endpoint, model, API key)
Containerized operation with health check and execution as an unprivileged user

Architecture description¶

Components and layers¶

The application is structured into four logical layers within a single process:

Frontend layer (Gradio UI): Provides input fields, presets, the accordion with extended parameters, the result gallery, and the example prompts. It translates user input into function calls and reacts to preset changes via event handlers.
Application logic: Decides on the API path to use based on the input, validates basic input (e.g., a mandatory prompt), and orchestrates multiple calls when several variants are requested in parallel.
Image preprocessing: Scales reference images to a maximum edge length, converts them to JPEG or PNG depending on color mode, and produces data URIs for use in the chat completion path. Server responses are recognized in different formats (Base64, embedded image data, data URIs, external URLs) and turned into usable image objects.
HTTP communication layer: Calls the inference service over HTTPS with a configured timeout and active certificate verification. It adds the API key in the Authorization header and distinguishes connection errors, timeouts, and HTTP status errors.

Workflow¶

flowchart TB
    Browser[Browser]

    subgraph App["Application container"]
        UI["Gradio UI<br/>Inputs, presets, gallery"]
        Logic["Application logic<br/>Endpoint selection, variant control"]
        Pre["Image preprocessing<br/>Scaling, encoding"]
        HTTP["HTTPS client"]
    end

    subgraph Local["Local infrastructure"]
        Server["Inference server<br/>OpenAI-compatible"]
        Models[("Image models<br/>Qwen-Image-2512<br/>FLUX.2-dev")]
    end

    Browser -->|inputs| UI
    UI --> Logic
    Logic -->|with reference images| Pre
    Pre --> HTTP
    Logic -->|without reference images| HTTP
    HTTP -->"/v1/images/generations"| Server
    HTTP -->"/v1/chat/completions"| Server
    Server --> Models
    Models --> Server
    Server -->|image data| HTTP
    HTTP -->|decoding| Logic
    Logic -->|gallery| UI
    UI -->|display, download| Browser

A single run follows a linear chain: the interface receives a prompt, optional reference images, and parameters, and passes them to the application logic. Based on the input, the logic decides whether to use the image generation endpoint (text-to-image) or the chat completion endpoint (image-to-image). In the image-to-image path, reference images are scaled beforehand and embedded into the request as inline data. The HTTPS client calls the inference server in the local infrastructure, which in turn forwards the request to the configured image model. The response is decoded according to its format, stored as a PNG file with a timestamp-based name, and shown in the gallery. A result can subsequently be adopted as a reference for the next run via a button.

Image generation and models¶

The actual image generation takes place exclusively on the inference server. The application itself does not load any model and does not perform GPU-backed computation. Through the central model configuration, switching between the supported image models Qwen-Image-2512 and black-forest-labs/FLUX.2-dev is possible without restarting or modifying the application; the selection is made via the model identifier in the configuration of the inference server.

Concurrency and robustness¶

The application uses Gradio's built-in request queue, so concurrent calls are processed in order. HTTP calls are synchronous and bounded by a timeout. Three error classes are handled separately: connection errors (server unreachable), timeouts (with a hint to reduce steps or resolution), and HTTP error responses with a truncated representation of the server reply. Responses with missing or unrecognized image data structures are detected and acknowledged with a meaningful message.

Configuration and deployment¶

Configuration is done entirely through environment variables, optionally loaded from an .env file. Key parameters are the base URL of the inference service, the API key, and the model identifier; in addition, the bind address, port, and a root path for operation behind a reverse proxy can be set. The provided container is based on a slim Python image, installs dependencies without cache, copies the application, creates an unprivileged user, and exposes the service on port 7860. A built-in health check periodically verifies the reachability of the web server.

Technology overview¶

Language and runtime: Python 3.12
Frontend framework: Gradio (version 6)
HTTP client: httpx with TLS verification and configured timeout
Image processing: Pillow (PIL)
Configuration: python-dotenv
Containerization: Docker, slim Python base image, unprivileged user, health check
Backend interface: OpenAI-compatible inference service (endpoints /v1/images/generations and /v1/chat/completions)
Image models: Qwen-Image-2512 and black-forest-labs/FLUX.2-dev
Protocol: HTTPS, JSON-based requests and responses according to the OpenAI specification