Architecture¶
Bildgenerierung is implemented as a containerized web application that provides its own frontend and delegates all image generation tasks to an external, OpenAI-compatible inference service. The application itself holds no persistent state and is limited to input collection, call coordination, image preprocessing, and the preparation of results for display and download.
At a glance¶
- Gradio-based frontend in a slim Docker container
- Synchronous calls to an OpenAI-compatible inference service over HTTPS
- Clear separation of interface, application logic, image preprocessing, and HTTP communication
- Dynamic endpoint selection depending on the presence of reference images (image generation API or chat completion API)
- Configuration exclusively via environment variables (endpoint, model, API key)
- Containerized operation with health check and execution as an unprivileged user
Architecture description¶
Components and layers¶
The application is structured into four logical layers within a single process:
- Frontend layer (Gradio UI): Provides input fields, presets, the accordion with extended parameters, the result gallery, and the example prompts. It translates user input into function calls and reacts to preset changes via event handlers.
- Application logic: Decides on the API path to use based on the input, validates basic input (e.g., a mandatory prompt), and orchestrates multiple calls when several variants are requested in parallel.
- Image preprocessing: Scales reference images to a maximum edge length, converts them to JPEG or PNG depending on color mode, and produces data URIs for use in the chat completion path. Server responses are recognized in different formats (Base64, embedded image data, data URIs, external URLs) and turned into usable image objects.
- HTTP communication layer: Calls the inference service over HTTPS with a configured timeout and active certificate verification. It adds the API key in the Authorization header and distinguishes connection errors, timeouts, and HTTP status errors.
Workflow¶
flowchart TB
Browser[Browser]
subgraph App["Application container"]
UI["Gradio UI<br/>Inputs, presets, gallery"]
Logic["Application logic<br/>Endpoint selection, variant control"]
Pre["Image preprocessing<br/>Scaling, encoding"]
HTTP["HTTPS client"]
end
subgraph Local["Local infrastructure"]
Server["Inference server<br/>OpenAI-compatible"]
Models[("Image models<br/>Qwen-Image-2512<br/>FLUX.2-dev")]
end
Browser -->|inputs| UI
UI --> Logic
Logic -->|with reference images| Pre
Pre --> HTTP
Logic -->|without reference images| HTTP
HTTP -->"/v1/images/generations"| Server
HTTP -->"/v1/chat/completions"| Server
Server --> Models
Models --> Server
Server -->|image data| HTTP
HTTP -->|decoding| Logic
Logic -->|gallery| UI
UI -->|display, download| Browser
A single run follows a linear chain: the interface receives a prompt, optional reference images, and parameters, and passes them to the application logic. Based on the input, the logic decides whether to use the image generation endpoint (text-to-image) or the chat completion endpoint (image-to-image). In the image-to-image path, reference images are scaled beforehand and embedded into the request as inline data. The HTTPS client calls the inference server in the local infrastructure, which in turn forwards the request to the configured image model. The response is decoded according to its format, stored as a PNG file with a timestamp-based name, and shown in the gallery. A result can subsequently be adopted as a reference for the next run via a button.
Image generation and models¶
The actual image generation takes place exclusively on the inference server. The application itself does not load any model and does not perform GPU-backed computation. Through the central model configuration, switching between the supported image models Qwen-Image-2512 and black-forest-labs/FLUX.2-dev is possible without restarting or modifying the application; the selection is made via the model identifier in the configuration of the inference server.
Concurrency and robustness¶
The application uses Gradio's built-in request queue, so concurrent calls are processed in order. HTTP calls are synchronous and bounded by a timeout. Three error classes are handled separately: connection errors (server unreachable), timeouts (with a hint to reduce steps or resolution), and HTTP error responses with a truncated representation of the server reply. Responses with missing or unrecognized image data structures are detected and acknowledged with a meaningful message.
Configuration and deployment¶
Configuration is done entirely through environment variables, optionally loaded from an .env file. Key parameters are the base URL of the inference service, the API key, and the model identifier; in addition, the bind address, port, and a root path for operation behind a reverse proxy can be set. The provided container is based on a slim Python image, installs dependencies without cache, copies the application, creates an unprivileged user, and exposes the service on port 7860. A built-in health check periodically verifies the reachability of the web server.
Technology overview¶
- Language and runtime: Python 3.12
- Frontend framework: Gradio (version 6)
- HTTP client: httpx with TLS verification and configured timeout
- Image processing: Pillow (PIL)
- Configuration: python-dotenv
- Containerization: Docker, slim Python base image, unprivileged user, health check
- Backend interface: OpenAI-compatible inference service (endpoints
/v1/images/generationsand/v1/chat/completions) - Image models: Qwen-Image-2512 and black-forest-labs/FLUX.2-dev
- Protocol: HTTPS, JSON-based requests and responses according to the OpenAI specification