Architecture

CodeDocumentation is structured as an asynchronous three-phase pipeline behind a Gradio interface. Static inspection, model-based code analysis, and narrative generation are kept in separate layers. The application runs as a single container and can be operated behind a reverse proxy under a path prefix.

At a glance

Three clearly separated pipeline phases with different model requirements
Asynchronous processing during the analysis phase with semaphore-bounded parallelism
Two separate language model clients behind a shared wrapper class with retry logic and thinking-block handling
Uniform data models (dataclasses) as the interface between phases
Plug-in point for framework-specific endpoint extractors via a registry
Containerised via a Dockerfile based on Python 3.11 with a configurable port
Six independent document generators plus a Mermaid builder and a project classifier
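The page states only that dataclasses form the interface between phases. The following is a minimal sketch of what those interfaces might look like. The names ProjectMetadata and Phase2Result appear in the diagram on this page; Endpoint, FileAnalysis, the failed() helper, and every field are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class ProjectMetadata:
    """Phase 1 output (hypothetical fields): produced without any model call."""
    root: str
    languages: dict[str, int] = field(default_factory=dict)  # language -> file count
    frameworks: list[str] = field(default_factory=list)
    dependencies: dict[str, str] = field(default_factory=dict)  # package -> version spec
    files: list[str] = field(default_factory=list)


@dataclass
class Endpoint:
    method: str
    path: str
    source_file: str


@dataclass
class FileAnalysis:
    """One fast-model result per file (hypothetical fields)."""
    path: str
    status: Literal["ok", "error"] = "ok"
    purpose: str = ""
    public_api: list[str] = field(default_factory=list)
    notable: list[str] = field(default_factory=list)
    importance: float = 0.0


@dataclass
class Phase2Result:
    """Phase 2 output (hypothetical fields): consumed by the Phase 3 generators."""
    analyses: list[FileAnalysis] = field(default_factory=list)
    endpoints: list[Endpoint] = field(default_factory=list)

    def failed(self) -> list[str]:
        """Paths whose analysis was marked with status error."""
        return [a.path for a in self.analyses if a.status == "error"]
```

Plain dataclasses keep each phase testable in isolation: a phase can be fed a hand-built ProjectMetadata without running the phases before it.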

Architecture description

Layers and components

The application is divided into five logical layers:

Presentation — A Gradio Blocks interface with two tabs ("Input & Configuration", "Results"), progress display, file preview (rendered and as source), and ZIP download. Source selection and model configuration are grouped into accordions.
Orchestration — The pipeline function run_pipeline coordinates the workflow, passes progress callbacks to the phases, and encapsulates error handling. It is implemented asynchronously and uses Gradio progress reporting.
Inspection (Phase 1) — A source loader takes the chosen input source and provides a working directory. A directory walker and several detectors (language, framework, configuration, dependencies, build files, CI, existing documentation) then run without invoking a model.
Analysis (Phase 2) — A file analyzer calls the fast model in parallel per file; the responses are parsed defensively as JSON. Framework-specific endpoint extractors run deterministically alongside. An importance scorer and an aggregator condense the results into package structures and an API catalog.
Generation (Phase 3) — A project classifier and six individual document generators call the thinking model for their respective sections. A Mermaid builder produces the architecture diagram. Finally, all documents are bundled into a ZIP archive.
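The Phase 1 detectors are described only by what they inspect. As a hedged illustration of a model-free detector, the sketch below reads declared Python dependencies from pyproject.toml and requirements.txt, using the tomllib/tomli fallback listed in the technology overview. The function name, the return shape, and the simplified requirement parsing (no full PEP 508 support) are assumptions, not the project's actual detector.

```python
from __future__ import annotations

import re
from pathlib import Path

try:  # Python 3.11+ ships tomllib; older interpreters need the tomli backport
    import tomllib
except ModuleNotFoundError:  # pragma: no cover
    import tomli as tomllib

# Splits "fastapi>=0.110" into name and version spec (simplified, not full PEP 508).
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def _parse_requirement(line: str) -> tuple[str, str] | None:
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):  # skip blanks, comments, pip options like -r/-e
        return None
    m = _REQ.match(line)
    return (m.group(1).lower(), m.group(2).strip()) if m else None


def detect_python_dependencies(root: Path) -> dict[str, str]:
    """Collect declared dependencies; pyproject.toml wins over requirements.txt."""
    deps: dict[str, str] = {}
    pyproject = root / "pyproject.toml"
    if pyproject.is_file():
        with pyproject.open("rb") as fh:  # tomllib requires a binary file handle
            data = tomllib.load(fh)
        for spec in data.get("project", {}).get("dependencies", []):
            parsed = _parse_requirement(spec)
            if parsed:
                deps[parsed[0]] = parsed[1]
    requirements = root / "requirements.txt"
    if requirements.is_file():
        for line in requirements.read_text(encoding="utf-8").splitlines():
            parsed = _parse_requirement(line)
            if parsed:
                deps.setdefault(parsed[0], parsed[1])  # do not override pyproject entries
    return deps
```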

Component and data flow diagram

```mermaid
flowchart TB
    UI["Gradio interface"]
    ORCH["Pipeline orchestration"]
    UI --> ORCH

    subgraph Sources
        SRC1["Local directory"]
        SRC2["ZIP upload"]
        SRC3["GitLab REST API"]
    end

    LOADER["Source Loader"]
    SRC1 --> LOADER
    SRC2 --> LOADER
    SRC3 --> LOADER
    ORCH --> LOADER

    subgraph P1["Phase 1: Inspection"]
        WALK["Directory Walker"]
        DETLANG["Language and framework"]
        DETDEP["Dependencies"]
        DETCFG["Configuration"]
        DETBUILD["Build and CI"]
    end
    LOADER --> WALK
    WALK --> DETLANG
    WALK --> DETDEP
    WALK --> DETCFG
    WALK --> DETBUILD

    META["ProjectMetadata"]
    DETLANG --> META
    DETDEP --> META
    DETCFG --> META
    DETBUILD --> META

    subgraph P2["Phase 2: Code analysis"]
        FILEAN["Per-file LLM analysis"]
        FRAMEX["Framework extractors"]
        SCORE["Importance scorer"]
        AGG["Aggregator"]
    end
    META --> FILEAN
    META --> FRAMEX
    FILEAN --> SCORE
    FRAMEX --> AGG
    SCORE --> AGG

    LLMFAST["LLM client - fast"]
    FILEAN <--> LLMFAST

    P2RES["Phase2Result"]
    AGG --> P2RES

    subgraph P3["Phase 3: Generation"]
        CLASS["Project classifier"]
        GEN["Document generators"]
        MERM["Mermaid builder"]
        REPORT["Generation report"]
    end
    META --> CLASS
    P2RES --> CLASS
    META --> GEN
    P2RES --> GEN
    CLASS --> GEN
    GEN --> MERM

    LLMSLOW["LLM client - thinking"]
    CLASS <--> LLMSLOW
    GEN <--> LLMSLOW

    OUT["Markdown files"]
    GEN --> OUT
    MERM --> OUT
    REPORT --> OUT

    ZIPOUT["ZIP archive"]
    OUT --> ZIPOUT
    ZIPOUT --> UI
```
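The page does not say whether the diagram text comes from the thinking model or is assembled deterministically by the Mermaid builder. Purely as an illustration of the deterministic variant, the sketch below renders package-to-dependency edges as a Mermaid flowchart in the same style as the diagram above. The input shape and the function names are assumptions.

```python
import re


def _node_id(name: str) -> str:
    """Mermaid node IDs must be plain identifiers, so replace every non-word character."""
    return re.sub(r"\W", "_", name)


def build_flowchart(packages: dict[str, list[str]], direction: str = "TB") -> str:
    """Render package -> dependency edges as Mermaid flowchart source (hypothetical builder)."""
    lines = [f"flowchart {direction}"]
    names = sorted(set(packages) | {dep for deps in packages.values() for dep in deps})
    for name in names:
        label = name.replace('"', "'")  # a double quote would terminate the label
        lines.append(f'    {_node_id(name)}["{label}"]')
    for src in sorted(packages):
        for dst in sorted(packages[src]):
            lines.append(f"    {_node_id(src)} --> {_node_id(dst)}")
    return "\n".join(lines)
```

Sorting nodes and edges keeps the output stable between runs, which makes regenerated documents easy to diff. A production builder would also need to avoid Mermaid keywords such as end as node IDs.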

Workflow

The user selects a source in the interface, configures both language model endpoints, and starts the run. The source loader provides the project in a working directory. Phase 1 inspects this directory without invoking a model and produces a metadata object. Phase 2 uses this object first to run the framework-specific extractors for each language and then to analyse every file with the fast model, issuing the per-file calls in parallel. The detected endpoints are linked to the file analyses, deduplicated, and aggregated into an API catalog. Phase 3 first asks the thinking model for a project characterisation and then produces the individual documents within a deterministic file structure. The Mermaid builder inserts the architecture diagram into the corresponding document. Finally, a generation report with runtime, model calls, and token usage is produced, and all files are packed into a ZIP archive that is offered for download in the interface.
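The workflow above can be sketched as a single coroutine. The phase functions here are trivial hypothetical stand-ins; only the overall shape (sequential phases, progress callbacks, a timed generation report, an in-memory ZIP) follows the description. The report file name and the progress fractions are assumptions.

```python
import asyncio
import io
import time
import zipfile
from collections.abc import Callable

Progress = Callable[[float, str], None]  # (fraction done, description)


async def phase1(source: str) -> dict:  # hypothetical stand-in: model-free inspection
    return {"root": source, "files": ["main.py", "api.py"]}


async def phase2(meta: dict) -> dict:  # hypothetical stand-in: per-file analysis
    return {"analyses": {f: "ok" for f in meta["files"]}}


async def phase3(meta: dict, p2: dict) -> dict[str, str]:  # hypothetical stand-in
    return {"README.md": f"# {meta['root']}\n", "ARCHITECTURE.md": "## Architecture\n"}


async def run_pipeline(source: str, progress: Progress) -> bytes:
    """Run the three phases in order and return the ZIP archive as bytes."""
    started = time.perf_counter()
    progress(0.0, "Phase 1: inspection")
    meta = await phase1(source)
    progress(0.3, "Phase 2: code analysis")
    p2 = await phase2(meta)
    progress(0.7, "Phase 3: generation")
    docs = await phase3(meta, p2)
    # A real report would also list model calls and token usage per role.
    docs["GENERATION_REPORT.md"] = f"Runtime: {time.perf_counter() - started:.2f} s\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in sorted(docs.items()):
            zf.writestr(name, text)
    progress(1.0, "Done")
    return buffer.getvalue()
```

In a Gradio event handler, a gr.Progress instance should be usable as the callback, since it is called as progress(fraction, desc).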

Role of the language models

The application uses two model endpoints with clearly separated roles.

Fast model (Phase 2) — One call per analysed file, parallelised via a configurable semaphore. The expected response is structured JSON with purpose, public API, and notable properties. This phase produces between a few dozen and a few hundred calls per run.
Thinking model (Phase 3) — A small number of calls per run, each allowing a longer response than the Phase 2 calls, since they produce prose sections and the Mermaid diagram. The non-standard enable_thinking parameter is forwarded via extra_body to the OpenAI-compatible interface.

Both call paths go through the same wrapper, which strips <think> blocks, extracts JSON robustly, retries on failure with exponential backoff, and counts calls and tokens per role.
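A hedged sketch of such a wrapper, covering retry with exponential backoff, <think> stripping, and per-role counters. The actual API call is injected as a coroutine returning (text, tokens); that interface, the class name, and the retry defaults are assumptions rather than the project's code. JSON extraction is left out here to keep the sketch focused.

```python
import asyncio
import random
import re
from collections import Counter
from typing import Awaitable, Callable

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

# Hypothetical call interface: prompt -> (response text, tokens used).
CallFn = Callable[[str], Awaitable[tuple[str, int]]]


class LLMWrapper:
    """Retries failed calls, strips <think> blocks, and counts usage per role."""

    def __init__(self, max_retries: int = 3, base_delay: float = 1.0) -> None:
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.calls: Counter[str] = Counter()  # successful calls per role
        self.tokens: Counter[str] = Counter()  # tokens per role

    async def complete(self, role: str, call: CallFn, prompt: str) -> str:
        for attempt in range(self.max_retries + 1):
            try:
                text, used = await call(prompt)
            except Exception:
                if attempt == self.max_retries:
                    raise  # give up; the caller decides how to mark the failure
                # base, 2x base, 4x base, ... plus jitter so parallel retries spread out
                await asyncio.sleep(self.base_delay * 2**attempt + random.uniform(0, 0.1))
                continue
            self.calls[role] += 1
            self.tokens[role] += used
            return THINK_BLOCK.sub("", text).strip()
        raise AssertionError("unreachable")
```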

Concurrency and robustness

The pipeline is asynchronous throughout. The file analysis in Phase 2 runs in parallel via asyncio.gather, bounded by a semaphore whose limit can be adjusted in the interface. Model responses are parsed defensively: JSON wrapped in Markdown fences is unwrapped automatically, and JSON embedded in surrounding text is extracted with a regular expression. An error in an individual file marks that file's analysis with status error instead of aborting the overall run. Missing or empty model responses are logged. Detected endpoints are deduplicated by their (method, path) tuple.

Configuration and deployment

Configuration is provided via environment variables or via the interface; both model endpoints are configured separately (URL, model name, optional API key, thinking mode). Bind address, port, and reverse proxy path prefix are also set via environment variables. The application is operated as a single container (base image python:3.11-slim) and exposes port 7870 by default. Temporary working directories are created per run and cleaned up by the operating system afterwards.

Extensibility

A new framework is integrated by adding an extractor class with three required methods (name, language, extract_endpoints), an entry in the extractor registry, and a detection signal in the framework detector. The rest of the pipeline and the interface remain unchanged.
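A minimal sketch of the extension point, assuming a Protocol with the three required members and a module-level registry dictionary. The method signatures, the registry shape, and the simplified Flask extractor (which ignores shortcuts such as @app.get) are illustrative assumptions, not the project's actual classes.

```python
import re
from typing import Protocol


class EndpointExtractor(Protocol):
    """The three required members named in the text; the signatures are assumptions."""

    def name(self) -> str: ...
    def language(self) -> str: ...
    def extract_endpoints(self, path: str, source: str) -> list[tuple[str, str]]: ...


EXTRACTORS: dict[str, EndpointExtractor] = {}  # hypothetical registry: framework -> extractor


def register(extractor: EndpointExtractor) -> EndpointExtractor:
    EXTRACTORS[extractor.name()] = extractor
    return extractor


class FlaskExtractor:
    """Hypothetical example: finds @app.route("/path", methods=[...]) decorators."""

    ROUTE = re.compile(r"""@\w+\.route\(\s*["']([^"']+)["'](?:.*?methods\s*=\s*\[([^\]]*)\])?""")

    def name(self) -> str:
        return "flask"

    def language(self) -> str:
        return "python"

    def extract_endpoints(self, path: str, source: str) -> list[tuple[str, str]]:
        found = []
        for route, methods in self.ROUTE.findall(source):
            for method in re.findall(r"\w+", methods) or ["GET"]:  # Flask defaults to GET
                found.append((method.upper(), route))
        return found


register(FlaskExtractor())


def extractors_for(frameworks: list[str]) -> list[EndpointExtractor]:
    """Pick the registered extractors for the frameworks the detector reported."""
    return [EXTRACTORS[f] for f in frameworks if f in EXTRACTORS]
```

Frameworks without a registered extractor are silently skipped, so the framework detector can report more frameworks than have extractors without breaking Phase 2.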

Technology overview

Language and runtime — Python 3.11
Interface — Gradio (gr.Blocks interface, asynchronous event handlers)
Language model integration — openai async client against OpenAI-compatible endpoints; extra_body for non-standard parameters such as enable_thinking
HTTP client — httpx for GitLab REST API access (streaming download)
Data formats — YAML (PyYAML) and TOML (tomllib/tomli) for Phase 1 parsers; JSON for model responses; Markdown as output
Diagrams — Mermaid, embedded as a code block
Templating — Jinja2 for document building blocks
Containerisation — Dockerfile based on python:3.11-slim, exposed port 7870