Architecture¶
CodeDocumentation is structured as an asynchronous three-phase pipeline integrated behind a Gradio interface. The separation between static inspection, model-based code analysis, and narrative generation follows a layered division; the application runs as a single container and can be operated behind a reverse proxy under a path prefix.
At a glance¶
- Three clearly separated pipeline phases with different model requirements
- Asynchronous processing during the analysis phase with semaphore-bounded parallelism
- Two separate language model clients in a shared wrapper class with retry and thinking block handling
- Uniform data models (dataclasses) as the interface between phases
- Plug-in point for framework-specific endpoint extractors via a registry
- Containerised via a Dockerfile based on Python 3.11 with a configurable port
- Six independent document generators plus a Mermaid builder and a project classifier
Architecture description¶
Layers and components¶
The application is divided into five logical layers:
- Presentation — A Gradio
Blocksinterface with two tabs ("Input & Configuration", "Results"), progress display, file preview (rendered and as source), and ZIP download. Source selection and model configuration are grouped into accordions. - Orchestration — The pipeline function
run_pipelinecoordinates the workflow, passes progress callbacks to the phases, and encapsulates error handling. It is implemented asynchronously and uses Gradio progress reporting. - Inspection (Phase 1) — A source loader takes the chosen input source and provides a working directory. A directory walker and several detectors (language, framework, configuration, dependencies, build files, CI, existing documentation) then run without invoking a model.
- Analysis (Phase 2) — A file analyzer calls the fast model in parallel per file; the responses are parsed defensively as JSON. Framework-specific endpoint extractors run deterministically alongside. An importance scorer and an aggregator condense the results into package structures and an API catalog.
- Generation (Phase 3) — A project classifier and six individual document generators call the thinking model for their respective sections. A Mermaid builder produces the architecture diagram. Finally, all documents are bundled into a ZIP archive.
Component and data flow diagram¶
flowchart TB
UI["Gradio interface"]
ORCH["Pipeline orchestration"]
UI --> ORCH
subgraph Sources
SRC1["Local directory"]
SRC2["ZIP upload"]
SRC3["GitLab REST API"]
end
LOADER["Source Loader"]
SRC1 --> LOADER
SRC2 --> LOADER
SRC3 --> LOADER
ORCH --> LOADER
subgraph P1["Phase 1 — Inspection"]
WALK["Directory Walker"]
DETLANG["Language and framework"]
DETDEP["Dependencies"]
DETCFG["Configuration"]
DETBUILD["Build and CI"]
end
LOADER --> WALK
WALK --> DETLANG
WALK --> DETDEP
WALK --> DETCFG
WALK --> DETBUILD
META["ProjectMetadata"]
DETLANG --> META
DETDEP --> META
DETCFG --> META
DETBUILD --> META
subgraph P2["Phase 2 — Code analysis"]
FILEAN["Per-file LLM analysis"]
FRAMEX["Framework extractors"]
SCORE["Importance scorer"]
AGG["Aggregator"]
end
META --> FILEAN
META --> FRAMEX
FILEAN --> SCORE
FRAMEX --> AGG
SCORE --> AGG
LLMFAST["LLM client - fast"]
FILEAN <--> LLMFAST
P2RES["Phase2Result"]
AGG --> P2RES
subgraph P3["Phase 3 — Generation"]
CLASS["Project classifier"]
GEN["Document generators"]
MERM["Mermaid builder"]
REPORT["Generation report"]
end
META --> CLASS
P2RES --> CLASS
META --> GEN
P2RES --> GEN
CLASS --> GEN
GEN --> MERM
LLMSLOW["LLM client - thinking"]
CLASS <--> LLMSLOW
GEN <--> LLMSLOW
OUT["Markdown files"]
GEN --> OUT
MERM --> OUT
REPORT --> OUT
ZIPOUT["ZIP archive"]
OUT --> ZIPOUT
ZIPOUT --> UI
Workflow¶
The user selects a source in the interface, configures both language model endpoints, and starts the run. The source loader provides the project in a working directory. Phase 1 inspects this directory without invoking a model and produces a metadata object. Phase 2 uses this object to first run the framework-specific extractors per language and to apply the fast model to each file in parallel. The detected endpoints are linked to the file analyses, deduplicated, and aggregated into an API catalog. Phase 3 first asks the thinking model for a project characterisation and then produces the individual documents, embedded in a deterministic file structure. The Mermaid builder inserts the architecture diagram into the corresponding document. Finally, a generation report with runtime, model calls, and token usage is produced, and everything is packed into a ZIP archive that is offered for download in the interface.
Role of the language models¶
The application uses two model endpoints with clearly separated roles.
- Fast model (Phase 2) — One call per analysed file, parallelised via a configurable semaphore. The expected response is structured JSON with purpose, public API, and notable properties. This phase produces between a few dozen and a few hundred calls per run.
- Thinking model (Phase 3) — A small number of calls per run, each with a longer response length for prose sections and the Mermaid diagram. The non-standard
enable_thinkingparameter is forwarded viaextra_bodyto the OpenAI-compatible interface.
Both call paths go through the same wrapper, which strips <think> blocks, extracts JSON robustly, retries on failure with exponential backoff, and counts calls and tokens per role.
Concurrency and robustness¶
The pipeline is implemented asynchronously throughout. The file analysis in Phase 2 runs in parallel via asyncio.gather, bounded by a semaphore (default value adjustable in the interface). Model responses are parsed defensively: JSON wrapped in Markdown fences is unwrapped automatically, and embedded JSON is extracted via a regular expression. Errors in individual files cause the affected analysis to be marked with status error without aborting the overall run. Missing or empty model responses are logged; endpoints are deduplicated by the (method, path) tuple.
Configuration and deployment¶
Configuration is provided via environment variables or via the interface; both model endpoints are configured separately (URL, model name, optional API key, thinking mode). Bind address, port, and reverse proxy path prefix are also set via environment variables. The application is operated as a single container (base image python:3.11-slim) and exposes port 7870 by default. Temporary working directories are created per run and cleaned up by the operating system afterwards.
Extensibility¶
A new framework is integrated by adding an extractor class with three required methods (name, language, extract_endpoints), an entry in the extractor registry, and a detection signal in the framework detector. The rest of the pipeline and the interface remain unchanged.
Technology overview¶
- Language and runtime — Python 3.11
- Interface — Gradio (
gr.Blocksinterface, asynchronous event handlers) - Language model integration —
openaiasync client against OpenAI-compatible endpoints;extra_bodyfor non-standard parameters such asenable_thinking - HTTP client —
httpxfor GitLab REST API access (streaming download) - Data formats — YAML (PyYAML) and TOML (
tomllib/tomli) for Phase 1 parsers; JSON for model responses; Markdown as output - Diagrams — Mermaid, embedded as a code block
- Templating — Jinja2 for document building blocks
- Containerisation — Dockerfile based on
python:3.11-slim, exposed port 7870