Features¶

CodeDocumentation covers the entire path from source code submission to the downloaded Markdown bundle. Three input sources are available; the subsequent analysis captures languages, frameworks, endpoints, configuration keys, and dependencies, and produces six coherent documents along with an accompanying generation report.

Application scenarios¶

Initial documentation of an existing project — A previously undocumented codebase receives a complete Markdown baseline with architecture overview, API reference, and installation guide, which can then be refined manually.
Standardisation across multiple projects — Several projects receive uniformly structured documentation with the same file layout, the same diagram convention, and the same tone.
Onboarding of new staff members — Architecture and API overviews serve as an entry point into an unfamiliar codebase, without the team having to write a complete introductory documentation from scratch.
Periodic regeneration as a baseline — After significant code changes, the generated version is regenerated and compared with the manually maintained version to identify outdated sections.
Migration of a GitLab codebase — A repository is loaded directly from GitLab without a local checkout, analysed, and converted into a searchable Markdown document set.

At a glance¶

Three input sources: local directory, ZIP upload, GitLab API
Deep support for four frameworks; generic fallback for other Python and PHP codebases
Six output documents plus a generation report, individually selectable
Two separately configurable LLM endpoints with optional thinking mode
Test files excluded by default, with opt-in toggle
Preview of individual documents in the interface, both rendered and as source
ZIP bundle as a single download with a date-stamped filename

Input sources and authentication¶

Three sources are accessed via a shared source loader component. The source is switched in the interface; input fields adapt to the selection.

Local directory — A project that has already been checked out is analysed directly from the file system. Suitable for codebases that are already available on the same system.
ZIP upload — A ZIP file is extracted into a temporary working directory. If the archive contains a single root directory, that directory is automatically used as the project root.
GitLab repository — A project is loaded via the GitLab REST API, both from gitlab.com and from self-hosted instances. Input is provided as a path (group/project) or as a numeric ID; branch or tag are optional. The personal access token with the read_repository scope remains exclusively in the session state, is not persisted, and can be cleared from the interface at any time.

Language model integration¶

CodeDocumentation connects to two OpenAI-compatible endpoints that are configured separately.

Fast model — Applied to individual files in parallel during the analysis phase, typically with thinking mode disabled.
Thinking model — Used in the generation phase for the prose sections and the project characterisation.
Connection test — Both endpoints can be tested individually before a run, in order to detect configuration and authentication errors early.
Thinking extensions — The non-standard enable_thinking parameter is forwarded to the OpenAI interface via extra_body. Endpoints without support for the parameter ignore or reject it; in that case, the mode can be disabled via a checkbox.

Captured structures¶

The analysis distinguishes between deterministic inspection and model-based evaluation.

Languages and frameworks — Languages are weighted by file extension and lines of code; frameworks are detected via dependency files and source code signals.
Endpoints and routes — FastAPI, Flask, Laravel, and Symfony are evaluated by dedicated extractors, including path parameters, methods, and tags. For other codebases, a generic regex-based extractor provides baseline coverage.
Configuration keys — Environment variables from .env files, configuration files (e.g. config/*.yml), and Dockerfile ENV instructions are captured.
Dependencies — composer.json/composer.lock, requirements.txt, pyproject.toml, and Poetry manifests are evaluated.
Build and runtime environment — Dockerfile (base image, ports, entrypoint, env), Docker Compose services, Makefile targets, and CI configurations for GitLab CI and GitHub Actions are parsed.
Per file — Purpose, public classes and functions, notable properties, and an importance score are determined with model support.

Generated documents¶

Each run produces up to six documents plus a generation report.

README — Overview, description, key features, quick start, tech stack, and links to the other documents.
Architecture — Overview, Mermaid component diagram, module structure as a table, external dependencies.
API reference — Endpoint catalog grouped by resource, with method, path, parameters, and schemas where they can be derived from the code.
Configuration — Table of environment variables, list of configuration files, Dockerfile variables.
Installation — Prerequisites, installation steps, Docker variant, CI overview.
Highlights — Notable technology choices, unusual implementations, and a compact set of project metrics.
Generation report — Runtime, model calls, and token usage per model role, detected languages and frameworks, recommendations for manual review.

Individual documents can be deselected before a run.

Quality assurance mechanisms¶

Combination of deterministic and model-based sources — Endpoints from framework-specific extractors and those from model analyses are merged and deduplicated by method and path.
Importance scoring — Each file is assigned a score that drives the ordering in the generated documents and the selection of files used for project characterisation.
Package-level aggregation — File analyses are condensed into package overviews with file count, lines of code, public API, and endpoints per package.
JSON robustness — Model responses are parsed even when wrapped in Markdown code fences or embedded in surrounding text; on failure, an empty result is returned rather than passing on malformed data.
Thinking block handling — <think> sections from thinking models are removed before further processing, including truncated, incomplete blocks.
Retry logic — LLM calls are retried up to three times on failure with exponential backoff; token usage is tracked separately per model role.
Bounded concurrency — The number of concurrent model calls during the analysis phase is bounded by a semaphore and can be adjusted in the interface.
Exclusions and filters — Generic exclusion patterns (e.g. virtual environments, build artefacts, cache directories) and test files are skipped by default; additional glob patterns can be specified. .gitignore entries are evaluated as well.

Additional features¶

Preview in the interface — Generated files can be viewed as rendered Markdown or as source, without downloading the ZIP first.
Date-stamped ZIP bundle — The download contains all enabled documents and the report under a date-stamped filename.
Configurable bind address and port — Bind address, port, and reverse proxy path prefix are set via environment variables.