Features¶
CodeDocumentation covers the entire path from source code submission to the downloaded Markdown bundle. Three input sources are available; the subsequent analysis captures languages, frameworks, endpoints, configuration keys, and dependencies, and produces six coherent documents along with an accompanying generation report.
Application scenarios¶
- Initial documentation of an existing project — A previously undocumented codebase receives a complete Markdown baseline with architecture overview, API reference, and installation guide, which can then be refined manually.
- Standardisation across multiple projects — Several projects receive uniformly structured documentation with the same file layout, the same diagram convention, and the same tone.
- Onboarding of new staff members — Architecture and API overviews serve as an entry point into an unfamiliar codebase, without the team having to write a complete introductory documentation from scratch.
- Periodic regeneration as a baseline — After significant code changes, the generated version is regenerated and compared with the manually maintained version to identify outdated sections.
- Migration of a GitLab codebase — A repository is loaded directly from GitLab without a local checkout, analysed, and converted into a searchable Markdown document set.
At a glance¶
- Three input sources: local directory, ZIP upload, GitLab API
- Deep support for four frameworks; generic fallback for other Python and PHP codebases
- Six output documents plus a generation report, individually selectable
- Two separately configurable LLM endpoints with optional thinking mode
- Test files excluded by default, with opt-in toggle
- Preview of individual documents in the interface, both rendered and as source
- ZIP bundle as a single download with a date-stamped filename
Input sources and authentication¶
Three sources are accessed via a shared source loader component. The source is switched in the interface; input fields adapt to the selection.
- Local directory — A project that has already been checked out is analysed directly from the file system. Suitable for codebases that are already available on the same system.
- ZIP upload — A ZIP file is extracted into a temporary working directory. If the archive contains a single root directory, that directory is automatically used as the project root.
- GitLab repository — A project is loaded via the GitLab REST API, both from gitlab.com and from self-hosted instances. Input is provided as a path (group/project) or as a numeric ID; branch or tag are optional. The personal access token with the
read_repositoryscope remains exclusively in the session state, is not persisted, and can be cleared from the interface at any time.
Language model integration¶
CodeDocumentation connects to two OpenAI-compatible endpoints that are configured separately.
- Fast model — Applied to individual files in parallel during the analysis phase, typically with thinking mode disabled.
- Thinking model — Used in the generation phase for the prose sections and the project characterisation.
- Connection test — Both endpoints can be tested individually before a run, in order to detect configuration and authentication errors early.
- Thinking extensions — The non-standard
enable_thinkingparameter is forwarded to the OpenAI interface viaextra_body. Endpoints without support for the parameter ignore or reject it; in that case, the mode can be disabled via a checkbox.
Captured structures¶
The analysis distinguishes between deterministic inspection and model-based evaluation.
- Languages and frameworks — Languages are weighted by file extension and lines of code; frameworks are detected via dependency files and source code signals.
- Endpoints and routes — FastAPI, Flask, Laravel, and Symfony are evaluated by dedicated extractors, including path parameters, methods, and tags. For other codebases, a generic regex-based extractor provides baseline coverage.
- Configuration keys — Environment variables from
.envfiles, configuration files (e.g.config/*.yml), and DockerfileENVinstructions are captured. - Dependencies —
composer.json/composer.lock,requirements.txt,pyproject.toml, and Poetry manifests are evaluated. - Build and runtime environment — Dockerfile (base image, ports, entrypoint, env), Docker Compose services, Makefile targets, and CI configurations for GitLab CI and GitHub Actions are parsed.
- Per file — Purpose, public classes and functions, notable properties, and an importance score are determined with model support.
Generated documents¶
Each run produces up to six documents plus a generation report.
- README — Overview, description, key features, quick start, tech stack, and links to the other documents.
- Architecture — Overview, Mermaid component diagram, module structure as a table, external dependencies.
- API reference — Endpoint catalog grouped by resource, with method, path, parameters, and schemas where they can be derived from the code.
- Configuration — Table of environment variables, list of configuration files, Dockerfile variables.
- Installation — Prerequisites, installation steps, Docker variant, CI overview.
- Highlights — Notable technology choices, unusual implementations, and a compact set of project metrics.
- Generation report — Runtime, model calls, and token usage per model role, detected languages and frameworks, recommendations for manual review.
Individual documents can be deselected before a run.
Quality assurance mechanisms¶
- Combination of deterministic and model-based sources — Endpoints from framework-specific extractors and those from model analyses are merged and deduplicated by method and path.
- Importance scoring — Each file is assigned a score that drives the ordering in the generated documents and the selection of files used for project characterisation.
- Package-level aggregation — File analyses are condensed into package overviews with file count, lines of code, public API, and endpoints per package.
- JSON robustness — Model responses are parsed even when wrapped in Markdown code fences or embedded in surrounding text; on failure, an empty result is returned rather than passing on malformed data.
- Thinking block handling —
<think>sections from thinking models are removed before further processing, including truncated, incomplete blocks. - Retry logic — LLM calls are retried up to three times on failure with exponential backoff; token usage is tracked separately per model role.
- Bounded concurrency — The number of concurrent model calls during the analysis phase is bounded by a semaphore and can be adjusted in the interface.
- Exclusions and filters — Generic exclusion patterns (e.g. virtual environments, build artefacts, cache directories) and test files are skipped by default; additional glob patterns can be specified.
.gitignoreentries are evaluated as well.
Additional features¶
- Preview in the interface — Generated files can be viewed as rendered Markdown or as source, without downloading the ZIP first.
- Date-stamped ZIP bundle — The download contains all enabled documents and the report under a date-stamped filename.
- Configurable bind address and port — Bind address, port, and reverse proxy path prefix are set via environment variables.