CodeDocumentation
CodeDocumentation generates a complete set of Markdown documents from source code repositories. The output is a first draft that is committed to the project repository and refined manually from there. The application processes Python and PHP projects supplied as a local directory, as a ZIP file, or through the GitLab API. It combines deterministic code analysis with two separate language model stages: a fast model for parallel per-file analysis and a thinking model for the cohesive generation of the documents.
At a glance
- Generate a complete documentation set in a single run, including README, architecture, API reference, configuration and installation guide, and highlights
- Provide the source code as a local directory, an uploaded ZIP archive, or directly from GitLab, without checking out the code locally
- Cover multi-language projects (Python and PHP, individually or combined) in a single coherent document set
- Capture endpoints and routes from FastAPI, Flask, Laravel, and Symfony codebases automatically, with method, path, parameters, and the associated handler
- Connect to two separately configurable language model endpoints for analysis and generation, including endpoints with thinking mode
- Preview generated documents in the interface as rendered Markdown and download them as a single ZIP archive
- Trace each run via an accompanying generation report with runtime, model calls, and token usage
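To illustrate what capturing an endpoint with method, path, parameters, and handler can look like, here is a minimal regex-based sketch for FastAPI-style decorators. It is not taken from CodeDocumentation itself: the names `Route`, `ROUTE_RE`, `PARAM_RE`, and `extract_routes` are hypothetical, and a real extractor would need to handle many more cases (multi-line decorators, prefixes, class-based views).

```python
import re
from dataclasses import dataclass


# Hypothetical record type; the fields mirror the attributes listed above.
@dataclass(frozen=True)
class Route:
    method: str
    path: str
    path_params: tuple
    handler: str


# Matches a decorator such as @app.get("/users/{user_id}") followed by
# the (optionally async) handler definition.
ROUTE_RE = re.compile(
    r'@\w+\.(get|post|put|patch|delete)\(\s*["\']([^"\']+)["\'][^)]*\)'
    r'\s*(?:async\s+)?def\s+(\w+)'
)
# Matches {name} and {name:converter} placeholders inside a path.
PARAM_RE = re.compile(r'\{(\w+)(?::[^}]*)?\}')


def extract_routes(source: str) -> list:
    """Return one Route per decorated handler found in the source text."""
    routes = []
    for match in ROUTE_RE.finditer(source):
        method, path, handler = match.group(1).upper(), match.group(2), match.group(3)
        routes.append(Route(method, path, tuple(PARAM_RE.findall(path)), handler))
    return routes


SAMPLE = '''
@app.get("/users/{user_id}")
async def read_user(user_id: int):
    return {"id": user_id}

@router.post("/items")
def create_item(item: dict):
    return item
'''

for route in extract_routes(SAMPLE):
    print(route.method, route.path, route.path_params, route.handler)
```

A purely textual approach like this stays deterministic and cheap, at the cost of missing routes that are registered dynamically.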
Highlights
In contrast to a direct LLM prompt over a codebase, CodeDocumentation separates deterministic structural analysis, parallelised per-file evaluation, and narrative text generation into a multi-stage pipeline. This separation improves the accuracy of the captured structures, keeps runtime manageable for medium-sized projects, and allows different models to be selected for different tasks.
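The separation described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the application's actual code: the function names, the `Model` type, and the prompt texts are hypothetical, and each model is represented by a plain callable so that the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict

# Hypothetical type: a model is any callable that maps a prompt to text.
Model = Callable[[str], str]

EXTENSIONS = {".py": "Python", ".php": "PHP"}


def inspect_project(files: Dict[str, str]) -> dict:
    """Stage 1: deterministic inspection, no model involved."""
    languages = {
        EXTENSIONS[Path(name).suffix]
        for name in files
        if Path(name).suffix in EXTENSIONS
    }
    return {"languages": sorted(languages), "file_count": len(files)}


def analyse_files(files: Dict[str, str], fast_model: Model, workers: int = 4) -> Dict[str, str]:
    """Stage 2: one fast-model call per file, executed in parallel."""
    names = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = pool.map(
            lambda name: fast_model(f"Summarise {name}:\n{files[name]}"), names
        )
        return dict(zip(names, summaries))


def generate(facts: dict, summaries: Dict[str, str], thinking_model: Model) -> str:
    """Stage 3: a single thinking-model call writes the prose."""
    prompt = f"Facts: {facts}\nSummaries: {summaries}\nWrite the README."
    return thinking_model(prompt)


def run_pipeline(files: Dict[str, str], fast_model: Model, thinking_model: Model) -> str:
    facts = inspect_project(files)  # never depends on model output
    summaries = analyse_files(files, fast_model)
    return generate(facts, summaries, thinking_model)
```

Because the facts from the first stage are computed before any model is called, they can be passed to the later stages as fixed input rather than being re-derived by a model.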
- Three-stage pipeline instead of a monolithic prompt: A deterministic inspection phase captures languages, frameworks, dependencies, and configurations without invoking a model. Only the subsequent phases call language models, so basic factual statements do not depend on model output.
- Two separately configurable language models: A fast model processes file analyses in parallel; a slower model with thinking mode generates the prose sections of the documents. Both endpoints are configured independently, and dividing the work this way reduces overall runtime considerably compared to a thinking-only run.
- Framework-specific extractors: Dedicated extractors for FastAPI, Flask, Laravel, and Symfony produce structured endpoint lists including method, path, path parameters, and handler. Other Python and PHP projects are covered by a generic regex-based extractor.
- Multi-language projects as the standard case: Codebases with mixed stacks such as Flask plus Laravel are handled in a single coherent document set. Languages and frameworks are weighted, and endpoints from multiple sources are merged.
- Three input sources: Local directory, ZIP upload, and GitLab repository via REST API, including the choice of branch or tag.
- Session-bound token handling: GitLab personal access tokens remain exclusively in the interface session state. They are not written to disk, not included in logs, and not embedded in the generated documents.
- Endpoint deduplication and merging: Endpoints originating from deterministic extraction and from model analysis are deduplicated by method and path and grouped in a single API catalog.
- Mermaid diagrams without manual rework: Architecture and component overviews are embedded directly as Mermaid code in the documents and rendered in the preview.
- Extensibility via a documented interface: New frameworks are integrated by adding a new extractor class and a detection entry, without changes to the rest of the pipeline.
- Traceable runs: Each generation run additionally produces a report with runtime, number of model calls per model role, and tokens consumed.
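Deduplication by method and path, as mentioned in the list above, can be illustrated with a short sketch. The `Endpoint` type and `merge_endpoints` function are hypothetical names. Two behaviours shown here are assumptions made for the example rather than documented behaviour: normalising method case and trailing slashes, and letting the deterministically extracted entry win when both sources report the same endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str
    handler: str = ""
    source: str = ""  # hypothetical labels: "extractor" or "model"


def merge_endpoints(extracted, from_model):
    """Merge both sources into one catalog, keyed by (method, path)."""
    catalog = {}
    # Extracted endpoints are inserted first, so they win on conflict
    # (an assumption for this sketch).
    for endpoint in [*extracted, *from_model]:
        key = (endpoint.method.upper(), endpoint.path.rstrip("/") or "/")
        catalog.setdefault(key, endpoint)
    return sorted(catalog.values(), key=lambda e: (e.path, e.method))


extracted = [Endpoint("GET", "/users", "list_users", "extractor")]
from_model = [
    Endpoint("get", "/users/", "", "model"),  # duplicate of the entry above
    Endpoint("POST", "/users", "create_user", "model"),
]

for endpoint in merge_endpoints(extracted, from_model):
    print(endpoint.method, endpoint.path, endpoint.handler, endpoint.source)
```

Keying the catalog on a normalised tuple keeps the merge linear in the number of endpoints and makes the conflict rule explicit in one place.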