LLM-supported development of a diagram generator: Insights into code validation

Project context

The AI diagram generator was created as a learning project to explore multi-agent architectures in LLM-supported software development. The primary learning objective was to increase the reliability of LLM-generated syntax from an initial 70% to nearly 100%—specifically for Mermaid diagram code. A secondary practical reason was the need for reliably generated diagrams.

The finished tool comprises 13,000 lines of code (12,400 lines of Python in 27 files) and supports 12 diagram types with over 40 templates. Development took place in short sessions (10-15 minutes) over two weeks, with a total effort of about 6 hours.

Technical implementation

The architecture is based on a multi-agent system with specialized components:

  • ChatAgent: Analyzes user requests and coordinates diagram generation
  • DiagramAgent: Generates Mermaid code based on structured data
  • ValidationAgent: Checks and corrects generated code through iterative LLM-supported repair
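
A minimal sketch of how these responsibilities might be divided in code. All class internals, method names, and the fallback logic here are illustrative assumptions, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class DiagramRequest:
    """Structured data extracted from the user's message."""
    diagram_type: str    # e.g. "flowchart" or "sequence"
    elements: list[str]  # nodes/participants to render

class ChatAgent:
    """Analyzes the user request and coordinates generation."""
    def handle(self, message: str) -> str:
        request = self.parse_intent(message)
        code = DiagramAgent().generate(request)
        return ValidationAgent().validate_and_repair(code)

    def parse_intent(self, message: str) -> DiagramRequest:
        # Illustrative stub: the real tool would call an LLM here.
        return DiagramRequest(diagram_type="flowchart", elements=["A", "B"])

class DiagramAgent:
    """Generates Mermaid code from structured data."""
    def generate(self, request: DiagramRequest) -> str:
        lines = ["graph TD"] + [
            f"    {a} --> {b}"
            for a, b in zip(request.elements, request.elements[1:])
        ]
        return "\n".join(lines)

class ValidationAgent:
    """Checks generated code and repairs it if invalid."""
    FALLBACK = "graph TD\n    A[Start] --> B[End]"

    def validate_and_repair(self, code: str) -> str:
        # Illustrative stub: the real agent renders the diagram and
        # loops through LLM correction attempts before falling back.
        if code.strip().startswith(("graph", "flowchart", "sequenceDiagram")):
            return code
        return self.FALLBACK
```

Keeping the agents behind narrow interfaces like this is what makes each one a small, independently promptable unit during development.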

The Gradio-based UI is divided into four main areas: chat interface, data panel for structured input, live code editor with undo function, and diagram gallery. This modularization arose organically during development and proved conducive to LLM collaboration.

A central technical element is the validation pipeline: Pre-validation → Rendering test → LLM correction (max. 3 attempts). If all attempts fail, the tool falls back to predefined templates.
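The pipeline can be sketched as follows. The check functions are simplified placeholders (the real tool performs an actual Mermaid rendering test and a real LLM call); only the control flow and the three-attempt limit reflect the description above:

```python
MAX_ATTEMPTS = 3  # at most three LLM repair rounds, as in the pipeline

FALLBACK_TEMPLATE = "graph TD\n    A[Start] --> B[End]"

def pre_validate(code: str) -> bool:
    """Cheap static checks before any rendering attempt."""
    stripped = code.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    known_headers = ("graph", "flowchart", "sequenceDiagram", "classDiagram")
    return first_line.startswith(known_headers)

def render_test(code: str) -> bool:
    """Placeholder for an actual Mermaid rendering test."""
    return pre_validate(code) and "-->" in code

def llm_repair(code: str) -> str:
    """Placeholder for the LLM-based correction call."""
    return code  # a real implementation would prompt the LLM here

def validate(code: str) -> str:
    """Pre-validation -> rendering test -> LLM correction -> fallback."""
    for _ in range(MAX_ATTEMPTS):
        if pre_validate(code) and render_test(code):
            return code
        code = llm_repair(code)
    return FALLBACK_TEMPLATE  # fallback to a predefined template
```

The fallback guarantees the user always receives a renderable diagram, even when repair fails.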

Five-phase development process

Development was carried out with local LLMs in a structured five-phase process:

  1. Specification phase (90 min): Intensive architecture discussion with the LLM, pattern selection based on KISS principles
  2. Basic implementation (30 min): Initial implementation of core components
  3. ValidationAgent integration (30-45 min): Addition of automatic code validation
  4. Pattern integration (30-45 min): Extended error correction patterns
  5. Mermaid syntax reference (30-45 min): Complete syntax specification in the system prompt
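
The final phase, embedding a complete syntax reference in the system prompt, might look like the sketch below. The reference excerpt and prompt wording are illustrative assumptions; the real reference covers all 12 supported diagram types in full:

```python
# Illustrative excerpt of a Mermaid syntax reference (assumed content).
MERMAID_SYNTAX_REFERENCE = """\
flowchart: starts with 'graph TD' or 'flowchart LR'.
  Nodes: A[rect], B(round), C{diamond}. Edges: A --> B, A -->|label| B.
sequenceDiagram: declare 'participant X'; messages are 'X->>Y: text'.
"""

def build_system_prompt(diagram_type: str) -> str:
    """Compose a system prompt that embeds the full syntax reference."""
    return (
        "You generate Mermaid diagram code only.\n"
        f"Requested diagram type: {diagram_type}\n"
        "Follow this syntax reference exactly:\n"
        f"{MERMAID_SYNTAX_REFERENCE}"
        "Return only the Mermaid code block, no commentary."
    )
```

Putting the full reference in the prompt trades context-window space for a sharp drop in syntax errors, which matched the project's observation that syntax-intensive tasks benefit from complete references.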

Each phase corresponded to one major iteration. The upfront architecture discussion, followed by step-by-step refinement over four implementation rounds, proved more effective than a big-bang approach.

Key methodological insights

Proven approaches:

  • Multi-agent architectures seem well suited for LLM-assisted development as they create clear responsibilities
  • Syntax-intensive tasks benefit significantly from complete references in the system prompt
  • Iterative validation loops with LLM correction can significantly improve code quality (here: from ~70% to 95% validity)
  • Clear specifications and explicit pattern guidelines lead to more reliable results

Remaining challenges:

  • Subsequent correction of faulty LLM code remains challenging and is not always successful
  • Intent recognition from natural language queries does not yet work reliably enough for this tool
  • The fundamental code quality depends heavily on the reliability of the LLM used

Important limitation: The findings come from a single project and should not be understood as universally valid.

Practical testing

The tool is currently being tested and used occasionally. After four iterations, the success rate is around 95%—most of the generated diagrams are valid. The integrated code editor allows for easy manual adjustments where necessary. However, Mermaid does not cover all diagram requirements, and intent recognition still needs improvement. Findings from this project were incorporated into a follow-up project (“Chart Tool”) that further develops intent processing.

Conclusion

The project shows that systematic validation and syntax integration can significantly increase the reliability of LLM-generated syntax. The multi-agent architecture proved to be a structuring framework for LLM collaboration. At the same time, subsequent code correction remains a key challenge that has not been fully resolved. The tool functions as a learning vehicle and occasional aid, but has not yet reached production maturity for all use cases.