LLM-Enabled Development - Developing a Tool with LLMs#

Code Analysis#

The experiment: Developing a code analysis tool for Java projects with realistic complexity.

The result: In 2-3 hours, a tool of 5000-6000 lines of code was created and then used to analyse a project with 1.2 million lines.

The questions#

  • How does the interplay between specification and code generation work as complexity increases?
  • Which architecture patterns are necessary?
  • Where are the practical limits?
  • How do you deal with hallucinations and overengineering tendencies?
  • How can the results of such an analysis be presented in a readable way?

The technical approach#

The tool developed is based on a hierarchical analysis approach that takes into account the limitations of LLM contexts:

  • File level: Detailed analysis of individual files
  • Package level: Aggregation at package level
  • Module level: Module-based analysis
  • System level: Overall architecture
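
The bottom-up aggregation across these levels can be sketched as follows; the data model and function names are illustrative assumptions, not the actual tool's code:

```python
from dataclasses import dataclass

# Hypothetical data model: a summary per file, aggregated upwards
# so that each level's input fits into a single LLM context window.
@dataclass
class FileSummary:
    path: str
    summary: str

@dataclass
class PackageSummary:
    name: str
    files: list  # list[FileSummary]

def aggregate_package(pkg: PackageSummary) -> str:
    """Combine file-level summaries into one compact package-level
    text that can then be fed to the next LLM analysis stage."""
    return "\n".join(f"{f.path}: {f.summary}" for f in pkg.files)

pkg = PackageSummary("com.example.billing",
                     [FileSummary("Invoice.java", "creates invoices"),
                      FileSummary("Tax.java", "computes VAT")])
print(aggregate_package(pkg))
```

The same pattern repeats at the module and system levels: each stage consumes the condensed output of the stage below it, never the raw source.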

Four specialised LLM analyses were performed for each Java file: business logic, technical aspects, interfaces and issue detection. This division emerged iteratively, as a single combined prompt was not sufficiently focused.
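
The four-way split per file might look like this in code; the prompt texts and the injected `ask_llm` callable are hypothetical, not the tool's actual prompts:

```python
# Hypothetical prompt templates for the four per-file analyses.
PROMPTS = {
    "business":   "Summarise the business logic in this Java file:\n{code}",
    "technical":  "Describe technical aspects (frameworks, patterns):\n{code}",
    "interfaces": "List public interfaces and REST/SOAP endpoints:\n{code}",
    "issues":     "Flag potential issues (smells, dead code):\n{code}",
}

def analyse_file(code: str, ask_llm) -> dict:
    """Run all four focused analyses on one file. `ask_llm` is an
    injected callable (prompt -> answer), keeping the backend swappable."""
    return {name: ask_llm(tpl.format(code=code))
            for name, tpl in PROMPTS.items()}

# Stub backend for demonstration: echo the first prompt line.
result = analyse_file("class Foo {}", lambda p: p.splitlines()[0])
print(sorted(result))
```

Keeping each prompt narrowly scoped is the point of the split: one broad prompt produced unfocused answers, four focused ones did not.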

The technical implementation used Python, Gradio and Mistral Small 2506 as a local LLM. A local model was necessary to execute the large number of LLM queries without running into API limits. The final tool comprises 5000-6000 lines of code in 6 Python files.
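
As a rough sketch of how such a tool can talk to a local model: many local serving stacks expose an OpenAI-compatible chat endpoint. The URL, model name and parameters below are placeholders, not the configuration actually used in the experiment:

```python
import json
import urllib.request

# Placeholder endpoint; assumes a local server with an
# OpenAI-compatible /v1/chat/completions API.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str, model: str = "mistral-small") -> dict:
    """Build the request body for one analysis query."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2}

def ask_llm(prompt: str) -> str:
    """Send one prompt to the local model and return its answer."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(build_payload("Summarise this file")["model"])
```

Because the model runs locally, thousands of such calls cost nothing beyond compute time, which is what made the whole-project analysis feasible.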

The development process#

Specification phase (1 hour): Requirements and constraints were developed in close collaboration with various LLMs. The dialogue was deliberately designed so that the LLM could ask questions and request clarifications. A key aspect was actively avoiding overengineering (KISS principle: Keep It Small and Simple), as LLMs tend to suggest rather complex solutions.

Implementation phase (2 hours): Code generation took place in 4-5 iterations without subsequent refactoring. The workflow followed a clear pattern: functional requirements → UI structure → technology selection → architecture. Technical framework conditions were partly specified and partly developed through discussion.

Key findings#

Specification as a success factor: LLM-supported development works via specifications. The clearer, more consistent and more complete the specification, the better the generated code. This clarity can be achieved in dialogue with the LLM, as the LLM can uncover gaps by asking specific questions.

Observed limitations: LLMs reliably generated code up to about 1000-1500 lines per component. Beyond that, the quality declined. The process also required a thorough understanding of software architecture – not to code yourself, but to specify realistic and controllable solutions.

Overengineering problem: LLMs often overestimate their development capabilities and propose overly complex solutions. Consistent application of the KISS principle (Keep It Small and Simple) proved helpful.

Changed workflow: The role of developers is shifting: requirement clarification and architecture decisions are becoming core skills, while implementation details can be delegated. This also enables subject matter experts without in-depth programming knowledge to develop specialised tools for their domain.

Technical validation#

The tool was applied to a real Java project with approximately 1.2 million lines of code (including XML and XSD). The analysis worked:

  • Code ran completely without crashes
  • Project structure with architectural style was recognised
  • Package hierarchy with layer assignment was generated
  • REST and SOAP interfaces were catalogued
  • Issues were categorised

The LLM-generated descriptions were linguistically readable, and the recognised architectural patterns appeared plausible. Detailed technical validation was not the goal – the focus was on methodological insights, not productive software.

Conclusion: Insights from the project#

The experiment exceeded expectations: a tool developed in three hours delivered practical results. This demonstrates the potential of LLM-based development for exploratory development and rapid prototyping.

Important clarification: In professional software development, a critical production tool would not be developed in three hours – that is precisely why this was an experiment. The question was: how far can you actually get? The answer: further than expected, but with clear limits.

The approach is suitable for prototypes, analysis tools, internal utilities, feasibility studies and exploratory development, but not so much for production systems and safety-critical components.

A sound understanding of software architecture remains a prerequisite – not for coding yourself, but for specifying realistic solutions. LLM-supported development expands the possibilities for subject matter experts without in-depth programming knowledge, but it does not replace professional software development for critical systems; rather, it complements it with a new dimension of exploration.

Technical details#

  • Technology stack: Python, Gradio, Mistral Small 2506 (local)
  • Architecture: Asynchronous processing with semaphore-based rate limiting
  • Data storage: YAML-based storage for further LLM processing
  • Time required: 3 hours (1 hour specification + 2 hours implementation), spread over several days
  • Code size: 5000-6000 LOC in 6 files
  • Project analysed: 1.2 million LOC
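
The semaphore-based rate limiting listed above can be sketched with asyncio; here `analyse` stands in for the real per-file LLM call, and `MAX_CONCURRENT` is a placeholder value:

```python
import asyncio

# Placeholder: cap on simultaneous in-flight LLM requests.
MAX_CONCURRENT = 4

async def analyse(path: str) -> str:
    """Stand-in for one per-file LLM analysis call."""
    await asyncio.sleep(0)
    return f"{path}: ok"

async def run_all(paths):
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(path):
        async with sem:  # at most MAX_CONCURRENT calls run at once
            return await analyse(path)

    # gather preserves input order, so results align with paths
    return await asyncio.gather(*(bounded(p) for p in paths))

results = asyncio.run(run_all([f"File{i}.java" for i in range(3)]))
print(results)
```

The semaphore keeps the local model from being flooded while still overlapping requests, which matters when a 1.2-million-line project translates into thousands of individual analysis calls.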