LLM-supported data visualisation: code generation as a solution approach#
Context and motivation#
Large language models (LLMs) have known weaknesses when dealing with tabular data and numerical calculations. This experiment investigated an alternative approach: instead of letting the LLM work directly with numbers, it generates Python code for data analysis and visualisation, which is executed in a secure environment.
The functional question was pragmatic: can a tool quickly and easily help users choose suitable chart types for their data and create simple visualisations largely automatically? The project also served as a learning vehicle for exploring the feasibility of a multi-agent system built on code generation.
The tool: Interactive Chart Generator#
The developed chart generator processes CSV and Excel files (including multi-sheet support) and enables the creation of interactive Plotly visualisations via natural language chat requests. For example, a user can enter: ‘Create bar charts for all sheets’ or ‘Colour the bars green for positive values’.
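Multi-sheet support can be handled almost entirely by pandas: read_excel with sheet_name=None returns one DataFrame per sheet. A minimal loader sketch, assuming the tool dispatches on file extension (the function name load_tables is illustrative, not from the actual codebase):

```python
from pathlib import Path

import pandas as pd


def load_tables(path: str) -> dict:
    """Load a CSV or Excel file into a dict of sheet name -> DataFrame."""
    p = Path(path)
    if p.suffix.lower() in (".xlsx", ".xls"):
        # sheet_name=None reads every sheet and returns a dict of DataFrames
        return pd.read_excel(p, sheet_name=None)
    # CSV files have no sheets; use the file stem as the single table name
    return {p.stem: pd.read_csv(p)}
```

Downstream code can then treat single-table CSVs and multi-sheet workbooks uniformly as a mapping of names to DataFrames.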
The system is based on a web interface (Gradio) and uses local LLMs via an OpenAI-compatible API. The architecture comprises ~8,500 lines of code in 27 Python files and was developed entirely with LLM support.
Technical architecture#
The system architecture developed iteratively in three main stages:
Version 1: A chat agent and a chart agent worked directly together. This approach proved to be too unstable – the system could not reliably distinguish between discussion and execution requests.
Version 2: Introduction of intent recognition. The IntentService analyses user requests and classifies them (modification, single chart, multiple charts, analysis). This significantly improved accuracy.
Version 3 (Final): A three-step workflow: Intent → Plan → Execute. The PlanService translates recognised intents into detailed execution plans with concrete chart specifications. The ExecutionService coordinates execution with retry logic and error correction.
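The three-step workflow can be sketched as a small pipeline. This is an illustrative reconstruction, not the actual IntentService/PlanService/ExecutionService code; the classification rules and data shapes here are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    kind: str          # "modification" | "single_chart" | "multi_chart" | "analysis"
    raw_request: str


@dataclass
class Plan:
    chart_specs: list = field(default_factory=list)  # concrete specs to execute


def recognise_intent(request: str) -> Intent:
    """Step 1: classify the user request (toy keyword rules for illustration)."""
    text = request.lower()
    if "all sheets" in text or "for each" in text:
        return Intent("multi_chart", request)
    if any(word in text for word in ("colour", "color", "change", "make")):
        return Intent("modification", request)
    return Intent("single_chart", request)


def build_plan(intent: Intent, sheet_names: list) -> Plan:
    """Step 2: translate the intent into concrete per-sheet chart specs."""
    targets = sheet_names if intent.kind == "multi_chart" else sheet_names[:1]
    return Plan(chart_specs=[{"sheet": s, "action": intent.kind} for s in targets])


def execute_plan(plan: Plan, render) -> list:
    """Step 3: execute each spec; retry logic and error correction omitted here."""
    return [render(spec) for spec in plan.chart_specs]
```

The point of the split is that each stage can be tested in isolation and each stage fails in a diagnosable way, which matches the testability and error-handling benefits described below.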
Code execution takes place in a controlled environment with restricted built-ins (notably without __import__) and predefined safe globals for pandas, Plotly and NumPy.
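The core of such a restricted environment is an allowlist of built-ins passed to exec. A minimal sketch of the idea (the allowlist contents are assumptions; a production sandbox needs more than this):

```python
import builtins

# Allowlist of harmless built-ins; deliberately no __import__, open, eval, ...
ALLOWED = [
    "len", "range", "min", "max", "sum", "abs", "round", "sorted",
    "list", "dict", "tuple", "set", "enumerate", "zip", "print",
]
SAFE_BUILTINS = {name: getattr(builtins, name) for name in ALLOWED}


def run_generated_code(code: str, safe_globals: dict) -> dict:
    """Execute generated code with restricted built-ins and return its namespace."""
    env = {"__builtins__": SAFE_BUILTINS, **safe_globals}
    exec(code, env)  # without __import__, any `import` statement raises ImportError
    return env
```

In the real system, safe_globals would carry the preloaded pandas/Plotly/NumPy modules and the user's DataFrames, so generated code can use them without needing import statements at all.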
Critical development decision: pattern libraries#
A key insight was that pure LLM code generation for chart creation did not work reliably enough. The solution: integration of extensive pattern libraries with 44 code patterns.
These include:
- 23 working implementation patterns (e.g. PATTERN_SIMPLE_BAR, PATTERN_TIME_SERIES_GROUPED)
- 7 anti-patterns to avoid unstable constructs (e.g. ANTI_PATTERN_COMPLEX_CATEGORICAL)
- 9 modification patterns for chart adjustments
- 5 semantic and selection patterns
The LLM selects and adapts these patterns based on data types and user queries. This hybrid approach (LLM intelligence + templates) proved to be significantly more robust than pure generation.
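The hybrid approach can be illustrated as a template lookup: the LLM only has to pick a pattern name and column names, while the code itself comes from a vetted template. The pattern names below are from the library; the template bodies and selection rule are simplified assumptions:

```python
# Vetted code templates; the LLM chooses a name, deterministic code fills it in.
PATTERNS = {
    "PATTERN_SIMPLE_BAR":
        "fig = px.bar(df, x={x!r}, y={y!r})",
    "PATTERN_TIME_SERIES_GROUPED":
        "fig = px.line(df, x={x!r}, y={y!r}, color={group!r})",
}


def select_pattern(x_is_temporal: bool, has_group: bool) -> str:
    """Toy stand-in for the LLM's pattern choice, driven by data types."""
    if x_is_temporal and has_group:
        return "PATTERN_TIME_SERIES_GROUPED"
    return "PATTERN_SIMPLE_BAR"


def instantiate(name: str, **columns) -> str:
    """Fill the chosen template with concrete column names."""
    return PATTERNS[name].format(**columns)
```

Because the structural code is fixed, the LLM can only get the *parameters* wrong, not the Plotly call structure, which is what makes this more robust than free-form generation.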
Methodological findings#
1. Specification-driven development#
The development process was highly specification-driven: each iteration stage began with a comprehensive specification (~1 hour), which was then implemented (~1 hour) and refined. Total development time was approximately 3 hours, plus 30 minutes for Docker deployment.
High-quality, detailed specifications proved to be essential. Small-scale micro-prompting led to fragmented, inconsistent development. The specifications included: technical architecture, UI design, component interactions, and functional goals.
2. Intent→Plan→Execute as a proven pattern#
The separation of intent recognition, planning strategy, and execution enabled:
- Clearer distinction between chat discussion and action requests
- Better testability of individual components
- More targeted error handling at each stage
3. LLM control vs. heuristics#
Despite imperfect reliability, LLM-controlled processing proved superior to rule-based heuristics. Heuristics only work for very specific cases; LLMs generalise better across different formulations.
4. Retry logic and self-correction#
The implemented fix_code() method enables LLM-based error correction: when generated code fails, the error message is fed back to the LLM, which produces a corrected version. This moderately reduced the error rate but did not achieve 100% reliability.
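The retry loop around fix_code() can be sketched generically. The function below is an illustration of the mechanism, not the actual implementation; run and fix_code stand in for the sandboxed executor and the LLM correction call:

```python
def execute_with_retry(code: str, run, fix_code, max_attempts: int = 3):
    """Run generated code; on failure, feed the error back for a corrected version."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return run(code)
        except Exception as exc:
            last_error = exc
            # In the real system this is an LLM call with the traceback as context
            code = fix_code(code, str(exc))
    raise RuntimeError(f"still failing after {max_attempts} attempts: {last_error}")
```

Capping the attempts matters: an LLM that cannot fix the error tends to loop on the same mistake, so the system needs a defined failure path rather than unbounded retries.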
5. Examples compensate for model weaknesses#
The local LLMs used (HU models) handle large contexts well but show weaknesses in code generation. The extensive pattern libraries successfully compensated for this and significantly improved performance.
Challenges and limitations#
Multi-sheet handling: Distinguishing between requests for single vs. multiple graphics was initially problematic. Intent recognition with explicit trigger words (‘all sheets’, ‘for each’) solved this.
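The trigger-word check is simple enough to sketch directly; the trigger list below mirrors the examples in the text plus two assumed variants:

```python
# Phrases that signal "one chart per sheet" rather than a single chart.
# The first two come from the text; the rest are assumed variants.
MULTI_SHEET_TRIGGERS = ("all sheets", "for each", "every sheet", "per sheet")


def wants_all_sheets(request: str) -> bool:
    """Return True if the request should fan out across all sheets."""
    text = request.lower()
    return any(trigger in text for trigger in MULTI_SHEET_TRIGGERS)
```

Explicit triggers like these give the intent recogniser a deterministic fast path, so the LLM only has to resolve the genuinely ambiguous requests.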
Semantic colour mapping: Natural language colour specifications such as ‘green for positive values’ required additional logic (SemanticColorHelper) to recognise ordinal scales (Likert scales) and category-based colour mappings.
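The two mapping cases can be sketched as follows, in the spirit of the SemanticColorHelper; the function names, Likert labels, and hex values here are illustrative assumptions:

```python
# Assumed ordinal order and a diverging red -> green scale for Likert data
LIKERT_ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
LIKERT_SCALE = ["#d73027", "#fc8d59", "#ffffbf", "#91cf60", "#1a9850"]


def sign_colors(values, positive="green", negative="red"):
    """'green for positive values' -> one colour per data point, by sign."""
    return [positive if v >= 0 else negative for v in values]


def likert_colors(categories):
    """Map ordinal Likert labels onto the diverging scale by their rank."""
    return [LIKERT_SCALE[LIKERT_ORDER.index(c.lower())] for c in categories]
```

The colour list produced either way can be passed straight to Plotly as a per-point marker colour, which keeps the mapping logic out of the generated chart code.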
Reliability: It remains difficult to implement solutions that work 100% of the time. The current implementation works reliably in the majority of cases, but not universally.
Status and outlook#
The tool is currently in the testing phase with a wider user base. Initial tests with various Excel files (including complex multi-sheet structures) were successful. Further feedback is needed to identify edge cases and further improve robustness.
The key transferable insights are: multi-agent architectures with clear phase separation work well for complex tasks; specification-driven development is superior to micro-iterative approaches; pattern libraries significantly stabilise code generation; and code indirection is a promising approach for LLM limitations in numerical tasks.