280 lines of code, several hundred uses per day: Getting started with LLM coding#
What happens when you tackle a practical problem with LLM-based coding for the first time? An experiment almost a year ago to develop a document summarisation tool marks the start of a methodical learning journey, from simple tools to increasingly complex LLM-supported programmes.
The starting point: first experiment, concrete need#
The practical need was clear: quick summaries of arXiv papers, tailored to different questions, even for large documents. At the same time, this was the first conscious attempt to systematically explore LLM-assisted coding. Three key challenges needed to be understood: dealing with context constraints, avoiding hallucinations, and enforcing adherence to prompts.
This first experiment is the beginning of a series documenting the path from simple tools to increasingly complex programmes. The basic principles developed here formed the foundation for all subsequent projects.
The technical solution#
The resulting tool processes PDF, DOCX, and ODT files using a two-stage chunking strategy: documents are broken down into 10,000-character segments, summarised individually, and then synthesised into a meta-summary. The implementation comprises 280 lines of Python code with Gradio as the UI framework and Mistral Small 2506 (32K context) as the backend LLM.
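The two-stage strategy can be sketched in a few lines of pure Python. This is an illustrative reconstruction, not the tool's actual code: `summarise` is a placeholder for the call to the backend LLM, and all names are hypothetical.

```python
# Sketch of the two-stage chunking strategy described above.
# `summarise` stands in for the LLM call (Mistral Small 2506 in the tool).

CHUNK_SIZE = 10_000  # characters per segment, as in the article


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split a document into fixed-size character segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarise(text: str, instruction: str) -> str:
    """Placeholder for the backend LLM call; returns a stub here."""
    return text[:100]


def two_stage_summary(document: str) -> str:
    """Stage 1: summarise each chunk. Stage 2: synthesise a meta-summary."""
    chunk_summaries = [
        summarise(chunk, "Summarise this section.")
        for chunk in chunk_text(document)
    ]
    return summarise(
        "\n\n".join(chunk_summaries),
        "Synthesise these partial summaries into one summary.",
    )
```

The appeal of this design is that each LLM call stays comfortably inside the 32K context window, regardless of document length.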
A critical success factor was pymupdf4llm, which converts PDF content directly into LLM-friendly Markdown. In addition, the tool explicitly extracts structural elements such as headings, tables and captions with tags, which significantly improves the identification of key content.
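The tagging step might look roughly like the following sketch. The conversion call is pymupdf4llm's documented `to_markdown` API; the tag names and the annotation logic are illustrative assumptions, not the tool's actual implementation.

```python
import re

# Illustrative sketch: annotate structural elements in the Markdown
# produced by pymupdf4llm so the LLM can identify them more easily.
# In the real tool the Markdown would come from:
#   import pymupdf4llm
#   md = pymupdf4llm.to_markdown("paper.pdf")


def tag_structure(markdown: str) -> str:
    """Prefix headings and table rows with explicit tags (names hypothetical)."""
    lines = []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):
            lines.append(f"[HEADING] {line.lstrip('#').strip()}")
        elif line.startswith("|"):
            lines.append(f"[TABLE ROW] {line}")
        else:
            lines.append(line)
    return "\n".join(lines)
```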
Development process and iterations#
Development took place over several weeks, with a total of 5-6 hours of work. Noteworthy was the short specification phase of only 30 minutes, followed by iterative co-development with various LLMs. The architecture required 4-5 main iterations, primarily to optimise the chunking strategy.
Overengineering was avoided through incremental feature addition: each feature was added separately instead of specifying a complex overall architecture up front.
Key methodological insights#
Architecture ownership remained human: the LLMs' own architecture proposals proved of limited use. In this project, coding with LLMs worked well in small, narrowly defined steps, while strategic architecture decisions required human expertise.
Prompt engineering for production environments: Two techniques proved particularly effective:
- Capitalisation in prompts significantly reinforced critical instructions, especially when the prompt-to-content ratio was unfavourable.
- Explicit fallback instructions with double reinforcement: ‘If there is not enough information and nothing can be described, do not write anything and do not add anything.’
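Both techniques can be combined in a single prompt template. The wording below is a hypothetical reconstruction (only the quoted fallback sentence comes from the source); it shows the capitalised critical instructions and the doubly reinforced ‘say nothing’ clause.

```python
# Hypothetical prompt template illustrating the two techniques:
# capitalised critical instructions and a double-reinforced fallback.

CHUNK_PROMPT = """\
Summarise the following document section.

IMPORTANT: Use ONLY information contained in the text below.
DO NOT add external knowledge or assumptions.

If there is not enough information and nothing can be described,
do not write anything and do not add anything.

--- TEXT ---
{chunk}
"""


def build_prompt(chunk: str) -> str:
    """Embed a document chunk into the instruction template."""
    return CHUNK_PROMPT.format(chunk=chunk)
```

Capitalisation matters most when a short instruction has to compete with 10,000 characters of document text for the model's attention.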
The chunking strategy: The two-stage summary (chunk → meta) worked surprisingly well, despite its simplicity. Chunk size proved to be a critical parameter: reducing it improved quality but increased processing time (sometimes several minutes for large documents).
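The trade-off is easy to quantify with a back-of-the-envelope model: halving the chunk size roughly doubles the number of LLM calls and hence the runtime. The per-call latency below is an assumed figure for illustration, not a measurement from the tool.

```python
import math

# Rough runtime model for the two-stage strategy: one LLM call per
# chunk, plus one call for the meta-summary. Latency is an assumption.


def estimated_runtime(doc_chars: int, chunk_size: int,
                      seconds_per_call: float = 5.0) -> float:
    """Estimate total processing time in seconds."""
    n_chunks = math.ceil(doc_chars / chunk_size)
    return (n_chunks + 1) * seconds_per_call  # +1 for the meta-summary
```

For a 100,000-character document, 10,000-character chunks mean 11 calls; 5,000-character chunks mean 21, which is consistent with the multi-minute runtimes observed for large documents.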
A key workflow change: once it became apparent that leaving specification and architecture decisions to the iterative process was less effective, more time was invested in defining them up front.
Possibilities and practical use#
The tool is used several hundred times a day within the VPN. The feedback is positive, but not excellent. Challenges arise with:
- Graphically complex PDFs where text extraction reaches its limits
- The balance between quality (smaller chunks) and performance (longer processing time)
- The fundamental LLM limitations: context restrictions, occasional hallucinations, imperfect adherence to prompts
The surprising simplicity#
Most surprising is that such a compact codebase (280 lines, 10 functions, 7 core libraries) generates high-quality, helpful summaries with six different analysis modes, ranging from short versions to critical reflections. The solution is far simpler than initially expected, yet functionally sufficient for intensive productive use.
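The analysis modes are naturally expressed as a mapping from mode name to instruction. The source only states that six modes exist, from short versions to critical reflections; the names and wording below are entirely hypothetical.

```python
# Hypothetical sketch of the six analysis modes as instruction templates.
# Mode names and wording are illustrative, not the tool's actual modes.

ANALYSIS_MODES = {
    "short": "Summarise the key points in a few sentences.",
    "detailed": "Write a detailed, structured summary.",
    "methods": "Describe the methods used in the document.",
    "results": "Summarise the main results and findings.",
    "open_questions": "List open questions the document raises.",
    "critical": "Write a critical reflection on the document's claims.",
}


def build_instruction(mode: str) -> str:
    """Look up the instruction for a mode, falling back to 'short'."""
    return ANALYSIS_MODES.get(mode, ANALYSIS_MODES["short"])
```

Structuring modes this way keeps the chunking pipeline identical across modes; only the instruction text changes.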
Historical classification and series context#
This tool was created over a year ago as the first systematic LLM coding experiment. Since then, all the technologies involved have evolved considerably: LLMs with larger contexts, better libraries, more sophisticated prompt engineering techniques. The findings documented here should therefore be understood in their historical context.
However, within the scope of this series of articles, this project marks an important starting point: from here, the path to increasingly complex programmes and refined methods developed. The fundamental principles – architecture ownership, simplicity, incremental approach – remained relevant in later, more complex projects.
Observations on the approach#
This experiment shows that LLM-assisted coding can be highly efficient in clearly defined contexts. In this specific case, the following aspects proved helpful:
- Planning the architecture ourselves and using LLMs for narrowly defined implementation steps
- Choosing libraries that already take LLM integration into account (such as pymupdf4llm)
- Understanding prompt engineering as a critical part of the system architecture
- Prioritising simplicity over premature optimisation
A key observation from this first experiment: LLM coding enabled the rapid development of a practical tool, with responsibility for strategic technical decisions remaining with humans and work being carried out iteratively in small steps. These basic principles formed the foundation for all subsequent, more complex projects in the series. Further experiments will show whether these observations can be transferred to other contexts.
Part 1 of the series ‘Methodological insights from LLM coding experiments’ – documenting a learning journey from simple tools to complex programmes. This first experiment took place almost a year ago; since then, the technologies have evolved considerably, but the basic methodological principles remain relevant.