Exploration of an LLM chat interface: findings from the specification phase

Experiences in developing a chat interface for local large language models.

The experiment

The project aimed to develop a simple chat interface that offers a few convenience features without persisting data. The central question was: what value-added features can be implemented in such an interface? The focus was on exploring a history feature with automatic conversation titling, a feature that turned out to be an instructive learning exercise.

The technical basis was Gradio and an OpenAI-compatible API, technologies we already had experience with. The architecture comprised three main classes (ChatSession, StreamingChat, ChatInterface) and was deliberately kept modular to ease later adjustments.
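To illustrate the division of responsibilities, here is a minimal sketch of how the three classes might be laid out. Only the three class names come from the project; all method names, fields, and the fake-client wiring are assumptions, and the OpenAI-style streaming call is one plausible way to talk to a local server.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Holds one conversation in memory; nothing is persisted."""
    title: str = "New conversation"
    messages: list = field(default_factory=list)  # OpenAI-style {"role", "content"} dicts

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


class StreamingChat:
    """Wraps an OpenAI-compatible client and streams completions."""

    def __init__(self, client, model: str):
        self.client = client  # e.g. openai.OpenAI(base_url=...) pointed at a local server
        self.model = model

    def stream(self, session: ChatSession):
        # stream=True yields chunks with incremental deltas; we re-assemble
        # them into the growing reply so the UI can update progressively.
        response = self.client.chat.completions.create(
            model=self.model, messages=session.messages, stream=True
        )
        partial = ""
        for chunk in response:
            partial += chunk.choices[0].delta.content or ""
            yield partial


class ChatInterface:
    """Builds the Gradio UI and wires events to the other two classes."""

    def __init__(self, chat: StreamingChat):
        self.chat = chat
        self.sessions = []  # in-memory history only, discarded on exit
```

Keeping the model client behind StreamingChat means the UI class never touches the API directly, which is what makes later adjustments cheap.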

The role of the specification

Approximately 45 minutes were invested in a detailed specification that clearly aligned the desired functionality with its technical implementation. This investment proved valuable: after the complete specification was handed to the LLM, the implementation proceeded largely without detours and also took about 45 minutes.

The development required only two iterations: an initial implementation and a second pass to fix one specific problem and improve the history function. This efficiency appears directly related to the quality of the specification.

Technical challenges

Automatic title generation for conversations proved to be a functional necessity: without meaningful titles, entries in the history could not be told apart. The solution, letting the LLM generate a title for each new conversation at runtime, worked well in practice and demonstrated an approach to nested LLM calls.

An unexpected challenge arose from a Gradio-specific problem: a visual 'flashing' when sending messages, a bug that was difficult to pin down and poorly covered in the documentation. The solution required a two-step process with state management that clears the input area before the actual message processing begins.
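The pattern can be sketched with two plain handler functions; the component names (`msg`, `chatbot`, `pending`) and the echo placeholder are assumptions, while the `.then()` chaining shown in the comment is standard Gradio event wiring.

```python
# Step 1: runs first, only clears the UI, and stashes the text in state.
def queue_message(user_text, history):
    history = history + [(user_text, None)]  # show the user turn immediately
    return "", history, user_text            # "" empties the input textbox

# Step 2: runs afterwards and performs the actual (slow) model call.
def answer(stashed_text, history):
    reply = f"Echo: {stashed_text}"          # placeholder for the LLM call
    history[-1] = (history[-1][0], reply)
    return history

# Assumed Gradio wiring: chaining with .then() ensures the textbox is
# cleared and re-rendered before the slow handler starts, which is what
# removes the visual flashing:
#
#   msg.submit(queue_message, [msg, chatbot], [msg, chatbot, pending]) \
#      .then(answer, [pending, chatbot], [chatbot])
```

The state component (`pending`) is needed because step 1 has already emptied the textbox by the time step 2 runs, so the text must travel through state rather than through the input field.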

The result

The finished interface comprises approximately 800 lines of Python code in two files and offers eleven main features, from streaming responses and several system-prompt presets to a template system for frequent queries. The total development time of 90 minutes over two days seems efficient relative to the feature set.

The tool is currently in a broader testing phase and, according to initial observations, runs stably. If the tests are successful, production use will be possible.

Transferable observations

Some insights from this project could be relevant for similar projects:

Regarding the specification phase: clear alignment between the desired functionality and its technical implementation in the specification appears to directly influence the implementation process. The 45 minutes invested resulted in an implementation that required little rework.

On overengineering: the tendency of LLMs to suggest more complex solutions than necessary can possibly be curbed by precise specifications. In this project, no situation required deliberate simplification.

On modularisation: moving the prompt configuration into a separate file proved maintenance-friendly and eased later adjustments.
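A minimal sketch of that separation, assuming a JSON file as the external format: the filename, the preset names, and the merge-with-defaults behaviour are all illustrative choices, not the project's actual layout.

```python
import json
from pathlib import Path

# Hypothetical layout: presets live in prompts.json next to the app,
# so they can be edited without touching the Python code.
DEFAULT_PRESETS = {
    "default": "You are a helpful assistant.",
    "reviewer": "You review code and point out bugs concisely.",
}


def load_presets(path: str = "prompts.json") -> dict:
    """Read system-prompt presets from a JSON file, with safe fallbacks."""
    file = Path(path)
    if not file.exists():
        return dict(DEFAULT_PRESETS)
    presets = json.loads(file.read_text(encoding="utf-8"))
    # Merge so built-in presets still appear when the user file is partial.
    return {**DEFAULT_PRESETS, **presets}
```

Falling back to built-in defaults keeps the app usable even when the external file is missing, which is one reason this split stays maintenance-friendly.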

Classification and outlook

This project represents a data point in the ongoing exploration of LLM-assisted coding. The question of how much investment in the specification phase is worthwhile for different types of projects remains an interesting one.


This article is part of a series on methodological insights from LLM-supported development projects. The focus is on reproducible observations for the professional community, not on promoting individual tools.