Development of an LLM Chat Interface: Insights from an Exploratory Project#
Introduction and Context#
This article documents observations and insights from the development of a chat interface for local large language models. The focus was less on the specific functions of the developed tool and more on transferable insights about the development process itself.
The documentation is aimed at people who are carrying out similar projects or want to integrate LLM-supported coding into their workflows. The focus is on methodological observations, technical details and a reflection on the process.
Initial situation and objectives#
Motivation for the experiment#
The initial question for this project was: What convenience features can be implemented in a chat interface that does not persistently store data? This question offered several interesting aspects for a learning project:
- Technically manageable complexity with relevant challenges
- Clear functional requirements without extensive domain logic
- Potential for real-world reuse in a university context
- Opportunity to explore different architectural approaches
From the outset, the project was designed as an experiment, with subsequent productive use as a possible but not mandatory option.
Choice of technology and prior knowledge#
The decision to use Gradio as the front-end framework and an OpenAI-compatible API as the back-end interface was based on existing experience with both technologies. This choice allowed us to focus on the conceptual and methodological aspects of LLM-supported development instead of spending time familiarising ourselves with new frameworks.
Gradio offered specific advantages for the learning objective:
- Rapid prototyping capabilities
- Built-in components for chat interfaces
- Relatively low abstraction, which makes it easier to understand the underlying mechanisms
The OpenAI-compatible API specification allowed flexibility in the choice of the actual LLM backend used.
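Because the backend only needs to speak the OpenAI-compatible chat completions protocol, switching LLM servers is largely a matter of changing the base URL. A minimal sketch of how such a request is assembled (the URL and model name below are illustrative assumptions, not taken from the project):

```python
# Sketch: an OpenAI-compatible backend is addressed via a base URL plus the
# standard /chat/completions path; the server behind it is interchangeable.

def build_chat_request(base_url: str, model: str,
                       messages: list, stream: bool = True):
    """Assemble the endpoint URL and JSON payload for a chat completion call."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    payload = {"model": model, "messages": messages, "stream": stream}
    return url, payload

url, payload = build_chat_request(
    "http://localhost:8000/v1",   # e.g. a local vLLM or Ollama endpoint
    "local-model",
    [{"role": "user", "content": "Hello"}],
)
```

Only the base URL changes when the backend is swapped; the payload shape stays the same.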
Technical architecture and implementation#
Structural design#
The application was structured into three main classes, each with specific responsibilities:
ChatSession: Manages a single conversation with its messages, a title and metadata. This class encapsulates the data structure of a session and provides methods for formatting for various purposes (API calls, chatbot display).
StreamingChat: Handles technical communication with the LLM backend. This is where the actual streaming of responses takes place, including error handling and the ability to stop generation. An interesting feature: this class also performs automatic title generation – a nested LLM call within the chat logic.
ChatInterface: Orchestrates the entire system and connects the UI components with the business logic. This is where session management is coordinated, the sidebar is controlled, and the various user interactions are processed.
This division was planned during the specification phase and proved to be viable for the entire development. There was no need for fundamental architectural changes during implementation.
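The three-class split can be sketched roughly as follows; class names come from the article, while the method names and fields are illustrative assumptions:

```python
import uuid

class ChatSession:
    """One conversation: messages, title and metadata (no persistence)."""
    def __init__(self, title: str = "New conversation"):
        self.id = str(uuid.uuid4())   # unique in-memory identifier
        self.title = title
        self.messages = []            # [{"role": ..., "content": ...}]

    def as_api_messages(self, system_prompt: str) -> list:
        """Format the session for an OpenAI-style chat completions call."""
        return [{"role": "system", "content": system_prompt}, *self.messages]

class StreamingChat:
    """Technical communication with the LLM backend; streaming, stop
    handling and title generation would live here."""
    def __init__(self, base_url: str):
        self.base_url = base_url

class ChatInterface:
    """Orchestrates sessions, backend and UI event handlers."""
    def __init__(self, backend: StreamingChat):
        self.backend = backend
        self.sessions = {}            # session id -> ChatSession

    def new_session(self) -> ChatSession:
        session = ChatSession()
        self.sessions[session.id] = session
        return session
```

The dependency direction is one-way: the interface knows the backend and the sessions, but a session knows nothing about the UI.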
Core functionalities#
The finished interface comprises eleven main functions:
- Chat with local LLM via OpenAI-compatible API
- Streaming responses with the option to cancel
- System prompt editor with four presets (standard HU, tutor, inspirational, blank)
- Multiline input with integrated buttons
- Mode selector with four modes (balanced, creative, precise, short)
- History sidebar with session-based management
- Clear function for new conversations
- Template system with nine pre-built prompts
- Automatic title generation for conversations
- Dark/Light mode toggle
- Modular prompt configuration in a separate file
The modularisation of the prompt configuration into a separate prompts.py file arose from considerations of maintainability. Instead of having system prompts, modes and templates in the main code, this structure allows for easier customisation without having to touch the core code.
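A possible shape for such a `prompts.py` file, using the preset and mode names from the article (the prompt texts, parameter values and template wordings are assumptions):

```python
# prompts.py -- illustrative sketch; only the preset/mode names are
# taken from the article, everything else is assumed.

SYSTEM_PROMPTS = {
    "standard_hu": "You are a helpful assistant for university tasks.",
    "tutor": "You are a patient tutor. Explain things step by step.",
    "inspirational": "You are a creative brainstorming partner.",
    "blank": "",
}

# Generation modes map to sampling parameters.
MODES = {
    "balanced": {"temperature": 0.7, "max_tokens": 1024},
    "creative": {"temperature": 1.0, "max_tokens": 1024},
    "precise":  {"temperature": 0.2, "max_tokens": 1024},
    "short":    {"temperature": 0.7, "max_tokens": 256},
}

TEMPLATES = {
    "Summarise": "Summarise the following text: ",
    "Explain": "Explain the following concept in simple terms: ",
    # ... further templates in the real file
}
```

Stakeholders can then edit these dictionaries without ever touching `app.py`.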
The history function as a central learning field#
The history with automatic titling was deliberately chosen as the primary learning field for this project. The challenge was to create a clear display of past conversations without using a database or persistent storage.
The solution: session-based management in memory, where each session is identified by a UUID. However, it was clear that UUIDs or timestamps alone would not suffice for display in the sidebar – users must be able to distinguish conversations at a glance.
Automatic title generation proved to be an elegant solution: after the first exchange of messages, the LLM itself generates a short, descriptive title in the format "action word: main topic" (e.g. "Explain: quantum physics" or "Analyse: Shakespeare's Hamlet"). This nesting – an LLM call during the runtime of an LLM chat interface – worked surprisingly smoothly and required minimal prompt engineering effort.
The development process#
The specification phase#
The specification phase took about 45 minutes and focused on clearly aligning the desired functionalities with their technical implementation. Specifically, this meant:
- Defining the features with precise functional descriptions
- Determining the architecture components and their responsibilities
- Clarifying technical details such as state management and event handling
- Specification of data structures and API interactions
The specification was not handed over to the LLM in several steps, but as a complete document. This enabled a consistent overall view and avoided potential inconsistencies that can arise during step-by-step development.
An important aspect of this phase was the conscious decision against excessive complexity. The specification did not contain any elaborate patterns or complex abstractions, but focused on a direct, understandable implementation. This seems to have played a central role in avoiding over-engineering.
Implementation and iteration#
After the specification was handed over, the initial implementation proceeded largely directly. The LLM generated working code that implemented the specified features. The total implementation time was also around 45 minutes, resulting in a 1:1 ratio between specification and implementation.
The project required a total of two iterations:
First iteration: Initial implementation of all features according to the specification. This version was basically functional, but still had one specific problem.
Second iteration: Fixing the ‘anti-blink’ problem (see below) and improvements to the history function. This iteration also included fine-tuning the title generation and optimising the UI performance.
The development was spread over two days, which provided an opportunity for reflection and testing between sessions.
Technical challenges and solutions#
The ‘anti-blink’ problem#
An unexpected challenge arose due to a Gradio-specific behaviour: when sending a message, the input field ‘blinked’ – it briefly emptied, then displayed the old text again, and only then emptied completely. This visually distracting behaviour was difficult to debug, as it was not described in the Gradio documentation.
The cause lay in the way Gradio processes updates. The solution required a two-step approach:
- An additional state variable (`stored_message`) stores the message
- The `send_btn.click` handler is split into two steps:
  - First: store the message in state and clear the input field immediately (without queue)
  - Then: actual message processing with the stored message

```python
send_btn.click(
    store_and_clear,
    inputs=[message_input],
    outputs=[stored_message, message_input],
    queue=False  # Immediate execution
).then(
    send_wrapper,
    inputs=[chat_state, stored_message, chatbot, system_prompt, mode_selector],
    outputs=[chat_state, chatbot, send_btn, stop_btn, session_selector]
)
```

This solution demonstrates an interesting feature of the debugging process for LLM-generated code: the problem was not in the generated code itself, but in understanding the specifics of the framework. Here, human expertise on Gradio's event system was crucial.
Session isolation for multi-user operation#
Another technical aspect concerned session isolation. The interface should support multiple simultaneous users without their conversations influencing each other. The solution utilises Gradio’s state management:
```python
chat_state = gr.State(init_chat)
```

When the page is loaded, each user receives their own ChatInterface instance, kept in fully isolated per-session state. This architecture works for the current requirements (no persistence), but would require more extensive changes if persistent storage were required.
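The effect can be illustrated without Gradio: passing a *factory* rather than a shared instance means every page load gets its own object. Here `ChatInterfaceStub` stands in for the real `ChatInterface` class:

```python
class ChatInterfaceStub:
    """Stand-in for the real ChatInterface; holds per-user sessions."""
    def __init__(self):
        self.sessions = {}

def init_chat():
    """Factory passed to gr.State: called per page load, so state is
    never shared between visitors."""
    return ChatInterfaceStub()

# Two page loads -> two fully isolated instances:
user_a = init_chat()
user_b = init_chat()
user_a.sessions["x"] = "private to user A"
```

Passing `ChatInterfaceStub()` directly instead of the factory would be the classic mistake: one shared instance for all users.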
Nested LLM calls#
Automatic title generation required an interesting approach: during a chat session with the LLM, another independent LLM call is made to generate the title. This worked without any problems because:
- Title generation is encapsulated as a separate method
- It works with its own specific parameters (low temperature for consistency)
- It is executed asynchronously without blocking the main chat session

This approach could be relevant for other projects that require meta functions – such as automatic summaries, tags or categorisations.
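A sketch of how such a nested title call might be assembled; the prompt wording and exact parameter values are assumptions, though the low temperature and the "action word: main topic" format come from the article:

```python
TITLE_PROMPT = (
    "Summarise this conversation as a short title in the format "
    "'action word: main topic'. Reply with the title only."
)

def build_title_request(first_user_msg: str, first_reply: str) -> dict:
    """Payload for the separate, independent LLM call that names a session."""
    return {
        "messages": [
            {"role": "system", "content": TITLE_PROMPT},
            {"role": "user",
             "content": f"{first_user_msg}\n---\n{first_reply}"},
        ],
        "temperature": 0.2,   # low temperature for consistent titles
        "max_tokens": 20,     # titles are short
        "stream": False,      # no streaming needed for a one-liner
    }
```

Because the request is self-contained, the same pattern extends naturally to other meta functions such as summaries or tags.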
Methodological findings#
The role of detailed specifications#
A key observation from this project concerns the value of time invested in the specification phase. The 45 minutes spent on a detailed, clear specification resulted in:
- Direct implementability without extensive rework
- Avoidance of over-engineering through clear specifications
- Reduced number of necessary iterations
- Consistent architecture across all components
This observation suggests that in LLM-supported development, the relationship between specification and implementation time may be different than in traditional development processes. While classic development often involves ‘learning by doing’ and iterative refinement in the code itself, in LLM development it seems sensible to shift this work to the specification phase.
Further projects will show to what extent this pattern is confirmed.
Avoiding over-engineering#
A frequently discussed problem with LLM-generated code is the tendency towards overly complex solutions. This problem did not arise in this project – there were no situations in which deliberate simplification or the removal of unnecessary abstractions was necessary.
This observation probably correlates with the clarity of the specification. Where there is a precise description of what is to be implemented and how it is to be implemented technically, there seems to be less room for ‘creative overinterpretation’ by the LLM.
Debugging patterns#
The ‘anti-blink’ problem illustrates an interesting aspect of debugging LLM-generated code: the problem was not in the code logic itself, but in the interaction with framework specifics that were not sufficiently described in the documentation.
This suggests that in LLM-assisted development, the debugging skill set may need to focus more on:
- Framework expertise and understanding of event systems
- Recognising framework-specific patterns
- Integrating different components
At the same time, classic code-logic errors may become less common. This hypothesis requires further investigation.
Modularity and maintainability#
Outsourcing the prompt configuration to a separate file proved to be practical for:
- Quick adjustments to system prompts without code changes
- Easily adding new templates
- Clearly separating configuration and logic
- Improving the clarity of the main code
This approach could be particularly relevant for projects where subject-matter stakeholders (such as lecturers or researchers) want to customise prompts without having to intervene in Python code.
Results and validation#
Technical metrics#
The finished tool comprises:
- 795 lines of Python code (612 in app.py, 183 in prompts.py)
- 2 Python files plus configuration files
- 3 main classes with clear responsibilities
- 11 main functions
- Total development time: 90 minutes (45 minutes specification, 45 minutes implementation)
- 2 main iterations over 2 days
These figures can be interpreted as follows: For a chat interface with this range of functions, the development time appears to be efficient. A comparison with traditional development is difficult, but experience suggests that a comparable implementation could have taken several working days.
Functional validation#
The tool is currently in a broader testing phase. Initial tests show:
- Stable functionality of all core features
- Smooth integration with various LLM backends
- Good performance with streaming responses
Successful testing will enable productive use.
Conclusion#
The development of this LLM chat interface provided interesting insights into methodological aspects of LLM-supported coding. The central observation – the value of time invested in detailed specifications – appears to be a transferable principle, but requires further validation in different project contexts.
The project shows that LLM-supported development can work efficiently if the framework conditions are right: clear requirements, well-thought-out architecture and appropriate technical expertise. At the same time, it becomes clear that this type of development requires its own patterns and workflows that differ from traditional approaches.
The observations presented here are intended as a contribution to a larger discourse on best practices in LLM-supported coding. Further systematic research and the exchange of experiences within the professional community will help to develop more robust insights.
This documentation is part of a series on methodological insights from LLM-supported development projects.