LLM coding experiment: Personnel cost calculator with hybrid architecture#
Summary#
This experiment investigated the feasibility of a personnel cost calculation tool that combines natural language input with precise calculations. The central question was: How can LLMs be used effectively when exact figures are required? The solution developed demonstrates a hybrid architecture in which the LLM is solely responsible for parameter extraction, while all calculations are performed deterministically. The experiment provides transferable insights into the combination of language processing and classical programming.
1. Initial situation and motivation#
1.1 The occasion#
A request from a department at Humboldt University in Berlin provided the impetus: the manual calculation of personnel costs for third-party funded projects proved to be time-consuming and error-prone. Multiple calculation bases, manual input errors in job shares and wage increases, and the need to calculate annual slices correctly made the process complex.
The question was: Can a tool be developed that automates these calculations and accepts natural language input?
1.2 The challenge#
LLMs and precise calculations are considered a problematic combination. Language models tend to use approximations and can be unreliable when performing mathematical operations. At the same time, they offer significant advantages when interpreting unstructured inputs.
The experiment was designed to explore whether and how the two worlds could be combined: not by compromising on accuracy, but through an architecture that specifically leverages the strengths of both approaches.
1.3 Requirements gathering#
A systematic requirements process was carried out prior to the actual development. A specially developed prompt guided the department through structured questions on problem understanding, functional requirements, use cases, and technical constraints.
The core requirements gathered were:
- Calculations for all TV-L pay grades (E9-E15, SHK)
- Consideration of job shares, promotions, and wage increases
- Output in annual slices with Excel export
- Accuracy requirement: deviation of less than 3% (actually achieved: exact match)
- Primary user group: research officers and administrators
2. Overview of the tool#
2.1 How it works#
The personnel cost calculator allows project requirements to be entered in natural language:
"We need 2 doctoral students E13/2 at 67% and one postdoc E14/4, term April 2026 to March 2029."
The system processes this input in several steps:
Parameter extraction: A local LLM analyzes the input and extracts structured parameters (pay grades, levels, job shares, time periods, number of positions).
Validation: The extracted parameters are validated against defined schemas (valid pay grades, plausible time periods, correct grade ranges).
Queries: If information is missing or unclear, the system generates specific queries.
Calculation: The deterministic calculation component calculates monthly costs, annual increments, wage increases, and grade promotions.
Output: The result is displayed as a structured table with a summary and can be exported as an Excel file.
2.2 Supported input formats#
The system interprets various linguistic formulations:
| Input | Interpretation |
|---|---|
| "half-time position," "50%" | Position share 0.5 |
| "2/3 position," "67%" | Position share 0.67 |
| "Summer semester 2026" | Start April 2026 |
| "3 years from April 2026" | April 2026 to March 2029 |
| "Doctoral student" | E13 (level will be requested) |
| "PostDoc E14" | E14 (level will be requested) |
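The duration phrases in the last two rows reduce to plain month arithmetic, which the deterministic side handles rather than the LLM. A minimal stdlib sketch (the function name `term_end` is illustrative, not from the project):

```python
from datetime import date

def term_end(start: date, years: int) -> date:
    """Return the last month of a term of `years` full years.

    A 3-year term starting April 2026 covers 36 months,
    so it ends in March 2029 (start month + 35 months).
    """
    total_months = start.year * 12 + (start.month - 1) + years * 12 - 1
    return date(total_months // 12, total_months % 12 + 1, 1)

print(term_end(date(2026, 4, 1), 3))  # 2029-03-01, i.e. March 2029
```

Doing this in code rather than asking the model for the end date removes one opportunity for an off-by-one month.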
2.3 Basis for calculation#
The calculation is based on:
- TV-L 2025 salary table (tariff area East)
- Employer gross factor: 1.2849 (28.49% employer share)
- Annual special payment (Jahressonderzahlung) rates: 74.35% (E9-E11), 46.47% (E12-E13), 32.53% (E14-E15)
- Collective agreement increase: 3% annually from 2026
- SHK hourly rate 2025: €14.32
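Combining these factors is straightforward deterministic arithmetic. The sketch below uses Python's `Decimal`, as the project does; the monthly table salary is a placeholder, not an actual TV-L 2025 figure:

```python
from decimal import Decimal, ROUND_HALF_UP

TABLE_SALARY = Decimal("5000.00")    # placeholder monthly gross per full position
EMPLOYER_FACTOR = Decimal("1.2849")  # 28.49% employer share on top
ANNUAL_INCREASE = Decimal("1.03")    # 3% collective increase from 2026

def monthly_cost(share: Decimal, years_of_increase: int) -> Decimal:
    """Employer cost per month for a given position share,
    after `years_of_increase` rounds of the 3% increase."""
    salary = TABLE_SALARY * ANNUAL_INCREASE ** years_of_increase
    cost = salary * share * EMPLOYER_FACTOR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(monthly_cost(Decimal("0.67"), 0))  # 4304.42 for a 67% position, base year
print(monthly_cost(Decimal("0.67"), 2))  # after two 3% increases
```

Using `Decimal` with explicit rounding keeps the per-month figures exactly reproducible, which binary floats would not guarantee.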
3. Technical architecture#
3.1 The central design decision#
The architecture is based on a strict separation: The LLM is exclusively responsible for language processing; all calculations are performed deterministically without LLM involvement.
This decision was made during the specification phase. The idea of also using the LLM for calculations was considered and rejected early: the requirement for exact results was non-negotiable.
3.2 Component structure#
```
personnel cost calculator/
├── app.py                  # Gradio web application
├── config/                 # Static configuration
│   ├── paytable.py         # TV-L 2025 pay table
│   ├── allowances.py       # Annual special payment
│   ├── employer.py         # Employer contributions
│   ├── shk.py              # SHK hourly rates
│   └── settings.py         # Application settings
├── parser/                 # LLM integration
│   ├── llm_client.py       # API connection
│   ├── prompt_templates.py # System prompts
│   ├── extractor.py        # Parameter extraction
│   └── validator.py        # Pydantic models
├── calculator/             # Deterministic calculation
│   ├── core.py             # Main calculation logic
│   ├── period.py           # Monthly/annual calculations
│   └── levelup.py          # Level-up logic
├── session/                # Dialog control
│   ├── manager.py          # Session management
│   └── state.py            # State machine
├── output/                 # Output generation
│   ├── tables.py           # Table formatting
│   ├── summary.py          # Summaries
│   └── excel_export.py     # Excel export
└── tools/
    └── excel_importer.py   # Tariff data import
```
3.3 Data flow#
The data flow illustrates the separation of responsibilities:
- User input (natural language) →
- LLM parser (extracts JSON with parameters) →
- Pydantic validation (type-safe checking) →
- Calculator (deterministic calculation with Decimal) →
- Output generator (formatting, Excel export)
The LLM has no access to the calculation logic or the tariff data. It only provides structured parameters, which are then processed by classic code.
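The contract between parser and calculator is thus a plain data structure. The project uses Pydantic for this step; the dependency-free sketch below (field names are illustrative) shows the same idea with a stdlib dataclass:

```python
import json
from dataclasses import dataclass

# Abridged grade list for illustration; the real validator also covers
# the E9a/E9b split and the SHK special case.
VALID_GRADES = {"E9", "E10", "E11", "E12", "E13", "E14", "E15", "SHK"}

@dataclass
class PositionParams:
    grade: str    # e.g. "E13"
    level: int    # TV-L experience level, 1-6
    share: float  # position share, 0 < share <= 1
    count: int    # number of identical positions

    def validate(self) -> None:
        if self.grade not in VALID_GRADES:
            raise ValueError(f"unknown pay grade: {self.grade}")
        if not 1 <= self.level <= 6:
            raise ValueError(f"implausible level: {self.level}")
        if not 0 < self.share <= 1:
            raise ValueError(f"implausible share: {self.share}")

# The LLM only ever returns a data structure like this, never a result:
raw = '{"grade": "E13", "level": 2, "share": 0.67, "count": 2}'
params = PositionParams(**json.loads(raw))
params.validate()  # raises before anything reaches the calculator
```

Because the model output is reduced to typed fields, a hallucinated grade or an impossible share fails loudly at the boundary instead of silently distorting a cost figure.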
3.4 State machine for dialog control#
The dialog is controlled by a state machine with defined states:
- INITIAL: Waiting for first input
- PARSING: LLM analyzes input
- CLARIFYING: Questions are asked
- CALCULATING: Calculation is running
- COMPLETE: Result ready
- FALLBACK: Manual form active
- ERROR: Error occurred
The state machine ensures that the dialog remains consistent and adheres to defined transitions.
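Such a state machine is compactly expressed as an enum plus an explicit transition table. The transitions below are a plausible reading of the states listed above, not the project's actual table (which lives in `session/state.py`):

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()      # waiting for first input
    PARSING = auto()      # LLM analyzes input
    CLARIFYING = auto()   # queries are asked
    CALCULATING = auto()  # calculation is running
    COMPLETE = auto()     # result ready
    FALLBACK = auto()     # manual form active
    ERROR = auto()        # error occurred

# Illustrative transition table: which states may follow which.
TRANSITIONS = {
    State.INITIAL: {State.PARSING},
    State.PARSING: {State.CLARIFYING, State.CALCULATING, State.FALLBACK, State.ERROR},
    State.CLARIFYING: {State.PARSING, State.FALLBACK},
    State.CALCULATING: {State.COMPLETE, State.ERROR},
    State.COMPLETE: {State.INITIAL},
    State.FALLBACK: {State.CALCULATING, State.ERROR},
    State.ERROR: {State.INITIAL},
}

def transition(current: State, nxt: State) -> State:
    """Move to `nxt` only if the transition table allows it."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```

Centralizing the allowed transitions in one table makes inconsistent dialog flows a hard error rather than a subtle UI bug.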
3.5 Fallback mechanism#
A fallback mechanism was planned from the outset: After three failed parsing attempts, a manual input form is automatically displayed. Parameters that have already been successfully parsed are pre-filled.
This decision reflected uncertainty about the reliability of LLM extraction. In practice, parsing proved to be robust, but the fallback still provides important protection.
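The three-attempt rule can be sketched as follows; `parse_with_llm` and `show_manual_form` are hypothetical callables standing in for the project's parser and Gradio form, and the set of required keys is an assumption for illustration:

```python
# Sketch of the three-strikes fallback. parse_with_llm and
# show_manual_form are hypothetical stand-ins, not project API.
MAX_ATTEMPTS = 3
REQUIRED_KEYS = {"grade", "level", "share"}  # assumed minimal parameter set

def get_parameters(user_input, parse_with_llm, show_manual_form):
    """Try LLM parsing up to MAX_ATTEMPTS times; afterwards fall back
    to a manual form pre-filled with whatever was parsed so far."""
    partial = {}
    for _ in range(MAX_ATTEMPTS):
        try:
            partial.update(parse_with_llm(user_input))
        except ValueError:
            continue                 # malformed output counts as a failure
        if REQUIRED_KEYS <= partial.keys():
            return partial           # complete: hand over to the calculator
    return show_manual_form(prefill=partial)
```

Accumulating `partial` across attempts is what allows the form to be pre-filled with already-recognized parameters.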
3.6 Technology stack#
- Frontend: Gradio (chosen based on existing experience)
- Backend: Python 3.10+
- LLM: Local model with OpenAI-compatible API
- Validation: Pydantic for type-safe data models
- Calculation: Python Decimal for exact arithmetic
- Export: openpyxl for Excel generation
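An OpenAI-compatible endpoint means the local model is addressed via the standard chat-completions request shape. A sketch of building such a request body (model name and options are illustrative; `response_format` only applies if the local server supports it):

```python
def build_chat_request(system_prompt: str, user_input: str) -> dict:
    """Build a request body for an OpenAI-compatible
    /v1/chat/completions endpoint."""
    return {
        "model": "local-model",      # whatever name the local server exposes
        "temperature": 0,            # determinism matters for extraction
        "response_format": {"type": "json_object"},  # if supported
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }

req = build_chat_request(
    "Extract personnel parameters as JSON.",
    "2 doctoral students E13/2 at 67%",
)
```

Pinning `temperature` to 0 and requesting a JSON object are the two knobs most relevant to reliable parameter extraction.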
4. Development process#
4.1 Phase model#
The development process followed a clear phase model:
Phase 1: Requirements gathering (upstream). A specially developed prompt guided the department through structured questions. The result was a detailed description of the problem, requirements, and framework conditions.
Phase 2: Specification discussion (50 minutes). Intensive coordination of all aspects of the solution:
- Architecture decision (hybrid separation of LLM/calculation)
- Data flows and interfaces
- Workflows and state transitions
- UI concept and fallback strategy
The focus was deliberately on the architecture, not on the implementation. Overengineering was prevented by checking each component against the actual requirements.
Phase 3: Specification. The complete technical specification included:
- Functional requirements with examples
- Data models and validation rules
- System prompts for the LLM
- Calculation formulas and configuration data
- Component structure and interfaces
Phase 4: Implementation (approx. 40 minutes). The LLM generated the entire code in one round. The quality of the specification enabled this efficient implementation.
Phase 5: Refinement. One iteration was necessary to make detailed adjustments.
4.2 Time required#
| Phase | Duration |
|---|---|
| Requirements gathering | Upstream |
| Specification discussion | 50 minutes |
| LLM implementation | Approx. 40 minutes |
| Refinement | 1 iteration |
| Total | < 2 hours |
4.3 Result#
The result: 5700 lines of code in 29 Python files, a functional tool with a web interface, dialog guidance, deterministic calculation, and Excel export.
5. Methodological findings#
5.1 Specification as a success factor#
The most important finding: The quality of the specification determines the success of LLM-supported development. Thorough preparatory work (architecture, data flows, workflows, UI) must be completed before code is generated.
The 50-minute specification discussion enabled the 40-minute implementation. Without this preparatory work, multiple iterations and modifications would have been necessary.
Specific aspects of the specification:
- Clear component boundaries and responsibilities
- Defined interfaces between modules
- Explicit data models with validation rules
- Examples of inputs and outputs
- Defined error handling and fallback strategies
5.2 Hybrid architecture as a pattern#
The experiment establishes a transferable architecture pattern: LLM as an intelligent interface to classic code.
Not everything has to be done by the LLM. For many tasks, especially precise calculations, classic programming techniques remain superior. The LLM acts as a bridge between unstructured human input and structured machine processing.
This pattern is suitable for applications with the following characteristics:
- Flexible, natural language input desired
- Exact, reproducible results required
- Clearly defined calculation logic available
- Domain-specific vocabulary to be interpreted
5.3 JSON as interface format#
The structured output of the LLM as JSON with subsequent Pydantic validation proved to be a robust solution. The LLM generates only data structures, no calculations or logic.
Advantages of this approach:
- Clear contract definition between LLM and backend
- Type-safe validation before processing
- Simple error handling for invalid output
- Testability of components independently of each other
5.4 Robustness through fallback#
The fallback mechanism planned from the outset reflects an important insight: when LLM reliability is uncertain, alternative input paths are required.
The manual form is not a stopgap solution, but an integral part of the concept. It ensures that the tool remains usable even in the event of LLM failures or difficult inputs.
5.5 Limits of LLM use#
The experiment revealed limitations:
- Calculations: LLMs are unsuitable for exact arithmetic; strict separation was necessary.
- JSON extraction: Reliable generation of valid JSON structures requires precise prompts and post-processing.
- Consistency: The LLM must consistently provide either complete parameters or meaningful queries; this required careful prompt design.
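Post-processing for JSON extraction typically means stripping Markdown fences and surrounding chatter before parsing. A common hardening sketch (not the project's exact code):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Tolerant post-processing for LLM output: remove Markdown code
    fences, then parse the first JSON object found in the text."""
    raw = re.sub(r"```(?:json)?", "", raw)       # drop ```json ... ``` fences
    start, end = raw.find("{"), raw.rfind("}")   # locate the object bounds
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])

print(extract_json('Sure! ```json\n{"grade": "E13", "share": 0.5}\n```'))
# {'grade': 'E13', 'share': 0.5}
```

Raising `ValueError` on unparseable output is what lets the surrounding dialog logic count a failed attempt and eventually trigger the fallback form.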
6. Validation and status#
6.1 Current status#
The tool is functional and is currently in the evaluation phase. Initial tests show that the application works as planned:
- Parameter extraction works reliably
- Calculations match reference values
- Dialog behavior is consistent
- Fallback mechanism is operational but has rarely been needed
6.2 Validation approach#
Validation is performed by:
- Comparison with manual Excel calculations
- Test cases against reference values from the official tariff data Excel file
- User tests with the department
6.3 Outlook#
The question of whether LLMs and exact calculations are compatible has a nuanced answer: yes, provided the architecture specifically combines the strengths of both worlds and respects their respective limitations.
The experiment is part of a series of investigations into the connection between LLMs and numerical applications. Further experiments (including on diagram and chart generation) explore additional approaches.
7. Metrics#
| Aspect | Value |
|---|---|
| Code size | 5700 lines |
| Files | 29 Python files |
| Modules | 6 (config, parser, calculator, session, output, tools) |
| Specification time | 50 minutes |
| Implementation time | approx. 40 minutes |
| Total time | < 2 hours |
| Main iterations | 1 |
| Revisions | 1 |
| Sessions | 1 |
| State machine states | 7 |
| Supported pay groups | 12 (E9a-E15Ü, SHK) |
8. Conclusion#
The experiment demonstrates that LLM-supported development is also suitable for applications with high accuracy requirements, provided that the architecture respects the limitations of the technology.
The key success factors:
- Thorough specification prior to implementation
- Hybrid architecture with clear separation of tasks
- LLM as an interface, not as a universal solution
- Robustness through fallback mechanisms
- Focus on architecture instead of hasty implementation
The resulting tool efficiently solves a real problem and at the same time demonstrates a transferable pattern for similar use cases.
This article is part of a series documenting methodological findings from LLM-supported development projects.