LLM coding experiment: Personnel cost calculator with hybrid architecture#
Summary#
This experiment investigated the feasibility of a personnel cost calculation tool that combines natural language input with precise calculations. The central question was: How can LLMs be used effectively when exact figures are required? The solution developed demonstrates a hybrid architecture in which the LLM is solely responsible for parameter extraction, while all calculations are performed deterministically. The experiment provides transferable insights into the combination of language processing and classical programming.
1. Initial situation and motivation#
1.1 The occasion#
A request from a department at Humboldt University in Berlin provided the impetus: the manual calculation of personnel costs for third-party funded projects proved to be time-consuming and error-prone. Multiple calculation bases, manual input errors in job shares and wage increases, and the need to calculate annual slices correctly made the process complex.
The question was: Can a tool be developed that automates these calculations and accepts natural language input?
1.2 The challenge#
LLMs and precise calculations are considered a problematic combination. Language models tend to use approximations and can be unreliable when performing mathematical operations. At the same time, they offer significant advantages when interpreting unstructured inputs.
The experiment was designed to explore whether and how the two worlds could be combined: not by compromising on accuracy, but through an architecture that specifically leverages the strengths of both approaches.
1.3 Requirements gathering#
A systematic requirements process was carried out prior to the actual development. A specially developed prompt guided the department through structured questions on problem understanding, functional requirements, use cases, and technical constraints.
The core requirements gathered were:
- Calculations for all TV-L pay grades (E9-E15, SHK)
- Consideration of job shares, promotions, and wage increases
- Output in annual slices with Excel export
- Accuracy requirement: deviation of less than 3% (actually achieved: exact match)
- Primary user group: research officers and administrators
2. Overview of the tool#
2.1 How it works#
The personnel cost calculator allows project requirements to be entered in natural language:
"We need 2 doctoral students E13/2 at 67% and one postdoc E14/4, term April 2026 to March 2029."
The system processes this input in several steps:
Parameter extraction: A local LLM analyzes the input and extracts structured parameters (pay grades, levels, job shares, time periods, number of positions).
Validation: The extracted parameters are validated against defined schemas (valid pay grades, plausible time periods, correct grade ranges).
Queries: If information is missing or unclear, the system generates specific queries.
Calculation: The deterministic calculation component calculates monthly costs, annual increments, wage increases, and grade promotions.
Output: The result is displayed as a structured table with a summary and can be exported as an Excel file.
2.2 Supported input formats#
The system interprets various linguistic formulations:
| Input | Interpretation |
|---|---|
| "half-time position," "50%" | Position share 0.5 |
| "2/3 position," "67%" | Position share 0.67 |
| "Summer semester 2026" | Start April 2026 |
| "3 years from April 2026" | April 2026 to March 2029 |
| "Doctoral student" | E13 (level will be requested) |
| "PostDoc E14" | E14 (level will be requested) |
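The duration phrases in the last two rows reduce to plain month arithmetic, which the deterministic side handles rather than the LLM. A minimal stdlib sketch (the function name `term_end` is illustrative, not from the project):

```python
from datetime import date

def term_end(start: date, years: int) -> date:
    """Return the last month of a term of `years` full years.

    A 3-year term starting April 2026 covers 36 months,
    so it ends in March 2029 (start month + 35 months).
    """
    total_months = start.year * 12 + (start.month - 1) + years * 12 - 1
    return date(total_months // 12, total_months % 12 + 1, 1)

print(term_end(date(2026, 4, 1), 3))  # 2029-03-01, i.e. March 2029
```

Doing this in code rather than asking the model for the end date removes one opportunity for an off-by-one month.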
2.3 Basis for calculation#
The calculation is based on:
- TV-L 2025 salary table (tariff area East)
- Employer gross factor: 1.2849 (28.49% employer share)
- Annual special payment (Jahressonderzahlung) rates: 74.35% (E9-E11), 46.47% (E12-E13), 32.53% (E14-E15)
- Collective agreement increase: 3% annually from 2026
- SHK hourly rate 2025: €14.32
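Combining these factors is straightforward deterministic arithmetic. The sketch below uses Python's `Decimal`, as the project does; the monthly table salary is a placeholder, not an actual TV-L 2025 figure:

```python
from decimal import Decimal, ROUND_HALF_UP

TABLE_SALARY = Decimal("5000.00")    # placeholder monthly gross per full position
EMPLOYER_FACTOR = Decimal("1.2849")  # 28.49% employer share on top
ANNUAL_INCREASE = Decimal("1.03")    # 3% collective increase from 2026

def monthly_cost(share: Decimal, years_of_increase: int) -> Decimal:
    """Employer cost per month for a given position share,
    after `years_of_increase` rounds of the 3% increase."""
    salary = TABLE_SALARY * ANNUAL_INCREASE ** years_of_increase
    cost = salary * share * EMPLOYER_FACTOR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(monthly_cost(Decimal("0.67"), 0))  # 4304.42 for a 67% position, base year
print(monthly_cost(Decimal("0.67"), 2))  # after two 3% increases
```

Using `Decimal` with explicit rounding keeps the per-month figures exactly reproducible, which binary floats would not guarantee.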
3. Technical architecture#
3.1 The central design decision#
The architecture is based on a strict separation: The LLM is exclusively responsible for language processing; all calculations are performed deterministically without LLM involvement.
This decision was made during the specification phase. The idea of also using the LLM for calculations was considered and rejected early: the requirement for exact results was non-negotiable.
3.2 Component structure#
```
personnel cost calculator/
├── app.py                  # Gradio web application
├── config/                 # Static configuration
│   ├── paytable.py         # TV-L 2025 pay table
│   ├── allowances.py       # Annual special payment
│   ├── employer.py         # Employer contributions
│   ├── shk.py              # SHK hourly rates
│   └── settings.py         # Application settings
├── parser/                 # LLM integration
│   ├── llm_client.py       # API connection
│   ├── prompt_templates.py # System prompts
│   ├── extractor.py        # Parameter extraction
│   └── validator.py        # Pydantic models
├── calculator/             # Deterministic calculation
│   ├── core.py             # Main calculation logic
│   ├── period.py           # Monthly/annual calculations
│   └── levelup.py          # Level-up logic
├── session/                # Dialog control
│   ├── manager.py          # Session management
│   └── state.py            # State machine
├── output/                 # Output generation
│   ├── tables.py           # Table formatting
│   ├── summary.py          # Summaries
│   └── excel_export.py     # Excel export
└── tools/
    └── excel_importer.py   # Tariff data import
```
3.3 Data flow#
The data flow illustrates the separation of responsibilities:
- User input (natural language) →
- LLM parser (extracts JSON with parameters) →
- Pydantic validation (type-safe checking) →
- Calculator (deterministic calculation with Decimal) →
- Output generator (formatting, Excel export)
The LLM has no access to the calculation logic or the tariff data. It only provides structured parameters, which are then processed by classic code.
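The contract between parser and calculator is thus a plain data structure. The project uses Pydantic for this step; the dependency-free sketch below (field names are illustrative) shows the same idea with a stdlib dataclass:

```python
import json
from dataclasses import dataclass

# Abridged grade list for illustration; the real validator also covers
# the E9a/E9b split and the SHK special case.
VALID_GRADES = {"E9", "E10", "E11", "E12", "E13", "E14", "E15", "SHK"}

@dataclass
class PositionParams:
    grade: str    # e.g. "E13"
    level: int    # TV-L experience level, 1-6
    share: float  # position share, 0 < share <= 1
    count: int    # number of identical positions

    def validate(self) -> None:
        if self.grade not in VALID_GRADES:
            raise ValueError(f"unknown pay grade: {self.grade}")
        if not 1 <= self.level <= 6:
            raise ValueError(f"implausible level: {self.level}")
        if not 0 < self.share <= 1:
            raise ValueError(f"implausible share: {self.share}")

# The LLM only ever returns a data structure like this, never a result:
raw = '{"grade": "E13", "level": 2, "share": 0.67, "count": 2}'
params = PositionParams(**json.loads(raw))
params.validate()  # raises before anything reaches the calculator
```

Because the model output is reduced to typed fields, a hallucinated grade or an impossible share fails loudly at the boundary instead of silently distorting a cost figure.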
3.4 State machine for dialog control#
The dialog is controlled by a state machine with defined states:
- INITIAL: Waiting for first input
- PARSING: LLM analyzes input
- CLARIFYING: Questions are asked
- CALCULATING: Calculation is running
- COMPLETE: Result ready
- FALLBACK: Manual form active
- ERROR: Error occurred
The state machine ensures that the dialog remains consistent and adheres to defined transitions.
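Such a state machine is compactly expressed as an enum plus an explicit transition table. The transitions below are a plausible reading of the states listed above, not the project's actual table (which lives in `session/state.py`):

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()      # waiting for first input
    PARSING = auto()      # LLM analyzes input
    CLARIFYING = auto()   # queries are asked
    CALCULATING = auto()  # calculation is running
    COMPLETE = auto()     # result ready
    FALLBACK = auto()     # manual form active
    ERROR = auto()        # error occurred

# Illustrative transition table: which states may follow which.
TRANSITIONS = {
    State.INITIAL: {State.PARSING},
    State.PARSING: {State.CLARIFYING, State.CALCULATING, State.FALLBACK, State.ERROR},
    State.CLARIFYING: {State.PARSING, State.FALLBACK},
    State.CALCULATING: {State.COMPLETE, State.ERROR},
    State.COMPLETE: {State.INITIAL},
    State.FALLBACK: {State.CALCULATING, State.ERROR},
    State.ERROR: {State.INITIAL},
}

def transition(current: State, nxt: State) -> State:
    """Move to `nxt` only if the transition table allows it."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```

Centralizing the allowed transitions in one table makes inconsistent dialog flows a hard error rather than a subtle UI bug.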
3.5 Fallback mechanism#
A fallback mechanism was planned from the outset: After three failed parsing attempts, a manual input form is automatically displayed. Parameters that have already been successfully parsed are pre-filled.
This decision reflected uncertainty about the reliability of LLM extraction. In practice, parsing proved to be robust, but the fallback still provides important protection.
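The three-attempt rule can be sketched as follows; `parse_with_llm` and `show_manual_form` are hypothetical callables standing in for the project's parser and Gradio form, and the set of required keys is an assumption for illustration:

```python
# Sketch of the three-strikes fallback. parse_with_llm and
# show_manual_form are hypothetical stand-ins, not project API.
MAX_ATTEMPTS = 3
REQUIRED_KEYS = {"grade", "level", "share"}  # assumed minimal parameter set

def get_parameters(user_input, parse_with_llm, show_manual_form):
    """Try LLM parsing up to MAX_ATTEMPTS times; afterwards fall back
    to a manual form pre-filled with whatever was parsed so far."""
    partial = {}
    for _ in range(MAX_ATTEMPTS):
        try:
            partial.update(parse_with_llm(user_input))
        except ValueError:
            continue                 # malformed output counts as a failure
        if REQUIRED_KEYS <= partial.keys():
            return partial           # complete: hand over to the calculator
    return show_manual_form(prefill=partial)
```

Accumulating `partial` across attempts is what allows the form to be pre-filled with already-recognized parameters.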
3.6 Technology stack#
- Frontend: Gradio (chosen based on existing experience)
- Backend: Python 3.10+
- LLM: Local model with OpenAI-compatible API
- Validation: Pydantic for type-safe data models
- Calculation: Python Decimal for exact arithmetic
- Export: openpyxl for Excel generation
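An OpenAI-compatible endpoint means the local model is addressed via the standard chat-completions request shape. A sketch of building such a request body (model name and options are illustrative; `response_format` only applies if the local server supports it):

```python
def build_chat_request(system_prompt: str, user_input: str) -> dict:
    """Build a request body for an OpenAI-compatible
    /v1/chat/completions endpoint."""
    return {
        "model": "local-model",      # whatever name the local server exposes
        "temperature": 0,            # determinism matters for extraction
        "response_format": {"type": "json_object"},  # if supported
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }

req = build_chat_request(
    "Extract personnel parameters as JSON.",
    "2 doctoral students E13/2 at 67%",
)
```

Pinning `temperature` to 0 and requesting a JSON object are the two knobs most relevant to reliable parameter extraction.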
4. Development process#
4.1 Phase model#
The development process followed a clear phase model:
Phase 1: Requirements gathering (upstream). A specially developed prompt guided the department through structured questions. The result was a detailed description of the problem, requirements, and framework conditions.
Phase 2: Specification discussion (50 minutes). Intensive coordination of all aspects of the solution:
- Architecture decision (hybrid separation of LLM/calculation)
- Data flows and interfaces
- Workflows and state transitions
- UI concept and fallback strategy
The focus was deliberately on the architecture, not on the implementation. Overengineering was prevented by checking each component against the actual requirements.
Phase 3: Specification. The complete technical specification included:
- Functional requirements with examples
- Data models and validation rules
- System prompts for the LLM
- Calculation formulas and configuration data
- Component structure and interfaces
Phase 4: Implementation (approx. 40 minutes). The LLM generated the entire code in one round. The quality of the specification enabled this efficient implementation.
Phase 5: Refinement. One iteration was necessary to make detailed adjustments.
4.2 Time required#
| Phase | Duration |
|---|---|
| Requirements gathering | Upstream |
| Specification discussion | 50 minutes |
| LLM implementation | Approx. 40 minutes |
| Refinement | 1 iteration |
| Total | < 2 hours |
4.3 Result#
The result: 5700 lines of code in 29 Python files, a functional tool with a web interface, dialog guidance, deterministic calculation, and Excel export.
5. Methodological findings#
5.1 Specification as a success factor#
The most important finding: The quality of the specification determines the success of LLM-supported development. Thorough preparatory work (architecture, data flows, workflows, UI) must be completed before code is generated.
The 50-minute specification discussion enabled the 40-minute implementation. Without this preparatory work, multiple iterations and modifications would have been necessary.
Specific aspects of the specification:
- Clear component boundaries and responsibilities
- Defined interfaces between modules
- Explicit data models with validation rules
- Examples of inputs and outputs
- Defined error handling and fallback strategies
5.2 Hybrid architecture as a pattern#
The experiment establishes a transferable architecture pattern: LLM as an intelligent interface to classic code.
Not everything has to be done by the LLM. For many tasks, especially precise calculations, classic programming techniques remain superior. The LLM acts as a bridge between unstructured human input and structured machine processing.
This pattern is suitable for applications with the following characteristics:
- Flexible, natural language input desired
- Exact, reproducible results required
- Clearly defined calculation logic available
- Domain-specific vocabulary to be interpreted
5.3 JSON as interface format#
The structured output of the LLM as JSON with subsequent Pydantic validation proved to be a robust solution. The LLM generates only data structures, no calculations or logic.
Advantages of this approach:
- Clear contract definition between LLM and backend
- Type-safe validation before processing
- Simple error handling for invalid output
- Testability of components independently of each other
5.4 Robustness through fallback#
The fallback mechanism planned from the outset reflects an important insight: when LLM reliability is uncertain, alternative input paths are required.
The manual form is not a stopgap solution, but an integral part of the concept. It ensures that the tool remains usable even in the event of LLM failures or difficult inputs.
5.5 Limits of LLM use#
The experiment revealed limitations:
- Calculations: LLMs are unsuitable for exact arithmetic; strict separation was necessary.
- JSON extraction: Reliable generation of valid JSON structures requires precise prompts and post-processing.
- Consistency: The LLM must consistently provide either complete parameters or meaningful queries; this required careful prompt design.
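Post-processing for JSON extraction typically means stripping Markdown fences and surrounding chatter before parsing. A common hardening sketch (not the project's exact code):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Tolerant post-processing for LLM output: remove Markdown code
    fences, then parse the first JSON object found in the text."""
    raw = re.sub(r"```(?:json)?", "", raw)       # drop ```json ... ``` fences
    start, end = raw.find("{"), raw.rfind("}")   # locate the object bounds
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])

print(extract_json('Sure! ```json\n{"grade": "E13", "share": 0.5}\n```'))
# {'grade': 'E13', 'share': 0.5}
```

Raising `ValueError` on unparseable output is what lets the surrounding dialog logic count a failed attempt and eventually trigger the fallback form.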
6. Validation and status#
6.1 Current status#
The tool is functional and is currently in the evaluation phase. Initial tests show that the application works as planned:
- Parameter extraction works reliably
- Calculations match reference values
- Dialog behavior is consistent
- Fallback mechanism is operational but has rarely been needed
6.2 Validation approach#
Validation is performed by:
- Comparison with manual Excel calculations
- Test cases against reference values from the official tariff data Excel file
- User tests with the department
6.3 Outlook#
The question of whether LLMs and exact calculations are compatible has a nuanced answer: yes, provided the architecture specifically combines the strengths of both worlds and respects their respective limitations.
The experiment is part of a series of investigations into the connection between LLMs and numerical applications. Further experiments (including on diagram and chart generation) explore additional approaches.
7. Metrics#
| Aspect | Value |
|---|---|
| Code size | 5700 lines |
| Files | 29 Python files |
| Modules | 6 (config, parser, calculator, session, output, tools) |
| Specification time | 50 minutes |
| Implementation time | approx. 40 minutes |
| Total time | < 2 hours |
| Main iterations | 1 |
| Revisions | 1 |
| Sessions | 1 |
| State machine states | 7 |
| Supported pay groups | 12 (E9a-E15Ü, SHK) |
8. Conclusion#
The experiment demonstrates that LLM-supported development is also suitable for applications with high accuracy requirements, provided that the architecture respects the limitations of the technology.
The key success factors:
- Thorough specification prior to implementation
- Hybrid architecture with clear separation of tasks
- LLM as an interface, not as a universal solution
- Robustness through fallback mechanisms
- Focus on architecture instead of hasty implementation
The resulting tool efficiently solves a real problem and at the same time demonstrates a transferable pattern for similar use cases.
This article is part of a series documenting methodological findings from LLM-supported development projects.