Architecture overview#
Introduction#
AI mapping at universities is a system for recording, analysing and visualising AI activities at German universities. The application lets users submit URLs of AI projects, automatically extracts relevant information, creates structured profiles and displays them in a clear web interface.
System Architecture#
The application is designed as a modern web application with a clear separation between backend and frontend. It follows an API-first approach with static site generation for the presentation of data.
Architecture diagram#
```mermaid
graph TD
    User -->|Submits URL| API[FastAPI backend]
    Admin -->|Manages data| API
    API -->|Stores data| DB[(SQLite/PostgreSQL)]
    API -->|Extracts text| Crawler[Web crawler]
    Crawler -->|Processes content| LLM[LLM client]
    API -->|Generates profile| LLM
    API -->|Generates static pages| Generator[Site generator]
    Generator -->|Generates HTML| Static[Static website]
    User -->|Views results| Static

    subgraph Backend
        API
        Crawler
        LLM
        Generator
    end
    subgraph Database
        DB
    end
    subgraph Frontend
        Static
    end
```

Main components#
1. FastAPI backend#
The backend is implemented with FastAPI and forms the core of the application. It provides RESTful API endpoints for:
- Submitting URLs
- Extracting and analysing website content
- Managing submissions and projects
- Generating profiles
- Generating the static website
2. Web crawler & extractor#
This component is responsible for extracting relevant information from submitted web pages:
- SimpleExtractor: Extracts basic text and metadata from web pages
- FeatureAwareExtractor: Advanced extraction with a focus on AI-specific content
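The basic extraction step can be sketched with the standard library alone (the real implementation uses BeautifulSoup/Trafilatura; the class and function names here are illustrative):

```python
from html.parser import HTMLParser

class SimpleExtractor(HTMLParser):
    """Minimal sketch: collects the <title> and the visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self._in_title = False
        self._skip = 0                      # depth inside <script>/<style>
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text_parts.append(data.strip())

def extract(html: str) -> dict:
    """Return title and concatenated visible text of an HTML document."""
    parser = SimpleExtractor()
    parser.feed(html)
    return {"title": parser.title, "text": " ".join(parser.text_parts)}
```

The FeatureAwareExtractor would build on the same idea but additionally score sections for AI-specific signals.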
3. LLM integration#
The application uses local large language models (LLMs) via an OpenAI-compatible client to:
- Analyse the extracted texts
- Identify AI-relevant information
- Generate structured profiles
- Recognise relationships between different projects
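Profile generation boils down to building a chat-completion request in the OpenAI-compatible format. The sketch below only constructs the payload; the model name, prompt wording and profile keys are assumptions, and in the real application the payload would be sent via the OpenAI client pointed at a local server (`base_url`).

```python
PROFILE_KEYS = ["title", "university", "summary", "topics"]  # assumed schema

def build_profile_request(page_text: str, model: str = "local-model") -> dict:
    """Build a chat-completion payload asking the LLM for a structured profile."""
    system = (
        "Extract an AI project profile from the text. Respond with JSON "
        f"containing the keys: {', '.join(PROFILE_KEYS)}."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": page_text[:8000]},  # cap very long pages
        ],
        "temperature": 0.0,  # deterministic extraction, not creative writing
    }
```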
4. Database#
The application uses SQLite for development and PostgreSQL for production:
- Submissions: Stores submitted URLs and their status
- Projects: Stores approved and structured project data
- Users: Manages administrator accounts for the backend
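The three tables might be shaped roughly as follows. The real code defines them as SQLAlchemy models; this dependency-free sketch uses plain `sqlite3`, and all column names are illustrative.

```python
import sqlite3

# Illustrative schema for the three tables described above.
SCHEMA = """
CREATE TABLE submissions (
    id      INTEGER PRIMARY KEY,
    url     TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending'   -- pending / approved / rejected
);
CREATE TABLE projects (
    id            INTEGER PRIMARY KEY,
    submission_id INTEGER REFERENCES submissions(id),
    title         TEXT NOT NULL,
    profile_json  TEXT NOT NULL               -- structured profile from the LLM
);
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    api_key  TEXT NOT NULL                    -- admin authentication
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a fresh database with the sketched schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```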
5. Site generator#
The site generator:
- Generates static HTML pages from the project data
- Uses Jinja2 templates for consistent design
- Creates index, project list and detail pages
- Generates metadata for search engines
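The generation step can be sketched as a function that renders each project through a Jinja2 template and writes the result to an output directory. The template markup and project fields below are illustrative, not the application's real templates.

```python
from pathlib import Path
from jinja2 import DictLoader, Environment

# Assumed, minimal templates; the real ones live as files on disk.
TEMPLATES = {
    "detail.html": "<h1>{{ project.title }}</h1><p>{{ project.summary }}</p>",
    "index.html": (
        "<ul>{% for p in projects %}"
        "<li><a href='{{ p.slug }}.html'>{{ p.title }}</a></li>"
        "{% endfor %}</ul>"
    ),
}

def generate_site(projects: list[dict], out_dir: str) -> None:
    """Render an index page plus one detail page per project."""
    env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "index.html").write_text(
        env.get_template("index.html").render(projects=projects))
    for project in projects:
        (out / f"{project['slug']}.html").write_text(
            env.get_template("detail.html").render(project=project))
```

Because the output is plain files, regeneration after an approval is a single function call and the result can be served by any static host.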
6. Frontend#
The frontend consists of:
- Static HTML pages with CSS and JavaScript
- User-friendly forms for submitting URLs
- Visualisations and filters for project data
- Responsive design for various end devices
Technology stack#
Backend#
- Python 3.11+: Basic programming language
- FastAPI: Web framework for modern API development
- SQLAlchemy: ORM for database access
- Pydantic: Data validation and conversion
- Alembic: Database migration tool
- OpenAI client: Communication with local LLMs
- BeautifulSoup/Trafilatura: HTML parsing and text extraction
Frontend#
- HTML5/CSS3: Markup and styling
- JavaScript: Client-side interactivity
- Jinja2: Template engine for site generation
Database#
- SQLite: For development and smaller deployments
- PostgreSQL: For production environments
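The switch between the two databases is typically a single configuration value. A sketch, assuming environment-variable names that may differ from the actual deployment:

```python
def database_url(env: dict) -> str:
    """Return the SQLAlchemy URL: PostgreSQL in production, SQLite otherwise."""
    if env.get("APP_ENV") == "production":
        return env["DATABASE_URL"]        # e.g. a postgresql+psycopg:// URL
    return "sqlite:///./dev.db"           # local file database for development
```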
Deployment#
- Docker: Containerisation
- Docker Compose: Container orchestration
Data flow#
1. A user submits the URL of an AI project via the web form
2. The URL is validated and stored as a submission in the database
3. The crawler extracts text and metadata from the URL
4. The FeatureAwareExtractor analyses the content and identifies relevant structures
5. The LLM client generates a structured profile based on the extracted text
6. An administrator reviews and approves the submission
7. The approved project is stored in the database
8. The site generator creates updated static HTML pages
9. Users can view and search the projects on the website
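The steps above imply a small lifecycle for each submission, which could be captured as a status machine. The status names here are assumptions, not the application's actual values.

```python
# Allowed transitions along a submission's lifecycle.
TRANSITIONS = {
    "submitted": {"extracted"},             # crawler finished
    "extracted": {"profiled"},              # LLM profile generated
    "profiled":  {"approved", "rejected"},  # admin review
    "approved":  {"published"},             # static pages regenerated
}

def advance(status: str, new_status: str) -> str:
    """Move a submission to a new status, rejecting illegal jumps."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```

Making the transitions explicit prevents, for example, publishing a project that was never reviewed.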
Security and authentication concept#
- API-key-based authentication for admin endpoints
- Validation and sanitisation of all user input
- CORS configuration for frontend security
- Server-side CSRF tokens
Scaling concept#
The application is designed for different scaling levels:
- Simple deployment: Single server with SQLite for smaller instances
- Medium scaling: PostgreSQL database and Docker deployment
- Advanced scaling: Distributed crawlers and LLM processing for high request volumes
Because the frontend is delivered as static pages, the website itself scales easily via CDNs or static hosting services.