Changelog for AH2K25 Project
This document outlines the changes across three stages of the AH2K25 project: Task 1a, Task 1b, and the Final Backend.
Final Backend Overview
The Final Backend is a microservices-based platform with unified middleware, per-request timeouts, and observability hooks. Tasks 1a and 1b were independent efforts; the production-ready backend builds on the work from both.
Architecture
- Microservices with clear boundaries and APIs.
- Unified middleware in backend/core: middleware.py, exception_handler.py, error_handler.py, timeout_middleware.py.
- Request timeouts across services to prevent hangs.
- Observability-ready: structured logs/metrics compatible with Grafana/Prometheus/Loki.
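
As an illustration of the structured-log point above, the following is a minimal sketch of a formatter that emits one JSON object per line, a shape that log pipelines such as Loki can parse. The names `JsonFormatter` and `build_logger`, and the `service` field, are assumptions made for this example; the project's actual logging code in core is not shown here and may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field: the service that emitted the record,
            # supplied by callers through logging's `extra` argument.
            "service": getattr(record, "service", "unknown"),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A service could then call `build_logger("chat_service").info("request served", extra={"service": "chat_service"})` and ship the resulting lines to its log collector unchanged.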
Key Services
- chat_service, collection_summary_service, insights_service
- persona, podcast_service, relevance_service
- chunk_builder, embeddings, indexer, snippets
- core (logging + middleware)
Operational Readiness
- Strict timeouts via timeout_middleware.py.
- Global exception and error handling with consistent JSON responses.
- Metrics/logs structured for drop-in dashboards and alerts (Grafana/Prometheus/Loki).
- Tests in backend cover the error and timeout handling paths.
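
The timeout and error-handling items above can be sketched together as one guard that runs a handler under a deadline and maps every failure to the same JSON shape. This is a framework-free sketch under stated assumptions: `call_with_guard`, `error_body`, the status codes, and the `{"error": {...}}` schema are hypothetical and are not taken from timeout_middleware.py or the exception handlers in core.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Handler = Callable[[], Awaitable[Dict[str, Any]]]


def error_body(code: str, message: str) -> Dict[str, Any]:
    """Build the single error shape returned on every failure path (assumed schema)."""
    return {"error": {"code": code, "message": message}}


async def call_with_guard(handler: Handler, timeout_s: float) -> Tuple[int, Dict[str, Any]]:
    """Run a handler under a deadline and map failures to (status, JSON body)."""
    try:
        body = await asyncio.wait_for(handler(), timeout=timeout_s)
        return 200, body
    except asyncio.TimeoutError:
        # wait_for cancels the handler once the deadline passes.
        return 504, error_body("timeout", f"request exceeded {timeout_s}s")
    except ValueError as exc:
        # Treated here as a client error; the mapping is an assumption.
        return 400, error_body("bad_request", str(exc))
    except Exception:
        # Keep internal details out of the client-facing response.
        return 500, error_body("internal_error", "unexpected server error")
```

Because every branch returns the same envelope, a test only needs to assert on the status code and on `body["error"]["code"]`, whichever failure it triggers.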
File Structure (Final Backend)
├── LOGGING_README.md
├── README.md
├── chat_service/
├── chunk_builder/
├── collection_summary_service/
├── core/
├── embeddings/
├── indexer/
├── insights_service/
├── lexicons/
├── main.py
├── outline_extractor/
├── persona/
├── podcast_service/
├── relevance_service/
├── scripts/
├── snippets/
├── uploads/
├── vector_store/
└── web_static/
Foundations from Prior Tasks
Task 1a (Independent)
- PDF heading extraction with heuristics + XGBoost.
- PyMuPDF (fitz) text extraction and heading level assignment.
- Title extraction and multilingual support.
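
To make the heading-level step concrete, here is a minimal font-size heuristic. It is only a sketch: it assumes the text has already been extracted into span dictionaries with `text` and `size` keys (the shape of the spans PyMuPDF returns from `page.get_text("dict")`), it leaves out the XGBoost classifier entirely, and the function name `assign_heading_levels` and its ranking rule are assumptions, not the rules used in Task 1a.

```python
from collections import Counter
from typing import Dict, List


def assign_heading_levels(spans: List[Dict], max_levels: int = 3) -> List[Dict]:
    """Label spans larger than the body font as H1..Hn, largest size first."""
    if not spans:
        return []
    # Treat the most frequent font size as body text.
    body_size = Counter(round(s["size"], 1) for s in spans).most_common(1)[0][0]
    # Distinct sizes above the body size, largest first, capped at max_levels.
    heading_sizes = sorted(
        {round(s["size"], 1) for s in spans if round(s["size"], 1) > body_size},
        reverse=True,
    )[:max_levels]
    level_of = {size: f"H{i + 1}" for i, size in enumerate(heading_sizes)}
    # Keep document order; drop body text and anything below the level cap.
    return [
        {"level": level_of[round(s["size"], 1)], "text": s["text"].strip()}
        for s in spans
        if round(s["size"], 1) in level_of
    ]
```

A heuristic like this is cheap but brittle on documents that mark headings with bold weight or position instead of size, which is one reason to pair heuristics with a learned classifier.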
Task 1b (Independent)
- RAG pipeline with embeddings, retrieval, and cross-encoder reranking.
- Persona-aware filtering and dynamic exclude lists.
- Standardized I/O and analysis of relevant sections.
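
The Task 1b flow described above (filter by exclude list, retrieve by embedding similarity, then rerank) can be sketched as follows. Everything here is illustrative: the vectors are supplied by the caller instead of an embedding model, `overlap_score` is a simple token-overlap stand-in for the cross-encoder, and the names `retrieve_and_rerank`, `cosine`, and the section dictionary keys are assumptions made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def retrieve_and_rerank(
    query: str,
    query_vec: Sequence[float],
    sections: List[Dict],
    exclude: Set[str],
    top_k: int = 3,
    scorer: Callable[[str, str], float] = overlap_score,
) -> List[Dict]:
    # 1. Persona-aware filter: drop sections containing an excluded term
    #    (terms are assumed to be lowercase already).
    allowed = [s for s in sections if not any(t in s["text"].lower() for t in exclude)]
    # 2. Retrieval: keep the top_k sections by embedding similarity.
    candidates = sorted(allowed, key=lambda s: cosine(query_vec, s["vec"]), reverse=True)[:top_k]
    # 3. Rerank: reorder the candidates with the slower pairwise scorer.
    return sorted(candidates, key=lambda s: scorer(query, s["text"]), reverse=True)
```

Running the cheap similarity search first and the pairwise scorer only on the shortlist is the usual reason for a two-stage design: the reranker sees `top_k` sections, not the whole collection.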