Overview
RepoSense automates software documentation by passing a local Git repository's commit history through a pipeline of specialized agents. Users describe what they want in natural language; the system classifies the intent, fetches the relevant commits from a PostgreSQL store, and generates a structured document using a locally running Mistral model.
Built as part of the CS 5704 Software Engineering course at Virginia Tech, the project demonstrates a practical multi-agent architecture using LangGraph, with heuristic and LLM-based classification working in tandem.
The current code lives on the final_code branch.
What it generates
Release Notes (RN)
Narrative-style release notes for a given version tag. Structured into thematic sections with highlights and engineering fixes — modeled after VS Code release notes.
Person-centric KT (PKT)
Author-scoped documentation covering a contributor's commits, areas of ownership, and knowledge — designed for team handovers.
Feature-centric KT (FKT)
Maps changes to individual product features or modules. Planned capability, in active development.
Invalid detection
Off-topic or unrecognizable prompts are rejected with a clear explanation rather than producing incorrect documentation.
System Architecture
The system is divided into three main layers: a React frontend, an Express backend that hosts the LangGraph agent graph, and a PostgreSQL database that stores pre-indexed commit data. An Ollama instance running Mistral handles on-device text generation.
High-level flow
LangGraph state machine
The backend exposes a LangGraph graph defined in categorizer.graph.mjs. Each node reads from and writes to a shared state object. Routing between nodes is handled by conditional edges based on confidence scores and document type.
    START
    └── categorizerNode (heuristic scoring + context validation)
        ├── confidence >= 0.85 ──► finalizeNode
        └── confidence < 0.85 ──► llmNode
            └── finalizeNode
                └── determineOutcomeNode
                    ├── RN + tags found ──► rnAgentNode
                    ├── RN + no tags ──► rnMissingTagsNode (error)
                    ├── PKT + person found ──► pktAgentNode
                    ├── PKT + no person ──► pktMissingPersonNode (error)
                    └── INVALID ──► invalidPromptNode (error)
                        └── END
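The two branching points in the diagram can be sketched as plain routing functions. This is a minimal illustration, not the project's code: the node names come from the diagram and the field names from the documented state shape, while the function names and the rule that "tags found" means at least one of from_tag or to_tag is set are assumptions.

```javascript
// Threshold taken from the diagram above.
const CONFIDENCE_THRESHOLD = 0.85;

// After categorizerNode: skip the LLM when the heuristic is confident enough.
function routeAfterCategorizer(state) {
  return state.draft.confidence >= CONFIDENCE_THRESHOLD ? "finalizeNode" : "llmNode";
}

// After determineOutcomeNode: pick the downstream agent or an error node.
function routeAfterOutcome(state) {
  const { doc_type, extracted = {} } = state.final;
  if (doc_type === "RN") {
    // Assumption: one extracted tag is enough to count as "tags found".
    return extracted.from_tag || extracted.to_tag ? "rnAgentNode" : "rnMissingTagsNode";
  }
  if (doc_type === "PKT") {
    return extracted.person ? "pktAgentNode" : "pktMissingPersonNode";
  }
  // INVALID or anything unrecognized ends in the error node.
  return "invalidPromptNode";
}
```

In LangGraph, functions of this shape are the kind of callback a graph's addConditionalEdges method accepts; how categorizer.graph.mjs actually wires them is not shown here.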
Agent roles
| Agent / Node | File | Responsibility |
|---|---|---|
| Prompt Categorizer | promptCategorizer.node.js | Scores prompt against RN and PKT keyword patterns, extracts version tags and author names via regex, combines heuristic confidence with context validation via validateWithContext(). |
| LLM Categorizer | llmCategorizer.mjs | Fallback when heuristic confidence is below 0.85. Calls Groq (llama3-8b) or OpenAI (gpt-4o-mini) with a structured prompt containing a snippet of repo context (8 releases, 12 authors). |
| Determine Outcome | postCategorizer.node.js | Reads the final classification and routes to the correct downstream agent or error node based on doc_type and extracted fields. |
| RN Agent | postCategorizer.node.js | Queries PostgreSQL for commits matching the requested release tags and appends the commit list to state for generation. |
| PKT Agent | postCategorizer.node.js | Queries PostgreSQL for commits by the identified author (matched by name or email) and appends results to state. |
| Mistral Generator | mistralGenerate.mjs | Takes the commit list, formats it as a changelog bullet list, and sends it to Ollama at localhost:11434 to generate a narrative document using the Mistral model. |
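The Mistral Generator row describes two steps: formatting commits as bullets and posting them to Ollama. A minimal sketch of those steps follows. The function names, the bullet layout, and the prompt wording are hypothetical, not taken from mistralGenerate.mjs. The request uses Ollama's /api/generate endpoint and assumes a Node.js version with a global fetch (18 or later).

```javascript
// Turn commit rows into a changelog-style bullet list (layout is an assumption).
function formatChangelog(commits) {
  return commits
    .map((c) => {
      const subject = String(c.message || "").split("\n")[0].trim(); // first line only
      return `- ${subject} (${c.author_name}, ${String(c.commit_id).slice(0, 7)})`;
    })
    .join("\n");
}

// Build the JSON body for Ollama's /api/generate endpoint.
function buildOllamaBody(commits, userPrompt, model = "mistral") {
  const prompt = [
    "Write narrative release notes from the following commits.", // hypothetical wording
    `User request: ${userPrompt}`,
    "Commits:",
    formatChangelog(commits),
  ].join("\n");
  return { model, prompt, stream: false }; // stream: false asks for a single JSON reply
}

// Send the request; Ollama puts the generated text in the `response` field.
async function generateWithMistral(commits, userPrompt, endpoint = "http://localhost:11434") {
  const res = await fetch(`${endpoint}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaBody(commits, userPrompt)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response;
}
```

Keeping the formatting and body-building steps separate from the network call makes them easy to check without a running Ollama instance.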
LangGraph state shape
    // apps/backend/graph/state.mjs
    {
      prompt: string,            // user's natural language request
      repoPath: string,          // absolute path to local git repo
      context: {                 // built from git tags + shortlog
        RN: { releases: [{ tag, date }] },
        PKT: { authors: [{ name, email }] },
        meta: { updatedAt, repoPath }
      },
      draft: {                   // set by categorizer node
        doc_type: "RN" | "PKT" | "INVALID",
        confidence: 0.0–1.0,
        extracted: { from_tag, to_tag, person, feature },
        rationale: string,
        version: string
      },
      final: {                   // frozen copy of draft + commits or error
        ...draft,
        commits?: [{ commit_id, message, author_name, author_email, committed_at, release_tag, code_diff }],
        notification?: { type: "error", message: string },
        nextAgent?: "RN" | "PKT"
      }
    }
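The context field is described as built from git tags and shortlog output. The sketch below shows one way that parsing could look. It assumes the author list comes from `git shortlog -sne` (lines of the form count, tab, name, email in angle brackets) and that tags arrive as "tag|date" lines, for example from `git for-each-ref --format="%(refname:short)|%(creatordate:short)" refs/tags`. The function names are hypothetical, and repoContext.mjs may gather this data differently.

```javascript
// Parse `git shortlog -sne` output: "<count>\t<name> <email>" per line.
function parseShortlog(text) {
  const authors = [];
  for (const line of text.split("\n")) {
    const m = line.match(/^\s*(\d+)\s+(.+?)\s+<([^>]*)>\s*$/);
    if (m) authors.push({ name: m[2], email: m[3] });
  }
  return authors;
}

// Parse "tag|date" lines (the delimiter is an assumption of this sketch).
function parseTags(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => {
      const [tag, date] = line.split("|");
      return { tag, date };
    });
}

// Assemble the context object in the documented shape.
function buildContext(repoPath, tagText, shortlogText) {
  return {
    RN: { releases: parseTags(tagText) },
    PKT: { authors: parseShortlog(shortlogText) },
    meta: { updatedAt: new Date().toISOString(), repoPath },
  };
}
```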
Prompt classification — two-phase strategy
Phase 1 uses pattern matching against curated keyword sets. If the combined confidence score (heuristic + context match) reaches 0.85, the result is finalized immediately without invoking an external API. Phase 2 is only triggered for ambiguous inputs.
| Phase | Mechanism | Trigger | Providers |
|---|---|---|---|
| Heuristic | Regex pattern scoring + validateWithContext() | Always runs first | Local, no API call |
| LLM fallback | Structured JSON prompt with repo context snippet | Heuristic confidence < 0.85 | Groq (llama3-8b) or OpenAI (gpt-4o-mini) |
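To make the first phase concrete, here is a small sketch of heuristic scoring combined with a context check. The keyword patterns, the score values (0.65, 0.75, and a 0.25 context bonus), and the function name are all invented for illustration. Only the 0.85 threshold and the idea of validating extracted tags and authors against repo context come from the description above.

```javascript
const LLM_FALLBACK_THRESHOLD = 0.85;

// Hypothetical keyword patterns; the project's curated sets are not shown here.
const PATTERNS = {
  RN: [/release\s+notes?/i, /changelog/i, /what(?:'s| is) new/i],
  PKT: [/knowledge\s+transfer/i, /hand-?over/i, /contribut(?:ions?|ed)/i],
};
const TAG_RE = /\bv?\d+\.\d+(?:\.\d+)?\b/g; // matches e.g. v1.2.0 or 1.2

function classifyHeuristic(prompt, context) {
  const hits = (type) => PATTERNS[type].filter((re) => re.test(prompt)).length;
  const rn = hits("RN");
  const pkt = hits("PKT");
  if (rn === 0 && pkt === 0) {
    return { doc_type: "INVALID", confidence: 0, extracted: {}, needsLLM: true };
  }
  const doc_type = rn >= pkt ? "RN" : "PKT";
  const extracted = {};
  if (doc_type === "RN") {
    // Context validation: keep only tags that exist in the repository.
    const known = new Set(context.RN.releases.map((r) => r.tag));
    const tags = (prompt.match(TAG_RE) || []).filter((t) => known.has(t));
    if (tags.length === 1) {
      extracted.to_tag = tags[0];
    } else if (tags.length >= 2) {
      extracted.from_tag = tags[0];
      extracted.to_tag = tags[1];
    }
  } else {
    // Context validation: look for a known author name in the prompt.
    const lower = prompt.toLowerCase();
    const author = context.PKT.authors.find((a) => lower.includes(a.name.toLowerCase()));
    if (author) extracted.person = author.name;
  }
  const heuristic = Math.max(rn, pkt) >= 2 ? 0.75 : 0.65; // illustrative scores
  const contextBonus = Object.keys(extracted).length > 0 ? 0.25 : 0;
  const confidence = Math.min(1, heuristic + contextBonus);
  return { doc_type, confidence, extracted, needsLLM: confidence < LLM_FALLBACK_THRESHOLD };
}
```

With these illustrative numbers, keyword matches alone never reach the threshold. A prompt is finalized locally only when it also names a tag or author that exists in the repository, which is the case the LLM fallback does not need to handle.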
Database schema
    -- PostgreSQL: commit_tracker database
    CREATE TABLE public.commits (
      commit_id     VARCHAR PRIMARY KEY,
      message       TEXT,
      author_name   VARCHAR,
      author_email  VARCHAR,
      committed_at  TIMESTAMP,
      release_tag   TEXT[],
      code_diff     TEXT
    );
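Given this schema, the two lookups described for the RN and PKT agents could be expressed as parameterized queries. The sketch below only builds query objects in the { text, values } form that node-postgres accepts. The function names and the exact WHERE clauses (array overlap for tags, case-insensitive substring match for authors) are assumptions, not the contents of commitDetails.mjs.

```javascript
const COMMIT_COLUMNS =
  "commit_id, message, author_name, author_email, committed_at, release_tag, code_diff";

// Commits whose release_tag array shares at least one tag with the request.
// `&&` is PostgreSQL's array-overlap operator.
function commitsByTagsQuery(tags) {
  return {
    text: `SELECT ${COMMIT_COLUMNS}
           FROM public.commits
           WHERE release_tag && $1::text[]
           ORDER BY committed_at DESC`,
    values: [tags],
  };
}

// Commits by one author, matched by name or email (substring, case-insensitive).
function commitsByAuthorQuery(person) {
  return {
    text: `SELECT ${COMMIT_COLUMNS}
           FROM public.commits
           WHERE author_name ILIKE $1 OR author_email ILIKE $1
           ORDER BY committed_at DESC`,
    values: [`%${person}%`],
  };
}
```

With a pg Pool instance (here called pool, a hypothetical name), a call would look like `const { rows } = await pool.query(commitsByTagsQuery(["v1.2.0"]))`. Passing values as parameters rather than interpolating them keeps user-supplied tags and names out of the SQL text.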
Folder structure
    RepoSense/
      apps/
        backend/
          agents/
            contracts.js                  enum constants (doc types)
            promptCategorizer.js          heuristic scoring engine
            promptCategorizer.node.js     LangGraph node wrapper
            llmCategorizer.mjs            Groq / OpenAI fallback
            postCategorizer.node.js       RN / PKT / error nodes
            validateWithContext.js        context matching (tags, authors)
          graph/
            categorizer.graph.mjs         LangGraph state machine definition
            state.mjs                     state shape + reducers
          services/
            commitDetails.mjs             PostgreSQL query helpers
            mistralGenerate.mjs           Ollama / Mistral generation
            repoContext.mjs               git context builder (tags, authors)
          server/
            index.mjs                     Express API server
            config.mjs                    app-level defaults
        ui/
          src/
            components/
              PromptForm.jsx              repo path + prompt input
              PromptLibrary.jsx           predefined prompt templates
              ResultCard.jsx              categorizer result display
              ReleaseNotesGenerator.jsx   RN generation trigger + output
            App.jsx                       root component
          vite.config.js
API Reference
The backend runs on Express (default port 3000). All request and response bodies are JSON.
Classify a prompt and build Git context. Runs the full LangGraph pipeline and returns the classified document type along with fetched commits.
Request body
- repoPath: absolute path to the local Git repository
- prompt: natural-language user request
- refresh: (optional boolean) trigger git fetch --all --tags before building context
Response
- ok: true / false
- tool.observation: classification result (doc_type, confidence, extracted, rationale)
- prompt_context: repo context snapshot (releases, authors)
Generate polished release notes from commits. Formats commits as a changelog list, sends to local Mistral via Ollama, and returns structured narrative text.
Request body
- observation: the categorizer result object (includes commits array)
- userPrompt: original user prompt for context
Response
- ok: true / false
- release_notes: Markdown string with thematic sections
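The two endpoints are meant to be chained: the observation returned by the classification call becomes the input to the release-notes call. A minimal sketch of that hand-off follows. The helper name and its validation rules are hypothetical, and since the endpoint paths are not listed here, the sketch stops at building the request body.

```javascript
// Build the release-notes request body from the classification response.
function buildGenerateRequest(classifyResponse, userPrompt) {
  if (!classifyResponse || classifyResponse.ok !== true) {
    throw new Error("classification failed; nothing to generate from");
  }
  const observation = classifyResponse.tool && classifyResponse.tool.observation;
  if (!observation) {
    throw new Error("response has no tool.observation");
  }
  // The generation step expects the fetched commits to be present.
  if (!Array.isArray(observation.commits)) {
    throw new Error("observation has no commits array");
  }
  return { observation, userPrompt };
}
```

The returned object can be serialized with JSON.stringify and sent as the body of the release-notes request.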
Health check — returns 200 OK when the server is running.
Technology Stack
Backend
Node.js (ESM), Express, LangGraph, LangChain, simple-git, pg, node-fetch, cors
Frontend
React 19, Vite 7, ESLint
AI / LLM
Mistral (Ollama), Groq API, OpenAI API
Infrastructure
PostgreSQL, Ollama (local), Local Git repos
Quick Start
    # clone and switch to the final_code branch
    git clone https://github.com/SRIKANTH284/reposense.git
    cd reposense
    git checkout final_code

    # backend
    cd apps/backend
    npm install
    cp .env.example .env   # set GROQ_API_KEY or OPENAI_API_KEY
    node server/index.mjs

    # frontend (separate terminal, from the repository root)
    cd apps/ui
    npm install
    npm run dev
Environment Variables
    LOCAL_LLM_ENDPOINT=http://localhost:11434
    LOCAL_LLM_MODEL=mistral
    GROQ_API_KEY=gsk_...      # required for LLM fallback (or use OpenAI)
    OPENAI_API_KEY=sk-...     # alternative to Groq

    # PostgreSQL (configured in commitDetails.mjs)
    # host: 127.0.0.1 | port: 5432 | db: commit_tracker
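A small loader shows how these variables might be read with sensible defaults. This is a sketch only: the function name, the returned field names, and the rule that Groq is preferred when both keys are set are assumptions, not the behavior of config.mjs.

```javascript
// Read settings from an environment-like object (pass process.env in practice).
function loadConfig(env) {
  // Assumption: prefer Groq when both keys are present.
  const fallbackProvider = env.GROQ_API_KEY ? "groq" : env.OPENAI_API_KEY ? "openai" : null;
  return {
    localLlmEndpoint: env.LOCAL_LLM_ENDPOINT || "http://localhost:11434",
    localLlmModel: env.LOCAL_LLM_MODEL || "mistral",
    fallbackProvider, // null means the LLM fallback phase cannot run
  };
}
```

Calling loadConfig(process.env) at startup and logging a warning when fallbackProvider is null would surface a missing API key before the first ambiguous prompt arrives.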
Authors
Built by Srikanth Badavath and Neelesh Samptur as part of the CS 5704 Software Engineering course at Virginia Tech.
Documentation
Multi-Agent Framework for Software Documentation
bsrikanth@vt.edu · nsamptur@vt.edu
Documentation is considered the single source of truth that keeps development, QA, and product teams aligned with an organization's progress and future goals. Developers in the industry face demanding deadlines, making it difficult to maintain documentation. This paper proposes a multi-agent framework that automates the generation of three types of software documentation — Release Notes (RN), Person-centric Knowledge Transfer (PKT), and Feature-centric Knowledge Transfer (FKT) — using metadata from a Git version control system. Four specialized agents handle the full pipeline: a Prompt Categorizer Agent, an Ingestion Agent (with bot-commit filtering), a Summarizer Agent, and a Publishing Agent that commits generated documents back to the repository.