RepoSense — Automated Software Documentation via Multi-Agent LLM Pipelines

Overview

RepoSense automates the creation of software documentation by running a local Git repository's commit history through a pipeline of specialized agents. Users describe what they want in natural language; the system classifies the intent, fetches the relevant commits from a PostgreSQL store, and generates a structured document using a locally running Mistral model.

Built as part of the CS 5704 Software Engineering course at Virginia Tech, the project demonstrates a practical multi-agent architecture using LangGraph, with heuristic and LLM-based classification working in tandem.

This is a proof-of-concept. Architecture and APIs are subject to change. The production codebase lives on the final_code branch.

What it generates

Release Notes (RN)

Narrative-style release notes for a given version tag. Structured into thematic sections with highlights and engineering fixes — modeled after VS Code release notes.

Person-centric KT (PKT)

Author-scoped documentation covering a contributor's commits, areas of ownership, and knowledge — designed for team handovers.

Feature-centric KT (FKT)

Maps changes to individual product features or modules. Planned capability, in active development.

Invalid detection

Off-topic or unrecognizable prompts are rejected with a clear explanation rather than producing incorrect documentation.

System Architecture

The system is divided into three main layers: a React frontend, an Express backend that hosts the LangGraph agent graph, and a PostgreSQL database that stores pre-indexed commit data. An Ollama instance running Mistral handles on-device text generation.

High-level flow

User Input (prompt + repo path) → Agent 1: Categorizer (heuristic + LLM) → Router: Outcome (RN / PKT / Error) → Agent 2: Fetch (PostgreSQL commits) → LLM: Generate (Mistral via Ollama)

LangGraph state machine

The backend exposes a LangGraph graph defined in categorizer.graph.mjs. Each node reads from and writes to a shared state object. Routing between nodes is handled by conditional edges based on confidence scores and document type.

Graph node routing

START
  └── categorizerNode          (heuristic scoring + context validation)
        ├── confidence >= 0.85 ──► finalizeNode
        └── confidence < 0.85  ──► llmNode
                                      └── finalizeNode
                                            └── determineOutcomeNode
                                                  ├── RN + tags found      ──► rnAgentNode
                                                  ├── RN + no tags         ──► rnMissingTagsNode (error)
                                                  ├── PKT + person found   ──► pktAgentNode
                                                  ├── PKT + no person      ──► pktMissingPersonNode (error)
                                                  └── INVALID              ──► invalidPromptNode (error)
                                                                                    └── END
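The two routing decisions in the diagram can be sketched as plain functions. This is an illustrative sketch, not the contents of categorizer.graph.mjs. The names `routeAfterCategorizer` and `routeOutcome` are hypothetical. Treating "tags found" as "at least one of `from_tag` / `to_tag` is set" and "person found" as a non-empty `extracted.person` is an assumption based on the state shape. In LangGraph.js, functions like these are what `StateGraph.addConditionalEdges` receives: each returns the name of the next node.

```javascript
// Threshold taken from the routing diagram.
const CONFIDENCE_THRESHOLD = 0.85;

// After categorizerNode: skip the LLM when the heuristic is confident enough.
function routeAfterCategorizer(state) {
  return state.draft.confidence >= CONFIDENCE_THRESHOLD ? "finalizeNode" : "llmNode";
}

// After determineOutcomeNode: pick the downstream agent or an error node.
function routeOutcome(state) {
  const { doc_type, extracted = {} } = state.final;
  if (doc_type === "RN") {
    // Assumption: "tags found" means at least one tag bound was extracted.
    return extracted.from_tag || extracted.to_tag ? "rnAgentNode" : "rnMissingTagsNode";
  }
  if (doc_type === "PKT") {
    return extracted.person ? "pktAgentNode" : "pktMissingPersonNode";
  }
  // INVALID, or anything the categorizer did not recognize.
  return "invalidPromptNode";
}
```

Because the routing is a pure function of the state, the branching can be unit-tested without building the graph.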

Agent roles

Agent / Node	File	Responsibility
Prompt Categorizer	`promptCategorizer.node.js`	Scores prompt against RN and PKT keyword patterns, extracts version tags and author names via regex, combines heuristic confidence with context validation via `validateWithContext()`.
LLM Categorizer	`llmCategorizer.mjs`	Fallback when heuristic confidence is below 0.85. Calls Groq (llama3-8b) or OpenAI (gpt-4o-mini) with a structured prompt containing a snippet of repo context (8 releases, 12 authors).
Determine Outcome	`postCategorizer.node.js`	Reads the final classification and routes to the correct downstream agent or error node based on doc_type and extracted fields.
RN Agent	`postCategorizer.node.js`	Queries PostgreSQL for commits matching the requested release tags and appends the commit list to state for generation.
PKT Agent	`postCategorizer.node.js`	Queries PostgreSQL for commits by the identified author (matched by name or email) and appends results to state.
Mistral Generator	`mistralGenerate.mjs`	Takes the commit list, formats it as a changelog bullet list, and sends it to Ollama at `localhost:11434` to generate a narrative document using the Mistral model.
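A heavily simplified sketch of the heuristic pass, assuming the `context` shape from the LangGraph state. The keyword lists, the tag regex, and the scoring weights (0.3 per keyword hit capped at 0.6, plus 0.35 when a tag or author is confirmed against the repo context) are illustrative assumptions rather than the values in promptCategorizer.js. `heuristicCategorize` is a hypothetical name. The sketch shows why context validation matters: under these weights, a prompt naming a real release tag clears the 0.85 threshold, while one with a single keyword hit and a known author does not and would go to the LLM.

```javascript
// Illustrative keyword sets; the curated lists in promptCategorizer.js will differ.
const KEYWORDS = {
  RN: ["release notes", "release", "changelog", "what's new"],
  PKT: ["handover", "knowledge transfer", "contributions", "commits by"],
};

// Matches version-like tags such as v1.2, 1.4.0 or v2.0.0.
const TAG_RE = /\bv?\d+\.\d+(?:\.\d+)?\b/gi;

function heuristicCategorize(prompt, context) {
  const text = prompt.toLowerCase();
  const rnHits = KEYWORDS.RN.filter((w) => text.includes(w)).length;
  const pktHits = KEYWORDS.PKT.filter((w) => text.includes(w)).length;

  // Regex extraction, then context validation: keep only tags and authors that exist in the repo.
  const knownTags = new Set(context.RN.releases.map((r) => r.tag));
  const tags = (prompt.match(TAG_RE) || []).filter((t) => knownTags.has(t));
  const author = context.PKT.authors.find((a) => text.includes(a.name.toLowerCase()));

  // Illustrative weights: 0.3 per keyword hit (capped at 0.6), +0.35 for a validated tag or author.
  const rnScore = Math.min(rnHits * 0.3, 0.6) + (tags.length > 0 ? 0.35 : 0);
  const pktScore = Math.min(pktHits * 0.3, 0.6) + (author ? 0.35 : 0);

  if (rnScore === 0 && pktScore === 0) {
    // Nothing matched: confidence 0 sends the prompt to the LLM fallback.
    return { doc_type: "INVALID", confidence: 0, extracted: {} };
  }
  if (rnScore >= pktScore) {
    return {
      doc_type: "RN",
      confidence: rnScore,
      extracted: {
        from_tag: tags.length > 1 ? tags[0] : null,
        to_tag: tags.length > 0 ? tags[tags.length - 1] : null,
      },
    };
  }
  return { doc_type: "PKT", confidence: pktScore, extracted: { person: author ? author.name : null } };
}
```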

LangGraph state shape

// apps/backend/graph/state.mjs
{
  prompt:    string,          // user's natural language request
  repoPath:  string,          // absolute path to local git repo
  context: {                   // built from git tags + shortlog
    RN:  { releases: [{ tag, date }] },
    PKT: { authors:  [{ name, email }] },
    meta: { updatedAt, repoPath }
  },
  draft: {                     // set by categorizer node
    doc_type:   "RN" | "PKT" | "INVALID",
    confidence: 0.0–1.0,
    extracted:  { from_tag, to_tag, person, feature },
    rationale:  string,
    version:    string
  },
  final: {                     // frozen copy of draft + commits or error
    ...draft,
    commits?:      [{ commit_id, message, author_name, author_email,
                      committed_at, release_tag, code_diff }],
    notification?: { type: "error", message: string },
    nextAgent?:    "RN" | "PKT"
  }
}
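The `final` field is documented as a frozen copy of `draft` plus either commits or an error. A minimal sketch of that bookkeeping, assuming plain object spreading and `Object.freeze`. The helpers `finalizeDraft`, `withCommits`, and `withError` are hypothetical, and the actual reducers in state.mjs may be structured differently.

```javascript
// Copy the categorizer's draft so downstream nodes never mutate it.
// Note: Object.freeze is shallow, so nested objects like `extracted` stay mutable.
function finalizeDraft(draft) {
  return Object.freeze({ ...draft });
}

// RN / PKT agent path: attach the fetched commits and the next agent.
function withCommits(final, nextAgent, commits) {
  return Object.freeze({ ...final, nextAgent, commits });
}

// Error path (missing tags, missing person, invalid prompt): attach a notification.
function withError(final, message) {
  return Object.freeze({ ...final, notification: { type: "error", message } });
}
```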

Prompt classification — two-phase strategy

Phase 1 matches the prompt against curated keyword sets. If the combined confidence score (heuristic + context match) reaches 0.85, the result is finalized immediately without calling an external API. Phase 2, the LLM fallback, runs only for ambiguous inputs whose combined score stays below 0.85.

Phase	Mechanism	Trigger	Providers
Heuristic	Regex pattern scoring + `validateWithContext()`	Always runs first	Local — no API call
LLM fallback	Structured JSON prompt with repo context snippet	Heuristic confidence < 0.85	Groq (llama3-8b) or OpenAI (gpt-4o-mini)
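The two phases can be sketched as follows. The 0.85 threshold and the 8-release / 12-author context snippet come from this document. Everything else is an assumption: the prompt wording, taking the first entries of each list, the hypothetical function names, and injecting the heuristic and the LLM call as parameters so the example runs offline. In the app, `callLlm` would wrap a Groq or OpenAI chat request.

```javascript
const CONFIDENCE_THRESHOLD = 0.85;

// Builds the structured fallback prompt. The wording is illustrative, and taking
// the first 8 releases / 12 authors (rather than, say, the newest) is an assumption.
function buildFallbackPrompt(prompt, context) {
  const snippet = {
    releases: context.RN.releases.slice(0, 8).map((r) => r.tag),
    authors: context.PKT.authors.slice(0, 12).map((a) => a.name),
  };
  return [
    'Classify the request as "RN", "PKT" or "INVALID".',
    "Answer only with JSON containing doc_type, confidence, extracted and rationale.",
    `Repository context: ${JSON.stringify(snippet)}`,
    `Request: ${prompt}`,
  ].join("\n");
}

// Two-phase classification. `heuristic` and `callLlm` are injected so the sketch
// runs without network access; callLlm resolves to the model's raw JSON text.
async function classifyPrompt(prompt, context, heuristic, callLlm) {
  const draft = heuristic(prompt, context);
  if (draft.confidence >= CONFIDENCE_THRESHOLD) {
    return { ...draft, phase: "heuristic" }; // no external API call
  }
  const raw = await callLlm(buildFallbackPrompt(prompt, context));
  return { ...JSON.parse(raw), phase: "llm" };
}
```

A production version would also guard `JSON.parse`, since models occasionally return text that is not valid JSON.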

Database schema

-- PostgreSQL — commit_tracker database
CREATE TABLE public.commits (
  commit_id    VARCHAR   PRIMARY KEY,
  message      TEXT,
  author_name  VARCHAR,
  author_email VARCHAR,
  committed_at TIMESTAMP,
  release_tag  TEXT[],
  code_diff    TEXT
);
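commitDetails.mjs is described only as holding PostgreSQL query helpers, so the following is an assumption about how the RN and PKT lookups could be written against this schema. The function names are hypothetical. Each returns a node-postgres query config object (`{ text, values }`); node-postgres serializes a JavaScript array parameter into a PostgreSQL array, which suits the `release_tag TEXT[]` column.

```javascript
// Columns returned to the generator; they match the `commits` entries in the graph state.
const COMMIT_COLUMNS =
  "commit_id, message, author_name, author_email, committed_at, release_tag, code_diff";

// RN lookup: `&&` is PostgreSQL's array-overlap operator, so a row matches
// if its release_tag array shares at least one tag with the requested list.
function commitsByReleaseTags(tags) {
  return {
    text: `SELECT ${COMMIT_COLUMNS} FROM public.commits
           WHERE release_tag && $1::text[]
           ORDER BY committed_at`,
    values: [tags],
  };
}

// PKT lookup: case-insensitive match on either the author's name or email.
function commitsByAuthor(person) {
  return {
    text: `SELECT ${COMMIT_COLUMNS} FROM public.commits
           WHERE lower(author_name) = lower($1::text) OR lower(author_email) = lower($1::text)
           ORDER BY committed_at`,
    values: [person],
  };
}
```

With node-postgres this could be run as `const { rows } = await pool.query(commitsByReleaseTags(["v1.2.0"]));`, where `pool` is a `pg.Pool` pointed at the `commit_tracker` database.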

Folder structure

final_code branch

RepoSense/
  apps/
    backend/
      agents/
        contracts.js                 enum constants (doc types)
        promptCategorizer.js         heuristic scoring engine
        promptCategorizer.node.js    LangGraph node wrapper
        llmCategorizer.mjs           Groq / OpenAI fallback
        postCategorizer.node.js      RN / PKT / error nodes
        validateWithContext.js       context matching (tags, authors)
      graph/
        categorizer.graph.mjs        LangGraph state machine definition
        state.mjs                    state shape + reducers
      services/
        commitDetails.mjs            PostgreSQL query helpers
        mistralGenerate.mjs          Ollama / Mistral generation
        repoContext.mjs              git context builder (tags, authors)
      server/
        index.mjs                    Express API server
      config.mjs                     app-level defaults
    ui/
      src/
        components/
          PromptForm.jsx             repo path + prompt input
          PromptLibrary.jsx          predefined prompt templates
          ResultCard.jsx             categorizer result display
          ReleaseNotesGenerator.jsx  RN generation trigger + output
        App.jsx                      root component
      vite.config.js

API Reference

The backend runs on Express (default port 3000). All request and response bodies are JSON.

POST /categorize

Classify a prompt and build Git context. Runs the full LangGraph pipeline and returns the classified document type along with fetched commits.

Request body

repoPath — absolute path to the local Git repository
prompt — natural-language user request
refresh — (optional boolean) trigger git fetch --all --tags before building context

Response

ok — true / false
tool.observation — classification result (doc_type, confidence, extracted, rationale)
prompt_context — repo context snapshot (releases, authors)
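A client-side sketch for calling this endpoint, assuming the default port 3000 and a runtime with a global `fetch` (Node 18+ or a browser). `buildCategorizeRequest` and `categorize` are hypothetical helpers, not part of the RepoSense UI; the request and response fields follow the lists above.

```javascript
// Builds the fetch() arguments for POST /categorize.
function buildCategorizeRequest({ repoPath, prompt, refresh = false }, baseUrl = "http://localhost:3000") {
  if (!repoPath || !prompt) {
    throw new Error("repoPath and prompt are required");
  }
  return {
    url: `${baseUrl}/categorize`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ repoPath, prompt, refresh }),
    },
  };
}

// Sends the request and returns the classification result (tool.observation).
async function categorize(body) {
  const { url, init } = buildCategorizeRequest(body);
  const res = await fetch(url, init);
  const data = await res.json();
  if (!data.ok) throw new Error("categorization failed");
  return data.tool.observation;
}
```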

POST /generate-release-notes

Generate polished release notes from commits. Formats commits as a changelog list, sends to local Mistral via Ollama, and returns structured narrative text.

Request body

observation — the categorizer result object (includes commits array)
userPrompt — original user prompt for context

Response

ok — true / false
release_notes — Markdown string with thematic sections
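A sketch of the generation step, assuming mistralGenerate.mjs talks to Ollama's `/api/generate` endpoint with streaming disabled (Ollama then returns one JSON object whose `response` field holds the generated text). The bullet format, the prompt wording, and the function names `formatChangelog` and `buildOllamaRequest` are illustrative assumptions.

```javascript
// Turns commit rows into a bullet list: first line of each message, short hash, author.
function formatChangelog(commits) {
  return commits
    .map((c) => `- ${c.message.split("\n")[0]} (${c.commit_id.slice(0, 7)}, ${c.author_name})`)
    .join("\n");
}

// Builds the fetch() arguments for a non-streaming Ollama generate call.
function buildOllamaRequest(commits, userPrompt, endpoint = "http://localhost:11434", model = "mistral") {
  const prompt = [
    "Write release notes in Markdown, grouped into thematic sections with highlights and engineering fixes.",
    `User request: ${userPrompt}`,
    "Changelog:",
    formatChangelog(commits),
  ].join("\n\n");
  return {
    url: `${endpoint}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // With stream: false, Ollama replies with a single JSON object; the text is in `response`.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}
```

The request could then be sent with `const res = await fetch(url, init); const { response } = await res.json();`.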

GET /health

Health check — returns 200 OK when the server is running.

Technology Stack

Backend

Node.js (ESM), Express, LangGraph, LangChain, simple-git, pg, node-fetch, cors

Frontend

React 19, Vite 7, ESLint

AI / LLM

Mistral (Ollama), Groq API, OpenAI API

Infrastructure

PostgreSQL, Ollama (local), local Git repos

Quick Start

# clone and switch to final_code branch
git clone https://github.com/SRIKANTH284/reposense.git
cd reposense
git checkout final_code

# backend
cd apps/backend
npm install
cp .env.example .env   # set GROQ_API_KEY or OPENAI_API_KEY
node server/index.mjs

# frontend (separate terminal, from the repository root)
cd apps/ui
npm install
npm run dev

Environment Variables

LOCAL_LLM_ENDPOINT=http://localhost:11434
LOCAL_LLM_MODEL=mistral

GROQ_API_KEY=gsk_...        # required for LLM fallback (or use OpenAI)
OPENAI_API_KEY=sk-...        # alternative to Groq

# PostgreSQL (configured in commitDetails.mjs)
# host: 127.0.0.1 | port: 5432 | db: commit_tracker
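A sketch of how these settings might be read at startup. The defaults mirror the values shown above; preferring Groq when both keys are set, and the `loadConfig` name, are assumptions. The PostgreSQL block only restates the connection values listed in the comment, using the `host` / `port` / `database` keys that `pg.Pool` accepts.

```javascript
// Hypothetical config loader; pass process.env in the real server.
function loadConfig(env) {
  return {
    llmEndpoint: env.LOCAL_LLM_ENDPOINT || "http://localhost:11434",
    llmModel: env.LOCAL_LLM_MODEL || "mistral",
    // Assumption: Groq is preferred when both keys are present.
    fallbackProvider: env.GROQ_API_KEY ? "groq" : env.OPENAI_API_KEY ? "openai" : null,
    // Values from the comment above (set in commitDetails.mjs, not via env).
    postgres: { host: "127.0.0.1", port: 5432, database: "commit_tracker" },
  };
}
```

Calling `loadConfig(process.env)` with neither key set yields `fallbackProvider: null`, which would leave prompts scoring below 0.85 in the heuristic phase with no LLM to fall back on.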

Authors

Built by Srikanth Badavath and Neelesh Samptur as part of the CS 5704 Software Engineering course at Virginia Tech.

Documentation

Project Report · CS 5704 · Virginia Tech

Multi-Agent Framework for Software Documentation

Srikanth Badavath · Neelesh Samptur

Department of Computer Science, Virginia Tech, Blacksburg, USA
bsrikanth@vt.edu · nsamptur@vt.edu

Documentation is considered the single source of truth that keeps development, QA, and product teams aligned with an organization's progress and future goals. Yet developers in industry face demanding deadlines that make documentation difficult to maintain. This paper proposes a multi-agent framework that automates the generation of three types of software documentation, namely Release Notes (RN), Person-centric Knowledge Transfer (PKT), and Feature-centric Knowledge Transfer (FKT), using metadata from a Git version control system. Four specialized agents handle the full pipeline: a Prompt Categorizer Agent, an Ingestion Agent (with bot-commit filtering), a Summarizer Agent, and a Publishing Agent that commits generated documents back to the repository.

Keywords: Software Engineering · Documentation · Knowledge Transfer · Release Notes · LangGraph · Multi-Agent