From Vector Databases to Neo4j Knowledge Graphs: How Sri Bhaanu Gundu Built a RAG System That Hit 90% Retrieval Accuracy for Construction AI


Mar 25, 2026

At a mid-size construction firm in the Pacific Northwest, the data sat everywhere. Specifications in PDFs. Part numbers in SQL databases. Safety documents in SharePoint. Supplier records in spreadsheets updated by three different teams who never talked to each other.

I tried the obvious approach first: vector databases. Pinecone, FAISS, Chroma — the tools every RAG tutorial recommends. They worked for generic question-answering. They failed spectacularly for construction.

The problem wasn't the embeddings. The problem was that construction data has relationships — a part number connects to a specification, which connects to a supplier, which connects to a project phase, which connects to a compliance requirement. Vector similarity search treats every chunk as an island. In construction, nothing is an island.

So I built something different: a Neo4j knowledge graph with construction-specific entity embeddings, integrated with LangChain and PageRank algorithms for data prioritization. The result was 90%+ retrieval accuracy and a 50% reduction in reliance on external vector databases.

This is how I built it, what failed along the way, and why knowledge graph RAG is the architecture that finally made construction AI usable.

Expert Insight by

Sri Bhaanu Gundu

AI Engineer — Generative AI & Multi-Agent Systems

AI / Construction Technology / Enterprise AI

Sri Bhaanu Gundu is an AI development specialist with 8+ years of experience spanning construction technology, automotive, healthcare, and workforce analytics. At a mid-size construction firm, he architected a Neo4j knowledge graph RAG system for construction data that achieved 90%+ retrieval accuracy and reduced reliance on external vector databases by 50%. He built multi-agent AI systems using CrewAI, LangGraph, and Amazon Bedrock Agents, and designed part-number validation pipelines that improved retrieval accuracy by 75%. Currently an AI Engineer at a Fortune 500 consumer products company, he builds LLM-powered conversational AI and invoice detection pipelines on Google Cloud. He holds a Master's in Data Science from the University of Maryland, Baltimore County (GPA 3.8/4.0).

Quick Answers (TL;DR)

What is knowledge graph RAG?

Knowledge graph RAG combines a graph database (like Neo4j) with retrieval-augmented generation to preserve entity relationships during retrieval. Instead of treating documents as isolated text chunks, it maps entities — parts, specifications, suppliers, projects — as nodes and their relationships as edges, enabling the LLM to traverse connected data and return contextually accurate answers.

Why use a knowledge graph instead of a vector database for RAG?

Vector databases excel at semantic similarity search but lose relational context. In domain-specific environments like construction, a part number connects to specifications, suppliers, project phases, and compliance requirements. A knowledge graph preserves these connections. Switching from Pinecone/FAISS/Chroma to Neo4j improved retrieval accuracy from below 70% to above 90% while reducing vector database costs by 50%.

Which multi-agent framework is best for RAG: CrewAI, LangGraph, or Bedrock?

After benchmarking all three in production, CrewAI delivered the best balance of performance, cost, and development speed. LangGraph offers finer control over agent orchestration but requires more engineering effort. Amazon Bedrock Agents integrates natively with AWS services but limits model flexibility. For cost-sensitive RAG with GPT-3.5, CrewAI with SynonymRetriever and VectorContextRetriever matched GPT-4 quality at a fraction of the cost.

What Is Knowledge Graph RAG?


Most RAG tutorials teach the same pipeline: chunk your documents, embed them with OpenAI, store them in a vector database, retrieve by cosine similarity. For generic Q&A over a handful of PDFs, that works fine.

For construction data — where a single part number connects to a specification, a supplier, a project phase, a compliance document, and three different pricing tiers — it falls apart.

  • 90%+ retrieval accuracy with Neo4j knowledge graph RAG in production (internal benchmarks)
  • 50% reduction in reliance on external vector databases (internal metrics)
  • 75% improvement in part number retrieval accuracy with multi-LLM agents (internal benchmarks)
Knowledge Graph RAG

Knowledge graph RAG is a retrieval-augmented generation architecture that uses a graph database to store and traverse entity relationships during retrieval, rather than relying solely on vector similarity search. Entities (such as parts, specifications, suppliers, and projects) are stored as nodes, and their relationships are stored as edges. During retrieval, the system traverses the graph to find contextually connected information — enabling more accurate and relationship-aware answers than vector-only RAG.

The fundamental difference: vector databases answer "what text is most similar to this query?" Knowledge graphs answer "what entities are connected to this query, and how?"

In construction AI, the second question is almost always the one that matters.

Key Takeaway

Knowledge graph RAG preserves entity relationships that vector similarity search discards. For domain-specific data with complex entity connections — construction, healthcare, supply chain — graph-based retrieval consistently outperforms vector-only approaches in both accuracy and contextual relevance.

Why Vector Databases Alone Failed for Construction Data


When I started the AI chatbot project at a mid-size construction company in 2023, vector databases were the default choice. I tested Pinecone, FAISS, and Chroma against our construction datasets.

The results were serviceable for broad questions: "What are the safety requirements for electrical work?" returned reasonable chunks. But the moment a query required relational context — "Which supplier provides Part X for Project Y at the best price under specification Z?" — the system collapsed.

The Three Failure Modes

  1. Orphaned context. A vector database retrieves the most semantically similar chunks. But a part number's embedding is almost identical across dozens of documents, so the system returned the right part number from the wrong project, the wrong supplier, or the wrong specification revision.
  2. Missing relationships. "Show me all compliance documents for Supplier A on Project B" requires traversing a relationship chain: Supplier → Parts → Specifications → Compliance Documents → Project. Vector search has no concept of this chain. It retrieves documents that mention the supplier and documents that mention the project — but not the ones that connect them.
  3. Cost at scale. Running Pinecone and Chroma for a construction company with thousands of specifications, part catalogs, and project documents meant significant hosting and API costs, and accuracy did not keep pace with that spend.
Where Vector-Only RAG Breaks Down
  • Semantically similar chunks from different projects get mixed — the system returns the right part number from the wrong context
  • Multi-hop queries (Supplier → Part → Specification → Project) require relationship traversal that vector similarity cannot perform
  • Embedding construction jargon creates false similarity matches — 'steel beam grade A36' and 'steel reinforcement grade 60' appear similar to embeddings but are completely different materials
  • Cost scales linearly with document volume without proportional accuracy gains

The realization: vector databases are retrieval tools, not knowledge systems. Construction AI needed a knowledge system.

Key Takeaway

Vector databases retrieve by semantic similarity, which works for single-hop, context-free queries. For domain-specific data with multi-hop entity relationships — parts connected to suppliers connected to projects connected to specifications — graph databases are architecturally superior because they preserve the relational structure that vector search discards.

But identifying the problem is the easy part. Building the replacement was harder — and required rethinking the entire data architecture.

Building the Neo4j Knowledge Graph


I architected an on-premise Neo4j graph database optimized specifically for construction entity embeddings. The decision to go on-premise was driven by two factors: cost control (no per-query pricing from a hosted vector service) and data sensitivity (construction project data often includes proprietary specifications and pricing).

Entity Design

The graph schema mapped construction data into five core node types:

  • Parts — individual components with part numbers, specifications, and pricing
  • Suppliers — vendors linked to parts through supply agreements
  • Projects — construction projects with phases, timelines, and compliance requirements
  • Specifications — technical documents governing material and installation standards
  • Compliance Documents — safety and regulatory records tied to projects and suppliers
Relationships between nodes encoded the connections that vector search missed: SUPPLIES, SPECIFIED_IN, REQUIRED_FOR, COMPLIANT_WITH, PHASE_OF.
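The schema above can be sketched in miniature with plain Python standing in for Neo4j (in production this would be Cypher queries over the graph database). The node ids, the part number, and the compliance query are illustrative, but the shape — typed nodes, typed edges, and a multi-hop lookup that vector search cannot express — is the point:

```python
# A minimal in-memory stand-in for the Neo4j schema described above.
# All ids and attribute values are illustrative.

NODES = {
    "supplier_a":   {"type": "Supplier"},
    "part_x":       {"type": "Part", "part_number": "PX-2041"},
    "spec_z":       {"type": "Specification"},
    "project_b":    {"type": "Project"},
    "compliance_7": {"type": "ComplianceDocument"},
}

# Typed edges mirroring SUPPLIES, SPECIFIED_IN, REQUIRED_FOR, COMPLIANT_WITH
EDGES = [
    ("supplier_a",   "SUPPLIES",       "part_x"),
    ("part_x",       "SPECIFIED_IN",   "spec_z"),
    ("part_x",       "REQUIRED_FOR",   "project_b"),
    ("compliance_7", "COMPLIANT_WITH", "project_b"),
    ("compliance_7", "COMPLIANT_WITH", "supplier_a"),
]

def targets(source: str, rel: str) -> set[str]:
    """All nodes reachable from `source` over relationship `rel`."""
    return {t for s, r, t in EDGES if s == source and r == rel}

def compliance_docs_for(supplier: str, project: str) -> list[str]:
    """Multi-hop query: compliance documents tied to BOTH a supplier and a project."""
    return [
        n for n, attrs in NODES.items()
        if attrs["type"] == "ComplianceDocument"
        and supplier in targets(n, "COMPLIANT_WITH")
        and project in targets(n, "COMPLIANT_WITH")
    ]

print(compliance_docs_for("supplier_a", "project_b"))  # ['compliance_7']
```

The query succeeds only because the edges are typed: a document that merely mentions both names, without COMPLIANT_WITH edges to both nodes, would not match.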

Construction-Specific Embeddings

Generic embeddings from OpenAI or Hugging Face models don't understand construction terminology well. "NEC Class 1 Division 2" and "NFPA 70E compliance" carry specific meaning that general-purpose models flatten.

I generated domain-specific entity embeddings trained on our construction corpus and stored them directly in Neo4j. Each node carried both its graph relationships and its semantic embedding — enabling hybrid queries that combined graph traversal with vector similarity.

The breakthrough was not choosing between graph and vector retrieval — it was combining them. A graph query narrows the search to the right context (correct project, correct supplier). A vector query within that context finds the most relevant document. Together, retrieval accuracy exceeded 90%.

Sri Bhaanu Gundu, AI Development Specialist

The result: a 50% reduction in reliance on external vector databases like Pinecone, FAISS, and Chroma, with retrieval accuracy exceeding 90% — up from below 70% with vector-only approaches.
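The two-stage hybrid flow — graph traversal to narrow context, then vector similarity within it — can be sketched as follows. The document ids, project links, and toy 3-d embeddings are illustrative, not the production data:

```python
# Hybrid retrieval sketch: a graph filter narrows candidates to the right
# project, then cosine similarity ranks within that narrowed set.
import math

# doc_id -> (project the doc is linked to in the graph, toy embedding)
DOCS = {
    "spec_rev_a":  ("project_b", [0.90, 0.10, 0.0]),
    "spec_rev_b":  ("project_c", [0.95, 0.05, 0.0]),  # most similar text, wrong project
    "safety_memo": ("project_b", [0.10, 0.80, 0.1]),
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_retrieve(query_vec: list[float], project_id: str, k: int = 1) -> list[str]:
    # Step 1: graph traversal narrows context to docs linked to the target project.
    candidates = {d: emb for d, (proj, emb) in DOCS.items() if proj == project_id}
    # Step 2: vector similarity ranks only within the narrowed candidate set.
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, candidates[d]), reverse=True)
    return ranked[:k]

# Vector-only search would surface spec_rev_b (highest similarity, wrong project);
# the graph filter keeps retrieval inside project_b.
print(hybrid_retrieve([1.0, 0.0, 0.0], "project_b"))  # ['spec_rev_a']
```

This is exactly the orphaned-context failure mode from earlier: the globally most similar chunk belongs to the wrong project, and only the graph filter prevents it from being returned.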

Key Takeaway

Hybrid knowledge graph RAG combines graph traversal for relational context with vector similarity for semantic matching. Graph queries narrow retrieval to the correct entity context (project, supplier, specification). Vector queries within that narrowed context find the most relevant documents. This hybrid approach achieved 90%+ retrieval accuracy in production while cutting external vector database costs by 50%.

LangChain, PageRank, and Property Graphs


Once the Neo4j knowledge graph was populated, the challenge shifted to query optimization: how do you prioritize which nodes and relationships to traverse when a user asks a complex, multi-entity question?

I integrated LangChain and PageRank algorithms with LlamaIndex property graphs for data prioritization. The insight: not all nodes in a construction knowledge graph are equally important. A specification referenced by 40 projects carries more authority than one referenced by 2.

Why PageRank Worked for Construction Data

PageRank, originally designed to rank web pages by link authority, maps naturally to construction knowledge graphs. A supplier node with edges to many high-priority project nodes has high authority. A specification node referenced across multiple compliance documents is more relevant than an orphaned spec.

I applied PageRank scoring to prioritize retrieval results, so the system surfaced the most authoritative and well-connected entities first — not just the most semantically similar ones.
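A toy power-iteration PageRank makes the authority effect concrete. The node names are illustrative: several projects link to one specification, one project links to another, and the heavily referenced spec ends up ranked first:

```python
# Minimal power-iteration PageRank over spec/project links, showing how a
# heavily-referenced specification outranks an orphaned one. Names illustrative.

LINKS = {  # node -> nodes it links to
    "project_1": ["spec_popular"],
    "project_2": ["spec_popular"],
    "project_3": ["spec_popular"],
    "project_4": ["spec_niche"],
    "spec_popular": [],
    "spec_niche": [],
}

def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
            else:  # dangling node: distribute its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

ranks = pagerank(LINKS)
print(max(ranks, key=ranks.get))  # spec_popular
```

In the retrieval layer, these scores become a tiebreaker: among candidates surviving the graph filter, the most authoritative nodes are surfaced first.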

Outperforming spaCy and Amazon Comprehend

Before building the graph-based pipeline, I benchmarked NLP-based entity extraction using spaCy and Amazon Comprehend. Both tools performed adequately for generic named entity recognition but struggled with construction-specific entities: part numbers with mixed alphanumeric formats, specification references embedded in technical prose, and supplier names that overlapped with common words.

The LangChain + PageRank approach on property graphs surpassed both spaCy and Amazon Comprehend for information retrieval efficiency — because it was not just extracting entities, it was ranking them by relational importance within the graph.

| Approach | Entity Extraction | Relationship Awareness | Domain Accuracy | Used At |
| --- | --- | --- | --- | --- |
| spaCy NER | Good for general entities, weak on construction-specific formats | None — extracts entities without relationships | Below 60% for construction part numbers | Initial benchmarks |
| Amazon Comprehend | Better coverage, still generic | Limited — detects entities, not connections | 65-70% for construction data | Initial benchmarks |
| LangChain + PageRank on Neo4j | Domain-specific embeddings handle construction terminology | Full graph traversal with authority ranking | 90%+ with hybrid graph-vector retrieval | Production system |
Key Takeaway

PageRank on a knowledge graph ranks entities by relational authority — how many high-priority nodes connect to them — rather than just semantic similarity. For construction data, this meant specifications referenced across 40 projects surfaced before orphaned documents, and the LangChain + PageRank pipeline outperformed spaCy and Amazon Comprehend for domain-specific information retrieval.

Multi-Agent Architecture: CrewAI vs LangGraph vs Bedrock


The knowledge graph handled retrieval. But construction queries often require multiple steps: identify the relevant project, find the associated suppliers, cross-reference specifications, and validate compliance — all in one user question.

A single LLM call cannot orchestrate that. I needed a multi-agent system where specialized agents handle different aspects of the query pipeline.

I evaluated three frameworks: CrewAI, LangGraph, and Amazon Bedrock Agents.

The Benchmark

I ran each framework against the same construction dataset with identical queries, measuring retrieval accuracy, latency, cost per query, and development time.

| Factor | CrewAI | LangGraph | Amazon Bedrock Agents |
| --- | --- | --- | --- |
| Development speed | Fastest — high-level agent abstractions, less boilerplate | Moderate — requires explicit state graph definition | Slowest — AWS service integration adds configuration overhead |
| Orchestration control | Good for sequential and hierarchical agent flows | Best — fine-grained control over agent state transitions and branching | Limited — constrained to AWS-defined agent patterns |
| Model flexibility | Any LLM (GPT-3.5, GPT-4, Gemini, LLaMA, Groq) | Any LLM via LangChain integrations | Primarily AWS Bedrock models (Claude, Titan) — external models require extra setup |
| Cost efficiency | Lowest — lightweight framework with minimal infrastructure overhead | Moderate — framework is free but complex orchestration increases compute | Highest — AWS service costs compound (Lambda, SageMaker, Bedrock API) |
| RAG integration | Strong with SynonymRetriever and VectorContextRetriever | Strong via LangChain native retriever ecosystem | Native with Bedrock Knowledge Bases — limited customization |
| Production deployment | FastAPI + Docker — lightweight and portable | FastAPI + Docker — similar deployment model | AWS-native — Lambda, ECR, SageMaker — vendor-locked |

Why CrewAI Won

After comparing performance and computation across all three frameworks, I selected CrewAI for the production multi-agent system. The deciding factors: faster development cycles, lower infrastructure costs, and full model flexibility — I could switch between GPT-3.5, GPT-4, GPT-4 Mini, Gemini, and Meta LLaMA models without architectural changes.

LangGraph is the better choice when you need precise control over complex state machines with branching logic. Amazon Bedrock Agents makes sense when your entire infrastructure is already AWS-native and you need minimal custom development.

For a construction AI system that needed to be cost-efficient, model-flexible, and deployed quickly through FastAPI — CrewAI was the right tool.

Key Takeaway

CrewAI delivered the best balance of development speed, cost efficiency, and model flexibility for production multi-agent RAG. LangGraph excels at complex state machine orchestration but requires more engineering. Amazon Bedrock Agents suits AWS-native environments but limits model choice and adds vendor lock-in. The right framework depends on cost sensitivity, deployment model, and how much orchestration control the use case demands.

Part Number Validation: 75% Accuracy Improvement


One of the highest-impact applications of the multi-agent system was automated part number extraction and validation — a task that previously required manual cross-referencing across multiple databases.

I designed a multi-LLM agent pipeline using Tavily Agent and Serper API to extract part numbers from construction documents, validate them against company databases, and verify specifications. The system ran on AWS EC2, Lambda, and SageMaker, with the multi-agent code deployed through FastAPI.

How the Pipeline Worked

  1. Extraction agent — Parsed construction documents (PDFs, specifications, purchase orders) using PyMuPDF and Amazon Textract to identify part numbers
  2. Validation agent — Cross-referenced extracted part numbers against the company's SQL Server database and AWS S3 document stores
  3. Verification agent — Used Tavily and Serper API to check part numbers against supplier catalogs and external databases
  4. Reconciliation agent — Flagged mismatches and generated correction reports

The result: 75% improvement in retrieval accuracy and computation efficiency compared to the previous manual and single-model approaches.

The Multi-Agent Advantage for Validation

Single-model RAG treats extraction and validation as one task. Multi-agent architectures let each agent specialize — one extracts, one validates, one verifies, one reconciles. Specialization improves accuracy at each step, and the pipeline catches errors that a single model misses.

Key Takeaway

Multi-LLM agent pipelines outperform single-model approaches for complex validation tasks. A four-agent pipeline (extract → validate → verify → reconcile) using Tavily, Serper API, and Amazon Textract achieved a 75% improvement in part-number retrieval accuracy while reducing manual cross-referencing effort.

Cost Optimization: GPT-3.5 vs GPT-4 in Production


GPT-4 delivers better reasoning. GPT-3.5 is roughly 10-20x cheaper per token. For a construction AI system processing thousands of queries daily, that cost difference is the difference between a viable product and a budget overrun.

I built the multi-agent framework with RAG, Google APIs, and custom retriever models — SynonymRetriever and VectorContextRetriever — specifically to enable cost-efficient retrieval with GPT-3.5 that matched GPT-4 quality.

How SynonymRetriever and VectorContextRetriever Closed the Gap

The quality gap between GPT-3.5 and GPT-4 is primarily in reasoning over ambiguous or poorly-retrieved context. If the retrieval layer provides precise, well-structured context, GPT-3.5's reasoning is sufficient for most construction queries.

SynonymRetriever expanded queries with domain-specific synonyms before retrieval. "Steel beam" also searched for "W-shape," "I-beam," and "wide flange" — terms a construction professional uses interchangeably but that have different embeddings.
VectorContextRetriever retrieved not just the top-k similar chunks, but also the surrounding context from the knowledge graph — related specifications, supplier information, and project constraints.

Together, these retrievers provided GPT-3.5 with context so precise that it matched GPT-4's answer quality for 85%+ of construction queries — at a fraction of the cost.
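A toy sketch of the query-expansion idea behind SynonymRetriever: the synonym table, documents, and substring scoring are illustrative (SynonymRetriever is the article's custom retriever, not a public library API):

```python
# Query expansion with domain synonyms: interchangeable trade terms that have
# different embeddings all get searched. Synonym table and docs are illustrative.

SYNONYMS = {
    "steel beam": ["w-shape", "i-beam", "wide flange"],
}

DOCS = {
    "doc_1": "W-shape members sized per AISC tables",
    "doc_2": "Concrete pour schedule for phase 2",
}

def expand(query: str) -> list[str]:
    """Add domain synonyms so every interchangeable trade term is searched."""
    return [query] + SYNONYMS.get(query.lower(), [])

def retrieve(query: str) -> list[str]:
    """Match any expanded term against the corpus (substring match as a toy scorer)."""
    terms = [t.lower() for t in expand(query)]
    return [d for d, text in DOCS.items()
            if any(t in text.lower() for t in terms)]

# A plain match on "steel beam" finds nothing; the expanded query hits doc_1.
print(retrieve("steel beam"))  # ['doc_1']
```

VectorContextRetriever would then widen each hit with its graph neighborhood (related specs, supplier records, project constraints) before the context is handed to GPT-3.5.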

Key Takeaway

The quality gap between GPT-3.5 and GPT-4 shrinks dramatically when the retrieval layer is precise. Domain-specific retrievers (SynonymRetriever for query expansion, VectorContextRetriever for graph-augmented context) enabled GPT-3.5 to match GPT-4 quality on 85%+ of construction queries — reducing per-query costs by 10-20x without sacrificing accuracy.

From Construction to Consumer AI


In April 2025, I moved to a Fortune 500 consumer products company as an AI Engineer. The domain changed — from construction documents to product recommendations. The architecture patterns transferred directly.

At the new company, I design and deploy LLM-powered conversational AI systems using Google Gemini models for customer-facing chatbot experiences. I build multi-agent AI workflows using the Sierra Agents framework for task coordination and domain-specific decision making. I also developed an end-to-end invoice detection and information extraction pipeline deployed via Google Cloud Run.

The knowledge graph principles from the construction role apply here too: products connect to categories, categories connect to growing conditions, growing conditions connect to geographic zones. A flat vector search might recommend the right product for the wrong climate. A graph-aware system gets it right.

The multi-agent architecture also carried over. Autonomous agents handle different aspects of customer interaction — product lookup, recommendation reasoning, and order context — coordinated through the Sierra Agents framework on Google Cloud.

Why Knowledge Graph Patterns Transfer Across Industries

Any domain with structured entity relationships — products to categories, parts to suppliers, patients to treatments — benefits from graph-aware retrieval over flat vector search. The specific entities change. The architecture pattern does not.

Key Takeaway

Knowledge graph RAG is not construction-specific — it is a general architecture for any domain with structured entity relationships. The same graph traversal + vector hybrid pattern that achieved 90%+ accuracy for construction data applies to product recommendations, healthcare records, and any system where entities connect through typed relationships.

Key Takeaways: Knowledge Graph RAG for Domain-Specific AI
  1. Knowledge graph RAG combines graph database traversal (for relational context) with vector similarity search (for semantic matching) — outperforming vector-only RAG for domain-specific data with complex entity relationships
  2. Replacing Pinecone, FAISS, and Chroma with a Neo4j knowledge graph reduced external vector database reliance by 50% and pushed retrieval accuracy above 90%
  3. Construction-specific entity embeddings stored in Neo4j enabled hybrid queries: graph traversal narrows context, vector search finds the most relevant document within that context
  4. LangChain + PageRank algorithms on property graphs outperformed spaCy and Amazon Comprehend for domain-specific information retrieval by ranking entities by relational authority
  5. CrewAI won the multi-agent framework comparison against LangGraph and Amazon Bedrock Agents for production RAG — best balance of development speed, cost, and model flexibility
  6. A four-agent pipeline (extract → validate → verify → reconcile) improved part-number retrieval accuracy by 75% using Tavily, Serper API, and Amazon Textract
  7. SynonymRetriever and VectorContextRetriever enabled GPT-3.5 to match GPT-4 answer quality on 85%+ of queries, reducing per-query costs by 10-20x
  8. Knowledge graph RAG patterns transfer across industries — the same architecture that worked for construction data applies to consumer product recommendations and any domain with structured entity relationships
FAQ

What is knowledge graph RAG?

Knowledge graph RAG is a retrieval-augmented generation architecture that stores entities and their relationships in a graph database (such as Neo4j) rather than relying solely on vector embeddings. During retrieval, the system traverses graph relationships to find contextually connected information, then uses vector similarity within the narrowed context for final document selection. This hybrid approach preserves relational structure that vector-only RAG discards.

When should I use a knowledge graph instead of a vector database for RAG?

Use a knowledge graph when your data has meaningful entity relationships that affect retrieval quality — parts connected to suppliers, patients connected to treatments, products connected to categories. If your queries require multi-hop reasoning (find all compliance documents for Supplier A on Project B), graph databases outperform vector search. If your queries are single-hop semantic search over unstructured text, vector databases are sufficient and simpler to implement.

Is Neo4j the best graph database for RAG applications?

Neo4j is the most mature and widely-adopted graph database for knowledge graph RAG, with native support for vector indexes (since version 5.11), Cypher query language, and integrations with LangChain and LlamaIndex. Alternatives include Amazon Neptune (AWS-native), ArangoDB (multi-model), and TigerGraph (distributed). Neo4j is the strongest choice for teams that need a proven ecosystem, extensive documentation, and hybrid graph-vector query capabilities.

How does PageRank improve RAG retrieval?

PageRank assigns authority scores to nodes based on how many high-quality connections they have. In a knowledge graph RAG system, this means frequently-referenced specifications, well-connected suppliers, and heavily-linked project documents surface first in retrieval results — not just semantically similar ones. PageRank shifts retrieval from similarity-based to authority-based ranking within the graph context.

Which multi-agent framework is best for RAG: CrewAI, LangGraph, or Amazon Bedrock?

CrewAI is best for rapid development with cost-efficient, model-flexible multi-agent RAG — it supports any LLM and deploys easily via FastAPI. LangGraph is best when you need fine-grained control over agent state transitions and complex branching logic. Amazon Bedrock Agents is best for AWS-native environments with minimal custom development needs. The choice depends on cost sensitivity, required orchestration complexity, and deployment infrastructure.

Can GPT-3.5 replace GPT-4 in production RAG systems?

Yes, with the right retrieval layer. The quality gap between GPT-3.5 and GPT-4 is primarily in reasoning over ambiguous context. Domain-specific retrievers like SynonymRetriever (query expansion with domain synonyms) and VectorContextRetriever (graph-augmented context retrieval) provide GPT-3.5 with precise, well-structured context that reduces ambiguity. In production, this approach enabled GPT-3.5 to match GPT-4 quality on 85%+ of construction queries at 10-20x lower per-query cost.

How do you build construction-specific entity embeddings?

Start by collecting a domain corpus — construction specifications, part catalogs, supplier documents, and project records. Fine-tune or train embeddings on this corpus so the model learns that 'W-shape,' 'I-beam,' and 'wide flange' are related terms, and that 'NEC Class 1 Division 2' is a specific electrical classification. Store these embeddings directly in Neo4j nodes alongside graph relationships, enabling hybrid queries that combine graph traversal with domain-aware vector similarity.

Sources
  1. Lewis, P., Perez, E., Piktus, A., et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Meta AI Research (2020)
  2. Edge, D., Trinh, H., Cheng, N., et al. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." Microsoft Research (2024)
  3. Neo4j, Inc. "Neo4j Graph Database Documentation: Vector Search Index"
  4. LangChain, Inc. "LangChain Documentation: Graph-Based Retrieval"