At a mid-size construction firm in the Pacific Northwest, the data sat everywhere. Specifications in PDFs. Part numbers in SQL databases. Safety documents in SharePoint. Supplier records in spreadsheets updated by three different teams who never talked to each other.
I tried the obvious approach first: vector databases. Pinecone, FAISS, Chroma — the tools every RAG tutorial recommends. They worked for generic question-answering. They failed spectacularly for construction.
The problem wasn't the embeddings. The problem was that construction data has relationships — a part number connects to a specification, which connects to a supplier, which connects to a project phase, which connects to a compliance requirement. Vector similarity search treats every chunk as an island. In construction, nothing is an island.
So I built something different: a Neo4j knowledge graph with construction-specific entity embeddings, integrated with LangChain and PageRank algorithms for data prioritization. The result was 90%+ retrieval accuracy and a 50% reduction in reliance on external vector databases.
This is how I built it, what failed along the way, and why knowledge graph RAG is the architecture that finally made construction AI usable.
Sri Bhaanu Gundu
AI Engineer — Generative AI & Multi-Agent Systems
Sri Bhaanu Gundu is an AI development specialist with 8+ years of experience spanning construction technology, automotive, healthcare, and workforce analytics. At a mid-size construction firm, he architected a Neo4j knowledge graph RAG system for construction data that achieved 90%+ retrieval accuracy and reduced reliance on external vector databases by 50%. He built multi-agent AI systems using CrewAI, LangGraph, and Amazon Bedrock Agents, and designed part-number validation pipelines that improved retrieval accuracy by 75%. Currently an AI Engineer at a Fortune 500 consumer products company, he builds LLM-powered conversational AI and invoice detection pipelines on Google Cloud. He holds a Master's in Data Science from the University of Maryland, Baltimore County (GPA 3.8/4.0).
What is knowledge graph RAG?
Knowledge graph RAG combines a graph database (like Neo4j) with retrieval-augmented generation to preserve entity relationships during retrieval. Instead of treating documents as isolated text chunks, it maps entities — parts, specifications, suppliers, projects — as nodes and their relationships as edges, enabling the LLM to traverse connected data and return contextually accurate answers.
Why use a knowledge graph instead of a vector database for RAG?
Vector databases excel at semantic similarity search but lose relational context. In domain-specific environments like construction, a part number connects to specifications, suppliers, project phases, and compliance requirements. A knowledge graph preserves these connections. Switching from Pinecone/FAISS/Chroma to Neo4j improved retrieval accuracy from below 70% to above 90% while reducing vector database costs by 50%.
Which multi-agent framework is best for RAG: CrewAI, LangGraph, or Bedrock?
After benchmarking all three in production, CrewAI delivered the best balance of performance, cost, and development speed. LangGraph offers finer control over agent orchestration but requires more engineering effort. Amazon Bedrock Agents integrates natively with AWS services but limits model flexibility. For cost-sensitive RAG with GPT-3.5, CrewAI with SynonymRetriever and VectorContextRetriever matched GPT-4 quality at a fraction of the cost.
Most RAG tutorials teach the same pipeline: chunk your documents, embed them with OpenAI, store them in a vector database, retrieve by cosine similarity. For generic Q&A over a handful of PDFs, that works fine.
For construction data — where a single part number connects to a specification, a supplier, a project phase, a compliance document, and three different pricing tiers — it falls apart.
Knowledge Graph RAG
Knowledge graph RAG is a retrieval-augmented generation architecture that uses a graph database to store and traverse entity relationships during retrieval, rather than relying solely on vector similarity search. Entities (such as parts, specifications, suppliers, and projects) are stored as nodes, and their relationships are stored as edges. During retrieval, the system traverses the graph to find contextually connected information — enabling more accurate and relationship-aware answers than vector-only RAG.
The fundamental difference: vector databases answer "what text is most similar to this query?" Knowledge graphs answer "what entities are connected to this query, and how?"
In construction AI, the second question is almost always the one that matters.
Knowledge graph RAG preserves entity relationships that vector similarity search discards. For domain-specific data with complex entity connections — construction, healthcare, supply chain — graph-based retrieval consistently outperforms vector-only approaches in both accuracy and contextual relevance.
When I started the AI chatbot project at a mid-size construction company in 2023, vector databases were the default choice. I tested Pinecone, FAISS, and Chroma against our construction datasets.
The results were serviceable for broad questions: "What are the safety requirements for electrical work?" returned reasonable chunks. But the moment a query required relational context — "Which supplier provides Part X for Project Y at the best price under specification Z?" — the system collapsed.
The Four Failure Modes
- Semantically similar chunks from different projects get mixed — the system returns the right part number from the wrong context
- Multi-hop queries (Supplier → Part → Specification → Project) require relationship traversal that vector similarity cannot perform
- Embedding construction jargon creates false similarity matches — 'steel beam grade A36' and 'steel reinforcement grade 60' appear similar to embeddings but are completely different materials
- Cost scales linearly with document volume without proportional accuracy gains
The realization: vector databases are retrieval tools, not knowledge systems. Construction AI needed a knowledge system.
Vector databases retrieve by semantic similarity, which works for single-hop, context-free queries. For domain-specific data with multi-hop entity relationships — parts connected to suppliers connected to projects connected to specifications — graph databases are architecturally superior because they preserve the relational structure that vector search discards.
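To make the multi-hop failure concrete, here is a dependency-free sketch of the typed-edge traversal a graph store performs natively, which a similarity-only index cannot express. All entity names are invented for illustration:

```python
# Minimal in-memory property graph: (source, relationship, target) triples.
# Entity names are illustrative, not real project data.
EDGES = [
    ("SupplierA", "SUPPLIES", "Part-X100"),
    ("SupplierB", "SUPPLIES", "Part-X100"),
    ("Part-X100", "SPECIFIED_IN", "Spec-Z"),
    ("Part-X100", "REQUIRED_FOR", "Project-Y"),
    ("Spec-Z", "REQUIRED_FOR", "Project-Y"),
]

def neighbors(node, rel):
    """All targets reachable from `node` over a single `rel`-typed edge."""
    return {t for s, r, t in EDGES if s == node and r == rel}

def suppliers_for(part, project, spec):
    """Multi-hop query: suppliers of `part`, but only when the part is
    required for `project` AND governed by `spec`."""
    if project not in neighbors(part, "REQUIRED_FOR"):
        return set()
    if spec not in neighbors(part, "SPECIFIED_IN"):
        return set()
    return {s for s, r, t in EDGES if r == "SUPPLIES" and t == part}

# Both suppliers qualify only because the part's project and spec edges check out.
print(suppliers_for("Part-X100", "Project-Y", "Spec-Z"))
```

A vector index can tell you which chunks mention Part-X100; it cannot enforce that the part's project and specification edges both hold before a supplier is returned.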
But identifying the problem is the easy part. Building the replacement was harder — and required rethinking the entire data architecture.
I architected an on-premise Neo4j graph database optimized specifically for construction entity embeddings. The decision to go on-premise was driven by two factors: cost control (no per-query pricing from a hosted vector service) and data sensitivity (construction project data often includes proprietary specifications and pricing).
Entity Design
The graph schema mapped construction data into five core node types:
- Parts — individual components with part numbers, specifications, and pricing
- Suppliers — vendors linked to parts through supply agreements
- Projects — construction projects with phases, timelines, and compliance requirements
- Specifications — technical documents governing material and installation standards
- Compliance Documents — safety and regulatory records tied to projects and suppliers
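A minimal sketch of this schema in Cypher, held as Python strings so the shape is readable without a running Neo4j instance. The labels follow the five node types above; the property names and parameters are illustrative assumptions, not the production schema:

```python
# Cypher sketch of the construction graph schema. Property names ($part_number,
# $price, etc.) are illustrative assumptions; labels match the node types above.
SCHEMA_QUERIES = [
    "CREATE (p:Part {part_number: $part_number, price: $price})",
    "CREATE (s:Supplier {name: $name})",
    "CREATE (pr:Project {name: $name, phase: $phase})",
    "CREATE (sp:Specification {code: $code})",
    "CREATE (c:ComplianceDocument {ref: $ref})",
    # Two of the relationship types used throughout the graph:
    "MATCH (s:Supplier {name: $s}), (p:Part {part_number: $p}) "
    "CREATE (s)-[:SUPPLIES]->(p)",
    "MATCH (p:Part {part_number: $p}), (sp:Specification {code: $c}) "
    "CREATE (p)-[:SPECIFIED_IN]->(sp)",
]

# With the official `neo4j` Python driver, each query would run roughly like:
#   with driver.session() as session:
#       session.run(SCHEMA_QUERIES[0], part_number="X100", price=12.5)
```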
The relationship types connecting them: SUPPLIES, SPECIFIED_IN, REQUIRED_FOR, COMPLIANT_WITH, PHASE_OF.

Construction-Specific Embeddings
Generic embeddings from OpenAI or Hugging Face models don't understand construction terminology well. "NEC Class 1 Division 2" and "NFPA 70E compliance" carry specific meaning that general-purpose models flatten.
I generated domain-specific entity embeddings trained on our construction corpus and stored them directly in Neo4j. Each node carried both its graph relationships and its semantic embedding — enabling hybrid queries that combined graph traversal with vector similarity.
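These hybrid queries are expressible directly in Cypher: Neo4j ships native vector indexes (since version 5.11), so an embedding can live on the same node the graph traverses. A sketch, with the index name, dimension, and property names as illustrative assumptions:

```python
# Cypher for a native Neo4j vector index plus a hybrid query: the graph pattern
# narrows context, the vector index ranks within it. Index name, dimensions,
# and property names are illustrative assumptions.
CREATE_INDEX = """
CREATE VECTOR INDEX part_embeddings IF NOT EXISTS
FOR (p:Part) ON (p.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 768,
  `vector.similarity_function`: 'cosine'
}}
"""

HYBRID_QUERY = """
CALL db.index.vector.queryNodes('part_embeddings', 10, $query_embedding)
YIELD node AS part, score
MATCH (part)-[:REQUIRED_FOR]->(proj:Project {name: $project})
MATCH (:Supplier)-[:SUPPLIES]->(part)
RETURN part.part_number, score
ORDER BY score DESC
"""
```

The second query only scores parts that survive the graph constraints, which is the whole trick: similarity ranking happens inside an already-correct context.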
The breakthrough was not choosing between graph and vector retrieval — it was combining them. A graph query narrows the search to the right context (correct project, correct supplier). A vector query within that context finds the most relevant document. Together, retrieval accuracy exceeded 90%.
The result: a 50% reduction in reliance on external vector databases like Pinecone, FAISS, and Chroma, with retrieval accuracy exceeding 90% — up from below 70% with vector-only approaches.
Hybrid knowledge graph RAG combines graph traversal for relational context with vector similarity for semantic matching. Graph queries narrow retrieval to the correct entity context (project, supplier, specification). Vector queries within that narrowed context find the most relevant documents. This hybrid approach achieved 90%+ retrieval accuracy in production while cutting external vector database costs by 50%.
Once the Neo4j knowledge graph was populated, the challenge shifted to query optimization: how do you prioritize which nodes and relationships to traverse when a user asks a complex, multi-entity question?
I integrated LangChain and PageRank algorithms with LlamaIndex property graphs for data prioritization. The insight: not all nodes in a construction knowledge graph are equally important. A specification referenced by 40 projects carries more authority than one referenced by 2.
Why PageRank Worked for Construction Data
PageRank, originally designed to rank web pages by link authority, maps naturally to construction knowledge graphs. A supplier node with edges to many high-priority project nodes has high authority. A specification node referenced across multiple compliance documents is more relevant than an orphaned spec.
I applied PageRank scoring to prioritize retrieval results, so the system surfaced the most authoritative and well-connected entities first — not just the most semantically similar ones.
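In production the scoring ran over the Neo4j graph (for example via a graph algorithms library), but the ranking idea itself is compact. A dependency-free power-iteration sketch over a toy graph, with node names invented for illustration, showing a specification referenced by three projects out-ranking one referenced by a single project:

```python
# Power-iteration PageRank over a toy construction graph (no dependencies).
# A specification referenced by many projects should out-rank an orphaned one.
LINKS = {                      # node -> nodes it points to
    "Proj1": ["SpecA"], "Proj2": ["SpecA"], "Proj3": ["SpecA"],
    "Proj4": ["SpecB"],
    "SpecA": [], "SpecB": [],
}

def pagerank(links, damping=0.85, iters=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if not outs:                      # dangling node: spread evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

ranks = pagerank(LINKS)
assert ranks["SpecA"] > ranks["SpecB"]   # three referencing projects beat one
```

Retrieval results were then re-ordered by this authority score rather than by raw similarity alone.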
Outperforming spaCy and Amazon Comprehend
Before building the graph-based pipeline, I benchmarked NLP-based entity extraction using spaCy and Amazon Comprehend. Both tools performed adequately for generic named entity recognition but struggled with construction-specific entities: part numbers with mixed alphanumeric formats, specification references embedded in technical prose, and supplier names that overlapped with common words.
The LangChain + PageRank approach on property graphs surpassed both spaCy and Amazon Comprehend for information retrieval efficiency — because it was not just extracting entities, it was ranking them by relational importance within the graph.
| Approach | Entity Extraction | Relationship Awareness | Domain Accuracy | Used At |
|---|---|---|---|---|
| spaCy NER | Good for general entities, weak on construction-specific formats | None — extracts entities without relationships | Below 60% for construction part numbers | Initial benchmarks |
| Amazon Comprehend | Better coverage, still generic | Limited — detects entities, not connections | 65-70% for construction data | Initial benchmarks |
| LangChain + PageRank on Neo4j | Domain-specific embeddings handle construction terminology | Full graph traversal with authority ranking | 90%+ with hybrid graph-vector retrieval | Production system |
PageRank on a knowledge graph ranks entities by relational authority — how many high-priority nodes connect to them — rather than just semantic similarity. For construction data, this meant specifications referenced across 40 projects surfaced before orphaned documents, and the LangChain + PageRank pipeline outperformed spaCy and Amazon Comprehend for domain-specific information retrieval.
The knowledge graph handled retrieval. But construction queries often require multiple steps: identify the relevant project, find the associated suppliers, cross-reference specifications, and validate compliance — all in one user question.
A single LLM call cannot orchestrate that. I needed a multi-agent system where specialized agents handle different aspects of the query pipeline.
I evaluated three frameworks: CrewAI, LangGraph, and Amazon Bedrock Agents.
The Benchmark
I ran each framework against the same construction dataset with identical queries, measuring retrieval accuracy, latency, cost per query, and development time.
| Factor | CrewAI | LangGraph | Amazon Bedrock Agents |
|---|---|---|---|
| Development speed | Fastest — high-level agent abstractions, less boilerplate | Moderate — requires explicit state graph definition | Slowest — AWS service integration adds configuration overhead |
| Orchestration control | Good for sequential and hierarchical agent flows | Best — fine-grained control over agent state transitions and branching | Limited — constrained to AWS-defined agent patterns |
| Model flexibility | Any LLM (GPT-3.5, GPT-4, Gemini, LLaMA, Groq) | Any LLM via LangChain integrations | Primarily AWS Bedrock models (Claude, Titan) — external models require extra setup |
| Cost efficiency | Lowest — lightweight framework with minimal infrastructure overhead | Moderate — framework is free but complex orchestration increases compute | Highest — AWS service costs compound (Lambda, SageMaker, Bedrock API) |
| RAG integration | Strong with SynonymRetriever and VectorContextRetriever | Strong via LangChain native retriever ecosystem | Native with Bedrock Knowledge Bases — limited customization |
| Production deployment | FastAPI + Docker — lightweight and portable | FastAPI + Docker — similar deployment model | AWS-native — Lambda, ECR, SageMaker — vendor-locked |
Why CrewAI Won
After comparing performance and computation across all three frameworks, I selected CrewAI for the production multi-agent system. The deciding factors: faster development cycles, lower infrastructure costs, and full model flexibility — I could switch between GPT-3.5, GPT-4, GPT-4 Mini, Gemini, and Meta LLaMA models without architectural changes.
LangGraph is the better choice when you need precise control over complex state machines with branching logic. Amazon Bedrock Agents makes sense when your entire infrastructure is already AWS-native and you need minimal custom development.
For a construction AI system that needed to be cost-efficient, model-flexible, and deployed quickly through FastAPI — CrewAI was the right tool.
CrewAI delivered the best balance of development speed, cost efficiency, and model flexibility for production multi-agent RAG. LangGraph excels at complex state machine orchestration but requires more engineering. Amazon Bedrock Agents suits AWS-native environments but limits model choice and adds vendor lock-in. The right framework depends on cost sensitivity, deployment model, and how much orchestration control the use case demands.
One of the highest-impact applications of the multi-agent system was automated part number extraction and validation — a task that previously required manual cross-referencing across multiple databases.
I designed a multi-LLM agent pipeline using Tavily Agent and Serper API to extract part numbers from construction documents, validate them against company databases, and verify specifications. The system ran on AWS EC2, Lambda, and SageMaker, with the multi-agent code deployed through FastAPI.
How the Pipeline Worked
- Extraction agent — Parsed construction documents (PDFs, specifications, purchase orders) using PyMuPDF and Amazon Textract to identify part numbers
- Validation agent — Cross-referenced extracted part numbers against the company's SQL Server database and AWS S3 document stores
- Verification agent — Used Tavily and Serper API to check part numbers against supplier catalogs and external databases
- Reconciliation agent — Flagged mismatches and generated correction reports
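The agent boundaries can be sketched with plain functions. This is a minimal stand-in, not the production system: the real pipeline used PyMuPDF and Amazon Textract for extraction and Tavily/Serper for external verification, and the part-number format, catalogs, and document below are invented for illustration:

```python
import re

# Illustrative part-number format, e.g. "ABC-1042"; not the real company scheme.
PART_RE = re.compile(r"\b[A-Z]{2,4}-\d{3,6}[A-Z]?\b")

def extraction_agent(document_text):
    """Stand-in for the PyMuPDF / Amazon Textract extraction step."""
    return set(PART_RE.findall(document_text))

def validation_agent(parts, internal_db):
    """Cross-reference against the internal catalog (SQL Server in production)."""
    return {p for p in parts if p in internal_db}

def verification_agent(parts, external_catalog):
    """Stand-in for Tavily / Serper lookups against supplier catalogs."""
    return {p for p in parts if p in external_catalog}

def reconciliation_agent(extracted, validated, verified):
    """Flag anything extracted but not confirmed at every stage."""
    confirmed = validated & verified
    return {"confirmed": confirmed, "mismatches": extracted - confirmed}

doc = "Install bracket ABC-1042 per spec; order ABC-9999 as spare."
extracted = extraction_agent(doc)
validated = validation_agent(extracted, internal_db={"ABC-1042"})
verified = verification_agent(extracted, external_catalog={"ABC-1042"})
report = reconciliation_agent(extracted, validated, verified)
# report["confirmed"] == {"ABC-1042"}; report["mismatches"] == {"ABC-9999"}
```

Each stage narrows or annotates the previous stage's output, which is what lets the pipeline catch errors a single model call would silently pass through.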
The result: 75% improvement in retrieval accuracy and computation efficiency compared to the previous manual and single-model approaches.
Single-model RAG treats extraction and validation as one task. Multi-agent architectures let each agent specialize — one extracts, one validates, one verifies, one reconciles. Specialization improves accuracy at each step, and the pipeline catches errors that a single model misses.
Multi-LLM agent pipelines outperform single-model approaches for complex validation tasks. A four-agent pipeline (extract → validate → verify → reconcile) using Tavily, Serper API, and Amazon Textract achieved a 75% improvement in part-number retrieval accuracy while reducing manual cross-referencing effort.
GPT-4 delivers better reasoning. GPT-3.5 is roughly 10-20x cheaper per token. For a construction AI system processing thousands of queries daily, that cost difference is the difference between a viable product and a budget overrun.
I built the multi-agent framework with RAG, Google APIs, and custom retriever models — SynonymRetriever and VectorContextRetriever — specifically to enable cost-efficient retrieval with GPT-3.5 that matched GPT-4 quality.
How SynonymRetriever and VectorContextRetriever Closed the Gap
The quality gap between GPT-3.5 and GPT-4 is primarily in reasoning over ambiguous or poorly retrieved context. If the retrieval layer provides precise, well-structured context, GPT-3.5's reasoning is sufficient for most construction queries.
SynonymRetriever expanded queries with construction-domain synonyms, so a question about an "I-beam" also matched documents that said "W-shape" or "wide flange." VectorContextRetriever pulled graph-augmented context, returning the entities connected to a match rather than the matching chunk in isolation. Together, these retrievers provided GPT-3.5 with context so precise that it matched GPT-4's answer quality for 85%+ of construction queries — at a fraction of the cost.
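The core mechanism behind SynonymRetriever, query expansion over a curated domain synonym table, fits in a few lines. This is a minimal sketch; the synonym entries, scoring, and documents are illustrative assumptions, not the production retriever:

```python
# Query expansion with a curated construction synonym table, then simple
# keyword-overlap retrieval over the expanded query. Entries are illustrative.
SYNONYMS = {
    "i-beam": ["w-shape", "wide flange"],
    "rebar": ["steel reinforcement", "reinforcing bar"],
}

def expand(query):
    terms = set(query.lower().split())
    for canonical, alts in SYNONYMS.items():
        if canonical in terms or any(a in query.lower() for a in alts):
            terms.add(canonical)
            terms.update(word for alt in alts for word in alt.split())
    return terms

def retrieve(query, docs, k=2):
    q = expand(query)
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

docs = [
    "wide flange w-shape beam sizing chart",
    "paint color guide",
]
# The beam document wins even though it never contains the literal token "i-beam".
print(retrieve("i-beam span table", docs, k=1))
```

In production the expanded query fed the vector and graph retrieval layers rather than a keyword-overlap scorer, but the effect is the same: the model sees the right documents even when the user's vocabulary differs from the corpus's.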
The quality gap between GPT-3.5 and GPT-4 shrinks dramatically when the retrieval layer is precise. Domain-specific retrievers (SynonymRetriever for query expansion, VectorContextRetriever for graph-augmented context) enabled GPT-3.5 to match GPT-4 quality on 85%+ of construction queries — reducing per-query costs by 10-20x without sacrificing accuracy.
In April 2025, I moved to a Fortune 500 consumer products company as an AI Engineer. The domain changed — from construction documents to product recommendations. The architecture patterns transferred directly.
At the new company, I design and deploy LLM-powered conversational AI systems using Google Gemini models for customer-facing chatbot experiences. I build multi-agent AI workflows using the Sierra Agents framework for task coordination and domain-specific decision making. I also developed an end-to-end invoice detection and information extraction pipeline deployed via Google Cloud Run.
The knowledge graph principles from the construction role apply here too: products connect to categories, categories connect to growing conditions, growing conditions connect to geographic zones. A flat vector search might recommend the right product for the wrong climate. A graph-aware system gets it right.
The multi-agent architecture also carried over. Autonomous agents handle different aspects of customer interaction — product lookup, recommendation reasoning, and order context — coordinated through the Sierra Agents framework on Google Cloud.
Any domain with structured entity relationships — products to categories, parts to suppliers, patients to treatments — benefits from graph-aware retrieval over flat vector search. The specific entities change. The architecture pattern does not.
Knowledge graph RAG is not construction-specific — it is a general architecture for any domain with structured entity relationships. The same graph traversal + vector hybrid pattern that achieved 90%+ accuracy for construction data applies to product recommendations, healthcare records, and any system where entities connect through typed relationships.
1. Knowledge graph RAG combines graph database traversal (for relational context) with vector similarity search (for semantic matching) — outperforming vector-only RAG for domain-specific data with complex entity relationships
2. Replacing Pinecone, FAISS, and Chroma with a Neo4j knowledge graph reduced external vector database reliance by 50% and pushed retrieval accuracy above 90%
3. Construction-specific entity embeddings stored in Neo4j enabled hybrid queries: graph traversal narrows context, vector search finds the most relevant document within that context
4. LangChain + PageRank algorithms on property graphs outperformed spaCy and Amazon Comprehend for domain-specific information retrieval by ranking entities by relational authority
5. CrewAI won the multi-agent framework comparison against LangGraph and Amazon Bedrock Agents for production RAG — best balance of development speed, cost, and model flexibility
6. A four-agent pipeline (extract → validate → verify → reconcile) improved part-number retrieval accuracy by 75% using Tavily, Serper API, and Amazon Textract
7. SynonymRetriever and VectorContextRetriever enabled GPT-3.5 to match GPT-4 answer quality on 85%+ of queries, reducing per-query costs by 10-20x
8. Knowledge graph RAG patterns transfer across industries — the same architecture that worked for construction data applies to consumer product recommendations and any domain with structured entity relationships
What is knowledge graph RAG?
Knowledge graph RAG is a retrieval-augmented generation architecture that stores entities and their relationships in a graph database (such as Neo4j) rather than relying solely on vector embeddings. During retrieval, the system traverses graph relationships to find contextually connected information, then uses vector similarity within the narrowed context for final document selection. This hybrid approach preserves relational structure that vector-only RAG discards.
When should I use a knowledge graph instead of a vector database for RAG?
Use a knowledge graph when your data has meaningful entity relationships that affect retrieval quality — parts connected to suppliers, patients connected to treatments, products connected to categories. If your queries require multi-hop reasoning (find all compliance documents for Supplier A on Project B), graph databases outperform vector search. If your queries are single-hop semantic search over unstructured text, vector databases are sufficient and simpler to implement.
Is Neo4j the best graph database for RAG applications?
Neo4j is the most mature and widely adopted graph database for knowledge graph RAG, with native support for vector indexes (since version 5.11), the Cypher query language, and integrations with LangChain and LlamaIndex. Alternatives include Amazon Neptune (AWS-native), ArangoDB (multi-model), and TigerGraph (distributed). Neo4j is the strongest choice for teams that need a proven ecosystem, extensive documentation, and hybrid graph-vector query capabilities.
How does PageRank improve RAG retrieval?
PageRank assigns authority scores to nodes based on how many high-quality connections they have. In a knowledge graph RAG system, this means frequently referenced specifications, well-connected suppliers, and heavily linked project documents surface first in retrieval results — not just semantically similar ones. PageRank shifts retrieval from similarity-based to authority-based ranking within the graph context.
Which multi-agent framework is best for RAG: CrewAI, LangGraph, or Amazon Bedrock?
CrewAI is best for rapid development with cost-efficient, model-flexible multi-agent RAG — it supports any LLM and deploys easily via FastAPI. LangGraph is best when you need fine-grained control over agent state transitions and complex branching logic. Amazon Bedrock Agents is best for AWS-native environments with minimal custom development needs. The choice depends on cost sensitivity, required orchestration complexity, and deployment infrastructure.
Can GPT-3.5 replace GPT-4 in production RAG systems?
Yes, with the right retrieval layer. The quality gap between GPT-3.5 and GPT-4 is primarily in reasoning over ambiguous context. Domain-specific retrievers like SynonymRetriever (query expansion with domain synonyms) and VectorContextRetriever (graph-augmented context retrieval) provide GPT-3.5 with precise, well-structured context that reduces ambiguity. In production, this approach enabled GPT-3.5 to match GPT-4 quality on 85%+ of construction queries at 10-20x lower per-query cost.
How do you build construction-specific entity embeddings?
Start by collecting a domain corpus — construction specifications, part catalogs, supplier documents, and project records. Fine-tune or train embeddings on this corpus so the model learns that 'W-shape,' 'I-beam,' and 'wide flange' are related terms, and that 'NEC Class 1 Division 2' is a specific electrical classification. Store these embeddings directly in Neo4j nodes alongside graph relationships, enabling hybrid queries that combine graph traversal with domain-aware vector similarity.