arXiv:2505.19847v2 Announce Type: replace
Abstract: Retrieval-Augmented Generation (RAG) improves factuality by grounding LLMs in external knowledge, yet conventional centralized RAG requires aggregating distributed data, raising privacy risks and incurring high retrieval latency and cost. We present DGRAG, a distributed graph-driven RAG framework for edge-cloud collaborative systems. Each edge device organizes local documents into a knowledge graph and periodically uploads subgraph-level summaries to the cloud for lightweight global indexing without exposing raw data. At inference time, queries are first answered on the edge; a gate mechanism assesses the confidence and consistency of multiple local generations to decide whether to return a local answer or escalate the query. For escalated queries, the cloud performs summary-based matching to identify relevant edges, retrieves supporting evidence from them, and generates the final response with a cloud LLM. Experiments on distributed question answering show that DGRAG consistently outperforms decentralized baselines while substantially reducing cloud overhead. Read More