Abstract:
To address the difficulty of effectively utilizing massive volumes of unstructured data in geological exploration, as well as the hallucinations and lack of domain-specific reasoning in general-purpose large language models (LLMs), we propose an intelligent knowledge mining framework for vertical domains that integrates a Knowledge Graph (KG) with Retrieval-Augmented Generation (RAG). The framework is validated through a case study of Carlin-type gold deposits in southwestern Guizhou, China, and Nevada, USA. First, a RAG-based intelligent question-answering system was constructed on a locally deployed DeepSeek-32B model. Through vector retrieval and generative reading comprehension, the system achieves precise traceability of domain knowledge and highly reliable Q&A. Second, by applying Supervised Fine-Tuning (SFT) to the LLM, we efficiently built a cross-regional metallogenic knowledge graph from hundreds of multi-source, heterogeneous geological documents, systematically covering stratigraphy, structure, alteration minerals, and ore-controlling factors. Experimental results demonstrate that the proposed system significantly outperforms GPT-4o in objective accuracy. For subjective content generation, it exhibits high faithfulness and full traceability, effectively mitigating hallucination. Analyses of the graph topology not only quantitatively reveal the macroscopic similarities and differences in mineralization between the two regions but also quantify the cascading indicative pathways from orebody entities through alteration assemblages to geochemical element anomalies, demonstrating the system's capability to uncover implicit clues for mineral exploration. This study achieves the intelligent transformation and in-depth mining of knowledge from unstructured text into structured representations, offering a novel technical pathway to address the "data-rich yet knowledge-poor" dilemma prevalent in the geoscience domain.
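The abstract does not detail how retrieval supports traceability. As a rough illustration only, the sketch below shows one common way a RAG pipeline keeps answers traceable: every retrieved passage carries a source identifier that is passed to the LLM prompt, so the generated answer can cite it. This is a minimal sketch under stated assumptions, not the authors' implementation: a toy term-count vector stands in for the neural embedding model, and the names `Chunk`, `embed`, `retrieve`, `build_prompt` and the `doc_id` values are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical source tag (e.g. report name + page), kept for traceability
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for a neural embedding model: lowercase term counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[tuple[float, Chunk]]:
    """Return the k most similar chunks; each keeps its doc_id so answers can cite it."""
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, Chunk]]) -> str:
    """Assemble a grounded prompt in which every passage is tagged with its source."""
    context = "\n".join(f"[{c.doc_id}] {c.text}" for _, c in hits)
    return (
        "Answer using only the passages below and cite their [source] tags.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Usage with illustrative (hypothetical) passages.
chunks = [
    Chunk("doc-A p.12", "Carlin-type gold deposits are commonly hosted in carbonate rocks."),
    Chunk("doc-B p.3", "Gold in Carlin-type ores occurs mainly in arsenian pyrite."),
    Chunk("doc-C p.7", "Regional stratigraphy includes Permian and Triassic units."),
]
query = "Which mineral hosts gold in Carlin-type ores?"
hits = retrieve(query, chunks, k=2)
prompt = build_prompt(query, hits)  # this string would be sent to the locally deployed LLM
```

Keeping the source tag attached to each passage, rather than only its text, is what allows every statement in the generated answer to be checked against the original document; in a real system the toy `embed` would be replaced by an embedding model and the linear scan by a vector index.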