Abstract:
Three-dimensional (3D) mineral prospectivity modeling is a key technique for exploring deep, concealed mineral resources. In recent years, deep learning methods, notably convolutional neural networks (CNNs), have made progress in integrating 3D predictive information. However, the local receptive fields of CNNs make it difficult to capture long-range dependencies and global correlations between 3D predictive factors and mineralization occurrences, which limits prediction accuracy for deep concealed ore bodies. To address this limitation, this study develops 3D-ViT, a model based on the Vision Transformer (ViT) architecture and tailored to 3D geological data. The model uses a 3D voxel-patch embedding module and decoupled 3D positional encoding to explicitly preserve the structural information of geological bodies. A multi-head self-attention mechanism provides a global receptive field for modeling cross-scale spatial relationships between mineralization evidence and multiple predictive factors, such as intrusions, strata, and structures. In a case study of the Shizishan ore field in Anhui Province, the model successfully predicted the main known ore bodies and achieved an AUC of 0.96. Its accuracy, recall, and F1-score all exceeded those of a 3D-CNN and traditional machine learning models, demonstrating strong predictive capability. Based on the prediction results, four prospective target areas were delineated in the deep part of the Shizishan ore field, supporting the method's effectiveness and reliability in detecting concealed mineralization in complex geological settings. This research extends the application of ViT to 3D geoscientific data and provides a globally aware method for intelligent prediction of deep mineral resources, with significant potential for practical exploration.
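
The three components named in the abstract (voxel-patch embedding, decoupled positional encoding, and self-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: all function names, tensor shapes, and the random weights are hypothetical, "decoupled" is assumed here to mean one embedding table per spatial axis whose entries are summed, and a single attention head stands in for the multi-head mechanism.

```python
import numpy as np

def voxel_patch_embed(volume, patch, weight):
    """Split a (C, D, H, W) volume into non-overlapping cubic patches
    and linearly project each flattened patch to one token."""
    c, d, h, w = volume.shape
    p = patch
    assert d % p == 0 and h % p == 0 and w % p == 0, "volume must tile evenly"
    nd, nh, nw = d // p, h // p, w // p
    x = volume.reshape(c, nd, p, nh, p, nw, p)
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)      # (nd, nh, nw, c, p, p, p)
    x = x.reshape(nd * nh * nw, c * p ** 3)   # one row per patch
    return x @ weight, (nd, nh, nw)

def decoupled_positional_encoding(grid, pos_d, pos_h, pos_w):
    """Assumed form of 'decoupled' encoding: a separate table per axis,
    summed so that each token carries its (depth, row, column) position."""
    nd, nh, nw = grid
    pe = (pos_d[:nd, None, None, :]
          + pos_h[None, :nh, None, :]
          + pos_w[None, None, :nw, :])        # (nd, nh, nw, dim)
    return pe.reshape(nd * nh * nw, -1)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: every token attends to
    every other token, which is what gives a global receptive field."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn

# Toy example: 3 predictive-factor channels on an 8x8x8 voxel grid (hypothetical).
rng = np.random.default_rng(0)
dim, patch = 16, 4
volume = rng.normal(size=(3, 8, 8, 8))
w_embed = rng.normal(size=(3 * patch ** 3, dim))
tokens, grid = voxel_patch_embed(volume, patch, w_embed)
pos = [rng.normal(size=(n, dim)) for n in grid]
tokens = tokens + decoupled_positional_encoding(grid, *pos)
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
out, attn = self_attention(tokens, wq, wk, wv)
print(tokens.shape, attn.shape)  # (8, 16) (8, 8)
```

In this sketch the attention matrix has one row and one column per voxel patch, so a patch at depth can draw directly on evidence from any other patch in the volume. A 3D convolution, by contrast, only mixes information within its kernel window at each layer.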