    Volume 46 Issue 8
    Aug.  2021
    Chu Deping, Wan Bo, Li Hong, Fang Fang, Wang Run, 2021. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model. Earth Science, 46(8): 3039-3048. doi: 10.3799/dqkx.2020.309

    Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model

    doi: 10.3799/dqkx.2020.309
    • Received Date: 2020-09-17
      Available Online: 2021-09-14
    • Publish Date: 2021-08-15
    • Geological entities are the core information in geological texts, and recognizing them accurately is an important prerequisite for geological information extraction and mining. This paper proposes the ELMO-CNN-BiLSTM-CRF model, a deep BiLSTM-CRF neural network built on pre-trained word vectors. Adding dynamic word features and character-level word features compensates for the lack of specificity in the word vectors, which improves both the recognition of polysemous words in geological text and the extraction of local features of geological entities. The model is evaluated on the geological survey report of the Xiongcun copper mine in Xietongmen County, Xizang Autonomous Region, where it achieves a precision of 95.15%, a recall of 95.26%, and an F1 value of 95.21%. Experiments show that, compared with the BiLSTM-CRF and CNN-BiLSTM-CRF models, the proposed model is more effective for geological entity recognition on a small-scale corpus and can effectively identify long geological entity words and geological polysemants.
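
The three scores quoted in the abstract are standard sequence-labelling metrics. As a minimal sketch of how such scores are typically computed at the entity level, the Python below extracts entity spans from BIO tag sequences and counts a prediction as correct only when its boundaries and type both match a gold entity. The BIO scheme, the entity types `ROCK` and `MIN`, and the function names are illustrative assumptions; the abstract does not state the paper's tagging scheme or evaluation script.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 using exact span matching."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with hypothetical entity types (not taken from the paper).
gold_tags = ["B-ROCK", "I-ROCK", "O", "B-MIN", "O"]
pred_tags = ["B-ROCK", "I-ROCK", "O", "O", "B-MIN"]
print(precision_recall_f1(gold_tags, pred_tags))
```

In the toy example one of the two predicted entities matches a gold entity exactly, so all three scores come out at 0.5. Applying the same harmonic-mean formula to the first two reported scores (95.15% and 95.26%) gives roughly 95.20%, in line with the stated F1 of 95.21%; the small gap presumably comes from rounding of the published figures.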

       

