
    Volume 48 Issue 8
    Aug.  2023
    Citation: Wang Quanyu, Li Zhenhua, Tu Zhipeng, Chen Guanyu, Hu Jun, Chen Jiaqi, Chen Jianjun, Lv Guobin, 2023. Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model. Earth Science, 48(8): 3137-3150. doi: 10.3799/dqkx.2022.462

    Geotechnical Named Entity Recognition Based on BERT-BiGRU-CRF Model

    doi: 10.3799/dqkx.2022.462
    • Received Date: 2022-12-01
    • Publish Date: 2023-08-25
    • Named entity recognition for geotechnical engineering is an important prerequisite and foundation for geotechnical information mining and knowledge graph construction. To recognize and classify named entities in geotechnical texts, this article first designs and constructs a named entity corpus of geotechnical engineering according to the Standard for Fundamental Terms of Geotechnical Engineering (GB/T 50279-2014) and other national industry standards. It then proposes GENER, a deep learning model for named entity recognition and classification in geotechnical engineering text. In GENER, the BERT pretrained language model learns distributed representations of geotechnical engineering text features; a BiGRU context encoding layer encodes the contextual features of the text; and a CRF label decoding layer decodes these contextual features to generate the label sequence of geotechnical engineering named entities. Finally, the GENER model is evaluated experimentally on the geotechnical engineering corpus. Compared with other deep learning models for named entity recognition based on pretrained language models, GENER performs better: precision reaches 90.94%, recall reaches 92.88%, the F1-score reaches 91.89%, and model training speed increases by 4.735%. Experiments also show that, compared with the BiLSTM-CRF and CNN-BiLSTM-CRF models, this model is more effective for geotechnical engineering entity recognition on a small-scale corpus.
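
The CRF label decoding step described in the abstract can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the tag set, all scores, and the names `TAGS`, `viterbi_decode`, `emissions`, `transitions` and `start` are hypothetical, and the emission scores merely stand in for what a BiGRU layer over BERT embeddings might produce for a four-token sentence.

```python
from typing import Dict, List

# Hypothetical BIO tag set for illustration; the paper's label scheme is not given here.
TAGS = ["O", "B-ENT", "I-ENT"]
NEG = -1e9  # large negative score that effectively forbids a transition


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   start: Dict[str, float]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF score."""
    # score[t] is the best score of any path that ends in tag t at the current token
    score = {t: start[t] + emissions[0][t] for t in TAGS}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            best_prev = max(TAGS, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = score[best_prev] + transitions[best_prev][cur] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Assumed transition scores: an I-ENT tag may not follow O or start a sentence.
transitions = {
    "O":     {"O": 0.0, "B-ENT": 0.0, "I-ENT": NEG},
    "B-ENT": {"O": 0.0, "B-ENT": 0.0, "I-ENT": 0.5},
    "I-ENT": {"O": 0.0, "B-ENT": 0.0, "I-ENT": 0.5},
}
start = {"O": 0.0, "B-ENT": 0.0, "I-ENT": NEG}

# Made-up per-token emission scores (stand-ins for encoder outputs).
emissions = [
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.0},
    {"O": 0.2, "B-ENT": 1.0, "I-ENT": 1.2},
    {"O": 0.1, "B-ENT": 0.3, "I-ENT": 1.5},
    {"O": 1.8, "B-ENT": 0.2, "I-ENT": 0.4},
]

greedy = [max(TAGS, key=lambda t: e[t]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, start)
print("greedy :", greedy)
print("viterbi:", decoded)
```

With these made-up scores, picking the best tag per token independently yields an I-ENT directly after O, which is not a valid BIO sequence, whereas the Viterbi path opens the entity with B-ENT. This is the kind of label consistency a CRF layer on top of a context encoder is generally meant to provide.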

       

    • Bengio, Y., Courville, A., Vincent, P., 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(8): 1798-1828. https://doi.org/10.1109/TPAMI.2013.50
      Cho, K., van Merriënboer, B., Gulcehre, C., et al., 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078.
      Chu, D. P., Wan, B., Li, H., et al., 2021. Geological Entity Recognition Based on ELMO-CNN-BiLSTM-CRF Model. Earth Science, 46(8): 3039-3048 (in Chinese with English abstract). https://doi.org/10.3799/dqkx.2020.309
      Chung, J., Gulcehre, C., Cho, K. H., et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.
      Devlin, J., Chang, M., Lee, K., et al., 2018. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
      Dong, L., Yang, N., Wang, W., et al., 2019. Unified Language Model Pre-Training for Natural Language Understanding and Generation. arXiv preprint arXiv:1905.03197.
      Fan, R., Wang, L., Yan, J., et al., 2020. Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards. ISPRS International Journal of Geo-Information, 9(1): 15. https://doi.org/10.3390/ijgi9010015
      Goyal, A., Gupta, V., Kumar, M., 2018. Recent Named Entity Recognition and Classification Techniques: A Systematic Review. Computer Science Review, 29: 21-43. https://doi.org/10.1016/j.cosrev.2018.06.001
      He, Y. X., Luo, C. W., Hu, B. Y., 2015. Geographic Entity Recognition Method Based on CRF Model and Rules Combination. Computer Application and Software, 32(1): 179 (in Chinese with English abstract). https://www.cnki.com.cn/Article/CJFDTOTAL-JYRJ201501047.htm
      Lafferty, J., McCallum, A., Pereira, F., 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, 282-289.
      Lample, G., Ballesteros, M., Subramanian, S., et al., 2016. Neural Architectures for Named Entity Recognition. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. The Association for Computational Linguistics, San Diego. https://doi.org/10.18653/v1/n16-1030
      Li, J., Sun, A. X., Han, J. L., et al., 2022. A Survey on Deep Learning for Named Entity Recognition. IEEE Transactions on Knowledge and Data Engineering, 34(1): 50-70. https://doi.org/10.1109/TKDE.2020.2981314
      Liu, D. S., Liu, H. L., Wu, Y., et al., 2022. Genetic Features of Geo-Materials and Their Testing Method. Journal of Civil and Environmental Engineering, 44(4): 1-9 (in Chinese with English abstract). https://www.cnki.com.cn/Article/CJFDTOTAL-JIAN202204001.htm
      Liu, H. L., Zhang, R. H., Liu, D. S., et al., 2021. Study on the Characteristics of Physical and Mechanical Parameters of Engineering Geology Based on Data Fusion. Journal of Civil and Environmental Engineering, 1-11 (in Chinese with English abstract).
      Liu, X., Zhang, S., Wei, F., et al., 2011. Recognizing Named Entities in Tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, USA, 359-367.
      Liu, Y., Ott, M., Goyal, N., et al., 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
      Marrero, M., Urbano, J., Sánchez-Cuadrado, S., et al., 2013. Named Entity Recognition: Fallacies, Challenges and Opportunities. Computer Standards & Interfaces, 35(5): 482-489. https://doi.org/10.1016/j.csi.2012.09.004
      Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2013. GB/T 50330-2013: Technical Code for Building Slope Engineering. Beijing: China Architecture & Building Press (in Chinese).
      Ministry of Housing and Urban-Rural Development of the People's Republic of China, 2015. JTGT 84-2015: Terminology Standard for Geotechnical Investigation. Beijing: China Architecture & Building Press (in Chinese).
      Ministry of Water Resources of the People's Republic of China, 2014. GB/T 50279-2014: Standard for Fundamental Terms of Geotechnical Engineering. Beijing: China Planning Press (in Chinese).
      Nadeau, D., Sekine, S., 2007. A Survey of Named Entity Recognition and Classification. Lingvisticae Investigationes, 30(1): 3-26. https://doi.org/10.1075/li.30.1.03nad
      Qiu, Q., Xie, Z., Wu, L., et al., 2019. BiLSTM-CRF for Geological Named Entity Recognition from The Geoscience Literature. Earth Science Informatics, 12(4): 565-579. https://doi.org/10.1007/s12145-019-00390-3
      Qiu, Q., Xie, Z., Wu, L., et al., 2019. GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning. Earth and Space Science, 6(6): 931-946. https://doi.org/10.1029/2019EA000610
      Qiu, X., Sun, T., Xu, Y., et al., 2020. Pre-Trained Models for Natural Language Processing: A Survey. Science China Technological Sciences, 63(10): 1872-1897. https://doi.org/10.1007/s11431-020-1647-3
      Quimbaya, A. P., Múnera, A. S., Rivera, R. A. G., et al., 2016. Named Entity Recognition over Electronic Health Records through a Combined Dictionary-Based Approach. Procedia Computer Science, 100: 55-61. https://doi.org/10.1016/j.procs.2016.09.123
      Ritter, A., Clark, S., Etzioni, O., 2011. Named Entity Recognition in Tweets: An Experimental Study. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, USA.
      Rocktäschel, T., Weidlich, M., Leser, U., 2012. ChemSpot: A Hybrid System for Chemical Named Entity Recognition. Bioinformatics, 28(12): 1633-1640. https://doi.org/10.1093/bioinformatics/bts183
      Sharnagat, R., 2014. Named Entity Recognition: A Literature Survey. Center For Indian Language Technology, 8-20.
      Wang, C., Ma, X., Chen, J., et al., 2018. Information Extraction and Knowledge Graph Construction from Geoscience Literature. Computers & Geosciences, 112: 112-120. https://doi.org/10.1016/j.cageo.2017.12.007
      Yang, J., Zhang, Y., Li, L., et al., 2018. YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Proceedings of ACL 2018, System Demonstrations. Association for Computational Linguistics, Australia. https://doi.org/10.18653/v1/P18-4006
      Yang, Z., Dai, Z., Yang, Y., et al., 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., New York.
      Zhang, G. Y., Fu, J. Y., Ouyang, Z. Z., et al., 2020. The Importance of Space Database Establishment Based on DGSS in Big Data Environment. Earth Science, 45(9): 3451-3460 (in Chinese with English abstract). https://doi.org/10.3799/dqkx.2020.130
      Zhang, S. D., Elhadad, N., 2013. Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts. Journal of Biomedical Informatics, 46(6): 1088-1098. https://doi.org/10.1016/j.jbi.2013.08.004
      Zhang, X. Y., Ye, P., Wang, S., et al., 2018. Geological Entity Recognition Method Based on Deep Belief Networks. Acta Petrologica Sinica, 34(2): 343-351 (in Chinese with English abstract). https://www.cnki.com.cn/Article/CJFDTOTAL-YSXB201802011.htm
      Zhang, X. Y., Zhu, S. N., Zhang, C. J., 2012. Annotation of Geographical Named Entities in Chinese Text. Acta Geodaetica et Cartographica Sinica, 41(1): 115-120 (in Chinese with English abstract). https://www.cnki.com.cn/Article/CJFDTOTAL-CHXB201201023.htm
      Zhang, Z., Han, X., Liu, Z., et al., 2019. ERNIE: Enhanced Language Representation with Informative Entities. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence. https://doi.org/10.18653/v1/P19-1139



      Figures(14)  / Tables(10)
