Natural language processing is a field within artificial intelligence that studies how to model human language computationally. The representation of words as vectors, known as word embeddings, has become popular in recent years through techniques such as Doc2Vec and Word2Vec. This study evaluates the use of Doc2Vec on a set of conversations collected during 2020 by the ECU911 emergency center in the Cuenca canton of Azuay province. The goal is to classify incidents so that the operator can make the best decision about which actions to take when an emergency arises. In addition, Doc2Vec is compared with Word2Vec to assess its performance in terms of both accuracy and time. Based on the tests performed, we conclude that Doc2Vec performs solidly when it uses models trained on a large corpus, outperforming Word2Vec in this respect.
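As a rough illustration of how embeddings can support incident classification, the following sketch averages word vectors into a document vector and assigns the label of the most similar class centroid. This is the averaging baseline commonly used with Word2Vec-style vectors, not the study's pipeline: Doc2Vec instead learns a vector for each document directly. The word vectors, tokens, labels, and function names are all hypothetical toy values chosen for illustration, and the nearest-centroid classifier is an assumption rather than the method used in the study.

```python
import math

# Hypothetical 3-dimensional word vectors (toy values, not learned from any corpus).
WORD_VECTORS = {
    "fire": [0.9, 0.1, 0.0],
    "smoke": [0.8, 0.2, 0.1],
    "crash": [0.1, 0.9, 0.1],
    "car": [0.0, 0.8, 0.2],
    "pain": [0.1, 0.1, 0.9],
    "chest": [0.0, 0.2, 0.8],
}
DIMENSIONS = 3


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def document_vector(tokens):
    """Average the vectors of the known tokens; unknown tokens are ignored."""
    known = [WORD_VECTORS[token] for token in tokens if token in WORD_VECTORS]
    return mean_vector(known) if known else [0.0] * DIMENSIONS


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def train_centroids(labelled_docs):
    """Compute one centroid per label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in labelled_docs:
        grouped.setdefault(label, []).append(document_vector(tokens))
    return {label: mean_vector(vectors) for label, vectors in grouped.items()}


def classify(tokens, centroids):
    """Return the label whose centroid is most similar to the document vector."""
    vector = document_vector(tokens)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical training conversations, already tokenized and labelled.
TRAINING = [
    (["fire", "smoke"], "fire"),
    (["car", "crash"], "traffic"),
    (["chest", "pain"], "medical"),
]
CENTROIDS = train_centroids(TRAINING)

print(classify(["fire", "smoke", "help"], CENTROIDS))
```

In a realistic setting the toy dictionary would be replaced by vectors from a trained model, and the nearest-centroid rule by whichever classifier is being evaluated.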
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.