A Machine Learning System for Detection and Analysis of Sexist Content in Urban Music

Dany Pianchiche-Anapa
Pablo Pico-Valencia
Juan A. Holgado-Terriza
Abstract

This paper describes the development of an automatic classifier that evaluates and categorizes the level of sexism in the lyrics of urban music songs. The classifier assigns lyrics to one of three categories: "A", content suitable for audiences of all ages; "B", content requiring adult supervision; and "C", adult-oriented material. It was implemented in Python using five algorithms: Naïve Bayes, nearest neighbours, decision tree, support vector machine and logistic regression. To train the models, a dataset of 479 observations was created and split into 75% for training and 25% for testing. The training data included expressions both with and without sexist connotations. The most accurate model was the one based on logistic regression, with 77% accuracy. To ease use of the classifier in production environments, the model was integrated with a graphical user interface that makes the system accessible to potential beneficiaries.
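
The workflow summarized in the abstract (a 75%/25% split followed by training a text classifier over the categories "A", "B" and "C") can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the library, the feature extraction, or the rounding used for the split. The sketch covers only one of the five listed algorithms, a multinomial Naïve Bayes written from scratch with the Python standard library. The names `split_75_25`, `NaiveBayes` and `accuracy` are hypothetical, and the tokens in `toy` are invented placeholders, not data from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def split_75_25(items, seed=0):
    """Shuffle a copy of items and split it into 75% train / 25% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)  # assumption: truncate the fractional part
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        # Tokens never seen during training carry no evidence, so skip them.
        tokens = [w for w in text.lower().split() if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the category
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, pairs):
    """Fraction of (text, label) pairs that the model labels correctly."""
    hits = sum(model.predict(text) == label for text, label in pairs)
    return hits / len(pairs)


# Split sizes for a dataset of 479 observations (indices only, no real data).
train_idx, test_idx = split_75_25(range(479))

# Invented placeholder tokens standing in for lyric fragments.
toy = [
    ("safe1 safe2", "A"), ("safe2 safe3", "A"),
    ("mild1 mild2", "B"), ("mild2 mild3", "B"),
    ("adult1 adult2", "C"), ("adult2 adult3", "C"),
]
model = NaiveBayes().fit([t for t, _ in toy], [l for _, l in toy])
```

With truncation, the 479 observations split into 359 for training and 120 for testing. Calling `model.predict("adult2")` on the toy model returns the category whose placeholder tokens best match the input. A realistic comparison across the five algorithms would fit each one on the same training portion and report `accuracy` on the held-out portion.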

How to Cite
Pianchiche-Anapa, D., Pico-Valencia, P., & Holgado-Terriza, J. A. (2024). A Machine Learning System for Detection and Analysis of Sexist Content in Urban Music. Revista Tecnológica - ESPOL, 36(1), 68-80. https://doi.org/10.37815/rte.v36n1.1088
