Continuous evaluation of solar photovoltaic (PV) plants is essential for their operation: the main variables must be monitored to verify that electrical energy is delivered under optimal operating and efficiency conditions. This research presents a data science methodology for evaluating solar photovoltaic plants. The methodology was applied to the data set of a solar plant of the US National Renewable Energy Laboratory, analyzing the data to obtain temporal curves of irradiance and energy, as well as the main performance indicators. The study also used the K-Means algorithm to generate clusters within the data set and the K-NN algorithm to build class prediction models for the energy and for the performance ratio (PR) indicator. Clusters grouping the generated power values and the PR values were obtained. The classification model for the energy class reached an accuracy of 91.67%, while the classification model for the PR class reached an accuracy of 83.33%. Since the average soiling rate on the monthly and annual scales was above 90%, while the average PR was around 70%, a study is recommended to determine the origin of the losses in the plant. It is also suggested that a model be developed to determine the impact of ambient temperature, PV module temperature, and wind speed on electric power production.
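As a rough illustration of two steps named above, the sketch below computes a performance ratio and predicts an energy class with a small K-NN vote. It is not the study's implementation: the abstract gives no formulas or code, so the PR expression used here is the common definition (final yield divided by reference yield), and every name, feature, and number in the block is a hypothetical placeholder.

```python
from collections import Counter
from math import dist

G_STC = 1.0  # kW/m^2, reference irradiance at standard test conditions


def performance_ratio(energy_kwh, irradiation_kwh_m2, rated_power_kw):
    """Common PR definition: final yield divided by reference yield."""
    final_yield = energy_kwh / rated_power_kw        # hours at rated power
    reference_yield = irradiation_kwh_m2 / G_STC     # peak sun hours
    return final_yield / reference_yield


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical monthly values for an assumed 10 kW plant (not from the paper).
pr = performance_ratio(energy_kwh=1050, irradiation_kwh_m2=150, rated_power_kw=10)

# Hypothetical features: (monthly irradiation in kWh/m^2, module temperature in C).
train_points = [(90, 20), (100, 22), (160, 30), (170, 32)]
train_labels = ["low", "low", "high", "high"]
predicted_class = knn_predict(train_points, train_labels, query=(155, 29), k=3)

print(f"PR = {pr:.2f}, predicted energy class = {predicted_class}")
```

In practice the features would normally be scaled before computing distances, and a library implementation of K-Means and K-NN would replace the hand-written vote; the from-scratch version is used here only to keep the example self-contained.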

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

