Title |
Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data |
Authors |
Perera-Lago J., Toscano-Duran V., PALUZO HIDALGO, EDUARDO, Narteni S., Rucco M. |
External publication |
No |
Source |
Commun. Comput. Info. Sci. |
Scope |
Conference Paper |
Nature |
Scientific |
SJR Quartile |
4 |
Web |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200676903&doi=10.1007%2f978-3-031-63803-9_21&partnerID=40&md5=70e33ab33c7cbb217e59d09da795d9d2 |
Publication date |
01/01/2024 |
Scopus Id |
2-s2.0-85200676903 |
DOI |
10.1007/978-3-031-63803-9_21 |
Abstract |
Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence architecture. In this domain, the imperative role of representative datasets is a cornerstone in shaping the trajectory of artificial intelligence (AI) development. Representative datasets are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model’s complexity, power, and uncertainties. In this paper, we investigate the reliability of the ε-representativeness method to assess the dataset similarity from a theoretical perspective for decision trees. We decided to focus on the family of decision trees because it includes a wide variety of models known to be explainable. Thus, in this paper, we provide a result guaranteeing that if two datasets are related by ε-representativeness, i.e., both of them have points closer than ε, then the predictions by the classic decision tree are similar. Experimentally, we have also tested that ε-representativeness presents a significant correlation with the ordering of the feature importance. Moreover, we extend the results experimentally in the context of unseen vehicle collision data for XGBoost, a machine-learning component widely adopted for dealing with tabular data. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. |
Keywords |
Learning algorithms; Learning systems; Machine components; Machine learning; Feature importance; Fundamental component; Machine learning algorithms; Machine-learning; Measure approach; Multiple impact; Power; Representativeness; Vehicles collision; Xgboost; Decision trees |
Universidad Loyola members |
|