Item
Open Access

Data driven initialization for machine learning classification models

dc.contributor.advisor: Caicedo Dorado, Alexander
dc.creator: López Jaimes, David Santiago
dc.creator.degree: Profesional en Matemáticas Aplicadas y Ciencias de la Computación
dc.creator.degreeLevel: Pregrado
dc.creator.degreetype: Full time
dc.date.accessioned: 2022-08-22T19:50:57Z
dc.date.available: 2022-08-22T19:50:57Z
dc.date.created: 2022-05-08
dc.description: The main objective of this undergraduate thesis is to develop a strategy for initializing the parameters θ of logistic regression (a linear classifier), multinomial logistic regression, and classical fully connected feed-forward neural networks. The initialization is based on properties of the statistical distribution of the data on which the models are trained, with the aim of starting the model in a more suitable region of the cost function so that it can improve its convergence rate and produce better results in shorter training times. The thesis presents an intuitive and mathematical explanation of the proposed initialization schemes, and contrasts the theoretical development with a benchmark that uses different datasets, including toy examples. It also presents an analysis of these results, and discusses the limitations of the proposals and the future work that can be derived from this study.
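As an illustration only, here is a minimal NumPy sketch of one plausible reading of this strategy for binary logistic regression, assuming the characteristic vector is the class mean, the weight vector is taken as the difference of the two class means, and the bias places the initial decision boundary at their midpoint; the function name and the exact formula are ours, not fixed by this record:

    import numpy as np

    def mean_based_init(X, y):
        # Hypothetical data-driven initialization: point the weight
        # vector from the mean of class 0 toward the mean of class 1,
        # and set the bias so the initial boundary passes through the
        # midpoint between the two class means.
        mu0 = X[y == 0].mean(axis=0)   # characteristic vector, class 0
        mu1 = X[y == 1].mean(axis=0)   # characteristic vector, class 1
        w = mu1 - mu0                  # normal to the initial boundary
        b = -w @ (mu0 + mu1) / 2.0     # zero activation at the midpoint
        return w, b                    # starting point for gradient descent

Started this way, gradient descent begins at a boundary oriented between the two class distributions rather than at a random one, which is the sense in which the model starts in a more suitable region of the cost function.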
dc.description.abstract: Thanks to the great advances in technology, the increase in computational resources, and the strong impact that the era of "big data" has had on society, artificial intelligence has become a heavily studied and widely used area. Machine learning is a branch of artificial intelligence whose main objective is to build models that are capable of learning from a set of data without the need to be explicitly programmed. These models use tools from different branches of mathematics, such as statistics and linear algebra, to identify patterns and relationships in a set of data. In particular, machine learning allows us to build models that can classify a set of data based on its intrinsic characteristics and its relationship with a target variable. These models are widely used in real-life problems, such as classifying a bank transaction as malicious or normal, determining with a certain probability whether a tumor is malignant or benign, and estimating a person's credit risk, among others. Most of these classification models learn through gradient descent or its variants, an iterative algorithm that finds the parameters θ of the model that minimize a cost function and allow an adequate classification. These parameters are usually initialized randomly. However, there are several limitations when training these models. Real data lives in considerably large dimensions, and it is difficult to know the shape of the cost surface it generates. This causes the models to require a lot of care and a large amount of time and computational resources for their training. On the other hand, because in most cases the cost function is not convex, as normally happens in neural networks, random weight initialization can make the algorithm stall because it started in a flat region of the cost function, or start in a very rough region and fail to converge to an appropriate minimum.

This is why the present study aims to propose an initialization strategy for classification problems that starts the models in an appropriate region of the cost function, in order to improve their convergence rate and produce better results in faster training times. We propose a new deterministic initialization strategy for logistic regression (a linear classifier), multinomial logistic regression, and classical fully connected feed-forward neural networks for classification problems. The strategy is based on the properties of the statistical distribution of the data on which the models are trained. For logistic regression and multinomial logistic regression, we propose to initialize the models with a characteristic vector of the data distribution of each class, such as its mean or median. For fully connected feed-forward neural networks, we propose to use prototype data from each of the classes. These prototype data are not the most representative data of the entire class distribution; rather, they are data that map and linearize the separation boundary with the other classes.

A benchmark for the initialization proposal was run on various real classification datasets from the UCI and Kaggle repositories, and the proposed initializations were also tested on different toy examples. For logistic regression, we compared the behavior of the model using random initialization and using the proposed initialization.
For fully connected feed-forward neural networks, we compared the behavior of the networks using the proposed initialization and the state-of-the-art initializations for these models, Xavier's and He's initializations. In both cases, we were able to successfully initialize the models, reducing the required training time and making the learning algorithm start in a better region of the cost function. In this way, we propose new initialization strategies for multinomial logistic regression and neural network models for classification problems. The logistic regression initialization is based on statistical estimators of the data distribution and on distance metrics, particularly the mean and the Euclidean distance between different scalar products. The neural network strategy is based on linearizing the decision boundary using prototype data from each of the classes. We have seen that our approach works very well on all the tested datasets, considerably reducing the computational resources required to train these models and increasing their performance.
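For context, Xavier's and He's initializations mentioned above are standard random schemes; a short NumPy sketch of their usual normal-distribution forms (the function names are ours):

    import numpy as np

    rng = np.random.default_rng(0)

    def xavier_init(fan_in, fan_out):
        # Glorot/Xavier: variance 2 / (fan_in + fan_out),
        # derived for tanh- or sigmoid-like activations.
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return rng.normal(0.0, std, size=(fan_in, fan_out))

    def he_init(fan_in, fan_out):
        # He et al.: variance 2 / fan_in, derived for ReLU activations.
        std = np.sqrt(2.0 / fan_in)
        return rng.normal(0.0, std, size=(fan_in, fan_out))

    W1 = xavier_init(64, 32)   # e.g. weights of a 64-to-32 hidden layer
    W2 = he_init(32, 10)       # e.g. weights of a 32-to-10 output layer

Both schemes scale random weights by the layer's fan-in (and fan-out) to keep activation variances stable; the thesis's proposal instead replaces the randomness with deterministic, per-class prototype data.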
dc.format.extent: 94 pp
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.48713/10336_34737
dc.identifier.uri: https://repository.urosario.edu.co/handle/10336/34737
dc.language.iso: eng
dc.publisher: Universidad del Rosario
dc.publisher.department: Escuela de Ingeniería, Ciencia y Tecnología
dc.publisher.program: Programa de Matemáticas Aplicadas y Ciencias de la Computación - MACC
dc.rights: Atribución-NoComercial-CompartirIgual 2.5 Colombia
dc.rights.accesRights: info:eu-repo/semantics/openAccess
dc.rights.acceso: Abierto (Texto Completo)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/2.5/co/
dc.source.bibliographicCitation: F. Luis and G. Moncayo, No Title, Third, T. Dietterich, C. Bishop, D. Heckerman, M. Jordan, and M. Kearns, Eds., ISBN: 9788490225370
dc.source.bibliographicCitation: J. G. Carbonell, R. S. Michalski, and T. M. Mitchell, "Machine Learning: A Historical and Methodological Analysis," AI Mag., vol. 4, no. 3, 1983.
dc.source.bibliographicCitation: R. Sathya and A. Abraham, "Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification," International Journal of Advanced Research in Artificial Intelligence, vol. 2, no. 2, 2013, ISSN: 21654050. DOI: 10.14569/ijarai.2013.020206.
dc.source.bibliographicCitation: L. M. Castro Heredia, Y. Carvajal Escobar, and Á. J. Ávila Díaz, "Análisis Clúster como Técnica de Análisis Exploratorio de Registros Múltiples en Datos Meteorológicos," Ingeniería de Recursos Naturales y del Ambiente, vol. enero-diciembre, no. 11, pp. 11–20, 2012.
dc.source.bibliographicCitation: H. Alashwal, M. El Halaby, J. J. Crouse, A. Abdalla, and A. A. Moustafa, "The application of unsupervised clustering methods to Alzheimer's disease," Frontiers in Computational Neuroscience, vol. 13, no. May, pp. 1–9, 2019, ISSN: 16625188. DOI: 10.3389/fncom.2019.00031.
dc.source.bibliographicCitation: D. N. Wagner, "Economic patterns in a world with artificial intelligence," Evolutionary and Institutional Economics Review, vol. 17, no. 1, pp. 111–131, 2020, ISSN: 1349-4961. DOI: 10.1007/s40844-019-00157-x. [Online]. Available: https://doi.org/10.1007/s40844-019-00157-x.
dc.source.bibliographicCitation: Z. Wang, Q. Wang, and D. W. Wang, "Bayesian network based business information retrieval model," Knowledge and Information Systems, vol. 20, no. 1, pp. 63–79, 2009, ISSN: 02193116. DOI: 10.1007/s10115-008-0151-5.
dc.source.bibliographicCitation: C. Lemaréchal, "Cauchy and the Gradient Method," Documenta Mathematica, vol. ISMP, pp. 251–254, 2012. [Online]. Available: https://www.math.uni-bielefeld.de/documenta/vol-ismp/40_lemarechal-claude.pdf.
dc.source.bibliographicCitation: S. Ruder, "An overview of gradient descent optimization algorithms," pp. 1–14, 2016. arXiv: 1609.04747. [Online]. Available: http://arxiv.org/abs/1609.04747.
dc.source.bibliographicCitation: A. Lydia and S. Francis, "Adagrad - an optimizer for stochastic gradient descent," vol. 6, pp. 566–568, May 2019.
dc.source.bibliographicCitation: H. Shaziya, "A study of the optimization algorithms in deep learning," Mar. 2020. DOI: 10.1109/ICISC44355.2019.9036442.
dc.source.bibliographicCitation: D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014. DOI: 10.48550/ARXIV.1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980.
dc.source.bibliographicCitation: Z. Liu, H. Wang, L. Weng, and Y. Yang, "Ship Rotated Bounding Box Space for Ship Extraction from High-Resolution Optical Satellite Images with Complex Backgrounds," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 8, pp. 1074–1078, 2016, ISSN: 1545598X. DOI: 10.1109/LGRS.2016.2565705.
dc.source.bibliographicCitation: S. K. Kumar, "On weight initialization in deep neural networks," pp. 1–9, 2017. arXiv: 1704.08863. [Online]. Available: http://arxiv.org/abs/1704.08863.
dc.source.bibliographicCitation: K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 Inter, pp. 1026–1034, 2015, ISSN: 15505499. DOI: 10.1109/ICCV.2015.123. arXiv: 1502.01852.
dc.source.bibliographicCitation: H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, "Visualizing the loss landscape of neural nets," Advances in Neural Information Processing Systems, vol. 2018-December, no. NeurIPS 2018, pp. 6389–6399, 2018, ISSN: 10495258. arXiv: 1712.09913.
dc.source.bibliographicCitation: E. Hurtado Dianderas, "Modelo De Regresión Logística," Gestión en el Tercer Milenio, vol. 10, no. 20, pp. 25–27, 2007, ISSN: 1560-9081. DOI: 10.15381/gtm.v10i20.9059.
dc.source.bibliographicCitation: V. F. Dr. Vladimir, No Title No Title No Title, 69. 1967, vol. 1, pp. 5–24, ISBN: 9781107057135.
dc.source.bibliographicCitation: A. Field, "Logistic regression," Discovering Statistics Using SPSS, pp. 731–735, 2012.
dc.source.bibliographicCitation: L. Camarero Rioja, A. Almazán Llorente, and B. Mañas Ramírez, "Regresión Logística: Fundamentos y aplicación a la investigación sociológica," Análisis Multivariante, p. 61, 2011. [Online]. Available: https://www2.uned.es/socioestadistica/Multivariante/Odd_Ratio_LogitV2.pdf.
dc.source.bibliographicCitation: G. Zurek, M. Blach, P. Giedziun, J. Czakon, L. Fulawka, and L. Halo, "Brain tumor classification using logistic regression and linear support vector classifier," no. September 2016, pp. 1–4, 2015.
dc.source.bibliographicCitation: Ö. Çokluk, "Logistic regression: Concept and application," Kuram ve Uygulamada Egitim Bilimleri, vol. 10, no. 3, pp. 1397–1407, 2010, ISSN: 13030485.
dc.source.bibliographicCitation: M. A. Aljarrah, F. Famoye, and C. Lee, "Generalized logistic distribution and its regression model," Journal of Statistical Distributions and Applications, vol. 7, no. 1, 2020, ISSN: 21955832. DOI: 10.1186/s40488-020-00107-8.
dc.source.bibliographicCitation: M. K. Cain, Z. Zhang, and K. H. Yuan, "Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation," Behavior Research Methods, vol. 49, no. 5, pp. 1716–1735, 2017, ISSN: 15543528. DOI: 10.3758/s13428-016-0814-1.
dc.source.bibliographicCitation: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Series in Statistics, 2009. [Online]. Available: http://www.springerlink.com/index/D7X7KX6772HQ2135.pdf.
dc.source.bibliographicCitation: P. Xu, F. Davoine, T. Denoeux, et al., "Evidential multinomial logistic regression for multiclass classifier calibration," Université de Technologie de Compiègne, 2016. HAL: hal-01271569.
dc.source.bibliographicCitation: R. Rifkin and A. Klautau, "In Defense of One-Vs-All Classification," Journal of Machine Learning Research, vol. 5, pp. 101–141, 2004.
dc.source.bibliographicCitation: N. S. Themudo, "Emanuela Bozzini and Bernard Enjolras (Eds.), Governing Ambiguities: New Forms of Local Governance and Civil Society," Journal of Comparative Policy Analysis: Research and Practice, vol. 15, no. 5, pp. 476–477, 2013, ISSN: 1387-6988. DOI: 10.1080/13876988.2013.846961.
dc.source.bibliographicCitation: N. Psy and C. D. Analysis, "Total Categories of the Outcome, Indexed By the Subscript," Spring, pp. 1–5, 2021.
dc.source.bibliographicCitation: T. M. Mitchell, "The Need for Biases in Learning Generalizations," Computer Science Department, Rutgers University, New Brunswick, NJ, Tech. Rep. CBM-TR-117, May 1980.
dc.source.bibliographicCitation: P. T. Pregnancy, T. I. Fertility, and F. Growth, "And Development And Development," Learning, vol. 50, no. 2011, pp. 681–730, 2005. DOI: 10.1016/j.cell.2017.06.036. [Online]. Available: http://tailieudientu.lrc.tnu.edu.vn/Upload/Collection/brief/brief_49491_54583_TN201500606.pdf.
dc.source.bibliographicCitation: F. Amenta, D. Zaccheo, and W. L. Collier, "Neurotransmitters, neuroreceptors and aging," Mechanisms of Ageing and Development, vol. 61, no. 3, pp. 249–273, 1991, ISSN: 0047-6374. DOI: 10.1016/0047-6374(91)90059-9. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0047637491900599.
dc.source.bibliographicCitation: C. Mavridis and J. Baras, "Towards the One Learning Algorithm Hypothesis: A System-theoretic Approach," 2021. arXiv: 2112.02256. [Online]. Available: http://arxiv.org/abs/2112.02256.
dc.source.bibliographicCitation: R. Matsumura, K. Harada, Y. Domae, and W. Wan, "Learning based industrial bin-picking trained with approximate physics simulator," Advances in Intelligent Systems and Computing, vol. 867, pp. 786–798, 2019, ISSN: 21945357. DOI: 10.1007/978-3-030-01370-7_61. arXiv: 1805.08936.
dc.source.bibliographicCitation: T. Zamora Zumbado, "McCulloch-Pitts Artificial Neuron and Rosenblatt's Perceptron," pp. 16–29.
dc.source.bibliographicCitation: S. Haykin, "Rosenblatt's Perceptron," Neural Networks and Learning Machines, pp. 47–67, 2009.
dc.source.bibliographicCitation: D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
dc.source.bibliographicCitation: B. Cheng, R. Xiao, Y. Guo, Y. Hu, J. Wang, and L. Zhang, "Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition," 2018. arXiv: 1809.06131. [Online]. Available: http://arxiv.org/abs/1809.06131.
dc.source.bibliographicCitation: P. Krähenbühl, C. Doersch, J. Donahue, and T. Darrell, "Data-dependent initializations of convolutional neural networks," 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings, 2016. arXiv: 1511.06856.
dc.source.bibliographicCitation: J. W. Grzymala-Busse, Z. S. Hippe, and T. Mroczek, "Reduced data sets and entropy-based discretization," Entropy, vol. 21, no. 11, pp. 1–11, 2019, ISSN: 10994300. DOI: 10.3390/e21111051.
dc.source.bibliographicCitation: S. Parashar, N. Fateh, V. Pediroda, and C. Poloni, "Self organizing maps (SOM) for design selection in multi-objective optimization using modeFRONTIER," SAE Technical Papers, no. August 2016, 2008, ISSN: 26883627. DOI: 10.4271/2008-01-0874.
dc.source.bibliographicCitation: D. Reddy and P. K. Jana, "A new clustering algorithm based on Voronoi diagram," International Journal of Data Mining, Modelling and Management, vol. 6, no. 1, pp. 49–64, 2014, ISSN: 17591171. DOI: 10.1504/IJDMMM.2014.059977.
dc.source.bibliographicCitation: T. H. Sardar and Z. Ansari, "Partition based clustering of large datasets using MapReduce framework: An analysis of recent themes and directions," Future Computing and Informatics Journal, vol. 3, no. 2, pp. 247–261, 2018, ISSN: 23147288. DOI: 10.1016/j.fcij.2018.06.002. [Online]. Available: https://doi.org/10.1016/j.fcij.2018.06.002.
dc.source.bibliographicCitation: O. Jafari, P. Maurya, P. Nagarkar, K. M. Islam, and C. Crushev, "A Survey on Locality Sensitive Hashing Algorithms and their Applications," ACM Computing Surveys, no. April, pp. 0–23, 2021. arXiv: 2102.08942. [Online]. Available: http://arxiv.org/abs/2102.08942.
dc.source.bibliographicCitation: C. R. Harris, K. J. Millman, S. J. van der Walt, et al., "Array programming with NumPy," Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020. DOI: 10.1038/s41586-020-2649-2. [Online]. Available: https://doi.org/10.1038/s41586-020-2649-2.
dc.source.bibliographicCitation: J. D. Hunter, "Matplotlib: A 2d graphics environment," Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007. DOI: 10.1109/MCSE.2007.55.
dc.source.bibliographicCitation: The pandas development team, Pandas-dev/pandas: Pandas, version latest, Feb. 2020. DOI: 10.5281/zenodo.3509134. [Online]. Available: https://doi.org/10.5281/zenodo.3509134.
dc.source.bibliographicCitation: M. L. Waskom, "Seaborn: Statistical data visualization," Journal of Open Source Software, vol. 6, no. 60, p. 3021, 2021. DOI: 10.21105/joss.03021. [Online]. Available: https://doi.org/10.21105/joss.03021.
dc.source.bibliographicCitation: F. Pedregosa, G. Varoquaux, A. Gramfort, et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
dc.source.bibliographicCitation: D. Dua and C. Graff, UCI machine learning repository, 2017. [Online]. Available: http://archive.ics.uci.edu/ml.
dc.source.bibliographicCitation: I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror, Result analysis of the NIPS 2003 feature selection challenge, 2004.
dc.source.instname: instname:Universidad del Rosario
dc.source.reponame: reponame:Repositorio Institucional EdocUR
dc.subject: Redes Neuronales
dc.subject: Regresión Logística
dc.subject: Gradiente Descendiente
dc.subject: Parámetros de un Modelo de Clasificación
dc.subject: Vectores Característicos
dc.subject: Distribución de las Clases
dc.subject.ddc: Matemáticas
dc.subject.keyword: Neural Networks
dc.subject.keyword: Logistic Regression
dc.subject.keyword: Characteristic Vectors
dc.subject.keyword: Classes Distributions
dc.subject.keyword: Classification Models Parameters
dc.title: Data driven initialization for machine learning classification models
dc.type: bachelorThesis
dc.type.document: Análisis de caso
dc.type.hasVersion: info:eu-repo/semantics/acceptedVersion
dc.type.spa: Trabajo de grado
Files
Original bundle
Name: LopezJaimes-DavidSantiago-2022.pdf
Size: 9.14 MB
Format: Adobe Portable Document Format
Description: Thesis