Comparative Analysis of Gradient-Based Optimizers in Feedforward Neural Networks for Titanic Survival Prediction

  • I Putu Adi Pratama, UHN IGB Sugriwa Denpasar
  • Ni Wayan Jeri Kusama Dewi, Institut Bisnis dan Teknologi Indonesia

Keywords: Feedforward Neural Networks (FNNs), Gradient-Based Optimization Algorithms, Learning Rate Scheduler, Titanic Survival Prediction, Binary Classification

Abstract

Introduction: Feedforward Neural Networks (FNNs), also known as Multilayer Perceptrons (MLPs), are widely recognized for their capacity to model complex nonlinear relationships. This study evaluates the performance of various gradient-based optimization algorithms in training FNNs for Titanic survival prediction, a binary classification task on structured tabular data.

Methods: The Titanic dataset, consisting of 891 passenger records, was preprocessed via feature selection, encoding, and normalization. Three FNN architectures, small ([64, 32, 16]), medium ([128, 64, 32]), and large ([256, 128, 64]), were trained using eight gradient-based optimizers: batch gradient descent (BGD), stochastic gradient descent (SGD), mini-batch gradient descent, Nesterov accelerated gradient (NAG), Heavy Ball momentum, Adam, RMSprop, and Nadam. Regularization techniques (dropout and an L2 penalty) were applied, along with batch normalization and Leaky ReLU activation. Training was conducted with and without a dynamic learning rate scheduler, and model performance was evaluated using accuracy, precision, recall, F1-score, and cross-entropy loss.

Results: The Adam optimizer combined with the medium architecture achieved the highest accuracy, 82.68%, and an F1-score of 0.77 when using a learning rate scheduler. RMSprop and Nadam also performed competitively. Models trained without a learning rate scheduler generally showed reduced performance and slower convergence. Smaller architectures trained faster but yielded lower accuracy, while larger architectures offered marginal gains at the cost of computational efficiency.

Conclusions: Adam demonstrated superior performance among the tested optimizers, especially when coupled with learning rate scheduling. These findings highlight the importance of optimizer choice and learning rate adaptation in enhancing FNN performance on tabular datasets. Future research should explore additional architectures and optimization strategies for broader generalizability.
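For readers who want a concrete starting point, the sketch below shows how the medium architecture and training setup described in the abstract could be assembled. It assumes a Keras implementation, which the paper does not specify; the layer sizes and the pairing of Adam with a plateau-based learning rate scheduler follow the abstract, while the dropout rate, L2 strength, learning rate, and feature count are illustrative placeholders rather than the authors' exact settings.

```python
# Illustrative sketch of the "medium" FNN ([128, 64, 32]) from the abstract.
# Assumes a Keras implementation; dropout rate, L2 strength, learning rate,
# and scheduler settings are placeholder values, not the authors' exact ones.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_medium_fnn(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_features,))])
    for units in (128, 64, 32):
        # Dense layer with an L2 penalty, followed by batch normalization,
        # Leaky ReLU activation, and dropout, as described in the abstract.
        model.add(layers.Dense(units, kernel_regularizer=regularizers.l2(1e-4)))
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU())
        model.add(layers.Dropout(0.3))
    model.add(layers.Dense(1, activation="sigmoid"))  # survived / did not survive
    return model

model = build_medium_fnn(n_features=8)  # feature count after preprocessing (assumed)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",  # cross-entropy loss, as in the paper
    metrics=["accuracy"],
)

# One common form of dynamic learning rate scheduling: halve the learning
# rate when validation loss plateaus. The abstract does not say which
# scheduler the authors used.
scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5
)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100,
#           batch_size=32, callbacks=[scheduler])
```

Fitting once with the scheduler callback and once without it reproduces the two training conditions the abstract compares.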


Published: 2025-03-31

How to Cite: Adi Pratama, I. P., & Ni Wayan Jeri Kusama Dewi. (2025). Comparative Analysis of Gradient-Based Optimizers in Feedforward Neural Networks for Titanic Survival Prediction. Indonesian Journal of Data and Science, 6(1), 90-102. https://doi.org/10.56705/ijodas.v6i1.219