LARGE LANGUAGE MODELS: COMPARISON OF CROSS-ENTROPY AND BINARY CROSS-ENTROPY LOSS

Ilmars Apeinans; Sergejs Kodors; Imants Zarembo

doi:10.17770/het2024.28.8251

Authors

Ilmars Apeinans Rezekne Academy of Technologies (LV)
Dr.sc.ing. Sergejs Kodors Rezekne Academy of Technologies (LV)
Dr.sc.ing. Imants Zarembo Rezekne Academy of Technologies (LV)

DOI:

https://doi.org/10.17770/het2024.28.8251

Keywords:

artificial intellect, classifier, large language models, machine learning

Abstract

The paper explores training Large Language Models (LLMs) on custom datasets for the development of classification microservices. Training general-purpose models for every possible situation is not feasible at a smaller scale because of limited computational power, so training a smaller model architecture, such as NanoGPT, for a specific use case is a more cost-effective solution. In the experiment, the “Internet Movie Database (IMDB)” dataset, which contains user comments about movies, is used for LLM training. Two training criteria, Cross-entropy Loss (CELoss) and Binary Cross-entropy Loss (BCELoss), were compared. Validation accuracy was 85.84% with CELoss and 86.1% with BCELoss. The biggest difference was in the consistency of results: the distance between minimal and maximal accuracy was 2.36% for CELoss but 1.04% for BCELoss, so BCELoss provided more stable accuracy.
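For readers unfamiliar with the two criteria, the following minimal sketch shows how they relate for a two-class task such as positive/negative review classification. The abstract does not describe the model head or the exact loss configuration used in the experiment, so everything here is an assumption for illustration: the function names, the example logits, and the choice of a two-logit softmax head for CELoss versus a single-logit sigmoid head for BCELoss. It uses only the Python standard library.

```python
import math


def cross_entropy_two_class(logits, target):
    """Softmax cross-entropy for one sample with two class logits.

    `target` is the index of the true class (0 or 1).
    """
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy(logit, target):
    """Sigmoid binary cross-entropy for one sample with a single logit.

    `target` is 0 or 1. Uses the numerically stable form
    max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Hypothetical logits for one review: [negative, positive]
logits = [0.3, 1.7]
label = 1  # hypothetical ground truth: positive

ce = cross_entropy_two_class(logits, label)
# A single-logit head that outputs the difference of the two logits
# describes the same probability for the positive class.
bce = binary_cross_entropy(logits[1] - logits[0], label)

print(f"CE = {ce:.6f}, BCE = {bce:.6f}")
```

For a single sample the two values coincide when the binary logit equals the difference of the two class logits, so any difference in accuracy or stability between the criteria would have to come from how each head is parameterized and optimized during training, not from the loss value itself.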

Supporting Agencies

This research is funded by the Latvian Council of Science, project “Testing Interventions and Developing a Knowledge-based Recommendation System to Reduce Plate Waste in School Catering in Latvia”, project No. lzp-2022/1-0492.



Online ISSN: 2592-8597

© RTU Rezekne Academy