LARGE LANGUAGE MODELS: COMPARISON OF CROSS-ENTROPY AND BINARY CROSS-ENTROPY LOSS
DOI: https://doi.org/10.17770/het2024.28.8251
Keywords: artificial intellect, classifier, large language models, machine learning
Abstract
The paper explores training Large Language Models (LLMs) on custom datasets to develop a classification microservice. Limited computational power makes it infeasible to train general-purpose models for every possible situation at a smaller scale. For a specific use case, training an LLM with a smaller model architecture such as NanoGPT is therefore more cost-effective. In this article, the Internet Movie Database (IMDB) dataset, which contains user comments about movies, is used for LLM training. The experiment compared two training criteria: Cross-entropy Loss (CELoss) and Binary Cross-entropy Loss (BCELoss). Validation accuracy was 85.84% with CELoss and 86.1% with BCELoss. The largest difference was in the consistency of the results. The range between minimum and maximum accuracy was 2.36 percentage points for CELoss but only 1.04 percentage points for BCELoss, so BCELoss gave more stable accuracy.
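To make the comparison concrete, the sketch below contrasts the two criteria for a binary (positive/negative review) label. It is a minimal illustration and makes several assumptions. It assumes CELoss is applied to a two-logit softmax head and BCELoss to a single-logit sigmoid head, which is how the two losses are commonly used for binary labels. The abstract does not state the paper's exact output-head configuration, and the function names here are hypothetical. The sketch shows that, for one example, cross-entropy over two logits `[a, b]` equals binary cross-entropy on the single logit `b - a`. Any difference between the losses in training would therefore come from how the output layer is parameterized and optimized, not from the per-example loss formula itself.

```python
import math


def softmax_cross_entropy(logits, target):
    """Hypothetical helper: cross-entropy for a two-class head.

    logits = [z_negative, z_positive]; target is 0 or 1.
    Computes -log softmax(logits)[target].
    """
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


def binary_cross_entropy_with_logit(logit, target):
    """Hypothetical helper: binary cross-entropy on a single logit.

    p = sigmoid(logit); target is 0 or 1.
    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    a, b = 0.3, 1.5  # hypothetical logits for [negative, positive]
    for y in (0, 1):
        ce = softmax_cross_entropy([a, b], y)
        bce = binary_cross_entropy_with_logit(b - a, y)
        # Both values agree up to floating-point rounding.
        print(y, ce, bce)
```

The two-logit head has one redundant degree of freedom, because only the difference `b - a` affects the loss. The single-logit head removes that redundancy. This is one plausible reason the two setups can behave differently during training even though their per-example losses coincide.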