Meta AI released Llama 3 on April 18, 2024, claiming it to be the most advanced openly available LLM to date. Shortly afterward, on May 13, 2024, OpenAI unveiled GPT-4o, positioned as its flagship proprietary model and a strong performer across a range of NLP benchmarks.
To compare open-source and proprietary models head to head, both were put through a basic zero-shot text classification task; the findings are detailed in this analysis.
The experiment tested both models on a dataset of public tweets expressing sentiment towards various US airlines. Before testing, the data was preprocessed into a balanced selection of neutral, positive, and negative tweets so that accuracy could be compared fairly.
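The preprocessing step is not shown in the original write-up; a minimal sketch might look like the following, assuming the public Kaggle release of the airline-tweet dataset (a `Tweets.csv` file with `text` and `airline_sentiment` columns) and an arbitrary per-class sample size:

```python
import pandas as pd

# Load the airline-tweet dataset (file and column names follow the public
# Kaggle "Twitter US Airline Sentiment" release; adjust if your copy differs).
df = pd.read_csv("Tweets.csv")

# Sample an equal number of tweets per class so accuracy is not skewed by
# the dataset's heavy negative bias. The per-class count here is assumed.
SAMPLES_PER_CLASS = 100
balanced = (
    df.groupby("airline_sentiment", group_keys=False)
      .apply(lambda g: g.sample(SAMPLES_PER_CLASS, random_state=42))
      .reset_index(drop=True)[["text", "airline_sentiment"]]
)
```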
First, OpenAI's GPT-4o was used to predict the sentiment of each tweet, reaching 78% accuracy in a processing time of just 57.8 seconds.
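The exact prompt is not given in the article; a sketch of the zero-shot call with OpenAI's Python SDK, continuing from the balanced sample above, could look like this (the prompt wording and temperature setting are assumptions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_with_gpt4o(tweet: str) -> str:
    """Zero-shot sentiment prediction for a single tweet."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the tweet as exactly one "
                        "word: positive, negative, or neutral."},
            {"role": "user", "content": tweet},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# Accuracy is the fraction of predictions matching the gold labels.
preds = [classify_with_gpt4o(t) for t in balanced["text"]]
accuracy = sum(p == y for p, y in zip(preds, balanced["airline_sentiment"])) / len(preds)
```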
Next, Meta AI's Llama 3, accessed through the Groq API, was given the same task. Surprisingly, the run was significantly slower, taking 4 minutes and 14 seconds, while landing at the identical accuracy of 78%.
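Groq's Python SDK mirrors the OpenAI chat interface, so the same loop only needs a different client and model ID. A comparable sketch, with the model ID assumed to be Groq's Llama 3 70B endpoint (the article does not say which variant was used), might look like this:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def classify_with_llama3(tweet: str) -> str:
    """Same zero-shot prompt, served by Llama 3 on Groq."""
    response = client.chat.completions.create(
        model="llama3-70b-8192",  # assumed Groq model ID for Llama 3 70B
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the tweet as exactly one "
                        "word: positive, negative, or neutral."},
            {"role": "user", "content": tweet},
        ],
    )
    return response.choices[0].message.content.strip().lower()
```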
The comparison shed light on various aspects of the two models:
– Accuracy: GPT-4o and Llama 3 achieved the same 78% accuracy on this text classification task.
– Speed: Despite Groq's claims of very fast inference, Llama 3 ran notably slower than OpenAI's GPT-4o in this experiment (4 minutes 14 seconds versus 57.8 seconds).
– Price: While GPT-4o costs $5 per million input tokens and $15 per million output tokens, Llama 3's weights are freely available, though self-hosting it demands substantial computational power.
In conclusion, Llama 3 emerges as a cost-effective choice for straightforward tasks like sentiment classification, though its computational requirements and latency remain concerns. The comparison offers insight into how the two models perform in real-world applications, highlighting strengths and areas for improvement in both proprietary and open-source AI technologies.