The short history of open-source large language models since ChatGPT


Large language models (LLMs) have revolutionized the field of artificial intelligence, and their impact continues to grow. OpenAI’s ChatGPT, a highly capable conversational AI, has been the most visible breakthrough of recent months, igniting fierce competition among companies and researchers racing to build state-of-the-art conversational systems that can rival OpenAI’s.

Google has contributed Bard, its own conversational AI, along with PaLM-E, a multi-modal LLM, while OpenAI has pushed further with the multi-modal GPT-4. Meta, for its part, developed its own LLM, LLaMa, as an answer to the push for open-source LLMs. Quite recently, an influx of information about these latest LLMs has emerged, particularly because Meta chose to share LLaMa with the research community for non-commercial purposes only.

Interestingly, LLaMa’s weights were soon leaked online, allowing anyone, not just approved researchers, to experiment with these high-performing models firsthand.

Meta released LLaMa on February 24th, 2023, with the primary goal of giving the academic research community access to a high-performing LLM. The team provided four versions with different parameter counts: 7B, 13B, 33B, and 65B. Like other large language models, LLaMa takes a sequence of words as input and predicts the next word, generating text one token at a time. According to its paper, LLaMa-13B surpasses GPT-3 (175B) on most benchmarks, and LLaMa-65B rivals the best models, such as DeepMind’s Chinchilla-70B and Google’s PaLM-540B.
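
To make this next-token loop concrete, here is a minimal greedy-decoding sketch using the Hugging Face transformers API. The model path is only a placeholder for a local copy of the LLaMa-7B weights converted to the Hugging Face format, not an official model id.

```python
# Minimal sketch of autoregressive (next-token) generation.
# "path/to/llama-7b-hf" is a placeholder for a local, converted copy of the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b-hf", torch_dtype=torch.float16)
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: feed the whole sequence, take the most likely next token,
# append it, and repeat until an end-of-sequence token or a length limit.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits              # (batch, seq_len, vocab_size)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_token], dim=-1)
    if next_token.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```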

The LLaMa model was released publicly for the research community, for non-commercial purposes, via the Facebook Research GitHub. However, only the model code was made available there; the trained weights had to be requested separately through a Google Form for research use. It is worth noting that training LLaMa at that scale required 2,048 A100 GPUs, each costing about $15k, which gives a sense of the enormous resources needed to create such a model.

Aside from the expense, a large and clean dataset is crucial for training LLaMa. The models required trillions of tokens: LLaMa-65B and LLaMa-33B were trained on 1.4 trillion tokens, while LLaMa-7B was trained on one trillion. With such a pre-trained LLM available, it becomes possible to fine-tune it into a replica of ChatGPT, a conversational model capable of human-like interaction.

However, a significant challenge is obtaining the data required to fine-tune the model without spending millions of dollars on human annotation, which is essentially what OpenAI did to train InstructGPT, the model behind ChatGPT.

Stanford researchers found an inexpensive alternative, a way to fine-tune LLaMa without breaking the bank. They introduced Alpaca-7B, a model fine-tuned from LLaMa-7B on 52k instruction-following demonstrations. Key issues with instruction-following models like ChatGPT are the production of false information, the propagation of social stereotypes, and the generation of toxic language.

To address these issues, OpenAI spent millions of dollars having humans rate model answers and trained InstructGPT with reinforcement learning from human feedback (RLHF). However, OpenAI has not released the dataset used to train InstructGPT, making it hard to replicate this kind of model. The Stanford researchers’ workaround was to use OpenAI’s text-davinci-003, which is built on InstructGPT, to generate the 52k instruction-following examples from 175 human-written seed tasks, following the self-instruct approach.
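
Conceptually, that generation step looks something like the sketch below, which uses the legacy OpenAI completions API (the pre-1.0 openai Python package). The prompt and seed tasks shown here are toy stand-ins for the far more detailed prompt and the 175 seed tasks used in the actual self-instruct pipeline.

```python
# Simplified sketch of self-instruct-style data generation with the legacy
# OpenAI completions API (openai<1.0). The prompt is a toy stand-in for the
# much longer prompt used in the real Alpaca data pipeline.
import os
import random
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

seed_tasks = [
    "Give three tips for staying healthy.",
    "Explain why the sky is blue.",
    "Write a haiku about autumn.",
    # ... in practice, 175 human-written seed tasks
]

def generate_new_tasks(n_examples: int = 3) -> str:
    """Ask text-davinci-003 for new instruction/response pairs,
    conditioned on a few randomly sampled seed tasks."""
    shots = "\n".join(f"- {t}" for t in random.sample(seed_tasks, k=3))
    prompt = (
        "Here are some example task instructions:\n"
        f"{shots}\n\n"
        f"Write {n_examples} new, diverse task instructions, each followed by "
        "a high-quality response, in the format 'Instruction: ... Response: ...'."
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=512,
        temperature=1.0,
    )
    return response.choices[0].text

print(generate_new_tasks())
```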

According to the Stanford team, generating the 52k instruction-following demonstrations cost around $500, while fine-tuning the model on eight 80GB A100 GPUs cost approximately $100 and took only three hours. Despite the much smaller model size, Alpaca and text-davinci-003 displayed similar answer quality in human evaluation.
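
For the fine-tuning step itself, each of the 52k demonstrations is an (instruction, optional input, output) record rendered into a fixed prompt template. The sketch below shows that formatting step; the template mirrors the one published with the Alpaca repository, but the exact wording here should be treated as approximate.

```python
# Sketch of how an Alpaca-style demonstration is turned into a training
# example (prompt + target response). The template text is approximate.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_example(record: dict) -> str:
    """Render one {'instruction', 'input', 'output'} record into the text
    the model is fine-tuned on."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(**record)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return prompt + record["output"]

print(build_example({
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well.",
}))
```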

Furthermore, Vicuna, also fine-tuned from the original LLaMa models, is claimed to perform almost on par with OpenAI’s ChatGPT and Google’s Bard on instruction-following tasks, while the overall training cost remained remarkably low at around $300. Two versions of Vicuna have been released for non-commercial use, with 7B and 13B parameters. A significant upgrade in Vicuna compared to previous models is the increase in maximum context length, from 512 tokens in Alpaca to 2048 tokens.

One caveat with these models, however, is their large size and memory-intensive nature: deploying them comes with high energy consumption and financial cost. This limitation led some developers to believe that only corporations with access to large-scale infrastructure could truly benefit from these models. That changed with Georgi Gerganov’s work on llama.cpp.

Gerganov’s llama.cpp takes these LLMs to the next level by reimplementing inference for popular instruction-following LLMs, originally written in Python frameworks, in plain C/C++. Because C/C++ compiles directly to native machine code and avoids the overhead of a Python runtime and deep-learning framework, execution is fast even on ordinary CPUs. Additionally, the code supports 4-bit quantization, a process that converts the 16- or 32-bit floating-point weights into low-precision 4-bit integers, resulting in much smaller models and faster inference.
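
To illustrate the idea behind 4-bit quantization, here is a simplified block-wise scheme in NumPy: each block of 32 weights is stored as one floating-point scale plus 32 four-bit integer codes. It captures the spirit of llama.cpp’s 4-bit formats but is not its exact on-disk layout.

```python
# Simplified block-wise 4-bit quantization. Illustrative only; not the exact
# format used by llama.cpp.
import numpy as np

BLOCK = 32  # weights per block; each block gets its own scale

def quantize_q4(weights: np.ndarray):
    """Map float32 weights (size divisible by BLOCK) to 4-bit codes (0..15)
    plus one float32 scale per block."""
    w = weights.reshape(-1, BLOCK)
    # Symmetric scale so the largest magnitude lands near the edge of [-8, 7].
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    codes = np.clip(np.round(w / scale), -8, 7).astype(np.int8) + 8  # shift to 0..15
    return codes.astype(np.uint8), scale.astype(np.float32)

def dequantize_q4(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float32 weights from the 4-bit codes."""
    return ((codes.astype(np.int8) - 8) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096).astype(np.float32)   # fake weight tensor
codes, scale = quantize_q4(w)
w_hat = dequantize_q4(codes, scale)

orig_bytes = w.nbytes                          # 4 bytes per weight
quant_bytes = codes.size // 2 + scale.nbytes   # 4 bits per weight + per-block scales
print(f"size: {orig_bytes} B -> ~{quant_bytes} B, "
      f"max abs error: {np.abs(w - w_hat).max():.5f}")
```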

Thanks to the contributions of Gerganov and others, along with the leaked LLaMa weights, it is now possible to run instruction-following models such as Alpaca or Vicuna directly on a laptop. Various projects have documented how to run Vicuna on personal devices using llama.cpp, paving the way for accessible, open-source AI advancements without the constraints of substantial resources.


Author: robot learner