Based on LLaMA with only tensor names changed: Kai-Fu Lee's large model sparks controversy, and an official response follows

Original source: Heart of the Machine

Image source: Generated by Unbounded AI

Some researchers have found that Kai-Fu Lee's Yi-34B model essentially adopts the LLaMA architecture, with two tensors renamed. "Zero One Everything" (01.AI) has now issued an official response.

Some time ago, the open-source large model field welcomed a new arrival: "Yi", a model with a context window exceeding 200K that can process 400,000 Chinese characters at a time.

This large model is built by Zero One Everything, the large-model company founded by Kai-Fu Lee, chairman and CEO of Sinovation Ventures, and comes in two versions: Yi-6B and Yi-34B.

According to the Hugging Face open-source community platform (English) and the C-Eval Chinese evaluation leaderboard, Yi-34B posted a number of SOTA results at launch, becoming a "double champion" among global open-source large models and beating open-source competitors such as LLaMA 2 and Falcon.

At the time, Yi-34B was also the only domestic model to top Hugging Face's global open-source model leaderboard, earning it the billing of "the world's strongest open-source model".

Recently, however, some researchers discovered that the Yi-34B model essentially adopts the LLaMA architecture, with two tensors renamed.

Original link:

The post also states:

Yi-34B's code is essentially a refactoring of the LLaMA code, but it does not appear to change anything substantive. The model is clearly based on the original Apache 2.0-licensed LLaMA files, yet never mentions LLaMA:

Yi vs LLaMA code comparison. Code Link:

In addition, these code changes were not submitted to the transformers project via a pull request but shipped as external code, which may pose a security risk or simply not be supported by the framework. The Hugging Face leaderboard will not even benchmark the model with its 200K context window, because the leaderboard has no policy for running custom code.
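For readers unfamiliar with the distinction being drawn, here is a minimal sketch of the two loading paths in the transformers library. The repo IDs are illustrative; the point is only that a repository shipping its own modeling files must be run with trust_remote_code, which automated harnesses such as the leaderboard may refuse to do.

```python
# Minimal sketch of native vs. external-code loading in transformers.
# Repo IDs are illustrative, not a statement about these repos' current status.
from transformers import AutoModelForCausalLM

# Architecture implemented inside the transformers library itself:
# no third-party code is downloaded or executed.
llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Repository that bundles its own modeling files: transformers downloads and
# executes that code locally, which is the "security risk" mentioned above and
# the reason automated evaluation harnesses may decline to run it.
yi_200k = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-34B-200K", trust_remote_code=True
)
```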

They claim this is a 32K model, yet it is configured as a 4K model: there is no RoPE scaling configuration and no explanation of how to scale it (note: Zero One Everything previously stated that the model itself was trained on 4K sequences but can be extended to 32K at inference time). At the moment there is zero information about its fine-tuning data, and they provide no instructions for reproducing their benchmark results, including the suspiciously high MMLU score.
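As a point of reference, this is roughly what a RoPE scaling declaration looks like in a LLaMA-style transformers config, i.e. the field the post says is missing. The linear type and the factor of 8 (32K / 4K) are assumptions for illustration, not Yi's actual settings.

```python
# A minimal sketch of declaring RoPE scaling on a LLaMA-style config.
# The scaling type and factor are illustrative assumptions.
from transformers import LlamaConfig

config = LlamaConfig(
    max_position_embeddings=4096,                    # sequence length used in training
    rope_scaling={"type": "linear", "factor": 8.0},  # interpolate positions 8x at inference
)
print(config.rope_scaling)
```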

Anyone who has spent enough time in the AI space will hardly be surprised by this. False advertising? License violations? Actual benchmark cheating? Who cares? On to the next paper, or in this case, on to collecting all the venture money. To be fair, Yi is at least above par as a base model, and its performance is genuinely good.

A few days earlier, in Zero One Everything's Hugging Face community, a developer had also pointed out:

As far as we know, Yi uses exactly the LLaMA architecture, except that two tensors have been renamed (input_layernorm, post_attention_layernorm).

During the discussion, some netizens noted that anyone using Meta's LLaMA architecture, codebase, and related resources wholesale must comply with the license agreement stipulated by LLaMA.

To comply with LLaMA's open-source license, one developer renamed the tensors back and re-uploaded the model to Hugging Face:

01-ai/Yi-34B, with the tensors renamed to match the standard LLaMA model code. Related links:
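The renaming itself is mechanical. Below is a minimal sketch, assuming the checkpoint is a plain PyTorch state dict; the LLaMA target names are the ones quoted above, while the Yi-specific source names (ln1, ln2) and the file names are assumptions for illustration.

```python
# Minimal sketch: rename layer-norm tensors in a checkpoint back to the
# standard LLaMA names. Source names "ln1"/"ln2" and file paths are assumptions.
import torch

LLAMA_NAMES = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}

def rename_keys(state_dict: dict) -> dict:
    """Return a new state dict with Yi-style layer-norm names mapped to LLaMA's."""
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for src, dst in LLAMA_NAMES.items():
            # e.g. "model.layers.0.ln1.weight" -> "model.layers.0.input_layernorm.weight"
            new_key = new_key.replace(f".{src}.", f".{dst}.")
        renamed[new_key] = tensor
    return renamed

if __name__ == "__main__":
    sd = torch.load("pytorch_model.bin", map_location="cpu")  # hypothetical shard
    torch.save(rename_keys(sd), "pytorch_model_llama_names.bin")
```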

Seeing this, we can also guess which company Jia Yangqing, who recently left Alibaba to start his own venture, was referring to in his WeChat Moments post a few days ago.

Regarding this matter, Heart of the Machine also reached out to Zero One Everything for verification. Zero One Everything responded:

GPT is a mature architecture recognized across the industry, and LLaMA builds on GPT. The structural design of our model is based on GPT's proven structure and draws on the industry's top public results, and a great deal of work has been done on top of the Zero One Everything team's own understanding of models and training. This is one of the foundations of the strong results in our first release. At the same time, Zero One Everything is continuing to explore essential breakthroughs at the level of model structure.

Model structure is only one part of model training. The Yi open-source models invested a great deal of R&D and groundwork in other areas, such as data engineering, training methods, baby-sitting (training supervision) skills, hyperparameter settings, evaluation methods, depth of understanding of what evaluation metrics really measure, depth of research into the principles behind model generalization, and top-tier AI Infra capabilities. This work often contributes more value than the basic structure, and it is also Zero One Everything's core technical moat in the pre-training stage of large models.

During a large number of training experiments, the code was renamed to meet the needs of experiment execution. We respect the feedback of the open-source community, have updated the code, and will integrate better into the Transformers ecosystem.

We are very grateful for the community's feedback. We are just getting started in the open-source community and hope to work with everyone to build a thriving ecosystem, and the Yi open-source effort will do its best to keep improving.
