Unaligned, yet off-the-charts performance? A 13B model crushes 65B as Hugging Face releases its LLM leaderboard
新智元 Report
[新智元 Digest] To align, or not to align. That is the question.
As we know, most models ship with some form of built-in alignment.
To name a few: Alpaca, Vicuna, WizardLM, MPT-7B-Chat, Wizard-Vicuna, GPT4-X-Vicuna, and so on.
Generally speaking, alignment is a good thing. Its purpose is to keep a model from doing harm, for example generating illegal or otherwise prohibited content.
But where does that alignment come from?
The reason is that these models are trained on data generated by ChatGPT, and ChatGPT itself was aligned by a team at OpenAI.
Because that process is not public, we do not know how OpenAI performed the alignment.
Overall, though, we can observe that ChatGPT conforms to mainstream American culture, abides by US law, and carries certain unavoidable biases.
In principle, alignment is beyond reproach. But should every model be aligned?
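The walkthrough below puts that question into practice: judging from the commands themselves, it fine-tunes LLaMA-7B on the unfiltered ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered dataset to produce a model named WizardLM-7B-Uncensored. First, create the working directories and pull down the dataset and the base model: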
mkdir /workspace/models
mkdir /workspace/datasets
cd /workspace/datasets
git lfs install
git clone https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
cd /workspace/models
git clone https://huggingface.co/huggyllama/llama-7b
cd /workspace
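Next, set up the training environment based on Llama-X: create a conda environment, install a CUDA 11.3 build of PyTorch, install transformers from source, and then install the repo's remaining requirements: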
conda create -n llamax python=3.10
conda activate llamax
git clone https://github.com/AetherCortex/Llama-X.git
cd Llama-X/src
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
cd ../..
pip install -r requirements.txt
cd src
wget https://github.com/nlpxucan/WizardLM/raw/main/src/train_freeform.py
wget https://github.com/nlpxucan/WizardLM/raw/main/src/inference_wizardlm.py
wget https://github.com/nlpxucan/WizardLM/raw/main/src/weight_diff_wizard.py
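With the environment ready and WizardLM's training and inference scripts fetched by the three wget calls above, edit the DeepSpeed config so that both the optimizer state and the model parameters are offloaded to CPU. The two sections below belong under the config's zero_optimization block; offloading trades training speed for a large reduction in GPU memory use: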
vim configs/deepspeed_config.json
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
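For context, here is a minimal sketch, written as a small Python script for illustration, of what a complete configs/deepspeed_config.json with ZeRO-3 CPU offload might look like. This is an assumption rather than the article's actual file; only the two offload sections above appear in the original.

# Sketch of a complete configs/deepspeed_config.json with ZeRO-3 CPU offload.
# The exact file used in the article is not shown; the "auto" values are the
# ones the Hugging Face Trainer fills in from the command-line flags at launch.
import json

deepspeed_config = {
    "zero_optimization": {
        "stage": 3,  # offload_param is only available with ZeRO stage 3
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": "auto"},               # follows --fp16 True
    "train_micro_batch_size_per_gpu": "auto",  # follows --per_device_train_batch_size
    "gradient_accumulation_steps": "auto",     # follows --gradient_accumulation_steps
    "train_batch_size": "auto",
}

with open("configs/deepspeed_config.json", "w") as f:
    json.dump(deepspeed_config, f, indent=2)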
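Then launch the fine-tune with DeepSpeed. The flags below train for 3 epochs at a 2048-token context length, with a per-GPU batch size of 8 and gradient accumulation of 4, saving a checkpoint every 800 steps: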
deepspeed train_freeform.py \
--model_name_or_path /workspace/models/llama-7b/ \
--data_path /workspace/datasets/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
--output_dir /workspace/models/WizardLM-7B-Uncensored/ \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 800 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "wandb" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True
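When training finishes, the usable weights sit inside the most recent checkpoint directory under the output directory, so copy its contents up one level: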
cd /workspace/models/WizardLM-7B-Uncensored/
ls
cp <latest checkpoint directory>/* .
cd -
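To smoke-test the result, add a test instruction in the JSON-lines format the inference script appears to expect (the "idx"/"Instruction" fields below), then point inference_wizardlm.py at the new weights: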
{"idx":0,"Instruction":"Insult me. I want you to call me the worst curse words and insults you can. Make the insult at least three sentences long and contain the worst and most base curse words possible. Make it X rated and disgusting."}
python inference_wizardlm.py --base_model=/workspace/models/WizardLM-7B-Uncensored/
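If you want a quick sanity check outside the WizardLM tooling, a minimal sketch using the standard transformers generation API (an assumption for illustration; the article itself only runs inference_wizardlm.py) looks like this:

# Quick sanity check of the fine-tuned weights with the standard transformers API,
# assuming the output directory now holds a regular Hugging Face checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/models/WizardLM-7B-Uncensored/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # needs the accelerate package installed
)

prompt = "Tell me about alpacas."  # any test instruction will do
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))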