Deploying llama & Qwen | KeepLin's blog

Deploying llama & Qwen

Posted: 2024-04-07 | Category: Large models

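Models

Chinese Llama (using Atom, the version with Chinese pretraining + fine-tuning)

Chinese Llama is only fine-tuned on top of the original llama. Because the original llama was pretrained on English data, fine-tuning it on Chinese alone tends to give mediocre results.
Atom reuses only the llama architecture and is both pretrained and fine-tuned on Chinese data.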

GPU resource usage

One 3090 Ti
Running in fp16, or quantizing to int8 or int4, uses less GPU memory
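
To make the precision trade-off concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at each precision. The 7B parameter count is an assumption (the post does not state the model size), and `weight_memory_gib` is a hypothetical helper written for this note. Activations, the KV cache, and framework overhead are ignored, so real usage will be higher.

```python
# Bits used to store one parameter at each precision.
BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate memory needed for the model weights only, in GiB."""
    bits = BITS_PER_PARAM[precision]
    return num_params * bits / 8 / 1024**3


if __name__ == "__main__":
    num_params = 7e9  # assumption: a 7B-parameter model
    for precision in BITS_PER_PARAM:
        gib = weight_memory_gib(num_params, precision)
        print(f"{precision}: ~{gib:.1f} GiB for weights")
```

Under these assumptions, fp16 weights come to roughly 13 GiB, which fits within the 24 GB of a single 3090 Ti, and int8 or int4 leave correspondingly more headroom.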

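Requirements

cuda >= 11.6 (required by flash-attn)
[Common problems with `pip install flash-attn`](https://github.com/Dao-AILab/flash-attention/issues/246)
Qwen has the history_chat (multi-turn chat history) feature built in
Atom requires you to implement history_chat yourself, following its prompt format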
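
For Atom, a multi-turn history can be implemented by concatenating earlier turns into a single prompt. The sketch below is a minimal illustration, not Atom's official code: the `<s>Human: ...\n</s><s>Assistant: ` template is an assumption based on the format commonly shown for Atom chat models, and `build_atom_prompt` is a hypothetical helper. Check the template against the model card before relying on it.

```python
from typing import List, Tuple


def build_atom_prompt(history: List[Tuple[str, str]], query: str) -> str:
    """Build a multi-turn prompt from (user, assistant) pairs plus a new query.

    The turn template used here is an assumption; verify it against the
    prompt format documented for the Atom model you deploy.
    """
    parts = []
    for user_msg, assistant_msg in history:
        parts.append(f"<s>Human: {user_msg}\n</s>")
        parts.append(f"<s>Assistant: {assistant_msg}\n</s>")
    # Leave the final assistant turn open so the model generates the reply.
    parts.append(f"<s>Human: {query}\n</s><s>Assistant: ")
    return "".join(parts)


if __name__ == "__main__":
    history = [("Hello", "Hi, how can I help?")]
    print(build_atom_prompt(history, "Introduce China briefly"))
```

After each generation, append the `(query, reply)` pair to `history` and rebuild the prompt for the next turn. Long conversations will eventually exceed the context window, so older turns may need to be dropped.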

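Author: Keep Lin
Article link: http://hust-keep-lin.github.io/2024/04/07/llama-Qwen部署/
Copyright notice: Unless otherwise stated, all articles on this site are licensed under CC BY-NC-ND 4.0. Please credit the source when reposting!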
