Modifying a Model's Word Embedding Layer (wte)

KeepLin's blog


Posted: 2024-04-07 | Category: Deep Learning


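The goal is to modify the model's input word embedding layer, whose weight has shape (vocab_size, embedding_size), so that the model can be used for multiple tasks. The dimension that mainly needs to change is vocab_size.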

Fixing the error


RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for wte.weight: copying a param with shape torch.Size([58257, 768]) from checkpoint, the shape in current model is torch.Size([78, 768]).

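The original model's wte layer has shape (58257, 768), while we need it to be (78, 768).
After initializing the model, call model.resize_token_embeddings(len(tokenizer)) directly to resize the wte layer so that its first dimension matches the tokenizer's vocabulary size.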
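The fix above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: it assumes the Hugging Face `transformers` GPT-2 classes, and it uses deliberately tiny, hypothetical sizes (`OLD_VOCAB`, `NEW_VOCAB`, `HIDDEN`) and a randomly initialized model as a stand-in for the real checkpoint so that it runs without downloading anything. In practice the model would more likely come from `GPT2LMHeadModel.from_pretrained(...)` and the new size from `len(tokenizer)`.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical small sizes for a quick local run; the shapes in the post
# are (58257, 768) for the checkpoint and (78, 768) for the target.
OLD_VOCAB, NEW_VOCAB, HIDDEN = 200, 78, 32


def build_model(vocab_size: int) -> GPT2LMHeadModel:
    """Build a tiny, randomly initialized GPT-2 with the given vocabulary size."""
    config = GPT2Config(
        vocab_size=vocab_size, n_embd=HIDDEN, n_layer=1, n_head=2, n_positions=16
    )
    return GPT2LMHeadModel(config)


# Stand-in for the saved checkpoint (assumption: same architecture, old vocab).
checkpoint = build_model(OLD_VOCAB).state_dict()

# Reproduce the problem: the current model was built with the new vocab size,
# so the checkpoint's wte weight does not fit.
mismatch_error = None
try:
    build_model(NEW_VOCAB).load_state_dict(checkpoint)
except RuntimeError as err:
    mismatch_error = str(err)

# Fix: build the model with the checkpoint's vocab size, load the weights,
# and only then resize the embedding layer.
model = build_model(OLD_VOCAB)
model.load_state_dict(checkpoint)
model.resize_token_embeddings(NEW_VOCAB)  # in the post: len(tokenizer)

wte_shape = tuple(model.transformer.wte.weight.shape)
print(wte_shape, model.config.vocab_size)
```

When the vocabulary shrinks, `resize_token_embeddings` keeps the leading rows of the old embedding matrix and drops the rest, so the new tokenizer's ids need to be meaningful for those rows, or the resized layer needs to be fine-tuned afterwards.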

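Author: Keep Lin
Article link: http://hust-keep-lin.github.io/2024/04/07/修改模型词嵌入层（wte）/
Copyright notice: Unless otherwise stated, all articles on this site are licensed under CC BY-NC-ND 4.0. Please credit the source when reposting!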
