a. Prompt initialization: random; sampled from the embeddings of real words; from the embeddings of the class-label names (I didn't fully get this part — when the prompt is long, how do label names initialize it? What about the extra positions? Per the paper, once the labels run out, the remaining positions fall back to the sampled-vocabulary strategy; see the initialization sketch after this list). Label-name initialization works best.
b. Prompt length: longer is better.
c. Pre-training objective (see the target-format sketch after this list):
T5's original pre-training objective (span corruption): input "Thank you <X> me to your party <Y> week." with target "<X> for inviting <Y> last <Z>", where <X>/<Y>/<Z> are sentinel tokens marking the corrupted spans.
Span corruption: use the off-the-shelf T5 checkpoint, frozen, as-is; downstream targets are plain text even though every pre-training target began with a sentinel.
Span corruption + sentinel: same checkpoint, but prepend a sentinel token to the downstream targets so they resemble the targets seen in pre-training.
LM adaptation: input a natural-text prefix, output its continuation; T5 is further pre-trained on this LM objective for a small number of extra steps.
d. Number of LM-adaptation steps.
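To make the initialization options in (a) concrete, here is a minimal PyTorch sketch, assuming access to the frozen model's input embedding matrix. The uniform [-0.5, 0.5] range, the 5,000-token "common vocabulary" pool, and the fall-back from class labels to sampled vocab follow the paper's description; the function name and the use of the first 5,000 token ids as the "common" tokens are my own illustrative choices.

```python
import torch

def init_prompt_embeddings(prompt_len, word_embeddings, label_token_ids=None,
                           strategy="class_label"):
    """Build a (prompt_len, embed_dim) tensor of initial prompt embeddings.

    word_embeddings: frozen input embedding matrix of the model, (vocab, dim).
    label_token_ids: token ids of the class-label strings (for "class_label").
    """
    vocab_size, embed_dim = word_embeddings.shape

    if strategy == "random":
        # Random baseline: uniform in [-0.5, 0.5], as in the paper.
        return torch.empty(prompt_len, embed_dim).uniform_(-0.5, 0.5)

    # "Sampled vocab": copy embeddings of common real tokens.
    # Assumption: the first 5,000 ids stand in for the most common tokens.
    sampled_ids = torch.randint(0, min(5000, vocab_size), (prompt_len,))
    prompt = word_embeddings[sampled_ids].clone()
    if strategy == "sampled_vocab":
        return prompt

    if strategy == "class_label":
        # Use label embeddings for the first positions; once the labels run
        # out, the remaining positions keep the sampled-vocab init above.
        n = min(prompt_len, len(label_token_ids))
        prompt[:n] = word_embeddings[torch.tensor(label_token_ids[:n])].clone()
        return prompt

    raise ValueError(f"unknown strategy: {strategy}")

# Usage: the resulting tensor becomes the only trainable parameter; it is
# prepended to the embedded input tokens while the model itself stays frozen.
# prompt = torch.nn.Parameter(init_prompt_embeddings(100, model_embeds, label_ids))
```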
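And for the pre-training-objective comparison in (c), a small sketch of what a downstream (source, target) pair looks like under each setting. `<extra_id_0>` is T5's actual first sentinel token (the `<X>` in the example above); the SST-2-style example and the function name are illustrative, not the paper's code.

```python
# The span-corruption pre-training pair from above, written with T5's real
# sentinel tokens: <extra_id_0> plays the role of <X>, <extra_id_1> of <Y>.
pretrain_source = "Thank you <extra_id_0> me to your party <extra_id_1> week."
pretrain_target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

SENTINEL = "<extra_id_0>"

def format_downstream(source: str, target: str, setting: str):
    """Return the (source, target) strings fed to the frozen T5 model."""
    if setting == "span_corruption":
        # Off-the-shelf checkpoint: plain-text target, even though every
        # pre-training target started with a sentinel (out of distribution).
        return source, target
    if setting == "span_corruption_sentinel":
        # Prepend a sentinel so the target resembles pre-training targets.
        return source, f"{SENTINEL} {target}"
    if setting == "lm_adaptation":
        # Checkpoint was further trained as a prefix LM, so plain text in,
        # plain text out is already in-distribution; nothing changes here.
        return source, target
    raise ValueError(f"unknown setting: {setting}")

print(format_downstream("sst2 sentence: it was great.", "positive",
                        "span_corruption_sentinel"))
# -> ('sst2 sentence: it was great.', '<extra_id_0> positive')
```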
In these ablations, a small model can beat a medium one; the authors' explanation is that "there is random luck in which pre-trained checkpoints are able to “overcome” span corruption through prompting alone. This hypothesis could be tested by pre-training new models from scratch."
The ability of our prompts to match the performance of model tuning suggests that task definitions exist in their own subspace of parameters. Looking forward, we believe that factoring out task-defining parameters as distinct from general language-modeling parameters is an exciting step that opens up several avenues for new research.