On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies
2021-04-19
Source: arXiv:2104.05694v1 [cs.CL] 12 Apr 2021
Institution: Stanford
Task: investigate where the inductive bias of pretrained models actually comes from
Conclusions
Pretraining uses random (uniform) masking, while finetuning is a cloze test. As the paper puts it, "this mismatch between theory and practice raises questions about how MLM with uniform masking can learn useful inductive biases." The authors therefore design an experiment: under cloze-x%, the final label word is the MLM prediction target with probability x%, and with probability (100-x)% one of the other tokens is the prediction target during pretraining; x = 100 means only the final label word is ever predicted. They find that replacing uniform-masking pretraining with this finetuning-style cloze task works much better.
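A minimal sketch of what that masking scheme could look like in code, assuming each training example ends with the label word; the function name, the x argument, and the example template are mine, not from the paper:

```python
import random

MASK = "[MASK]"

def cloze_x_mask(tokens, x):
    """Pick one MLM prediction target for an example whose last token is the label word.

    With probability x/100 the label position is masked; otherwise one of the
    other tokens is masked uniformly at random (so x = 100 is pure cloze-style
    pretraining on the label word only).
    """
    if random.random() < x / 100:
        target_pos = len(tokens) - 1                    # the label word
    else:
        target_pos = random.randrange(len(tokens) - 1)  # one of the remaining tokens
    masked = list(tokens)
    masked[target_pos] = MASK
    return masked, target_pos

# Hypothetical usage: a sentiment-style template whose last token is the label word.
tokens = ["the", "movie", "was", "great"]
print(cloze_x_mask(tokens, x=100))  # always masks "great"
print(cloze_x_mask(tokens, x=50))   # masks "great" about half the time
```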
MLMs are implicitly trained to recover statistical dependencies among observed tokens. This part is all mathematical derivation, and I did not fully follow it.
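For context, a rough sketch of the kind of objective such a derivation presumably starts from (the notation here is mine, not copied from the paper): the MLM loss fits conditional distributions of masked tokens given the observed ones, so an observed token can only matter to the model if it is statistically dependent on the masked token.

```latex
% MLM objective over a sentence x = (x_1, \dots, x_T) with a random mask set M:
\mathcal{L}(\theta)
  = \mathbb{E}_{x,\,M}\!\left[ -\sum_{i \in M} \log p_\theta\!\left(x_i \mid x_{\setminus M}\right) \right]
```

At the optimum, $p_\theta(x_i \mid x_{\setminus M})$ matches the true conditional, and an observed token $x_j$ only shifts that conditional when $x_i$ and $x_j$ are statistically dependent given the rest of the context, which is roughly the sense in which generic masking encodes token-level dependencies.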
Final conclusion, quoting the paper: "a substantial part of the performance gains of MLM pretraining cannot be attributed to task-specific, cloze-like masks. Instead, learning with task-agnostic, generic masks encourages the model to capture direct statistical dependencies among tokens, and we show through unsupervised parsing evaluations that this has a close correspondence to syntactic structures."
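To make the "unsupervised parsing evaluation" concrete for myself: one way such an evaluation can work is to turn pairwise dependence scores between tokens into a tree (e.g. a maximum spanning tree) and measure how many gold dependency arcs it recovers. The sketch below uses made-up scores and a made-up gold parse; the paper's actual extraction and scoring procedure may differ.

```python
import numpy as np

def mst_edges(scores):
    """Undirected maximum spanning tree (Prim's algorithm) over an n x n
    symmetric matrix of pairwise dependence scores between tokens."""
    n = scores.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Best edge connecting the current tree to a token outside it.
        best = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: scores[e[0], e[1]],
        )
        edges.append(tuple(sorted(best)))
        in_tree.add(best[1])
    return set(edges)

def undirected_attachment_score(pred_edges, gold_arcs):
    """Fraction of gold (head, dependent) arcs recovered, ignoring direction."""
    gold = {tuple(sorted(a)) for a in gold_arcs}
    return len(pred_edges & gold) / len(gold)

# Hypothetical example: 4 tokens, made-up dependence scores and a made-up gold parse.
scores = np.array([
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.3],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.3, 0.7, 0.0],
])
gold = [(1, 0), (1, 2), (2, 3)]
pred = mst_edges(scores)
print(pred, undirected_attachment_score(pred, gold))
```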