On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies

  • Source: arXiv:2104.05694v1 [cs.CL] 12 Apr 2021
  • Affiliation: Stanford
  • Goal: investigate where the inductive bias of pretrained models comes from
  • Conclusions
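    • Pretraining masks tokens uniformly at random (uniform masking), whereas fine-tuning is framed as a cloze test. This mismatch between theory and practice raises questions about how MLM with uniform masking can learn useful inductive biases.

      The authors therefore design an experiment: under cloze-x%, the final label word is the MLM prediction target with probability x%, and with probability (100-x)% the other words are the prediction targets during pretraining. x = 100 means only the final label word is predicted. The authors find that replacing the pretraining objective with the cloze task used in fine-tuning gives much better results.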
    • MLMs are implicitly trained to recover statistical dependencies among observed tokens
    • Final conclusion: a substantial part of the performance gains of MLM pretraining cannot be attributed to task-specific, cloze-like masks. Instead, learning with task-agnostic, generic masks encourages the model to capture direct statistical dependencies among tokens, and we show through unsupervised parsing evaluations that this has a close correspondence to syntactic structures.
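
The cloze-x% scheme in the notes above can be sketched roughly as follows. This is only an illustration of the idea as these notes describe it, not the paper's actual code. The function name `cloze_mask`, the `[MASK]` string, the default `mask_rate` of 0.15, and the assumption that the label word is the last token of each example are all hypothetical choices made for this sketch.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption for this sketch)


def cloze_mask(tokens, x, mask_rate=0.15, rng=random):
    """Mask one example under a cloze-x% scheme.

    tokens:    list of str; the last token is assumed to be the label word.
    x:         percentage (0 to 100) of examples in which only the label
               word is the prediction target.
    mask_rate: per-token masking probability used in the uniform branch
               (0.15 is an assumed default, not taken from the notes).
    rng:       object with a random() method, e.g. random.Random(seed).

    Returns (masked_tokens, target_positions).
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")

    masked = list(tokens)  # copy so the input is left untouched
    label_pos = len(tokens) - 1

    if rng.random() * 100 < x:
        # Cloze branch: only the final label word is predicted.
        positions = [label_pos]
    else:
        # Uniform branch: each non-label token is masked independently.
        positions = [i for i in range(label_pos) if rng.random() < mask_rate]

    for i in positions:
        masked[i] = MASK
    return masked, positions


if __name__ == "__main__":
    example = ["the", "movie", "was", "great", "positive"]
    print(cloze_mask(example, x=100))
    print(cloze_mask(example, x=0, rng=random.Random(0)))
```

With x = 100 every example reduces to predicting the label word, and with x = 0 the label word is never a target, which matches the two extremes described in the notes. Whether the uniform branch should also be allowed to mask the label word is a design choice this sketch resolves by excluding it.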