stabilizing the fine-tuning process on the target tasks, notably reducing the number of degenerate fine-tuning runs
Experiments: four intermediate-task settings for comparison
None: not using any intermediate task, i.e., the standard, vanilla RoBERTa fine-tuning.
HellaSwag: using HellaSwag as the intermediate task. (HellaSwag is a commonsense QA dataset: given a context, the model picks one of four candidate endings, and the wrong endings are machine-generated from the premise.)
HellaSwag-p: using the first proposed baseline, which ablates HellaSwag’s premises. (Only the four candidate endings are given, removing the reasoning component; the model is effectively asked to judge whether a sentence is machine-generated.)
Syn_GPT2: using the second proposed intermediate task, which is synthesized with GPT2. (Sentences are taken from Wikipedia: the first half of each sentence serves as the premise and the second half as the correct option, while the wrong options are generated by GPT2 conditioned on the premise. This ablates commonsense, since ordinary Wikipedia sentences do not require commonsense reasoning; see the sketch below.)
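To make the Syn_GPT2 construction concrete, here is a minimal sketch assuming HuggingFace's `transformers` GPT-2. The function name `make_synthetic_example`, the half-sentence split heuristic, and the sampling settings are my assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: turn one Wikipedia sentence into a 4-way multiple-choice
# item, following the Syn_GPT2 recipe described above (assumptions noted
# in the surrounding text).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def make_synthetic_example(sentence: str, n_distractors: int = 3) -> dict:
    """First half of the sentence = premise, second half = gold ending,
    GPT-2 continuations of the premise = wrong options (distractors)."""
    words = sentence.split()
    half = len(words) // 2          # naive split point; an assumption
    premise = " ".join(words[:half])
    gold = " ".join(words[half:])

    inputs = tokenizer(premise, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,             # sample so the distractors differ
        top_p=0.9,
        max_new_tokens=len(words) - half + 5,
        num_return_sequences=n_distractors,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    distractors = [
        tokenizer.decode(out[prompt_len:], skip_special_tokens=True).strip()
        for out in outputs          # strip the prompt, keep the continuation
    ]
    return {"premise": premise, "gold": gold, "distractors": distractors}

print(make_synthetic_example(
    "The Eiffel Tower was completed in 1889 and quickly became a global icon of France."
))
```

The HellaSwag-p baseline is the complementary ablation: keep the four options but drop the premise, so only the "machine-generated or not" signal remains.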