This paper formulates three properties of position embeddings (PEs) and compares different PEs across different tasks.
Three properties of PEs: monotonicity, translation invariance, and symmetry
1) neighboring positions are embedded closer than faraway ones;
2) the distance between two position vectors that are m positions apart is the same regardless of where the pair sits;
3) the metric (distance) itself is symmetric.
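A minimal sketch of how the three properties can be checked numerically for the fixed sinusoidal APE. It assumes the dot product between two position vectors is the proximity measure (larger means closer); the dimension, sequence length, offset m = 3 and the 5-position window used for the local monotonicity check are arbitrary illustrative choices, and `sinusoidal_ape` is a hypothetical helper name.

```python
import numpy as np

def sinusoidal_ape(num_positions: int, dim: int) -> np.ndarray:
    """Fixed sinusoidal APE (Vaswani et al., 2017): sin on even dims, cos on odd dims."""
    positions = np.arange(num_positions)[:, None]           # shape (n, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))  # shape (dim/2,)
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(positions * freqs)
    pe[:, 1::2] = np.cos(positions * freqs)
    return pe

n = 64
pe = sinusoidal_ape(num_positions=n, dim=128)
proximity = pe @ pe.T  # assumed proximity: phi(x, y) = <P_x, P_y>

# Symmetry: phi(x, y) == phi(y, x)
symmetric = np.allclose(proximity, proximity.T)

# Translation invariance: phi(x, x + m) takes the same value for every x
m = 3
offset_m = np.array([proximity[x, x + m] for x in range(n - m)])
translation_invariant = np.allclose(offset_m, offset_m[0])

# Local monotonicity: starting at x = 10, proximity strictly drops for offsets 0..4
row = proximity[10, 10:15]
locally_monotone = bool(np.all(np.diff(row) < 0))

print(symmetric, translation_invariant, locally_monotone)  # True True True
```

For this APE, symmetry and translation invariance hold exactly because the dot product reduces to a sum of cosines of x − y; monotonicity is only expected to hold locally, which is why the check uses a short window.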
Two kinds of PEs
absolute PEs (APEs): single positions are mapped to elements of the representation space;
relative PEs (RPEs): the difference between positions (i.e., x − y for x, y ∈ N) is mapped to elements of the embedding space.
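To make the APE/RPE distinction concrete, here is a sketch of the lookup indices each scheme produces for a short sequence. The function names are hypothetical; clipping offsets to a maximum distance follows the fully learnable RPE of Shaw et al. (2018), and the sign convention x − y follows the definition above.

```python
import numpy as np

def ape_indices(seq_len: int) -> np.ndarray:
    """APE: position x looks up row x of a (max_len, d) embedding table."""
    return np.arange(seq_len)

def rpe_indices(seq_len: int, max_distance: int) -> np.ndarray:
    """RPE: the pair (x, y) looks up a row indexed by the offset x - y.

    Offsets are clipped to [-max_distance, max_distance] and shifted by
    max_distance so that they become valid non-negative table indices.
    """
    pos = np.arange(seq_len)
    offsets = pos[:, None] - pos[None, :]  # offsets[x, y] = x - y
    return np.clip(offsets, -max_distance, max_distance) + max_distance

print(ape_indices(4))                   # [0 1 2 3]
print(rpe_indices(4, max_distance=2))
# [[2 1 0 0]
#  [3 2 1 0]
#  [4 3 2 1]
#  [4 4 3 2]]
```

Every diagonal of the RPE index matrix holds a single index, so all pairs with the same offset share one embedding; this makes RPEs translation invariant by construction, whereas a learnable APE table carries no such constraint.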
(Here WE stands for word embedding; it is a single term, not two separate matrices W and E.)
The paper studies four PEs: (1) the fully learnable APE (Gehring et al., 2017), (2) the fixed sinusoidal APE (Vaswani et al., 2017), (3) the fully learnable RPE (Shaw et al., 2018), and (4) the fixed sinusoidal RPE (Wei et al., 2019).
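For the fixed sinusoidal APE, the standard formula from Vaswani et al. (2017), with model dimension d and P_x denoting the position vector of x (notation introduced here), gives a dot product that depends only on the offset x − y:

```latex
\begin{aligned}
P_{x,2i} &= \sin(\omega_i x), \qquad P_{x,2i+1} = \cos(\omega_i x), \qquad \omega_i = 10000^{-2i/d},\\
\langle P_x, P_y \rangle &= \sum_{i=0}^{d/2-1} \big[\sin(\omega_i x)\sin(\omega_i y) + \cos(\omega_i x)\cos(\omega_i y)\big]
  = \sum_{i=0}^{d/2-1} \cos\big(\omega_i (x - y)\big).
\end{aligned}
```

Because cosine is even, swapping x and y leaves the sum unchanged (symmetry), and because only x − y appears, every pair with the same offset gets the same value (translation invariance).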
RPEs perform better on span prediction tasks because they better satisfy translation invariance and monotonicity and are asymmetric. The fully learnable APE does not strictly satisfy translation invariance or monotonicity by construction (it also scored worse on measured translation invariance and local monotonicity than the other APEs and all RPEs), yet it still performs well because it can flexibly handle special tokens, especially the unshiftable [CLS].
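The asymmetry mentioned above can be illustrated with a Shaw-style learnable RPE table: offsets +1 and −1 map to different rows, so a query can score its left and right neighbours differently, whereas the sinusoidal APE dot product gives the same value for both. This is a sketch under assumptions: the table and the query are random stand-ins for trained parameters, `rpe_vector` is a hypothetical helper, and scoring with a plain query-times-offset-embedding product is a simplified version of how Shaw et al. (2018) inject relative positions into attention.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, max_distance = 16, 4

# Hypothetical learnable RPE table: one row per clipped offset in [-max_distance, max_distance].
# Random values stand in for trained weights.
rpe_table = rng.normal(size=(2 * max_distance + 1, dim))

def rpe_vector(offset: int) -> np.ndarray:
    """Return the embedding for offset x - y after clipping."""
    clipped = int(np.clip(offset, -max_distance, max_distance))
    return rpe_table[clipped + max_distance]

query = rng.normal(size=dim)  # hypothetical query vector at position x

# Simplified relative-position score toward the right (y = x + 1) and left (y = x - 1) neighbour
right = query @ rpe_vector(-1)  # offset x - y = -1
left = query @ rpe_vector(+1)   # offset x - y = +1
print(np.isclose(right, left))  # almost surely False: distinct rows give distinct scores

# Sinusoidal APE proximity is sum_i cos(w_i * (x - y)), identical for offsets +1 and -1
freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
print(np.isclose(np.cos(freqs * 1).sum(), np.cos(freqs * -1).sum()))  # True
```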