Graph Transformer

Positional and structural encodings

位置编码（PE）意在提供给定节点在空间中的位置的信息。如果两个节点在图或子图中的位置相邻，那么其位置编码应该相近。通常做法是计算一对节点之间的距离，或特征向量。

Local PE-node features-Within a cluster: 1、随机游走矩阵非对角元求和 2、节点到聚类中心距离

Global PE-node features -Within a graph ：1、Eigenvectors of the Adjacency, Laplacian or distance matrices 2、与graph中心的聚类。3、连通成分的独特定义 4、SignNet

Relative PE-edge features ：1、成对距离：最短路径、核、随机游走、贪心等或者使用全局or局部PE 。2、PEGlayer：位置编码→用于连接预测任务

结构编码（SE）旨在提供一个图结构或子图结构的嵌入，以增强GNN的表达能力和泛化能力。如果两个节点拥有相同的子图结构，亦或两个图相似，那么它们的结构编码也应该相似。

Local SE-node features-m radius 【划定半径范围、相当于子图】：1、度2、随机游走矩阵对角元3、预定义一些结构

Global SE-graph features 【两张图相似则globalSE也相似】

Relative SE-edge features 【和Local SE相关的边的嵌入】

1-Weisfeiler-Leman test

Weisfeiler-Lehman算法在大多数图上会得到一个独一无二的特征集合，这意味着图上的每一个节点都有着独一无二的角色定位（例外在于网格，链式结构等等）。因此，对于大多数非规则的图结构，得到的特征可以作为图是否同构的判别依据——WL Test（例如两个图是否是同质的，取决于节点的排列方式）。

Rethinking Graph Transformers with Spectral Attention

可学习的位置编码

code

Multi-Headed Attention (MHA)

1、 Prepare for multi-head attention：每个head 的vector表示做线性变换。

forward：

input： [seq_len, batch_size, d_model] or [batch_size, d_model]

在最后一个维使用线性变换，分为多个heads

self.linear = nn.Linear(d_model, heads * d_k, bias=bias)
head_shape = x.shape[:-1] #最后一个维为：d_model
x = self.linear(x)
x = x.view(*head_shape, self.heads, self.d_k)

2、MultiHeadAttention

每一个q、k、v都输入到preparationmultiheadattention的module中取准备映射数据。

计算分数：query和key

为每个head计算mask：[seq_len_q, seq_len_k, batch_size]

forward：input：

1
2
3

input：[query: torch.Tensor,key: torch.Tensor,value: torch.Tensor,mask: Optional[torch.Tensor] = None]
query、key和value的原始数据形式 [seq_len, batch_size, d_model]→[seq_len, batch_size, heads, d_k]
计算query和key相乘的得→[seq_len, seq_len, batch_size, heads] .

Yulin Wang