I may be mistaken, but it seems that PyTorch Transformers are autoregressive, which is what masking is for. However, I've seen some implementations where people use just the Encoder and feed its output directly into a Linear layer.
In my case, I'm trying to convert a spectrogram (rows are frequencies and columns are timesteps) to another spectrogram of the same dimensions. I'm having an impossible time trying to figure out how to do this.
For my model, I have:
class TransformerReconstruct(nn.Module):
    def __init__(self, feature_size=250, num_layers=1, dropout=0.1, nhead=10, output_dim=1):
        super(TransformerReconstruct, self).__init__()
        self.model_type = 'Transformer'
        self.src_mask = None
        self.pos_encoder = PositionalEncoding(feature_size)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=feature_size, nhead=nhead, dropout=dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.decoder = nn.Linear(feature_size, output_dim)
        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src):
        if self.src_mask is None or self.src_mask.size(0) != len(src):
            device = src.device
            mask = self._generate_square_subsequent_mask(len(src)).to(device)
            self.src_mask = mask
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, self.src_mask)
        output = self.decoder(output)
        return output

    def _generate_square_subsequent_mask(self, sz):
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
        mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
        return mask
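To make the effect of `_generate_square_subsequent_mask` concrete, here is a small standalone sketch that builds the same mask for a sequence of length 3. The free function name is only for illustration; the body is the same construction as in the model.

```python
import torch

def generate_square_subsequent_mask(sz):
    # Same construction as in the model: lower triangle (incl. diagonal) -> 0.0,
    # strict upper triangle -> -inf.
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    return mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))

mask = generate_square_subsequent_mask(3)
print(mask)
# Row i corresponds to the query timestep i; the -inf entries are the later
# timesteps that row i is prevented from attending to.
```

In other words, with this mask each timestep can only attend to itself and earlier timesteps, even though only the encoder is used.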
And when training, I have:
model = TransformerReconstruct(feature_size=128, nhead=8, output_dim=128, num_layers=6).to(device)
This returns the right shape, but doesn't seem to learn.
My basic training loop looks like:
for i in range(0, len(data_source) - 1, input_window):
    data, target = get_batch(data_source, i, 1)
    output = recreate_model(data)
I'm using MSELoss, and I'm trying to learn a very simple identity mapping, where the input and output are the same. However, the model is not learning. What could I be doing wrong? Thanks in advance.
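For comparison, here is a minimal sketch of the non-autoregressive variant described at the top: an encoder followed by a Linear layer, with no mask passed, trained on the identity task with MSELoss. Everything in it is an assumption for illustration rather than the setup above: the random stand-in data, the small sizes, the Adam optimizer and learning rate, and the omission of positional encoding. It is only meant to show one full training step (zero gradients, forward, loss, backward, optimizer step).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data with shape (timesteps, batch, features), which is
# the default layout expected by nn.TransformerEncoder.
seq_len, batch_size, n_freq = 20, 4, 16
data = torch.randn(seq_len, batch_size, n_freq)
target = data.clone()  # identity task: the target equals the input

encoder_layer = nn.TransformerEncoderLayer(d_model=n_freq, nhead=4, dropout=0.0)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
head = nn.Linear(n_freq, n_freq)  # maps encoder output back to the feature size

params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # assumed optimizer and learning rate
criterion = nn.MSELoss()

losses = []
for step in range(10):
    optimizer.zero_grad()
    output = head(encoder(data))      # no mask: every timestep sees the whole sequence
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(output.shape, losses[0], losses[-1])
```

Whether dropping the mask is appropriate depends on the task; for spectrogram-to-spectrogram mapping where the whole input is available at once, there is no future information to hide.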
from How can I do a seq2seq task with PyTorch Transformers if I am not trying to be autoregressive?