Friday 12 March 2021

How can I do a seq2seq task with PyTorch Transformers if I am not trying to be autoregressive?

I may be mistaken, but it seems that PyTorch's Transformer modules are designed to be autoregressive, which is what the attention masking is for. However, I've seen some implementations where people use just the encoder and feed its output directly into a Linear layer.
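
A rough sketch of that encoder-only pattern, with made-up sizes for illustration (note that no mask is passed, so attention is bidirectional):

import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
d_model, nhead, seq_len, batch = 64, 8, 100, 16

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, d_model)

x = torch.randn(seq_len, batch, d_model)  # (seq, batch, feature)
y = head(encoder(x))  # no mask passed, so every timestep attends to all others
print(y.shape)        # torch.Size([100, 16, 64])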

In my case, I'm trying to convert a spectrogram (rows are frequencies, columns are timesteps) into another spectrogram of the same dimensions, and I'm having a very hard time figuring out how to set this up.
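
Concretely, the input layout I'm assuming: a (freq_bins, timesteps) spectrogram transposed so each timestep becomes one feature vector, since nn.TransformerEncoder expects (seq, batch, feature) by default:

import torch

spec = torch.randn(128, 400)  # (freq_bins, timesteps), a stand-in spectrogram
src = spec.t().unsqueeze(1)   # -> (timesteps, batch=1, freq_bins), the layout the encoder expects
print(src.shape)              # torch.Size([400, 1, 128])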

For my model, I have:

import torch
import torch.nn as nn


class TransformerReconstruct(nn.Module):
    def __init__(self, feature_size=250, num_layers=1, dropout=0.1, nhead=10, output_dim=1):
        super(TransformerReconstruct, self).__init__()
        self.model_type = 'Transformer'

        self.src_mask = None
        self.pos_encoder = PositionalEncoding(feature_size)  # sinusoidal encoding, sketched below
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=feature_size, nhead=nhead, dropout=dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        # Project each timestep's feature_size features back to the output dimension.
        self.decoder = nn.Linear(feature_size, output_dim)
        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src):
        # Build (and cache) a causal mask sized to the current sequence length.
        if self.src_mask is None or self.src_mask.size(0) != len(src):
            device = src.device
            mask = self._generate_square_subsequent_mask(len(src)).to(device)
            self.src_mask = mask

        src = self.pos_encoder(src)
        # Passing self.src_mask here makes the encoder causal: each timestep
        # can only attend to itself and earlier timesteps.
        output = self.transformer_encoder(src, self.src_mask)
        output = self.decoder(output)
        return output

    def _generate_square_subsequent_mask(self, sz):
        # Additive mask with -inf above the diagonal and 0 elsewhere,
        # which blocks attention to future positions.
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
        mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
        return mask
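
PositionalEncoding isn't shown above; a minimal sketch, assuming the standard sinusoidal implementation from the PyTorch sequence-modeling tutorial:

import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Precompute sinusoidal encodings for up to max_len positions.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)  # shape: (max_len, 1, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (seq_len, batch, d_model); add the encoding for each position.
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)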

And when training, I have:

model = TransformerReconstruct(feature_size=128, nhead=8, output_dim=128, num_layers=6).to(device)

This returns the right shape, but doesn't seem to learn.

My basic training loop looks like:

for i in range(0, len(data_source) - 1, input_window):
    data, target = get_batch(data_source, i, 1)  # my helper that slices out one input/target window
    output = model(data)

I'm using MSELoss, and I'm trying to learn a very simple identity mapping, where the input and output are the same. However, the model is not learning. What could I be doing wrong? Thanks in advance.
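
For reference, a minimal, self-contained sketch of the identity sanity check, using random data in place of my spectrograms (the Adam optimizer and learning rate here are just placeholder choices):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = TransformerReconstruct(feature_size=128, nhead=8, output_dim=128, num_layers=6).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # placeholder optimizer settings

src = torch.randn(100, 1, 128, device=device)  # (seq, batch, feature) stand-in for a spectrogram
target = src.clone()  # identity task: the output should reproduce the input

model.train()
for step in range(200):
    optimizer.zero_grad()
    output = model(src)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())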


