Sunday, 24 July 2022

Removing SEP token in Bert for text classification

Given a sentiment classification dataset, I want to fine-tune BERT.

As you know, BERT is pre-trained with a next sentence prediction objective: given a pair of sentences, it predicts whether the second sentence actually follows the first. To make the network aware of this structure, a [CLS] token is inserted at the beginning of the first sentence, a [SEP] token separates the first sentence from the second, and another [SEP] is appended at the end of the second sentence (it's not clear to me why they append that final token).
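For concreteness, here is roughly what that pair format looks like. This is just a sketch using the Hugging Face transformers tokenizer with the bert-base-uncased checkpoint (my assumption, not something the examples below necessarily use); the example sentences are made up.

# Sentence-pair input format used by BERT's next sentence prediction pre-training.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encoding two sentences as a pair inserts [CLS] at the start, a [SEP] between
# the sentences, and another [SEP] at the very end.
pair = tokenizer("The cat sat on the mat.", "It purred loudly.")
print(tokenizer.convert_ids_to_tokens(pair["input_ids"]))
# ['[CLS]', 'the', 'cat', 'sat', 'on', 'the', 'mat', '.', '[SEP]',
#  'it', 'purred', 'loudly', '.', '[SEP]']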

Anyway, for text classification, what I noticed in some of the examples online (see BERT in Keras with TensorFlow Hub) is that they add a [CLS] token, then the sentence, and a [SEP] token at the end.

Whereas other research works (e.g. Enriching Pre-trained Language Model with Entity Information for Relation Classification) remove that final [SEP] token.
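To make the two single-sentence conventions concrete, here is a small sketch of both, again assuming the Hugging Face transformers tokenizer and bert-base-uncased (my choice for illustration; the sentence is made up).

# Single-sentence input: the tokenizer's default adds [CLS] ... [SEP].
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

with_sep = tokenizer("The movie was great.")
print(tokenizer.convert_ids_to_tokens(with_sep["input_ids"]))
# ['[CLS]', 'the', 'movie', 'was', 'great', '.', '[SEP]']

# The variant without the trailing [SEP]: encode without special tokens,
# then prepend [CLS] manually.
ids = [tokenizer.cls_token_id] + tokenizer("The movie was great.",
                                           add_special_tokens=False)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# ['[CLS]', 'the', 'movie', 'was', 'great', '.']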

Why is it (or isn't it) beneficial to add the [SEP] token at the end of the input text when my task uses only a single sentence?



from Removing SEP token in Bert for text classification
