What is pooled_output in BERT?

The authors of the BERT paper found it sufficient to use only the output from the first token for a few tasks, such as classification. The output derived from this single token (i.e., the first token, [CLS]) is referred to as pooled_output.

What is the difference between sequence output and pooled output?

For a single sentence of 8 tokens (including [CLS] and [SEP]), 'sequence output' has dimension [1, 8, 768], one 768-dimensional vector per token. 'Pooled output' has dimension [1, 768], a single vector derived from the embedding of the [CLS] token. In general, people use the 'pooled output' of the sentence for text classification (or for another specific task).
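The shapes can be checked with a small sketch. It uses NumPy with random values as a stand-in for real BERT activations; the variable names are illustrative and not taken from any particular library. Slicing the first token while keeping the token axis gives [1, 1, 768]; dropping that axis gives [1, 768], the shape pooled output normally has.

```python
import numpy as np

batch_size, seq_len, hidden = 1, 8, 768  # 8 tokens including [CLS] and [SEP]

# Stand-in for the encoder's sequence output (random values, not real BERT activations).
rng = np.random.default_rng(0)
sequence_output = rng.standard_normal((batch_size, seq_len, hidden))

# First-token ([CLS]) vector, with and without the token axis.
cls_keepdim = sequence_output[:, 0:1]  # token axis kept
cls_vector = sequence_output[:, 0]     # token axis dropped

print(sequence_output.shape)  # (1, 8, 768)
print(cls_keepdim.shape)      # (1, 1, 768)
print(cls_vector.shape)       # (1, 768)
```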

What is the advantage of using pooled output?

In statistics, pooled output refers to results combined across multiply imputed datasets, which is a different sense from BERT's pooled_output. It can be used if there are missing values or you want more accurate results. According to SPSS, pooled results are generally more accurate than those provided by single imputation methods.

What is pooled_output in a neural network?

In addition to sequence_output, BERT returns a pooled_output (normally of shape [batch_size, 768]), which is the output of an additional “pooler” layer. The pooler layer takes sequence_output[:, 0] (the first token, i.e. the CLS token) and passes it through a dense layer followed by a Tanh activation. That is where pooled_output gets its name and why it differs from the raw CLS token output, although both serve the same purpose.
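The pooler described above can be sketched in a few lines. This is a minimal illustration in NumPy, not the code of any specific BERT implementation: the function name `pooler` and the randomly initialised `weight` and `bias` are assumptions standing in for trained parameters.

```python
import numpy as np

def pooler(sequence_output, weight, bias):
    """Toy BERT-style pooler: first token -> dense layer -> tanh."""
    first_token = sequence_output[:, 0]          # [batch_size, hidden]
    return np.tanh(first_token @ weight + bias)  # [batch_size, hidden]

rng = np.random.default_rng(0)
hidden = 768

# Random stand-ins for encoder output and pooler parameters (hypothetical, untrained).
sequence_output = rng.standard_normal((2, 8, hidden))
weight = rng.standard_normal((hidden, hidden)) * 0.02
bias = np.zeros(hidden)

pooled_output = pooler(sequence_output, weight, bias)
print(pooled_output.shape)  # (2, 768)
```

Because of the dense layer and the Tanh, the values in `pooled_output` lie in [-1, 1] and are not equal to the raw `sequence_output[:, 0]` vector.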

