Journal on Communications ›› 2022, Vol. 43 ›› Issue (7): 163-171. doi: 10.11959/j.issn.1000-436x.2022142


Self-supervised speech representation learning based on positive sample comparison and masking reconstruction

Wenlin ZHANG, Xuepeng LIU, Tong NIU, Qi CHEN, Dan QU   

  1. College of Information System Engineering, Information Engineering University, Zhengzhou 450001, China
  • Revised: 2022-06-20 Online: 2022-07-25 Published: 2022-06-01
  • Supported by:
    The National Natural Science Foundation of China (61673395, 62171470)

Abstract:

Existing self-supervised speech representation learning methods based on contrastive prediction must construct a large number of negative samples, and their performance depends on large training batches, which requires substantial computing resources. To address this problem, a new speech representation learning method based on contrastive learning with positive samples only was proposed. Combined with a reconstruction loss, the proposed method obtained better representations at a lower training cost. The method was inspired by the SimSiam approach to image self-supervised representation learning. Using a siamese network architecture, two random augmentations of the input speech signal were processed by the same encoder network; a feed-forward network was then applied on one side, and a stop-gradient operation was applied on the other side. The model was trained to maximize the similarity between the two sides. Because negative samples were not required during training, a small batch size could be used and training efficiency was improved. Experimental results show that the representation model obtained by the new method matches or exceeds the performance of existing mainstream speech representation learning models on multiple downstream tasks.
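To make the training objective concrete, the sketch below illustrates a SimSiam-style, positive-only step for speech features combined with a masked-reconstruction term, as described in the abstract. It is a minimal illustration, not the authors' implementation: the GRU encoder, layer sizes, L1 reconstruction loss, additive-noise augmentation, and masking ratio are all assumptions chosen for brevity; only the predictor-on-one-branch and stop-gradient structure follow the SimSiam idea referenced in the paper.

```python
# Minimal sketch of a positive-only (SimSiam-style) objective with masked
# reconstruction for speech features. Module sizes, the GRU encoder, the L1
# reconstruction loss, and the augmentation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps acoustic features (B, T, F) to frame-level representations (B, T, H)."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
    def forward(self, x):
        out, _ = self.rnn(x)
        return out

class Predictor(nn.Module):
    """Feed-forward head applied on one branch only, as in SimSiam."""
    def __init__(self, dim=256, bottleneck=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(inplace=True),
            nn.Linear(bottleneck, dim))
    def forward(self, z):
        return self.net(z)

def neg_cosine(p, z):
    """Negative cosine similarity; the target branch is detached (stop-gradient)."""
    p = F.normalize(p, dim=-1)
    z = F.normalize(z.detach(), dim=-1)
    return -(p * z).sum(dim=-1).mean()

def training_step(encoder, predictor, decoder, x, augment,
                  mask_ratio=0.15, alpha=1.0):
    # Two random augmentations (views) of the same utterance.
    x1, x2 = augment(x), augment(x)

    # Mask a fraction of frames in one view; the originals are the targets.
    mask = torch.rand(x1.shape[:2], device=x.device) < mask_ratio   # (B, T)
    x1_masked = x1.masked_fill(mask.unsqueeze(-1), 0.0)

    z1, z2 = encoder(x1_masked), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)

    # Symmetric similarity loss between branches; no negative samples needed.
    sim_loss = 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

    # Reconstruct the masked frames from the representation.
    x_hat = decoder(z1)                                  # (B, T, F)
    rec_loss = F.l1_loss(x_hat[mask], x1[mask])

    return sim_loss + alpha * rec_loss

# Usage example (hypothetical shapes: 4 utterances, 200 frames, 80-dim filterbanks).
encoder, predictor = Encoder(), Predictor()
decoder = nn.Linear(256, 80)                             # representation -> features
augment = lambda x: x + 0.01 * torch.randn_like(x)       # placeholder augmentation
x = torch.randn(4, 200, 80)
loss = training_step(encoder, predictor, decoder, x, augment)
loss.backward()
```

Because the loss compares only the two augmented views of the same utterance, the batch can stay small; the stop-gradient on the target branch is what prevents the trivial collapsed solution in this positive-only setting.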

Key words: speech representation, self-supervised learning, unsupervised learning, siamese network
