Abstract
Embedding learning is essential in various research areas, especially natural language processing (NLP). However, given the nature of unstructured data and word frequency distributions, general pre-trained embeddings, such as word2vec and GloVe, are often inferior on language tasks for specific domains because of missing or unreliable embeddings. In many domain-specific language tasks, pre-existing side information can often be converted to a graph that depicts the pairwise relationships between words. Previous methods use kernel tricks to pre-compute a fixed graph for propagating information across words and imputing missing representations. These methods require choosing the optimal graph construction strategy before any model training, resulting in an inflexible two-step process. In this paper, we leverage recent advances in graph neural networks and self-supervision to simultaneously learn a similarity graph and impute missing embeddings in an end-to-end fashion, while keeping the overall time complexity well controlled. We conduct extensive experiments showing that the integrated approach outperforms several baseline methods.
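The abstract gives no code, but the core idea (learning a similarity graph from side information and propagating over it to impute missing embeddings, supervised by reconstructing the observed embeddings) can be illustrated with a minimal sketch. This is not the paper's implementation; names such as `GraphEmbeddingImputer` and `side_info` are hypothetical, and the single propagation layer with a softmax-normalized learned adjacency is one plausible instantiation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEmbeddingImputer(nn.Module):
    """Sketch: jointly learn a similarity graph and impute word embeddings.

    `side_info` stands in for per-word side information (hypothetical);
    observed rows of the target embedding matrix supervise training, and
    the same forward pass produces imputations for the missing rows.
    """

    def __init__(self, side_dim: int, emb_dim: int, hidden: int = 64):
        super().__init__()
        # Projects side information into a space where similarity is scored.
        self.sim_proj = nn.Linear(side_dim, hidden)
        # Maps propagated features to the pre-trained embedding space.
        self.out_proj = nn.Linear(side_dim, emb_dim)

    def forward(self, side_info: torch.Tensor) -> torch.Tensor:
        # Learn a dense similarity graph from pairwise scores.
        h = self.sim_proj(side_info)                 # (N, hidden)
        scores = h @ h.t() / h.shape[-1] ** 0.5      # (N, N) similarities
        adj = F.softmax(scores, dim=-1)              # row-normalized adjacency
        # One GNN-style propagation step over the learned graph.
        propagated = adj @ side_info                 # (N, side_dim)
        return self.out_proj(propagated)             # imputed embeddings

def reconstruction_loss(pred, target, observed_mask):
    # Self-supervised objective: reconstruct only the observed embeddings.
    return F.mse_loss(pred[observed_mask], target[observed_mask])
```

Because the adjacency is produced by a differentiable module rather than pre-computed with a kernel, the graph and the imputation network train jointly in one step, which is the end-to-end property the abstract emphasizes.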
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 70610-70620 |
| Number of pages | 11 |
| Journal | IEEE Access |
| Volume | 11 |
| DOIs | |
| State | Published - 2023 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Materials Science
- General Engineering
Keywords
- Embedding imputation
- graph neural networks
- natural language processing