TY - GEN
T1 - Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
AU - Huang, Hengguan
AU - Xue, Fuzhao
AU - Wang, Hao
AU - Wang, Ye
N1 - Funding Information: The authors would like to thank the anonymous reviewers for their insightful comments and suggestions, Dr. Vincent Y. F. Tan, Dr. David Grunberg and Dr. Graham Percival for their assistance in proofreading the initial manuscript. This project was partially funded by research grant R-252-000-A56-114 from the Ministry of Education, Singapore. Publisher Copyright: © International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
AB - Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts. Such mental processes are difficult to model in real-world problems such as conversational automatic speech recognition (ASR), as the percepts (if they are modelled as graphs indicating relationships among utterances) are supposed to be innumerable and not directly observable. In this paper, we present a Bayesian nonparametric deep learning method called deep graph random process (DGP) that can generate an infinite number of probabilistic graphs representing percepts. We further provide a closed-form solution for coupling and transformation of these percept graphs for acoustic modeling. Our approach is able to infer relations among utterances without using any relational data during training. Experimental evaluations on ASR tasks including CHiME-2 and CHiME-5 demonstrate the effectiveness and benefits of our method.
UR - http://www.scopus.com/inward/record.url?scp=85094323483&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094323483&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 4481
EP - 4491
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé, Hal, III
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -