TY - GEN
T1 - NMTucker
T2 - 4th ACM International Conference on AI in Finance, ICAIF 2023
AU - Varolgunes, Uras
AU - Zhou, Dan
AU - Yu, Dantong
AU - Uddin, Ajim
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/11/27
Y1 - 2023/11/27
AB - Missing values in financial time series data are of paramount importance in financial modeling and analysis. Appropriately handling missing data is essential to ensure the accuracy and reliability of financial models and forecasts. In this paper, we focus on datasets containing multiple attributes of different firms across time, such as firm fundamentals or characteristics, which can be represented as three-dimensional tensors with the dimensions time, firm, and attribute. Hence, the task of imputing missing values for these datasets can also be formulated as a tensor completion problem. Tensor completion has a wide range of applications, including link prediction, recommendation, and scientific data extrapolation. The widely used completion algorithms, CP and Tucker decompositions, factorize an N-order tensor into N embedding matrices and use multi-linearity among the factors to reconstruct the tensor. Real-world data are often highly sparse and involve complex interactions beyond simple N-order linearity; they demand models capable of capturing latent variables and their non-linear multi-way interactions. We design an algorithm, called Non-Linear Matryoshka Tucker Completion (NMTucker), that uses element-wise Tucker decomposition, multi-layer perceptrons, and non-linear activation functions to solve these challenges and ensure its scalability. To avoid the overfitting problem with existing neural network-based tensor algorithms, we develop a novel strategy that recursively decomposes a Tucker core into smaller ones, reduces the number of trainable parameters, and regularizes the complexity. Its structure is similar to Matryoshka dolls of decreasing size in which one is nested inside another. We conduct experiments to show that NMTucker effectively mitigates overfitting and demonstrate its superior generalization capability (up to 53.91% lower RMSE) in comparison with the state-of-the-art models in multiple tensor completion tasks.
KW - Data Imputation
KW - Financial Time Series
KW - Non-linear Tensor Decomposition
KW - Sparse Tensor Completion
UR - http://www.scopus.com/inward/record.url?scp=85179845501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179845501&partnerID=8YFLogxK
U2 - 10.1145/3604237.3626909
DO - 10.1145/3604237.3626909
M3 - Conference contribution
T3 - ICAIF 2023 - 4th ACM International Conference on AI in Finance
SP - 516
EP - 523
BT - ICAIF 2023 - 4th ACM International Conference on AI in Finance
PB - Association for Computing Machinery, Inc
Y2 - 27 November 2023 through 29 November 2023
ER -