Solving intractable chemical problems by tensor decomposition

Many complex chemical problems encoded in terms of physics-based models become computationally intractable for traditional numerical approaches due to their unfavourable scaling with increasing molecular size. Tensor decomposition techniques can overcome such challenges by decomposing unattainably large numerical representations of chemical problems into smaller, tractable ones. In the first two decades of this century, algorithms based on such tensor factorizations have become state-of-the-art methods in various branches of computational chemistry, ranging from molecular quantum dynamics to electronic structure theory and machine learning. Here, we consider the role that tensor decomposition schemes have played in expanding the scope of computational chemistry. We relate some of the most prominent methods to their common underlying tensor network formalism, providing a unified perspective on leading tensor-based approaches in chemistry and materials science.


Introduction
The goal of computational chemistry is to describe the properties and reactivity of molecular systems accurately, yet efficiently, with computational models in order to gain insights into complex chemical processes. [1] Depending on the problem at hand, the involved particles are treated classically, fully quantum mechanically, with hybrid quantum-classical schemes, or with data-driven approaches. While the theoretical foundations of both classical and quantum mechanics are well established, the development of efficient computational methods to calculate stationary molecular properties or to simulate the evolution of complex many-body systems over time remains an active field of research. In the past decades, tremendous efforts were made to push the limits of numerical calculations to larger and more complex chemical systems, fueled by the constantly increasing computational power available and the emergence of novel algorithms. A particularly prevalent, albeit often not directly apparent, common ground of many state-of-the-art approaches in computational chemistry is the utilization of powerful tensor decomposition schemes. Tensor decomposition-based approaches enable the systematic factorization of complex high-dimensional numerical representations of molecular features into many interconnected, lower-dimensional tensors, which essentially allows one to solve several small problems in lieu of one intractably large one. In this context, tensors can be understood as any kind of multidimensional array of (real or complex) numbers, with their dimensionality being defined by the number of indices required to specify a tensor element. For instance, a vector is a one-dimensional tensor, whereas a matrix corresponds to a two-dimensional tensor. In chemistry, tensors often appear within computational models as a consequence of the numerical description of the entities involved (such as electrons, atomic nuclei, atoms, functional groups, and so forth) and their interactions with one another. Due to the unfavorable scaling of the tensor dimension with system size, numerical approaches based on tensor decomposition schemes have emerged as a particularly promising family of computational methods to tackle chemically challenging systems.
In Section 2 of this article, we consider the foundations of tensor decomposition schemes and review one of the most common factorization techniques, the singular value decomposition, before we present some of the most prominent tensor decomposition approaches in computational chemistry in Section 3. We conclude in Section 4 by providing a unified perspective and by discussing the future of these methods in the context of emerging computational developments.
Numerous other tensor decomposition techniques have been developed, among them most notably QR decomposition, Cholesky factorization, LU decomposition, canonical polyadic decomposition, and (hierarchical) Tucker decomposition. For an in-depth description of these techniques, we refer the reader to refs. [3] and [5]. All of the above-mentioned decomposition schemes are routinely applied in various contexts in computational chemistry. In the following, we review some of the most prominent tensor decomposition-based methods currently developed to push the limits of computational approaches in solving complex chemical problems, ranging from interaction tensor factorization for quantum chemical calculations (see Section 3.1) to molecular quantum dynamics simulations (see Section 3.3).

Efficient Encoding of Interactions
One particularly common task encountered during the computational characterization of molecular many-body systems is the evaluation of interactions between many particles. In classical simulations, the calculation of electrostatic interactions between pairs of atoms (or coarse-grained pseudo-particles) is often the most time-consuming step, as a straightforward evaluation of all pairwise interactions scales with the square of the system size. [6] In quantum chemical applications, the computational scaling of the pairwise Coulomb interactions is even steeper, as most calculations require the evaluation of four-index interaction integrals. Not only the generation, but also the manipulation and storage of these four-index tensors pose a great challenge due to their N^4 scaling, with N being the number of basis functions (e.g. orbitals) chosen to represent the particles of a molecular system. Since already small molecules easily require a few dozen or more basis functions for a reliable description of the relevant degrees of freedom, efficient schemes to compress the interaction tensors are needed. [7] In the following, we will limit our discussion to the example of two-electron integrals, but many of the techniques mentioned below can also be extended to other multi-index interaction tensors.
A two-electron integral is a contribution to the electrostatic energy: the product of two orbitals is a density, and the product of two densities divided by their distance is the integrand of a Poisson integral. In Hartree atomic units and for a given atomic-orbital basis set, it can be written as

$$h_{ijkl} = \iint \frac{\phi_i^*(\mathbf{r}_1)\,\phi_j(\mathbf{r}_1)\,\phi_k^*(\mathbf{r}_2)\,\phi_l(\mathbf{r}_2)}{r_{12}}\,\mathrm{d}\mathbf{r}_1\,\mathrm{d}\mathbf{r}_2\,,$$

where $\phi_p$ are the basis functions, $\mathbf{r}_1$ and $\mathbf{r}_2$ denote the electronic coordinates of the two interacting electrons, and $r_{12}$ is the distance between them. All integrals $h_{ijkl}$ can be stored in a four-index tensor, which we abbreviate in chemical notation as $(ij|kl)$. A routinely applied two-electron integral tensor factorization scheme is the resolution of the identity (RI) technique, [8] sometimes also referred to as density fitting. The basic idea of RI is to express the four-index tensor $(ij|kl)$ in terms of (at most) three-index quantities

Tensor Decomposition Techniques
Tensor decomposition schemes have a long history in numerical mathematics and linear algebra. [2] Over the past two centuries, various techniques to factorize matrices have been developed, and some have also been generalized to higher-order tensors. [3] As any high-dimensional tensor can be reshaped into a two-dimensional matrix by an appropriate grouping of indices, any matrix factorization technique can effectively also be applied to (reshaped) higher-order tensors.
One of the most commonly applied matrix factorizations is the singular value decomposition (SVD), which can be viewed as a generalized eigendecomposition applicable to arbitrary matrices. The SVD factorizes any real or complex m × n matrix M into a product of three matrices,

$$M = U S V^*\,,$$

where U is an m × m unitary matrix and V* is an n × n unitary matrix in its conjugate-transpose form. The matrix S is an m × n rectangular diagonal matrix with non-negative real entries, the so-called singular values, on the diagonal. If the matrix is rank-deficient, many of its singular values are equal to zero; the number of non-zero singular values corresponds to the rank of the matrix.
As many large matrices encountered in chemical modeling do not have full rank, they can be efficiently compressed by retaining only the non-zero singular values and disregarding the redundant part of the factorization, as visualized in Fig. 1. In numerical applications, matrices may have numerous singular values which are very close to zero and can be deemed negligible, hence effectively truncating the factorization. The compressed matrix obtained by a truncated SVD is the best approximation with a fixed rank that can be obtained for a given matrix, as measured by the Frobenius norm. [4] This very powerful property makes the SVD one of the most commonly applied matrix factorization techniques, [2] with routine applications including linear least-squares fitting problems, image compression (see Fig. 2 for an example), tensor-based wave function methods in quantum chemistry (see Section 3.2), and machine learning approaches (see Section 3.4).

Fig. 1. Top row: An original matrix (gray) is decomposed by SVD into two unitary matrices (blue and brown) and a diagonal matrix (purple). By disregarding all singular values which are zero or negligibly small (middle row), the matrix factorization can be compressed (bottom row).
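As a minimal illustration, the compression behavior of a truncated SVD can be sketched in a few lines of NumPy; the matrix below is a random low-rank stand-in, not chemical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 100 x 80 matrix with only 10 significant directions, plus tiny noise,
# mimicking the near-rank-deficient matrices that arise in chemical modeling.
A = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 80))
A += 1e-8 * rng.standard_normal((100, 80))

# Full SVD, then truncation to the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, A_k is the best rank-k approximation of A in
# the Frobenius norm; the error here is on the order of the discarded noise.
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"relative Frobenius error of the rank-{k} approximation: {err:.1e}")
```

Instead of the 8000 entries of the full matrix, only the 1810 entries of the truncated factors need to be stored.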
For large molecular systems, the CD factorization might not be sufficient to compress the integral tensor to a tractable size, for instance in quantum computing applications with limited resource availability. [12] In such cases, one option is to employ a double factorization of the integrals by further decomposing the singly factorized CD tensor of Eq. (6). As L is usually sparse for large molecular systems, it can be further compressed, for instance, with a truncated singular value decomposition of each individual Cholesky vector,

$$L^{(n)} \approx U^{(n)} S^{(n)} V^{(n)*}\,.$$

This results in a doubly factorized form of the four-index integral tensor, in which the tensor is compactly represented in terms of low-rank vectors, [11] allowing for efficient storage and handling of the two-electron interaction terms.
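The second factorization step can be sketched for a single synthetic Cholesky vector; since the reshaped vector is symmetric here, the truncated SVD reduces to a truncated eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
n_orb = 16

# Synthetic Cholesky vector L^(n), reshaped into a symmetric n_orb x n_orb
# matrix and built with rank 6 so that the second factorization is exact.
Y = rng.standard_normal((n_orb, 6))
Ln = Y @ Y.T

# Double factorization step: truncated eigendecomposition (the symmetric
# special case of a truncated SVD) of the Cholesky vector.
w, V = np.linalg.eigh(Ln)
order = np.argsort(np.abs(w))[::-1]
k = 6  # retained rank for this Cholesky vector
w_k, V_k = w[order[:k]], V[:, order[:k]]
Ln_k = (V_k * w_k) @ V_k.T

# Ln is now stored as k vectors of length n_orb instead of n_orb^2 entries.
print(np.allclose(Ln, Ln_k))
```

For genuinely sparse or near-low-rank Cholesky vectors, `k` can be chosen much smaller than `n_orb` at a controlled accuracy loss.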

Electronic Structure Calculations
Electronic structure calculations are central to computational chemistry, as the accurate determination of the electronic energy is crucial for the prediction of molecular properties and the modeling of chemical processes on a molecular level. [13] As a common starting point, most established wave function methods employ an expansion of the many-electron wave function in terms of antisymmetrized products of molecular orbitals,

$$|\Psi\rangle = \sum_{n_1, \ldots, n_N} C_{n_1 n_2 \cdots n_N}\, |n_1 n_2 \cdots n_N\rangle\,,$$

where $n_i$ is the occupation of the $i$-th molecular orbital, and $C_{n_1 n_2 \cdots n_N}$ is the expansion coefficient that measures the contribution of an electronic configuration $|n_1 n_2 \cdots n_N\rangle$ to the electronic state $|\Psi\rangle$. If unrestricted, this expansion is known as the full configuration interaction (FCI) wave function and can be viewed as a generalization of the linear combination of atomic orbitals (LCAO) concept to the total electronic wave function. Unfortunately, the number of many-electron basis states grows exponentially with molecule size. Therefore, an exact solution of the electronic structure problem in the space spanned by the electronic configurations of the FCI ansatz is only feasible for small molecules, drastically limiting the applicability of this direct approach. For large molecules, efficient methods are needed that capture the relevant quantum effects while lowering the computational costs. Processes where quantum correlations cannot be reliably approximated (e.g. by Kohn-Sham DFT), such as chemical reactions catalyzed by transition metal compounds, require a sophisticated quantum mechanical treatment for an adequate description of the physical mechanisms.
by inserting a resolution of the identity, spanned by an auxiliary basis set which is optimized for an efficient approximation of the full integrals. Straightforwardly applying such an RI decomposition would, however, presuppose knowledge of the full four-index tensor, and therefore require the costly evaluation of all two-electron integrals. In order to avoid the full construction of the four-index tensor in the first place, the integrals are directly evaluated in factorized form, commonly by employing an approximation such as

$$(ij|kl) \approx \sum_{PQ} (ij|P)\,(P|Q)^{-1}\,(Q|kl)\,,$$

where $(ij|P)$ and $(Q|kl)$ are the three-index expansion coefficients over the auxiliary basis set, and $(P|Q)$ are the two-index electron repulsion integrals over the pre-optimized auxiliary basis functions. Hence, for the calculation of the four-index integral tensor in the RI approximation, only up to three-center integrals are required, of which there are far fewer than in the original set of integrals, which contains four-center integrals. Because of its computational efficiency, the RI approximation is routinely applied in quantum chemistry programs to accelerate electronic structure calculations such as density functional theory (DFT). [8] For applications where the RI approximation is not applicable (e.g. because auxiliary basis functions have not been optimized), a more general Cholesky decomposition (CD) can be applied.
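The storage advantage of such a factorization can be sketched with synthetic three-index factors; the array `B` below is random and merely stands in for actual density-fitted quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
n_orb, n_aux = 20, 60  # orbital basis size N and auxiliary basis size M (synthetic)

# Synthetic three-index factors B[P, i, j], symmetric in the orbital indices,
# standing in for the fitted quantities of an RI-type expansion.
B = rng.standard_normal((n_aux, n_orb, n_orb))
B = 0.5 * (B + B.transpose(0, 2, 1))

# RI-type assembly of the four-index tensor: (ij|kl) ~ sum_P B[P,i,j] B[P,k,l].
eri_ri = np.einsum('Pij,Pkl->ijkl', B, B)

# Only the O(M N^2) three-index factors need to be stored and manipulated,
# instead of the O(N^4) four-index tensor they implicitly represent.
print(B.size, eri_ri.size)  # 24000 160000
```

In production codes the full four-index array is of course never assembled; it is built here only to make the size comparison explicit.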
[9,10] In CD, a positive semidefinite matrix M is decomposed as M = LL^T, where L is a lower triangular matrix and L^T is its transpose. Hence, the four-index integral tensor can be approximated with controllable accuracy by

$$(ij|kl) \approx \sum_{n=1}^{N_L} L^{(n)}_{ij}\, L^{(n)}_{kl}\,, \qquad (6)$$

where $L^{(n)}$ is the n-th Cholesky vector. For many molecular systems, the number of Cholesky vectors required to adequately express the two-electron integrals only grows as $N_L \sim N \log N$, [11] therefore effectively reducing the number of terms in the expansion. Directly exploiting the factorized CD form allows one to skip zero or negligibly small integral terms without having to calculate the entire interaction tensor, as not all elements of the decomposed matrices are needed to construct the Cholesky vectors $L^{(n)}$. Such a decomposed form of the two-electron integrals gives rise to what is known as the singly factorized electronic Hamiltonian.
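A minimal pivoted (incomplete) Cholesky decomposition can be sketched as follows; the matrix `M` is a random positive semidefinite stand-in for the integral matrix, and the termination threshold `tol` plays the role of the controllable accuracy mentioned above:

```python
import numpy as np

def pivoted_cholesky(M, tol=1e-8):
    """Incomplete pivoted Cholesky decomposition M ~ sum_n outer(L_n, L_n),
    stopped once the largest remaining diagonal element drops below tol."""
    R = M.astype(float).copy()
    vectors = []
    for _ in range(R.shape[0]):
        p = int(np.argmax(np.diag(R)))
        d = R[p, p]
        if d < tol:
            break
        L_n = R[:, p] / np.sqrt(d)  # next Cholesky vector
        vectors.append(L_n)
        R -= np.outer(L_n, L_n)     # deflate the contribution just extracted
    return np.array(vectors)

rng = np.random.default_rng(2)
# Synthetic positive semidefinite stand-in for the integral matrix, built
# with rank 8 so that only a few Cholesky vectors are needed.
X = rng.standard_normal((50, 8))
M = X @ X.T

L = pivoted_cholesky(M)
print(L.shape[0])  # 8 Cholesky vectors reproduce the 50 x 50 matrix
```

Loosening `tol` discards small contributions early, which is exactly the controllable truncation exploited in CD-based integral handling.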

Molecular Quantum Dynamics
In parallel with DMRG emerging as a state-of-the-art electronic structure method for strongly correlated systems, the multilayer multi-configurational time-dependent Hartree (ML-MCTDH) algorithm [26-28] was developed in the field of molecular quantum dynamics to enable high-accuracy nuclear dynamics simulations with many degrees of freedom. While these two developments occurred largely independently of one another and (originally) with very different target applications, it was later realized that the two approaches are related, as both can be reformulated based on tensor network states. [29] As in DMRG, the molecular wave function in ML-MCTDH is expanded in terms of single-particle functions, but instead of a linear MPS factorization, the wave function ansatz corresponds to a hierarchical Tucker decomposition, which is visualized in Fig. 4. The resulting tensor factorization is also known as a tree tensor network state (TTNS), and it allows for an efficient encoding of hierarchical interaction patterns.
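The (non-hierarchical) Tucker decomposition underlying this tree-structured ansatz can be sketched with a plain higher-order SVD in NumPy; the tensor `T` below is synthetic, constructed with an exact multilinear rank so that the factorization is exact:

```python
import numpy as np

def unfold(T, mode):
    """Matricize T by moving the given mode to the front and flattening the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Higher-order SVD: one factor matrix per mode from the SVD of the
    corresponding unfolding, plus the core tensor of the Tucker format."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        # Multiply mode m of the core by U^T (multilinear contraction).
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors

rng = np.random.default_rng(4)
# A 10 x 10 x 10 tensor with exact multilinear rank (3, 3, 3).
G = rng.standard_normal((3, 3, 3))
Us = [np.linalg.qr(rng.standard_normal((10, 3)))[0] for _ in range(3)]
T = np.einsum('abc,ia,jb,kc->ijk', G, Us[0], Us[1], Us[2])

core, factors = hosvd(T, (3, 3, 3))
T_rec = np.einsum('abc,ia,jb,kc->ijk', core, factors[0], factors[1], factors[2])
print(core.size, T.size)  # 27 entries in the core vs 1000 in the full tensor
```

The hierarchical variant used in ML-MCTDH recursively applies the same idea, splitting the core tensor further along a tree of mode groupings.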
Similar to the MPS ansatz, the TTNS ansatz allows for a compression of molecular wave functions by limiting the size of the multi-dimensional tensors in the network to some maximum dimension for each index. In analogy to the MPS factorization in DMRG, the construction of the full-dimensional tensor is also avoided in ML-MCTDH, and instead, the hierarchical Tucker decomposition is directly employed as an ansatz for the nuclear wave function. However, in contrast to standard DMRG as applied to molecular structure problems, ML-MCTDH is commonly employed to simulate quantum dynamics, and thus the tensors change over time as the wave function evolves. Hence, the coefficients are not determined through an energy minimization scheme as in DMRG calculations; instead, the tensors are propagated in time according to equations of motion derived from the time-dependent variational principle.
However, a similar strategy to solve the time-dependent Schrödinger equation has also been applied to MPSs. The MPS wave function is then propagated in time with the time-dependent DMRG (TD-DMRG) algorithm based on the time-dependent variational principle. [30,31] In a TD-DMRG calculation, each tensor is updated in a sequential fashion, similar to the time-independent DMRG algorithm, until the entire MPS wave function has been propagated over a given time step.

In the past two decades, the density matrix renormalization group (DMRG) algorithm [14] emerged as a powerful high-accuracy method for efficient calculations of strongly correlated quantum systems in chemistry. [15] While DMRG was originally developed for the simulation of one-dimensional spin lattice models, it was later extended to quantum chemical applications to capture electron correlation effects in molecular systems. [16,17] Although at first only pseudo-linear molecules were considered suitable targets for formal reasons, it was then shown that DMRG can be applied to multi-configurational molecular problems in practice. [15,20] The tensor network formulation of DMRG demonstrates the efficient compression of complex many-body wave functions by factorizing the FCI tensor according to

$$C_{n_1 n_2 \cdots n_N} = A^{n_1} A^{n_2} \cdots A^{n_N}\,,$$

where the CI coefficients are now expressed as products of N matrices (with the first and last, $A^{n_1}$ and $A^{n_N}$, being a row and a column vector, respectively). The resulting wave function ansatz is known as a matrix product state (MPS) and is also referred to as a tensor train in mathematics.
The decomposition of the FCI tensor containing all CI coefficients into MPS form by sequential SVDs is illustrated in Fig. 3. The key advantage of the MPS representation is that it allows for a very effective compression of the full wave function by simply limiting the maximum size of the matrices to dimension m × m through retaining only the m largest singular values in each step. This ensures that the most significant electronic configurations are included in the MPS, whereas negligible contributions are truncated, resulting in compact yet accurate multi-configurational wave functions. The degree of approximation can be conveniently controlled by adapting the truncation threshold.
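The sequential-SVD construction of an MPS can be sketched directly in NumPy; the tensor `C` below is a random stand-in for a CI coefficient tensor of six orbitals, and `m_max` is the retained bond dimension:

```python
import numpy as np

def tensor_to_mps(C, m_max):
    """Decompose a coefficient tensor (one index per orbital) into a matrix
    product state by sequential SVDs, keeping at most m_max singular values
    at every bond (the truncation step described in the text)."""
    dims = C.shape
    mps, bond, R = [], 1, C.reshape(1, -1)
    for d in dims[:-1]:
        U, s, Vt = np.linalg.svd(R.reshape(bond * d, -1), full_matrices=False)
        keep = min(m_max, s.size)
        mps.append(U[:, :keep].reshape(bond, d, keep))
        R = s[:keep, None] * Vt[:keep, :]  # carry the remainder to the right
        bond = keep
    mps.append(R.reshape(bond, dims[-1], 1))
    return mps

def mps_to_tensor(mps):
    """Contract all MPS tensors back into the full coefficient tensor."""
    T = mps[0]
    for A in mps[1:]:
        T = np.tensordot(T, A, axes=(T.ndim - 1, 0))
    return T.reshape(T.shape[1:-1])

rng = np.random.default_rng(5)
# Stand-in CI tensor for 6 orbitals with occupations 0/1 (dimension 2 each).
C = rng.standard_normal((2,) * 6)

mps = tensor_to_mps(C, m_max=8)  # bond dimension 8 suffices here: exact
print(np.allclose(mps_to_tensor(mps), C))
```

Calling `tensor_to_mps(C, m_max=4)` instead yields a truncated, approximate MPS whose largest bond dimension is 4, illustrating the compression knob discussed above.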
To compactly represent molecular wave functions numerically, one usually works directly with the factorized tensor ansatz, thus bypassing the explicit construction of the full-dimensional tensors, which would be unfeasible anyway. Hence, numerical algorithms are required which allow for an efficient calculation of the tensor factorization without having access to the full-dimensional tensor, as the FCI wave function of complex molecular systems is generally not known. For MPSs, the tensor parameters can be determined with the DMRG algorithm, which solves the time-independent Schrödinger equation through a variational optimization of the wave function. As the direct solution of the full-dimensional eigenvalue problem is computationally intractable for large systems, DMRG exploits the factorized ansatz by optimizing one tensor after the other. In an iterative (so-called sweeping) procedure, all tensors of the factorization are updated repeatedly until convergence is reached. During this sequential MPS optimization, many small eigenvalue problems need to be solved numerically, rather than one big one. Interestingly, even these smaller eigenvalue problems are typically too large for direct diagonalization approaches to yield the electronic energy, and hence, subspace iteration methods such as Davidson diagonalization [21,22] need to be employed, which represent yet another set of tensor approximation methods. Because of its favorable scaling with molecular size, the DMRG algorithm allows one to efficiently calculate ground- and excited-state wave functions of electronic, vibrational and vibronic problems, as well as molecular properties that are otherwise very difficult to handle. [23,24] Finally, we note that the MPS format has also been discovered in the analytic construction of multi-configurational wave function ansätze in the multifacet graphically contracted function method. [25]

Machine Learning Approaches
[37] With the advent of sophisticated ML algorithms and the increase in available chemical data, ML has opened up a complementary route to traditional approaches. By employing powerful statistical models, ML enables an efficient characterization of systems which are too large, or of which there are too many, to be explicitly simulated with the given computational resources. There are two key areas of application where ML is routinely exploited to expand the limits of traditional computational chemistry. First, ML can further the understanding of chemical processes by extracting patterns from complex, high-dimensional data. Second, ML can interpolate chemical data to efficiently predict properties of molecular systems based on data-driven models. Many of the ML approaches commonly applied to these two classes of problems in computational chemistry can be viewed as tensor factorization-based techniques employing statistically optimized tensor decompositions of large, complex problems.

A well-known example of an ML method routinely applied to gain insights into complex, high-dimensional data is principal component analysis (PCA). [38] PCA is a linear dimensionality reduction technique based on projecting high-dimensional data sets onto a low-dimensional space (usually two or three dimensions for visualization purposes). The new coordinates, the so-called principal axes, span the space with the largest variation in the data. While PCA is often formulated as an eigendecomposition of the covariance matrix of the data set, it can also be understood as the direct application of an SVD (see Section 2) to the data matrix itself. In practice, a PCA via SVD is numerically more efficient in most cases. The squares of the singular values obtained from the SVD are proportional to the variance of the data along the corresponding directions. Hence, the largest singular values indicate the most important coordinate axes, and correspondingly, the associated right singular vectors represent the principal components of a data set. One key difference between PCA and most of the other tensor-factorization-based approaches referred to in this paper is that PCA is employed to reduce the dimensionality of a chemical data set a posteriori: the full data tensor is known and only later decomposed. Therefore, the factorized tensors can be calculated straightforwardly by SVD, without the need to apply involved algorithms to determine the tensor coefficients. This makes PCA an easily applicable tool for gaining insight into high-dimensional chemical data in cheminformatics and chemometrics; it can be applied, for instance, in the design and planning of chemical libraries. [39]

A large variety of ML-based approaches has been developed for a second key application in computational chemistry, namely, the efficient prediction of chemical properties of molecular systems. For the modeling of chemical processes and materials, a prevalent target problem is the evaluation of the potential energy for a given atomistic structure. Potential energy evaluations are a frequent task in various applications, ranging from reaction network explorations to molecular dynamics simulations.
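The PCA-via-SVD route described above can be sketched in a few lines of NumPy on a synthetic descriptor matrix:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data set: 200 samples of 6 molecular descriptors whose variance
# is concentrated along 2 latent directions, plus a little noise.
latent = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 6))
X = latent + 0.05 * rng.standard_normal((200, 6))

# PCA via SVD of the mean-centered data matrix; the covariance matrix is
# never formed explicitly.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared singular values are proportional to the per-component variance;
# the rows of Vt are the principal axes.
explained = s**2 / np.sum(s**2)
scores = Xc @ Vt[:2].T  # projection onto the first two principal components
print(f"variance captured by two components: {explained[:2].sum():.3f}")
```

The two-dimensional `scores` array is exactly what would be plotted for visual inspection of the data set.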
[40] While classical molecular dynamics simulations based on parametrized force fields allow for cost-efficient evaluations of the potential energy and the resulting atomic forces, classical force fields fail to fully capture the underlying interactions of the electrons and atomic nuclei of the molecular system. A proper quantum mechanical description of the potential can be achieved with ab initio molecular dynamics, where the potential energy is obtained directly from an electronic structure calculation (in Born-Oppenheimer molecular dynamics) or by propagation of orbitals (as in Car-Parrinello molecular dynamics) at each simulation time step. However, these approaches are severely limited with respect to system size and simulation timescales due to the cost of the electronic-structure part, even if one restricts them to methods such as DFT. Fortunately, significant advances have been made in this respect with the development of machine learning potentials (MLPs). [41,42] These potentials aim to preserve the high accuracy of first-principles methods while lowering the computational demands to those of classical force fields. MLPs are commonly based on very flexible mathematical expressions which are parametrized by optimizing the predictions for a given reference data set of molecular structures and corresponding ab initio energies and forces. At the highest level, all MLPs can be viewed as sophisticated interpolation schemes on the training data to predict the potential energies of similar molecular structures. Various types of MLPs have been developed, with a particularly frequently applied family of potentials being high-dimensional neural network potentials (HDNNPs), which are based on feed-forward neural networks (NNs). [43] Roughly speaking, NNs encode a hierarchy of concepts relating the input, which in this case is a set of molecular descriptors, to the target output, which here corresponds to the potential energy. An illustration of such a feed-forward NN is given in Fig. 5.
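The forward pass of such a feed-forward network is nothing but a chain of tensor contractions and nonlinearities; the layer sizes and the random descriptor below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def forward(x, weights, biases):
    """Forward pass: each layer is a tensor contraction (matrix-vector
    product) plus a bias, followed by a nonlinear activation; the output
    layer is kept linear, as is common for energy prediction."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(W @ x + b)
    return weights[-1] @ x + biases[-1]

# Hypothetical layer sizes: 8 descriptors -> two hidden layers -> 1 energy.
sizes = [8, 16, 16, 1]
weights = [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

descriptor = rng.standard_normal(8)  # placeholder molecular descriptor
energy = forward(descriptor, weights, biases)
print(energy.shape)  # (1,)
```

In an actual HDNNP, the `weights` and `biases` tensors would of course be obtained by training on reference energies and forces rather than drawn at random.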
Mathematically, the collection of interconnected nodes can be expressed in terms of tensors, which contain all the parameters of the connections between two layers, together with user-defined analytical functions describing the relationship from one layer to the next. Such networks of parameter tensors, combined with suitable nonlinear transformations, result in very expressive models for encoding the mapping between a given input and the target output. As for almost all previously introduced tensor-decomposition-based approaches, the key numerical challenge for HDNNPs is the determination of the tensor coefficients, as they cannot be calculated through a straightforward tensor decomposition. While for the examples in Sections 3.1 to 3.3 the tensor entries were determined through equations arising from the underlying physical principles, the parameters of feed-forward neural networks are learned by optimizing the coefficients of each connection during training of the ML model. For this purpose, a cost (or loss) function must be defined which provides a quantitative measure of how well the model performs for a given reference example. The NN parameters are then optimized with respect to this cost function by employing a suitable optimization procedure, usually some type of backpropagation algorithm based on gradient descent, [44,45] until convergence is reached.
The development of the various tensor decomposition-based approaches has often occurred simultaneously, yet largely independently, in different research areas. Different methods were originally intended for very distinct target applications and were initially formulated with a specific focus. One particular example of this observation is the emergence of DMRG in solid-state physics for the calculation of spin chains in parallel with MCTDH for molecular quantum dynamics. As both of these methods were elaborated and their range of application was extended, [31,47] it was realized that they can both be reformulated based on tensor network states for a flexible representation of a large variety of many-body wave functions. Through the common foundation of efficient tensor factorizations, conceptual and algorithmic connections between the two methods became apparent, allowing for a synergistic advancement of these methods. Also in other research contexts there are many cross-disciplinary developments that continuously advance tensor-based approaches, particularly in regard to the prevalence of ML in the chemical and materials sciences, which is fueled further by the development of specialized hardware optimized for tensor operations, such as graphics processing units and tensor processing units.
Within the next decade, entirely new approaches for simulating molecular systems and materials may emerge with the advent of quantum computing for real-world chemistry problems. [48] Also in this area, tensor factorization-based techniques will be of great importance, for instance for the calculation of high-quality guiding states or for the design of compact circuits to model complex chemical Hamiltonians. Hence, tensor decomposition-based methods will continue to play a crucial role in tackling intractable chemical problems.
While the training process of an MLP is computationally quite expensive, as it requires costly reference data sets and entails numerous optimization iterations, all subsequent evaluations of the potential can be computed very efficiently. To ensure accurate potential energy predictions of molecular structures, MLPs can be combined with uncertainty quantification techniques to monitor the model variance error, for instance by employing an ensemble of models. For molecular structures with a predicted energy or force uncertainty above a given target threshold, the MLP can be improved with further training on such structures by employing active learning or lifelong learning schemes, which allow for a continuous optimization of the model parameters. [46] These properties make MLPs a powerful concept to combine the efficiency of classical force fields with the accuracy of quantum chemical calculations.
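A minimal ensemble-based uncertainty check can be sketched as follows; the prediction values and the threshold are purely illustrative:

```python
import numpy as np

# Hypothetical predictions (e.g. in eV) from an ensemble of 5 MLPs for
# 3 candidate structures; rows are structures, columns are ensemble members.
predictions = np.array([
    [-2.31, -2.29, -2.30, -2.32, -2.30],   # small spread: prediction trusted
    [-5.10, -5.12, -5.09, -5.11, -5.10],
    [-7.80, -6.95, -7.40, -8.10, -7.20],   # large spread: flag for retraining
])

std = predictions.std(axis=1)  # ensemble disagreement per structure
threshold = 0.1                # illustrative uncertainty threshold
needs_retraining = std > threshold
print(needs_retraining)  # [False False  True]
```

Structures flagged here would be recomputed with the ab initio reference method and added to the training set in an active learning loop.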

Conclusions
As the field of computational chemistry aims to gain insight into chemical processes through computational models of molecular systems, tensors naturally arise in various contexts from the numerical representation of the involved particles and the quantification of their interactions. For larger many-body systems, a straightforward numerical description can quickly become prohibitive due to the unfavorable scaling of the tensor dimensionality with system size. By systematically factorizing complex high-dimensional problems into many interconnected lower-dimensional ones with tensor decomposition schemes, previously intractably large chemical systems can be targeted. For this reason, numerical methods based on tensor decompositions are becoming ubiquitous in computational chemistry, as a plethora of tensor factorization-based approaches has emerged in the past two decades.
While the different tensor decomposition-based approaches are applied in very diverse research fields, they share the underlying assumption that tensor factorizations provide an efficient description of the interacting many-body systems commonly encountered in chemistry. Suitable tensor decomposition schemes often allow for a compact, yet accurate, representation of molecular systems with their intricate physical interactions governing chemical processes. Depending on the problem at hand, tensor factorizations are leveraged either to efficiently encode the interactions between particles, to compactly express multi-configurational many-body wave functions, or to flexibly parametrize complex relations between molecular descriptors and chemical properties. The coefficients of the tensor factorization scheme employed are commonly determined by some sort of functional minimization, which can be based either directly on physical principles, such as the variational principle, or on user-defined cost functions measuring the quality of a given tensor parametrization, as is often the case for machine learning models. A key component of tensor-based methods is therefore the efficient numerical procedures employed to compute the tensor coefficients, such as the DMRG algorithm for molecular structure calculations or adaptive gradient-based backpropagation algorithms for training HDNNPs. For applications where dimensionality reduction is the main goal, such as PCA, the coefficient determination becomes trivial, as the parameters can be calculated with a straightforward decomposition of the full data tensor.

License and Terms
This is an Open Access article under the terms of the Creative Commons Attribution License CC BY 4.0. The material may not be used for commercial purposes. The license is subject to the CHIMIA terms and conditions: (https://chimia.ch/chimia/about).
The definitive version of this article is the electronic one that can be found at https://doi.org/10.2533/chimia.2024.215

Fig. 2 .
Fig. 2. Image compression with SVD. From left to right: original image; reconstructed image with 5% of the singular values; and with 1% of them. Images can be conveniently expressed as 3-dimensional tensors, with the pixels distributed along the x- and y-axes, and the color encoding (such as RGB) in the third dimension. Image copyright: ETH Zürich / Gian Marco Castelberg.

Fig. 3 .
Fig. 3. Tensor decomposition of the FCI tensor (top) into an MPS (bottom) by sequential SVDs. In the tensor diagrams, the filled shapes represent the tensors, with the adjoined lines visualizing their respective indices.

Fig. 4 .
Fig. 4. In this TTNS, the differently colored tensors highlight the different layers of the hierarchical Tucker decomposition employed in ML-MCTDH.

Fig. 5 .
Fig. 5. Feed-forward neural network. Data is mapped from the input layer (green) to the target output quantity (purple) through parametrized connections (arrows) via hidden layers (blue).