Distributed computing environments, such as the Grid, promise enormous raw computational power but involve high communication overheads. It is therefore believed that they are suited primarily for "embarrassingly parallel" applications, such as Monte Carlo simulations, and for certain applications where the loosely-coupled nature of the underlying science leads naturally to a coarse-grained computation. Typical applications, however, do not naturally admit such a decomposition. We discuss our solution strategy, based on scalable functional decomposition, which keeps the computation coarse-grained even on a large number of processors. Such a decomposition can be attempted through a variety of means; we discuss the use of time parallelization to achieve it. We demonstrate results on a model problem, and then discuss the implementation of the technique for an important problem in nanomaterials simulation. We also show that this technique can be extended to make it inherently fault-tolerant.
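To make the idea concrete, the sketch below shows one common realization of time parallelization, a parareal-style predictor-corrector scheme applied to a simple model ODE; this is an illustrative, generic instance for exposition, not necessarily the predictor developed in this work, and the names (`f`, `euler`, `parareal`) and parameter choices are hypothetical. The point of the structure is that the expensive fine solves within each iteration are independent across time slices, so each slice can be farmed out as one coarse-grained task to a separate Grid processor.

```python
# Illustrative parareal-style time parallelization for a model ODE
# y' = -y on [0, T]. The fine-propagator loop in each iteration is
# embarrassingly parallel across time slices.

import numpy as np

def f(t, y):
    # Model problem: y' = -y, exact solution y(t) = y0 * exp(-t).
    return -y

def euler(y, t0, t1, steps):
    # Fixed-step forward Euler from t0 to t1.
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def parareal(y0, T, slices, iters, coarse_steps=1, fine_steps=100):
    t = np.linspace(0.0, T, slices + 1)
    G = lambda y, n: euler(y, t[n], t[n + 1], coarse_steps)  # cheap predictor, run serially
    F = lambda y, n: euler(y, t[n], t[n + 1], fine_steps)    # expensive solver, run in parallel

    # Initial guess: one serial coarse sweep over all slices.
    U = [y0]
    for n in range(slices):
        U.append(G(U[n], n))

    for _ in range(iters):
        # Fine solves are independent across slices -- this is the
        # coarse-grained work that would be distributed over the Grid.
        Fv = [F(U[n], n) for n in range(slices)]
        Gv_old = [G(U[n], n) for n in range(slices)]
        # Serial coarse correction sweep:
        # U_{n+1} <- G(U_n, new) + F(U_n, old) - G(U_n, old).
        new = [y0]
        for n in range(slices):
            new.append(G(new[n], n) + Fv[n] - Gv_old[n])
        U = new
    return U

if __name__ == "__main__":
    U = parareal(y0=1.0, T=2.0, slices=8, iters=3)
    print("parareal y(T) =", U[-1], " exact =", np.exp(-2.0))
```

A fault-tolerant variant follows naturally from this structure: because each fine solve depends only on the current iterate at its slice boundary, a lost task can simply be recomputed or its slice carried forward by the coarse predictor until the next iteration.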