There has been a vast amount of work on programming models that perform well across machine architectures, are easy to use, and have predictable performance. Similarly, designing and optimizing architectures to achieve the best performance for an application class remains a challenging task. Accurate cost modeling is essential for both application development and system design. Many scientific computing codes are built on libraries that provide custom collective communication primitives, and the family of Bulk Synchronous Parallel (BSP) machine models provides suitable tools for analyzing such codes. However, modeling the effect of bandwidth limitations on globally unbalanced communication and estimating the hierarchical bandwidth used by applications remain key challenges. We present a hierarchical bandwidth machine model (alpha DBSP) that naturally extends the Decomposable BSP (DBSP) model by associating a bandwidth growth factor alpha with each message pattern. Algorithms executed on alpha DBSP have a runtime that is at least as good as on DBSP. Hence, there are globally unbalanced problems for which alpha DBSP analysis is simpler or more accurate. We present three scientific computing kernels that illustrate the differences between alpha DBSP and DBSP analysis. Like the other models in the BSP family, alpha DBSP predicts collective communication execution time for a given machine. Additionally, alpha DBSP estimates the hierarchical bandwidth required by a given application. System architects may use this estimate to design machines that avoid bandwidth bottlenecks for their target application class.