SHF: Small: Communication-Efficient Distributed Algorithms for Machine Learning

Project Details

Description

Advances in sensing and processing technologies, communication capabilities, and smart devices have enabled the deployment of systems in which massive amounts of data are collected and then processed in order to make decisions. The platforms that process this data differ depending on the application. Data centers are powerful platforms with vast computational resources, where the collected data can be distributed over multiple processors that are all connected through a high-bandwidth network. Data can also be generated and processed in multi-agent systems, which are made up of multiple interacting computational units (such as smart devices connected through wireless internet) with limited storage, power, computation, and communication capabilities. Data communication costs, which include bandwidth and latency, often dominate floating-point operation costs. As a result, the performance of optimization algorithms operating on large data sets is bounded by data communication in both multi-agent systems and data centers. This project proposes novel communication-efficient methods for a class of distributed optimization problems arising in large-scale data analysis and machine learning. The methods and techniques developed under this project contribute to the efficiency, practical performance, and mathematical foundations of distributed optimization algorithms. The project is also developing a high-performance software framework that allows the dissemination of efficient domain-specific software and benchmarks.

The project has three goals. The first goal is to improve the communication efficiency of existing algorithms for solving distributed optimization problems in multi-agent systems, through a distributed algorithm that reduces the total number of communications required in consensus iterations. The approach leverages the notion of the effective resistance of a link to identify bottleneck edges for communication purposes, and modifies classical consensus averaging by taking effective resistances into account. The second goal is to develop communication-avoiding algorithms for data centers, through a framework that reduces communication by a tunable amount while keeping the arithmetic and bandwidth costs the same for a number of applications and existing algorithms. The third goal is to improve communication for hybrid systems, which interpolate between multi-agent systems and data centers in terms of communication structure, using a framework that generates algorithm- and architecture-aware code for reducing communication over these hybrid platforms.
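To make the first goal more concrete, the following is a minimal sketch of how effective resistances can flag a bottleneck edge and feed into consensus averaging. It is an illustration under stated assumptions, not the project's algorithm: the computation here is centralized (the project targets a distributed method), the graph is a hypothetical six-node example with unit edge weights, and the weighting rule in `resistance_weights` (weights proportional to effective resistance, scaled so that every node keeps a self-weight of at least 1/2) is one simple choice made for this example. All function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (is the graph connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def effective_resistances(n, edges):
    """Effective resistance of each edge in a connected, unit-weight, undirected graph."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    # Ground node 0: invert the Laplacian with row and column 0 removed.
    inv = invert([row[1:] for row in lap[1:]])

    def g(a, b):
        return 0.0 if a == 0 or b == 0 else inv[a - 1][b - 1]

    return {(i, j): g(i, i) + g(j, j) - 2.0 * g(i, j) for i, j in edges}


def resistance_weights(n, resistances):
    """Assumed rule: edge weight proportional to effective resistance,
    scaled so each node's neighbor weights sum to at most 1/2."""
    load = [0.0] * n
    for (i, j), r in resistances.items():
        load[i] += r
        load[j] += r
    scale = 1.0 / (2.0 * max(load))
    return {edge: r * scale for edge, r in resistances.items()}


def consensus(values, weights, rounds):
    """Synchronous consensus averaging; each round uses one exchange per edge."""
    x = list(values)
    for _ in range(rounds):
        nxt = list(x)
        for (i, j), w in weights.items():
            nxt[i] += w * (x[j] - x[i])
            nxt[j] += w * (x[i] - x[j])
        x = nxt
    return x


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
resistances = effective_resistances(6, edges)
bottleneck = max(resistances, key=resistances.get)  # edge with the largest resistance
weights = resistance_weights(6, resistances)
averaged = consensus([6.0, 0.0, 0.0, 0.0, 0.0, 0.0], weights, rounds=500)
```

In this example the bridge edge has the largest effective resistance, so it is flagged as the bottleneck and receives the largest averaging weight. Because the weights are symmetric and every node keeps a positive self-weight, the iteration preserves the sum of the values and converges to their average on a connected graph.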

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 7/15/18 to 12/31/22

Funding

  • National Science Foundation: $464,412.00
