The Neotys PAC virtual edition is back, and Akamas will be represented by Luca Cavazzana, one of our Software Engineers.
In this session, we’ll showcase how we used AI-powered techniques to cut the AWS Elastic MapReduce (EMR) costs of running batch jobs on an Apache Spark big data deployment. The target is a business intelligence application for the video-on-demand industry, and the optimization delivered cost savings of over 40%.
Performance tuning for big data frameworks can be challenging. The sheer number of parameters across different layers (e.g., the Spark framework, the JVM, YARN) and their interdependencies make predicting and optimizing performance immensely complex.
Running big data applications in the cloud adds further complexity, with even more options to weigh when looking for the optimal cluster configuration, such as instance family, size, and count.
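To give a sense of the scale of the problem, here is a minimal sketch of the kind of configuration space involved. The knobs below are standard Spark settings and EC2 options, not the exact parameter set tuned in this study, and the value ranges are illustrative only.

```python
# Illustrative only: a small slice of the configuration space for a Spark job on EMR.
# These are common Spark/EC2 knobs, not the exact parameters tuned in the study.
configuration_space = {
    # Spark framework layer
    "spark.executor.memory":        ["2g", "4g", "8g", "16g"],
    "spark.executor.cores":         [1, 2, 4],
    "spark.sql.shuffle.partitions": [100, 200, 400, 800],
    "spark.memory.fraction":        [0.4, 0.6, 0.8],
    # JVM layer (passed through the executor options)
    "spark.executor.extraJavaOptions": ["-XX:+UseParallelGC", "-XX:+UseG1GC"],
    # Cloud/cluster layer
    "ec2.instance_type":  ["m5.xlarge", "m5.2xlarge", "r5.xlarge", "c5.2xlarge"],
    "ec2.instance_count": [2, 4, 8, 16],
}

# Even this tiny slice yields thousands of combinations to evaluate by hand.
from math import prod
print(prod(len(v) for v in configuration_space.values()))  # 4*3*4*3*2*4*4 = 4608
```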
As a result, teams have to rely on vendor guidelines and generic rules of thumb, which can leave much of an expensive cluster’s potential untapped.
Join us to discover how our approach uses automation and AI techniques to iteratively identify optimal configurations regardless of the stack’s complexity. In this study, we tuned both Apache Spark parameters and the EC2 cluster size, finding the tradeoff between resource allocation and execution time that minimizes the overall cost.
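As a rough sketch of the kind of objective such an optimization targets, the total cost of a batch run can be framed as the per-instance hourly price times the instance count times the job duration. The prices and the example figures below are placeholders, not data from the study, and the function is a simplification of any real billing model.

```python
# A minimal sketch of the cost objective, assuming (hypothetically) that the
# optimizer minimizes total EMR spend = hourly price * instance count * duration.
# Prices are illustrative, not actual AWS quotes.

HOURLY_PRICE = {  # placeholder on-demand prices in USD/hour
    "m5.xlarge": 0.192,
    "m5.2xlarge": 0.384,
    "r5.xlarge": 0.252,
}

def job_cost(instance_type: str, instance_count: int, duration_hours: float) -> float:
    """Total cost of one batch run on a homogeneous EC2 cluster."""
    return HOURLY_PRICE[instance_type] * instance_count * duration_hours

# A bigger cluster finishes faster but is not necessarily cheaper,
# which is exactly the tradeoff the optimization has to resolve.
print(job_cost("m5.xlarge", 8, 2.0))   # ~3.07 USD: 8 nodes, 2 hours
print(job_cost("m5.2xlarge", 8, 1.3))  # ~3.99 USD: 8 bigger nodes, faster but pricier
```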