In this session at Performance Summit 2021 on September 30th, Luca Chiabrera (Head of Customer Success and Sales Engineering at Akamas) describes how Akamas ML-based optimization helps tune Google Dataproc to reduce Spark execution time and lower the bill.
Dataproc is a fully managed, highly scalable Google service that makes it quick to deploy clusters and run Spark applications. Nevertheless, the user remains responsible for sizing the Dataproc infrastructure and defining the execution parameters of the Spark applications. These choices significantly affect both the execution time of the Spark applications and the cost of the Dataproc service. Machine Learning techniques can automatically tune Dataproc and Spark application configurations to make Spark applications run faster and to lower billing costs.
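To make the trade-off between cluster sizing, execution time, and cost concrete, here is a minimal Python sketch. It is not Akamas' method and does not call any Dataproc API: the `Candidate` class, the helper functions, the prices, and the runtimes are all hypothetical and chosen only for illustration. The Spark property names are real, but their values are placeholders. The sketch assumes a simple cost model in which one run costs workers × hourly price × hours the cluster is up.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One cluster + Spark configuration to compare (all values hypothetical)."""
    name: str
    workers: int                  # number of Dataproc worker nodes
    price_per_worker_hour: float  # illustrative price, NOT a real Google rate
    runtime_minutes: float        # illustrative job duration, NOT a measurement
    spark_properties: dict = field(default_factory=dict)


def job_cost(c: Candidate) -> float:
    """Assumed cost model: workers x hourly price x hours the cluster is up."""
    return c.workers * c.price_per_worker_hour * (c.runtime_minutes / 60.0)


def cheapest_within(candidates, max_runtime_minutes):
    """Return the cheapest candidate that meets the runtime limit, or None."""
    feasible = [c for c in candidates if c.runtime_minutes <= max_runtime_minutes]
    return min(feasible, key=job_cost) if feasible else None


# Hypothetical candidates; a tuner would generate and measure many of these.
CANDIDATES = [
    Candidate("baseline", 10, 0.50, 60.0,
              {"spark.executor.memory": "4g", "spark.executor.cores": "2"}),
    Candidate("tuned", 6, 0.50, 40.0,
              {"spark.executor.memory": "8g", "spark.executor.cores": "4"}),
    Candidate("small", 2, 0.50, 180.0,
              {"spark.executor.memory": "4g", "spark.executor.cores": "2"}),
]

if __name__ == "__main__":
    for c in CANDIDATES:
        print(f"{c.name}: {c.runtime_minutes:.0f} min, cost {job_cost(c):.2f}")
    best = cheapest_within(CANDIDATES, max_runtime_minutes=90)
    print("cheapest within 90 min:", best.name if best else None)
```

With these made-up numbers, the smallest cluster is not the cheapest, because it keeps the nodes running much longer. Sizing and Spark parameters therefore have to be tuned together, and an automated optimizer explores this kind of search space.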
See also the other session by Akamas at Performance Summit 2021 here.