Google Dataproc is a fully managed service that hosts open-source distributed processing platforms such as Apache Spark, Presto, and Apache Hadoop on Google Cloud. Dataproc provides the flexibility to manage and configure clusters of varying size on demand.
However, even with Dataproc, users are responsible for right-sizing the cluster, choosing the hardware for each node, and identifying the best job execution parameters. Finding a configuration that optimizes both cost and performance at the same time is more an art than a science, and in most cases it is simply an impossible task, even for experts.
In this short video, we show how Akamas AI-powered optimization can effectively address this challenge by automatically identifying the optimal configuration, one that reduces the cost of the Google Dataproc service and speeds up Spark applications.