Using Zeppelin to Build Data Science Models for Gathr

Data scientists use a variety of tools to develop data science models. Some prefer to build their models in R, while others write them in Python or Scala using a notebook tool such as Apache Zeppelin.

Gathr, a real-time streaming analytics platform, lets users build and deploy data models using technologies such as PMML, Scala, and PySpark. Because the platform supports multiple languages and formats, users can write their code in the technology they prefer. Once a model is prepared, it can be deployed on Gathr to score data in a distributed fashion.

This article explains how to create a data model in an Apache Zeppelin notebook and use it with the Gathr platform. It also demonstrates how to use the PySpark library to build an SVM classifier in Zeppelin and use it on Gathr.