Spark Streaming Contest: Real-time Anomaly Detection Apps

At Impetus, we take data analytics innovation seriously. Very seriously. One of the ways we continue to improve our big data software products and services, and to maintain our industry leadership, is through community programs that empower users to explore innovative uses of analytics technologies with our real-time streaming software, Gathr.

One of these programs was the inaugural Spark Streaming Innovation Contest, an international data hackathon that drew roughly 600 participants from around the world and offered a grand prize of $10,000 for the best submission. The contest ran from February through April and was open to the general community: we called on business analysts and engineers to solve real-world anomaly detection problems.

Because hackathon participants vary in skill level and experience, we outfitted them with two tools: Apache Spark and Gathr. We wanted them to be able to access their data quickly without having to build complicated models to gain insights.

Apache® Spark™ is one of the most popular stream processing engines, thanks to its open source framework, powerful programming model, and advanced analytics capabilities. However, Spark typically requires a lot of setup, coding, and modeling, so we equipped users with Gathr, a development platform for building real-time stream processing and machine learning applications.

Gathr makes anomaly detection on Apache Spark extremely easy, allowing developers to put their data to work quickly and spend their time gaining insights instead of programming. With these tools in hand, hackathon participants could build anomaly detection applications quickly, even without prior experience with Gathr.

A panel of experts evaluated and scored each submission. The panel included the Gathr product team's architects and engineers, as well as Alex Woodie, managing editor of Datanami, and Mike Matchett, senior analyst and consultant at Taneja Group.

Perhaps the most surprising discovery we made is that this year’s winners weren’t even veteran data scientists. “I wouldn’t call myself a data science expert,” said Venu Kanaparthy of Redlands, California. Kanaparthy won the $10,000 grand prize with his machine learning application for anomaly detection on Spark. Despite his limited experience, he says that he “was able to build a fully functional anomaly detection application on Spark working part-time evenings over about 4 weeks.”

A total of $18,000 in prize money was awarded across the grand prize winner and two runners-up. The first runner-up (awarded $5,000) was Anindya Saha from Foster City, California. The second runner-up (awarded $3,000) was Kalyan Janaki from Denver, Colorado. We congratulate our winners and are already looking forward to next year’s competition.

Using Gathr to Calculate the Conversion Rate of a Website

Websites today are the cornerstone for driving business objectives and achieving revenue goals. Business owners therefore need to ask themselves the following questions:

  • Who are my potential customers?
  • What is their pattern of purchase?
  • How can I improve my website to increase business?

Gathr is an excellent platform for performing web analytics on live clickstream data. You can track website performance metrics in many ways; one such measure is the relative conversion rate. Let’s take a look at what it means, how it’s derived, and why it’s important.