New Approaches to Real-time Anomaly Detection for Streaming Data

Detecting anomalous patterns in data can yield significant, actionable insights across a wide variety of application domains. Whether it is detecting roaming abuse and service disruptions in the telecom industry, identifying anomalous employee behavior that signals a security breach, or flagging out-of-pattern medical spend in incoming health insurance claims, big data anomaly detection has innumerable possibilities.

Anomaly detection has traditionally relied on rule-based techniques applied to static data processed in batches, an approach that is difficult to scale out as the number of scenarios grows. Modern data science techniques are far more efficient. Complex machine learning models can now be built using large amounts of unstructured and semi-structured data from disparate sources, including business applications, emails, social media, chat messages, voice, text, and more. Moreover, the massive increase in streaming time-series data is driving a shift to real-time anomaly detection, creating a need for techniques such as unsupervised learning and continuously updated models.

Following are some examples of how leading enterprises use real-time anomaly detection to gain deeper insights and respond swiftly to a dynamic environment:

[Infographic: Real-time Anomaly Detection Use Cases Across Verticals]

A shift in anomaly detection techniques

Real-time anomaly detection for streaming data is distinct from batch anomaly detection. Streaming analytics calls for models and algorithms that learn continuously in real time without storing the entire stream, and that run fully automated rather than manually supervised. Although both supervised and unsupervised anomaly detection approaches exist, the clear majority of big data anomaly detection methods are designed for batch processing and do not fit real-time streaming scenarios and applications.
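To make the constraint concrete, here is a minimal sketch (not Gathr's implementation) of an unsupervised detector that learns continuously in constant memory. It keeps only three running aggregates, updated with Welford's online algorithm, and never stores the stream itself; the threshold of 3 standard deviations is an illustrative assumption:

```python
import math

class StreamingZScoreDetector:
    """Unsupervised detector that learns continuously in O(1) memory
    using Welford's online mean/variance algorithm; it never stores
    the stream, only three running aggregates."""

    def __init__(self, z_threshold=3.0):
        self.n = 0            # events seen so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.z_threshold = z_threshold

    def update_and_score(self, x):
        """Score x against the model learned so far, then fold x in."""
        is_anomaly = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                is_anomaly = abs(x - self.mean) / std > self.z_threshold
        # Welford update: constant time and memory per event
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingZScoreDetector()
stream = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0, 10.1]
flags = [detector.update_and_score(x) for x in stream]
# only the spike at 55.0 is flagged
```

Note that each event is scored before it is folded into the model, so the decision is made online as the event arrives rather than in a later batch pass.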

Moreover, detecting anomalies accurately in streaming data can be difficult: the definition of an anomaly changes continuously as systems evolve and behaviors change. Furthermore, because anomalies are unexpected, an effective detection system must be able to determine whether new events are anomalous without relying on preprogrammed thresholds.
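One simple way to let the notion of "normal" evolve with the system, rather than fixing it in a preprogrammed threshold, is to decay old behavior exponentially. The sketch below (an illustrative assumption, not a specific product's algorithm) maintains an exponentially weighted mean and variance, so after a regime shift the detector flags the change and then adapts to the new baseline:

```python
class AdaptiveEWMADetector:
    """Adaptive detector: exponentially weighted mean and variance make
    old behavior fade, so the learned definition of 'normal' tracks the
    system as it drifts -- no fixed, preprogrammed threshold value."""

    def __init__(self, alpha=0.2, z_threshold=3.0, warmup=5):
        self.alpha = alpha              # decay factor: higher = adapts faster
        self.z_threshold = z_threshold  # deviations (in stds) to flag
        self.warmup = warmup            # events to observe before flagging
        self.n = 0
        self.mean = None
        self.var = 0.0

    def update_and_score(self, x):
        if self.mean is None:
            self.mean, self.n = x, 1
            return False
        # Score against the current (decayed) picture of normal behavior
        std = self.var ** 0.5
        anomalous = (
            self.n >= self.warmup
            and std > 0
            and abs(x - self.mean) / std > self.z_threshold
        )
        # EWMA update: recent events dominate, old behavior fades out
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return anomalous

detector = AdaptiveEWMADetector()
stream = [10.1, 9.9] * 10 + [50.0] * 10   # baseline shifts from ~10 to 50
flags = [detector.update_and_score(x) for x in stream]
# the first event at the new level is flagged; the detector then
# absorbs the shift and stops flagging the new baseline
```

The decay factor `alpha` trades off sensitivity against adaptation speed: a larger value absorbs regime changes faster but also forgets the baseline more quickly.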

Another critical aspect is early detection: the goal is not only to identify anomalies but to predict and curb anomalous events in real time. Predictions must therefore be made online, with the algorithm flagging an event as anomalous as it arrives, unlike batch processing, where the model is trained to look back over stored data.

A new approach to effective and reliable anomaly detection

One way to implement new approaches to anomaly detection is to hand-code everything from scratch. However, with the shift to real-time anomaly detection, which is significantly more complex, developing a custom solution brings its own set of challenges:

  • Long implementation cycles
  • Finding the right talent
  • Multiple QA cycles
  • Continuous monitoring, and the ability to scale up with increasing loads once deployed

Another approach is a platform approach to big data anomaly detection. Imagine a platform that not only solves the complexities of building anomaly detection models for streaming data, but also provides a unified solution to train, calibrate, deploy, and monitor models post-production, on both real-time and batch data.

Gathr is one such real-time anomaly detection platform. It is a specialized platform for rapidly building, running, and continually updating anomaly detection models using a visual UI and machine learning capabilities. It leverages open-source engines like Apache Spark to create analytics applications at big data scale, and provides a drag-and-drop interface for building and managing application workflows visually.

It is an integrated framework that goes beyond model creation to provide end-to-end functionality for building enterprise anomaly detection applications. It maps to the modern platform approach to anomaly detection with features such as:

  • Real-time data integration and processing
  • Rapid development and operationalization of applications
  • A/B testing
  • Monitoring, debugging, and diagnosis at scale
  • Version management
  • Promotion of workflows across environments: Dev-Test-Prod
  • Multi-tenancy

For an in-depth view of real-time anomaly detection and the platform approach to it, download the white paper, Guide to Real-time Anomaly Detection for Enterprise Data.
