Streaming Big Data ETL with Impetus Gathr and Syncsort DMX – Guest Blog


Today we are announcing a partnership between Syncsort and Impetus Technologies, and our entry into an integrated approach to batch processing and real-time stream processing that we call “Streaming ETL”. This mix of batch and real-time processing is also known as the Lambda Architecture. Streaming ETL lets you combine the best batch and streaming technologies under tools that abstract away the complexity of the underlying platforms.
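To make the Lambda Architecture idea concrete, here is a minimal Python sketch of its three parts: a batch layer that recomputes a view from the full historical dataset, a speed layer that incrementally updates a view for recent events, and a serving step that merges the two at query time. This is an illustration of the general pattern only, not how Gathr or DMX implement it; the event shape (an IP address paired with a byte count) and all function and class names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event shape: (ip_address, bytes_transferred)
Event = tuple[str, int]


def batch_view(master: Iterable[Event]) -> Counter:
    """Batch layer: recompute per-IP byte totals from the full master dataset."""
    totals: Counter = Counter()
    for ip, nbytes in master:
        totals[ip] += nbytes
    return totals


class SpeedLayer:
    """Speed layer: incrementally track totals for events not yet in the batch view."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ip, nbytes = event
        self.totals[ip] += nbytes

    def reset(self) -> None:
        # Call once a new batch run has absorbed these recent events.
        self.totals.clear()


def query(ip: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving step: merge the precomputed batch view with the real-time delta."""
    # Counter returns 0 for missing keys, so unseen IPs simply total 0.
    return batch[ip] + speed.totals[ip]


if __name__ == "__main__":
    historical = [("10.0.0.1", 500), ("10.0.0.2", 200), ("10.0.0.1", 300)]
    batch = batch_view(historical)
    speed = SpeedLayer()
    speed.on_event(("10.0.0.1", 50))
    print(query("10.0.0.1", batch, speed))  # 850
```

The batch view stays simple and fully recomputable, while the speed layer only has to cover the window since the last batch run; resetting it after each batch run keeps events from being counted twice.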

The huge increase in the types and sources of data has put pressure on companies to blend and summarize that data quickly into actionable information. Meeting these new demands requires a combination of real-time and batch processing.

There’s a grab bag of technologies that each excel at specific tasks: Hadoop MapReduce, Storm, and Spark for massively parallel processing; Kafka and Spark Streaming, along with traditional messaging and queuing software, for real-time data movement; and Mesos and YARN for cluster management. These components can be mixed and matched, but doing so means learning many APIs and building different skill sets to use each one well.

Figure: Gathr definition with a Syncsort DMX bolt

Figure: DMX task that performs an IP lookup
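The lookup shown above is the kind of per-record enrichment a streaming step performs: each incoming record carries an IP address, and the task attaches information from a reference table. The Python sketch below illustrates one common way to do such a lookup against a table of non-overlapping IP ranges, using binary search. It is a hypothetical analogue for explanation only, not DMX's task definition or API; the range table, labels, and the names `IpRangeLookup` and `enrich` are all invented for illustration.

```python
import bisect
import ipaddress
from typing import Optional

# Hypothetical reference table: (first IP, last IP, label); ranges must not overlap.
RANGES = [
    ("10.0.0.0", "10.0.255.255", "datacenter-a"),
    ("192.168.1.0", "192.168.1.255", "office-lan"),
    ("203.0.113.0", "203.0.113.255", "partner-x"),
]


class IpRangeLookup:
    """Resolve an IPv4 address to the label of the range containing it."""

    def __init__(self, ranges: list[tuple[str, str, str]]) -> None:
        # Convert addresses to integers and sort by range start for binary search.
        self._rows = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), label)
            for lo, hi, label in ranges
        )
        self._starts = [row[0] for row in self._rows]

    def lookup(self, ip: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= addr (or -1 if none).
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr <= self._rows[i][1]:
            return self._rows[i][2]
        return None  # address falls outside every known range


def enrich(record: dict, table: IpRangeLookup) -> dict:
    """Per-record step: return a copy of the record with an 'ip_label' field added."""
    return {**record, "ip_label": table.lookup(record["ip"])}


if __name__ == "__main__":
    table = IpRangeLookup(RANGES)
    print(enrich({"ip": "192.168.1.42", "bytes": 120}, table))
    # {'ip': '192.168.1.42', 'bytes': 120, 'ip_label': 'office-lan'}
```

Building the sorted table once and reusing it for every record keeps each lookup at O(log n), which matters when the step runs on every event in a stream.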