The image above shows the overall data volume of connected devices/IoT connections worldwide in 2019 and 2025. By 2025, the total data volume of connected IoT devices worldwide is forecast to reach 79.4 zettabytes (ZB; one zettabyte equals a trillion gigabytes). These numbers give a sense of how much data IoT devices generate. In this blog post, we look at how such a large volume of IoT data can be processed using Confluent Kafka, and in particular at how Confluent Kafka supports MQTT, the de facto standard messaging protocol for IoT devices. MQTT is designed as an extremely lightweight publish/subscribe messaging transport that is ideal for connecting remote devices with a small code footprint and minimal network bandwidth. Today it is used in a wide variety of industries, such as automotive, manufacturing, telecommunications, and oil and gas.
Confluent Kafka provides three components that support MQTT data transmission:
Kafka Connect source and sink connectors, which integrate with MQTT brokers in both directions
Confluent MQTT Proxy, which ingests data from IoT devices without needing an MQTT broker
Confluent REST Proxy for a simple but powerful HTTP-based integration
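To make the third option more concrete, here is a minimal sketch of how a device-side gateway might build a produce request for Confluent REST Proxy. It uses the REST Proxy v2 embedded JSON format (a POST to /topics/<topic> with the content type application/vnd.kafka.json.v2+json). The proxy address, topic name, and sensor fields are assumptions made up for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your environment.
REST_PROXY_URL = "http://localhost:8082"   # hypothetical REST Proxy address
TOPIC = "sensor-readings"                  # hypothetical Kafka topic


def build_produce_request(device_id, reading):
    """Build (but do not send) a REST Proxy v2 produce request."""
    body = {"records": [{"key": device_id, "value": reading}]}
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{TOPIC}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


request = build_produce_request("car-42", {"speed_kmh": 87.5, "engine_temp_c": 92})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Because this is plain HTTP, any device or gateway that can issue a POST can feed Kafka this way, without a Kafka client library on the device.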
Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems. Its purpose is to make it easy to add new systems to scalable and secure event streaming pipelines while leveraging all the features of Apache Kafka, such as high throughput, scalability, and reliability. The easiest way to download and install new source and sink connectors is via Confluent Hub.
The Kafka Connect MQTT connector is a plugin for sending data to and receiving data from an MQTT broker.
The MQTT broker is persistent and provides MQTT-specific features. It accepts the data that IoT devices push to it, and Kafka Connect then pulls that data at its own pace, so neither side overwhelms the other. Out-of-the-box scalability and integration features like Kafka Connect Converters and Single Message Transforms (SMTs) are further advantages of using Kafka Connect connectors.
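The push/pull decoupling described above can be illustrated with a small, self-contained simulation. This is only a sketch: the in-memory buffer stands in for the persistent MQTT broker, and the function names and batch size are invented for the example and are not part of Kafka Connect.

```python
from collections import deque

broker_buffer = deque()  # stand-in for the MQTT broker's persisted messages


def device_push(message):
    """Devices push whenever they have data; the broker just stores it."""
    broker_buffer.append(message)


def connector_poll(max_batch):
    """The connector pulls at most max_batch messages per poll."""
    batch = []
    while broker_buffer and len(batch) < max_batch:
        batch.append(broker_buffer.popleft())
    return batch


# A burst of 10 readings arrives faster than the connector polls.
for i in range(10):
    device_push(f"reading-{i}")

polls = []
while broker_buffer:
    polls.append(connector_poll(max_batch=4))

print([len(batch) for batch in polls])  # → [4, 4, 2]
```

The burst is absorbed by the buffer and drained in bounded batches, which is the behavior the broker provides between bursty devices and the connector.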
The MQTT connectors are independent of a specific MQTT broker implementation. I have seen several projects start with Mosquitto and then move towards a reliable, scalable broker like HiveMQ during the transition from a pilot project to pre-production.
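As a rough sketch, registering the MQTT source connector with a Kafka Connect worker means posting a JSON configuration to the worker's REST API (POST /connectors). The property names below follow the Confluent MQTT source connector as I understand it, but every value (connector name, broker URI, topics, hosts) is an assumption for illustration; check the connector documentation for your version before relying on it.

```python
import json

# All values below are illustrative assumptions, not a tested deployment.
connector = {
    "name": "mqtt-source-demo",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
        "tasks.max": "1",
        "mqtt.server.uri": "tcp://mqtt-broker:1883",  # Mosquitto, HiveMQ, ...
        "mqtt.topics": "sensors/temperature",         # MQTT topic to subscribe to
        "kafka.topic": "mqtt.temperature",            # Kafka topic to write to
        # MQTT payloads are opaque bytes, so pass them through unchanged.
        "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
        "confluent.topic.bootstrap.servers": "kafka:9092",
        "confluent.topic.replication.factor": "1",
    },
}

# This JSON document is what would be sent to POST http://<connect-host>:8083/connectors
payload = json.dumps(connector, indent=2)
print(payload)
```

Only the mqtt.server.uri value refers to the broker, which is why moving from Mosquitto to another broker such as HiveMQ does not require a different connector.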
How does Kafka help us in this scenario?
Apache Kafka is an event streaming platform that combines messaging, storage, and processing of data to build highly scalable, reliable, secure, and real-time infrastructure. Those who use Kafka often use Kafka Connect as well to enable integration with any source or sink. Kafka Streams is also useful because it allows continuous stream processing. From an IoT perspective, Kafka presents the following tradeoffs:
Pros
Stream processing, not just queuing
High throughput
Large scale
High availability
Long-term storage and buffering
Reprocessing of events
Good integration with the rest of the enterprise
Hybrid, multi-cloud, and global deployments
Cons
Not built for tens of thousands of connections
Requires a stable network and solid infrastructure
Lacks IoT-specific features like Keep-Alive and Last Will and Testament
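To show what the last point refers to, the following sketch models the two MQTT features in plain Python. In MQTT, a client can register a Last Will message when it connects, and the broker treats the client as gone if it hears nothing for one and a half times the keep-alive interval, at which point it publishes the will on the client's behalf. The class and field names here are hypothetical, and time is passed in explicitly to keep the example deterministic; a real broker handles this internally.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    client_id: str
    keep_alive_s: int     # keep-alive interval announced by the client
    will_topic: str       # where the Last Will is published
    will_payload: str     # what the Last Will says
    last_seen_s: float = 0.0


@dataclass
class ToyBroker:
    sessions: dict = field(default_factory=dict)
    published: list = field(default_factory=list)

    def connect(self, session, now_s):
        session.last_seen_s = now_s
        self.sessions[session.client_id] = session

    def packet_received(self, client_id, now_s):
        """Any packet from the client (including a ping) resets the timer."""
        self.sessions[client_id].last_seen_s = now_s

    def check_timeouts(self, now_s):
        """Drop silent clients and publish their Last Will."""
        for client_id, s in list(self.sessions.items()):
            if now_s - s.last_seen_s > 1.5 * s.keep_alive_s:
                self.published.append((s.will_topic, s.will_payload))
                del self.sessions[client_id]


broker = ToyBroker()
broker.connect(Session("pump-7", 60, "status/pump-7", "offline"), now_s=0)
broker.packet_received("pump-7", now_s=50)   # ping arrives in time
broker.check_timeouts(now_s=100)             # 50 s of silence: still connected
broker.check_timeouts(now_s=141)             # 91 s of silence: over the 90 s limit
print(broker.published)  # → [('status/pump-7', 'offline')]
```

Kafka has no equivalent per-device session tracking, so detecting that a device on an unreliable network has dropped off is left to the MQTT side.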
Since Kafka was not built for IoT communication at the edge, the combination of Apache Kafka and MQTT together is a match made in heaven for building scalable, reliable and secure IoT infrastructures.
With Kafka as the streaming platform, Kafka Connect and the MQTT connector link the devices to Kafka so that their data can flow into it.
The Use Cases
Kafka with MQTT is already used in many IoT deployments, both in Consumer IoT and Industrial IoT (IIoT). Most scenarios require a reliable, scalable, and secure end-to-end integration that enables bidirectional communication and data processing in real-time. Some specific use cases are:
Connected car infrastructure: Cars communicate with each other and the remote data center or cloud to perform real-time traffic recommendations, predictive maintenance, or personalized services.
Example: Audi
Smart cities and smart homes: Buildings, traffic lights, parking lots, and many other things are connected to each other in order to enable greater efficiency and provide a more comfortable lifestyle. Energy providers connect houses to buy or sell their own solar energy and provide additional digital services.
Example: E.ON
Smart retail and customer 360: Real-time integration between mobile apps of customers and backend services like CRMs, loyalty systems, geolocation, and weather information creates a context-specific customer view and allows for better cross-selling, promotions, and other customer-facing services.
Example: Target
Intelligent manufacturing: Industrial companies integrate machines and robots to optimize their business processes and reduce costs, such as scrapping parts early or predictive maintenance to replace machine parts before they break. Digital services and subscriptions are provided to customers instead of just selling them products.
Example: Severstal
Regardless of the industry, machine learning plays a major role in many of these use cases and helps deliver better business results.
Is this something you are looking for? We at Kimshuka Technologies have extensive experience implementing real-time streaming and analytics for IoT devices. Feel free to contact us.