
Ingesting IoT Sensor Data Into S3 With an RPI3

StreamSets Data Collector Edge is a lightweight agent used to create end-to-end data flow pipelines. We'll use it to help stream data collected from a sensor.


As the volume of data produced by external source systems grows, enterprises face difficulties in reading, collecting, and ingesting that data into a central database system. An edge pipeline runs on an edge device with limited resources, receives data from another pipeline or reads data from the device itself, and can control the device based on that data.
StreamSets Data Collector (SDC) Edge is an ultra-lightweight agent used to create end-to-end data flow pipelines in StreamSets Data Collector and to run those pipelines to read and export data in and out of systems. In this blog, StreamSets Data Collector Edge is used to read data from an air pressure sensor (BMP180) attached to an IoT device (Raspberry Pi3), while StreamSets Data Collector is used to load the data into Amazon Simple Storage Service (S3) via MQTT.

Prerequisites

  • Install StreamSets
  • Raspberry Pi3
  • BMP180 Sensor
  • Amazon S3 Storage

Use Case

  • Read air pressure sensor data with an IoT device (Raspberry Pi3) and send the data via MQTT
  • Use SDC to load the data into Amazon S3 via MQTT
Synopsis:
  • Connect the BMP180 temperature/pressure sensor to your Raspberry Pi3
  • Create an edge sending pipeline
  • Create a data collector receiving pipeline

Flow diagram:



Connecting BMP180 Temperature/Pressure Sensor With a Raspberry Pi3

The I2C bus, a communication protocol, is used by the Raspberry Pi3 to communicate with embedded IoT devices such as temperature sensors, displays, and accelerometers. The I2C bus has two wires, called SCL and SDA: SCL is a clock line that synchronizes all data transfers over the bus, and SDA is a data line. Devices are connected to the I2C bus via the SCL and SDA lines.
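As a concrete illustration of register reads over SCL/SDA, the short Python sketch below (using the python-smbus package installed in the steps that follow) asks the BMP180 for its chip ID. The bus number (1) and the sensor address (0x77) match the setup in this post, but treat them as assumptions to verify on your own device; this is a quick sanity check, not part of the SDC Edge pipeline.

import smbus

bus = smbus.SMBus(1)                      # I2C bus 1 on the Raspberry Pi3
chip_id = bus.read_byte_data(0x77, 0xD0)  # 0xD0 is the BMP180 chip-ID register
print(hex(chip_id))                       # a working BMP180 reports 0x55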
To enable I2C drivers on the Raspberry Pi3, perform the following:
  • Run sudo raspi-config.
  • Choose Interfacing Options from the menu as shown in the following image:
Note: If I2C is not available in the Interfacing Options, check Advanced Options for I2C availability.
  • Click Yes to enable the I2C driver.
  • Click Yes again to load the driver by default.
  • Add the i2c-bcm2708 and i2c-dev modules to /etc/modules using the following command:
pi@raspberrypi:~$ sudo nano /etc/modules
i2c-bcm2708
i2c-dev

  • Install i2c-tools using the following command:
pi@raspberrypi:~$ sudo apt-get install python-smbus i2c-tools

  • Reboot the Raspberry Pi3 by using the following command:
sudo reboot

  • Ensure that the I2C modules are loaded and active using the following command:
pi@raspberrypi:~$ lsmod | grep i2c

  • Connect the Raspberry Pi3 with the BMP180 temperature/pressure sensor as shown in the diagram below:


  • Ensure that the hardware and software are working correctly with i2cdetect using the following command (the sensor should appear at address 77 in the output grid):
pi@raspberrypi:~$ sudo i2cdetect -y 1
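If address 77 shows up, the sensor can be read directly. The following is a hedged Python sketch of the BMP180 datasheet's temperature readout, useful for verifying the wiring before involving SDC Edge; it is not how the Sensor Reader stage works internally, and the bus number and address are the same assumptions as above.

import time
import smbus

ADDR = 0x77                     # BMP180 I2C address used in this post
bus = smbus.SMBus(1)            # I2C bus 1 on the Raspberry Pi3

def read_u16(reg):
    # BMP180 registers are 16-bit big-endian
    return (bus.read_byte_data(ADDR, reg) << 8) | bus.read_byte_data(ADDR, reg + 1)

def read_s16(reg):
    val = read_u16(reg)
    return val - 65536 if val > 32767 else val

# Calibration coefficients needed for temperature (datasheet register map)
AC5 = read_u16(0xB2)
AC6 = read_u16(0xB4)
MC = read_s16(0xBC)
MD = read_s16(0xBE)

# Trigger a temperature measurement and read the uncompensated value
bus.write_byte_data(ADDR, 0xF4, 0x2E)
time.sleep(0.005)               # conversion takes about 4.5 ms
ut = read_u16(0xF6)

# Datasheet compensation formula; the result is in units of 0.1 deg C
x1 = ((ut - AC6) * AC5) >> 15
x2 = (MC << 11) // (x1 + MD)
b5 = x1 + x2
temp_c = ((b5 + 8) >> 4) / 10.0
print("Temperature: %.1f C" % temp_c)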

Building Edge Sending Pipeline

To build an edge sending pipeline to read the sensor data, perform the following:
  • Create an SDC Edge sending pipeline in StreamSets Data Collector.
  • Read the data directly from the device (via its I2C address) using the "Sensor Reader" origin.
  • Set the I2C address to "0x77".
  • Use an Expression Evaluator to convert the temperature from Celsius to Fahrenheit (a sample expression follows this list).
  • Publish the data to the MQTT topic "bmp_sensor/data".
  • Download the pipeline in SDC Edge executable format (Linux) and move it to the edge device (Raspberry Pi3), where the pipeline runs.
  • Start SDC Edge from the SDC Edge home directory on the edge device using the following command:
bin/edge --start=<pipeline_id>

For example:
bin/edge --start=sendingpipeline137e204d-1970-48a3-b449-d28e68e5220e
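For reference, the Expression Evaluator step above might use a StreamSets expression like the one below. The field path /temperature is an assumption for illustration; match it to the field your Sensor Reader actually emits:

${record:value('/temperature') * 9 / 5 + 32}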


Building Data Collector Receiving Pipeline

To build a data collector receiving pipeline for storing the received data in Amazon S3, perform the following:
  • Create a receiving pipeline in StreamSets Data Collector.
  • Use the MQTT Subscriber origin to consume data from the MQTT topic (bmp_sensor/data).
  • Use the Amazon S3 destination to load the data into Amazon S3.
  • Run the receiving pipeline in StreamSets Data Collector.
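Before pointing the pipeline at S3, it can help to confirm that records are actually arriving on the topic. Here is a minimal verification sketch using the paho-mqtt Python client; it is separate from the SDC pipeline, and the broker host and port (localhost:1883) are assumptions to adjust for your environment.

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Print each record published by the edge pipeline
    print("%s: %s" % (msg.topic, msg.payload.decode()))

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)       # assumed broker host/port
client.subscribe("bmp_sensor/data")     # topic used by the sending pipeline
client.loop_forever()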



The real-time air pressure data collected and stored in S3 is shown in the image below.



Making the Most of the AWS IoT Surge

AWS has not been neglecting the IoT space. See what was announced at AWS re:Invent and how you can prepare your solutions to make use of the surge of new tools.


As noted by the eminent Fredric Paul, many of AWS' recent announcements at re:Invent aim to join the power of the public cloud with expanded capabilities at the edge and tools for managing huge, fast-growing fleets of connected devices.
While AWS CEO Andy Jassy announced most of the new services during his keynote, AWS IoT VP Dirk Didascalou shared more details in his IoT State of the Union presentation. ZDNet’s Stephanie Condon had previously noted Jassy’s emphasis on the exponential growth in the number of devices, and Didascalou went deeper into how each new service would play a role in both monitoring and controlling the coming digital tsunami.


Slide from re:Invent 2017 “IoT State of the Union”
AWS 1-Click creates an extremely simple path for associating devices with pre-built or custom AWS Lambda functions. We’ve been working with this team for a while to expand their Enterprise IoT Button program, and this enables more seamless integration with our ZipLine Enterprise IoT Button Manager application. With the announcement of a 3rd generation AWS IoT Button along with AT&T’s LTE-M Button, we look forward to deploying ZipLine across an even wider range of enterprise use cases.


Slide from re:Invent 2017 “Introducing AWS IoT 1-Click”
AWS IoT Device Management is designed to onboard, organize, monitor and remotely manage devices at scale. This is going to make things a *lot* easier for organizations launching AWS IoT solutions to stay in control of their devices in the field.
AWS IoT Device Defender will help ensure fleets of devices are secure with continuous auditing, real-time detection and alerts, and fast mitigation of suspected attacks or unexpected behavior (whether spawned by malicious intent or innocent administrative errors). Though announced publicly at re:Invent, this is the youngest service in the AWS IoT family and we haven’t yet had a chance to evaluate it. Stay tuned for recommendations and best practices.
We’re excited about AWS Greengrass ML Inference, which enables teams to build and train Machine Learning models in the cloud and deploy them at the edge where they can run locally to provide predictive logic without requiring connectivity to the cloud. As an original Greengrass launch partner, we see this as a big move for AWS at the edge that we’ll be putting to good use for some of our largest customers.
AWS IoT Analytics simplifies the challenging tasks of filtering, transforming, and enriching time-series IoT data. In order to produce meaningful insights and deliver reliable predictive maintenance, your IoT solution must join together data from external sources and internal systems, such as environmental conditions and customer details from your CRM, with incoming data from your machines. AWS IoT Analytics is designed to automate more of this process (for a quick walkthrough, see this post from AWS Tech Evangelist Tara Walker).

Amazon CTO Werner Vogels summed up the value of AWS IoT and other services to enterprises nicely by stating “The quality of the data you have will be the differentiator,” while Larry Dignan took all of this in and declared “The days of Amazon Web Services as [just] an infrastructure provider are over.”
Following up on the re:Invent announcements, AWS Senior Solutions Architect Mahendra Bairagi has put together Essential Capabilities of an IoT Cloud Platform for another look at these new services in context.
Throughout the conference, Jassy called out how quickly IoT is being adopted across a wide variety of industries, and Daniel Bryant illustrated in his recap on InfoQ just how focused AWS has become on helping enterprises take advantage of this new trend.
