Why IoT is Hard — 8 Reasons
What I discovered after working 8 years in the IoT market
IoT architectures are inherently hard to design and implement. There are many reasons, and some of them are not obvious.
In the last 8 years working in IoT, I spent a lot of time moving tools and devices from offline to online. Doing it right is extremely hard, and even well-established companies struggle to get it right. There is one big mistake anyone can make: thinking there is one solution that fits all use cases. IoT is not a technology; it is a business problem.
Hardware will always be there in IoT. It can be grouped as follows:
- Microcontrollers, which usually gather data from sensors directly connected to them and can also control actuators.
- Nodes, which can be a little more capable than microcontrollers but are still very limited. They usually have some sort of extra connectivity (BLE, LoRaWAN, etc.).
- Gateways, which collect data from smaller and less powerful devices and can do some serious computation at the edge, also known as fog or edge computing.
- Protocol servers, which are where most of the processed data ends up. They can have some extra logic to either send data back to the field or process the data again and store it in a database.
They can be virtualized or simulated in some way, but all those components end up running on some kind of hardware.
Each group of hardware has specific challenges to solve. To mention just a few: microcontrollers with only 2048 bytes of memory; CPU architectures at 8, 16, 32, or 64 bits; bare-metal software, an embedded OS (FreeRTOS, Mbed OS, etc.) for single processors, or real-time Linux if you have some more CPU and memory; interactions with external interfaces like SPI, I2C, UART, USB, and CAN bus. Microcontrollers are often programmed in C/C++, but MicroPython, TinyGo, and JerryScript are on the rise too.
This is just for the microcontrollers. The more powerful the hardware, the more options there are on the market: you can have a GPU doing some ML at the edge in a gateway programmed with CUDA, Node.js sending data to a third-party service, or a local MQTT broker.
There is no standard, so there cannot be a single solution that works well for every use case.
The radio protocols
Now let’s talk about the most used radio protocols. The most popular are WiFi, BLE, LoRaWAN, Sigfox, NB-IoT, Cat-M, and LTE. Do you think they have anything in common? I assure you, they have very little in common. Some can send just a few bytes every few seconds; some can use TCP, some UDP only; some support IP, others do not; some need an intermediate gateway, some need a global network. And the network is never really global (read: coverage issues).
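To give an idea of what "just a few bytes" means in practice, here is a minimal sketch of packing one reading into a 5-byte uplink. The field layout, the scaling factors, and the function names are my own assumptions for illustration; they do not come from any specific radio standard.

```python
import struct

# Hypothetical 5-byte payload layout (an assumption, not a standard):
#   int16  temperature in 0.01 degC steps
#   uint8  relative humidity in 0.5 % steps
#   uint16 battery voltage in millivolts
# ">" selects big-endian byte order with no padding between fields.
PAYLOAD_FORMAT = ">hBH"


def encode_reading(temperature_c, humidity_pct, battery_mv):
    """Pack one reading into a compact binary payload."""
    return struct.pack(
        PAYLOAD_FORMAT,
        round(temperature_c * 100),
        round(humidity_pct * 2),
        battery_mv,
    )


def decode_reading(payload):
    """Reverse encode_reading, for example on the gateway or the backend."""
    temperature_raw, humidity_raw, battery_mv = struct.unpack(PAYLOAD_FORMAT, payload)
    return temperature_raw / 100, humidity_raw / 2, battery_mv
```

The same reading as JSON text would easily take ten times as many bytes, which is why constrained links tend to push you toward a binary layout that both ends must agree on.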
With some radio protocols like BLE or Zigbee, you need a gateway in the middle. It has to speak at least two protocols: one to exchange data with the nodes, and another to send the data to the cloud service of choice.
Shit always happens: take a look at what caused Facebook employees to be locked out of their offices.
Each sensor measures different things, and you can have multiple sensors on the same node. Do you think there is a widely used protocol defining what a sensor is measuring (temperature, humidity, etc.) and in what unit of measurement? If your answer is yes, you are wrong. Each sensor uses its own unit. There are multiple ways to measure pressure, and did you know that some wind sensors do not give you a speed value in m/s but something like rotations per second, where the rotations are those of their mechanical parts? You will have to do the conversion, either in the device code, in the backend, or in the presentation layer (see the Dashboard section below).
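A minimal sketch of that conversion step could look like the following. The pressure factors are the usual unit definitions; the anemometer calibration factor (metres of wind travel per rotation) is a hypothetical parameter that you would have to take from the sensor's datasheet, and a real sensor may not even be linear.

```python
# Conversion factors to pascal for a few common pressure units.
PRESSURE_TO_PASCAL = {
    "Pa": 1.0,
    "hPa": 100.0,
    "bar": 100_000.0,
    "psi": 6_894.757,
}


def pressure_to_pascal(value, unit):
    """Normalize a pressure reading to pascal; raises KeyError for unknown units."""
    return value * PRESSURE_TO_PASCAL[unit]


def wind_speed_m_s(rotations_per_second, metres_per_rotation):
    """Convert anemometer rotations to m/s, assuming a linear calibration.

    metres_per_rotation is a hypothetical per-sensor calibration value.
    """
    return rotations_per_second * metres_per_rotation
```

For example, `wind_speed_m_s(2.0, 1.5)` gives 3.0 m/s for a sensor whose datasheet (hypothetically) states 1.5 m of wind travel per rotation. Wherever you put this logic, every consumer of the data has to know which unit it is getting.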
Some nodes are so simple and tiny that you have to run some logic in nearby gateways. Gateways usually have two types of connection: one to the nodes, and one to the Internet or to some more capable server or cloud service. So a gateway needs the same radio connection as the nodes (or a physical connection) plus another connection to the IoT system, and the two can conflict: you could have WiFi for the nodes and WiFi for the Internet connection. If you are lucky, you can get a decent gateway for a good price. Industrial gateways usually do not store anything in their own flash, so you either need external storage or you send everything to the Internet using whatever protocol your IoT server supports. That means one or more protocol conversions, yet another moving part in the engine.
Most gateways use ARM, but many applications still do not work well on ARM and prefer x86. So now you have another issue: you need an ARM version of your app, but you have to test it in a cloud that maybe does not support ARM (I know AWS now supports ARM instances, but that is quite recent). You could therefore see different behaviour between staging and production (Node.js has had quite a lot of issues running properly on ARM in the last few years).
Over-the-air (OTA) firmware updates are one of the most challenging parts. Updating your firmware over the air is simple unless… you do not have enough RAM or flash to store the new firmware while running the old one; you cannot support TLS because of RAM size or the lack of a random number generator; or your TLS keys simply do not fit in the secure element of your chip. Then you eventually have to rewrite the entire TLS stack to fit in a few KB, remove all the unused cipher suites, and hope the remaining ones will not become insecure at some point in the next 10 years.
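As a rough illustration of the first constraint, here is a sketch of the two checks a device could make before swapping firmware: does the image fit in the spare slot, and does it match the expected digest? The function names and the idea of a fixed-size spare slot are assumptions for illustration. Note that a plain SHA-256 digest only detects corruption; a real update should also verify a signature, which is exactly where the TLS and key storage problems above come back.

```python
import hashlib
import hmac


def can_stage_update(image_size, spare_slot_size):
    """True if the new image fits in the spare slot next to the running firmware."""
    return 0 < image_size <= spare_slot_size


def image_matches_digest(image, expected_sha256_hex):
    """Check the downloaded image against the digest announced by the server.

    This detects corruption only; it does not prove who built the image.
    """
    actual = hashlib.sha256(image).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def ready_to_swap(image, expected_sha256_hex, spare_slot_size):
    """Both checks must pass before the bootloader is told to switch slots."""
    return can_stage_update(len(image), spare_slot_size) and image_matches_digest(
        image, expected_sha256_hex
    )
```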
Once you have solved this, you still have to deal with concurrency (running your code while the update starts) in a single-chip environment with interrupts, sometimes without even an OS to handle it. You could have a radio chip that supports only some protocols, or be in a network that allows only HTTP and not UDP, while your network may not be as reliable as a TCP-based protocol requires. So you usually need to implement a retry strategy, again writing your own logic under very tight constraints.
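One common way to build such a retry strategy is exponential backoff with jitter. The sketch below is only one possible shape, not a recommendation for any specific stack: the function name, the defaults, and the choice to retry only on `OSError` (which covers Python's connection and timeout errors) are my assumptions. On a real microcontroller you would write the same logic in C or MicroPython with far less memory to play with.

```python
import random
import time


def retry_with_backoff(
    operation,
    max_attempts=5,
    base_delay_s=1.0,
    max_delay_s=60.0,
    sleep=time.sleep,
    rng=random.random,
):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait ceiling doubles after every failure and is capped at max_delay_s.
    The actual wait is a random fraction of that ceiling ("full jitter"), so
    that many devices do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            ceiling = min(max_delay_s, base_delay_s * 2**attempt)
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters are there only so the logic can be tested without actually waiting.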
Representing and analyzing data is as important as storing and retrieving it. There are companies fully dedicated to this task alone, like Tableau or Datadog. So imagine creating your own system to visualize data and, more importantly, doing that on multiple devices (desktops, mobile phones, tablets) while also giving different people different permissions on their dashboards. You do not want everyone to be able to switch on an industrial irrigation system remotely without different levels of permission. Some users will only look at the dashboard, others want to be notified if something goes wrong (a value over a threshold or differing from the usual behavior), and others will have full read/write access to each sensor and actuator.
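The three kinds of users described above can be sketched as a tiny role table. The role names and the permission set are hypothetical; a real system would also need per-device or per-location scoping, which this sketch deliberately leaves out.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW = auto()     # look at the dashboard
    NOTIFY = auto()   # receive alerts on thresholds or anomalies
    CONTROL = auto()  # write to actuators, e.g. start the irrigation


# Hypothetical roles matching the three kinds of users in the text.
ROLES = {
    "viewer": Permission.VIEW,
    "operator": Permission.VIEW | Permission.NOTIFY,
    "admin": Permission.VIEW | Permission.NOTIFY | Permission.CONTROL,
}


def is_allowed(role, needed):
    """True only if the role exists and includes the needed permission."""
    granted = ROLES.get(role)
    return granted is not None and needed in granted
```

With this table, `is_allowed("operator", Permission.CONTROL)` is false, so an operator gets alerts but cannot trigger the irrigation system, and an unknown role is denied everything by default.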
Dashboarding is even more complex for systems with a ton of different elements, such as:
- Multiple geographical locations for the devices.
- Multiple gateways gathering data from different sensors in the field; you need a way to locate and identify them all on a dashboard.
- Multiple sensors and actuators, which you usually need to aggregate per location. Imagine an IoT parking system with multiple locations: you need to group the parking slots within each location, see an overview of the free ones for each lot and level, and maybe calculate across all locations how much money you are making by renting them. It is complex.
- Multiple widgets: every sensor or actuator could have one or more widgets associated with it, so users usually want to group them and reuse a group of widgets as a template.
- Templates: not having templates is a huge pain for users. They do not want to recreate a dashboard every time they add a new set of hardware in the field; they just want it to appear in their dashboard automatically. This is closely related to the auto-provisioning of your hardware.
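To make the parking example a bit more concrete, here is a minimal sketch of the aggregation a dashboard would need for the "free slots per lot and level" overview. The `Slot` fields and the function name are assumptions for illustration; real data would come from the sensors through your backend.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    location: str   # which parking lot the slot belongs to
    level: int      # floor inside that lot
    occupied: bool  # latest state reported by the slot's sensor


def free_slots_by_level(slots):
    """Count free slots per (location, level) pair."""
    counts = Counter()
    for slot in slots:
        if not slot.occupied:
            counts[(slot.location, slot.level)] += 1
    return dict(counts)
```

Even this toy version already assumes that every sensor is correctly mapped to a location and a level, which is the part that tends to go wrong when hardware is added in the field.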
Most of the time it is very easy to physically access an IoT device, because it is deployed in the field, and there is very little you can do to protect it other than choosing the right components (usually expensive). These components are as useful as they are hard to program, or at least limited to very professional users (like ARM TrustZone), so most small companies do not implement even the simplest security measures. Secure boot has been fairly standard for a few decades already, but many vendors still do not implement it in finished products. This is just as risky as it sounds. In addition, many popular platforms do not even have a chip to properly implement TLS or any other encryption scheme that requires a source of random data. And you know what? Your algorithm is going to be as weak as its predictable source of pseudo-random data.
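The last point is easy to demonstrate. In the sketch below, a "key" is derived from a general-purpose PRNG seeded with a small number (think of a device seeding from its uptime in seconds, which is a hypothetical scenario I am assuming here). An attacker who can guess the range of seeds simply tries them all. The function names are mine; `random` and `secrets` are Python's standard modules.

```python
import random
import secrets


def weak_key(seed, length=16):
    """Derive a key from a seeded PRNG: same seed, same key, every time."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def recover_seed(key, candidate_seeds):
    """Brute-force the seed by regenerating keys until one matches."""
    for seed in candidate_seeds:
        if weak_key(seed, len(key)) == key:
            return seed
    return None


def strong_key(length=16):
    """Key from the operating system's CSPRNG, when the platform has one."""
    return secrets.token_bytes(length)
```

Searching ten thousand candidate seeds takes a fraction of a second on a laptop, so the 128 bits of the key are worth about 13 bits of real secrecy. On a chip without a hardware entropy source there is no `strong_key` equivalent to fall back on, and that is the actual problem.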
Another big hole in the security process for a lot of cheap devices is the supply chain. On the physical side, many vendors are involved even in the simplest microcontrollers, sometimes more than a hundred different vendors for each board. It reminds me of one of the most interesting attacks reported in the last 10 years, involving the server and motherboard maker Supermicro.
There are also more subtle supply chain attacks on the software side: reusing code written by someone else that is not properly vetted adds yet another attack surface to your product. Be prepared! You could expose critical systems to a world of attackers.
Is there any solution? Maybe, but it is complex. More complex than you think, and we are all going to make mistakes.
We should think more about the business problem than about the technical solution.
The best advice I can give is to design a device to do only one thing, to be as specific as possible, and to use as few elements as possible. The system will be simpler to design, implement, and support, and will have a minimal attack surface. You could then eventually have more than one device communicating with each other.
For everything else, we could just be lucky!
If you want to know more about me, please take a look at my Twitter account and send me a private message, or just comment on this article.