“If AI is so easy, why isn’t there any in this room?” asks Ali Farhadi, founder and CEO of Xnor, gesturing around the conference room overlooking Lake Union in Seattle. And it’s true: despite a handful of displays, phones and other gadgets, the only things really capable of doing any kind of AI-type work are the phones each of us has set on the table. Yet we are always hearing about how AI is so accessible now, so flexible, so ubiquitous.
And in many cases, even the devices that can do that work aren’t running machine learning techniques themselves, but rather sending data off to the cloud, where the work can be done more efficiently. That’s because the processes that make up “AI” are often resource-intensive, sucking up CPU time and battery power.
That’s the problem Xnor aimed to solve, or at least mitigate, when it spun off from the Allen Institute for Artificial Intelligence in 2017. Its breakthrough was to make the execution of deep learning models on edge devices so efficient that a $5 Raspberry Pi Zero could perform state-of-the-art computer vision processes nearly as well as a supercomputer.
The team achieved that, and Xnor’s hyper-efficient ML models are now integrated into a variety of devices and businesses. As a follow-up, the team set their sights higher — or lower, depending on your perspective.
Answering his own question on the dearth of AI-enabled devices, Farhadi pointed to the battery pack in the demo gadget they made to show off the Pi Zero platform and explained: “This thing right here. Power.”
Power was the bottleneck they overcame to get AI onto CPU- and power-limited devices like phones and the Pi Zero. So the team came up with a crazy goal: Why not make an AI platform that doesn’t need a battery at all? Less than a year later, they’d done it.
That thing right there performs a serious computer vision task in real time: It can detect in a fraction of a second whether and where a person, or car, or bird, or whatever, is in its field of view, and relay that information wirelessly. And it does this using the kind of power usually associated with solar-powered calculators.
The device Farhadi and hardware engineering head Saman Naderiparizi showed me is very simple, and necessarily so. It consists of a tiny camera with 320×240 resolution, an FPGA loaded with the object recognition model, a bit of memory to handle the image and camera software, and a small solar cell. A very simple wireless setup lets it send and receive data at a very modest rate.
“This thing has no power. It’s a two-dollar computer with an uber-crappy camera, and it can run state of the art object recognition,” enthused Farhadi, clearly more than pleased with what the Xnor team has created.
For reference, this video from the company’s debut shows the kind of work it’s doing inside:
As long as the cell is in any kind of significant light, it will power the image processor and object recognition algorithm. It needs about a hundred millivolts coming in to work, though at lower levels it could just snap images less often.
It can run on that input alone, but of course it’s impractical not to have some kind of energy storage; to that end, this demo device has a supercapacitor that stores enough energy to keep it going all night, or whenever its light source is obscured.
As a demonstration of its efficiency, suppose you did decide to equip it with a watch battery. Naderiparizi said it could probably run on that at one frame per second for more than 30 years.
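To get a feel for what that estimate implies, here is a rough back-of-envelope calculation. The battery figures are assumptions, not numbers from Xnor: a common CR2032-style coin cell at a nominal 225 mAh and 3 V, with self-discharge and any losses ignored. It simply divides the stored energy by the number of frames captured in 30 years at one frame per second.

```python
# Back-of-envelope check of the "watch battery for 30 years" estimate.
# The battery figures below are assumptions (a typical CR2032-style coin
# cell), not specifications provided by Xnor.
BATTERY_CAPACITY_MAH = 225.0  # assumed nominal capacity
BATTERY_VOLTAGE_V = 3.0       # assumed nominal voltage
YEARS = 30                    # from the article
FRAMES_PER_SECOND = 1.0       # from the article

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def battery_energy_joules(capacity_mah, voltage_v):
    """Convert a capacity in mAh to stored energy in joules (1 mAh = 3.6 C)."""
    return capacity_mah * 3.6 * voltage_v


def energy_budget_per_frame(capacity_mah, voltage_v, years, fps):
    """Energy available per frame if the battery is spread evenly over the run."""
    total_frames = years * SECONDS_PER_YEAR * fps
    return battery_energy_joules(capacity_mah, voltage_v) / total_frames


energy_j = battery_energy_joules(BATTERY_CAPACITY_MAH, BATTERY_VOLTAGE_V)
per_frame_j = energy_budget_per_frame(
    BATTERY_CAPACITY_MAH, BATTERY_VOLTAGE_V, YEARS, FRAMES_PER_SECOND
)
# At a steady frame rate, average power is energy per frame times frames per second.
avg_power_w = per_frame_j * FRAMES_PER_SECOND

print(f"Stored energy: {energy_j:.0f} J")
print(f"Budget per frame: {per_frame_j * 1e6:.2f} microjoules")
print(f"Average power: {avg_power_w * 1e6:.2f} microwatts")
```

Under those assumptions the whole system, camera and radio included, would have to average only a few microjoules per frame, or a few microwatts of continuous draw. A real coin cell also loses charge on its own over decades, so this is an illustration of scale rather than a lifetime prediction.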
Not a product
Of course the breakthrough isn’t really that there’s now a solar-powered smart camera. That could be useful, sure, but it’s not really what’s worth crowing about here. It’s the fact that a sophisticated deep learning model can run on a computer that costs pennies and uses less power than your phone does when it’s asleep.
“This isn’t a product,” Farhadi said of the tiny hardware platform. “It’s an enabler.”
The energy necessary for performing inference processes such as facial recognition, natural language processing and so on puts hard limits on what can be done with them. A smart light bulb that turns on when you ask it to isn’t really a smart light bulb. It’s a board in a light bulb enclosure that relays your voice to a hub and probably a data center somewhere, which analyzes what you say and returns a result, turning the light on.
That’s not only convoluted, but it introduces latency and a whole spectrum of places where the process could break or be attacked. And meanwhile it requires a constant source of power or a battery!
On the other hand, imagine a camera you stick into a house plant’s pot, or stick to a wall, or set on top of the bookcase, or anything. This camera requires no more power than some light shining on it; it can recognize voice commands and analyze imagery without touching the cloud at all; it can’t really be hacked because it barely has an input at all; and its components cost maybe $10.
Only one of these things can be truly ubiquitous. Only the latter can scale to billions of devices without requiring immense investment in infrastructure.
And honestly, the latter sounds like a better bet for a ton of applications where there’s a question of privacy or latency. Would you rather have a baby monitor that streams its images to a cloud server where it’s monitored for movement? Or a baby monitor that absent an internet connection can still tell you if the kid is up and about? If they both work pretty well, the latter seems like the obvious choice. And that’s the case for numerous consumer applications.
Amazingly, the power cost of the platform isn’t anywhere near bottoming out. The FPGA used to do the computing on this demo unit isn’t particularly efficient for the processing power it provides. If they had a custom chip baked in, they could get another order of magnitude or two out of it, lowering the work cost for inference to the level of microjoules. The size is more limited by the optics of the camera and the size of the antenna, which must have certain dimensions to transmit and receive radio signals.
And again, this isn’t about selling a million of these particular little widgets. As Xnor has done already with its clients, the platform and software that runs on it can be customized for individual projects or hardware. One even wanted a model to run on MIPS — so now it does.
By drastically lowering the power and space required to run a self-contained inference engine, entirely new product categories can be created. Will they be creepy? Probably. But at least they won’t have to phone home.