AMRs Way Back in 2020
Rodney Brooks
The “paper” trail of academia
I am now working at the sixth startup that I have co-founded, and I have been working at one or more of them for all but one six-month period over the last 41 years. But I have also spent much of that time in academia, and there and in subsequent writings I have created a paper trail of my thoughts, ideas, and evaluations of the then current state of both science and technology.
Of course, ideas change in academia as we dig deeper, but this paper trail usually has better stewardship, if only by the individual academic, than any trail about what entrepreneurs have said about when their company will deliver what (e.g., check out 10+ years of promises of “fully self driving” being imminent, within 12 months).
You can see my last eight years of blogging here, and in particular my tracking of predictions about AI, robotics, crewed spaceflight, autonomous vehicles, and others that I made seven years ago and how they have held up. You can also find pointers to 16 articles I have written for the magazine IEEE Spectrum, over the years, as a short blog post there.
What is a robot?
I am particularly proud of a short piece that I wrote for IEEE Spectrum in the early dark days of Covid. I wrote it as a sonnet based on Shakespeare’s own 18th sonnet. This was in response to a request from the editors to give them a definition of a robot. And yes, like us all I was deeply depressed by the lack of in-person human interaction, and so I decided to have some fun in the way I responded.
Shall I compare thee to creatures of God?
Thou art more simple and yet more remote.
You move about, but still today, a clod,
You sense and act but don't see or emote.
You make fast maps with laser light all spread,
Then compare shapes to object libraries,
And quickly plan a path, to move ahead,
Then roll and touch and grasp so clumsily.
You learn just the tiniest little bit,
And start to show some low intelligence,
But we, your makers, Gods not, we admit,
All pledge to quest for genuine sentience.
So long as mortals breathe, or eyes can see,
We shall endeavor to give life to thee.
Yeah, that “libraries”/“clumsily” rhyme is itself quite a bit clumsy, I do admit. But the great thing about this is that it is dated: March 17th, 2020. It has a timestamp. That was what I said a robot was on that day. And that gives me a basis for explaining how we have changed that at Robust.AI in the intervening five years.
What was I talking about?
We had founded Robust.AI nine months previously to build a general purpose software stack for the various mobile robots that were being built by other companies for mostly indoor applications, in homes, hospitals, retail stores, construction inside the shell of a building, factories, and warehouses. By the end of 2020 we had software running on robots from three different companies, but their native software and ours on top all shared the properties I described in my sonnet.
Those robots were all known as AMRs, or Autonomous Mobile Robots. They were more sophisticated than earlier generations of mobile platforms in factories, AGVs (Automated Guided Vehicles).
So now let me go through my sonnet, clearly shaped by Shakespeare’s sonnet of love, two lines at a time.
Shall I compare thee to creatures of God?
Thou art more simple and yet more remote.
I was comparing robots to animals, and saying they come up rather short, in two ways. Their behavior was clearly simpler than that of an animal, and they did not relate to humans in the way a living creature, especially another human, would relate.
You move about, but still today, a clod,
You sense and act but don't see or emote.
I was saying that the robots of 2020 could move, and yes, fairly safely, but not in a very smart way, looking cloddish to a human observer. In the second of these two lines I was distinguishing between sensing things in the world, mostly to avoid hitting them, and actually seeing what the objects were. Deep learning had been around for eight years by this time, but commercial mobile robots did not have enough embedded computer power to run neural models, and so could not label parts of images. Instead they relied on their LIDAR sensors for detecting obstacles. I will talk about LIDARs (LIDAR is an acronym for “Light Detection And Ranging”) for AMRs in the next section: how they work, when in time they became practical, and why all AMRs have them.
If there were cameras on AMRs at the time, they were mostly used as a video source for remote operators, or for very specialized vision that had to run much more slowly than the algorithms based on the LIDARs.
You make fast maps with laser light all spread,
Then compare shapes to object libraries,
The first of these lines describes a key capability of the AMRs of 2020, which was very different from what was possible twenty years before. They were able to build their own 2D geometric maps of an environment as they wandered around in a new space. This used a technique called SLAM (Simultaneous Localization And Mapping) based on data from their LIDARs; I’ll talk about this technique in a later section. This is the hallmark of what we now call third generation AMRs. The second line refers to how the map is represented as a collection of straight lines in a plane. Collections of these straight lines, with consistent sizes, relative positions, and angles, form a library of object fragments; by matching against them the robot can figure out both its coordinates and its orientation in the map.
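To make that matching idea a little more concrete, here is a minimal sketch, not any particular product’s algorithm, of scoring candidate poses by how well a LIDAR scan, transformed by each pose, lands on the mapped wall segments; the map, scan, and candidate poses are made-up toy values:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]   # a straight-line wall fragment in the map

def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance from point p to line segment seg."""
    (x, y), ((x1, y1), (x2, y2)) = p, seg
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def match_score(scan: List[Point], walls: List[Segment],
                pose: Tuple[float, float, float]) -> float:
    """How well a candidate pose (x, y, heading) explains a scan: each scan point,
    given in the robot frame, is transformed into the map frame and scored by its
    distance to the nearest wall segment (smaller total is better)."""
    px, py, theta = pose
    total = 0.0
    for sx, sy in scan:
        mx = px + sx * math.cos(theta) - sy * math.sin(theta)
        my = py + sx * math.sin(theta) + sy * math.cos(theta)
        total += min(point_segment_distance((mx, my), w) for w in walls)
    return total

def localize(scan, walls, candidate_poses):
    """Pick the candidate pose that best aligns the scan with the map."""
    return min(candidate_poses, key=lambda pose: match_score(scan, walls, pose))

# Toy example: one wall along the x axis, and a scan that sees it 1 m straight ahead.
walls = [((0.0, 0.0), (5.0, 0.0))]
scan = [(1.0, 0.0)]
print(localize(scan, walls, [(0.0, 1.0, -math.pi / 2), (0.0, 2.0, -math.pi / 2)]))
```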
And quickly plan a path, to move ahead,
Then roll and touch and grasp so clumsily.
With their geometric maps third generation AMRs were able to plan paths for themselves in spaces spanning thousands of square meters, and it could happen at a reasonable speed (the geometric planners I had worked on as a post-doc in the early eighties took minutes to plan on what was then a $100,000 machine; small computers had gotten a lot faster in 40 years). But where the AMRs performed poorly, or “roll[ed] clumsily”, was when the world had changed locally from what they had mapped: something new left in the middle of an aisle, or a person nearby. And I was being generous in suggesting that the AMRs could touch or grasp at all; the most they could do was use special purpose mechanical systems to pull or push a bin of a particular shape on or off themselves at a docking station.
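Path planners on real AMRs are considerably more elaborate, handling the robot’s footprint, kinematics, travel costs, and continuous replanning, but the core of finding a path across a 2D map can be sketched in a few lines; the grid and coordinates here are purely illustrative:

```python
from collections import deque
from typing import List, Optional, Tuple

def plan_path(grid: List[List[int]], start: Tuple[int, int],
              goal: Tuple[int, int]) -> Optional[List[Tuple[int, int]]]:
    """Breadth-first search over a 2D occupancy grid (0 = free, 1 = occupied).
    Returns a list of (row, col) cells from start to goal, or None if blocked."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:          # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None

# A tiny toy map with a blocked region; real maps cover thousands of square
# meters at a resolution of a few centimeters per cell.
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(plan_path(grid, (0, 0), (2, 3)))
```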
You learn just the tiniest little bit,
And start to show some low intelligence,
The commercial AMRs of 2020 only learned their geometric map through SLAM. They did not collect data that could be used for supervised or unsupervised deep learning, nor did any, then or now, use any form of reinforcement learning. So their learning capabilities were quite low, and they only showed the tiniest bit of intelligence.
But we, your makers, Gods not, we admit,
All pledge to quest for genuine sentience.
Now I turned to the future, saying that there would be a push towards more sentience, even though those of us developing these robots were not Gods, and did not yet know how to get there. (And I know this may sound a little snarky, but valuations of some startups tend to suggest that some VCs today think that 20-something dropouts are Gods, overgeneralizing from 2 or 3 examples out of thousands.)
So long as mortals breathe, or eyes can see,
We shall endeavor to give life to thee.
And this is a paean to the last two lines of Shakespeare’s sonnet, while admitting that all people who have worked on AI since 1956 have always yearned to produce AGI (the G snuck in there as a marketing tool for those who felt left behind), inspired by what living systems, i.e., we humans, can do.
LIDARs, how they work and why we have them
LIDAR works by shooting a laser beam in a certain direction and measuring the time between when light is emitted and when it returns, if at all. The time for it to return gives an estimate of the distance to the nearest reflecting thing in that particular direction.
The speed of light in the atmosphere (which is where all LIDAR based AMRs currently work) is 299,702,547 meters per second. Approximating that as 300 million meters per second, to measure a reflection from something 1.5 meters away from the sensor it needs to time a round trip of about 10 nanoseconds, and if one wants centimeter accuracy it needs about 8 bits of resolution in that measurement. Still today there are no digital circuits that can do those sorts of measurements.
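As a back-of-the-envelope check on those numbers, here is a small worked calculation (a sketch, not any particular sensor’s specification):

```python
# Back-of-the-envelope timing for a direct time-of-flight LIDAR measurement.
C_AIR = 299_702_547.0  # approximate speed of light in air, meters per second

def round_trip_time(distance_m: float) -> float:
    """Time for a laser pulse to reach a target and come back, in seconds."""
    return 2.0 * distance_m / C_AIR

# A target 1.5 m away gives a round trip of roughly 10 nanoseconds.
print(round_trip_time(1.5) * 1e9)    # ~10.0

# Centimeter accuracy means resolving the extra round trip from one more
# centimeter of range, which is on the order of tens of picoseconds.
print(round_trip_time(0.01) * 1e12)  # ~66.7

# Distinguishing ~1 cm steps over a 1.5 m range is about 150 levels, i.e. ~8 bits.
print(1.5 / 0.01)                    # 150 levels
```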
One of the ways around this is to modulate the intensity of the light at a much longer wavelength (a much lower frequency than the light itself), so that the system just needs to measure a phase shift, which determines the distance as long as the total distance traveled by the light is less than the modulation wavelength. Even that was hard for digital circuits until early this century. So LIDAR only came into its own around then.
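Here is a minimal sketch of that phase-shift scheme (often called amplitude-modulated continuous-wave ranging); the 10 MHz modulation frequency is an assumption chosen only to make the numbers easy, not a value from any particular sensor:

```python
import math

C_AIR = 299_702_547.0   # approximate speed of light in air, meters per second
F_MOD = 10_000_000.0    # illustrative 10 MHz intensity modulation (an assumption)

MOD_WAVELENGTH = C_AIR / F_MOD   # ~30 m, so round trips under 30 m are unambiguous

def phase_shift(distance_m: float) -> float:
    """Phase shift (radians) of the returned modulation envelope for a target distance."""
    round_trip = 2.0 * distance_m
    return 2.0 * math.pi * (round_trip % MOD_WAVELENGTH) / MOD_WAVELENGTH

def distance_from_phase(phase_rad: float) -> float:
    """Invert the measurement; only valid while the round trip is under one modulation wavelength."""
    return (phase_rad / (2.0 * math.pi)) * MOD_WAVELENGTH / 2.0

print(distance_from_phase(phase_shift(1.5)))   # recovers ~1.5 m
```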
To control the overall cost of a LIDAR, those used on AMRs measure distance only in a single plane, typically between 20cm and 30cm above the ground. A laser and a receiver are rotated around a vertical axis many times per second. If the laser were held in a constant direction it would be known as a 1D LIDAR. By rotating it about a vertical axis it becomes a 2D LIDAR.
AMRs see the world at a single height, through a one dimensional slit: a circle of range measurements around that axis. For a not too complex or changing world that is good enough to get measurements that can be fed to mapping and localization algorithms.
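To make that concrete, here is a small sketch of how one revolution of such a planar LIDAR is typically turned into (x, y) points in the sensor’s own frame, ready to feed to mapping and localization algorithms; the beam count and ranges below are toy values:

```python
import math
from typing import List, Optional, Tuple

def scan_to_points(ranges_m: List[Optional[float]]) -> List[Tuple[float, float]]:
    """Convert one revolution of a planar LIDAR into (x, y) points in the sensor frame.

    ranges_m[i] is the measured distance for beam i, with beams spread evenly
    over a full 360 degree rotation; None means nothing reflected on that beam.
    """
    points = []
    n = len(ranges_m)
    for i, r in enumerate(ranges_m):
        if r is None:                      # no return for this beam
            continue
        angle = 2.0 * math.pi * i / n      # beam direction for this sample
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# A toy 4-beam scan; real sensors return hundreds of beams per revolution.
print(scan_to_points([1.0, 2.0, None, 1.5]))
```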
For more complex worlds, such as those seen by autonomous cars, the LIDAR needs to go to 3D and splay out a vertical plane of light as it rotates. This gives a two dimensional depth image of the world: the distance to the nearest obstacles in a cylindrical pattern around the rotation axis. These sensors are much more expensive, and for automobiles they need to be augmented with both visual light cameras and traditional radar units.
The 2D LIDARs of AMRs have been safety rated. And the safety regulations for AMRs (there are many standards, and different geographies in the world may require different ones) are today based on these LIDARs. However, in reality, their very limited view of the world, in a single plane 20cm or 30cm above the floor, means there are many situations in hospitals, factories, and warehouses where they are not truly safe. One needs much more sensing to be truly safe.
What is SLAM, and why is it good?
SLAM (Simultaneous Localization And Mapping) is a technique that took decades of research and gradual improvements to start to be practical. In a footnote below I tell the early story of how this was developed, starting 40 years ago.
The idea is that one can let a robot loose in an unknown environment and it builds a map of that environment by wandering around and observing what its sensors tell it. Simultaneously with building the map it is able to recognize where it is in its partial map. This is called localization.
Today we humans use maps both indoors and outdoors. Outdoors the maps are in our phones or on the screens of our cars, and those systems use signals from GPS satellites to localize. Indoors, say in a sports stadium, we see two dimensional maps that are provided for us, and try to localize in our heads against our remembered image of the map as we walk to where we want to go.
A few decades ago we carried around paper maps and were skilled at recognizing in the real world the landmarks indicated on the maps and localizing that way.
The explorers of yore had no maps and had to do SLAM as they climbed over mountain ranges and canoed along rivers in order to eventually get back home and bring with them their incredibly valuable and hard won maps.
When robots can be explorers using SLAM we don’t have to create maps for them by hand. That makes deploying them in real world situations much easier. SLAM reduces the friction of adoption of robots.
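For a flavor of the “simultaneous” part, here is a toy one-dimensional, single-landmark Kalman filter sketch: the robot is never told where the landmark is, yet by jointly estimating its own position and the landmark’s it builds its (tiny) map and localizes in it at the same time. This only illustrates the principle, it is not what any 2020 AMR actually runs, and all the noise values are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# State is [robot position, landmark position], both along a single line.
Q_ODOM = 0.05 ** 2        # made-up odometry noise variance per step
R_MEAS = 0.10 ** 2        # made-up range measurement noise variance

true_robot, true_landmark = 0.0, 7.0

x = np.array([0.0, 0.0])        # estimate: robot at the origin, landmark unknown
P = np.diag([0.0, 1e6])         # we know where we start, not where the landmark is
H = np.array([[-1.0, 1.0]])     # measurement model: z = landmark - robot

for _ in range(20):
    # Prediction: odometry says "moved +1 m"; the true motion is noisier.
    u = 1.0
    true_robot += u + rng.normal(0.0, np.sqrt(Q_ODOM))
    x[0] += u
    P += np.diag([Q_ODOM, 0.0])

    # Update: a noisy measurement of the distance to the landmark.
    z = (true_landmark - true_robot) + rng.normal(0.0, np.sqrt(R_MEAS))
    innovation = z - (H @ x)[0]
    S = (H @ P @ H.T)[0, 0] + R_MEAS
    K = (P @ H.T)[:, 0] / S                  # Kalman gain
    x = x + K * innovation
    P = P - np.outer(K, (H @ P)[0])

print("estimated robot, landmark:", x)
print("true robot, landmark:     ", true_robot, true_landmark)
print("remaining standard deviations:", np.sqrt(np.diag(P)))
```

Because the filter carries the correlation between the two unknowns, each sighting of the landmark both refines the map and pulls the robot’s drifting position estimate back into line, which is the whole point of SLAM.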
AMRs frozen in 2020
Almost all AMRs that are sold today use the LIDAR based techniques I described in 2020, although there are a few sellers of even older generations of AMRs and some AGVs.
Footnote
In 1984 I joined the faculty at MIT and one of my first projects was to try to build a visually guided mobile robot (though I used sonar for detecting moving obstacles as vision processing was too slow to be able to react in the world in real time, and remained so for 20 years).
I had observed others who tried to build maps from observations as a robot moved; they would anchor each observation at coordinates that were the robot’s best guess of how it had moved since the previous observation was anchored. But errors soon built up, as estimates from odometry (wheel motion) are very inaccurate. I decided to place new observations, with an uncertainty in both orientation and x and y coordinates, in a chain where the uncertainty magnitudes grew at each step. Then I realized that when the robot was sure that it was seeing something it had seen before it could reduce its current uncertainty, and propagate a lesser reduction to the previous observation, and so on all the way back along the loop. You can see it happening in 2D with a simple 2D LIDAR in this video from 2017.
This has become known as loop closing and it is critical for being able to dynamically build maps of environments without any external reference data such as GPS (outdoors), or visible fiducials at known locations indoors.
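For a toy illustration of the flavor of loop closing, not the 1985 algorithm and certainly not a modern pose graph optimizer, consider a dead-reckoned chain of poses drifting around a square loop; recognizing the starting point lets the accumulated error be spread back along the chain, in proportion to each pose’s accumulated odometry uncertainty. All the numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# The robot drives a 20-step square loop, estimating each step with noisy odometry.
true_steps = [(1.0, 0.0)] * 5 + [(0.0, 1.0)] * 5 + [(-1.0, 0.0)] * 5 + [(0.0, -1.0)] * 5
STEP_VAR = 0.02    # made-up odometry variance per step

measured = [np.array(s) + rng.normal(0.0, np.sqrt(STEP_VAR), 2) for s in true_steps]
poses = np.cumsum([np.zeros(2)] + measured, axis=0)   # dead-reckoned chain of poses

# Loop closure: the robot recognizes it is back where it started, so the last
# pose should coincide with the first. Spread that correction back along the
# chain, giving more of it to poses carrying more accumulated uncertainty.
error = poses[-1] - poses[0]
accumulated_var = np.arange(len(poses)) * STEP_VAR
weights = accumulated_var / accumulated_var[-1]       # 0 at the start, 1 at the end
corrected = poses - np.outer(weights, error)

print("loop gap before closing:", np.linalg.norm(error))
print("loop gap after closing: ", np.linalg.norm(corrected[-1] - corrected[0]))
```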
In March 1985 in St Louis, at the IEEE ICRA (International Conference on Robotics and Automation), I published an invited, unrefereed, paper about this titled “Visual Map Making for a Mobile Robot”. At the same conference my friends in Toulouse, Raja Chatila and Jean-Paul Laumond, had a paper titled “Position referencing and consistent world modeling for mobile robots”. We were each surprised by the other’s paper, as we had invented the same thing.
Chatila and Laumond had a really quite bad way of making geometric inferences from the closing of a loop. I had a much much much worse geometric inference mechanism. This inspired Randall Smith and Peter Cheeseman to chastise us (as is the way it works in science) for our poor mathematical ideas and in 1986 published a paper “On the Representation and Estimation of Spatial Uncertainty“, which made the idea practical-ish.
But it was 30+ years before it really became practical using computer vision, despite the bold title of my paper. Pro-tip: just because an academic writes a paper with a provocative title it does not mean that the idea behind the title will be ready for deployment any time soon.
No deployed AMRs in 2020 were using pure vision methods – all relied on LIDARs. At Robust.AI we are now deploying pure vision based, SLAM driven AMRs, and the act of doing that enables a whole buffet of almost free lunches.