Kiwi D1: An autonomous differentially steered robot
Author
Gabriel Forsberg
Date Published

This project was made as part of the course Autonomous Robots (TME290)
Meet Kiwi D1

Kiwi D1 is a differentially steered, silly-looking robot (much like my personal favorite Pixar star Wall-E), made by Ola Benderius, examiner of TME290. While not yet quite as advanced as our Pixar friend, he is still equipped with a respectable set of electronics and sensors; see the full spec sheet below:
- 2x ultrasonic sensors (Devantech SRF08), range 3 cm – 6 m
- 6x infrared sensors (GP2Y0A41SK0F), range 4 – 30 cm
- 2x 8-megapixel cameras (front and back facing)
- 1x Inertial measurement unit (IMU)
- 2x wheel encoders
- 1x Raspberry Pi 3
- 2x DC motors
- 1x 3.7V battery
What's missing? The brains. Or more specifically: agency. At this time, Kiwi only knows how to follow strict commands like "give power to left/right wheel". This is not only Orwellian, but also a waste of his abilities. The goal of the project is to fix that, which is also the reason for the name of the course: Autonomous Robots.
Some Technical Background
Most cyber-physical systems today use monolithic codebases, where every service/function lives in one large repository. This can initially speed up development by keeping everything in one logical stack, but it can also give rise to technical debt: maintaining and updating the code gradually becomes tedious and slow as the complexity keeps rising, which we of course want to avoid. Luckily for us, a solution to this problem already exists and goes by the name of microservices.
The concept of microservices is quite simple. Instead of keeping everything in one common place, each feature/service lives in an isolated microservice that, through some messaging protocol, is free to communicate with any other microservice of our choosing. This is desirable because a single service can easily be removed, replaced or updated without having to consider any other part of the larger codebase. It not only simplifies the structure, but also improves security, as vulnerabilities are often contained to isolated microservices that can be temporarily disabled and then rapidly patched. The drawback, of course, is that the initial implementation can be more challenging to get off the ground, which can seem daunting, especially for smaller projects.
In robotics specifically, ROS 2 (Robot Operating System 2) is by far the most commonly used framework today. It is, however, not based on a microservice architecture. While ROS 2 uses a publish/subscribe model, with processes divided into nodes forming a larger graph-like network, the nodes are typically not fully independent in the way they would be in a true microservice architecture. Additionally, ROS 2 comes with a large number of dependencies (>200) to support the wide range of functionality commonly used in robotics. This makes ROS 2 very powerful, but it can also make it difficult to maintain and to keep processes small in size and safe against vulnerabilities.
This is the reason Ola decided to create his own lightweight framework, OpenDLV, built from day one to be a modern, microservice-based replacement for ROS 2. For comparison, OpenDLV has only one dependency (libcluon, for messaging), with zero registered vulnerabilities on Vulnerability Scout, compared to the 149 found for ROS 2. This also drastically reduces the minimum size of a Docker image running OpenDLV (3.7 MB), compared to the minimum 152.3 MB needed to run ROS 2. The reduced complexity of OpenDLV also means it drastically lowers the initial barrier of setting up a microservice-based framework, even for smaller projects such as the Kiwi D1 platform.
The Tasks
Each function/component of the Kiwi had its own microservice: the motors, the wheel encoders, the IMU, the cameras, and the infrared and ultrasonic sensors. What was missing was a perception microservice that could read messages from the sensor microservices, process the information and then send appropriate commands to the motor microservices.
In the end, the Kiwi was to be able to autonomously complete 3 tasks:
1. Complete one full lap of a 12 m long racetrack made of cones as quickly as possible
2. Find and park on top of targets (Post-it notes) inside a maze
3. Find targets in an open space (with obstacles) and return home to "charge" every 2 minutes
The Team
The class was divided into 16 teams of 4 students each, all given the same tasks. But most importantly, there was to be a race at the end of the course, where the teams would compete against each other for the fastest lap time. Unfortunately for my team, we lost two people on day 1. They didn't die or anything (guessing a bit here), but they never showed up. We were then given two choices: join forces with another team of three that had suffered a similar fate or go at it alone.
For us the decision basically came down to whether we wanted a slightly bloated team with potentially less individual work, or a leaner team where we were both guaranteed to have to do a lot more ourselves. The discussion lasted about 30 seconds. We didn't know each other very well, but we had worked together in a larger group before, and we had both joined the class for the same reason: robotics is cool, and we wanted to learn as much as we could. It was a bit of a gamble, but a two-person team didn't feel too bad at the time.
As every function of the Kiwi robot was designed as an OpenDLV microservice, we also weren't really limited to any specific programming language; each microservice could theoretically be built in a completely different language. In robotics, most tend to use C++, as it is a compiled language that offers low-level control and real-time performance. While we both had some experience with C++ before, we weren't exactly experts. A good opportunity to learn more, right?
Task 1: The Race
Task description: "The robot should drive as quickly as possible around a closed track built from colored cones, blue to the right and yellow to the left. The track width is 40 cm, and the track length is 12m. The start and finish line (same line) is marked with red cones. Create a robot behavior that drivers around the track as quickly as possible and then stops after the finish line. Hitting cones is allowed, but the robot needs to partly stay in the intended track at all times."
Here we took a simple, tried-and-tested approach (so as not to let our lack of C++ proficiency get in the way). We decided to use two coupled controllers: a PI controller for the speed and a PD controller for the steering. The derivative term was excluded from the speed controller to avoid amplifying noise, and the integral term was excluded from the steering controller to avoid instability if an error were to persist. The coupling was made so that the speed decreases as the steering gets more aggressive.
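In essence, the two controllers and their coupling looked something like this (sketched in Python for readability; the gains, the control period and the coupling curve are illustrative placeholders, not our tuned values):

```python
# Minimal sketch of the coupled PI (speed) and PD (steering) controllers.
# Gains, control period and coupling curve are illustrative placeholders.
class CoupledController:
    def __init__(self, kp_v=0.5, ki_v=0.1, kp_s=1.2, kd_s=0.05, dt=0.05):
        self.kp_v, self.ki_v = kp_v, ki_v    # speed PI gains
        self.kp_s, self.kd_s = kp_s, kd_s    # steering PD gains
        self.dt = dt                         # control period [s]
        self.integral_v = 0.0                # accumulated speed error
        self.prev_err_s = 0.0                # previous steering error

    def step(self, speed_error, steering_error):
        # PI on speed: no derivative term, to avoid amplifying sensor noise
        self.integral_v += speed_error * self.dt
        speed_cmd = self.kp_v * speed_error + self.ki_v * self.integral_v

        # PD on steering: no integral term, to avoid wind-up if an error persists
        d_err = (steering_error - self.prev_err_s) / self.dt
        self.prev_err_s = steering_error
        steer_cmd = self.kp_s * steering_error + self.kd_s * d_err

        # Coupling: scale the speed down as the steering gets more aggressive
        speed_cmd *= max(0.2, 1.0 - abs(steer_cmd))
        return speed_cmd, steer_cmd
```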
For the actual perception, which partly consisted of identifying cones (blue for cones on the right, yellow for cones on the left and red for start/finish), we began by using the computer vision library OpenCV, with which you can easily filter for color and shape and reduce noise, all in real time. To actually decide where to go, we used an aim-point model, steering towards the mean position of the centroids of the detected cones to the left and right. It was, however, also here that we needed to reconsider one of our first decisions.
The learning curve of C++ was not necessarily too steep, but we didn't manage to keep the iteration rate at the speed we thought necessary to have a chance in the final competition. We thus decided to switch to Python, which we both knew well, making the transition very fast (less than one day). Considering the computational efficiency of OpenCV (even in Python), we also felt confident that the Raspberry Pi onboard the Kiwi would be more than capable. If we had some time left over at the end of the course, we would try to optimize everything again in C++.
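To give an idea of the perception side, the color filtering and aim-point selection boiled down to something like the following sketch (the HSV ranges and the minimum blob area are placeholders; the real values had to be re-tuned for each room's lighting):

```python
import cv2
import numpy as np

# Rough sketch of cone detection and aim-point selection with OpenCV.
# The HSV ranges and minimum blob area below are placeholders.
BLUE = ((100, 120, 40), (130, 255, 255))     # right-hand cones
YELLOW = ((20, 120, 120), (35, 255, 255))    # left-hand cones

def cone_centroids(bgr_frame, hsv_range, min_area=50):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_range[0]), np.array(hsv_range[1]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:   # drop tiny noise blobs
            continue
        m = cv2.moments(contour)
        centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids

def aim_point_x(frame):
    left = cone_centroids(frame, YELLOW)
    right = cone_centroids(frame, BLUE)
    if not left or not right:
        return None                    # let the caller fall back to a default
    left_x = np.mean([c[0] for c in left])
    right_x = np.mean([c[0] for c in right])
    return (left_x + right_x) / 2.0    # steer towards the midpoint of the lane
```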
Testing: Simulation & Replay

Snapshot of the track simulation environment
When the first code was finally ready to be tested, the Kiwi robot had still not actually been built. One beauty of OpenDLV and microservices, though, is that creating a simulation environment is quite straightforward, and the bare bones of one had already been prepared. One simply replaces some of the microservices with "virtual" ones that mimic the behavior of the real ones (e.g. the sensors), while others (like the perception and control module) can remain exactly the same. As the modules are completely isolated, they simply do not care about who sends the messages they subscribe to.
In the simulation environment, everything worked quite well (after some adjustment of the color filtering) and the robot was able to complete the simulated race lap without any major jittering or collisions. It did, however, turn a bit too soon in some sharper curves, which we later fixed by distance-weighting the centroids of the cones to give more weight to cones closer to the Kiwi (which are more relevant for immediate path planning). We then made some small fine-tuning of the control parameters to make it slightly more stable and faster, while being careful not to overfit our choices to the simulation.
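The distance weighting amounted to something like this, using the image row of each centroid (lower in the image roughly means closer to the robot) as a stand-in for distance; the linear weight is a simplification of what we actually tuned:

```python
import numpy as np

def weighted_aim_x(centroids, image_height):
    """Distance-weighted horizontal aim-point from a non-empty list of cone
    centroids (x, y) in image coordinates. Cones lower in the image (larger y)
    are closer to the robot and therefore get a larger weight; the linear
    weighting is a simplification of what we actually tuned."""
    xs = np.array([c[0] for c in centroids])
    ys = np.array([c[1] for c in centroids])
    weights = ys / image_height     # ~0 near the horizon, ~1 right in front
    return float(np.sum(weights * xs) / np.sum(weights))
```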
About one week later, a first prototype of the Kiwi D1 was available for testing. Excited to see how well our implementation would transfer to the real world, we didn't waste much time before connecting and uploading the code to the robot. Something wasn't quite right though. The upload (over an SSH & SCP connection) was going painfully slowly and would take longer than the 2-hour time slot we had been given in the lab. This turned out to be an unintended consequence of the way the network connection to the Kiwi was implemented. The Kiwi had been set up to act as a hotspot relaying the Wi-Fi in the building, so that a laptop could connect to it through SSH while the Kiwi itself stayed connected to the Wi-Fi. This evidently turned out to be a much slower connection than expected. The problem was then exacerbated by the fact that OpenCV (our only dependency other than libcluon) is also quite big...
As the very valuable testing time was quickly ticking away, we tried to think of other ways to transfer the code. The first obvious solution was to physically connect to the Raspberry Pi, but that was quickly ruled out as the connection port was hidden behind the body of the robot. Our next solution was to upload the cross-compiled code to GitLab and then pull the final Docker image down to the Kiwi directly, bypassing the slow laptop-to-Kiwi connection. By this time, however, our 2-hour window was nearing its end. So as not to completely waste our time with the Kiwi, we spent the last 10 minutes physically pushing the robot around the laid-out track while recording the data from the onboard cameras and sensors, which at least gave us some new data to work with.
Back home, a bit frustrated with the result of the earlier test, we began reviewing the data we had recorded. Again, since we were using a microservice architecture, switching from a simulation to a replay environment was very easy: all we had to do was replace the simulated microservices with the recorded data. Once done, playing back the replays immediately revealed that our current implementation probably would not have worked very well on the real test track anyway. The blue cones were identified reasonably well, but the yellow ones were completely washed out, most likely due to the very yellow lighting in the test room. After fine-tuning the color, shape and noise filtering again, this time on the recorded data, we eventually managed to get good identification of all cones again. In addition, we added a persistence threshold, requiring cones to be detected over a few frames before influencing the aim-point, some fallback defaults, and stronger aim-point smoothing.
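In spirit, the persistence check and smoothing looked something like the sketch below (simplified here to act on the aim-point as a whole, with placeholder values for the frame count and smoothing factor):

```python
# Illustrative sketch of the persistence threshold and aim-point smoothing,
# simplified to act on the aim-point as a whole. MIN_FRAMES and ALPHA are
# placeholders, not the values we ended up with.
MIN_FRAMES = 3     # a detection must survive this many consecutive frames
ALPHA = 0.3        # exponential smoothing factor for the aim-point

class AimPointFilter:
    def __init__(self, default=None):
        self.default = default      # fallback when nothing has been seen yet
        self.consecutive = 0
        self.smoothed = None

    def update(self, raw_aim_x):
        if raw_aim_x is None:
            self.consecutive = 0    # detection lost, keep the last good value
            return self.smoothed if self.smoothed is not None else self.default
        self.consecutive += 1
        if self.consecutive < MIN_FRAMES:
            return self.smoothed if self.smoothed is not None else self.default
        if self.smoothed is None:
            self.smoothed = raw_aim_x
        else:
            self.smoothed = ALPHA * raw_aim_x + (1 - ALPHA) * self.smoothed
        return self.smoothed
```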

Cone detection and aim-point (red) during replay
We were, however, still not entirely pleased with the result, as some cones further away were not correctly identified at times. But with the new recorded data available, we realized we could use a neural network approach(!) instead of the simpler OpenCV filtering. While we began work on this, it was now also time to start implementing a solution for task 2.
Task 2: Target in a Maze
Task description: "In a maze, drive around and explore the area without hitting any walls. The walls are made from 30 cm high wooden boards. The layout of the maze can be different for every run. There might be small gaps (less than 5 cm) between wall segments. Somewhere inside the maze, there is a blue 75x75 mm paper glued to the floor (target). When the robot sees the target it should go to it and park on top of it."
There are many strategies for navigating a maze (BFS, A*, Trémaux's algorithm, random walk, etc.) with varying amounts of complexity. Considering our constrained time (roughly two weeks per task) and very limited physical testing opportunities (one 2-hour session per week), we opted for a wall-following strategy, which, given the number of distance sensors onboard, should also be fairly straightforward to implement. This strategy, in which you pick a wall on the left or right and then keep constant contact with it, is guaranteed to eventually reach an exit, provided there are no isolated "islands" of walls. In our case, however, this is not guaranteed to be true, and finding an exit is also not the end goal (finding the target is). Since there is no real time constraint, this can easily be solved by introducing some randomness to the movement.
To begin, we decided to go with a Finite State Machine (FSM), with clearly defined states and transitions. We started with a small set of states: FIND_WALL, FOLLOW_WALL, RANDOM_SPIN, CRUISE, DRIVE_TO_TARGET and PARK, with a PD controller to keep the correct distance to the wall at all times. We used the longer-range ultrasonic sensors facing the front and back of the Kiwi to locate walls far away, and the shorter-range but more precise infrared sensors to avoid collisions and keep an appropriate distance to the walls. The randomness was then introduced based on a timer: if a wall had been followed for ~60 seconds, the Kiwi would spin for a random duration (2–5 s) before entering a CRUISE state, where it would drive forward until one of the forward-facing sensors once again detected a wall.
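A skeleton of the state machine might look like this (the sensor interface, thresholds and timings are illustrative placeholders rather than our exact implementation):

```python
import random
import time
from enum import Enum, auto

# Skeleton of the maze FSM. Thresholds, timings and the sensor interface are
# illustrative placeholders rather than our exact implementation.
class State(Enum):
    FIND_WALL = auto()
    FOLLOW_WALL = auto()
    RANDOM_SPIN = auto()
    CRUISE = auto()
    DRIVE_TO_TARGET = auto()
    PARK = auto()

WALL_TIMEOUT = 60.0   # seconds of wall-following before a random spin
WALL_NEAR = 0.4       # front ultrasonic reading [m] that counts as "wall ahead"

class MazeBrain:
    def __init__(self):
        self.state = State.FIND_WALL
        self.follow_since = 0.0
        self.spin_until = 0.0

    def step(self, front_distance, target_visible):
        now = time.time()
        if target_visible and self.state not in (State.DRIVE_TO_TARGET, State.PARK):
            self.state = State.DRIVE_TO_TARGET
        elif self.state in (State.FIND_WALL, State.CRUISE) and front_distance < WALL_NEAR:
            self.state = State.FOLLOW_WALL
            self.follow_since = now
        elif self.state == State.FOLLOW_WALL and now - self.follow_since > WALL_TIMEOUT:
            self.state = State.RANDOM_SPIN
            self.spin_until = now + random.uniform(2.0, 5.0)
        elif self.state == State.RANDOM_SPIN and now > self.spin_until:
            self.state = State.CRUISE
        return self.state
```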

Snapshot of the maze simulation
Similarly to the race perception module, we opted to use OpenCV for the target identification, which was even simpler in this case, making a neural network implementation unnecessarily complex. If an object of the correct size and color range was in view, the Kiwi would simply drive towards the centroid of the object until a certain pixel-size threshold was met, initiating a final creep towards the parking position.
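The target logic boiled down to something like this (the area threshold and speeds are placeholders; the centroid and area come from the same kind of OpenCV color filtering as in the race task):

```python
# Sketch of the drive-to-target behavior; PARK_AREA and the speeds are placeholders.
PARK_AREA = 6000   # contour area [px] at which the final creep starts

def target_command(centroid_x, contour_area, image_width):
    """Steer towards the target centroid; slow to a creep once it fills the view."""
    steer = (centroid_x - image_width / 2) / (image_width / 2)  # -1 .. 1
    speed = 0.05 if contour_area > PARK_AREA else 0.15          # creep vs. approach
    return speed, steer
```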
Once tested in simulation, it was clear we had made a silly mistake and forgotten to add a state for properly handling corners, as the Kiwi now always got stuck in one. This was quickly remedied by implementing a CORNER state that would make the Kiwi turn right if a wall was detected both to its left and in front of it (inside corner), or turn left if it had been following a left wall but could no longer detect one either to its left or in front of it (outside corner), and vice versa when following a right wall. After this simple fix, the code worked exceptionally well and the Kiwi had no trouble completing any maze we threw at it, though the time to completion would sometimes be a bit longer as it (by chance) traversed an already explored path. This could, however, easily be fixed using the EKF-SLAM algorithm later implemented for task 3.
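In code, the corner decision can be summarized roughly as follows (a simplified sketch with a placeholder distance threshold):

```python
# Simplified sketch of the CORNER decision while wall-following; the distance
# threshold is a placeholder. side_distance is the IR reading towards the
# followed wall, front_distance the reading straight ahead (both in metres).
def corner_action(side_distance, front_distance, following_left=True, threshold=0.3):
    wall_side = side_distance < threshold
    wall_front = front_distance < threshold
    if wall_side and wall_front:
        # inside corner: wall ahead and on the followed side, turn away from it
        return "turn_right" if following_left else "turn_left"
    if not wall_side and not wall_front:
        # outside corner: the followed wall has ended, turn around its edge
        return "turn_left" if following_left else "turn_right"
    return "keep_following"
```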
Task 3: Targets and Home Localization in an Open Space
Task description: "In an open space, with a few obstructing walls, there will be a home base represented by a blue paper and five green papers representing treasure. Start at the home base and drive around looking for the targets. The home base and targets are made from 75x75 mm paper glued to the floor. When a target is found, let the robot park over it and then rotate back and forth in order to mark it. The robot needs to be repositioned at the home base every two minutes to "charge", otherwise the trial is over. When all targets are marked and the robot is returned home the task is completed."
As finding targets and a "home" in an open space using just a random walk would be significantly less efficient, or even impossible, within a tight time limit, the Kiwi now needed some kind of memory and mapping capability. For this, we decided to go with an EKF-SLAM approach, which achieves simultaneous localization and mapping by treating the robot's own pose and the identified landmark positions as a single Gaussian state.
Landmarks were initialized along identified walls at a set interval, and the pose was tracked by integrating the wheel speeds from the wheel encoders (the onboard IMU had not yet been connected internally and could sadly not be used). The same OpenCV method as in the maze could then be reused for finding the targets and the home base by just updating the color ranges. To find the targets, the Kiwi would simply drive in an unexplored direction until a target was found or an internal one-minute timer ran out, indicating that it was time to stop the search and drive back home.
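The prediction step driven by the wheel encoders looked roughly like the sketch below, using a standard differential-drive motion model (the state layout, wheel base and noise values are illustrative placeholders):

```python
import numpy as np

# Sketch of the EKF-SLAM prediction step from wheel-encoder odometry, using a
# standard differential-drive motion model. The state is [x, y, theta, l1x, l1y, ...];
# the wheel base and noise values are placeholders.
WHEEL_BASE = 0.16   # distance between the wheels [m]

def predict(mu, sigma, d_left, d_right):
    """Propagate the pose (first three states) given how far each wheel moved [m]."""
    x, y, theta = mu[0], mu[1], mu[2]
    d = (d_left + d_right) / 2.0                  # distance travelled by the center
    dtheta = (d_right - d_left) / WHEEL_BASE      # change in heading

    mu = mu.copy()
    mu[0] = x + d * np.cos(theta + dtheta / 2.0)
    mu[1] = y + d * np.sin(theta + dtheta / 2.0)
    mu[2] = theta + dtheta

    # Jacobian of the motion model w.r.t. the state (landmarks are unaffected)
    G = np.eye(len(mu))
    G[0, 2] = -d * np.sin(theta + dtheta / 2.0)
    G[1, 2] = d * np.cos(theta + dtheta / 2.0)

    sigma = G @ sigma @ G.T
    sigma[:3, :3] += np.diag([1e-4, 1e-4, 1e-3])  # placeholder motion noise on the pose
    return mu, sigma
```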
This worked reasonably well! Most of the time, all targets would be found without missing a "re-charge". However, if the Kiwi traveled for a long time without revisiting known landmarks, odometry errors could accumulate. Then, if it later (mistakenly) matched a new observation of landmark A to an existing landmark B, a false loop closure could trigger an incorrect global update of the entire state (through the EKF's cross-covariances), corrupting the map. If this happened, the Kiwi could have trouble finding its way back home before the 2-minute limit. Here the disconnected IMU would probably have improved things a lot: its higher update rate and precision would likely have greatly reduced the accumulated odometry errors, which in turn would drastically decrease the chance of false loop closures. The time for further updates was nevertheless over at this point, and instead it was time to prepare for the final D-day: the race!
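The failure mode is easiest to see in the data association itself. A common approach (sketched below; not necessarily our exact one) is a nearest-neighbour match with a Mahalanobis gate, and once the odometry has drifted far enough, the wrong landmark can end up being the best match inside the gate:

```python
import numpy as np

# Sketch of nearest-neighbour data association with a Mahalanobis gate.
# z is an observed landmark position (2-vector) and predictions maps
# landmark id -> (predicted measurement z_hat, innovation covariance S).
GATE = 9.21   # chi-square threshold for 2 degrees of freedom at ~99%

def associate(z, predictions):
    best_id, best_d2 = None, GATE
    for landmark_id, (z_hat, S) in predictions.items():
        nu = z - z_hat                             # innovation
        d2 = float(nu @ np.linalg.inv(S) @ nu)     # squared Mahalanobis distance
        if d2 < best_d2:
            best_id, best_d2 = landmark_id, d2
    return best_id   # None means: initialize a new landmark instead
```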
Final Demo Day
On the day of the demo, not all 16 teams were confident enough to compete. As it turned out, we were far from the only ones who had faced strong headwinds from constant unexpected challenges: unreliable connections, faulty wiring, short timeframes and so much more. Based on our limited testing in the lab the previous week and the more recent simulations, with the neural network cone detection working incredibly well, we were fairly confident in our ability to at least finish one lap.
While the other teams who decided to give it a try were busy recording new test data to calibrate their OpenCV color filtering (as the race was held in a previously untested room), we went straight to the track, believing our neural network approach would be more robust to the change in lighting. Once the Kiwi was turned on, our initial high confidence quickly faded as it made a single jittering motion before stopping completely, until about 20 seconds later when it unexpectedly jittered again. And then a third time 20 seconds after that. Something was obviously very wrong.
After 30 minutes of head-scratching and some quite desperate attempts at troubleshooting, we couldn't find anything wrong at all, until we by sheer luck inspected the rate at which frames were processed on the Kiwi. As it turned out, the reason everything had looked flawless in the simulations was that all the computation had been done on our laptops, with much more available RAM. This increased the processing time per frame from about 1 ms all the way to, you guessed it, 20 seconds. Sadly, this meant no quick fix was possible. To add insult to injury, it most likely would have worked fine had we stuck with C++, which would have drastically decreased the RAM required for the onboard processing.
At this point, some teams had managed to at least get their Kiwis moving in the intended direction, but no one had come close to actually finishing a full lap. We figured we still had a chance if we returned to the old color-filtering code we had used initially, some five weeks earlier. We quickly swapped out the more computationally heavy parts of the code for the equivalent parts of the old one, while simultaneously recording new video data for the now-needed color calibration, realizing that time was once again running short.
The result? It actually worked! After some fine-tuning of the parameters, and with less than 5 minutes left on the clock, it successfully completed the full track! It was, however, not perfect... some cones were knocked over and the steering was not perfectly smooth, but being the only Kiwi to get past the finish line, we considered it a success!

Kiwi past the finish line (he swears the yellow cone fell down before he passed over it)
Safe to say, there are lots of things I think we would do differently if we were to start over (sticking to C++ being one of them), but the things I learned the most from in this course were definitely the often small (yet at the time very annoying) bugs and unexpected challenges that kept coming all the way until the end. Had we not eventually managed to finish the race, though, I'm not sure how happy I'd be about them now...