One thing you learn pretty quickly when working on embedded systems is that USB sucks. Oh sure, it seems all innocuous when you’re hot-plugging thumbdrives into your computer or whatever, but there’s a huge amount of complexity hiding under the hood. With even the somewhat-outdated USB 2.0 spec clocking in at a hefty 650 pages, it’s not something one takes up lightly. I still have nightmares about the USB stack on STM32 chips, which I haven’t had to deal with for years.

Another problem with USB is its unreliability. This kind of makes sense; after all, USB is a standard intended to connect consumer devices in a way that your grandmother can figure out. It’s not really focused on extreme fault-tolerance in harsh environments. With that in mind, maybe we shouldn’t use it for these types of projects, but dammit, people keep doing it. Conveniently, they’re usually people who were in charge of the project before I was hired.

This leads to a cavalcade of predictable misery. In my case, it started all the way back with my high school robotics team. We designed a custom board to read sensor data which communicated back to the robot’s onboard computer over USB. Over the course of a season, this proved to be a source of so many headaches that, for the next year, we completely redesigned that system and switched to a plain old serial link. History repeated itself at my first “real” job out of college, where we were using almost the exact same cursed system design. The only difference was that this time around, it was in the guts of a military-grade avionics system. I remember buying a USB bus analyzer to get that one sorted out. Fun times.

USB on MARS-X

As luck would have it, MARS-X has also traditionally been heavily dependent on USB. This is in keeping with the “good enough” ethos that underpinned much of the first-generation control system design. This dependence was particularly problematic for the motors and batteries, where it required a lengthy USB extension cable running between the two sides of the frame.

Diagram showing the topology of the USB network on MARS.
USB topology in the original design of the MARS-X robot. This requires cables to run to each of the battery modules, including from one side of the robot to the other.

We designed MARS-X with the two sides being as independent as possible. Each side has its own battery which powers only the motors on that side. The two battery packs are not connected to each other. This is a recipe for electrical issues when you introduce USB into the mix. As soon as you string a cable between the controller (powered by the left battery) and the right battery, you have to contend with all sorts of ground loop issues. For a while there, we kept frying the USB hub inside the right battery module. I eventually tried replacing it with an isolated USB hub in an attempt to stem the bleeding.

Another issue with running a USB cable across the robot is the cable itself. It was acceptable with the old, fixed 80-20 frame, but it causes some real problems with the newer frame that allows the robot’s width to be adjusted. Specifically, it severely limits that adjustment.

The robot equipped with the adjustable frame.

The final nail in USB’s coffin was our last data collection attempt in October. Basically from the moment we turned the robot on that morning, we had all sorts of electrical problems. We practically had to push it onto the trailer. Overall, data collection was delayed several hours, and the whole thing became kind of a dumpster fire.

It took me a while to tease out exactly what was going on, but it looks like the issues were caused by a cascading failure of our USB hubs. First the one in the left battery module died, and then that killed an upstream hub. This culminated in an issue where the controller would randomly turn off when connected to the damaged hub. None of this is the kind of thing you want to be debugging at 11:30 at night when you just want to drive the robot off the trailer and back into the lab so you can go home.

The inevitable conclusion we reached after this sorry episode was that USB was a disaster and needed to go. But what could we replace it with?

All Aboard the CAN Bus

Most computer engineers will be familiar with CAN due to its heavy use in the automotive industry. Unlike USB, it’s a protocol that’s designed for reliability in harsh, electrically noisy environments like the inside of a car. With that pedigree, we figured CAN was a good candidate to replace our USB-based architecture. Conveniently, all of the devices on the robot, including the motors, BMS, and the Jetson we’re using as a controller, support CAN. The trick is wiring it up and getting the software working.

Readers who have been paying attention might have noticed that we still haven’t solved one of our bigger issues, which is the need for wires running across the robot. Sure, with CAN there are technically fewer of them, but even one wire is too many, as far as we’re concerned. We’ll need to get a little creative to deal with that one.

Wireless CAN

I want to make a brief digression into the evolution of the robot’s E-stop mechanism. (This is relevant, I promise.) E-stop is extremely simple: the way it’s supposed to work is that you press either of two big red buttons, and it shuts off power to the motors. Hopefully, this stops the robot from doing whatever dangerous thing it’s doing.

What happens when you don’t hit the E-stop in time.

The buttons are located on either side of the robot for user convenience. Originally, both buttons were wired directly into relays inside each of the battery modules. The whole thing was one gigantic circuit that spanned both sides of the robot.

Obviously, this isn’t going to fly in the era of adjustable frames. Rui’s solution to this problem was to buy two Arduinos, put one in each battery module, and expect that I would figure out how to fix the E-stop. These particular Arduinos have 2.4 GHz radios onboard, so I got them talking to each other. This way, when you press the E-stop on one side, the Arduino on the other side knows about it, and can E-stop that side as well. This approach didn’t always work early on (in particular, I remember crouching in the mud, unscrewing the side panel on the battery so we could bypass an Arduino that had failed in the field), but by and large, it was successful.

The Arduino inside the battery module, on a board with the motor relay and CAN bus transceiver.
Arduino in its natural habitat.

As it turns out, these Arduinos aren’t just equipped with wireless, they also support CAN. In that case, we have everything we need to make an Arduino-based wireless CAN bridge between the two sides of the robot! What could go wrong?

I banged out some code that simply forwards any CAN frames it receives over the wireless connection between the sides, and writes corresponding frames from the other side onto the bus. I figured that latency would probably be a killer, but we tested it, and it turns out it’s only around a millisecond. I’m pretty sure the Linux kernel on the Jetson adds more latency than that.
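To give a rough idea of what that forwarding logic looks like, here’s a minimal sketch in Python. This is not the code running on the Arduinos: the 13-byte packet layout, the `CanFrame` and `Bridge` names, and the callback-style `can_send` and `radio_send` hooks are all assumptions made up for illustration. The point is just that each side packs every CAN frame it sees into a radio packet, and unpacks every radio packet it receives back onto its own bus.

```python
import struct
from dataclasses import dataclass

# Hypothetical radio packet layout (an assumption, not the real format):
# 4-byte little-endian CAN ID, 1-byte payload length, 8 payload bytes.
_WIRE = struct.Struct("<IB8s")


@dataclass(frozen=True)
class CanFrame:
    can_id: int   # 11-bit or 29-bit identifier
    data: bytes   # 0 to 8 payload bytes (classic CAN)


def encode(frame: CanFrame) -> bytes:
    """Pack a CAN frame into a fixed-size radio packet."""
    if len(frame.data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    # The "8s" field pads short payloads with zero bytes.
    return _WIRE.pack(frame.can_id, len(frame.data), frame.data)


def decode(packet: bytes) -> CanFrame:
    """Unpack a radio packet back into a CAN frame."""
    can_id, length, padded = _WIRE.unpack(packet)
    if length > 8:
        raise ValueError("corrupt packet: payload length over 8")
    return CanFrame(can_id, padded[:length])


class Bridge:
    """One side of the bridge: CAN in, radio out, and vice versa."""

    def __init__(self, can_send, radio_send):
        self._can_send = can_send      # writes a frame onto the local bus
        self._radio_send = radio_send  # transmits bytes to the other side

    def on_can_frame(self, frame: CanFrame) -> None:
        self._radio_send(encode(frame))

    def on_radio_packet(self, packet: bytes) -> None:
        self._can_send(decode(packet))


# Wire two bridges together in memory, with lists standing in for the buses.
left_bus, right_bus = [], []
right = Bridge(can_send=right_bus.append, radio_send=lambda packet: None)
left = Bridge(can_send=left_bus.append, radio_send=right.on_radio_packet)

left.on_can_frame(CanFrame(0x123, b"\x01\x02"))
print(right_bus)
```

A fixed-size packet keeps the receiving side simple, since it never has to work out where one frame ends and the next begins.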

Some of you might be screaming that forcing critical motor control messages in a robotic system over an ad-hoc wireless link is stupidly unsafe and unreliable. Remember, though, that after the Great USB Hub Massacre of 2024, the bar for safety and reliability is not exactly high. Also, because the Arduinos are still handling E-stop functionality as well, we can program them to shut down the robot if they ever get disconnected. It’s nice how that works out.
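The “shut down if disconnected” idea boils down to a watchdog timer. Here’s a hedged sketch of that logic in Python, again for illustration only: the `LinkWatchdog` class, the `motors_enabled` function, and the 250 ms timeout are hypothetical, not values or names taken from the real firmware. Each side notes the time whenever it hears from the other, and the motor relay only stays closed while neither E-stop is pressed and the link has been heard from recently.

```python
import time


class LinkWatchdog:
    """Tracks whether the other side has been heard from recently."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock            # injectable so it can be faked in tests
        self._last_seen = clock()

    def packet_received(self) -> None:
        """Call this for every packet that arrives over the radio."""
        self._last_seen = self._clock()

    def link_alive(self) -> bool:
        return self._clock() - self._last_seen <= self._timeout_s


def motors_enabled(local_estop: bool, remote_estop: bool,
                   watchdog: LinkWatchdog) -> bool:
    """The relay stays closed only if nothing at all is wrong."""
    return (not local_estop) and (not remote_estop) and watchdog.link_alive()


# Demo with a fake clock so the result does not depend on real time.
now = [0.0]
watchdog = LinkWatchdog(timeout_s=0.25, clock=lambda: now[0])  # 250 ms: assumed

now[0] = 0.2
watchdog.packet_received()
running = motors_enabled(False, False, watchdog)    # link is fresh

now[0] = 1.0                                        # other side went quiet
after_dropout = motors_enabled(False, False, watchdog)
print(running, after_dropout)
```

The nice property is that the failure mode is safe by default: a dead radio, a dead Arduino, and a pressed button all look the same to the relay.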

So, now the MARS-X control system is entirely shifted to CAN, and there are finally no wires running across the robot. The two sides are completely isolated, as they should be. There are no weird electrical issues. Fantastic!

Topology of the CAN network on MARS-X.

But unfortunately, all of our software is still written for the old USB architecture. How do we get the robot back to doing its robot things? I think that will be the topic of next week’s post.
