Some Notes on Cyclone DDS

12 Aug 2020 robots ros2

One of the biggest differences between ROS1 and ROS2 is the replacement of the single middleware with a plugin-based architecture. This allows ROS2 to use various ROS Middleware (RMW) implementations. All these RMW implementations are currently based on DDS. You can read all about the details in the ROS2 Design Docs.

Over time, the supported RMW implementations have shifted and new ones have been introduced. The default is currently FastRTPS (which apparently has been renamed to FastDDS, but after the Foxy release). The newest option is CycloneDDS which uses Eclipse Cyclone DDS. Cyclone DDS has gotten a lot of praise lately, so let’s take a closer look.

RMW Implementations

Choosing between RMW implementations is still a bit of a challenge since ROS2 is still very much under active development. There are multiple tickets about FastDDS service discovery issues. CycloneDDS is less than two years old, so it is also changing quickly and might not be fully featured, but it is reported to perform very well. Mixing multiple implementations at runtime has noted caveats.

Luckily, it’s very easy to switch between implementations by simply setting the RMW_IMPLEMENTATION environment variable (assuming the selected implementation is built/installed).

When switching between implementations, be sure to stop the ros2 daemon so that it gets restarted with the proper RMW implementation:

ros2 daemon stop
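The two steps above can be folded into one small shell function. This is only a sketch: the name use_rmw is made up for illustration, and the check that the value starts with rmw_ is my own guard against typos, not something ROS2 itself enforces.

```shell
# Hypothetical helper: select an RMW implementation and stop the daemon.
use_rmw() {
  case "$1" in
    rmw_*) ;;
    *) echo "usage: use_rmw rmw_<something>" >&2; return 1 ;;
  esac
  export RMW_IMPLEMENTATION="$1"
  # Stop the daemon so the next ros2 command restarts it with the new RMW.
  # Skipped quietly when the ros2 CLI is not on the PATH.
  if command -v ros2 >/dev/null 2>&1; then
    ros2 daemon stop || true
  fi
  echo "RMW_IMPLEMENTATION=$RMW_IMPLEMENTATION"
}
```

With that in a shell startup file, switching is use_rmw rmw_cyclonedds_cpp. The function only changes the current shell, so every terminal that runs nodes needs the same setting.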

First you’ve heard of the ROS2 daemon? Check out this ROS Answers post which contains the best description I’ve seen.

Debugging Issues

While FastDDS was mostly working out of the box, the whole service problem was wreaking havoc on setting/getting parameters – and I’ve been tuning parameters frequently. I went ahead and set the RMW_IMPLEMENTATION to rmw_cyclonedds_cpp, or so I thought.

I noticed that service discovery wasn’t much better. Then I noticed on the robot I had set RMW_IMPLEMTATION - so I fixed the spelling mistake. Now everything should totally work great!

Wrong.

On the robot, discovery worked fine and services worked great - but half or more of the nodes couldn’t be seen by my laptop. Restarting launch files resulted in different nodes often missing!

I started to debug and came across the ddsperf tool. If you’re using ROS2 on MacOSX you’ll want to check out this issue on how to install ddsperf.

Multiple Network Interfaces

Running ddsperf sanity gave an interesting warning on the robot:

ddsperf: using network interface enp3s0 (udp/10.42.0.1) selected arbitrarily from: enp3s0, wlp2s0

The UBR-1 has two network interfaces: wlp2s0 is a wifi connection to the outside world and enp3s0 is an internal ethernet port which only talks to the robot hardware. Apparently, my nodes were frequently using the wrong network interface. The upstream Cyclone DDS README does mention, way down the page, that “proper use of multiple network interfaces simultaneously will come, but is not there yet.”

The configuration guide states that the selection of network adapter prefers non-link-local interfaces, but apparently something is tripping it up in detecting that the ethernet interface is configured that way.

The workaround is to set a NetworkInterfaceAddress in the CYCLONEDDS_URI environment variable:

export CYCLONEDDS_URI='<CycloneDDS><Domain><General><NetworkInterfaceAddress>wlp2s0</NetworkInterfaceAddress></General></Domain></CycloneDDS>'
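CYCLONEDDS_URI can also point at a file, which is easier to read and edit than inline XML. The sketch below writes the same setting to a file and references it with a file:// URI. The file location (and the CYCLONE_CONFIG variable holding it) is my own choice for illustration, and wlp2s0 is the interface on my robot, so substitute your own.

```shell
# Path is an assumption; any readable location works.
CYCLONE_CONFIG="${CYCLONE_CONFIG:-$HOME/cyclonedds.xml}"

# Same setting as the inline version, written out as a file.
cat > "$CYCLONE_CONFIG" <<'EOF'
<CycloneDDS>
  <Domain>
    <General>
      <NetworkInterfaceAddress>wlp2s0</NetworkInterfaceAddress>
    </General>
  </Domain>
</CycloneDDS>
EOF

# Point Cyclone DDS at the file instead of passing the XML inline.
export CYCLONEDDS_URI="file://$CYCLONE_CONFIG"
```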

If you’re prone to typos, and want to make sure you’re actually running the expected RMW implementation, I’d recommend this command:

ros2 doctor --report | grep middleware

After a few seconds, you should see:

middleware name    : rmw_cyclonedds_cpp

I actually set up an alias in my bashrc so that which_rmw runs that command. Once I settled on using Cyclone DDS as my new default, I also added the RMW_IMPLEMENTATION and CYCLONEDDS_URI settings to the bashrc on the robot.
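For reference, here is roughly what that bashrc fragment could look like. The alias wraps the ros2 doctor command from above. The rmw_env function is a hypothetical extra of mine: it prints every RMW_* variable so a misspelled name like RMW_IMPLEMTATION stands out, and it fails when RMW_IMPLEMENTATION itself is missing.

```shell
# Settings from this post; adjust the interface name for your robot.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI='<CycloneDDS><Domain><General><NetworkInterfaceAddress>wlp2s0</NetworkInterfaceAddress></General></Domain></CycloneDDS>'

# Report which RMW the ros2 tooling is actually using.
alias which_rmw='ros2 doctor --report | grep middleware'

# Hypothetical helper: list all RMW_* variables and complain if the one
# that matters is not set.
rmw_env() {
  env | grep '^RMW_' | sort || true
  if [ -z "${RMW_IMPLEMENTATION:-}" ]; then
    echo "RMW_IMPLEMENTATION is not set" >&2
    return 1
  fi
}
```

Unlike which_rmw, rmw_env does not need a working ROS2 install, so it is a quick first check when the middleware report shows something unexpected.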

Final Thoughts

Once I worked through the configuration issue, CycloneDDS appears to be the most stable of the few RMW implementations I’ve tried. I haven’t actually tested the performance head-to-head, but others have.

I would recommend looking at the Configuration section of the upstream Eclipse CycloneDDS project. This contains a bunch of useful information about what you can specify in the CYCLONEDDS_URI. The Guide to Configuring is also very worth reading. It’s honestly a great resource for simply understanding all those things you hoped you’d never need to learn about DDS.

ROS2 on MacOSX Catalina

10 Aug 2020 robots ros2

Back in 2014 or so, I had ROS1 running on my Mac. It took me a couple days to install and build dependencies. It was quite unstable. This weekend I got a new Macbook Pro (to replace my 2016 Macbook Pro, you know, the one with that great keyboard). I decided to also try setting up ROS2 on it, mainly for native RVIZ. It turned out to be somewhat straightforward.

As a note, I really didn’t want to do too much mangling of my very nice and very new Macbook Pro - so I actually haven’t disabled System Integrity Protection. So far everything is working (with some caveats on workflow noted below).

First off, newer Macbooks are running Catalina (OSX 10.15) - which is not a supported release. ROS2 (even the newest Foxy release) still targets OSX 10.14 Mojave. This means we absolutely have to build from source for Catalina. I started by following the from-source installation instructions. I’d suggest going through the dependency installations listed there and then applying the patches in the next several sections of this post BEFORE actually running the colcon commands to build anything.

Installing XCode

The ROS2 instructions work for installing the XCode command line utilities, but it seems that I also needed to install XCode from the App Store AND start the XCode GUI in order to finish the installation.

Some Paths

I had to add the following to my ~/.zshrc to get the various visual tools to compile:

export Qt5_DIR=/usr/local/opt/qt5/lib/cmake
export PATH=/usr/local/opt/qt5/bin:$PATH

The end of /usr/include

One of the bigger changes in MacOSX Catalina is the removal of /usr/include. Apparently the files have mostly moved to /Library/Developer. As far as I could tell, this really only affects the OGRE build - which needs access to various system header files. The fix is to set CMAKE_OSX_SYSROOT:

colcon build --symlink-install --cmake-args -DCMAKE_OSX_SYSROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
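If you would rather not hard-code the SDK location, xcrun can report it. The sketch below is an assumption on my part rather than what I actually ran: osx_sysroot is a made-up helper name, and xcrun --show-sdk-path returns the SDK for whichever developer directory xcode-select points at, which may be the command line tools SDK rather than the one inside Xcode.app.

```shell
# Ask Xcode for the active SDK path; fall back to the path used in this
# post when xcrun is unavailable.
osx_sysroot() {
  if command -v xcrun >/dev/null 2>&1; then
    xcrun --show-sdk-path
  else
    echo "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"
  fi
}

# Print the build command instead of running it, so it can be checked first.
echo colcon build --symlink-install --cmake-args "-DCMAKE_OSX_SYSROOT=$(osx_sysroot)"
```

Note the -D prefix: CMake only treats the argument as a cache variable definition when it is passed that way.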

I still had some build issues due to missing CoreFoundation, which were fixed by this hack:

sudo cp /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/CoreFoundation.framework/Versions/A/Headers/* /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/CoreFoundation

RVIZ Crashes

I was super excited to see a laser scan show up on my Macbook! I then decided to disable the laser scan and check out some other data - but RVIZ immediately crashed. I spent a while debugging (even after looking through the issues on GitHub) before realizing the fix I had come up with was already merged into the ros2 branch, but not the foxy branch I was building. You’ll want to run from the ros2 branch or at least include PR #572 to really be able to use RVIZ at all in MacOSX.

RQT Issues

Next I tried to run rqt_console - it wouldn’t run, giving me some crazy trace about not finding RMW implementations - but I had already tested the Python demo nodes, so I knew that things were working there.

I eventually determined that I could run rqt and then load the desired plugin. I haven’t gone back and debugged this more yet.

Autocomplete Issues

When I first installed things, I got the following warning when sourcing my workspace:

fergs@MacBook-Pro foxy % source install/setup.zsh
zsh compinit: insecure directories, run compaudit for list.
Ignore insecure directories and continue [y] or abort compinit [n]?

I accepted the insecure directories a number of times, but eventually got frustrated that autocomplete seemed to not be working. Finally, I started looking into it:

fergs@MacBook-Pro foxy % compaudit
There are insecure directories:
/usr/local/share/zsh/site-functions
/usr/local/share/zsh

Apparently the fix is quite simple. From Stack Overflow:

sudo chmod -R 755 /usr/local/share/zsh

Debugging with SIP Enabled

One challenge I did come across was that you can’t just run lldb with ROS2 while System Integrity Protection is enabled. This is because the default lldb executable is located in one of those protected system folders, so SIP strips all the DYLD_LIBRARY_PATH settings from its environment. The workaround is actually pretty simple - use a different lldb, for instance:

/Applications/Xcode.app/Contents/Developer/usr/bin/lldb ~/foxy/install/rviz2/bin/rviz2

Remaining Issues

There are still a few issues to resolve:

Dark mode has some issues - but I’ve opened a PR for that.
RQT tools won’t load unless I start rqt first and then select the plugin, which really slows down my workflow. This one might actually be related to still having System Integrity Protection enabled?

5 Things ROS2 Needs in 2020

29 Jul 2020 robots ros2

I’ve been using ROS2 quite a bit over the past several months. As I’ve previously mentioned, it would appear there aren’t too many real robots running ROS2 yet. We have a bit of a chicken-and-egg problem where the tools are not yet fully ready for real robots, but until people start using ROS2 on real robots nobody knows the real pain points.

There are many, many things that could be done in ROS2. But there is limited time to implement them all - so we need to focus on those that enable robots and their developers to “survive”.

I often get asked if ROS2 is ready for prime time. My answer for a long time was “no”. I’m going to upgrade it to “maybe, depends on what you’re doing” at this point. This post describes five things that I think would make the answer “hell yes” for most roboticists. I actually hope this post ages poorly and that all these things come to happen in ROS2.

Automatic QoS for RVIZ, ros2cli

Quality of Service (QoS) is probably the biggest change between ROS1 and ROS2 - it’s also the one that causes the most headaches from what I can tell. The ROS2 Foxy release adds a --verbose option to the ros2 topic info command which is a huge step in the right direction. This lets you quickly diagnose when a publisher and subscriber are using incompatible QoS.

rosbag2 got a huge upgrade in ROS2 Foxy: it automatically determines the proper settings for Quality of Service (QoS) so that it always connects to the publisher you’re trying to record (note: if multiple publishers are publishing to the same topic with different QoS it may not work - but really, who does that?).

Now we need that feature in RVIZ2 and the command line utilities (CLI). These are debugging tools, so they need to be able to “just work” in most scenarios.

Since most of the time you’re using RVIZ2 to connect to sensor data, which is often published with a non-default QoS (the sensor data profile), it’s absolutely bonkers that RVIZ uses the default QoS on everything (which is incompatible with sensor profile). Even something as simple as latched topics won’t work by default.
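To make the incompatibility concrete: the sensor data profile is best effort, while the default profile is reliable, and a subscriber that requests reliable delivery will not match a publisher that only offers best effort. The sketch below encodes just that one rule. The function name is made up, and it deliberately ignores the other QoS policies (durability, for example, has a similar rule).

```shell
# Reliability compatibility between a publisher's offered policy ($1) and a
# subscriber's requested policy ($2). Only one pairing fails: a best-effort
# publisher with a reliable subscriber.
qos_reliability_compatible() {
  if [ "$1" = "best_effort" ] && [ "$2" = "reliable" ]; then
    return 1
  fi
  return 0
}

# A sensor driver (best effort) feeding a default, reliable subscriber:
if qos_reliability_compatible best_effort reliable; then
  echo "compatible"
else
  echo "incompatible"
fi
```

The reverse pairing works fine, which is why a debugging tool that subscribes with best effort would connect to nearly everything, at the cost of possibly dropping messages.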

This is not an easy ask. It will involve significant changes to RVIZ as well as changes to lower level packages like message_filters, but I’m pretty sure this is the single biggest bang-for-your-buck improvement that will make ROS2 work better for robot developers.

Documentation

Ok, I’m sounding like a broken record (or the squeaky caster on your 8 year old mobile manipulator), but this is really important.

I’m not just talking about the lack of tutorials here. One of the things that made ROS great for new developers in the 2011-2014 era (when it experienced huge growth in the community), was a very polished and up-to-date wiki. If you wanted to find out about a package, you could go to wiki.ros.org/package_name - and the documentation was right there (or if it wasn’t, you had a pretty good idea this package wasn’t ready for prime time). With ROS2, we don’t have a centralized place for documentation yet - and I think that is holding the community growth back.

There is also the issue of “user documentation”. Nearly everything for ROS2 is written assuming an expert programming background (even more so than ROS1 documentation). Reading the source code is not how you’re supposed to learn how to run a ROS driver for a laser scanner.

Building out a community is super important. The best way to get a bug fixed is to find a developer who needs it fixed. I’ve only been using ROS2 on-and-off for a couple months - and in that time I’ve fixed half a dozen bugs across multiple ROS2 packages, and even taken on maintaining the ROS2 port of urg_node and the related packages.

Subscriber Connect Callbacks

Now we’ll jump into a super technical issue - but the impact is huge - especially for those doing perception (which is, you know, generally a big part of robotics). When creating a publisher in ROS1, you could register a callback which would get called whenever a subscriber connected or disconnected. This feature doesn’t yet exist in ROS2, but I think it is essential for real robotics systems. Here’s why:

Robots can generate lots of sensor data - especially when you add processing pipelines into the mix. Sometimes you might need a high-resolution point cloud with color and depth information. Sometimes you need a low-res colorless point cloud. This is especially true when the robot system does multiple tasks. For instance, imagine a robot that is both mobile and a manipulator - for navigating the environment it wants that high frame rate, low-res point cloud for collision avoidance. When the mobile manipulator gets to the destination it wants to switch to a high-res point cloud to decide what to grab.

Sometimes you literally cannot be publishing all the data streams possible because it would overwhelm the hardware (for instance, saturating the USB bus if you were to pull depth and color and IR from most RGBD sensors at the same time).

In ROS1, you could create “lazy publishers” so that the creators of these intensive data types would only create and publish the data when someone was listening. They would be alerted to someone listening by the connect callback. The lack of lazy publishers throughout various drivers and the image_proc and depth_image_proc packages is a real challenge to building high performance perception systems. When people ask me “is ROS2 ready?”, my first question these days is “how much perception/vision are you doing?”.

To be clear, there are workarounds available in some cases. If you’re creating a publisher yourself, you can:

Create a loop that “polls” whether there are subscribers (using get_subscription_count), as I currently do in the openni2_camera package.
Use parameters to dynamically reconfigure what is running. While this might work in some cases (and maybe even be a preferred solution for some use cases), it likely leads to a more brittle system.
Re-architect your system to never need lazy publishers by hard coding exactly what you need for a given robot. While some of this is likely to happen in a more production environment, it doesn’t lend itself to code reuse and sharing which was one of the major selling points of ROS1.

Note that I said, “if you’re creating a publisher yourself”. There are lots of packages that are widely relied on in ROS1 whose ROS2 ports are crippled or broken due to the lack of subscriber connect callbacks:

message_filters
image_transport
image_proc
depth_image_proc

Developer Involvement

Note: in the month that I’ve been writing this post, a number of questions have been answered, so we’re already getting there!

I remember folks joking that ROS Answers was misnamed, because there were no answers there, just questions. It’s actually not true - unless you search for the ROS2 tag.

There are a lot of really good questions there. Like, stuff that’s not anywhere in the documentation and is probably quite relevant to a large number of users. Here’s a few examples:

ROS2 developers, please take note: we’ve got lots of great features in this system, please help your users learn how to actually use them - maybe they’ll even help contribute back!

Your Robot on ROS2

There’s probably a bunch of other bugs/issues/etc hiding in the weeds. Your robot is probably not exactly the same as mine - and your use cases are going to be different. We need more robots running ROS2 to dig into things. The good news is: you can install ROS1 and ROS2 on the same system and switch back and forth pretty easily.


Robot & Chisel