Mapping and Localization in ROS2

Now that the drivers are pretty much operational for the UBR-1 robot under ROS2, I’m starting to work on the higher level applications. The first step was building a map and setting up localization against that map.

SLAM

In ROS1 there were several different Simultaneous Localization and Mapping (SLAM) packages that could be used to build a map: gmapping, karto, cartographer, and slam_toolbox. In ROS2, there was an early port of cartographer, but it is really not maintained. The other package that has been ported to ROS2 is slam_toolbox, which is basically slam_karto on steroids - the core scan matcher is the same, but everything else has been rewritten and upgraded.

Installation of slam_toolbox is super easy:

sudo apt-get install ros-foxy-slam-toolbox

I then created a launch file, which is an updated version of the online_sync_launch.py found within slam_toolbox:

from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import Node
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    use_sim_time = LaunchConfiguration('use_sim_time')

    declare_use_sim_time_argument = DeclareLaunchArgument(
        'use_sim_time',
        default_value='false',
        description='Use simulation/Gazebo clock')

    start_sync_slam_toolbox_node = Node(
        parameters=[
          get_package_share_directory("ubr1_navigation") + '/config/mapper_params_online_sync.yaml',
          {'use_sim_time': use_sim_time}
        ],
        package='slam_toolbox',
        executable='sync_slam_toolbox_node',
        name='slam_toolbox',
        output='screen')

    ld = LaunchDescription()

    ld.add_action(declare_use_sim_time_argument)
    ld.add_action(start_sync_slam_toolbox_node)

    return ld

My updates were basically just to use my own config.yaml file. In that YAML file I had to update the frame IDs (I don’t use a base_footprint, and my robot has a base_scan topic rather than scan). There are dozens of parameters to the Karto scan matcher and you can see the entire file on GitHub, but the basic changes I had to make were:

slam_toolbox:
  ros__parameters:

    # ROS Parameters
    odom_frame: odom
    map_frame: map
    base_frame: base_link
    scan_topic: /base_scan

Now we can run the launch file and drive the robot around to build a map. We can also view the map in RVIZ. To get the map to come through, you will likely have to expand the options under the topic name and change the durability to transient local. Even though the ROS2 QoS documentation says that a volatile subscriber is compatible with a transient local publisher, I’ve found it doesn’t always seem to work right.

Now that we’ve built a map, it is time to save the map. The command is quite similar to ROS1, except you must pass the base name of the map (so here, I’m passing map, which means it will save map.yaml and map.pgm in the local directory):

ros2 run nav2_map_server map_saver_cli -f map

Next we can create a launch file to display the map - I used the example in nav2_bringup as my starting place and changed which package the map was stored in. You can find my launch file in the ubr1_navigation package. I started my localization launch file and opened RVIZ, only to find that the map didn’t render correctly.

It turned out I had to adjust free_thresh in map.yaml down to 0.196 (the same value used in ROS1) for the map to look correct.
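To see why free_thresh matters, here is a minimal sketch of the trinary interpretation the map server applies to each pixel of the PGM (assuming negate is 0 - the function name and structure are mine, not the actual map_server code). The "unknown" gray that map_saver writes is pixel value 205, which works out to an occupancy of 50/255 ≈ 0.1961 - so any free_thresh above that silently reclassifies all of your unexplored gray space as free:

```python
def interpret_pixel(pixel, free_thresh=0.196, occupied_thresh=0.65):
    """Trinary map interpretation (a sketch of what map_server does,
    assuming negate: 0). Returns 0 (free), 100 (occupied), or -1 (unknown)."""
    occ = (255 - pixel) / 255.0   # darker pixels = more occupied
    if occ > occupied_thresh:
        return 100
    if occ < free_thresh:
        return 0
    return -1

# Unknown gray: occ = (255 - 205) / 255 ≈ 0.1961.
# With free_thresh = 0.196 it stays unknown; with a higher threshold
# the same pixel is suddenly rendered as free space.
```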

There are numerous parameters in slam_toolbox and many more features than I could possibly cover here. For a good introduction, check out Steve Macenski’s ROSCon 2019 talk.

Localization

While there are a variety of mapping options in ROS1 and some in ROS2, for localization it really is just Adaptive Monte Carlo Localization (AMCL). There is some ongoing work towards more modern localization solutions in ROS2, but it would seem to be a long way off.

The launch file we copied over for running the map_server also included AMCL in it (hence the name localization.launch.py).

For the most part, there are only a few parameters to tune in AMCL to generally get decent results:

amcl:
  ros__parameters:
    alpha1: 0.25
    alpha2: 0.2
    alpha3: 0.15
    alpha4: 0.2
    base_frame_id: "base_link"
    global_frame_id: "map"
    laser_model_type: "likelihood_field"
    max_beams: 60
    max_particles: 2000
    min_particles: 500
    odom_frame_id: "odom"
    robot_model_type: "differential"
    tf_broadcast: true
    z_hit: 0.6
    z_max: 0.05
    z_rand: 0.3
    z_short: 0.05

Before trying to tune AMCL, you really need to make sure your TF and odometry are set up correctly; there are some pointers in the Navigation Tuning Guide, which was written for ROS1 but is still largely applicable in ROS2.

The most important step is setting the alphaX parameters to model your odometry noise. By default all four are set to 0.2, but they should be adjusted based on the quality of your odometry:

  • alpha1 - noise in rotation from rotational motion
  • alpha2 - noise in rotation from translational motion
  • alpha3 - noise in translation from translational motion
  • alpha4 - noise in translation from rotational motion

These are somewhat intuitive to understand. For most robots, if they drive forward in a straight line, the odometry is very accurate - thus alpha3 is often the lowest value parameter. When the robot turns in place, it probably has more noise (unless you have a fantastically tuned gyro being merged with the wheel odometry) - so alpha1 often gets bumped up. My alpha1 is currently set high since I have not yet integrated the IMU on the UBR-1 into the ROS2 odometry.
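To make those roles concrete, here is a sketch of how the odometry motion model (the model described in Probabilistic Robotics, which AMCL’s differential model follows) turns the alphas into noise. The motion between updates is decomposed into an initial rotation, a translation, and a final rotation, and each component gets noise whose variance is a weighted sum of the squared motions - the helper below just computes those standard deviations for intuition, it is not AMCL’s actual code:

```python
import math

def odometry_noise_stddevs(rot1, trans, rot2,
                           alpha1=0.2, alpha2=0.2, alpha3=0.2, alpha4=0.2):
    """Standard deviation of the noise applied to each component of a
    motion decomposed as (initial rotation, translation, final rotation)."""
    sd_rot1 = math.sqrt(alpha1 * rot1 ** 2 + alpha2 * trans ** 2)
    sd_trans = math.sqrt(alpha3 * trans ** 2 + alpha4 * (rot1 ** 2 + rot2 ** 2))
    sd_rot2 = math.sqrt(alpha1 * rot2 ** 2 + alpha2 * trans ** 2)
    return sd_rot1, sd_trans, sd_rot2

# Driving 1m straight (rot1 = rot2 = 0): only alpha2 and alpha3 contribute,
# which is why alpha3 is usually the parameter you can set lowest on a
# well-calibrated base, while turning in place exercises alpha1 and alpha4.
```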

When the alpha parameters are set too low, the odometry ends up driving the distribution of particles in the cloud more than the scan matcher. If your odometry is inaccurate, the robot will slowly get delocalized because the particle distribution lacks particles located at the true pose of the robot.

If the alpha parameters are set too high, the particle distribution spreads out and can induce noise in your pose estimate (and cause delocalization).

One of the best ways to test these parameters is in RVIZ. Add your laser scan to the display, and set the fixed frame of RVIZ to your map frame. Then turn the “Decay Time” of the laser way up (20-100 seconds). If your parameters are correct, the laser scans will all line up very well. If the parameters are crap, the walls raycast by the laser scanner will be very “thick” or unaligned.

To tune these parameters, I will often drop all of them lower than the default, usually something like 0.05 to 0.1 for each parameter.

A final check is to display the /particlecloud published by AMCL and make sure it isn’t diverging too much - if it is, you might have to reduce your alpha parameters. To see the particle cloud, you’ll have to switch the QoS to best effort. The cloud is widest when the robot is first localized; it should be a lot less spread out during normal operation.
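If you want a number rather than an eyeball check, you can compute a rough spread of the particle positions. A minimal sketch - the helper name and the idea of feeding it (x, y) pairs pulled out of the PoseArray on /particlecloud are mine, not part of AMCL:

```python
import math

def cloud_spread(points):
    """RMS distance of particles from their mean position, in meters.
    `points` is a list of (x, y) tuples, e.g. extracted from the poses
    in the PoseArray published on /particlecloud."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n
    return math.sqrt(var)
```

A tightly localized robot should report a spread of a few centimeters; a freshly initialized (or delocalized) one will be much larger.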

Some Notes on Cyclone DDS

One of the biggest differences between ROS1 and ROS2 is the replacement of the single built-in middleware with a plugin-based architecture. This allows ROS2 to use various ROS middleware (RMW) implementations. All of these RMW implementations are currently based on DDS. You can read all about the details in the ROS2 Design Docs.

Over time, the supported RMW implementations have shifted and new ones have been introduced. The default is currently FastRTPS (which apparently has been renamed to FastDDS, but after the Foxy release). The newest option is CycloneDDS which uses Eclipse Cyclone DDS. Cyclone DDS has gotten a lot of praise lately, so let’s take a closer look.

RMW Implementations

Choosing between RMW implementations is still a bit of a challenge since ROS2 is very much under active development. There are multiple tickets about FastDDS service discovery issues. CycloneDDS is less than two years old, which means it is also still under very active development and might not be fully featured, but it is supposed to be highly performant. Mixing multiple implementations at runtime has noted caveats.

Luckily, it’s very easy to switch between implementations by simply setting the RMW_IMPLEMENTATION environment variable (assuming the selected implementation is built/installed).

When switching between implementations, be sure to stop the ros2 daemon so that it gets restarted with the proper RMW implementation:

ros2 daemon stop

First you’ve heard of the ROS2 daemon? Check out this ROS Answers post which contains the best description I’ve seen.

Debugging Issues

While FastDDS was mostly working out of the box, the whole service problem was wreaking havoc on setting/getting parameters – and I’ve been tuning parameters frequently. I went ahead and set the RMW_IMPLEMENTATION to rmw_cyclonedds_cpp, or so I thought.

I noticed that service discovery wasn’t much better. Then I noticed on the robot I had set RMW_IMPLEMTATION - so I fixed the spelling mistake. Now everything should totally work great!

Wrong.

On the robot, discovery worked fine and services worked great - but half or more of the nodes couldn’t be seen by my laptop. Restarting launch files resulted in different nodes often missing!

I started to debug and came across the ddsperf tool. If you’re using ROS2 on MacOSX you’ll want to check out this issue on how to install ddsperf.

Multiple Network Interfaces

Running ddsperf sanity gave an interesting warning on the robot:

ddsperf: using network interface enp3s0 (udp/10.42.0.1) selected arbitrarily from: enp3s0, wlp2s0

The UBR-1 has two network interfaces: wlp2s0 is a wifi connection to the outside world and enp3s0 is an internal ethernet port which only talks to the robot hardware. Apparently, my nodes were frequently using the wrong network interface. The upstream Cyclone DDS README does mention, way down the page, that “proper use of multiple network interfaces simultaneously will come, but is not there yet.”

The configuration guide states that the selection of network adapter prefers non-link-local interfaces, but apparently something is tripping it up in detecting that the ethernet interface is configured that way.

The workaround is to set a NetworkInterfaceAddress in the CYCLONEDDS_URI environment variable:

export CYCLONEDDS_URI='<CycloneDDS><Domain><General><NetworkInterfaceAddress>wlp2s0</NetworkInterfaceAddress></General></Domain></CycloneDDS>'
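The same setting can also live in an XML file, which gets easier to maintain once you start adding more options - CYCLONEDDS_URI accepts a file:// reference as well. Something like this (a sketch following the Cyclone DDS configuration schema; the file path is arbitrary):

```xml
<!-- ~/cyclonedds.xml; point at it with:
     export CYCLONEDDS_URI=file://$HOME/cyclonedds.xml -->
<CycloneDDS>
  <Domain>
    <General>
      <!-- Pin Cyclone DDS to the wifi interface so it stops
           arbitrarily picking the internal ethernet port -->
      <NetworkInterfaceAddress>wlp2s0</NetworkInterfaceAddress>
    </General>
  </Domain>
</CycloneDDS>
```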

If you’re prone to typos, and want to make sure you’re actually running the expected RMW implementation, I’d recommend this command:

ros2 doctor --report | grep middleware

After a few seconds, you should see:

middleware name    : rmw_cyclonedds_cpp

I actually setup an alias in my bashrc so that which_rmw runs that command. Once I settled on using Cyclone DDS as my new default, I also added the RMW_IMPLEMENTATION and CYCLONEDDS_URI settings to the bashrc on the robot.

Final Thoughts

Once I worked through the configuration issue, CycloneDDS appears to be the most stable of the few RMW implementations I’ve tried. I haven’t actually tested the performance head-to-head, but others have.

I would recommend looking at the Configuration section of the upstream Eclipse CycloneDDS project. This contains a bunch of useful information about what you can specify in the CYCLONEDDS_URI. The Guide to Configuring is also very worth reading. It’s honestly a great resource for simply understanding all those things you hoped you’d never need to learn about DDS.

ROS2 on MacOSX Catalina

Back in 2014 or so, I had ROS1 running on my Mac. It took me a couple days to install and build dependencies. It was quite unstable. This weekend I got a new MacBook Pro (to replace my 2016 MacBook Pro, you know, the one with that great keyboard). I decided to also try setting up ROS2 on it, mainly for native RVIZ. It turned out to be somewhat straight-forward.

As a note, I really didn’t want to do too much mangling of my very nice and very new Macbook Pro - so I actually haven’t disabled System Integrity Protection. So far everything is working (with some caveats on workflow noted below).

First off, newer Macbooks are running Catalina (OSX 10.15) - which is not a supported release. ROS2 (even the newest Foxy release) still targets OSX 10.14 Mojave. This means we absolutely have to build from source for Catalina. I started by following the from-source installation instructions. I’d suggest going through the dependency installations listed there and then applying the patches in the next several sections of this post BEFORE actually running the colcon commands to build anything.

Installing XCode

The ROS2 instructions work for installing the XCode command line utilities, but it seems that I also needed to install XCode from the App store AND start the XCode GUI in order to finish the installation.

Some Paths

I had to add the following to my ~/.zshrc to get the various visual tools to compile:

export Qt5_DIR=/usr/local/opt/qt5/lib/cmake
export PATH=/usr/local/opt/qt5/bin:$PATH

The end of /usr/include

One of the bigger changes in MacOSX Catalina is the removal of /usr/include. Apparently the files have mostly moved to /Library/Developer. As far as I could tell, this really only affects the OGRE build - which needs access to various system header files. The fix is to set CMAKE_OSX_SYSROOT:

colcon build --symlink-install --cmake-args -DCMAKE_OSX_SYSROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk

I still had some build issues due to missing CoreFoundation, which were fixed by this hack:

sudo cp /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/CoreFoundation.framework/Versions/A/Headers/* /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/CoreFoundation

RVIZ Crashes

I was super excited to see a laser scan show up on my Macbook! I then decided to disable the laser scan and check out some other data - but RVIZ immediately crashed. I spent a while debugging (even after looking through the issues on GitHub) before realizing the fix I had come up with was already merged into the ros2 branch, but not the foxy branch I was building. You’ll want to run from the ros2 branch or at least include PR #572 to really be able to use RVIZ at all in MacOSX.

RQT Issues

Next I tried to run rqt_console - it wouldn’t run, giving me some crazy trace about not being able to find an RMW implementation - but I had already tested the Python demo nodes, so I knew that things were working there.

I eventually determined that I could run rqt and then load the desired plugin. I haven’t gone back and debugged this more yet.

Autocomplete Issues

When I first installed things, I got the following warning when sourcing my workspace:

fergs@MacBook-Pro foxy % source install/setup.zsh
zsh compinit: insecure directories, run compaudit for list.
Ignore insecure directories and continue [y] or abort compinit [n]?

I accepted the insecure directories a number of times, but eventually got frustrated that autocomplete seemed to not be working. Finally, I started looking into it:

fergs@MacBook-Pro foxy % compaudit
There are insecure directories:
/usr/local/share/zsh/site-functions
/usr/local/share/zsh

Apparently the fix is quite simple. From stack overflow:

sudo chmod -R 755 /usr/local/share/zsh

Debugging with SIP Enabled

One challenge I did come across was that you can’t just run lldb with ROS2 while System Integrity Protection is enabled. The default lldb executable lives in one of those protected system folders, so the OS strips off all the DYLD_LIBRARY_PATH stuff before launch. The workaround is actually pretty simple - use a different lldb, for instance:

/Applications/Xcode.app/Contents/Developer/usr/bin/lldb ~/foxy/install/rviz2/bin/rviz2

Remaining Issues

There are still a few issues to resolve:

  • Dark mode has some issues - but I’ve opened a PR for that.
  • Having to start rqt and manually load a plugin (rather than launching tools like rqt_console directly) really slows down the workflow. This one might actually be related to still having System Integrity Protection enabled?