Navigation in ROS2

With a map having been built and localization working, it was time to get autonomous navigation working on the UBR-1.

Comparing with ROS1

While many ROS1-to-ROS2 ports basically amount to a find-and-replace of the various ROS interfaces and CMake directives, navigation underwent a fairly extensive re-architecture compared to the ROS1 package that I’ve helped maintain over the past seven years.

A number of the plugin interfaces in ROS1 have been replaced with action interfaces. While the planners themselves are still plugins, they are each loaded into a server node which exposes an action interface to access the planning functions.

One of the biggest changes in ROS2 is the use of behavior trees to structure the recovery behaviors and connect the various action-based interfaces. This enables quite a bit of interesting new functionality, such as using different recovery behaviors for controller failures than for planning failures, and offers fine-grained control over when to plan. There are already dozens of behavior tree nodes, and there is also a new tutorial on writing custom behavior tree nodes.
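
For a flavor of what these trees look like, here is a heavily simplified sketch in the behavior tree XML format. This is not the exact default tree shipped with nav2, and node names and ports can differ between releases:

<root main_tree_to_execute="MainTree">
  <BehaviorTree ID="MainTree">
    <RecoveryNode number_of_retries="6" name="NavigateRecovery">
      <PipelineSequence name="NavigateWithReplanning">
        <RateController hz="1.0">
          <ComputePathToPose goal="{goal}" path="{path}" planner_id="GridBased"/>
        </RateController>
        <!-- a controller failure here triggers only the local costmap clear -->
        <RecoveryNode number_of_retries="1" name="FollowPathWithRecovery">
          <FollowPath path="{path}" controller_id="FollowPath"/>
          <ClearEntireCostmap service_name="local_costmap/clear_entirely_local_costmap"/>
        </RecoveryNode>
      </PipelineSequence>
      <!-- recoveries used when the whole navigation subtree fails -->
      <SequenceStar name="RecoveryActions">
        <Spin spin_dist="1.57"/>
        <Wait wait_duration="5"/>
      </SequenceStar>
    </RecoveryNode>
  </BehaviorTree>
</root>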

In ROS1, the navigation stack contains two local “planners”: trajectory_rollout and dwa (the Dynamic Window Approach). ROS2 fixes this horrid naming issue and properly calls these “controllers”, but only includes the updated dwb implementation of the Dynamic Window Approach. As far as I can remember, I’ve only ever used trajectory rollout as I was never sold on DWA. I’m still not sold on DWB.

Initial Launch Files

Setting up the navigation to run followed a pretty similar pattern to setting up SLAM and localization: I copied over the example launch files from the nav2_bringup package and started modifying things. The real difference was the magnitude of things to modify.

A note of caution: it is imperative that you use the files from the proper branch. Some behavior tree modules have been added on the main branch that do not yet exist in the Foxy release, and some parameters have been renamed or added in newer releases. Some of these changes will likely get backported, but the simplest approach is to use launch and configuration files that match your release from the start.
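
For example, if you are running the Foxy debians, grab launch and configuration files from the matching sources (the branch name here is my assumption of the usual naming convention):

git clone -b foxy-devel https://github.com/ros-planning/navigation2.git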

My initial setup involved just the base laser scanner. I configured both the local and global costmaps to use the base laser. It is important to set the robot_radius for your robot (or the footprint if you aren’t circular). The full configuration can be found in the ubr1_navigation package, but here is a snippet of my local costmap configuration:

local_costmap:
  local_costmap:
    ros__parameters:
      global_frame: odom
      robot_base_frame: base_link
      rolling_window: true
      width: 4
      height: 4
      resolution: 0.05
      robot_radius: 0.2413
      plugins: ["voxel_layer", "inflation_layer"]
      inflation_layer:
        plugin: "nav2_costmap_2d::InflationLayer"
        cost_scaling_factor: 3.0
      voxel_layer:
        plugin: "nav2_costmap_2d::VoxelLayer"
        enabled: True
        publish_voxel_map: True
        origin_z: 0.0
        z_resolution: 0.05
        z_voxels: 16
        max_obstacle_height: 2.0
        mark_threshold: 0
        observation_sources: scan
        scan:
          topic: /base_scan
          max_obstacle_height: 2.0
          clearing: True
          marking: True
          data_type: "LaserScan"

A fairly late change to my configuration was to adjust the size of the local costmap. By default, the turtlebot3 configuration uses a 3x3 meter costmap, which is pretty small. Depending on your top speed and the simulation time used for the DWB controller, you will almost certainly need a larger map if your robot is faster than a turtlebot3.
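
As a rough illustration: with a sim_time of 1.7 seconds and a top speed of 1.0 m/s, candidate trajectories extend roughly 1.7 meters ahead of the robot, which already runs past the edge of a 3x3 meter rolling costmap centered on the robot (1.5 meters from center to edge).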

With this minimal configuration, I was able to get the robot rolling around autonomously!

Tilting Head Node

The UBR-1 has a depth camera in the head and, in ROS1, would tilt the camera up and down to carve out a wider field of view when there was an active navigation goal. The tilt_head.py script also pointed the head in the direction of the local plan. The first step in adding the head camera to the costmaps was porting the tilt_head.py script to ROS2.

One complication with a 3D sensor is the desire to use the floor plane for clearing the costmap, but not for marking. A common approach is to set up two observation sources. The first is the marking source and has a minimum obstacle height high enough to ignore most noise. The second is a clearing source and uses the full cloud. Since clearing sources are applied before marking sources, this works fine and won’t accidentally over-clear:

observation_sources: base_scan tilting_cloud tilting_cloud_clearing
base_scan:
  topic: /base_scan
  max_obstacle_height: 2.0
  clearing: True
  marking: True
  data_type: "LaserScan"
tilting_cloud:
  topic: /head_camera/depth_downsample/points
  min_obstacle_height: 0.2
  max_obstacle_height: 2.0
  clearing: False
  marking: True
  data_type: "PointCloud2"
tilting_cloud_clearing:
  topic: /head_camera/depth_downsample/points
  min_obstacle_height: 0.0
  max_obstacle_height: 0.5
  clearing: True
  marking: False
  data_type: "PointCloud2"

In setting this up, I had to set the minimum obstacle height quite high (0.2 meters is almost 8 inches). This is a product of the robot not being perfectly calibrated and of the sensor’s timing accuracy, which together cause the floor points to sometimes rise out of the plane. We’ll improve on that below.

You’ll notice I am using a “depth_downsample/points” topic. As inserting full VGA clouds into the costmap would be prohibitively costly, I downsample the depth image to 160x120 and then turn that into a point cloud (a common approach you’ll find on a number of ROS1 robots). This was added to my head_camera.launch.py:

# Decimate cloud to 160x120
ComposableNode(
  package='image_proc',
  plugin='image_proc::CropDecimateNode',
  name='depth_downsample',
  namespace=LaunchConfiguration('namespace'),
  parameters=[{'decimation_x': 4, 'decimation_y': 4}],
  remappings=[('in/image_raw', 'depth_registered/image_rect'),
              ('in/camera_info', 'depth/camera_info'),
              ('out/image_raw', 'depth_downsample/image_raw'),
              ('out/camera_info', 'depth_downsample/camera_info')],
),
# Downsampled XYZ point cloud (mainly for navigation)
ComposableNode(
  package='depth_image_proc',
  plugin='depth_image_proc::PointCloudXyzNode',
  name='points_downsample',
  namespace=LaunchConfiguration('namespace'),
  remappings=[('image_rect', 'depth_downsample/image_raw'),
              ('camera_info', 'depth_downsample/camera_info'),
              ('points', 'depth_downsample/points')],
),

As with several other image_proc components I’ve worked with, the CropDecimateNode needed some patches to actually function.

With this in place, things almost worked. But I was getting a bunch of errors about the sensor origin being off the map. This made no sense at first - the robot is clearly on the map - I can see it right in RVIZ! I then started reviewing the parameters:

z_resolution: 0.05
z_voxels: 16
max_obstacle_height: 2.0

At which point I realized that 0.05 * 16 = 0.8 meters. Which is shorter than my robot. So, the sensor was “off the map” - in the Z direction. Pesky 3d.

I updated the voxel configuration so that my map was indeed two meters tall and all my sensor data was now in the costmap.

z_resolution: 0.125
z_voxels: 16
max_obstacle_height: 2.0
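
(With a 0.125 meter z resolution and 16 voxels, 0.125 * 16 = 2.0 meters, which now matches the max_obstacle_height.)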

Unfortunately, even with my 0.2 meter minimum obstacle height, I was still getting stray noisy pixels that caused the robot to navigate somewhat poorly at times. In particular, it came to a complete halt during a talk and demo to the Homebrew Robotics Club last week.

A Custom Costmap Layer

Setting the minimum obstacle height that high is really not a great idea to begin with. With the Fetch Mobile Manipulator we implemented a custom costmap layer that finds the ground plane using OpenCV and then splits the cloud into clearing and marking pixels. This largely avoids the timing and calibration issues, although the marking pixels may still end up slightly misplaced in the costmap because of them. On the Fetch, we were able to get the minimum obstacle height of that moving sensor down to 0.06 meters. In addition, this layer subscribes to the depth image, rather than a 3D point cloud, which allows certain pre-processing to be done less expensively in 2D.

After the HBRC failures, I decided to port the FetchDepthLayer to ROS2. You can find it in the ubr1_navigation package. The initial port was pretty straightforward. The costmap_2d package hasn’t gotten too many updates in ROS2, other than a nav2 prefix for the package and namespaces.

One interesting find was that the sensor_msgs/PointCloud message has been deprecated and is slated for removal after Foxy. There are a number of places where the PointCloud interface was used as a simple way to publish debug points (the message is simply an array of geometry_msgs/Point32, instead of the much more complicated PointCloud2 message, which has a variable set of fields and pretty much requires the use of a modifier and iterator to fill or read). I decided to get rid of the deprecation notices and port the debugging topics to PointCloud2 - you can see how much more complicated the code now looks.
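
As a minimal sketch of that pattern, filling an XYZ PointCloud2 looks roughly like this. It assumes a std::vector<geometry_msgs::msg::Point32> named marking_points and a publisher named marking_pub; it mirrors the general approach rather than the exact ported code:

// Requires the sensor_msgs point_cloud2_iterator header.
sensor_msgs::msg::PointCloud2 cloud;
cloud.header.frame_id = "odom";
sensor_msgs::PointCloud2Modifier modifier(cloud);
modifier.setPointCloud2FieldsByString(1, "xyz");  // declare float32 x, y, z fields
modifier.resize(marking_points.size());
sensor_msgs::PointCloud2Iterator<float> iter_x(cloud, "x");
sensor_msgs::PointCloud2Iterator<float> iter_y(cloud, "y");
sensor_msgs::PointCloud2Iterator<float> iter_z(cloud, "z");
for (const auto & p : marking_points)
{
  *iter_x = p.x;
  *iter_y = p.y;
  *iter_z = p.z;
  ++iter_x;
  ++iter_y;
  ++iter_z;
}
marking_pub->publish(cloud);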

Finally, as I started to test the code on the robot, I ran into a few further issues. Apparently, ROS2 does not just have Lifecycle Nodes; there are also Lifecycle Publishers. And nav2 uses them. And you need to call on_activate on them before publishing. You can see my final fixes in this commit.

A final improvement to the node was to remove the somewhat complicated (and I’m guessing quite slow) code that found outliers. Previously, outliers were points with fewer than seven neighbors within 0.1 meters; now I simply run cv::medianBlur on the depth image.
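
The idea is roughly the following (a sketch, not the actual layer code, assuming the depth image has already been converted to an OpenCV matrix via cv_bridge as cv_ptr):

cv::Mat filtered;
// A small median filter knocks out isolated noisy pixels before the image
// is used for costmap marking.
cv::medianBlur(cv_ptr->image, filtered, 5);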

The image below shows the costmap filled in for a box that sits below the plane of the laser scanner but is detected by the camera. The red and green points are the marking and clearing debug topics from the depth layer:

Test on Robots!

One of the more interesting moments occurred after I updated my sources for navigation2. Suddenly, the robot was unable to complete goals - it would get to the goal and then rotate back and forth and eventually give up. I eventually tracked this down to a major bug that had been introduced during a refactor: the goal and the current pose were being compared in different frames. The goal would be in the map frame, but the local controller was taking the robot pose in the odom frame. The only time a goal could succeed was if the origins of the map and odom frames were aligned (which, coincidentally, probably happens a lot in simulation). My fix was pretty simple, and the bug never made it into the released Foxy debians, but it did exist for almost a month on the main branch.

Tuning the Local Controller

As a side effect of the goal bug, I ended up spending quite a bit of time tuning the local controller (thinking that it was responsible for the issues I was seeing). Both the overall architecture and the parameters involved are somewhat different from ROS1.

Let’s first mention that the controller server implements a high pass filter on the odometry topic to which it subscribes. This filter has three parameters: min_x_velocity_threshold, min_y_velocity_threshold, and min_theta_velocity_threshold. While debugging, I ended up updating the descriptions of these parameters in the navigation documentation, because I was at first trying to use them as the minimum velocities to command, since the original description was simply “Minimum velocity to use”.
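
For reference, the thresholds appear in the controller server configuration roughly like this (the values are examples along the lines of the sample nav2 configuration, not a recommendation):

controller_server:
  ros__parameters:
    # These thresholds filter small velocities out of the incoming odometry;
    # they are not minimum velocities to command.
    min_x_velocity_threshold: 0.001
    min_y_velocity_threshold: 0.5
    min_theta_velocity_threshold: 0.001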

The controller server still loads the controller as a plugin, but it also has separate plugins for the goal checker and progress checker. The SimpleProgressChecker is pretty straightforward: it has two parameters and requires that the robot move at least X distance in T time (by default, 0.5 meters in 10 seconds).

The SimpleGoalChecker implements the goal check that was previously part of the controller itself. As in ROS1, it has three parameters (a configuration sketch covering both checker plugins follows the list):

  • xy_goal_tolerance is how close the robot needs to get to the goal. By default, the xy tolerance is set quite coarse. I tightened that tolerance up on the UBR-1.
  • stateful is similar to “latching” in the ROS1 stack. Once the robot has met the xy_goal_tolerance, it will stop moving and simply rotate in place.
  • yaw_goal_tolerance is how close the robot’s heading needs to be to the goal heading to succeed.
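
Here is a sketch of how both checker plugins can be configured (parameter names as I understand the Foxy-era nav2_controller; the tolerance values are just examples):

controller_server:
  ros__parameters:
    progress_checker_plugin: "progress_checker"
    goal_checker_plugin: "goal_checker"
    progress_checker:
      plugin: "nav2_controller::SimpleProgressChecker"
      required_movement_radius: 0.5
      movement_time_allowance: 10.0
    goal_checker:
      plugin: "nav2_controller::SimpleGoalChecker"
      xy_goal_tolerance: 0.1
      yaw_goal_tolerance: 0.25
      stateful: True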

One of the enhancements of DWB over the DWA implementation is that it splits each of the individual elements of trajectory scoring into a separate plugin. This makes it easier to enable or disable individual elements of the scoring, or add custom ones. For instance, you could entirely remove the PathAlign element if it is causing issues and you don’t care if your robot actually follows the path.
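
For illustration, the critics show up in the controller configuration roughly like this (critic names come from the dwb_critics package; the scales shown are sample values, not a tuning recommendation):

controller_server:
  ros__parameters:
    FollowPath:
      plugin: "dwb_core::DWBLocalPlanner"
      critics: ["RotateToGoal", "Oscillation", "BaseObstacle", "GoalAlign", "PathAlign", "PathDist", "GoalDist"]
      PathAlign.scale: 32.0
      GoalAlign.scale: 24.0
      PathDist.scale: 32.0
      GoalDist.scale: 24.0
      RotateToGoal.scale: 32.0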

There are two major hurdles in tuning the DWB controller: balancing the path and goal scores, and balancing smooth operation versus actually getting to the end of the trajectory (as opposed to just stuttering towards the goal slowly). I think the first one is well tuned on the UBR-1, but I’ve not yet fixed the stuttering to the goal well enough to be happy with the controller. You can find that several others have also struggled to get the performance they were seeking.

Next Steps

Now that I’ve got navigation mostly working, the next big hurdle is manipulation. I have MoveIt2 compiled, but am still working through the requisite launch files and other updates to make things work for my robot. And then onto the real goal of every roboticist: having my robot fetch a beer.

Porting EtherbotiX to ROS2

Now that the UBR-1 is running pretty well under ROS2, I have started to also port Maxwell and the Robomagellan robot to ROS2. Both depend on my etherbotix drivers package, so that is the first package to port.

Etherbotix in ROS2

In ROS1, the etherbotix_python package was written in Python and leveraged some code from the arbotix_ros package (mainly for the controllers). I decided while porting things to ROS2 that I would migrate to using C++ and leverage the robot_controllers which I had recently ported to ROS2.

Since this was effectively a new package, I used the ros2 pkg create command to set up the CMakeLists.txt, package.xml, and other boilerplate.

I then started to set up the node based on my very old asio_ros example. At some point I should probably set up a test to see how accurate ROS2 timers are, but I knew for sure that this code would work, so I stuck with boost::asio.

Python Wrappers

In ROS1 I had a number of scripts for interacting with the Etherbotix. For some of these, such as read_etherbotix, it was easy to port them to C++. For others, such as my motor_trace script which uses matplotlib, I really wanted to keep the majority of the script in Python. To accomplish this, I wrapped my underlying C++ drivers using Boost Python.

It required a bit of CMake:

find_package(PythonLibs REQUIRED)
find_package(Boost REQUIRED python)

ament_python_install_package(${PROJECT_NAME})

add_library(etherbotix_py SHARED ...)
set_target_properties(etherbotix_py PROPERTIES
  LIBRARY_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}
  PREFIX "")
target_link_libraries(etherbotix_py
  ${Boost_LIBRARIES}
  ${PYTHON_LIBRARIES}
)
ament_target_dependencies(etherbotix_py ...)

install(
  TARGETS etherbotix_py
  DESTINATION "${PYTHON_INSTALL_DIR}/${PROJECT_NAME}"
)
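
The CMake above builds a shared library from a source file that defines the actual Python module. A rough sketch of what that source can look like with Boost Python follows; the header path, namespace, constructor arguments, and wrapped method are placeholders, not necessarily the real etherbotix API:

#include <string>
#include <boost/python.hpp>
#include "etherbotix/etherbotix.hpp"  // assumed header location

// The module name must match the library name built above (etherbotix_py).
BOOST_PYTHON_MODULE(etherbotix_py)
{
  boost::python::class_<etherbotix::Etherbotix>("Etherbotix",
      boost::python::init<std::string, int>())  // e.g. IP address and port
    .def("get_version", &etherbotix::Etherbotix::get_version);  // placeholder method
}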

I then created a file etherbotix/__init__.py:

from etherbotix.etherbotix_py import Etherbotix

This allowed me to import my C++ class Etherbotix in my scripts with:

from etherbotix import Etherbotix

Why Boost Python? Aren’t there newer things out there? Yes, there are newer things out there, and I spent a while trying out pybind11 but I just couldn’t get it to work and reverted to what I knew already.

Exporting Libraries

Most of the ROS2 code I’ve ported has thus far been various nodes, rather than a library that will be used by other packages (the one exception being robot_controllers_interface). I hadn’t previously paid super close attention to how things are exported. There are a few declarations that get placed at the end of your CMakeLists.txt:

# Tell downstream packages where to find our headers
ament_export_include_directories(include)
# Tell downstream packages our libraries to link against
ament_export_libraries(my_library)
# Help downstream packages to find transitive dependencies
ament_export_dependencies(
  rclcpp
)
ament_package()

This code snippet is from my ros2_cookbook project on GitHub. You can also find the full commit for enabling downstream packages to build upon the library exported by the etherbotix package.

Maxwell Bringup

So why did I have to export the Etherbotix libraries? I had to write a custom controller for the torso on Maxwell, and that controller had to access an Etherbotix instance. This involved a bit of custom code to add both a custom JointHandle and custom Controller. The controller also automatically loads the custom JointHandle.

One of the advantages of using robot_controllers is that several controllers worked out of the box. I had never actually updated the old Python code to work with Maxwell’s parallel jaw gripper, but with the ParallelGripperController and a ScaledMimicController for the second finger, everything worked out of the box.

The controllers for Maxwell are now all setup. You can see the configuration here.

Some Fancy-ish C++

Throughout the driver code we have to construct buffers of various bytes to send to the hardware - often involving arrays of bytes of varying length. This is generally really clean in Python, but in C++ usually results in something like this:

uint8_t len = 0;
buffer[len++] = 0xff;
buffer[len++] = 0xff;
buffer[len++] = Etherbotix::ETHERBOTIX_ID;
buffer[len++] = 5;  // Length of remaining packet
buffer[len++] = dynamixel::WRITE_DATA;
if (motor_idx == 1)
{
  buffer[len++] = Etherbotix::REG_MOTOR1_VEL;
}
else
{
  buffer[len++] = Etherbotix::REG_MOTOR2_VEL;
}
buffer[len++] = (desired_velocity_ & 0xff);
buffer[len++] = (desired_velocity_ >> 8);
buffer[len++] = dynamixel::compute_checksum(buffer, 9);

I decided to come up with a cleaner approach, since there are quite a few instances of this throughout the code base. I ended up creating a get_write_packet function, like I had in the Python code, with this signature:

inline uint8_t get_write_packet(
   uint8_t* buffer,
   uint8_t device_id,
   uint8_t address,
   std::vector<uint8_t> params)

And then using an initializer list to create the variable-size buffers of bytes to send:

uint8_t len = dynamixel::get_write_packet(
  buffer,
  Etherbotix::ETHERBOTIX_ID,
  (motor_idx == 1) ? Etherbotix::REG_MOTOR1_VEL : Etherbotix::REG_MOTOR2_VEL,
  {static_cast<uint8_t>(desired_velocity_ & 0xff),
   static_cast<uint8_t>(desired_velocity_ >> 8)}
);

Yes, this probably is not the fastest code (since it passes the vector of bytes by value), but I like how it cleans up the code, and none of these vectors are all that large. You can see the full commit on GitHub.

Diagnostics

The final part of my etherbotix port was to add diagnostics back into the node. diagnostic_msgs are an underused feature in ROS. They offer a common way to send information about the status of things, mostly hardware. Drivers like urg_node, the joy node, and even higher level filters like robot_localization publish diagnostics.
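
As a minimal sketch of publishing raw diagnostics from an rclcpp node (the names and values here are illustrative, not the actual etherbotix code):

auto pub = node->create_publisher<diagnostic_msgs::msg::DiagnosticArray>("diagnostics", 10);

diagnostic_msgs::msg::DiagnosticStatus status;
status.name = "etherbotix";
status.level = diagnostic_msgs::msg::DiagnosticStatus::OK;
status.message = "OK";
diagnostic_msgs::msg::KeyValue kv;
kv.key = "Packets Sent";
kv.value = "1000";
status.values.push_back(kv);

diagnostic_msgs::msg::DiagnosticArray msg;
msg.header.stamp = node->now();
msg.status.push_back(status);
pub->publish(msg);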

While diagnostics can be passed through analyzers to output a diagnostics_agg topic, I often just use rqt_runtime_monitor, which accesses the raw diagnostics topic. I found it was missing in ROS2, but there is a port which hasn’t been merged yet. You can find that port here.

Next Steps

I’ve made some major progress on running navigation2 on the UBR-1 and that will be the subject of a post next week. After that I’ll be continuing on bringup of the robomagellan robot, including integrating robot_localization in ROS2.

Mapping and Localization in ROS2

Now that the drivers are pretty much operational for the UBR-1 robot under ROS2, I’m starting to work on the higher level applications. The first step was building a map and setting up localization against that map.

SLAM

In ROS1 there were several different Simultaneous Localization and Mapping (SLAM) packages that could be used to build a map: gmapping, karto, cartographer, and slam_toolbox. In ROS2, there was an early port of cartographer, but it is really not maintained. The other package that has been ported to ROS2 is slam_toolbox, which is basically slam_karto on steroids - the core scan matcher is the same, but everything else has been rewritten and upgraded.

Installation of slam_toolbox is super easy:

sudo apt-get install ros-foxy-slam-toolbox

I then created a launch file, which is an updated version of the online_sync_launch.py found within slam_toolbox:

from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import Node
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    use_sim_time = LaunchConfiguration('use_sim_time')

    declare_use_sim_time_argument = DeclareLaunchArgument(
        'use_sim_time',
        default_value='false',
        description='Use simulation/Gazebo clock')

    start_sync_slam_toolbox_node = Node(
        parameters=[
          get_package_share_directory("ubr1_navigation") + '/config/mapper_params_online_sync.yaml',
          {'use_sim_time': use_sim_time}
        ],
        package='slam_toolbox',
        executable='sync_slam_toolbox_node',
        name='slam_toolbox',
        output='screen')

    ld = LaunchDescription()

    ld.add_action(declare_use_sim_time_argument)
    ld.add_action(start_sync_slam_toolbox_node)

    return ld

My updates were basically just to use my own config.yaml file. In that YAML file I had to update the frame IDs (I don’t use a base_footprint, and my robot publishes a base_scan topic rather than scan). There are dozens of parameters for the Karto scan matcher, and you can see the entire file on GitHub, but the basic changes I had to make were:

slam_toolbox:
  ros__parameters:

    # ROS Parameters
    odom_frame: odom
    map_frame: map
    base_frame: base_link
    scan_topic: /base_scan

Now we can run the launch file and drive the robot around to build a map. We can also view the map in RVIZ. To get the map to come through, you will likely have to expand the options under the topic name and change the durability to transient local. Even though the documentation on ROS2 QoS says that a volatile subscriber is compatible with a transient local publisher, I’ve found it doesn’t always seem to work right:

Now that we’ve built a map, it is time to save the map. The command is quite similar to ROS1, except you must pass the base name of the map (so here, I’m passing map, which means it will save map.yaml and map.pgm in the local directory):

ros2 run nav2_map_server map_saver_cli -f map

Next we can create a launch file to display the map - I used the example in nav2_bringup as my starting place and changed which package the map was stored in. You can find my launch file in the ubr1_navigation package. I started my localization launch file and opened RVIZ to find:

It turned out I had to adjust free_thresh in the map.yaml down to 0.196 (the same value used in ROS1) for the map to look correct:
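
For reference, the thresholds live alongside the other fields in the map.yaml written by the map saver. A typical file looks something like this (the image name, resolution, and origin are just example values):

image: map.pgm
resolution: 0.050000
origin: [-10.0, -10.0, 0.0]
negate: 0
occupied_thresh: 0.65
free_thresh: 0.196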

There are numerous parameters in slam_toolbox and many more features than I could possibly cover here. For a good introduction, check out the ROSCon 2019 talk by Steve Macenski.

Localization

While there are a variety of mapping options in ROS1 and some in ROS2, for localization it really is just Adaptive Monte Carlo Localization (AMCL). There is some ongoing work towards more modern localization solutions in ROS2, but it would seem to be a long way off.

The launch file we copied over for running the map_server also included AMCL in it (hence the name localization.launch.py).

For the most part, there are only a few parameters to tune in AMCL to generally get decent results:

amcl:
  ros__parameters:
    alpha1: 0.25
    alpha2: 0.2
    alpha3: 0.15
    alpha4: 0.2
    base_frame_id: "base_link"
    global_frame_id: "map"
    laser_model_type: "likelihood_field"
    max_beams: 60
    max_particles: 2000
    min_particles: 500
    odom_frame_id: "odom"
    robot_model_type: "differential"
    tf_broadcast: true
    z_hit: 0.6
    z_max: 0.05
    z_rand: 0.3
    z_short: 0.05

Before trying to tune AMCL, you really need to make sure your TF and odometry are set up correctly; there are some good pointers in the Navigation Tuning Guide, which was written for ROS1 but is generally still true in ROS2.

The most important parameters are the alphaX parameters, which model your odometry noise. By default, all of them are set to 0.2, but they should be adjusted based on the quality of your odometry:

  • alpha1 - noise in rotation from rotational motion
  • alpha2 - noise in rotation from translational motion
  • alpha3 - noise in translation from translational motion
  • alpha4 - noise in translation from rotational motion

These are somewhat intuitive to understand. For most robots, if they drive forward in a straight line, the odometry is very accurate - thus alpha3 is often the lowest value parameter. When the robot turns in place, it probably has more noise (unless you have a fantastically tuned gyro being merged with the wheel odometry) - so alpha1 often gets bumped up. My alpha1 is currently set high since I have not yet integrated the IMU on the UBR-1 into the ROS2 odometry.

When the alpha parameters are set too low, the odometry ends up driving the distribution of particles in the cloud more than the scan matcher. If your odometry is inaccurate, the robot will slowly get delocalized because the particle distribution lacks particles located at the true pose of the robot.

If the alpha parameters are set too high, the particle distribution spreads out and can induce noise in your pose estimate (and cause delocalization).

One of the best ways to test these parameters is in RVIZ. Add your laser scan to the display, and set the fixed frame of RVIZ to your map frame. Then turn the “Decay Time” of the laser way up (20-100 seconds). If your parameters are correct, the laser scans will all line up very well. If the parameters are crap, the walls raycast by the laser scanner will be very “thick” or unaligned.

To tune these parameters, I will often drop all of them lower than the default, usually something like 0.05 to 0.1 for each parameter.

A final check is to display the /particlecloud published by AMCL and make sure it isn’t diverging too much - if it is, you might have to reduce your alpha parameters. To see the particle cloud, you’ll have to switch the QoS to best effort. The image below shows what the cloud looks like when the robot is first localized; it should be a lot less spread out during normal operation: