5 Things ROS2 Needs in 2020

I’ve been using ROS2 quite a bit over the past several months. As I’ve previously mentioned, it would appear there aren’t too many real robots running ROS2 yet. We have a bit of a chicken-and-egg problem where the tools are not yet fully ready for real robots, but until people start using ROS2 on real robots nobody knows the real pain points.

There are many, many things that could be done in ROS2. But there is limited time to implement them all - so we need to focus on those that enable robots and their developers to “survive”.

I often get asked if ROS2 is ready for prime time. My answer for a long time was “no”. I’m going to upgrade it to “maybe, depends on what you’re doing” at this point. This post describes five things that I think would make the answer “hell yes” for most roboticists. I actually hope this post ages poorly and that all these things come to happen in ROS2.

Automatic QoS for RVIZ, ros2cli

Quality of Service (QoS) is probably the biggest change between ROS1 and ROS2 - it’s also the one that causes the most headaches from what I can tell. The ROS2 Foxy release adds a --verbose option to the ros2 topic info command which is a huge step in the right direction. This lets you quickly diagnose when a publisher and subscriber are using incompatible QoS.

rosbag2 got a huge upgrade in ROS2 Foxy: it automatically determines the proper settings for Quality of Service (QoS) so that it always connects to the publisher you’re trying to record (note: if multiple publishers are publishing to the same topic with different QoS it may not work - but really, who does that?).

Now we need that feature in RVIZ2 and the command line utilities (CLI). These are debugging tools, so they need to be able to “just work” in most scenarios.

Since most of the time you’re using RVIZ2 to connect to sensor data, which is often published with a non-default QoS (the sensor data profile), it’s absolutely bonkers that RVIZ uses the default QoS on everything (which is incompatible with the sensor data profile). Even something as simple as latched topics won’t work by default.

This is not an easy ask. It will involve significant changes to RVIZ as well as changes to lower level packages like message_filters, but I’m pretty sure this is the single biggest bang-for-your-buck improvement that will make ROS2 work better for robot developers.

Documentation

Ok, I’m sounding like a broken record (or the squeaky caster on your 8-year-old mobile manipulator), but this is really important.

I’m not just talking about the lack of tutorials here. One of the things that made ROS great for new developers in the 2011-2014 era (when it experienced huge growth in the community), was a very polished and up-to-date wiki. If you wanted to find out about a package, you could go to wiki.ros.org/package_name - and the documentation was right there (or if it wasn’t, you had a pretty good idea this package wasn’t ready for prime time). With ROS2, we don’t have a centralized place for documentation yet - and I think that is holding the community growth back.

There is also the issue of “user documentation”. Nearly everything for ROS2 is written assuming an expert programming background (even more so than ROS1 documentation). Reading the source code is not how you’re supposed to learn how to run a ROS driver for a laser scanner.

Building out a community is super important. The best way to get a bug fixed is to find a developer who needs it fixed. I’ve only been using ROS2 on-and-off for a couple months - and in that time I’ve fixed half a dozen bugs across multiple ROS2 packages, and even taken on maintaining the ROS2 port of urg_node and the related packages.

Subscriber Connect Callbacks

Now we’ll jump into a super technical issue - but the impact is huge - especially for those doing perception (which is, you know, generally a big part of robotics). When creating a publisher in ROS1, you could register a callback which would get called whenever a subscriber connected or disconnected. This feature doesn’t yet exist in ROS2, but I think it is essential for real robotics systems. Here’s why:

Robots can generate lots of sensor data - especially when you add processing pipelines into the mix. Sometimes you might need a high-resolution point cloud with color and depth information. Sometimes you need a low-res colorless point cloud. This is especially true when the robot system does multiple tasks. For instance, imagine a robot that is both mobile and a manipulator - for navigating the environment it wants that high frame rate, low-res point cloud for collision avoidance. When the mobile manipulator gets to the destination it wants to switch to a high-res point cloud to decide what to grab.

Sometimes you literally cannot be publishing all the data streams possible because it would overwhelm the hardware (for instance, saturating the USB bus if you were to pull depth and color and IR from most RGBD sensors at the same time).

In ROS1, you could create “lazy publishers” so that the creators of these intensive data types would only create and publish the data when someone was listening. They would be alerted to someone listening by the connect callback. The lack of lazy publishers throughout various drivers and the image_proc and depth_image_proc packages is a real challenge to building high performance perception systems. When people ask me “is ROS2 ready?”, my first question these days is “how much perception/vision are you doing?”.

To be clear, there are workarounds available in some cases. If you’re creating a publisher yourself, you can:

  • Create a loop that “polls” whether there are subscribers (using get_subscription_count) as I recently did in the openni2_camera package.
  • Use parameters to dynamically reconfigure what is running. While this might work in some cases (and maybe even be a preferred solution for some use cases), it likely leads to a more brittle system.
  • Re-architect your system to never need lazy publishers by hard coding exactly what you need for a given robot. While some of this is likely to happen in a more production environment, it doesn’t lend itself to the code reuse and sharing which was one of the major selling points of ROS1.

Note that I said, “if you’re creating a publisher yourself”. There are lots of packages that are widely relied on in ROS1 whose ROS2 ports are crippled or broken due to the lack of subscriber connect callbacks:

  • message_filters
  • image_transport
  • image_proc
  • depth_image_proc

Developer Involvement

Note: in the month that I’ve been writing this post, a number of questions have been answered, so we’re already getting there!

I remember folks joking that ROS Answers was misnamed, because there were no answers there, just questions. It’s actually not true - unless you search for the ROS2 tag.

There are a lot of really good questions there. Like, stuff that’s not anywhere in the documentation and is probably quite relevant to a large number of users.

ROS2 developers, please take note: we’ve got lots of great features in this system, please help your users learn how to actually use them - maybe they’ll even help contribute back!

Your Robot on ROS2

There’s probably a bunch of other bugs/issues/etc hiding in the weeds. Your robot is probably not exactly the same as mine - and your use cases are going to be different. We need more robots running ROS2 to dig into things. The good news is: you can install ROS1 and ROS2 on the same system and switch back and forth pretty easily.

UBR-1 on ROS2 (Part 3)

In my last post I finished porting the robot_controllers package to ROS2. The last piece of hardware to support the UBR-1 in ROS2 was the head camera, and it turned out to be the most challenging part.

Note: I’ve created a new ros2 tag from which you can find all my ROS2 posts.

Porting openni2_camera

The head camera in the UBR-1 is a vintage 2013-era Primesense Carmine. Getting it to run in ROS2 meant porting the openni2_camera package.

In ROS1 the camera driver was implemented as a nodelet for efficiency when pairing it with components from image_pipeline. Nodelets have been replaced by components in ROS2. The setup for a ROS2 component is actually quite simple, and this is a part of ROS2 that is very well documented.

An aside: most components are derived from rclcpp::Node. A number of calls in the rclcpp API require that you pass an rclcpp::Node::SharedPtr. No problem, Node derives from std::enable_shared_from_this, so you can call shared_from_this() to get the pointer. BUT - beware, this only works if the instance was created as a shared pointer (not a problem when using the rclcpp ComponentManager to load your component), AND only after the object is fully constructed. I spent a good 20-30 minutes trying to figure out why my camera driver kept throwing a bad_weak_ptr - even reading through the code for ComponentManager - before realizing it was happening in a function being called from the constructor!

The next challenge in porting openni2_camera is that ROS2 doesn’t yet support subscriber connect callbacks. These are frequently used in ROS1 to implement lazy publishers. Basically, when creating a publisher you would register a callback which gets called whenever someone subscribes or unsubscribes from your topic. This way you can decide whether to actually do heavy work - or in the case of a driver, possibly start or stop data transmission from the hardware.

In the case of openni2_camera, lazy publishing is actually extremely important: there are multiple camera streams, but you cannot actually stream them all at once, so you NEED lazy publishers. I ended up working around this by creating a timer which calls what would be the subscriber connect callback every second. It’s a bit of overhead, but way less than publishing lots of unneeded data. Along the way, I found out that the count_subscribers function in rclcpp doesn’t respect remapping, so the only way to properly get the subscriber count is to call get_subscription_count on the publisher. Since openni2_camera uses image_transport, this also meant tracking down that getNumSubscribers on the camera publisher class wasn’t implemented yet (I think the port predates get_subscription_count being added). If you want to try out the openni2_camera package in ROS2, for now you’ll need to use this PR that I created (it hasn’t been merged yet).

Loading Components

The openni2_camera driver only outputs the raw RGB camera and raw depth camera streams. To get things like sensor_msgs/PointCloud2, we need to construct pipelines using the components of depth_image_proc. In ROS1, these were implemented as nodelets. At some point they were ported to ROS2 as components, but not quite correctly (or maybe they were correct when first ported and the API changed underneath them). A pair of quick PRs got them working again as components, and got the parameters working.

The next step was to setup a launch file that loaded the camera driver, some components to rectify the RGB and Depth images, and a component to create a point cloud. In ROS1, this existed as the packages rgbd_launch and openni2_launch. Due to the lack of lazy publishers and subscribers, you really can’t build the same lazy graph that those packages defined right now. So I built a minimal graph of sensor processing that I needed in a single launch file.

With this prepared, I could run the launch file and ros2 topic hz /head_camera/depth_registered/points showed it was publishing!

ROS2 Parameters

Let’s back up and talk about that depth_image_proc parameter PR for a moment. Starting with ROS2 Dashing, parameters needed to be declared. If you try to access an undeclared parameter, you get an error, unless you use get_parameter_or - which won’t throw an error. Unfortunately, that means code that used get_parameter_or and was written pre-Dashing might just not actually load parameters correctly.

One of the big benefits of declaring all parameters is that you can do:

ros2 param list

And get all the parameters. This is actually huge - as a maintainer of navigation in ROS1, I have had more than a handful of occasions when I learned of an undocumented but widely used parameter that I didn’t even know existed. ROS2 might lack documentation for things - but the introspection is really nice and can be a workaround for missing documentation.

RVIZ Woes

The next step is clearly to look at the point cloud in RVIZ2. Should be super easy, right?

Not so fast. In ROS1, any subscriber could connect to any publisher of the same topic and message type. ROS2 introduces Quality of Service settings, and some publisher/subscriber combinations are incompatible. RVIZ connected to the topic, but didn’t show any data.

In ROS2 Foxy, you can run ros2 topic info topic_name --verbose and it will print the QoS information for each publisher and subscriber. This is super helpful - but unfortunately I didn’t find it until a few days after I had debugged this. As it turns out, depth_image_proc components publish “best effort” but RVIZ subscribes with “reliable”, and this is incompatible. In Foxy, rosbag is apparently able to deduce the publisher QoS and connect correctly; I’m planning to look into how that was done and see if I can make RVIZ work a bit better (or at least add a drop down that lets us select the transport). It appears this will also require changes in the message_filters package, so it might take some time.

While we’re talking about RVIZ2 challenges, I’m also going to note this PR, which works around the fact that RVIZ2 is apparently entirely single-threaded, with everything running in the same update loop. Keep that in mind if things aren’t visualizing right - it may not be your code.

Finally, I got the QoS correct (by hacking depth_image_proc, since it compiles faster) and had my point cloud:

Just kidding, I still had to find out that my timestamps were wrong and message_filters was filtering out all my messages. Once I fixed that in the openni2_camera package, I got the view in RVIZ2.

Next Steps

For anyone wanting to try out the openni2_camera package in ROS2, there is more detailed information in the ticket on the openni2_camera package.

I’m continuing to clean up issues in the packages that I’m porting, but my next big step is getting navigation setup on the UBR-1 in ROS2. That post will probably come a little slower than these last few have. I’m also working on porting the robomagellan robot to ROS2.

UBR-1 on ROS2 (Part 2)

In my last post I ported robot_controllers_msgs to ROS2. I’ve now ported the actual controllers and made a number of changes to the Controller and ControllerManager classes. I can now move all the UBR-1 joints, which makes it almost as happy as this:

Image from Melonee Wise

Porting Gravity Compensation

Porting the gravity compensation controller was quite easy - it has very little in the way of a ROS interface. The full commit is on GitHub.

As I mentioned in my last post, the way you declare plugins has changed a bit. In ROS1, you would have an export directive in the package.xml that noted the location of the .xml file describing your plugins. In ROS2, this has moved to the CMake:

pluginlib_export_plugin_description_file(robot_controllers_interface robot_controllers.xml)

The one interesting quirk I did come across was that my driver node uses the “robot_description” parameter, and so do most controllers. You can’t declare the same parameter twice, so I ended up with this in the controllers:

if (!node->has_parameter("robot_description"))
{
  node->declare_parameter<std::string>("robot_description", "");
}

I will also note one piece of code I use quite frequently, but haven’t really noticed used in the tutorials or demos. When declaring a parameter, you can get the value right then (very similar to how parameters in ROS1 were often used):

std::string root = node->declare_parameter<std::string>("root", "torso_lift_link");

Action Servers

Many controllers make extensive use of ROS actions. While Actions exist in ROS2, there isn’t a direct replacement for the SimpleActionServer and the API has changed quite a bit.

There are several tutorials on ROS2 actions, but they are limited to Python. For C++, you have to look at ros2/examples.

In ROS1 each of the action-based controllers in robot_controllers had an execute thread for the action server, and the main thread would call update() periodically. With the new APIs things are a bit different. The examples show spinning up a new thread for the execute callback, but I ended up refactoring things to use the existing update() function to do more of the work, and spun up an rclcpp timer to publish feedback outside of update().

We’ll walk through the FollowJointTrajectoryController as an example. In the handle_goal callback I do some minimal checking of the trajectory goal, but usually accept the goal:

if (goal_handle->trajectory.joint_names.size() != joints_.size())
{
  RCLCPP_ERROR(rclcpp::get_logger(getName()),
               "Trajectory goal size does not match controlled joints size.");
  return rclcpp_action::GoalResponse::REJECT;
}

return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE;

The handle_accepted callback then sets us up to use the goal. Once a goal is fully accepted at the end of the callback, I save a shared_ptr to the goal handle, which is needed to eventually call succeed or abort. The first step in the callback is therefore to check if there is an active_goal_, and preempt it:

bool preempted = false;
if (active_goal_)
{
  result->error_code = -6;
  result->error_string = "preempted";
  active_goal_->abort(result);
  active_goal_.reset();
  preempted = true;
}

There is not currently a “preempted” state like in ROS1 (issue), so we just call abort. The preempted flag here gets used later on to splice the new trajectory into the old one and use the current velocity of the joints.

Then we can grab the actual goal message, and process it just like we would in a ROS1 action callback:

const auto goal = goal_handle->get_goal();

// process goal->trajectory so
// update() can use it
sampler_ = ...

// at the end of the callback
active_goal_ = goal_handle;

Then the update() function, which is called at some fixed rate, processes the goal and eventually marks it as succeeded or aborted:

// Is trajectory active?
if (active_goal_ && sampler_)
{
  // sample and command joints

  if (/* we've violated the tolerances */)
  {
    active_goal_->abort(result);
    active_goal_.reset();
  }
  else if (/* we've completed the trajectory */)
  {
    active_goal_->succeed(result);
    active_goal_.reset();
  }
}

Asynchronous Services

In ROS1, services were based on XML-RPC. Calling a service was synchronous - if the service provider died while processing your call, your client would hang. While there are workarounds (such as creating a thread for your service client), APIs should be easy to use correctly and hard to use incorrectly. For this reason, I often used actions in ROS1 where a service would have otherwise sufficed.

With ROS2, services can be called asynchronously. The ControllerManager exposes an API to start, stop, and load controllers. For ROS2, I converted this to a service. You can see the (commit) which does this conversion, as well as porting all of the Python scripts that interact with the service. This greatly simplifies the code - converting three callbacks into a single one with a significantly cleaner function signature.

ROS2 services are well documented with tutorials for both C++ and Python so I won’t go into any more detail here.

Using TF2

The TF library was one of the original hits in ROS1, making it easy to track where all your robot bits are located and to transform data between different coordinate frames. ROS2 includes the TF2 library. The documentation on how to use it is currently quite sparse; at some point I’m planning to add to the tutorial, but in the meantime, here’s a quick highlight (in C++):

As with later ROS1 releases, you’ll want to depend on the tf2_ros package, and likely you want to depend on the tf2_geometry_msgs and tf2_sensor_msgs packages if you are converting those data types. Similarly, you still need a Buffer and a TransformListener:

std::shared_ptr<tf2_ros::Buffer> tf_buffer;
std::shared_ptr<tf2_ros::TransformListener> tf_listener;

The big change from ROS1 is the buffer now takes a clock in the constructor, so you need access to an rclcpp::Node instance:

tf_buffer.reset(new tf2_ros::Buffer(node.get_clock()));
tf_listener.reset(new tf2_ros::TransformListener(*tf_buffer));

From that point, it is pretty much just like ROS1 to transform things (see a real example in robot_controllers/src/point_head.cpp):

#include <tf2_geometry_msgs/tf2_geometry_msgs.h>

geometry_msgs::msg::PointStamped in;   // filled in elsewhere
geometry_msgs::msg::PointStamped out;

try
{
  tf_buffer->transform(in, out, "target_frame");
}
catch (const tf2::TransformException& ex)
{
  RCLCPP_ERROR(node.get_logger(), "Could not transform point.");
}

The broadcaster works pretty much identically to ROS1, other than the message renaming and the constructor taking an rclcpp::Node::SharedPtr:

std::unique_ptr<tf2_ros::TransformBroadcaster> broadcaster;

broadcaster = std::make_unique<tf2_ros::TransformBroadcaster>(node);

geometry_msgs::msg::TransformStamped transform;
transform.header.stamp = node->now();
transform.header.frame_id = "odom";
transform.child_frame_id = "base_link";
// Fill in transform.transform.translation
// Fill in transform.transform.rotation

broadcaster->sendTransform(transform);

You can find the full/real code in robot_controllers/src/diff_drive_base.cpp.

Next Steps

I’m currently wrapping up making the head camera work. That will be the subject of my next post in a few days.