The EarthRover Challenge

Humans vs. AI in a real-world urban navigation challenge. Registrations are now open for ICRA '25.

A Real World Robotic Navigation Competition

Several of the EarthRover units used in the EarthRover Challenge 2024.

Clockwise from top left: front camera view of an EarthRover unit in Francistown, Botswana; front camera view of an EarthRover unit in Port Louis, Mauritius; an EarthRover unit in Manila, Philippines; an EarthRover unit in Shenzhen, China; and an EarthRover unit in Wellington, New Zealand.

With the rise of generalist foundation models for robot navigation, new frontiers involving challenging open-world and open-vocabulary mobility scenarios are now of immense interest to the robotics community. Meanwhile, testing such models in real-world, "outside the lab" settings has remained outside the operational capabilities of most research labs.

Our competition aims to explore the possibility of globally distributed in-the-wild navigation evaluations at an unprecedented scale, along with the release of a substantial real-world navigation dataset collected from 10+ cities (>2k hours).

Leveraging a large fleet of outdoor navigation robots deployed across multiple cities, we aim to study whether state-of-the-art autonomous navigation models can operate effectively in truly open-world settings, and how they fare against human tele-operated performance in the same environments.

Rules & Format

The EarthRover Challenge is a round-robin format competition, where AI teams and human gamers take turns to complete navigation missions in various seen and unseen environments.

Competition Rules


The Earth Rover Challenge is a distributed competition across remote environment scenarios spanning multiple cities (e.g., Abu Dhabi, Singapore, Taipei, Stockholm, etc.), where competition participants need to deploy their policies into realistic GPS goal-oriented navigation scenarios. This competition will test the robustness, generalization, and safety of navigation capabilities of robot foundation models.

Navigation Missions

The goal of the competition is to remotely control small sidewalk robots in order to complete various navigation missions in outdoor urban environments across various cities in real time.

Each mission consists of a series of checkpoints that the robot needs to navigate through. A checkpoint is specified using GPS coordinates.

Every mission is awarded points from 1 to 10 based on its difficulty (e.g., whether it requires crossing roads or navigating crowded spaces), and is considered "mission complete" when the robot reaches the end point of the given mission after registering at each checkpoint.
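
As a rough illustration only (a minimal sketch, not part of the official rules or SDK), a mission can be thought of as an ordered list of GPS checkpoints plus a difficulty score, with completion defined as having registered every checkpoint through to the end point:

    from dataclasses import dataclass

    @dataclass
    class Checkpoint:
        lat: float  # latitude in degrees
        lon: float  # longitude in degrees

    @dataclass
    class Mission:
        checkpoints: list[Checkpoint]  # ordered checkpoints; the last one is the end point
        difficulty: int                # 1 (easy) to 10 (hard); also the points on offer
        registered: int = 0            # number of checkpoints registered so far

        def register_checkpoint(self) -> None:
            # Called each time the robot comes within the GPS tolerance of the next checkpoint.
            self.registered += 1

        def is_complete(self) -> bool:
            # "Mission complete" once every checkpoint, including the end point, is registered.
            return self.registered >= len(self.checkpoints)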

Example of a navigation mission in Wuhan, China (consisting of a sequence of GPS-defined checkpoints).

Competition Format

The competition will proceed in a round robin fashion where AI teams and human gamers will take turns to complete known and unknown navigation missions in various test locations/cities.

Concretely, there will be 2 robots at each location, each controlled by either an AI team's model or a human gamer (it is also possible for both robots to be controlled by two AI teams' models or by two human gamers). Both robots will start at the same starting point, with starting times staggered by a few minutes, and will work toward completing the same navigation mission (with the same series of checkpoints).

Each mission will have a difficulty score ranging from 1 to 10. A simple mission might be a short drive (e.g., 100 meters) through a typically quiet park with wide sidewalks, while a more difficult mission could involve a long drive (e.g., 1 km) along crowded sidewalks that requires crossing roads, or even traveling directly on roads at times. It is also important to highlight that because this competition takes place in the wild, real-world variability naturally means that the conditions of every drive may differ even for the same mission (e.g., a typically quiet park getting crowded unexpectedly).

Successfully completing a given mission will earn the AI team or human gamer competition points corresponding to the difficulty score (i.e., completing a level 1 mission earns 1 point). Failure to complete a mission, or needing any intervention by the "robot walker assistant", means the AI team or human gamer receives no points for that particular round/mission. At the end of the competition, the AI team or human gamer with the highest aggregate points wins the competition.

If there is a tie in points, we will refer to the round in which the 2 tied opponents faced each other on the same mission; whoever completed that mission faster will be considered the ultimate winner.
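
A minimal sketch of this scoring scheme (function and variable names are illustrative, not part of the official rules):

    def round_points(difficulty: int, completed: bool, intervened: bool) -> int:
        # Full difficulty score if the mission was completed without any intervention
        # by the robot walker assistant; otherwise zero points for that round.
        return difficulty if completed and not intervened else 0

    def overall_winner(total_points: dict, head_to_head_time: dict) -> str:
        # Highest aggregate points wins; a tie is broken by whoever completed the
        # shared head-to-head mission faster (smaller completion time in seconds).
        best = max(total_points.values())
        leaders = [name for name, pts in total_points.items() if pts == best]
        if len(leaders) == 1:
            return leaders[0]
        return min(leaders, key=lambda name: head_to_head_time[name])

    # Example: two participants tied on points; the faster head-to-head time wins.
    # overall_winner({"Team A": 12, "Gamer B": 12}, {"Team A": 950.0, "Gamer B": 870.0})
    # -> "Gamer B"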

Robot Platform

Each Earth Rover unit weighs less than 5 kg (11 lbs) and moves at a maximum speed of ~3 km/h (~0.85 m/s). It can move forward/backward, turn in place, and comes equipped with front/back cameras, a 4G connection, GPS, and an IMU. It has limited edge computing and is meant to be 100% remotely controlled by human drivers or AI navigation models hosted on a remote server (e.g., your in-house compute, or the cloud).

Every participating team will be given 2 Earth Rover units for local testing, as well as up to 20 hours of test time per week (in the months leading up to the actual competition at ICRA) with robots deployed remotely around the world, along with human operators who will follow closely behind the robots to provide real-time operational support to the participating teams.

FrodoBot unit & human drivers’ POV

Navigation Models Deployment

Competing teams shall host their own models on their own compute facilities while remotely accessing the assigned robots via a standard Remote Access SDK (GitHub Repo here). Effectively, an AI team's model will receive the video stream from the robot's front camera and can send a control data stream back to the robot. In addition, the robot's GPS location, as well as other information specific to a given navigation mission (e.g., the GPS coordinates of the next checkpoint and a neighborhood map), will also be provided.
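
The exact interface is documented in the Remote Access SDK repository; the sketch below only illustrates the overall data flow, with hypothetical get_observation and send_control helpers standing in for the real SDK calls:

    import time

    # Hypothetical stand-ins for the real Remote Access SDK calls (see the SDK's
    # GitHub repo for the actual interface and message formats).
    def get_observation(robot):
        # Expected to return the latest front-camera frame, GPS fix, and mission
        # info such as the GPS coordinates of the next checkpoint.
        ...

    def send_control(robot, linear: float, angular: float) -> None:
        # Expected to forward a drive command (forward/backward speed and turn rate).
        ...

    def run_policy(robot, policy, hz: float = 10.0) -> None:
        # Closed-loop driving: the team's model runs on the team's own compute and
        # streams commands back to the remote robot over the network.
        period = 1.0 / hz
        while True:
            obs = get_observation(robot)          # camera + GPS + mission info
            linear, angular = policy(obs)         # the team's navigation model
            send_control(robot, linear, angular)  # control stream back to the robot
            time.sleep(period)                    # expect ~500 ms of network latency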

Dataset

FrodoBots has also open-sourced a significant dataset of human tele-operated drives collected from 10+ cities (>2k hours), which teams can opt to use as part of their model training pipelines. There are 7 types of data associated with a typical Earth Rovers drive, as follows:

  1. Control data
    Gamer's control inputs captured at a nominal frequency of 10 Hz, as well as RPM (revolutions per minute) readings for each of the robot's 4 wheels.
  2. GPS data
    Latitude, longitude, and timestamp info collected during the robot drives at a frequency of 1 Hz.
  3. IMU (Inertial Measurement Unit) data
    9-DOF sensor data, including acceleration (captured at 100 Hz), gyroscope (captured at 1 Hz), and magnetometer info (captured at 1 Hz), along with timestamp data.
  4. Rear camera video
    Video footage captured by the robot's rear-facing camera at a typical frame rate of 20 FPS with a resolution of 540x360.
  5. Front camera video
    Video footage captured by the robot's front-facing camera at a typical frame rate of 20 FPS with a resolution of 1024x576.
  6. Microphone
    Audio recordings captured by the robot's microphone, with a sample rate of 16,000 Hz (single channel).
  7. Speaker
    Audio recordings of the robot's speaker output (i.e., the gamer's microphone), also with a sample rate of 16,000 Hz (single channel).

More information about the currently released dataset can be found here: https://huggingface.co/datasets/frodobots/FrodoBots-2K
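
For example, the released files can be fetched with the standard Hugging Face Hub client (the file layout and parsing details are described on the dataset card linked above; this snippet only downloads the raw data):

    from huggingface_hub import snapshot_download

    # Download the FrodoBots-2K drives (control, GPS, IMU, front/rear video, and audio).
    local_dir = snapshot_download(
        repo_id="frodobots/FrodoBots-2K",
        repo_type="dataset",
    )
    print("Dataset files downloaded to:", local_dir)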

Models Testing Phase

In the months leading up to the actual competition, participants can test out their models on open-world sites with pre-defined navigation missions (“seen environments”). During the actual competition, participants will attempt to complete navigation missions in both seen and unseen environments across at least 4 open-world sites.

  • Observation space
    The robot will have access to a front-facing camera view updated at roughly 20 Hz, depending on the network connection; under typical network conditions, the latency of the streaming data will be around 500 milliseconds.
  • Action space
    The robot will be able to move forwards and backwards, or turn left and right. More details on these actions can be found in the documentation of the Remote Access SDK.
  • Success criteria
    The robot is deemed to have successfully reached the next checkpoint if it comes within 15 meters of that point, allowing for noisy GPS data (a distance check along these lines is sketched after this list).
  • Operation Support
    In the months leading up to the competition date, we will provide locations across multiple cities (e.g., parks, campuses, public sidewalks) for teams to test their models in the real world, remotely. Teams can expect up to 20 hours per week of testing time, and a human "bot walker" will follow their robots to provide real-time ops support. The actual challenge at the conference will be similar, except that new locations the teams have not seen will be added.
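
A minimal sketch of the checkpoint check described above, using the haversine formula to compute the GPS distance (the exact distance computation used by the organizers may differ):

    import math

    def gps_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        # Haversine great-circle distance between two GPS fixes, in meters.
        r = 6_371_000.0  # mean Earth radius in meters
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def reached_checkpoint(robot_lat: float, robot_lon: float,
                           cp_lat: float, cp_lon: float, tolerance_m: float = 15.0) -> bool:
        # A checkpoint counts as reached once the robot is within the 15 m GPS tolerance.
        return gps_distance_m(robot_lat, robot_lon, cp_lat, cp_lon) <= tolerance_m
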
Human Performance Benchmark

During the competition, 5 human drivers (winners selected from a humans-only tournament held before The Earth Rover Challenge) will also attempt to complete the same missions, alongside robots controlled by the participating teams' models. The human drivers are subject to the same "seen" vs. "unseen" environment conditions. This will form the human performance benchmark.

EarthRover Challenge 2024

The first iteration of the EarthRover Challenge was held at the 2024 edition of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) in Abu Dhabi.

Testimonials

The participants and organizers of the first EarthRover Challenge at IROS 24 in Abu Dhabi.

"To really try and test robots in diverse locations, in all kinds of conditions...  out there interacting with humans, interacting with pedestrians... The competition itself is over but we've built something quite cool, and we want to keep doing this and working on it."

Joel Loo

National University of Singapore - AI Team

"The start of research is finding problems and this challenge helped us find many problems across our approach. We found many new research challenges and it was a great opportunity... there's no other competition like this that let's you test your models in the real world, including both urban and rural environments."

Hyung-Suk Yoon

Seoul National University - AI Team

“I'd never seen a model that was tested in Manila, that was also tested in Africa before... but when you're forced to deploy something in the real world, it really makes you cut out all the fat... figure out what aspects you actually need to get right. ”

Arthur Zhang

University of Texas at Austin - AI Team

Results

The field comprised 10 participants: 7 human gamers and 3 AI teams. A total of 8 rounds were held across 8 cities: Wuhan and Liuzhou in China, Abu Dhabi in the UAE, Singapore, King'Ong'O and Kisumu in Kenya, Manila in the Philippines, and Port Louis in Mauritius.

The location in Abu Dhabi was the only unseen location, whereas the other 7 locations were seen environments in which both the human and AI participants had completed practice rounds before the start of the competition.

Filipino gamer Masterchi was the overall winner in the field of 10 participants, completing all missions for the maximum score of 42, with a cumulative time of 1 hour, 24 minutes and 27 seconds. Seoul National University (SNU) finished as the top AI team, with a score of 15.16 and a cumulative time of 4 hours, 28 minutes and 45 seconds.

The final results table of the first EarthRover Challenge.

Organizers

David Hsu
National University of Singapore
Dhruv Shah
Google DeepMind
Jie Tan
Google DeepMind
Joanne Truong
Meta AI
Ted Xiao
Google DeepMind
Xuesu Xiao
George Mason University
Naoki Yokoyama
Georgia Institute of Technology
Wenhao Yu
Google DeepMind
Tingnan Zhang
Google DeepMind
Zhuo Xu
Google DeepMind
Dinesh Manocha
University of Maryland
Michael Cho
FrodoBots
Santiago Pravisani
FrodoBots
Niresh Dravin
FrodoBots