Estimator Evaluation Methods
General Methodology
The goal of our evaluation is to ensure a fair comparison between our own method and others. In doing so, we wish to provide insight into why our method does better and in what ways (as no method will outperform all others in every aspect). We additionally need to account for the inherent randomness of the methods and thus should treat each estimated trajectory as a random output. The key steps that we follow are the following:
Collection
The first step in any evaluation is to collect the estimated trajectory of the proposed systems. Since we are interested in robotic applications of our estimators, we want to record the estimate at the current timestep (as compared to a "smoothed" output or one that includes loop closures from future timesteps). Within the ROS framework, this means that we just need to publish the current estimate at the current timestep. We recommend using the following ov_eval recording node:
<node name="recorder_estimate" pkg="ov_eval" type="pose_to_file" output="screen">
    <param name="topic"      type="str" value="/ov_msckf/poseimu" />
    <param name="topic_type" type="str" value="PoseWithCovarianceStamped" />
    <param name="output"     type="str" value="/home/user/data/traj_log.txt" />
</node>
To evaluate the computational load, we have a Python script that leverages the psutil Python package to record percent CPU and memory consumption.
<node name="recorder_timing" pkg="ov_eval" type="pid_ros.py" output="screen">
    <param name="nodes"  type="str" value="/run_subscribe_msckf" />
    <param name="output" type="str" value="/home/user/data/time_log.txt" />
</node>
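The core of such a recorder can be sketched with psutil directly. This is not the pid_ros.py implementation, just a minimal sketch of sampling percent CPU and resident memory of a process by PID (the function name record_usage and its parameters are our own):

```python
import os
import time

import psutil


def record_usage(pid, n_samples=3, interval=0.1):
    """Return a list of (cpu_percent, mem_mb) samples for the given PID."""
    proc = psutil.Process(pid)
    samples = []
    proc.cpu_percent(interval=None)  # prime the counter; the first call returns 0.0
    for _ in range(n_samples):
        time.sleep(interval)
        cpu = proc.cpu_percent(interval=None)       # percent CPU since last call
        mem = proc.memory_info().rss / (1024.0 ** 2)  # resident set size in MB
        samples.append((cpu, mem))
    return samples


# sample our own process as a stand-in for an estimator node
samples = record_usage(os.getpid())
```

In practice the real script resolves the ROS node name to a PID and appends each sample with its timestamp to the output log file.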
Transformation
We now need to ensure that both our estimated trajectory and the groundtruth are in the correct format for us to read in. We have a helper script that will transform ASL / EuRoC groundtruth files into the correct format: by default the EuRoC groundtruth has the timestamp in nanoseconds and stores the quaternion in a different order than we expect. A user can either process all CSV files in a given folder, or just a specific one.
rosrun ov_eval format_convert folder/path/
rosrun ov_eval format_convert file.csv
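The essence of this conversion can be sketched as follows. This is not the format_convert implementation itself, just a minimal sketch assuming the EuRoC row layout is `t[ns], px, py, pz, qw, qx, qy, qz, ...` and that the target format is a space-separated line with time in seconds and the quaternion reordered to x, y, z, w:

```python
import csv
import io


def convert_euroc_line(row):
    """Convert one EuRoC CSV row to 't[s] px py pz qx qy qz qw'."""
    t_ns, px, py, pz, qw, qx, qy, qz = row[:8]
    t_s = float(t_ns) * 1e-9  # nanoseconds -> seconds
    return "%.9f %s %s %s %s %s %s %s" % (t_s, px, py, pz, qx, qy, qz, qw)


# one example groundtruth row (values are illustrative only)
euroc_csv = "1403636579763555584,4.688,-1.786,0.783,0.534,-0.153,-0.827,-0.082\n"
lines = [convert_euroc_line(r) for r in csv.reader(io.StringIO(euroc_csv))]
```

Note how the quaternion scalar component moves from the first to the last position, and the timestamp is scaled into seconds.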
In addition, we assume a specific folder structure. We store trajectories first by their algorithm name and then in a folder for each dataset this algorithm was run on. The folder names of the datasets need to match the groundtruth trajectory files, which should be in their own separate folder. Please see the example recorded datasets for how to structure your folders.
truth/
dataset_name_1.txt
dataset_name_2.txt
algorithms/
open_vins/
dataset_name_1/
run1.txt
run2.txt
run3.txt
dataset_name_2/
run1.txt
run2.txt
run3.txt
okvis_stereo/
dataset_name_1/
run1.txt
run2.txt
run3.txt
dataset_name_2/
run1.txt
run2.txt
run3.txt
vins_mono/
dataset_name_1/
run1.txt
run2.txt
run3.txt
dataset_name_2/
run1.txt
run2.txt
run3.txt
Processing & Plotting
Now that we have our data recorded and in the correct format, we can work on processing and plotting it. In the next few sections we detail how to do this for absolute trajectory error, relative pose error, normalized estimation error squared, and bounded root mean squared error plots. We will first process the data into a set of output text files, which a user can then use to plot the results in their program or language of choice.
Absolute Trajectory Error (ATE)
The Absolute Trajectory Error (ATE) is given by the simple difference between the estimated trajectory and groundtruth after the estimate has been aligned to minimize this error. First the "best" transform between the groundtruth and estimate is computed, after which the error is computed at every timestep and then averaged. We recommend reading the paper by Zhang and Scaramuzza [15] for details. For a given dataset with N runs of the same algorithm, each with K pose measurements, we can compute the following for an aligned estimated trajectory \(\hat{\mathbf{x}}^{+}\):

\[
e_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{ \frac{1}{K} \sum_{k=1}^{K} \left\| \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}^{+}_{k,i} \right\|^2_2 }
\]

where \(\boxminus\) denotes the on-manifold difference between the groundtruth pose \(\mathbf{x}_{k}\) and the aligned estimate of run \(i\) at timestep \(k\).
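A minimal sketch of this ATE computation is shown below, restricted to position error and assuming each run has already been aligned to the groundtruth (the alignment step itself is omitted here; this is an illustration, not the ov_eval implementation):

```python
import math


def ate(gt, runs):
    """ATE over N aligned runs: average of each run's position RMSE.

    gt:   list of (x, y, z) groundtruth positions
    runs: list of aligned trajectories, each the same length as gt
    """
    per_run = []
    for est in runs:
        # squared position error at every timestep of this run
        sq = [sum((g - e) ** 2 for g, e in zip(p_gt, p_est))
              for p_gt, p_est in zip(gt, est)]
        per_run.append(math.sqrt(sum(sq) / len(sq)))  # RMSE of this run
    return sum(per_run) / len(per_run)                # average over runs


# toy example: a single run offset by 0.1m in x at every timestep
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
runs = [[(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]]
# every timestep error is 0.1 -> RMSE 0.1 -> ATE 0.1
```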
Relative Pose Error (RPE)
The Relative Pose Error (RPE) is calculated over segments of the dataset and allows for introspection into how localization solutions drift as the length of the trajectory increases. The other key advantage over the ATE is that it is less sensitive to jumps in estimation error, since the trajectory is sampled over many smaller segments. This allows for a much fairer comparison of methods and is what we recommend all authors publish results for. We recommend reading the paper by Zhang and Scaramuzza [15] for details. We first define a set of segment lengths \(\mathcal{D} = \{d_1, d_2, \ldots\}\) for which we compute the relative error. We can define the relative error for a trajectory split into segments of length \(d_i\) as follows:

\[
e_{rpe,d_i} = \frac{1}{D_i} \sum_{k=1}^{D_i} \left\| \left( \mathbf{x}_{k} \boxminus \mathbf{x}_{k+d_i} \right) \boxminus \left( \hat{\mathbf{x}}_{k} \boxminus \hat{\mathbf{x}}_{k+d_i} \right) \right\|^2_2
\]

where \(D_i\) is the number of segments of length \(d_i\), and each term compares the relative transformation over a segment in the groundtruth against the same relative transformation in the estimate.
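This can be sketched for the position component as follows. This is not the ov_eval implementation (which also evaluates orientation and supports distance-based segment lengths); it is a minimal illustration assuming uniformly sampled poses and a segment length given in timesteps:

```python
import math


def rpe_segments(gt, est, seg_len):
    """Per-segment relative position error for one segment length.

    gt, est: lists of (x, y, z) positions of equal length
    seg_len: segment length in timesteps
    """
    errs = []
    for k in range(len(gt) - seg_len):
        # relative displacement over the segment in groundtruth and estimate
        d_gt = [gt[k + seg_len][i] - gt[k][i] for i in range(3)]
        d_est = [est[k + seg_len][i] - est[k][i] for i in range(3)]
        errs.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(d_gt, d_est))))
    return errs


# toy example: estimate drifts by 10% along x
gt = [(float(k), 0.0, 0.0) for k in range(5)]
est = [(1.1 * k, 0.0, 0.0) for k in range(5)]
errs = rpe_segments(gt, est, 2)
# each 2-step segment accumulates 0.2m of relative error
```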
Root Mean Squared Error (RMSE)
Another useful tool when evaluating a system on a single dataset is the Root Mean Squared Error (RMSE) plot. This plots the RMSE at every timestep of the trajectory and thus can provide insight into timesteps where the estimation performance suffers. For a given dataset with N runs of the same algorithm, we can compute the following at each timestep k:

\[
e_{rmse,k} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right\|^2_2 }
\]
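A minimal sketch of this per-timestep RMSE, again for position only and assuming all runs share the same timestamps (an illustration, not the ov_eval implementation):

```python
import math


def rmse_per_timestep(gt, runs):
    """Position RMSE across N runs at every timestep.

    gt:   list of (x, y, z) groundtruth positions
    runs: list of trajectories, each the same length as gt
    """
    out = []
    for k, p_gt in enumerate(gt):
        # squared position error of every run at this timestep
        sq = [sum((p_gt[i] - run[k][i]) ** 2 for i in range(3)) for run in runs]
        out.append(math.sqrt(sum(sq) / len(runs)))
    return out


# toy example: two runs with different errors at a single timestep
gt = [(0.0, 0.0, 0.0)]
runs = [[(0.3, 0.0, 0.0)], [(0.0, 0.4, 0.0)]]
# squared errors 0.09 and 0.16 -> mean 0.125 -> RMSE sqrt(0.125)
rmse = rmse_per_timestep(gt, runs)
```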
Normalized Estimation Error Squared (NEES)
Normalized Estimation Error Squared (NEES) is a standard way to characterize whether an estimator is consistent or not. In general, the NEES is just the squared error normalized by the estimated covariance, whose expected value should equal the degrees of freedom of the state variables. Thus, in the case of 3-DoF position and 3-DoF orientation, we should get a NEES of three for each at every timestep. To compute the average NEES for a dataset with N runs of the same algorithm, we can compute the following at each timestep k:

\[
\bar{e}_{nees,k} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right)^\top \mathbf{P}_{k,i}^{-1} \left( \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right)
\]

where \(\mathbf{P}_{k,i}\) is the estimated covariance of run \(i\) at timestep \(k\).
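For the position block, this average NEES can be sketched as follows (a minimal illustration, not the ov_eval implementation; a consistent 3-DoF position estimate should average around three):

```python
import numpy as np


def avg_nees(p_gt, p_ests, covs):
    """Average position NEES across runs at one timestep.

    p_gt:   (3,) groundtruth position
    p_ests: list of (3,) estimated positions, one per run
    covs:   list of (3, 3) estimated position covariances, one per run
    """
    vals = []
    for p_est, P in zip(p_ests, covs):
        e = np.asarray(p_gt) - np.asarray(p_est)
        vals.append(float(e @ np.linalg.solve(P, e)))  # e^T P^{-1} e
    return sum(vals) / len(vals)


# toy example: a 0.1m error with 0.01 variance on each axis
p_gt = np.zeros(3)
p_ests = [np.array([0.1, 0.0, 0.0])]
covs = [0.01 * np.eye(3)]
# e^T P^{-1} e = 0.1^2 / 0.01 = 1.0
nees = avg_nees(p_gt, p_ests, covs)
```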
Single Run Consistency
When looking at a single run and wishing to see if the system is consistent, it is interesting to look at its error with respect to its estimated uncertainty. Specifically, we plot the error along with the estimator's 3σ bound. This provides insight into whether the estimator is becoming overconfident at certain timesteps. Note that this is done for each component of the state, thus we need to plot x, y, z and orientation independently. We can directly compute the error at timestep k:

\[
\mathbf{e}_{k} = \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k}
\]

and compare each component of \(\mathbf{e}_{k}\) against \(\pm 3\sigma\) computed from the corresponding diagonal entry of the estimated covariance \(\mathbf{P}_{k}\).
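For one state component, this check can be sketched as follows (a minimal illustration with hypothetical inputs, here the x-position error of a single run against its estimated variance):

```python
import math


def three_sigma_check(err_x, var_x):
    """Compare one component's error against its 3-sigma bound per timestep.

    err_x: list of signed errors of one state component
    var_x: list of estimated variances of that component
    Returns (error, 3-sigma bound, inside-bound?) per timestep.
    """
    rows = []
    for e, v in zip(err_x, var_x):
        bound = 3.0 * math.sqrt(v)          # 3-sigma bound from the variance
        rows.append((e, bound, abs(e) <= bound))
    return rows


# toy example: sigma = 0.1, so the bound is 0.3 at both timesteps
rows = three_sigma_check([0.05, 0.40], [0.01, 0.01])
# 0.05 stays inside the bound; 0.40 exceeds it (overconfident estimator)
```

Plotting the error curve between the +3σ and −3σ curves over time makes timesteps where the error escapes the bound immediately visible.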