Assessment

We will evaluate the accuracy of each submitted algorithm (in the form of deformation fields) based on segmentation labels. Specifically, the segmentation labels of each moving volume will be transformed by the deformation field, and the evaluation will be based on how well the deformed segmentation labels align with those of the corresponding fixed volume.

The following metrics will be used for assessing the quality of the alignment.

  • DSC: The Dice similarity coefficient (DSC) measures the overlap between two sets of segmentation labels.
  • DSC30: The 30th percentile of the DSC values over all segmentation labels will be used to assess algorithm robustness.
  • HD95: The Hausdorff distance measures the maximum distance between the 3D surfaces of corresponding segmentation labels. The 95th percentile distance (HD95) is used instead of the maximum for a robust score.
  • SDlogJ: We will use the standard deviation of the logarithm of the Jacobian determinant of each deformation field to assess its smoothness.
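As an illustration of the overlap and smoothness metrics above, the following is a minimal NumPy sketch, not the official evaluation code. The function names, the `(3, D, H, W)` voxel-unit displacement layout, the finite-difference Jacobian, and the clipping of non-positive determinants before the logarithm are all assumptions made for this example. HD95 is left out because it needs a surface-distance routine.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label absent from both volumes
    return 2.0 * np.logical_and(a, b).sum() / denom


def dice_per_label(warped_seg, fixed_seg, labels):
    """DSC for each label id between the warped and the fixed label map."""
    return np.array([dice(warped_seg == l, fixed_seg == l) for l in labels])


def dsc30(dsc_values):
    """30th percentile of the per-label DSC values (NaNs ignored)."""
    return float(np.nanpercentile(dsc_values, 30))


def sd_log_jacobian(disp, eps=1e-9):
    """Standard deviation of log|J| for a displacement field.

    disp: array of shape (3, D, H, W) in voxel units (assumed layout).
    The transform is x -> x + disp(x), so J = I + grad(disp).
    """
    # grads[i][j] approximates d disp_i / d x_j by finite differences.
    grads = [np.gradient(disp[i]) for i in range(3)]
    jac = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            jac[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    det = np.linalg.det(jac)
    # Assumption: clip folded (non-positive) determinants so the log is defined.
    return float(np.std(np.log(np.clip(det, eps, None))))
```

In this sketch, `dice_per_label` would be called on the moving segmentation after it has been warped by the submitted deformation field, and `dsc30` on the resulting per-label values. An identity deformation (all-zero displacement) gives an SDlogJ of 0.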

Ranking Mechanism

During the submission period, each algorithm's position on the leaderboard will be determined by its DSC score. To speed up automatic evaluation and reduce its computational load, submissions will be evaluated at half resolution.

Once the submission period closes, all submissions will be re-evaluated at full resolution. Their final ranks will be determined using the ranking mechanism described below and will be presented on this website separately from the leaderboard.

The final ranks for each species will be computed by combining all four metrics into a single ranking score. To determine a participant's rank, significance ranks between algorithms will first be computed. The significance rank of an algorithm is the number of other algorithms that perform statistically significantly worse than it under a given metric (therefore, the larger the significance rank, the better).
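The significance rank for a single metric could be computed along the following lines. This is only a sketch: the statistical test is not specified here, so the example swaps in a one-sided paired sign-flip permutation test, and the function names, the significance level `alpha=0.05`, and the number of permutations are assumptions. How the per-metric ranks are combined into the final score is not shown.

```python
import numpy as np


def paired_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """One-sided p-value for 'a scores higher than b' on paired cases.

    Uses a sign-flip permutation test (an assumed choice of test).
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = diff.mean()
    # Randomly flip the sign of each paired difference.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    permuted = (signs * diff).mean(axis=1)
    return (1 + np.sum(permuted >= observed)) / (n_perm + 1)


def significance_ranks(scores, higher_is_better=True, alpha=0.05):
    """Count, for each algorithm, the others that are significantly worse.

    scores: dict mapping algorithm name -> per-case metric values,
            with cases in the same order for every algorithm.
    """
    sign = 1.0 if higher_is_better else -1.0  # flip for HD95 or SDlogJ
    ranks = {}
    for name, values in scores.items():
        mine = sign * np.asarray(values, dtype=float)
        count = 0
        for other, other_values in scores.items():
            if other == name:
                continue
            theirs = sign * np.asarray(other_values, dtype=float)
            if paired_permutation_pvalue(mine, theirs) < alpha:
                count += 1
        ranks[name] = count
    return ranks
```

For a metric where lower values are better, such as HD95, the same function would be called with `higher_is_better=False`.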