F-Scores and Accuracy


Latest revision as of 16:03, 17 November 2015

In Eyewire you are given an accuracy rating based on your F-score. An F-score is a statistical measure of accuracy that accounts for both precision and recall. Put more simply, the F-score is how HQ determines your accuracy based on what you added and what you missed. The formula for the traditional F-score is:

F = 2 × (precision × recall) / (precision + recall)

Before we can calculate the final F-score, we must first calculate your precision and recall. When a player completes a cube there are four possible outcomes for every segment in that cube: a true positive, a false positive, a false negative, or a true negative. A true positive (tp) is when a player adds a segment that should be added. A false positive (fp) is when a player adds a segment that should not be added. A false negative (fn) is when a player misses a segment they should have added. A true negative (tn) is when a player correctly leaves out a segment that does not belong. In the figure below you can see an example of a false negative and of a false positive.


[Figure: NewFScoreEyeWire.png]
The figure shows an example of a branch submitted by a player. In this example, the red and green segments are what the player submitted, while the purple segment was left out.


The red segment here is a false positive and the purple segment is a false negative. The player mistakenly added the red segment when they should have added the purple segment instead. The green segment is correct.
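The four outcomes can be sketched with set operations. This is only an illustration: it assumes each segment is identified by a label and counted equally, and the `classify` helper and the extra "gray" segment are hypothetical. Eyewire's actual scoring may weigh segments differently (for example by volume).

```python
def classify(submitted, correct, all_segments):
    """Split segments into the four outcomes (hypothetical helper)."""
    tp = submitted & correct                  # added and should have been added
    fp = submitted - correct                  # added but should not have been
    fn = correct - submitted                  # should have been added but was missed
    tn = all_segments - submitted - correct   # correctly left out
    return tp, fp, fn, tn

# The example from the figure, plus one made-up unrelated segment ("gray").
all_segments = {"red", "green", "purple", "gray"}
submitted = {"red", "green"}
correct = {"green", "purple"}

tp, fp, fn, tn = classify(submitted, correct, all_segments)
print("tp:", tp, "fp:", fp, "fn:", fn, "tn:", tn)
```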


This brings us to precision, which is how much of what a player added was added correctly. For example, if Player A has a precision of 0.9221, about 92% of what Player A added was correct and about 8% should not have been added. To determine a player’s precision we use their true positive (tp) results (correctly added) and their false positive (fp) results (incorrectly added) in this formula:
precision = tp / (tp + fp)
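As a rough sketch, precision can be computed from the standard definition tp / (tp + fp). The amounts below are made up to reproduce Player A's 0.9221, and returning zero when nothing was added is an assumption, not necessarily how Eyewire handles that case.

```python
def precision(tp, fp):
    """Fraction of what was added that was correct: tp / (tp + fp)."""
    added = tp + fp
    if added == 0:
        return 0.0  # assumption: nothing added is scored as zero precision
    return tp / added

# Made-up amounts: 9221 units added correctly, 779 added incorrectly.
player_a_precision = precision(9221, 779)
print(round(player_a_precision, 4))  # 0.9221
```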


Recall measures how much of the correct volume a player found, and so also how much was missed. Let’s say Player A has a recall of 0.9409. That means Player A found about 94% of the correct segments in the cubes they worked on and missed about 6%. To determine a player’s recall we use their true positive (tp) results (correctly added) and their false negative (fn) results (incorrectly missed) in this formula:
recall = tp / (tp + fn)
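Recall can be sketched the same way from the standard definition tp / (tp + fn). The amounts are again made up to reproduce Player A's 0.9409, and the zero fallback when there was nothing to find is an assumption.

```python
def recall(tp, fn):
    """Fraction of what should have been added that was added: tp / (tp + fn)."""
    should_add = tp + fn
    if should_add == 0:
        return 0.0  # assumption: nothing to find is scored as zero recall
    return tp / should_add

# Made-up amounts: 9409 units found, 591 units missed.
player_a_recall = recall(9409, 591)
print(round(player_a_recall, 4))      # 0.9409
print(round(1 - player_a_recall, 4))  # 0.0591, the share that was missed
```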


Now we take the results of both formulas and plug them into the F-score formula above to get a player’s F-score. Put another way, we take the harmonic mean of a player’s precision and recall to get their overall accuracy rating.
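To make the last step concrete, here is a small sketch that combines Player A's precision (0.9221) and recall (0.9409) from the examples above using the traditional F-score, which is the harmonic mean 2 × P × R / (P + R). The function name is illustrative and the zero fallback is an assumption.

```python
def f_score(precision, recall):
    """Traditional F-score: harmonic mean of precision and recall."""
    total = precision + recall
    if total == 0:
        return 0.0  # assumption: both zero gives an F-score of zero
    return 2 * precision * recall / total

player_a_f = f_score(0.9221, 0.9409)
print(round(player_a_f, 4))  # 0.9314
```

The harmonic mean sits closer to the lower of the two values than a plain average would, so a player cannot reach a high F-score by doing well on only one of precision or recall.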

How Accurate are F-Scores?

One question we get a lot is how we know what is correct and what isn’t. What is correct is determined by combining the GrimReaper’s corrections with the EyeWirer consensus. If a cube does not have a GrimReaper correction, we use the EyeWirer consensus alone. EyeWire consensuses have proven to be quite accurate. However, there is still a small chance that a consensus contains a wrong piece, which means an F-score is not a perfect measure of a player’s accuracy. Even so, F-scores are accurate enough that we feel confident using them as a guide for players.