Difference between revisions of "F-Scores and Accuracy"

From Eyewire

Latest revision as of 16:03, 17 November 2015

In Eyewire you are given an accuracy rating based on your F-score. An F-score is a statistical measure of accuracy that accounts for both precision and recall. Put more simply, F-scores are how HQ determines your accuracy based on what you added and what you missed. The formula for the traditional F-score is:

F-score = 2 × (precision × recall) / (precision + recall)

Before we can calculate the final F-score, we must first calculate your individual precision and recall. When a player does a cube, there are four possible outcomes for every segment in that cube: a true positive, a false positive, a false negative, or a true negative. A true positive (tp) is when a player adds a segment that should be added. A false positive (fp) is when a player adds a segment that should not be added. A false negative (fn) is when a player misses a segment they should have added. A true negative (tn) is when a player correctly leaves out a segment that does not belong. The figure below shows an example of a false negative and a false positive.


[Figure: NewFScoreEyeWire.png]
The figure shows an example of a branch submitted by a player. In this example the red and the green segments are what the player submitted, while the purple segment was left out.


The red segment here is a false positive and the purple segment is a false negative. The player mistakenly added the red segment when they should have added the purple segment instead. The green segment is correct.
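The four outcomes can be sketched with plain set operations. This is only an illustration, not Eyewire's actual scoring code: the function name `classify_segments`, the colour labels, and the extra "gray" segment (standing in for a segment that does not belong and was left out) are all hypothetical. It also counts whole segments, whereas the real rating may weight segments by volume.

```python
def classify_segments(submitted, correct, all_segments):
    """Sort every segment in a cube into tp, fp, fn, or tn (hypothetical helper)."""
    submitted, correct, all_segments = set(submitted), set(correct), set(all_segments)
    return {
        "tp": submitted & correct,                 # added and should be added
        "fp": submitted - correct,                 # added but should not be
        "fn": correct - submitted,                 # should be added but was missed
        "tn": all_segments - submitted - correct,  # correctly left out
    }

# The branch from the figure, plus an assumed "gray" segment that does not belong.
outcome = classify_segments(
    submitted={"red", "green"},
    correct={"green", "purple"},
    all_segments={"red", "green", "purple", "gray"},
)
print(outcome)
```

Here green ends up as the true positive, red as the false positive, and purple as the false negative, matching the description of the figure.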


This brings us to precision: how much of what a player added was added correctly. For example, if Player A has a precision of 0.9221, about 92% of what Player A added was correct and about 8% should not have been added. To determine a player’s precision we use their true positive (tp) results, correctly added, and their false positive (fp) results, incorrectly added, in this formula:
precision = tp / (tp + fp)
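As a minimal sketch, precision is the number of true positives divided by everything the player added, tp / (tp + fp). The counts below are made up for illustration, and returning 0.0 when nothing was added is an assumption, not something the page specifies.

```python
def precision(tp, fp):
    """Fraction of what was added that was correct: tp / (tp + fp)."""
    added = tp + fp
    # Assumption: a player who added nothing gets 0.0 instead of a division error.
    return tp / added if added else 0.0

# Hypothetical cube: 9 segments added correctly, 1 added by mistake.
example_precision = precision(tp=9, fp=1)
print(example_precision)
```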


Recall measures how much of the correct volume a player found rather than missed. Let’s say Player A has a recall of 0.9409. That means Player A added about 94% of the correct segments in the cubes Player A worked on and missed about 6%. To determine a player’s recall we use their true positive (tp) results, correctly added, and false negative (fn) results, incorrectly missed, in this formula:
recall = tp / (tp + fn)
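Recall can be sketched the same way, as true positives divided by everything that should have been added, tp / (tp + fn). Again the counts are invented for illustration, and the 0.0 fallback for a cube with nothing to add is an assumption.

```python
def recall(tp, fn):
    """Fraction of the correct segments that the player added: tp / (tp + fn)."""
    should_be_added = tp + fn
    # Assumption: if nothing should have been added, report 0.0 instead of dividing by zero.
    return tp / should_be_added if should_be_added else 0.0

# Hypothetical cube: 3 correct segments added, 1 correct segment missed.
example_recall = recall(tp=3, fn=1)
print(example_recall)
```

The missed share is simply 1 minus recall, which is where the "about 6%" for a recall of 0.9409 comes from.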


Now we take the results of both formulas and plug them into the F-score formula above to get a player’s F-score. Put another way, the F-score is the harmonic mean of a player’s precision and recall, and it serves as their overall accuracy rating.
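To make the last step concrete, here is a short sketch that combines Player A's example precision (0.9221) and recall (0.9409) using the traditional F-score, 2 × precision × recall / (precision + recall). The function name `f_score` and the 0.0 fallback are assumptions for illustration, and the comparison against `statistics.harmonic_mean` is only there to show that the two views agree.

```python
from statistics import harmonic_mean

def f_score(precision, recall):
    """Traditional F-score: the harmonic mean of precision and recall."""
    total = precision + recall
    # Assumption: define the score as 0.0 when both inputs are zero.
    return 2 * precision * recall / total if total else 0.0

# Player A's example values from the text.
score = f_score(0.9221, 0.9409)
print(round(score, 4))                             # 0.9314
print(round(harmonic_mean([0.9221, 0.9409]), 4))   # same value
```

Because the harmonic mean sits closer to the lower of the two numbers, a player cannot make up for poor recall with high precision alone, or the other way round.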

How Accurate are F-Scores?

One question we get a lot is: how do we know what is correct and what isn’t? What is correct is determined by combining the GrimReaper’s corrections with the EyeWirer consensus. If a cube does not have a GrimReaper correction, we just use the EyeWirer consensus. EyeWire consensuses have proven to be quite accurate. However, there is still a small chance that a consensus may contain a wrong piece, so an F-score is not guaranteed to reflect a player’s true accuracy in every case. Even so, F-scores are accurate enough that we feel confident using them as a player guide.
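The fallback rule can be sketched as follows. The page does not say how a GrimReaper correction is combined with the consensus, so this sketch assumes (purely for illustration) that a correction is a pair of segment sets, one to add and one to remove. The function name `reference_segments` and the segment labels are hypothetical.

```python
def reference_segments(consensus, reaper_correction=None):
    """Pick the set of segments a submission is scored against (hypothetical)."""
    if reaper_correction is None:
        # No GrimReaper correction for this cube: use the consensus alone.
        return set(consensus)
    # Assumed correction format: (segments to add, segments to remove).
    added, removed = reaper_correction
    return (set(consensus) | set(added)) - set(removed)

consensus_only = reference_segments({"a", "b"})
corrected = reference_segments({"a", "b"}, reaper_correction=({"c"}, {"b"}))
print(consensus_only, corrected)
```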