Difference between revisions of "F-Scores and Accuracy"

Revision as of 18:35, 6 June 2014

In Eyewire, you are given an accuracy rating based on your F-score. An F-score is a statistical measure of accuracy that accounts for both precision and recall. Put more simply, F-scores are how HQ determines your accuracy based on what you added and what you missed. The formula for the traditional F-score is:

<math>F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}</math>

Before we can calculate the final F-score, we must first calculate your individual precision and recall. When a player completes a cube, there are four possible outcomes for every segment in that cube: a true positive, a false positive, a false negative, or a true negative. A true positive (tp) is when a player adds a segment that should be added. A false positive (fp) is when a player adds a segment that should not be added. A false negative (fn) is when a player misses a segment they should have added. A true negative (tn) is when a player correctly leaves out a segment that does not belong. The figure below shows an example of a false negative and a false positive.

To the left is an example of a branch submitted by a player. In this example the red and the green segments are what the player submitted, while the purple segment was left out.

The red segment here is a false positive and the purple segment is a false negative. The player mistakenly added the red segment when they should have added the purple segment instead. The green segment is correct.
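The four outcomes can be sketched with Python sets. This is only an illustration: the segment names follow the colors in the example above, and "gray" is a hypothetical extra segment added here to show a true negative. It is not a description of how Eyewire stores or compares segments.

```python
# Segments in the cube. "gray" is a hypothetical segment, not part of the
# original example, included only to illustrate a true negative.
all_segments = {"green", "red", "purple", "gray"}

submitted = {"green", "red"}    # segments the player added
correct = {"green", "purple"}   # segments that should have been added

true_positives = submitted & correct                  # added, and should be
false_positives = submitted - correct                 # added, but should not be
false_negatives = correct - submitted                 # missed, but should be added
true_negatives = all_segments - submitted - correct   # correctly left out

print("tp:", sorted(true_positives))
print("fp:", sorted(false_positives))
print("fn:", sorted(false_negatives))
print("tn:", sorted(true_negatives))
```

Here green is the true positive, red the false positive, and purple the false negative, matching the description above.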

This brings us to precision. Precision measures how much of what a player added was added correctly. For example, if Player A has a precision of 0.9221, about 92% of what Player A added was correct and about 8% should not have been added. To determine a player’s precision, we use their true positive (tp) results (correctly added) and their false positive (fp) results (incorrectly added) in this formula:

<math>\text{precision} = \frac{tp}{tp + fp}</math>
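As a rough worked example, precision can be computed from tp and fp as follows. The function name and the counts are assumptions chosen to reproduce Player A's 0.9221. Whether Eyewire counts segments or weights them by volume is not stated here, so treat the inputs as generic amounts.

```python
def precision(tp: float, fp: float) -> float:
    """Fraction of what was added that was correct: tp / (tp + fp)."""
    added = tp + fp
    if added == 0:
        # Assumption: report 0.0 when nothing was added at all.
        return 0.0
    return tp / added


# Hypothetical amounts: 9221 correctly added, 779 incorrectly added.
player_a_precision = precision(tp=9221, fp=779)
print(round(player_a_precision, 4))
```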

Recall measures how much of the correct volume a player found, and therefore how much was missed. Let’s say Player A has a recall of 0.9409. That means Player A added about 94% of the correct segments in the cubes they worked on and missed about 6%. To determine a player’s recall, we use their true positive (tp) results (correctly added) and their false negative (fn) results (incorrectly missed) in this formula:

<math>\text{recall} = \frac{tp}{tp + fn}</math>
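Recall can be sketched the same way from tp and fn. The function name and the amounts are assumptions picked to reproduce Player A's 0.9409, not real Eyewire data.

```python
def recall(tp: float, fn: float) -> float:
    """Fraction of the correct material that was added: tp / (tp + fn)."""
    should_be_added = tp + fn
    if should_be_added == 0:
        # Assumption: report 0.0 when there was nothing to add.
        return 0.0
    return tp / should_be_added


# Hypothetical amounts: 9409 correctly added, 591 missed.
player_a_recall = recall(tp=9409, fn=591)
print(round(player_a_recall, 4))        # share of correct material found
print(round(1 - player_a_recall, 4))    # share of correct material missed
```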

Now we take the results of both formulas and plug them into the F-score formula above to get a player’s F-score. Put another way, a player’s overall accuracy rating is the harmonic mean of their precision and recall.
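To make the last step concrete, here is a minimal sketch that combines Player A's example precision and recall into an F-score and checks it against the harmonic mean from Python's standard library. The function name `f_score` is an assumption for illustration, and how Eyewire aggregates scores across cubes is not covered here.

```python
import math
from statistics import harmonic_mean


def f_score(precision: float, recall: float) -> float:
    """Traditional F-score: 2 * (precision * recall) / (precision + recall)."""
    if precision + recall == 0:
        # Assumption: report 0.0 when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Player A's example values from the text above.
player_a_f = f_score(0.9221, 0.9409)
print(round(player_a_f, 4))

# The same value, computed as the harmonic mean of the two inputs.
assert math.isclose(player_a_f, harmonic_mean([0.9221, 0.9409]))
```

Because the harmonic mean is pulled toward the smaller of the two values, a player cannot reach a high F-score by being strong in precision alone or recall alone.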

How Accurate are F-Scores?

One question we get a lot is: how do we know what is correct and what isn’t? What is correct is determined by combining the GrimReaper’s corrections with the Eyewirer consensus. If a cube does not have a GrimReaper correction, we use the Eyewirer consensus alone. Eyewirer consensuses have proven to be quite accurate, but there is still a small chance that a consensus contains a wrong piece. This means that F-scores cannot capture a player’s accuracy perfectly every time. However, they are accurate enough that we feel confident using them as a player guide.

Revision as of 18:35, 6 June 2014 by DannyS (previous revision: 18:34, 6 June 2014, by DannyS)
