Matt_Simpson comments on 2012 Survey Results - Less Wrong

80 Post author: Yvain 07 December 2012 09:04PM

Comment author: Matt_Simpson 29 November 2012 09:43:21PM * 6 points

Use "as.numeric(as.character(dat$IQTest))".

The IQTest data is stored as a factor. A factor variable has a set of levels, numbered 1, 2, 3, ..., which are the values the variable can possibly take on, and a label for each of those levels. as.numeric(X) returns the level numbers of X. as.character(X) returns the labels of X. When the labels are actually numbers (usually integers that R is interpreting as character labels for some reason), as.numeric(as.character(X)) returns the numeric values that R was treating as labels.
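A minimal sketch of the difference, using a made-up vector rather than the actual survey data (the name iq_factor and its values are invented for illustration):

```r
# Hypothetical scores stored as a factor; the values are invented.
iq_factor <- factor(c("130", "145", "120", "145"))

# The labels, sorted as strings.
levels(iq_factor)                                  # "120" "130" "145"

# Level numbers: the position of each element's label in levels().
codes <- as.numeric(iq_factor)                     # 2 3 1 3

# Going through character first recovers the actual numbers.
values <- as.numeric(as.character(iq_factor))      # 130 145 120 145
```

Calling as.numeric directly gives small integers that look plausible but are only positions in the level list, which is why the character step matters.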

EDIT:

In this case, when no value for IQTest was reported, it was stored as " " instead of "", which made R think the variable contained character data, which R defaults to treating as a factor. The " " entries should all be NA once the variable is converted properly.
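As a rough illustration of the blank entries, again with invented values and a hypothetical name (iq_raw) rather than the survey file itself:

```r
# Hypothetical column: two reported scores and two blanks stored as " ".
iq_raw <- factor(c("130", " ", "145", " "))

# " " is one of the factor's labels, alongside the numeric-looking ones.
levels(iq_raw)                                # " " "130" "145"

# Converting through character turns the blank label into NA.
iq_num <- as.numeric(as.character(iq_raw))    # 130 NA 145 NA

# Summaries then need na.rm = TRUE to skip the missing entries.
iq_mean <- mean(iq_num, na.rm = TRUE)         # 137.5
```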