r - Remove rows with all or some NAs (missing values) in data.frame

Tags: r, dataframe, filter, missing-data, r-faq

Top 5 answers for r - Remove rows with all or some NAs (missing values) in data.frame

Answer (94 votes)

Also check complete.cases:

> final[complete.cases(final), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2

na.omit is simpler if you just want to remove every row that contains an NA. complete.cases allows partial selection by checking only certain columns of the data frame:

> final[complete.cases(final[ , 5:6]),]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
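A self-contained sketch of the two complete.cases calls above, assuming a small hypothetical data frame (the values are invented for illustration; selecting the columns by name instead of by position 5:6 is just a style choice):

```r
# Hypothetical toy data frame, loosely modeled on the one in the question
final <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  hsap = c(0, 0, NA, 0),
  mmul = c(NA, 2, NA, NA),
  rnor = c(NA, 2, 1, 1),
  cfam = c(NA, 2, NA, 2)
)

# Keep only rows with no NA in any column
all_complete <- final[complete.cases(final), ]

# Keep rows with no NA in rnor and cfam; NAs elsewhere are tolerated
tail_complete <- final[complete.cases(final[, c("rnor", "cfam")]), ]

print(all_complete)   # only row 2 survives
print(tail_complete)  # rows 2 and 4 survive
```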

Your solution can't work. If you insist on using is.na, then you have to do something like:

> final[rowSums(is.na(final[ , 5:6])) == 0, ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2

but using complete.cases is a lot clearer, and faster.
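To check that the rowSums(is.na(...)) == 0 test keeps the same rows as complete.cases, here is a small sketch on a hypothetical two-column data frame:

```r
# Hypothetical data: only row 3 has no missing values
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, NA, 6, 8))

keep_cc <- complete.cases(df)          # TRUE where the row has no NA
keep_rs <- rowSums(is.na(df)) == 0     # same test, spelled out with is.na

# unname() guards against row names being carried along by rowSums
stopifnot(identical(keep_cc, unname(keep_rs)))

df[keep_cc, ]
```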

Answer (88 votes)

Try na.omit(your.data.frame). As for the second question, try posting it as a separate question (for clarity).
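A minimal na.omit sketch on hypothetical data. Besides dropping the incomplete rows, na.omit stores the indices of the removed rows in the "na.action" attribute, which can be handy for reporting how much was dropped:

```r
# Hypothetical data: rows 2 and 3 each contain one NA
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

cleaned <- na.omit(df)
print(cleaned)  # only row 1 remains

# Indices of the dropped rows, recorded by na.omit itself
print(attr(cleaned, "na.action"))
```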

Answer (78 votes)

tidyr has a function for this, drop_na:

library(tidyr)

df %>% drop_na()
#              gene hsap mmul mmus rnor cfam
# 2 ENSG00000199674    0    2    2    2    2
# 6 ENSG00000221312    0    1    2    3    2

df %>% drop_na(rnor, cfam)
#              gene hsap mmul mmus rnor cfam
# 2 ENSG00000199674    0    2    2    2    2
# 4 ENSG00000207604    0   NA   NA    1    2
# 6 ENSG00000221312    0    1    2    3    2
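If tidyr isn't available, a rough base-R stand-in for drop_na can be built on complete.cases. drop_na_base below is a hypothetical helper, not part of tidyr, and it takes column names as a character vector rather than tidyselect expressions:

```r
# Hypothetical helper (not part of tidyr): mimics drop_na() with base R only.
# With no cols given, every column is checked for NA.
drop_na_base <- function(data, cols = names(data)) {
  data[complete.cases(data[, cols, drop = FALSE]), , drop = FALSE]
}

# Hypothetical data
df <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  rnor = c(1, NA, 3, 4),
  cfam = c(2, 5, NA, 6),
  hsap = c(NA, 0, 0, 0)
)

drop_na_base(df)                     # no NA anywhere: row 4
drop_na_base(df, c("rnor", "cfam"))  # only rnor and cfam checked: rows 1 and 4
```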

Answer (69 votes)

I prefer the following way to check whether rows contain any NAs:

row.has.na <- apply(final, 1, function(x){any(is.na(x))}) 

This returns a logical vector indicating, for each row, whether it contains any NA. You can use it to see how many rows you'll have to drop:

sum(row.has.na) 

and finally drop them:

final.filtered <- final[!row.has.na,] 
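Putting the three steps above together on a hypothetical data frame. Note that apply first converts the data frame to a matrix (a character matrix here, because gene is text); missing values stay NA in that conversion, so is.na still detects them:

```r
# Hypothetical data: rows 2 and 3 each contain an NA
final <- data.frame(
  gene = c("g1", "g2", "g3"),
  hsap = c(0, NA, 0),
  mmul = c(2, 1, NA)
)

# One TRUE/FALSE per row: does the row contain any NA?
row.has.na <- apply(final, 1, function(x) any(is.na(x)))

n_dropped <- sum(row.has.na)              # how many rows will be removed
final.filtered <- final[!row.has.na, ]    # keep only rows without NA

print(n_dropped)
print(final.filtered)
```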

Filtering rows based on NAs in only some of the columns is a little trickier (for example, you can pass final[, 5:6] to apply). In general, Joris Meys' solution seems more elegant.
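A sketch of the column-subset variant mentioned above, assuming hypothetical data whose columns 5 and 6 are rnor and cfam:

```r
# Hypothetical data; columns 5:6 are rnor and cfam
final <- data.frame(
  gene = c("g1", "g2", "g3"),
  hsap = c(0, 0, 0),
  mmul = c(NA, 2, NA),
  mmus = c(NA, 2, 1),
  rnor = c(1, 2, NA),
  cfam = c(2, 2, 2)
)

# Only columns 5 and 6 are inspected; NAs in mmul/mmus are ignored
subset.has.na <- apply(final[, 5:6], 1, function(x) any(is.na(x)))
final.subset.filtered <- final[!subset.has.na, ]

print(final.subset.filtered)  # rows 1 and 2; row 3 has NA in rnor
```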

Answer (50 votes)

If you want control over how many NAs are acceptable in each row, try this function. In many survey data sets, too many blank responses can ruin the results, so rows are deleted once they exceed a certain threshold. This function lets you choose how many NAs a row can have before it's deleted:

delete.na <- function(DF, n=0) {
  DF[rowSums(is.na(DF)) <= n,]
}
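A quick sketch of delete.na on hypothetical survey-style data (one row per respondent, one column per question), showing how the threshold n changes which rows survive. The drop = FALSE argument is an addition to the function above, so that a single-column data frame isn't silently turned into a vector:

```r
# delete.na as above, plus drop = FALSE to always return a data frame
delete.na <- function(DF, n = 0) {
  DF[rowSums(is.na(DF)) <= n, , drop = FALSE]
}

# Hypothetical survey responses
survey <- data.frame(
  q1 = c(1, NA, 3, NA),
  q2 = c(2, NA, NA, 4),
  q3 = c(3, NA, 5, 6)
)

rowSums(is.na(survey))    # NA count per row: 0 3 1 1
delete.na(survey)         # n = 0: only row 1 survives
delete.na(survey, n = 1)  # rows 1, 3 and 4 survive
```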

By default, it removes every row that contains any NA:

delete.na(final)
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2

Or specify the maximum number of NAs allowed:

delete.na(final, 2)
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
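The question also asks about rows where all values are missing. Using the same rowSums(is.na(...)) idea, a hypothetical helper delete.all.na (equivalent to delete.na(DF, ncol(DF) - 1)) drops only those rows and keeps rows with partial NAs:

```r
# Hypothetical helper: drop only rows in which every column is NA
delete.all.na <- function(DF) {
  DF[rowSums(is.na(DF)) < ncol(DF), , drop = FALSE]
}

# Hypothetical data: row 2 is entirely NA, rows 1 and 3 are partial
df <- data.frame(a = c(1, NA, NA), b = c(NA, NA, 3))

delete.all.na(df)  # rows 1 and 3 remain
```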
