r - Drop unused factor levels in a subsetted data frame

Tags : r, dataframe, r-factor, r-faq

Top 5 Answers for r - Drop unused factor levels in a subsetted data frame

Votes : 95

Since R version 2.12, there's a droplevels() function.

levels(droplevels(subdf$letters)) 
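
As a minimal sketch of the same idea applied to a whole data frame: droplevels() also has a data frame method, so every factor column can be cleaned in one call. The data frame df below is an assumption made for illustration (the answer does not show its data).

```r
# Assumed example data (not given in the answer): five letters, five numbers.
df <- data.frame(letters = factor(letters[1:5]), numbers = 1:5)

# Subsetting keeps the unused levels "d" and "e".
subdf <- subset(df, numbers <= 3)
before <- levels(subdf$letters)

# The data frame method of droplevels() drops unused levels in all factor columns.
subdf <- droplevels(subdf)
after <- levels(subdf$letters)

print(before)
print(after)
```

If a particular column should keep its full level set, the data frame method accepts an except argument naming the columns to leave alone.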
Votes : 83

All you should have to do is to apply factor() to your variable again after subsetting:

> subdf$letters
[1] a b c
Levels: a b c d e
> subdf$letters <- factor(subdf$letters)
> subdf$letters
[1] a b c
Levels: a b c

EDIT

From the factor page example:

factor(ff)      # drops the levels that do not occur 

For dropping levels from all factor columns in a dataframe, you can use:

subdf <- subset(df, numbers <= 3)
subdf[] <- lapply(subdf, function(x) if(is.factor(x)) factor(x) else x)
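
The lapply() idiom above can be wrapped in a small helper, sketched here under some assumptions: the function name refactor_all and the example data are hypothetical and not part of the answer. Assigning into d[] rather than d keeps the result a data frame instead of a plain list.

```r
# Hypothetical helper name; wraps the lapply() idiom from the answer.
refactor_all <- function(d) {
  # d[] <- ... replaces the columns in place, so d stays a data frame.
  d[] <- lapply(d, function(x) if (is.factor(x)) factor(x) else x)
  d
}

# Assumed example data: one factor, one integer and one character column.
df <- data.frame(
  letters = factor(letters[1:5]),
  numbers = 1:5,
  note    = c("v", "w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

subdf <- refactor_all(subset(df, numbers <= 3))
print(levels(subdf$letters))  # only the levels still in use
print(class(subdf$note))      # non-factor columns are left untouched
```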
Votes : 73

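If you don't want this behaviour, don't use factors; use character vectors instead. I think this makes more sense than patching things up afterwards. Try the following before loading your data with read.table or read.csv (from R 4.0.0 onwards, these functions no longer convert strings to factors by default, so this step is only needed on older versions):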

options(stringsAsFactors = FALSE) 

The disadvantage is that you're restricted to alphabetical ordering (reorder is your friend for plots).
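
A short sketch of this approach, assuming an inline CSV in place of a real file and a made-up scores vector for the ordering: the column is read as character, so subsetting leaves no stale levels behind, and reorder() builds a factor with a data-driven level order only when one is needed.

```r
# Assumed inline CSV standing in for a real file.
csv <- "letters,numbers\na,1\nb,2\nc,3\nd,4\ne,5"

# Passing stringsAsFactors = FALSE per call has the same effect as the option.
df <- read.csv(text = csv, stringsAsFactors = FALSE)
subdf <- subset(df, numbers <= 3)

print(class(subdf$letters))   # a character column carries no levels
print(unique(subdf$letters))

# Hypothetical scores used only to illustrate reorder():
# levels are sorted by the mean score of each value.
scores <- c(3, 1, 2)
ordered_letters <- reorder(subdf$letters, scores)
print(levels(ordered_letters))
```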

Votes : 67

It is a known issue, and one possible remedy is provided by drop.levels() in the gdata package, where your example becomes:

> drop.levels(subdf)
  letters numbers
1       a       1
2       b       2
3       c       3
> levels(drop.levels(subdf)$letters)
[1] "a" "b" "c"

There is also the dropUnusedLevels function in the Hmisc package. However, it only works by altering the subset operator [ and is not applicable here.

As a corollary, a direct approach on a per-column basis is a simple as.factor(as.character(data)):

> levels(subdf$letters)
[1] "a" "b" "c" "d" "e"
> subdf$letters <- as.factor(as.character(subdf$letters))
> levels(subdf$letters)
[1] "a" "b" "c"
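
One difference worth knowing when choosing between the approaches: the round trip through character re-sorts the levels, while factor() and droplevels() keep the existing level order. A minimal sketch, assuming a made-up factor with a deliberately non-alphabetical order:

```r
# Assumed factor with a custom (non-alphabetical) level order.
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))

# Subsetting leaves "medium" behind as an unused level.
sub <- sizes[sizes != "medium"]

kept     <- levels(factor(sub))                   # original order preserved
dropped  <- levels(droplevels(sub))               # same result as factor()
resorted <- levels(as.factor(as.character(sub)))  # rebuilt in sorted order

print(kept)
print(dropped)
print(resorted)
```

For factors whose levels are already alphabetical, as in the letters example, all three give the same result.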
Votes : 50

Another way of doing the same, this time with dplyr:

library(dplyr)
subdf <- df %>%
  filter(numbers <= 3) %>%
  droplevels()
str(subdf)

Edit:

Calling droplevels without parentheses in the pipe also works (thanks to agenis):

subdf <- df %>%
  filter(numbers <= 3) %>%
  droplevels
levels(subdf$letters)
