Factors in R

Factors in R:

Factors in R are a data type used to store categorical variables, whose values can be either numbers or strings. The most important advantage of converting an integer or character vector to a factor is that statistical modeling functions (such as lm() and glm()) then treat the variable correctly as categorical: they estimate a separate effect for each level instead of treating the values as a single numeric quantity. The factor() function converts a numeric or character vector to a factor.
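A minimal sketch of what factor() produces. Internally a factor stores integer codes plus a set of levels, and modeling functions expand it into indicator (dummy) columns. The vectors sizes and nf are illustrative data, not from the original example.

```r
# Illustrative data (not from the article)
sizes <- c("small", "large", "medium", "small", "large")
f <- factor(sizes)

levels(f)      # levels are sorted: "large" "medium" "small"
as.integer(f)  # underlying integer codes: 3 1 2 3 1
nlevels(f)     # 3

# Converting a numeric factor back to numbers: go through the labels,
# because as.numeric() on a factor returns the integer codes instead
nf <- factor(c(10, 20, 10))
as.numeric(nf)                # 1 2 1 (the codes)
as.numeric(as.character(nf))  # 10 20 10 (the original values)

# In a model, a factor becomes one indicator column per non-baseline level
mm <- model.matrix(~ f)
colnames(mm)   # "(Intercept)" "fmedium" "fsmall"
```

The first level ("large" here) acts as the baseline under R's default treatment contrasts, which is why it has no column of its own.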

We can also set the order of a factor's levels as we wish. Pass the optional levels argument a vector containing the levels in the desired order, and set the optional argument ordered = TRUE to make the factor ordered, so that R treats the levels as ranked from first to last.

Now let's look at an example.

# Factors in R: Convert to Factor without order.
data = c(4,5,5,4,4,3,5,4,5,4,5,4,3,3)
fdata = factor(data) # converting numeric to factor
print(fdata)


# Factors in R: Convert to Factor with desired order.
week=c("sunday","monday","tuesday","wednesday","thursday","friday","saturday","wednesday","tuesday","thursday","wednesday")
table(week)

week_ordered=factor(week,levels=c("sunday","monday","tuesday","wednesday","thursday","friday","saturday"),ordered=TRUE)
table(week_ordered)

When we execute the above code, it produces the following result:

 [1] 4 5 5 4 4 3 5 4 5 4 5 4 3 3
Levels: 3 4 5

week
   friday    monday  saturday    sunday  thursday   tuesday wednesday
        1         1         1         1         2         2         3

week_ordered
   sunday    monday   tuesday wednesday  thursday    friday  saturday
        1         1         2         3         2         1         1

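The output shows the difference between the two: table(week) lists the days alphabetically, because without a levels argument the levels are sorted, while table(week_ordered) lists them in the order given by levels, from sunday to saturday.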
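The levels argument alone fixes the order in which levels are listed; ordered = TRUE additionally marks the factor as ordinal, so comparison operators and functions such as max() respect that order. A short sketch, redefining the week vector from the example so the snippet runs on its own (the names days and week_levels are illustrative):

```r
week <- c("sunday", "monday", "tuesday", "wednesday", "thursday", "friday",
          "saturday", "wednesday", "tuesday", "thursday", "wednesday")
days <- c("sunday", "monday", "tuesday", "wednesday", "thursday",
          "friday", "saturday")

# levels only: custom order, but still an unordered (nominal) factor;
# week_levels < "wednesday" would return NA with a warning
week_levels <- factor(week, levels = days)
is.ordered(week_levels)    # FALSE

# levels + ordered = TRUE: an ordinal factor
week_ordered <- factor(week, levels = days, ordered = TRUE)
is.ordered(week_ordered)   # TRUE

# Comparisons follow the level order (sunday < monday < ... < saturday)
sum(week_ordered < "wednesday")  # sunday, monday and two tuesdays: 4
sum(week_ordered > "thursday")   # friday and saturday: 2
max(week_ordered)                # saturday
```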

 

Generating Factor Levels:

We can generate a factor with a regular pattern of levels by using the gl() function. It takes two integers as input: the first gives the number of levels and the second gives the number of times each level is repeated.

Syntax:

gl(n, k, labels)

The parameters are:

  • n is an integer giving the number of levels.
  • k is an integer giving the number of consecutive replications of each level.
  • labels is an optional vector of labels for the resulting factor levels.
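One way to see what gl(n, k) produces is to build the same factor by hand with rep() and factor(): each of the n levels is repeated k times in a row. A minimal sketch using the labels from the example that follows (labs, v1 and v2 are illustrative names):

```r
labs <- c("India", "USA", "Russia")

# gl(): 3 levels, each repeated 4 times consecutively
v1 <- gl(3, 4, labels = labs)

# The same factor built by hand: repeat each label k = 4 times and pass
# levels explicitly so they keep the given order instead of alphabetical
v2 <- factor(rep(labs, each = 4), levels = labs)

identical(as.character(v1), as.character(v2))  # TRUE
identical(levels(v1), levels(v2))              # TRUE
```

Note that gl() keeps the labels in the order supplied, whereas factor() without a levels argument would sort them alphabetically.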

Example:

v <- gl(3, 4, labels = c("India", "USA", "Russia"))
print(v)

When we execute the above code, it produces the following result:

Result:

 [1] India  India  India  India  USA    USA    USA    USA    Russia Russia
[11] Russia Russia
Levels: India USA Russia
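The simplified syntax above omits two further arguments of gl(): length, the total length of the result (default n*k), and ordered. When length exceeds n*k, the pattern is recycled, which is a convenient way to produce alternating levels. A sketch for a hypothetical two-group design (grp and dose are illustrative names):

```r
# k = 1 and length = 8: the two levels alternate until 8 values are produced
grp <- gl(2, 1, length = 8, labels = c("control", "treatment"))
as.character(grp)  # "control" "treatment" repeated 4 times
table(grp)         # 4 of each level

# ordered = TRUE returns an ordered factor, as factor(..., ordered = TRUE) does
dose <- gl(3, 2, labels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(dose)   # TRUE
```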


Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning and has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.