All About Data Frame in R – R Data frame Explained

A data frame is a list of vectors of equal length. In R, data frames are used for storing data tables.

The following are the characteristics of a data frame:

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor or character type.
  • Each column should contain the same number of data items.
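These characteristics can be checked directly in R. The following is a minimal sketch, not part of the original example; demo_df and its columns x and y are made-up names used only for illustration.

```r
# demo_df is a hypothetical data frame used only for this illustration.
demo_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A data frame is a list whose elements are the columns.
is_list <- is.list(demo_df)

# Every column holds the same number of items.
col_lengths <- sapply(demo_df, length)

# Columns whose lengths cannot be recycled to a common length are rejected.
# tryCatch() turns the resulting error into a message so the script keeps running.
unequal <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "columns must have compatible lengths"
)

print(is_list)
print(col_lengths)
print(unequal)
```

Note that R recycles a shorter vector when its length divides the longer one evenly, so the error only appears for lengths such as 3 and 2.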

Create Data Frame in R:

The following variable, students_df, is a data frame containing two vectors, subject and percentage.

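# Create a data frame in R.
subject <- c("English", "Maths", "Chemistry", "Physics")  # vector 1, named subject
percentage <- c(80, 100, 85, 95)                          # vector 2, named percentage
# Combine both vectors into a data frame. stringsAsFactors = TRUE stores subject
# as a factor, which was the default before R 4.0.0 and matches the output shown.
students_df <- data.frame(subject, percentage, stringsAsFactors = TRUE)
print(students_df)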

When we execute the above code, it produces the following result, which is in the form of a table:

    subject percentage
1   English         80
2     Maths        100
3 Chemistry         85
4   Physics         95
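The Factor entries that appear in the outputs of this article come from strings being converted to factors. That conversion was the default before R 4.0.0; from R 4.0.0 onward strings stay as character vectors unless stringsAsFactors = TRUE is passed. The sketch below shows both behaviours; the object names as_character and as_factor are illustrative assumptions.

```r
# Assumption: columns are passed as named arguments for clarity.
as_character <- data.frame(
  subject = c("English", "Maths"),
  percentage = c(80, 100),
  stringsAsFactors = FALSE   # keep strings as character vectors
)
as_factor <- data.frame(
  subject = c("English", "Maths"),
  percentage = c(80, 100),
  stringsAsFactors = TRUE    # convert strings to factors
)

print(class(as_character$subject))
print(class(as_factor$subject))
print(levels(as_factor$subject))
```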

Rename the columns of Data Frame in R:

You can rename the columns of a data frame in R by assigning a vector of new names to names(df).

  names(students_df)<-c("Course","Score")

When we execute the above code, subject is renamed to Course and percentage is renamed to Score. Printing students_df then produces the following result:

     Course Score
1   English    80
2     Maths   100
3 Chemistry    85
4   Physics    95
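Assigning to names(df) replaces every column name at once. To rename a single column and leave the others alone, one option is to index names(df) by the old name. This is a small sketch on a made-up data frame, demo_df, rather than the article's students_df.

```r
# demo_df is a hypothetical data frame used only for this illustration.
demo_df <- data.frame(subject = c("English", "Maths"), percentage = c(80, 100))

# Replace only the name that currently equals "percentage".
names(demo_df)[names(demo_df) == "percentage"] <- "Score"

print(names(demo_df))
```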

 

Accessing elements of Data frame in R:

nrow(students_df) # number of rows in data frame
ncol(students_df) # number of columns in data frame.
dim(students_df) # Dimension of data frame
students_df[1,2] # Access first row and second column of the data frame
students_df[,1] # Access all the elements of the first column

When we execute the above code, it produces the following result:

[1] 4
[1] 2
[1] 4 2
[1] 80
[1] English   Maths     Chemistry Physics
Levels: Chemistry English Maths Physics
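Selecting a single column with [ , ] returns a plain vector (or factor), not a data frame. The sketch below, on a made-up data frame demo_df, shows access by column name with [[ ]] and the drop = FALSE argument, which keeps the result as a one-column data frame.

```r
# demo_df is a hypothetical data frame used only for this illustration.
demo_df <- data.frame(Course = c("English", "Maths"), Score = c(80, 100))

by_position <- demo_df[, 2]                # numeric vector
by_name     <- demo_df[["Score"]]          # the same vector, selected by name
as_frame    <- demo_df[, 2, drop = FALSE]  # one-column data frame

print(by_position)
print(by_name)
print(as_frame)
```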

 

 

Working with data frame in Detail: Get the Structure of the Data Frame

The structure of a data frame in R can be inspected with the str() function.

str(students_df)

The above code produces the following output. The columns appear under their new names, Course and Score:

'data.frame':    4 obs. of  2 variables:
 $ Course: Factor w/ 4 levels "Chemistry","English",..: 2 3 1 4
 $ Score : num  80 100 85 95
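str() prints a report meant for reading. When the column types are needed inside a program, one common approach is to apply class() to every column with sapply(). A minimal sketch on a made-up data frame, demo_df:

```r
# demo_df is a hypothetical data frame used only for this illustration.
demo_df <- data.frame(
  Course = c("English", "Maths"),
  Score = c(80, 100),
  stringsAsFactors = TRUE
)

# Named character vector: one class per column.
col_classes <- sapply(demo_df, class)
print(col_classes)
```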

 

Summary of the data frame:

A statistical summary of the data in each column can be obtained by applying the summary() function.

  summary(students_df)

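The above code produces the following output. Factor columns are summarised by counts per level, and numeric columns by their minimum, quartiles, median, mean and maximum:

       Course      Score
 Chemistry:1   Min.   : 80.00
 English  :1   1st Qu.: 83.75
 Maths    :1   Median : 90.00
 Physics  :1   Mean   : 90.00
               3rd Qu.: 96.25
               Max.   :100.00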
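The numeric part of the summary can be reproduced with base R functions, which helps when only one of the statistics is needed. The sketch below uses the four scores from the example; the variable names are illustrative.

```r
scores <- c(80, 100, 85, 95)

score_mean      <- mean(scores)
score_median    <- median(scores)
# quantile() uses linear interpolation by default, as summary() does.
score_quartiles <- quantile(scores, probs = c(0.25, 0.75))

print(score_mean)
print(score_median)
print(score_quartiles)
```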

 

Extract Data from Data Frame in R:

We can extract a specific column from a data frame in R using its name. This is done with the "$" symbol: Dataframe_name$Column_name returns the values of that column.

This is illustrated below with an employee table.

# Create the data frame.
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("john", "marsh", "mitchell", "lara", "peter"),
  salary = c(6213, 1515, 4113, 3729, 2843),
  joining_date = as.Date(c("2012-12-01", "2014-07-23", "2012-11-15", "2015-05-11", "2016-03-27")),
  stringsAsFactors = FALSE
)

A table named emp.data is created with the columns emp_id, emp_name, salary and joining_date. Printing emp.data shows the following data frame:

  emp_id emp_name salary joining_date
1      1     john   6213   2012-12-01
2      2    marsh   1515   2014-07-23
3      3 mitchell   4113   2012-11-15
4      4     lara   3729   2015-05-11
5      5    peter   2843   2016-03-27

As explained earlier, Dataframe_name$Column_name returns the values of the specified column.

emp.data$emp_name

This produces the following output, a character vector of employee names:

[1] "john"     "marsh"    "mitchell" "lara"     "peter"
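Because "$" returns an ordinary vector, the result can be passed straight to other functions. The sketch below rebuilds two of the employee columns in a separate data frame, emp_demo (an illustrative name), so that it runs on its own.

```r
# emp_demo is a hypothetical copy of two columns from the employee table.
emp_demo <- data.frame(
  emp_name = c("john", "marsh", "mitchell", "lara", "peter"),
  salary = c(6213, 1515, 4113, 3729, 2843),
  stringsAsFactors = FALSE
)

# $ and [[ ]] select the same column.
same_column <- identical(emp_demo$salary, emp_demo[["salary"]])

# Use the extracted vectors in further computations.
average_salary <- mean(emp_demo$salary)
top_earner <- emp_demo$emp_name[which.max(emp_demo$salary)]

print(same_column)
print(average_salary)
print(top_earner)
```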

 

The following code extracts the employee name and salary columns from the emp.data data frame and creates a separate data frame named result.

# Get columns of a data frame in R and create a new data frame from them.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

The above code produces the following output. Note that the new column names are derived from the expressions passed to data.frame():

  emp.data.emp_name emp.data.salary
1              john            6213
2             marsh            1515
3          mitchell            4113
4              lara            3729
5             peter            2843
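If the original column names should be kept, an alternative is to select the columns by name with the [Row, Column] syntax instead of rebuilding the data frame. A minimal sketch, using a shortened stand-in table called emp_demo (an illustrative name):

```r
# emp_demo is a hypothetical, shortened version of the employee table.
emp_demo <- data.frame(
  emp_id = 1:3,
  emp_name = c("john", "marsh", "mitchell"),
  salary = c(6213, 1515, 4113),
  stringsAsFactors = FALSE
)

# Select two columns by name; the names are carried over unchanged.
result_demo <- emp_demo[, c("emp_name", "salary")]

print(names(result_demo))
print(result_demo)
```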

 

Now let's extract the first two rows with all the columns, using the syntax dataframe_name[Row, Column]. If the column index is left empty, all columns are selected.

# extract first two rows with all the columns

result <- emp.data[1:2,]
print(result)

The above code produces the following output:

  emp_id emp_name salary joining_date
1      1     john   6213   2012-12-01
2      2    marsh   1515   2014-07-23

 

Now let's extract the first 3 columns with all the rows. We use the same syntax, dataframe_name[Row, Column], and leave the row index empty.

# extract first 3 columns with all the rows

result <- emp.data[,1:3]
print(result)

The above code produces the following output, which has all the rows and the first three columns:

  emp_id emp_name salary
1      1     john   6213
2      2    marsh   1515
3      3 mitchell   4113
4      4     lara   3729
5      5    peter   2843

 

Now let's extract the 3rd and 5th rows with the 2nd and 4th columns.

# Extract 3rd and 5th row with 2nd and 4th column of data frame in R.
result <- emp.data[c(3,5),c(2,4)]
print(result)

The above code produces the following output:

  emp_name joining_date
3 mitchell   2012-11-15
5    peter   2016-03-27
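The Row and Column positions in dataframe_name[Row, Column] also accept logical conditions and negative indices. The sketch below is an illustration on a stand-in table, emp_demo (an assumed name); the salary threshold of 3000 is arbitrary.

```r
# emp_demo is a hypothetical stand-in for the employee table.
emp_demo <- data.frame(
  emp_id = 1:5,
  emp_name = c("john", "marsh", "mitchell", "lara", "peter"),
  salary = c(6213, 1515, 4113, 3729, 2843),
  stringsAsFactors = FALSE
)

# Logical condition in the row position: keep rows with salary above 3000.
high_paid <- emp_demo[emp_demo$salary > 3000, ]

# Negative index in the column position: drop the first column.
without_id <- emp_demo[, -1]

print(high_paid)
print(without_id)
```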

 

 

Row bind and column bind operations on data frame in R:

A data frame in R can be expanded by adding columns and rows. The simplest way to add a column is to assign a vector to a new column name using the "$" symbol, as in the example below.

# Add the "dept" column.

emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

 

The above code produces the following output:

  emp_id emp_name salary joining_date       dept
1      1     john   6213   2012-12-01         IT
2      2    marsh   1515   2014-07-23 Operations
3      3 mitchell   4113   2012-11-15         IT
4      4     lara   3729   2015-05-11         HR
5      5    peter   2843   2016-03-27    Finance
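A new column can also be computed from existing ones, and the assigned vector has to fit the number of rows. The sketch below is illustrative only: emp_demo, the bonus column and the 10% rate are assumptions, not part of the employee example.

```r
# emp_demo is a hypothetical, shortened version of the employee table.
emp_demo <- data.frame(
  emp_name = c("john", "marsh", "mitchell"),
  salary = c(6213, 1515, 4113),
  stringsAsFactors = FALSE
)

# Add a column computed from an existing column (hypothetical 10% bonus).
emp_demo$bonus <- emp_demo$salary * 0.10

# Assigning 2 values to a 3-row data frame fails, so the column is not added.
outcome <- tryCatch(
  {
    emp_demo$dept <- c("IT", "HR")
    "added"
  },
  error = function(e) "length mismatch"
)

print(emp_demo)
print(outcome)
```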

 

Alternatively, we can use the row bind function rbind() and the column bind function cbind() to add rows and columns to a data frame. cbind() attaches one or more vectors to an existing data frame as new columns (calling cbind() on plain vectors alone returns a matrix, not a data frame), and rbind() appends the rows of one data frame to another.

# Create a vector called designation.
designation <- c("Entry level", "Manager", "Technical specialist", "Entry level", "Senior Level")
emp.table <- cbind(emp.data, designation)
print(emp.table)

The above code uses the column bind function cbind(), which binds the vector named designation to the data frame emp.data and produces the following result:

  emp_id emp_name salary joining_date       dept          designation
1      1     john   6213   2012-12-01         IT          Entry level
2      2    marsh   1515   2014-07-23 Operations              Manager
3      3 mitchell   4113   2012-11-15         IT Technical specialist
4      4     lara   3729   2015-05-11         HR          Entry level
5      5    peter   2843   2016-03-27    Finance         Senior Level
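With cbind(), the new column takes its name from the variable that was passed in. Passing the vector as a named argument sets the column name explicitly. The sketch below uses a stand-in table, emp_demo, and a made-up column name, job_level; it also shows that cbind() on plain vectors returns a matrix.

```r
# emp_demo is a hypothetical, shortened version of the employee table.
emp_demo <- data.frame(
  emp_id = 1:2,
  emp_name = c("john", "marsh"),
  stringsAsFactors = FALSE
)

# The argument name (job_level, a made-up name) becomes the column name.
emp_demo_wide <- cbind(emp_demo, job_level = c("Entry level", "Manager"))

# Binding only plain vectors gives a matrix rather than a data frame.
vectors_only <- cbind(1:2, 3:4)

print(names(emp_demo_wide))
print(is.data.frame(emp_demo_wide))
print(is.matrix(vectors_only))
```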

Similarly, the row bind function rbind() binds two data frames into a single data frame. The prerequisite is that both data frames have the same number of columns and the same column names.

Let's look at an example of the row bind function rbind().

# Create the second data frame.
emp.table.new <- data.frame(
  emp_id = 6:8,
  emp_name = c("lila", "raj", "tera"),
  salary = c(2413, 5415, 6413),
  joining_date = as.Date(c("2014-03-01", "2015-07-23", "2016-11-15")),
  dept = c("IT", "Finance", "Operations"),
  designation = c("Manager", "Entry level", "Senior Level"),
  stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.table.final <- rbind(emp.table, emp.table.new)
print(emp.table.final)

The above code uses the row bind function rbind(), which appends the rows of emp.table.new to emp.table and produces another data frame, emp.table.final, with the following result:

  emp_id emp_name salary joining_date       dept          designation
1      1     john   6213   2012-12-01         IT          Entry level
2      2    marsh   1515   2014-07-23 Operations              Manager
3      3 mitchell   4113   2012-11-15         IT Technical specialist
4      4     lara   3729   2015-05-11         HR          Entry level
5      5    peter   2843   2016-03-27    Finance         Senior Level
6      6     lila   2413   2014-03-01         IT              Manager
7      7      raj   5415   2015-07-23    Finance          Entry level
8      8     tera   6413   2016-11-15 Operations         Senior Level
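For data frames, rbind() matches columns by name rather than by position, so the column order may differ between the two inputs, but the names themselves must agree. The sketch below illustrates both cases with small made-up data frames (first, reordered and mismatched are illustrative names).

```r
# Hypothetical data frames used only for this illustration.
first <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)

# Same column names in a different order: rbind() lines them up by name.
reordered <- data.frame(name = "z", id = 3L, stringsAsFactors = FALSE)
combined <- rbind(first, reordered)

# A different column name ("label" instead of "name") makes rbind() fail.
mismatched <- data.frame(id = 4L, label = "w", stringsAsFactors = FALSE)
outcome <- tryCatch(
  {
    rbind(first, mismatched)
    "bound"
  },
  error = function(e) "column names do not match"
)

print(combined)
print(outcome)
```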

 


Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, Sridhar has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.