Select random rows from PostgreSQL

In PostgreSQL, In order to select random rows from a table one can use TABLESAMPLE clause or the RANDOM() function in combination with the ORDER BY and LIMIT clauses. Let’s see how to

  • Get the random rows from PostgreSQL using RANDOM() function.
  • Get the random rows from PostgreSQL using TABLESAMPLE clause
  • Random N percent of the rows from Postgresql Table
  • Random N rows from Postgresql Table

 

Method 1: Using the TABLESAMPLE Clause

The TABLESAMPLE clause allows you to retrieve a sample of rows from a table. It is important to note that TABLESAMPLE is supported in PostgreSQL versions 9.5 and later.

 

SELECT * FROM your_table TABLESAMPLE BERNOULLI (percentage);
  • BERNOULLI (percentage): Specifies the percentage of the table to be sampled.

 

We will be using Student_detail table.

Select random rows from PostgreSQL 1

For example, to select approximately 30% of the rows from the ”student_detail” table:

 

SELECT *
FROM student_detail
TABLESAMPLE BERNOULLI (30);

Output:

Select random rows from PostgreSQL 2

 

 

Method 2: Using the RANDOM() Function

The RANDOM() function generates a random number between 0 and 1. By combining this with ORDER BY and LIMIT, you can select a specified number of random rows.

 


SELECT * FROM your_table ORDER BY RANDOM() LIMIT number_of_rows;

 

We will be using Student_detail table.

Select random rows from PostgreSQL 3

 

Example 1 : Select Random N Number of rows in PostgreSQL:

For example, to select 5 random rows from the student_detail table:

 

SELECT *
FROM student_detail
ORDER BY RANDOM()
LIMIT 5;

Output:

Select random rows from PostgreSQL 4

 

 

Example 2 : Select Random N Percentage of rows in PostgreSQL:

 

select * from student_detail where random() < 0.7

We will be generating random numbers between 0 and 1, then will be selecting with rows less than 0.7. So the resultant table will be

Output:

Select random rows from PostgreSQL 5

 

Performance Comparison of both methods

  • ORDER BY RANDOM(): This method is straightforward but can be inefficient for large tables because it requires generating a random number for each row and then sorting the entire table.
  • TABLESAMPLE: This method can be more efficient for large tables as it retrieves a sample of rows directly from the storage engine without the need to sort the entire table.

 

 

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

    View all posts