The pandas method DataFrame.describe() generates a statistical summary of the numerical columns in a DataFrame. The summary includes key metrics such as the count, mean, standard deviation, minimum, maximum and selected percentiles.

What is the syntax for pandas’ describe() function?

The basic syntax of describe() for DataFrames is simple. It looks like this:

DataFrame.describe(percentiles=None, include=None, exclude=None)

Important parameters for pandas’ DataFrame.describe()

Using the following parameters, you can adjust the output of describe():

| Parameter | Description | Default value |
| --- | --- | --- |
| percentiles | List of percentiles (values between 0 and 1) to include in the summary | None (uses [.25, .5, .75]) |
| include | Specifies which data types to include in the description; possible values are 'all', a list of data types such as numpy.number or object, or None (numeric columns only) | None |
| exclude | Specifies which data types to exclude from the description; accepts a list of data types in the same way as include | None |
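The effect of include and exclude is easiest to see on a DataFrame that mixes text and numbers. The following is a minimal sketch; the DataFrame df_mixed and its values are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data, used only for illustration
df_mixed = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Quantity': [10, 20, 15],
})

# Default behavior: only numeric columns are summarized
numeric_summary = df_mixed.describe()

# include='all': every column is summarized; statistics that do not
# apply to a column (e.g. the mean of 'Product') are shown as NaN
full_summary = df_mixed.describe(include='all')

# exclude=[np.number]: numeric columns are left out, so only 'Product'
# remains and is described with count, unique, top and freq
non_numeric_summary = df_mixed.describe(exclude=[np.number])

print(numeric_summary)
print(full_summary)
print(non_numeric_summary)
```

For non-numeric columns, the summary rows change from mean, std and percentiles to count, unique, top and freq.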

Definition

Statistical percentiles are values that divide a sorted dataset into 100 equal parts. The p-th percentile is the value below which roughly p percent of the data points fall. Common examples are the 25th percentile, the median (50th percentile) and the 75th percentile. Together they give a clearer picture of how the data is distributed.
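As a small illustration of this definition, single percentiles can be computed with Series.quantile(). This is a sketch on hypothetical values; by default pandas interpolates linearly when a percentile falls between two data points.

```python
import pandas as pd

# Hypothetical values, used only for illustration
values = pd.Series([5, 10, 15, 20, 30])

# 50th percentile (median): the middle value of the sorted data
median = values.quantile(0.5)

# 25th percentile: falls exactly on the second sorted value
q25 = values.quantile(0.25)

# 10th percentile: lies between 5 and 10, so pandas interpolates linearly
q10 = values.quantile(0.10)

print(median, q25, q10)
```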

Examples of how to use pandas describe()

If you need a quick overview of the key statistical metrics of a dataset, the pandas DataFrame.describe() function is extremely useful.

Example 1: Statistical summary of numerical data

In the following example, we take a look at the DataFrame df, which contains sales data for five products.

import pandas as pd
import numpy as np
# Example DataFrame with sales data
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
print(df)

Now, you can use pandas describe() to get a statistical summary of the numerical columns:

summary = df.describe()
print(summary)

The output of the pandas DataFrame.describe() function is as follows:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000

The key metrics shown in the output are:

  • count: Number of non-NaN (Not a Number) entries
  • mean: Average of the values (also accessible via DataFrame.mean())
  • std: Sample standard deviation of the values (also accessible via DataFrame.std())
  • min, 25%, 50%, 75%, max: Minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values
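Each row of the summary corresponds to a regular DataFrame method, so individual metrics can be reproduced on their own. The sketch below rebuilds two columns of the sales data so that it runs independently; the variable names ending in _row are chosen for illustration only.

```python
import pandas as pd

# Two numeric columns from the sales example
df = pd.DataFrame({
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
})
summary = df.describe()

# The same metrics computed with the individual methods
count_row = df.count()          # non-NaN entries per column
mean_row = df.mean()            # arithmetic mean per column
std_row = df.std()              # sample standard deviation (ddof=1)
q25_row = df.quantile(0.25)     # 25th percentile per column

# Compare one value from the summary with the direct calculation
print(summary.loc['std', 'Quantity'], std_row['Quantity'])
```

Calling the individual methods is useful when only one metric is needed, while describe() is more convenient for a first overview.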

Example 2: Customizing percentiles

You can customize the percentiles in the pandas DataFrame.describe() output with the percentiles parameter:

# Statistical summary with custom percentiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)

This function call provides the following output:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000

The output now contains the 10th, 50th and 90th percentiles instead of the default 25th, 50th and 75th percentiles shown in the previous example.
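Because the summary is itself a DataFrame, single values can be read from it with .loc, using the percentile label as the row name. The following sketch rebuilds only the Revenue column of the example so that it runs on its own.

```python
import pandas as pd

# Revenue column from the sales example
df = pd.DataFrame({'Revenue': [1000, 3000, 3000, 400, 3600]})

custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

# Row labels are strings such as '10%' and '90%'
revenue_p10 = custom_summary.loc['10%', 'Revenue']
revenue_p90 = custom_summary.loc['90%', 'Revenue']

print(revenue_p10, revenue_p90)
```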
