The DataFrame.mean() function in Python pandas calculates averages along the rows or columns of a DataFrame. Pandas mean() is essential for analyzing numerical data, because the mean summarizes the central tendency of a set of values. On its own, however, it says little about how those values are spread.

What is the syntax for DataFrame.mean()?

The pandas mean() function accepts up to three parameters and has the following syntax:

DataFrame.mean(axis=0, skipna=True, numeric_only=False)

What parameters can be used with pandas DataFrame.mean()?

You can use different parameters to customize how pandas DataFrame.mean() works.

Parameter | Description | Default value
axis | Axis along which the mean is calculated: 0 returns one mean per column, 1 returns one mean per row | 0
skipna | If set to True, NaN values are ignored | True
numeric_only | If set to True, only numeric data types are included in the calculation | False
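
The numeric_only parameter is easiest to understand with a DataFrame that mixes text and numbers. The following is a minimal sketch; the DataFrame `mixed` and its column names are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

# Hypothetical mixed-type data: 'name' holds strings, the other columns hold numbers.
mixed = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'score': [1.0, 2.0, 3.0],
    'count': [10, 20, 30],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the string column 'name' is left out of the result.
numeric_means = mixed.mean(numeric_only=True)
print(numeric_means)
```

The result contains entries for score and count only. Leaving numeric_only at its default on a DataFrame like this may raise an error in recent pandas versions, because strings cannot be averaged.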

How to use pandas mean()

You can apply the pandas DataFrame.mean() function to both columns and rows.

Calculating average values for columns

First, we’re going to create a pandas DataFrame with some numerical data:

import pandas as pd
data = {
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
}
df = pd.DataFrame(data)
print(df)

The resulting DataFrame looks like this:

   A  B   C
0  1  4   7
1  2  5   8
2  3  6   9
3  4  7  10

To calculate the average of each column, you can use the pandas mean() function. By default, the axis parameter is set to 0, which means the values are averaged down the rows and you get one mean per column.

column_means = df.mean()
print(column_means)

The code above calculates the mean for each column (A, B and C) by finding the sum of the elements in the respective column and then dividing it by the number of elements in the column. The result is the following pandas Series:

A    2.5
B    5.5
C    8.5
dtype: float64
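
The calculation described above (column sum divided by number of elements) can be reproduced by hand as a sanity check. This is only a sketch; the variable name `manual_means` is an assumption for illustration, and the shortcut is valid here because the DataFrame contains no NaN values.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
})

# Sum of each column divided by the number of rows.
manual_means = df.sum() / len(df)
print(manual_means)

# Compare the manual result with the built-in method.
print(manual_means.equals(df.mean()))
```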

Calculating average values for rows

If you want to find the average for rows, simply set the axis parameter to 1:

row_means = df.mean(axis=1)
print(row_means)

Pandas mean() calculates row averages by dividing the sum of elements in a row by the number of elements it has. Calling the function above produces the following output:

0    4.0
1    5.0
2    6.0
3    7.0
dtype: float64
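
Row averages are often more useful when they sit next to the data they describe. One possible approach, sketched below, is to attach them as an extra column with assign(); the column name `row_mean` is a hypothetical choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
})

# assign() returns a new DataFrame with the extra column;
# the original df is left unchanged.
with_mean = df.assign(row_mean=df.mean(axis=1))
print(with_mean)
```

The row means are calculated before the new column is attached, so row_mean itself does not influence the averages.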

Handling NaN values

In this example, we’ll use a different DataFrame, which contains NaN values:

import pandas as pd
import numpy as np
data = {
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan]
}
df = pd.DataFrame(data)
print(df)

The code above produces the following DataFrame:

     A    B    C
0  1.0  4.0  7.0
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  4.0  7.0  NaN

When calculating the averages for columns, the skipna parameter determines whether NaN values should be included or ignored. By default, skipna is set to True, so df.mean() automatically ignores NaN values. If you want to include NaN values, you need to add skipna=False as a parameter. Doing so will cause any column with at least one NaN to return NaN as its mean.

mean_with_nan = df.mean() 
print(mean_with_nan)

Calling df.mean() produces the following output:

A    2.333333
B    5.666667
C    8.000000
dtype: float64
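
For comparison, here is a short sketch of the skipna=False case described earlier, using the same DataFrame. The variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan]
})

# With skipna=False, a single NaN in a column makes that column's mean NaN.
mean_keep_nan = df.mean(skipna=False)
print(mean_keep_nan)

# The same rule applies to rows: only rows without any NaN get a numeric mean.
row_means_keep_nan = df.mean(axis=1, skipna=False)
print(row_means_keep_nan)
```

Every column in this DataFrame contains one NaN, so all three column means come out as NaN. Only the first row is complete, so it is the only row that gets a numeric mean.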