How to filter for distinct values with pandas DataFrame[].unique()
In Python pandas, you can use the unique() function to identify the unique values in a column of a DataFrame. This makes it easy to get a quick overview of the different values within your dataset.
What is the syntax of pandas DataFrame[].unique()?
The basic syntax of pandas unique() is simple, because the function doesn't take any parameters:
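```python
DataFrame['column_name'].unique()
```
Keep in mind that unique() can only be applied to one column at a time: it is a method of a single column (a pandas Series), not of the whole DataFrame. Before calling the function, you need to select the column you want to evaluate. For most column types, unique() returns a NumPy array that contains each distinct value once, in the order in which the values first appear; for extension types such as categorical data, pandas returns its own array type instead. Missing values (NaN) are included in the result. The values are not sorted.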
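The following sketch illustrates these points on a small DataFrame. The Fruit and Qty columns are hypothetical sample data chosen for illustration. It shows that the result follows the order of first appearance, that a DataFrame itself has no unique() method, one way to collect the unique values of several columns (a dictionary comprehension over the columns, assumed here as a convenience, not a dedicated pandas feature) and how to sort the result when you need it sorted.

```python
import pandas as pd

# Hypothetical sample data with repeated values
df = pd.DataFrame({
    "Fruit": ["pear", "apple", "pear", "banana", "apple"],
    "Qty": [3, 1, 3, 2, 5],
})

# unique() is called on one column (a Series), not on the DataFrame
fruits = df["Fruit"].unique()
print(list(fruits))  # ['pear', 'apple', 'banana'] (order of first appearance)

# The DataFrame itself has no unique() method
print(hasattr(df, "unique"))  # False

# To cover several columns, call unique() on each column separately
uniques_per_column = {col: df[col].unique() for col in df.columns}
print(uniques_per_column["Qty"].tolist())  # [3, 1, 2, 5]

# unique() does not sort; sort the result yourself if needed
print(sorted(fruits))  # ['apple', 'banana', 'pear']
```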
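If you've been working with Python for a while, you may be familiar with numpy.unique(), the NumPy equivalent of pandas unique(). The two behave differently: numpy.unique() returns the distinct values in sorted order, whereas pandas unique() is hash-table based and keeps the order of first appearance. The pandas documentation also describes its version as significantly faster than numpy.unique() for long enough sequences, which is why the pandas version is generally preferable.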
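To make the difference concrete, here is a small comparison on a made-up Series of integers. It shows that numpy.unique() sorts its result, while pandas unique() keeps the order of first appearance. The last two lines are one way (an illustration, not the only approach) to recover the order of appearance from NumPy, using the return_index option of numpy.unique().

```python
import numpy as np
import pandas as pd

# Hypothetical values with duplicates, in no particular order
values = pd.Series([3, 1, 3, 2, 1])

# pandas: hash-table based, keeps the order of first appearance
pandas_result = values.unique()
print(pandas_result)  # [3 1 2]

# NumPy: returns the distinct values sorted
arr = values.to_numpy()
numpy_result = np.unique(arr)
print(numpy_result)  # [1 2 3]

# Recovering the order of appearance with NumPy needs an extra step:
# return_index gives the position of each value's first occurrence
uniq, first_idx = np.unique(arr, return_index=True)
in_order = arr[np.sort(first_idx)]
print(in_order)  # [3 1 2]
```

The extra sorting and indexing in the NumPy version is exactly the work that pandas unique() avoids when you only need the distinct values in their original order.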
How to use pandas DataFrame[].unique()
To use unique() on a pandas DataFrame, you first need to select the column you want to check. The following example uses a DataFrame that contains the names, ages and cities of residence of a group of individuals.
```python
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
```
The resulting DataFrame looks like this:
```
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22     New York
3    David   32      Chicago
4   Edward   29  Los Angeles
```
Now, let's say we want a list of all the cities where the people in the DataFrame live. To get it, we apply pandas unique() to the column that contains the cities:
```python
# Find the different cities
unique_cities = df['City'].unique()
print(unique_cities)
```
The output is a NumPy array that lists each city once, in the order in which it first appears. It shows that the individuals in the DataFrame live in three different cities: New York, Los Angeles and Chicago.
```
['New York' 'Los Angeles' 'Chicago']
```
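Since unique() returns an array rather than a Python list, you may want to convert the result when you need an actual list. The sketch below rebuilds the same sample DataFrame, converts the unique cities with tolist() and counts them. It also shows Series.nunique(), which returns the number of distinct values directly; note that nunique() excludes NaN by default, whereas unique() includes it.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
})

# unique() returns an array; tolist() turns it into a plain Python list
city_list = df['City'].unique().tolist()
print(city_list)  # ['New York', 'Los Angeles', 'Chicago']

# The number of distinct cities is the length of that list
print(len(city_list))  # 3

# nunique() returns the count directly (NaN is excluded by default)
print(df['City'].nunique())  # 3
```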