How to select data from pandas DataFrames with loc[]
In the Python pandas library, DataFrame.loc[] is a property that lets you select data from a DataFrame using labels. This makes it easy to extract specific rows and columns from a DataFrame.
What is the syntax for pandas loc[]?
The syntax for loc[] is simple. You pass the labels of the rows and columns you want to select inside the square brackets:
DataFrame.loc[selection]
With pandas loc[], selections are primarily made using labels. This means the value you provide can be a single label, a list of labels or a slice of labels. Boolean arrays can also be used.
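The four kinds of selection described above can be sketched as follows. This is a minimal illustration only: the DataFrame and its string row labels ('a', 'b', 'c') are invented for this example.

```python
import pandas as pd

# Hypothetical DataFrame with string row labels, invented for illustration
df = pd.DataFrame(
    {'Name': ['Alyssa', 'Brandon', 'Carmen'], 'Age': [23, 35, 30]},
    index=['a', 'b', 'c'],
)

single = df.loc['a']                  # single label: returns a Series
several = df.loc[['a', 'c']]          # list of labels: returns a DataFrame
sliced = df.loc['a':'b']              # label slice: includes both 'a' and 'b'
masked = df.loc[[True, False, True]]  # Boolean array: keeps rows marked True

print(sliced)
```

Note that, unlike ordinary Python slices, a label slice with loc[] includes the end label.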
What is the difference between loc[] and iloc[]?
While pandas DataFrame.loc[] selects data based on labels, DataFrame.iloc[] selects data based on integer positions. Here’s a code example to help illustrate the difference. First, we’re going to create a pandas DataFrame:
import pandas as pd
# Example DataFrame
data = {'Name': ['Alyssa', 'Brandon', 'Carmen'], 'Age': [23, 35, 30]}
df = pd.DataFrame(data)
print(df)
Here’s what the DataFrame looks like:
      Name  Age
0   Alyssa   23
1  Brandon   35
2   Carmen   30
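To extract “Alyssa” from the DataFrame, you can use either pandas loc[] or iloc[]. Because this DataFrame uses the default index, its row labels (0, 1, 2) happen to match the row positions. Although the approach differs, the result is the same: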
# Using loc and labels to extract Alyssa
print(df.loc[0, 'Name'])  # Output: Alyssa
# Using iloc and integer positions to extract Alyssa
print(df.iloc[0, 0])  # Output: Alyssa
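The difference becomes visible once the row labels no longer match the row positions. The sketch below assumes a hypothetical index of 10, 20 and 30, which is not part of the example above:

```python
import pandas as pd

# Hypothetical example: the row labels are 10, 20, 30 instead of 0, 1, 2
df = pd.DataFrame(
    {'Name': ['Alyssa', 'Brandon', 'Carmen'], 'Age': [23, 35, 30]},
    index=[10, 20, 30],
)

by_label = df.loc[10, 'Name']   # the row labeled 10
by_position = df.iloc[0, 0]     # the first row, first column

print(by_label, by_position)

# No row carries the label 0, so a label-based lookup fails
try:
    df.loc[0]
except KeyError:
    print('No row with label 0')
```

Here loc[] needs the label 10 to reach the first row, while iloc[] still reaches it with position 0.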
How to use pandas DataFrame.loc[]
Pandas loc[] helps you extract subsets of your DataFrame. With loc[], you can extract a single row or column, multiple rows and columns or even apply conditions for filtering. This flexibility makes it suitable for a variety of use cases.
Selecting a single row
Let’s look at a DataFrame example:
import pandas as pd
data = {
    'Name': ['Alyssa', 'Brandon', 'Carmen'],
    'Age': [23, 35, 30],
    'City': ['Detroit', 'Atlanta', 'Seattle']
}
df = pd.DataFrame(data)
print(df)
Here’s what the resulting DataFrame looks like:
      Name  Age     City
0   Alyssa   23  Detroit
1  Brandon   35  Atlanta
2   Carmen   30  Seattle
To select the row that contains information about Brandon (index label 1), you can use pandas loc[]:
brandon_data = df.loc[1]
print(brandon_data)
Here’s the result:
Name    Brandon
Age          35
City    Atlanta
Name: 1, dtype: object
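A single label returns a Series, as shown above. If you would rather get a DataFrame back, or want several rows at once, you can pass a list of labels instead. The following is a small sketch that rebuilds the same example data; the variable names are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alyssa', 'Brandon', 'Carmen'],
    'Age': [23, 35, 30],
    'City': ['Detroit', 'Atlanta', 'Seattle'],
})

row_as_series = df.loc[1]    # single label: returns a Series
row_as_frame = df.loc[[1]]   # one-element list: returns a one-row DataFrame
two_rows = df.loc[[0, 2]]    # rows labeled 0 and 2

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(two_rows)
```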
Selecting multiple columns
You can also use DataFrame.loc[] to select a subset of columns. The following code selects the columns “Name” and “City”:
name_city = df.loc[:, ['Name', 'City']]
print(name_city)
The result is a subset of the original DataFrame:
      Name     City
0   Alyssa  Detroit
1  Brandon  Atlanta
2   Carmen  Seattle
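Row and column selections can also be combined in one call, and columns can be selected with a label slice. The sketch below rebuilds the same example data and is meant only as an illustration of those two variations:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alyssa', 'Brandon', 'Carmen'],
    'Age': [23, 35, 30],
    'City': ['Detroit', 'Atlanta', 'Seattle'],
})

# Column slice by label: both endpoints are included, so 'Age' comes along too
name_to_city = df.loc[:, 'Name':'City']

# Rows labeled 0 through 1 (inclusive), restricted to two columns
subset = df.loc[0:1, ['Name', 'City']]

print(subset)
```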
Selecting rows based on conditions
With pandas loc[], you can also select rows that meet specific criteria. You can do this with Boolean comparison operators. For example, here’s how to select all individuals who are older than 25:
older_than_25 = df.loc[df['Age'] > 25]
print(older_than_25)
The code above produces a DataFrame that only includes the individuals who are older than 25:
      Name  Age     City
1  Brandon   35  Atlanta
2   Carmen   30  Seattle
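Conditions can be combined with & (and) and | (or), and a column selection can follow the condition. Here is a minimal sketch using the same example data; the second condition on “City” is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alyssa', 'Brandon', 'Carmen'],
    'Age': [23, 35, 30],
    'City': ['Detroit', 'Atlanta', 'Seattle'],
})

# Each condition needs its own parentheses when combined with & or |
older_not_atlanta = df.loc[(df['Age'] > 25) & (df['City'] != 'Atlanta')]

# Condition for the rows, label for the column: returns a Series of names
names_older = df.loc[df['Age'] > 25, 'Name']

print(older_not_atlanta)
print(names_older)
```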