After the built-in package ‘re’ has been imported, regular ex­pres­sions, or RegEx for short, can be used to search for certain patterns within strings.

What is Python RegEx?

RegEx, or regular ex­pres­sions, are strings that are used to define specific search patterns. Once a search pattern has been defined, you can check Python strings for the specified pattern. Python RegEx has its own syntax and semantics.

Python RegEx ap­pli­ca­tions

Regular ex­pres­sions are often used to check user input that needs to fit a specific format.

An example you might be familiar with is when a user needs to create a password that contains at least one uppercase letter and one number. Python RegEx can be used to check the input against these rules.
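The password rule described above could be checked roughly as follows. This is only a minimal sketch: the function name is_valid_password is hypothetical, and a real password policy would usually include more rules, such as a minimum length.

```python
import re

def is_valid_password(password):
    """Return True if the password has at least one uppercase letter and one digit."""
    has_upper = re.search(r"[A-Z]", password) is not None
    has_digit = re.search(r"[0-9]", password) is not None
    return has_upper and has_digit

print(is_valid_password("Secret1"))  # True
print(is_valid_password("secret1"))  # False, no uppercase letter
print(is_valid_password("Secret"))   # False, no digit
```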

Regular ex­pres­sions are also used for online forms to check if user input is valid. For example, they can check whether or not a user has entered a valid email address format when filling out a form or reg­is­ter­ing for a website.
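As a rough illustration of such a format check, the sketch below tests whether a string has the general shape of an email address. The pattern and the helper name looks_like_email are assumptions made for this example. The pattern is deliberately simple and does not cover every address that the email standards allow. It uses re.fullmatch(), which only succeeds if the whole string fits the pattern.

```python
import re

# Simplified pattern (an assumption for this sketch, not a complete validator):
# some characters, an "@", some characters, a dot, some characters,
# with no whitespace and no additional "@" anywhere.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

def looks_like_email(text):
    """Return True if the whole string has a plausible email shape."""
    return re.fullmatch(EMAIL_PATTERN, text) is not None

print(looks_like_email("user@example.com"))   # True
print(looks_like_email("user@example"))       # False, no dot after the "@"
print(looks_like_email("user name@mail.com")) # False, contains a space
```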

What are the semantics and syntax for Python RegEx?

Regular expressions use metacharacters. Each of these characters has a specific meaning and a distinct function within a pattern. The following table gives an overview of the most important metacharacters, along with their meaning and an example:

Characters | Description | Example
. | Stands for any character except a newline | “he..o” -> matches “he” followed by any two characters and then “o”, e.g., “hello”
[] | Matches any one of the characters specified between the brackets | “[a-e]” -> matches any lowercase letter between a and e
^ | Checks if a string starts with a specified character or string | “^hello” -> checks if the string starts with “hello”
$ | Checks if a string ends with a specified character or string | “world$” -> checks if the string ends with “world”
* | Zero or more occurrences of the preceding character | “a*” -> matches any number of a’s as well as no a’s at all
+ | One or more occurrences of the preceding character | “a+” -> matches at least one occurrence of a
? | Zero or one occurrence of the preceding character | “a?” -> matches exactly one a or none
{} | Checks if the preceding character occurs as often as specified in the curly braces | “hel{2}o” -> matches the string “hello”
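The following sketch tries out several of these metacharacters. The sample words and strings are hypothetical and chosen only for illustration. re.search() and re.findall() are used here to run the patterns, and re.fullmatch() checks whether an entire string fits a pattern.

```python
import re

# Hypothetical sample words, chosen only for this illustration
words = ["hello", "heyyo", "helo", "world"]

# "." stands for any single character, so "he..o" needs exactly two
# characters between "he" and "o"
dot_matches = [w for w in words if re.search(r"he..o", w)]
print(dot_matches)  # ['hello', 'heyyo']

# "^" anchors the pattern at the start of the string, "$" at the end
print(bool(re.search(r"^hello", "hello world")))  # True
print(bool(re.search(r"^hello", "say hello")))    # False
print(bool(re.search(r"world$", "hello world")))  # True

# "+" needs at least one occurrence, "{2}" exactly two, "?" zero or one
print(re.findall(r"a+", "banana caaandy"))      # ['a', 'a', 'a', 'aaa']
print(bool(re.fullmatch(r"hel{2}o", "hello")))  # True
print(bool(re.fullmatch(r"colou?r", "color")))  # True
```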

Sets

Sets are RegEx patterns that start and end with a square bracket. These are very important for Python RegEx. The table above shows an example of a set that finds all lowercase letters between a and e. Below is an overview of sets in Python RegEx:

Set | Description
[abc] | Matches any one of the characters specified between the brackets (i.e., a, b, or c)
[^abc] | Matches any character that is not specified inside the brackets
[a-z] | Matches any lowercase letter between a and z
[a-zA-Z] | Matches any letter between a and z in upper- or lowercase
[0-9] | Matches any digit between 0 and 9
[1-5][0-9] | Matches any two-digit number between 10 and 59

As you can see, sets are a powerful tool for building regular expressions. However, keep in mind that most of the metacharacters presented in the first table lose their special meaning inside square brackets. For example, the set [+] matches every + in a string. The exceptions are ^ directly after the opening bracket, which negates the set, and - between two characters, which defines a range.
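A short sketch of sets in action, using a hypothetical sample text and variable names chosen for this example:

```python
import re

text = "Room 42+7, Level B"  # hypothetical sample text

lower_a_to_e = re.findall(r"[a-e]", text)              # lowercase letters a to e
digits = re.findall(r"[0-9]", text)                    # single digits
plus_signs = re.findall(r"[+]", text)                  # "+" is a literal inside a set
not_letter_or_space = re.findall(r"[^a-zA-Z ]", text)  # negated set
two_digit = re.findall(r"[1-5][0-9]", "9 10 59 60")    # numbers from 10 to 59

print(lower_a_to_e)         # ['e', 'e']
print(digits)               # ['4', '2', '7']
print(plus_signs)           # ['+']
print(not_letter_or_space)  # ['4', '2', '+', '7', ',']
print(two_digit)            # ['10', '59']
```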

Sequences

In addition to metachar­ac­ters, there are also special, pre­de­fined sequences for creating precise search patterns in Python RegEx.

Sequence | Description | Example
\A | Matches if the specified string is found at the beginning of a string | “\AMonday”
  • Matches for “Monday is a great day.”
  • Does not match for “It’s Monday.”
\b | Matches if the specified string is found at the beginning or at the end of a word | “\bes”
  • Matches for “especially”
  • Does not match for “west”
  • Matches for “The symbol for the element Einsteinium is Es.” only if case is ignored (e.g., with re.IGNORECASE), since matching is case-sensitive by default
\b | (same as above) | “es\b”
  • Matches for “watches”
  • Matches for “She teaches math.”
  • Does not match for “rest”
\B | Matches if the specified string is not found at the beginning or end of a word (opposite of \b) | “\Bes”
  • Does not match for “The symbol for the element Einsteinium is Es.”
  • Does not match for “especially”
  • Matches for “west”
\B | (same as above) | “es\B”
  • Does not match for “watches”
  • Does not match for “She teaches math.”
  • Matches for “rest”
\d | Matches any digit between 0 and 9 (equivalent to [0-9] for ASCII text) | “123”
  • \d finds three matches: 1, 2, and 3
\D | Matches any character that is not a digit (equivalent to [^0-9] for ASCII text) | “123acb&”
  • \D finds four matches: a, c, b, and &
\s | Matches any whitespace character, such as a space, tab, or newline | “Python RegEx”
  • \s finds one match: the space between the two words
\S | Matches any character that is not whitespace (opposite of \s) | “Python RegEx”
  • \S finds eleven matches: every character except the space
\w | Matches any alphanumeric character or underscore | “1abc$%3”
  • \w finds five matches: 1, a, b, c, and 3
\W | Matches any character that \w does not match (opposite of \w) | “1abc$%3”
  • \W finds two matches: $ and %
\Z | Matches if the specified string is at the end of a string | “Python\Z”
  • Matches for “RegEx in Python”
  • Does not match for “Python RegEx”
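The sequences can be tried out with a few short calls. The sketch below reuses the example strings from the table. The patterns are written as raw strings (r"...") so that Python passes the backslashes to the regex engine unchanged. Note that matching is case-sensitive unless a flag such as re.IGNORECASE is passed.

```python
import re

# Word boundaries: \b at the start or end of a word, \B inside a word
print(bool(re.search(r"\bes", "especially")))  # True
print(bool(re.search(r"\bes", "west")))        # False
print(bool(re.search(r"es\b", "watches")))     # True
print(bool(re.search(r"\Bes", "west")))        # True

# Matching is case-sensitive by default, so "Es" only matches with IGNORECASE
sentence = "The symbol for the element Einsteinium is Es."
print(bool(re.search(r"\bes", sentence)))                 # False
print(bool(re.search(r"\bes", sentence, re.IGNORECASE)))  # True

# Character classes
print(re.findall(r"\d", "123"))                # ['1', '2', '3']
print(re.findall(r"\w", "1abc$%3"))            # ['1', 'a', 'b', 'c', '3']
print(re.findall(r"\W", "1abc$%3"))            # ['$', '%']
print(len(re.findall(r"\S", "Python RegEx")))  # 11

# \A and \Z anchor the pattern at the start and end of the string
print(bool(re.search(r"\AMonday", "It's Monday.")))    # False
print(bool(re.search(r"Python\Z", "RegEx in Python"))) # True
```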

What functions can I use for Python RegEx?

Several predefined functions will assist you when using RegEx in Python. These functions are located in a Python module called “re”. You’ll need to import this module before you can start working with regular expressions:

import re

re.findall()

The findall() function is one of the most frequently used functions in Python RegEx. It takes a search pattern and a Python string and returns a Python list. The list contains all matches as strings, in the order in which they were found. The findall() call returns an empty list if no match is found.

The following code example il­lus­trates this function:

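import re

string = "python 3.0"
regex = r"\D"  # raw string, so the backslash reaches the regex engine unchanged
result = re.findall(regex, string)  # collect every character that is not a digit
print(result)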

Notice that in the code snippet above the re module is imported first. The “string” variable is then used to store the string “python 3.0”. The search pattern stored in the “regex” variable is in the sequence table and matches all char­ac­ters that are not digits. The findall() function carries out the matching. It takes the search pattern as an argument and examines the string. The list returned by the function is stored in the “result” variable and is output to the screen with a call to Python print. The output looks like this:

['p', 'y', 't', 'h', 'o', 'n', ' ', '.']

The list contains every character from the string except the digits. Keep in mind that the space character counts as a separate character and as such appears in the list.
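Combining a sequence with a metacharacter lets findall() return longer matches instead of single characters. The sketch below uses a hypothetical sample string and also shows the empty list that is returned when nothing matches:

```python
import re

text = "python 3.0 and python 2.7"  # hypothetical sample string

digit_runs = re.findall(r"\d+", text)     # runs of one or more digits
versions = re.findall(r"\d+\.\d+", text)  # digits, a literal dot, digits
no_match = re.findall(r"\d", "python")    # nothing to find

print(digit_runs)  # ['3', '0', '2', '7']
print(versions)    # ['3.0', '2.7']
print(no_match)    # []
```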

re.sub()

The sub() function replaces all matches with a text of your choice. Like findall(), this function takes a regular expression as the first parameter. As the second parameter, you pass the text that should replace the matches. The third parameter is the string to be searched. If you only want to replace a certain number of matches, you can specify that number with the optional fourth parameter, count. It indicates how many matches should be replaced, starting with the first match. Passing count as a keyword argument is the safer choice, because passing it positionally is deprecated as of Python 3.13.

The following code example will help to clarify how this works:

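import re

string = "python is a great programming language"
regex = r"\s"  # raw string pattern that matches a whitespace character
result1 = re.sub(regex, "0", string)  # replace every match
print(result1)
result2 = re.sub(regex, "0", string, count=2)  # replace only the first two matches
print(result2)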

As you can see, re is imported first and a string is stored in the variable “string”. The search pattern should match all spaces in the string.

This is followed by two similar calls to sub(). The first call replaces every space in the passed string with a 0 and stores the result in the variable “result1”. The second call limits the number of replacements using the optional fourth parameter, count. Only the first two spaces in the passed string are replaced with a 0, and the result is stored in the variable “result2”.

The code’s output will look like this:

python0is0a0great0programming0language
python0is0a great programming language
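Two further uses of substitution, shown as a small sketch with hypothetical sample strings: collapsing runs of whitespace into a single space, and counting replacements with re.subn(), a related function from the same module that returns the new string together with the number of substitutions made.

```python
import re

# Collapse any run of whitespace (spaces, tabs) into one space
collapsed = re.sub(r"\s+", " ", "python   is \t great")
print(collapsed)  # python is great

# Limit the number of replacements with the count keyword
limited = re.sub(r"\s", "_", "a b c d", count=2)
print(limited)  # a_b_c d

# subn() also reports how many replacements were made
new_text, replacements = re.subn(r"\s", "_", "a b c d")
print(new_text, replacements)  # a_b_c_d 3
```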

re.split()

The split() function from the re module is similar to the built-in Python split() function, with both allowing you to split a string into a list. In this function, the first parameter is a search pattern, and the second parameter contains the string that should be split. The string is split at every position where the regular expression matches. If you want to limit how often the string is split, you can pass a number as the optional third parameter, maxsplit. This determines the maximum number of splits. As with sub(), it is safer to pass this value as a keyword argument, because passing it positionally is deprecated as of Python 3.13. Here’s an example of how this works:

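import re

string = "python is a great programming language"
regex = r"\s"  # raw string pattern that matches a whitespace character
result1 = re.split(regex, string)  # split at every match
print(result1)
result2 = re.split(regex, string, maxsplit=1)  # split at the first match only
print(result2)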

Most of the code in this example is similar to the previous example. The split() function call is the only dif­fer­ence. The split() function is called on the string and should split it every time a space occurs. The resulting list is assigned to the variable “result1”. The second split() call limits the number of splits to 1 by spec­i­fy­ing the optional third parameter. It assigns the result to the variable named “result2”. The results are as follows when the program is executed:

['python', 'is', 'a', 'great', 'programming', 'language']
['python', 'is a great programming language']
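Unlike the built-in split(), re.split() can split on several different delimiters at once. The following sketch uses a hypothetical input string and splits at commas or semicolons, including any whitespace that follows them:

```python
import re

data = "python, regex;split,  example"  # hypothetical input string

# Split at "," or ";" followed by any amount of whitespace
parts = re.split(r"[,;]\s*", data)
print(parts)  # ['python', 'regex', 'split', 'example']

# With maxsplit=1, only the first delimiter is used
first_split = re.split(r"[,;]\s*", data, maxsplit=1)
print(first_split)  # ['python', 'regex;split,  example']
```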

re.search()

The search() function searches a string for a match. It takes the regular expression as the first parameter and the string you want to examine as the second parameter. It then returns a Python match object that describes the first match found. If no match is found, the function returns the value “None”. To better understand how the function works, take a look at the example below:

import re

string = "python is a great programming language"
regex = r"\s"  # raw string pattern that matches a whitespace character
match = re.search(regex, string)  # match object, or None if nothing matches
if match:
    print("RegEx was found.")
else:
    print("RegEx was not found.")

The search() function is called with a regular expression that searches for whitespace, and a string. The match object returned by the function call is stored in the “match” variable. A Python if-else statement then checks the result: if a match is found, search() returns a match object, which evaluates as true, so the if-path is chosen. The program prints the following output:

RegEx was found.
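Because search() returns None when nothing matches, it is worth checking the result before using it. The sketch below wraps this check in a hypothetical helper called first_number and uses the group() method of the match object, which returns the matched text:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    if match is None:
        return None
    return match.group()  # the text that matched the pattern

print(first_number("Order 1234 shipped on day 5"))  # 1234
print(first_number("No digits here"))               # None
```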

What is the match object?

The match object is returned by a search() call and contains information about the match. You can access this information with various methods and attributes:

  • object.start() returns the index of the first character of the substring that matches your search pattern.
  • object.end() returns the index directly after the last character of the matching substring.
  • object.span() combines start() and end(). It returns a Python tuple containing the start and end index of the match.
  • object.string returns the string that you passed to search().
  • object.re returns the RegEx object created from the pattern that you passed to search().

You can get a better idea of these functions by adding the function calls to the last code example:

import re

string = "python is a great programming language"
regex = r"\s"
match = re.search(regex, string)
if match:
    print("RegEx was found.")
    # Only access the match object when a match actually exists
    print(match.start())
    print(match.end())
    print(match.span())
    print(match.string)
    print(match.re)  # re is an attribute, not a method, so no parentheses
else:
    print("RegEx was not found.")

The output looks like this:

RegEx was found.
6
7
(6, 7)
python is a great programming language
re.compile('\\s')

The string “RegEx was found.” is output because search() returned a match object, which makes the if-condition true. The start index of the first match is then displayed. The value may have been easy to guess, since the first space has the index 6. The value 7, which is output by calling the end() function, is the index directly after the match, so the match covers only the character at index 6. The tuple (6, 7) combines the results of start() and end(). The string passed to search() is also output as expected.

But what about the output “re.compile('\\s')”? This is a Python RegEx object, also called a pattern object. It is created when the string that you passed as a regular expression is compiled. You can access it through the re attribute of your match object.
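A RegEx object can also be created explicitly with re.compile() and then reused. The following sketch assumes the same example string as above. It shows that start() and end() can be used directly for slicing, and that the search() method of a compiled pattern accepts a start position as an optional second argument:

```python
import re

string = "python is a great programming language"
pattern = re.compile(r"\s")  # compile the pattern once, reuse it afterwards

match = pattern.search(string)
# end() points just past the match, so start() and end() work as slice bounds
matched_text = string[match.start():match.end()]
print(repr(matched_text))   # ' '
print(match.span())         # (6, 7)
print(match.re is pattern)  # True

# Continue searching after the first match to find the next space
second = pattern.search(string, match.end())
print(second.span())        # (9, 10)
```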
