Regular Expressions
Introduction#
A regular expression is a powerful way of specifying a pattern for a complex search.
REGEXP / RLIKE
The REGEXP
(or its synonym, RLIKE
) operator allows pattern matching based on regular expressions.
Consider the following employee
table:
+-------------+-------------+-------------+--------------+----------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | PHONE_NUMBER | SALARY |
+-------------+-------------+-------------+--------------+----------+
| 100 | Steven | King | 515.123.4567 | 24000.00 |
| 101 | Neena | Kochhar | 515.123.4568 | 17000.00 |
| 102 | Lex | De Haan | 515.123.4569 | 17000.00 |
| 103 | Alexander | Hunold | 590.423.4567 | 9000.00 |
| 104 | Bruce | Ernst | 590.423.4568 | 6000.00 |
| 105 | David | Austin | 590.423.4569 | 4800.00 |
| 106 | Valli | Pataballa | 590.423.4560 | 4800.00 |
| 107 | Diana | Lorentz | 590.423.5567 | 4200.00 |
| 108 | Nancy | Greenberg | 515.124.4569 | 12000.00 |
| 109 | Daniel | Faviet | 515.124.4169 | 9000.00 |
| 110 | John | Chen | 515.124.4269 | 8200.00 |
+-------------+-------------+-------------+--------------+----------+
Pattern ^
Select all employees whose FIRST_NAME
starts with N.
Query
SELECT * FROM employees WHERE FIRST_NAME REGEXP '^N'
-- Pattern start with----------------------------^
Pattern $**
Select all employees whose PHONE_NUMBER
ends with 4569.
Query
SELECT * FROM employees WHERE PHONE_NUMBER REGEXP '4569$'
-- Pattern end with----------------------------------^
NOT REGEXP
Select all employees whose FIRST_NAME
does not start with N.
Query
SELECT * FROM employees WHERE FIRST_NAME NOT REGEXP '^N'
-- Pattern does not start with---------------^
Regex Contain
Select all employees whose LAST_NAME
contains in and whose FIRST_NAME
contains a
.
Query
SELECT * FROM employees WHERE FIRST_NAME REGEXP 'a' AND LAST_NAME REGEXP 'in'
-- No ^ or $, pattern can be anywhere -------------------------------------^
Any character between [ ]
Select all employees whose FIRST_NAME
starts with A or B or C.
Query
SELECT * FROM employees WHERE FIRST_NAME REGEXP '^[ABC]'
-------------------------------------------------^^---^
Pattern or |
Select all employees whose FIRST_NAME
starts with A or B or C and ends with r, e, or i.
Query
SELECT * FROM employees WHERE FIRST_NAME REGEXP '^[ABC]|[rei]$'
-- ----------------------------------------------^^---^^^---^^
Counting regular expression matches
Consider the following query:
SELECT FIRST_NAME, FIRST_NAME REGEXP '^N' as matching FROM employees
FIRST_NAME REGEXP '^N'
is 1 or 0 depending on the fact that FIRST_NAME
matches ^N
.
To visualize it better:
SELECT
FIRST_NAME,
IF(FIRST_NAME REGEXP '^N', 'matches ^N', 'does not match ^N') as matching
FROM employees
Finally, count total number of matching and non-matching rows with:
SELECT
IF(FIRST_NAME REGEXP '^N', 'matches ^N', 'does not match ^N') as matching,
COUNT(*)
FROM employees
GROUP BY matching