Regular Expressions

Matching Simple Patterns

Match a single digit character using [0-9] or \d (Java)

[0-9] and \d are equivalent patterns (unless your Regex engine is unicode-aware and \d also matches things like ②). They will both match a single digit character so you can use whichever notation you find more readable.

Create a string of the pattern you wish to match. If using the \d notation, you will need to add a second backslash to escape the first backslash.

String pattern = "\\d";

Create a Pattern object. Pass the pattern string into the compile() method.

Pattern p = Pattern.compile(pattern);

Create a Matcher object. Pass the string you are looking to find the pattern in to the matcher() method. Check to see if the pattern is found.

Matcher m1 = p.matcher("0");
m1.matches(); //will return true

Matcher m2 = p.matcher("5");
m2.matches(); //will return true

Matcher m3 = p.matcher("12345");
m3.matches(); //will return false since your pattern is only for a single integer

Matching various numbers

[a-b] where a and b are digits in the range 0 to 9

[3-7] will match a single digit in the range 3 to 7.

Matching multiple digits

\d\d       will match 2 consecutive digits
\d+        will match 1 or more consecutive digits
\d*        will match 0 or more consecutive digits
\d{3}      will match 3 consecutive digits
\d{3,6}    will match 3 to 6 consecutive digits
\d{3,}     will match 3 or more consecutive digits

The \d in the above examples can be replaced with a number range:

[3-7][3-7]    will match 2 consecutive digits that are in the range 3 to 7
[3-7]+        will match 1 or more consecutive digits that are in the range 3 to 7
[3-7]*        will match 0 or more consecutive digits that are in the range 3 to 7
[3-7]{3}      will match 3 consecutive digits that are in the range 3 to 7
[3-7]{3,6}    will match 3 to 6 consecutive digits that are in the range 3 to 7
[3-7]{3,}     will match 3 or more consecutive digits that are in the range 3 to 7

You can also select specific digits:

[13579]       will only match "odd" digits
[02468]       will only match "even" digits
1|3|5|7|9     another way of matching "odd" digits - the | symbol means OR

Matching numbers in ranges that contain more than one digit:

\d|10        matches 0 to 10    single digit OR 10.  The | symbol means OR
[1-9]|10     matches 1 to 10    digit in range 1 to 9 OR 10
[1-9]|1[0-5] matches 1 to 15    digit in range 1 to 9 OR 1 followed by digit 1 to 5
\d{1,2}|100  matches 0 to 100   one to two digits OR 100

Matching numbers that divide by other numbers:

\d*0         matches any number that divides by 10  - any number ending in 0
\d*00        matches any number that divides by 100 - any number ending in 00
\d*[05]      matches any number that divides by 5   - any number ending in 0 or 5
\d*[02468]   matches any number that divides by 2   - any number ending in 0,2,4,6 or 8

matching numbers that divide by 4 - any number that is 0, 4 or 8 or ends in 00, 04, 08, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92 or 96

[048]|\d*(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96)

This can be shortened. For example, instead of using 20|24|28 we can use 2[048]. Also, as the 40s, 60s and 80s have the same pattern we can include them: [02468][048] and the others have a pattern too [13579][26]. So the whole sequence can be reduce to:

[048]|\d*([02468][048]|[13579][26])    - numbers divisible by 4

Matching numbers that don’t have a pattern like those divisible by 2,4,5,10 etc can’t always be done succinctly and you usually have to resort to a range of numbers. For example matching all numbers that divide by 7 within the range of 1 to 50 can be done simple by listing all those numbers:

7|14|21|28|35|42|49

or you could do it this way

7|14|2[18]|35|4[29]

Matching leading/trailing whitespace

Trailing spaces

\s*$: This will match any (*) whitespace (\s) at the end ($) of the text

Leading spaces

^\s*: This will match any (*) whitespace (\s) at the beginning (^) of the text

Remarks

\s is a common metacharacter for several RegExp engines, and is meant to capture whitespace characters (spaces, newlines and tabs for example). Note: it probably won’t capture all the unicode space characters. Check your engines documentation to be sure about this.

Match any float

[\+\-]?\d+(\.\d*)?

This will match any signed float, if you don’t want signs or are parsing an equation remove [\+\-]? so you have \d+(\.\d+)?

Explanation:

  • \d+ matches any integer
  • ()? means the contents of the parentheses are optional but always have to appear together
  • ’\.’ matches ’.’, we have to escape this since ’.’ normally matches any character

So this expression will match

5
+5
-5
5.5
+5.5
-5.5

Selecting a certain line from a list based on a word in certain location

I have the following list:

1. Alon Cohen
2. Elad Yaron
3. Yaron Amrani
4. Yogev Yaron

I want to select the first name of the guys with the Yaron surname.

Since I don’t care about what number it is I’ll just put it as whatever digit it is and a matching dot and space after it from the beginning of the line, like this: ^[\d]+\.\s.

Now we’ll have to match the space and the first name, since we can’t tell whether it’s capital or small letters we’ll just match both: [a-zA-Z]+\s or [a-Z]+\s and can also be [\w]+\s.

Now we’ll specify the required surname to get only the lines containing Yaron as a surname (at the end of the line): \sYaron$.

Putting this all together ^[\d]+\.\s[\w]+\sYaron$.

Live example: https://regex101.com/r/nW4fH8/1


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow