Matching Simple Patterns
Match a single digit character using [0-9] or \d (Java)
[0-9] and \d are equivalent patterns (unless your regex engine is Unicode-aware, in which case \d may also match decimal digits from other scripts, such as the Arabic-Indic digit ٣). Both match a single digit character, so you can use whichever notation you find more readable.
Create a string of the pattern you wish to match. If using the \d notation, you will need to add a second backslash to escape the first backslash.
String pattern = "\\d";
Create a Pattern object. Pass the pattern string into the compile() method.
Pattern p = Pattern.compile(pattern);
Create a Matcher object. Pass the string you are looking to find the pattern in to the matcher() method. Check to see if the pattern is found.
Matcher m1 = p.matcher("0");
m1.matches(); //will return true
Matcher m2 = p.matcher("5");
m2.matches(); //will return true
Matcher m3 = p.matcher("12345");
m3.matches(); //will return false: matches() requires the whole input to match, and the pattern is only for a single digit
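The steps above can be collected into one small runnable class. This is only a sketch: the class and method names (SingleDigitDemo, isSingleDigit, containsDigit) are made up for illustration. It also contrasts matches(), which must consume the whole input, with find(), which looks for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

class SingleDigitDemo {
    // "\\d" in Java source code is the regex \d; compile once and reuse.
    static final Pattern DIGIT = Pattern.compile("\\d");

    // True only when the entire input is exactly one digit.
    static boolean isSingleDigit(String input) {
        return DIGIT.matcher(input).matches();
    }

    // True when a digit occurs anywhere in the input.
    static boolean containsDigit(String input) {
        return DIGIT.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isSingleDigit("5"));     // true
        System.out.println(isSingleDigit("12345")); // false
        System.out.println(containsDigit("12345")); // true
        System.out.println(containsDigit("abc"));   // false
    }
}
```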
Matching various numbers
[a-b], where a and b are digits in the range 0 to 9 and a is not greater than b, matches a single digit in the range a to b.
For example, [3-7] will match a single digit in the range 3 to 7.
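As a quick check of the [3-7] range, the sketch below tests every single digit against it. The class name DigitRangeDemo and the helper inRange are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class DigitRangeDemo {
    static final Pattern THREE_TO_SEVEN = Pattern.compile("[3-7]");

    // True when the whole input is one digit between 3 and 7 inclusive.
    static boolean inRange(String input) {
        return THREE_TO_SEVEN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Prints false for 0-2, true for 3-7, false for 8-9.
        for (int d = 0; d <= 9; d++) {
            String s = String.valueOf(d);
            System.out.println(s + " -> " + inRange(s));
        }
    }
}
```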
Matching multiple digits
\d\d will match 2 consecutive digits
\d+ will match 1 or more consecutive digits
\d* will match 0 or more consecutive digits
\d{3} will match 3 consecutive digits
\d{3,6} will match 3 to 6 consecutive digits
\d{3,} will match 3 or more consecutive digits
The \d in the above examples can be replaced with a number range:
[3-7][3-7] will match 2 consecutive digits that are in the range 3 to 7
[3-7]+ will match 1 or more consecutive digits that are in the range 3 to 7
[3-7]* will match 0 or more consecutive digits that are in the range 3 to 7
[3-7]{3} will match 3 consecutive digits that are in the range 3 to 7
[3-7]{3,6} will match 3 to 6 consecutive digits that are in the range 3 to 7
[3-7]{3,} will match 3 or more consecutive digits that are in the range 3 to 7
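The quantifiers listed above can be tried out with Pattern.matches, which compiles a pattern and tests it against the whole input in one call. A minimal sketch follows; QuantifierDemo and fullMatch are names chosen here for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Convenience wrapper: true when the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d\\d", "42"));         // true
        System.out.println(fullMatch("\\d+", ""));             // false: needs at least one digit
        System.out.println(fullMatch("\\d*", ""));             // true: zero digits is allowed
        System.out.println(fullMatch("\\d{3}", "1234"));       // false: exactly three required
        System.out.println(fullMatch("\\d{3,6}", "12345"));    // true
        System.out.println(fullMatch("\\d{3,}", "1234567"));   // true
        System.out.println(fullMatch("[3-7]{3}", "357"));      // true
        System.out.println(fullMatch("[3-7]{3}", "358"));      // false: 8 is outside 3 to 7
    }
}
```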
You can also select specific digits:
[13579] will only match "odd" digits
[02468] will only match "even" digits
1|3|5|7|9 another way of matching "odd" digits - the | symbol means OR
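To illustrate that the character class and the alternation are interchangeable here, the following sketch runs both against every digit. It uses String.matches for brevity; the class and method names are illustrative.

```java
class OddEvenDemo {
    // Character-class form.
    static boolean isOddDigit(String s) {
        return s.matches("[13579]");
    }

    // Alternation form; should agree with isOddDigit for every input.
    static boolean isOddDigitAlt(String s) {
        return s.matches("1|3|5|7|9");
    }

    static boolean isEvenDigit(String s) {
        return s.matches("[02468]");
    }

    public static void main(String[] args) {
        for (int d = 0; d <= 9; d++) {
            String s = String.valueOf(d);
            System.out.println(s + " odd=" + isOddDigit(s)
                    + " oddAlt=" + isOddDigitAlt(s)
                    + " even=" + isEvenDigit(s));
        }
    }
}
```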
Matching numbers in ranges that contain more than one digit:
\d|10 matches 0 to 10: a single digit OR 10. The | symbol means OR
[1-9]|10 matches 1 to 10: a digit in the range 1 to 9 OR 10
[1-9]|1[0-5] matches 1 to 15: a digit in the range 1 to 9 OR 1 followed by a digit in the range 0 to 5
\d{1,2}|100 matches 0 to 100: one or two digits OR 100
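One way to gain confidence in range patterns like these is to compare them against plain integer comparison. The sketch below does that for every integer from 0 to 200, using whole-input matching. The helper matchesExactlyRange is a hypothetical name, and the check assumes numbers are written without leading zeros (\d{1,2} would also accept an input such as 07).

```java
import java.util.regex.Pattern;

class NumberRangeDemo {
    // True when, for every n in 0..200, the regex fully matches the decimal
    // form of n exactly when lo <= n <= hi.
    static boolean matchesExactlyRange(String regex, int lo, int hi) {
        Pattern p = Pattern.compile(regex);
        for (int n = 0; n <= 200; n++) {
            boolean matched = p.matcher(Integer.toString(n)).matches();
            boolean expected = n >= lo && n <= hi;
            if (matched != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesExactlyRange("\\d|10", 0, 10));          // true
        System.out.println(matchesExactlyRange("[1-9]|10", 1, 10));        // true
        System.out.println(matchesExactlyRange("[1-9]|1[0-5]", 1, 15));    // true
        System.out.println(matchesExactlyRange("\\d{1,2}|100", 0, 100));   // true
    }
}
```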
Matching numbers that divide by other numbers:
\d*0 matches any number that divides by 10 - any number ending in 0
\d*00 matches any number that divides by 100 - any number ending in 00
\d*[05] matches any number that divides by 5 - any number ending in 0 or 5
\d*[02468] matches any number that divides by 2 - any number ending in 0,2,4,6 or 8
Matching numbers that divide by 4: any number that is 0, 4 or 8, or that ends in 00, 04, 08, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92 or 96:
[048]|\d*(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96)
This can be shortened. For example, instead of using 20|24|28 we can use 2[048]. Also, as the 00s, 40s, 60s and 80s follow the same pattern, we can include them all: [02468][048]. The others follow a pattern too: [13579][26]. So the whole sequence can be reduced to:
[048]|\d*([02468][048]|[13579][26]) - numbers divisible by 4
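The divisibility patterns in this section can be checked against the % operator. The sketch below is one way to do it: agreesWithModulo is a hypothetical helper that tests every integer from 1 up to a limit. The loop starts at 1 rather than 0, since a pattern such as \d*00 does not match the single character 0.

```java
import java.util.regex.Pattern;

class DivisibilityDemo {
    // Compares a regex against the % operator for every integer in 1..limit.
    // Returns true only if "regex fully matches n" and "n % divisor == 0"
    // agree for every n.
    static boolean agreesWithModulo(String regex, int divisor, int limit) {
        Pattern p = Pattern.compile(regex);
        for (int n = 1; n <= limit; n++) {
            boolean byRegex = p.matcher(Integer.toString(n)).matches();
            boolean byArithmetic = n % divisor == 0;
            if (byRegex != byArithmetic) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithModulo("\\d*0", 10, 10000));       // true
        System.out.println(agreesWithModulo("\\d*00", 100, 10000));     // true
        System.out.println(agreesWithModulo("\\d*[05]", 5, 10000));     // true
        System.out.println(agreesWithModulo("\\d*[02468]", 2, 10000));  // true
        // Shortened divisible-by-4 pattern from the text.
        System.out.println(agreesWithModulo(
                "[048]|\\d*([02468][048]|[13579][26])", 4, 10000));     // true
    }
}
```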
Matching numbers that don’t follow a digit pattern like those divisible by 2, 4, 5, 10 etc. can’t always be done succinctly, and you usually have to resort to listing a range of numbers. For example, matching all numbers that divide by 7 within the range of 1 to 50 can be done simply by listing all those numbers:
7|14|21|28|35|42|49
Or you could shorten the list by grouping numbers that share a first digit:
7|14|2[18]|35|4[29]
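Both forms can be verified against arithmetic for the stated range of 1 to 50. The following is a small sketch with illustrative names; note that neither pattern says anything about multiples of 7 above 50, such as 56.

```java
import java.util.regex.Pattern;

class MultiplesOfSevenDemo {
    static final Pattern LISTED = Pattern.compile("7|14|21|28|35|42|49");
    static final Pattern COMPACT = Pattern.compile("7|14|2[18]|35|4[29]");

    // True when both patterns agree with n % 7 == 0 for every n in 1..50.
    static boolean bothCorrectUpTo50() {
        for (int n = 1; n <= 50; n++) {
            String s = Integer.toString(n);
            boolean expected = n % 7 == 0;
            if (LISTED.matcher(s).matches() != expected
                    || COMPACT.matcher(s).matches() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(bothCorrectUpTo50());                 // true
        System.out.println(COMPACT.matcher("56").matches());     // false: outside the covered range
    }
}
```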
Matching leading/trailing whitespace
Trailing spaces
\s*$: This will match any (*) whitespace (\s) at the end ($) of the text
Leading spaces
^\s*: This will match any (*) whitespace (\s) at the beginning (^) of the text
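A common use of these two patterns is trimming text. The sketch below applies them with String.replaceAll; the class and method names are invented for illustration. Whitespace inside the text is left untouched because neither pattern can match there.

```java
class TrimDemo {
    // Removes whitespace at the beginning of the text.
    static String removeLeading(String text) {
        return text.replaceAll("^\\s*", "");
    }

    // Removes whitespace at the end of the text.
    static String removeTrailing(String text) {
        return text.replaceAll("\\s*$", "");
    }

    public static void main(String[] args) {
        String raw = "  hello world \t\n";
        // Brackets make the remaining whitespace visible.
        System.out.println("[" + removeLeading(raw) + "]");
        System.out.println("[" + removeTrailing(raw) + "]");
        System.out.println("[" + removeTrailing(removeLeading(raw)) + "]"); // [hello world]
    }
}
```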
Remarks
\s is a common metacharacter in several regex engines, and is meant to match whitespace characters (spaces, newlines and tabs, for example). Note: it probably won’t match all the Unicode space characters. Check your engine’s documentation to be sure about this.
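As an illustration of the Unicode caveat in Java specifically: by default \s covers only a handful of ASCII whitespace characters, while compiling with the Pattern.UNICODE_CHARACTER_CLASS flag widens it to the Unicode White_Space property. The sketch below uses the EM SPACE character (U+2003) as a test case; other engines behave differently, so treat this as an example for Java only.

```java
import java.util.regex.Pattern;

class UnicodeWhitespaceDemo {
    // Default \s in Java: ASCII whitespace only.
    static final Pattern DEFAULT_WS = Pattern.compile("\\s");
    // With this flag, \s follows the Unicode White_Space property.
    static final Pattern UNICODE_WS =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesDefault(String s) {
        return DEFAULT_WS.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE_WS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String emSpace = "\u2003"; // EM SPACE
        System.out.println(matchesDefault(" "));      // true
        System.out.println(matchesDefault(emSpace));  // false
        System.out.println(matchesUnicode(emSpace));  // true
    }
}
```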
Match any float
[\+\-]?\d+(\.\d*)?
This will match any signed float. If you don’t want signs, or are parsing an equation, remove [\+\-]? so you have \d+(\.\d*)?
Explanation:
\d+ matches any integer
()? means the contents of the parentheses are optional but always have to appear together
’\.’ matches ’.’; we have to escape this since ’.’ normally matches any character
So this expression will match
5
+5
-5
5.5
+5.5
-5.5
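The float pattern can be tried in Java as follows. This is a minimal sketch: FloatDemo and isFloat are illustrative names, and whole-input matching is assumed. Because the fractional part is written as \.\d*, an input such as 5. (a trailing dot with no digits) is accepted as well, while .5 is not, since at least one digit must come before the dot.

```java
import java.util.regex.Pattern;

class FloatDemo {
    // The signed float pattern from the text, with backslashes doubled
    // for the Java string literal.
    static final Pattern FLOAT = Pattern.compile("[\\+\\-]?\\d+(\\.\\d*)?");

    static boolean isFloat(String input) {
        return FLOAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"5", "+5", "-5", "5.5", "+5.5", "-5.5", "5.", ".5", "1.2.3", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```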
Selecting a certain line from a list based on a word in certain location
I have the following list:
1. Alon Cohen
2. Elad Yaron
3. Yaron Amrani
4. Yogev Yaron
I want to select the first names of the people with the surname Yaron.
Since I don’t care what the number is, I’ll just match whatever digits appear at the beginning of the line, followed by the dot and the space after it, like this: ^[\d]+\.\s.
Now we’ll have to match the first name and the space after it. Since we can’t tell whether it’s in capital or small letters, we’ll just match both: [a-zA-Z]+\s. This can also be written as [\w]+\s, although \w additionally matches digits and the underscore. Note that a range such as [a-Z] does not work: Z comes before a in character order, so most engines reject it as an invalid range.
Now we’ll specify the required surname to get only the lines containing Yaron as a surname (at the end of the line): \sYaron$.
Putting this all together: ^[\d]+\.\s[\w]+\sYaron$.
Live example: https://regex101.com/r/nW4fH8/1
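In Java, the same idea could be sketched as below. One change is an assumption of this example and not part of the pattern above: the first-name part is wrapped in a capturing group, (\w+), so that the first name can be read back with group(1). Each line is tested on its own with matches(), and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurnameFilterDemo {
    // Same structure as the pattern in the text, plus a capturing group
    // around the first name (an addition made for this example).
    static final Pattern YARON_LINE =
            Pattern.compile("^\\d+\\.\\s(\\w+)\\sYaron$");

    // Returns the first names from the lines whose surname is Yaron.
    static List<String> firstNames(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = YARON_LINE.matcher(line);
            if (m.matches()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1. Alon Cohen",
                "2. Elad Yaron",
                "3. Yaron Amrani",
                "4. Yogev Yaron");
        System.out.println(firstNames(lines)); // [Elad, Yogev]
    }
}
```

Line 3 is skipped because Yaron appears there as the first name, not at the end of the line.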