
Professional JavaScript for Web Developers – Regular Expressions






Learning the Basics
. – Matches any single character except line breaks, unless dotall mode is enabled (the s flag in newer JavaScript engines).
//For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
* – Matches 0 or more of the preceding character.
//For example, /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".
+ – Matches 1 or more of the preceding character.
//For example, /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy".
? – Preceding character is optional. Matches 0 or 1 occurrence.
//For example, /e?le?/ matches the 'el' in "angel" and the 'le' in "angle" and also the 'l' in "oslo".
  When ? is placed immediately after any of the quantifiers *, +, ?, or {}, it makes that quantifier non-greedy (matching the fewest possible characters) instead of the default greedy behavior (matching as many characters as possible). For example, applying /\d+/ to "123abc" matches "123", but applying /\d+?/ to the same string matches only the "1".
\d – Matches any single digit.
//For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
\w – Matches any word character (alphanumeric & underscore).
//For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
[XYZ] – Matches any single character from the character class.
[XYZ]+ – Matches one or more of any of the characters in the set.
$ – Matches the end of the string.
^ – Matches the beginning of a string.
[^a-z] – When inside of a character class, the ^ means NOT; in this case, match anything that is NOT a lowercase letter.
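
The behaviors listed above can be checked with a short sketch. It relies only on the standard String.prototype.match method and uses console.log in place of alert so that it also runs outside a browser. The helper name firstMatch is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper for this illustration only: returns the first
// substring matched by `pattern` in `text`, or null when nothing matches.
function firstMatch(pattern, text) {
    var result = text.match(pattern);
    return result === null ? null : result[0];
}

// . matches any character, so /.n/ needs one character before the "n".
console.log(firstMatch(/.n/, "nay, an apple is on the tree")); // "an"

// * allows zero or more of the preceding character.
console.log(firstMatch(/bo*/, "A ghost booooed"));  // "boooo"
console.log(firstMatch(/bo*/, "A bird warbled"));   // "b"
console.log(firstMatch(/bo*/, "A goat grunted"));   // null

// ? makes the preceding character optional.
console.log(firstMatch(/e?le?/, "angel")); // "el"
console.log(firstMatch(/e?le?/, "angle")); // "le"

// Greedy versus non-greedy quantifiers.
console.log(firstMatch(/\d+/, "123abc"));  // "123"
console.log(firstMatch(/\d+?/, "123abc")); // "1"

// ^ inside a character class negates it.
console.log(firstMatch(/[^a-z]/, "abcD")); // "D"
```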

Regular Expressions and JavaScript

ECMAScript supports regular expressions through the RegExp type. Regular expressions are easy to create using syntax similar to Perl, as shown here:

	var expression = /pattern/flags;

The pattern part of the expression can be any simple or complicated regular expression, including character classes, quantifiers, grouping, lookaheads, and backreferences. Each expression can have zero or more flags indicating how the expression should behave. Three supported flags represent matching modes, as follows:
➤ g – Indicates global mode, meaning the pattern will be applied to all of the string instead of stopping after the first match is found.
➤ i – Indicates case-insensitive mode, meaning the case of the pattern and the string are ignored when determining matches.
➤ m – Indicates multiline mode, meaning ^ and $ match at the start and end of each line of text rather than only at the start and end of the whole string.
A regular expression is created using a combination of a pattern and these flags to produce different results, as in this example:

	/*
	 * Match all instances of "at" in a string.
	 */
	var pattern1 = /at/g;
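
To make the effect of each flag concrete, here is a small sketch. The sample strings and variable names are made up for this illustration. It uses String.prototype.match, which returns every match when the g flag is set, and console.log rather than alert.

```javascript
var sample = "Cat, bat, SAT";

// g alone: every lowercase "at" is found, but "AT" is skipped.
var globalOnly = sample.match(/at/g);
console.log(globalOnly); // ["at", "at"]

// g and i together: case is ignored, so "AT" is found as well.
var globalIgnoreCase = sample.match(/at/gi);
console.log(globalIgnoreCase); // ["at", "at", "AT"]

// m changes what ^ and $ mean: with it they match at every line
// boundary, without it only at the ends of the whole string.
var lines = "one\ntwo";
var withoutMultiline = lines.match(/^\w+$/g);
var withMultiline = lines.match(/^\w+$/gm);
console.log(withoutMultiline); // null
console.log(withMultiline);    // ["one", "two"]
```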

As with regular expressions in other languages, all metacharacters must be escaped when they are meant to be matched literally as part of the pattern. The metacharacters are as follows:

( [ { \ ^ $ | ) ] } ? * + .

Each metacharacter has one or more uses in regular-expression syntax and so must be escaped by a backslash (\) when you want to match the character in a string. Here are some examples:

	/*
	 * Match the first instance of "bat" or "cat", regardless of case. */
	var pattern1 = /[bc]at/i;
	
	/*
	 * Match the first instance of "[bc]at", regardless of case. */
	var pattern2 = /\[bc\]at/i;
	
	/*
	 * Match all three-character combinations ending with "at", regardless of case. */
	var pattern3 = /.at/gi;
	
	/*
	 * Match all instances of ".at", regardless of case. */
	var pattern4 = /\.at/gi;
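
When the text to match comes from a variable instead of a literal, the same escaping has to be applied before the string is handed to the RegExp constructor. The following is a minimal sketch of one common way to do that. The function name escapeRegExp is hypothetical (it is not a built-in), and the sample string is made up for this illustration.

```javascript
// Hypothetical helper, not a built-in: prefixes every regular-expression
// metacharacter in `text` with a backslash so it is matched literally.
// "$&" in the replacement stands for the matched character itself.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var sample = "cat .at bat";

// Unescaped: the dot matches any character, so three substrings match.
var anyCharMatches = sample.match(/.at/gi);
console.log(anyCharMatches); // ["cat", ".at", "bat"]

// Escaped: only the literal text ".at" matches.
var literalPattern = new RegExp(escapeRegExp(".at"), "gi");
var literalMatches = sample.match(literalPattern);
console.log(literalPattern.source); // \.at
console.log(literalMatches);        // [".at"]
```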

Keep in mind that creating a regular expression using a literal is not exactly the same as creating a regular expression using the RegExp constructor. In ECMAScript 3, regular-expression literals always share the same RegExp instance, while creating a new RegExp via constructor always results in a new instance. Consider the following:

	var re = null, i;
	for (i = 0; i < 10; i++) {
		re = /cat/g;
		re.test("catastrophe");
	}
	for (i = 0; i < 10; i++) {
		re = new RegExp("cat", "g");
		re.test("catastrophe");
	}

In the first loop, there is only one instance of RegExp created for /cat/, even though it is specified in the body of the loop. Instance properties are not reset, so calling test() fails every other time through the loop. This happens because "cat" is found in the first call to test(), but the second call begins its search from index 3 (the end of the last match) and can't find it. Because that search reaches the end of the string without a match, lastIndex is reset to 0 and the subsequent call to test() starts at the beginning again.

The second loop uses the RegExp constructor to create the regular expression each time through the loop. Each call to test() returns true since a new instance of RegExp is created for each iteration. Note that this difference is specific to ECMAScript 3: in ECMAScript 5, a regular-expression literal also creates a new instance each time it is evaluated, so both loops behave the same way.

RegExp Instance Properties

Each instance of RegExp has the following properties that allow you to get information about the pattern:
➤ global – A Boolean value indicating whether the g flag has been set.
➤ ignoreCase – A Boolean value indicating whether the i flag has been set.
➤ lastIndex – An integer indicating the character position where the next match will be attempted in the source string. This value always begins as 0.
➤ multiline – A Boolean value indicating whether the m flag has been set.
➤ source – The string source of the regular expression. This is always returned as if specified in literal form (without opening and closing slashes) rather than a string pattern as passed into the constructor.

	var pattern1 = /\[bc\]at/i;
	alert(pattern1.global);			//false
	alert(pattern1.ignoreCase);		//true
	alert(pattern1.multiline);		//false
	alert(pattern1.lastIndex);		//0
	alert(pattern1.source);			//"\[bc\]at"

	var pattern2 = new RegExp("\\[bc\\]at", "i");
	alert(pattern2.global);			//false
	alert(pattern2.ignoreCase);		//true
	alert(pattern2.multiline);		//false
	alert(pattern2.lastIndex);		//0
	alert(pattern2.source);			//"\[bc\]at"
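
The lastIndex property is what causes a reused global pattern to alternate between success and failure with test(). The sketch below reuses a single instance on purpose so the effect shows up in any ECMAScript version. The variable names are made up for this illustration, and console.log stands in for alert.

```javascript
var reused = /cat/g;
var results = [];
var indexes = [];
var i;

// The same instance is tested four times against the same string.
for (i = 0; i < 4; i++) {
    results.push(reused.test("catastrophe"));
    indexes.push(reused.lastIndex);
}
console.log(results); // [true, false, true, false]
console.log(indexes); // [3, 0, 3, 0]

// Resetting lastIndex by hand before each call makes every test succeed.
var resetResults = [];
for (i = 0; i < 4; i++) {
    reused.lastIndex = 0;
    resetResults.push(reused.test("catastrophe"));
}
console.log(resetResults); // [true, true, true, true]
```

Leaving off the g flag has the same practical effect for test(), because a non-global pattern always starts searching at index 0.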

RegExp Instance Methods

-----------------------------------------------------------------------------------------------------
Method  | Description
-----------------------------------------------------------------------------------------------------
exec    | Executes a search for a match in a string. Returns an array of information, or null on a mismatch.
test    | Tests for a match in a string. Returns true or false.
match   | Executes a search for a match in a string. Returns an array of information, or null on a mismatch.
search  | Tests for a match in a string. Returns the index of the match, or -1 if the search fails.
replace | Executes a search for a match in a string and replaces the matched substring with a replacement.
split   | Uses a regular expression or a fixed string to break a string into an array of substrings.
-----------------------------------------------------------------------------------------------------

Of these, only exec and test are methods of RegExp instances; match, search, replace, and split are String methods that accept a regular expression as an argument.

More references and examples:
https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
http://net.tutsplus.com/tutorials/javascript-ajax/you-dont-know-anything-about-regular-expressions/

The primary method of a RegExp object is exec(), which is intended for use with capturing groups. This method accepts a single argument, which is the string on which to apply the pattern, and returns an array of information about the first match or null if no match was found. The returned array, though an instance of Array, contains two additional properties: index, which is the location in the string where the pattern was matched, and input, which is the string that the expression was run against. In the array, the first item is the string that matches the entire pattern. Any additional items represent captured groups inside the expression (if there are no capturing groups in the pattern, then the array has only one item). Consider the following:

	var text = "mom and dad and baby";
	var pattern = /mom( and dad( and baby)?)?/gi;
	var matches = pattern.exec(text);
	alert(matches.index);			//0
	alert(matches.input);			//"mom and dad and baby"
	alert(matches[0]);			//"mom and dad and baby"
	alert(matches[1]);			//" and dad and baby"
	alert(matches[2]);			//" and baby"
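
Because exec() returns only one match per call, a pattern with the g flag is usually driven in a loop, with each call resuming from lastIndex. Here is a minimal sketch of that idiom. The sample text, the pattern, and the variable names are made up for this illustration.

```javascript
var sentence = "cat, bat, sat, fat";
var wordPattern = /(.)at/g;
var found = [];
var match;

// Each call continues from wordPattern.lastIndex; exec() returns null
// (and resets lastIndex to 0) once no further match exists.
while ((match = wordPattern.exec(sentence)) !== null) {
    // match[1] is the first captured group, match.index the position.
    found.push(match[1] + "@" + match.index);
}

console.log(found);                 // ["c@0", "b@5", "s@10", "f@15"]
console.log(wordPattern.lastIndex); // 0
```

Without the g flag this loop would never end, since exec() would return the first match on every call.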

The test() method is often used in if statements, such as the following:

	var text = "000-00-0000";
	var pattern = /\d{3}-\d{2}-\d{4}/;
	if (pattern.test(text)){
		alert("The pattern was matched.");
	}
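
The string-side methods from the method table (match, search, replace, and split) take the regular expression as an argument rather than being called on it. A short sketch follows. The sample text and variable names are made up for this illustration, and console.log stands in for alert.

```javascript
var words = "cat, bat, sat, fat";

// match() with the g flag returns every matched substring, or null.
var allMatches = words.match(/.at/g);
var noMatches = "dog".match(/.at/g);
console.log(allMatches); // ["cat", "bat", "sat", "fat"]
console.log(noMatches);  // null

// search() returns the index of the first match, or -1.
var position = words.search(/bat/);
var missing = words.search(/dog/);
console.log(position); // 5
console.log(missing);  // -1

// replace() with the g flag substitutes every match.
var replaced = words.replace(/at/g, "ond");
console.log(replaced); // "cond, bond, sond, fond"

// split() breaks the string wherever the pattern matches.
var pieces = words.split(/,\s*/);
console.log(pieces); // ["cat", "bat", "sat", "fat"]
```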



