     Regex bug related to ignoring \r when regex operator appears after it
Yutaka
Posted on: 9/16/2011 8:37 am
Webmaster
Joined: 9/28/2006
From: Redmond
Posts: 2397
Re: Regex bug related to ignoring \r when regex operator appears after it
Hi Deipotent,

As you wrote, EmEditor strips '\r' from regular expressions when you use Find. This is because EmEditor represents a new line as '\n', not '\r\n', regardless of whether the file uses '\r', '\n', or '\r\n' as its line ending. This is currently the specified behavior, because many users were confused about when they needed to specify '\r' or '\r\n' for a new line in earlier versions.

When you specify a new line, please use '\n', and not '\r'.
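To illustrate the idea of a single new line representation, here is a minimal Python sketch. It is not EmEditor's code: the helper name normalize_newlines is hypothetical, and Python's re module only stands in for whatever engine EmEditor uses. It shows that once every line ending is mapped to '\n', a pattern written with '\n' alone matches all three conventions.

```python
import re

def normalize_newlines(text: str) -> str:
    # Map "\r\n" and lone "\r" to "\n" (hypothetical helper, for illustration).
    return re.sub(r"\r\n?", "\n", text)

# The same two-line text with Windows, classic Mac, and Unix line endings.
samples = ["a\r\nb", "a\rb", "a\nb"]

# After normalization, a pattern that uses only "\n" matches every sample.
matches = [bool(re.search(r"a\nb", normalize_newlines(s))) for s in samples]
print(matches)
```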

Thank you!


----------------
Yutaka Emura
Developer of EmEditor
http://www.emeditor.com/

Deipotent
Posted on: 8/25/2011 1:27 pm
Just can't stay away
Joined: 2/15/2008
From:
Posts: 118
Regex bug related to ignoring \r when regex operator appears after it
Related to my original post about a regex for finding duplicate lines:

http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=3&forum=3&post_id=5812#forumpost5812

and my subsequent enhancement request about ignoring \r in a regex:

http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=1811&post_id=5888&forum=4#forumpost5888

it would appear there is a bug when "\r" appears in a regex, and is followed by a regex operator like ? or * (and probably others).

From playing around with it, it looks like EmEditor just removes the "\r", so when there is a regex like

^(.*)(\r?\n\1)+$


EmEditor silently strips out "\r", leaving

^(.*)(?\n\1)+$


which is incorrect.
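The effect can be reproduced outside EmEditor. The following is a small Python sketch that assumes the stripping is a plain textual removal of "\r", which is only my guess from the observed behavior. Python's re module is used purely for illustration, and EmEditor's engine may report the problem differently.

```python
import re

original = r"^(.*)(\r?\n\1)+$"

# Naive removal of the two-character sequence backslash + "r".
stripped = original.replace(r"\r", "")

def compiles(pattern: str) -> bool:
    # Report whether Python's regex engine accepts the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(stripped)
print(compiles(original), compiles(stripped))
```

The orphaned "?" now follows "(" directly, so it is read as the start of a group extension rather than as a quantifier, and the pattern no longer compiles in Python.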

A few possible solutions involve:

1) Checking if the "\r" is followed by an operator which applies to the previous character (e.g. ? or *), and stripping that operator as well.

2) Replacing the "\r" with "(\r)" seems to solve the problem, but that would then affect back references.

3) If possible, tell the regex engine to ignore "\r".

Option (3) may be the best option if available, since it will probably handle all cases properly.
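As a rough illustration of option (1), here is a Python sketch that removes "\r" together with any quantifier attached to it. The function name strip_cr and the quantifier grammar are my own assumptions, not anything from EmEditor. It deliberately skips other escape sequences so that a literal backslash followed by "r" is left alone, but it does not treat character classes specially, so something like [\r?] would still be mishandled.

```python
import re

# A quantifier: ?, *, + or {m}, {m,}, {m,n}, optionally followed by a
# lazy (?) or possessive (+) modifier.
_QUANT = r"(?:[?*+]|\{\d+(?:,\d*)?\})[?+]?"

# Either "\r" with an optional quantifier, or any other escape pair.
_TOKEN = re.compile(r"\\r(?:" + _QUANT + r")?|\\.", re.DOTALL)

def strip_cr(pattern: str) -> str:
    """Drop every "\\r" escape and the quantifier that applies to it."""
    def repl(m: re.Match) -> str:
        tok = m.group(0)
        # Only the "\r" alternative is removed; other escapes pass through.
        return "" if tok.startswith("\\r") else tok
    return _TOKEN.sub(repl, pattern)

cleaned = strip_cr(r"^(.*)(\r?\n\1)+$")
print(cleaned)

# The cleaned pattern still finds the duplicated line in "\n"-only text.
m = re.search(cleaned, "foo\nfoo\nbar", re.MULTILINE)
print(repr(m.group(0)))
```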

Either way, it's not as easy as it would first seem.
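To show why option (2) affects back references, here is a short Python sketch with a made-up pattern. Wrapping "\r" in a capturing group shifts the group numbers after it, while a non-capturing group, where the engine supports one, leaves them alone. Python's re is again only a stand-in, and other engines may treat a back reference to an unmatched group differently.

```python
import re

text = "aa"

# Intended pattern: optional "\r", then "a" captured and repeated via \1.
intended = r"\r?(a)\1"

# Option (2): wrapping "\r" in a capturing group makes it group 1,
# so \1 no longer refers to (a).
wrapped = r"(\r)?(a)\1"

# A non-capturing group keeps the original group numbering.
non_capturing = r"(?:\r)?(a)\1"

results = {
    "intended": re.fullmatch(intended, text) is not None,
    "wrapped": re.fullmatch(wrapped, text) is not None,
    "non_capturing": re.fullmatch(non_capturing, text) is not None,
}
print(results)
```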