EmEditor (text editor) Forum Index
   EmEditor Core Enhancement Suggestions
     Ignore \r in regex
Deipotent
Posted on: 8/24/2011 9:10 am
Just can't stay away
Joined: 2/15/2008
From:
Posts: 118
Ignore \r in regex
Further to my post on a regular expression to delete duplicate lines, and your reply that \r is ignored, it would be useful if EmEditor stripped out \r from regexes. Obviously this is a bit more complicated than just removing "\r", since the \r might be in parentheses or referred to afterwards (as is the case in the linked regex).

http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=3&forum=3&post_id=5812#forumpost5812


Also, why does a search with a regex of "\r" match every character?


Stefan
Posted on: 8/24/2011 12:02 pm
Home away from home
Joined: 7/14/2008
From: Germany, EU
Posts: 262
Re: Ignore \r in regex
> "it would be useful if EmEditor stripped out \r from regexes"

I think that would be too smart and unexpected for many.
The user just has to learn how it works.
If there isn't one already, the help should include an explanation with an example; that should be enough.
Deipotent
Posted on: 8/24/2011 5:08 pm
Just can't stay away
Joined: 2/15/2008
From:
Posts: 118
Re: Ignore \r in regex
I don't mean strip it out of the edit box. I mean just strip it out of the regex that EmEditor uses internally, so the user wouldn't notice any difference, but these valid regexes would work.

At the moment, some valid regexes are reported as incorrect by EmEditor (see the link to my other post for an example of finding duplicates).
Stefan
Posted on: 8/25/2011 4:54 am
Home away from home
Joined: 7/14/2008
From: Germany, EU
Posts: 262
Re: Ignore \r in regex
Quote:

Deipotent wrote:
I don't mean strip it out of the edit box, I mean just strip it out of the regex


Yes, basically the very same.
I (the user) try to do something, and the app does something else behind the scenes.
And I (the user) get the impression that I did it right, which is not true.

I know that this is common (see IE, which automatically adds http:// in front of a URL, or how Win7 lies with UAC virtualization),
but I don't want to see it more often.
Deipotent
Posted on: 8/25/2011 6:20 am
Just can't stay away
Joined: 2/15/2008
From:
Posts: 118
Re: Ignore \r in regex
Quote:
And I (the user) get the impression that I did it right, which is not true.
But you would have done it right, with a valid regex. Currently, EmEditor errors out when some valid regexes are used.

I think it's preferable to make EmEditor adhere to standards, rather than making people who are familiar with regexes modify them just to get them to work with EmEditor.

To use your analogy, EmEditor currently behaves like the IE of old, where people had to put in workarounds to get a regex to work. My proposal is to make EmEditor more like the latest versions of IE, which are more standards compliant, so that it works with valid regexes.

Deipotent
Posted on: 8/25/2011 9:46 am
Just can't stay away
Joined: 2/15/2008
From:
Posts: 118
Re: Ignore \r in regex
After thinking about this again, I re-examined the regex in my linked post for finding duplicate lines, and realised that there appears to be a bug in EmEditor's regex parser.

My suggestion for ignoring \r already appears to be implemented. Let me give some examples to explain:

1) Even if EmEditor didn't ignore \r, the following regex should work, as the part "\r?" says to match zero or one occurrence of \r. So it should match whether there is a \r or not. But it doesn't.

^(.*)(\r?\n\1)+$


2) However, if you remove the '?', so the regex becomes as follows, it works, and matches duplicate lines:

^(.*)(\r\n\1)+$


3) It also works if you have the original regex, but replace "\r" with "(\r)":

^(.*)((\r)?\n\1)+$



So, the bug appears to be that EmEditor seems to have a problem applying '?', when it follows a \r. It already seems to just remove the "\r", so the original regex ends up being turned into

^(.*)(?\n\1)+$


which doesn't work.
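
For comparison, the original pattern and the suspected internal form can be tried in another engine. The sketch below uses Python's `re` module, which is not EmEditor's engine, so it only shows how a conventional parser treats each pattern. The sample texts and the helper names `compiles` and `finds_duplicates` are made up for illustration.

```python
import re

# Pattern from example 1, unchanged.
ORIGINAL = r"^(.*)(\r?\n\1)+$"
# What the original would become if "\r" were deleted before parsing.
SUSPECTED_INTERNAL = r"^(.*)(?\n\1)+$"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.MULTILINE)
        return True
    except re.error:
        return False


def finds_duplicates(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text, re.MULTILINE) is not None


# Hypothetical sample texts with LF and CRLF line endings.
lf_text = "foo\nfoo\nbar\n"
crlf_text = "foo\r\nfoo\r\nbar\r\n"
unique_text = "foo\nbar\nbaz\n"

print(compiles(ORIGINAL), compiles(SUSPECTED_INTERNAL))
print(finds_duplicates(ORIGINAL, lf_text), finds_duplicates(ORIGINAL, crlf_text))
```

In this engine the original pattern compiles and finds the duplicated line with either line ending, while the "(?\n" form is rejected at compile time, which is consistent with the diagnosis above.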
CrashNBurn
Posted on: 9/22/2011 3:37 pm
Just can't stay away
Joined: 4/8/2010
From:
Posts: 130
Re: Ignore \r in regex
The (? syntax is something entirely different.

For example, given the following text:
Quote:
This is a test sentence
This is a test sentence
of a regex string
of a regex string


Try this regex:
Search: ^(.*)(?|(a test )|(a regex ))(.*)$
Replace: \3

With the (?| syntax, there will only be 3 pieces, since the (?|...) wrapper itself isn't counted as a bracket set and the brackets in each alternative share the same number.
For the first line:
\1: This is_
\2: a test_
\3: sentence

The lines will be replaced with "sentence" and "string".
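
The three-piece numbering can be sketched in Python. Python's built-in `re` module has no (?| syntax, so this sketch assumes a stand-in: a single bracket set around the alternation, which gives the same numbering of \1, \2 and \3. The names `SAMPLE` and `THREE_GROUPS` are made up for illustration.

```python
import re

SAMPLE = (
    "This is a test sentence\n"
    "This is a test sentence\n"
    "of a regex string\n"
    "of a regex string"
)

# Stand-in for ^(.*)(?|(a test )|(a regex ))(.*)$ : one group around the
# alternation, so the groups are numbered 1, 2, 3 on every line.
THREE_GROUPS = re.compile(r"^(.*)(a test |a regex )(.*)$", re.MULTILINE)

# Groups captured on the first line of the sample.
first_line_groups = THREE_GROUPS.match(SAMPLE).groups()

# Replace every line with its third group.
replaced = THREE_GROUPS.sub(r"\3", SAMPLE)

print(first_line_groups)
print(replaced)
```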


Now without the (?| syntax, eg:
Search: ^(.*)((a test )|(a regex ))(.*)$

That regex would contain 5 pieces, matched by \1 through \5, and some of them would be empty, depending on the line.

For the first line:
\1: This is_
\2: a test_
\3: a test_
\4:
\5: sentence
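
The five-piece numbering can be checked with Python's `re` module, used here only as a convenient engine. One difference to be aware of: Python reports a bracket set that took no part in the match as `None` rather than as an empty string. The variable names are made up for illustration.

```python
import re

# The pattern without (?| : every bracket set gets its own number.
FIVE_GROUPS = re.compile(r"^(.*)((a test )|(a regex ))(.*)$")

# Groups for one line of each kind from the sample text.
test_groups = FIVE_GROUPS.match("This is a test sentence").groups()
regex_groups = FIVE_GROUPS.match("of a regex string").groups()

print(test_groups)
print(regex_groups)
```

On the "a test" line group 4 is unused, and on the "a regex" line group 3 is unused, so a replacement that refers to \3 or \4 behaves differently from line to line.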

The (? syntax becomes important if you use more complex regexes, because a regex matches as much as it possibly can when things like ".*" are present.


Example 2A:
Search: ^(.*)( is a | a regex )?
Replace: \1_

That regex will match the whole line, and the replace won't change anything: the .* consumes the whole line, and since the part in (brackets)? is optional, the greedy regex never needs to match it.


Example 2B:
Search: ^(.*)(?|( is a )|( a regex ))
Replace: \1_

This regex will actually remove " is a " and " a regex " from the lines in question.
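
The contrast between Examples 2A and 2B can be sketched in Python. Since Python's `re` module has no (?| syntax, the second pattern assumes a plain mandatory bracket set as a stand-in for Example 2B. In this sketch the difference comes from the group being mandatory rather than optional, and the underscore in the replacement is taken literally. The names below are made up for illustration.

```python
import re

LINE = "This is a test sentence"

# Example 2A: the trailing group is optional, so the greedy .* keeps everything.
OPTIONAL = re.compile(r"^(.*)( is a | a regex )?")
# Stand-in for Example 2B: the trailing group must match, so .* has to back off.
REQUIRED = re.compile(r"^(.*)( is a | a regex )")

optional_groups = OPTIONAL.match(LINE).groups()
required_groups = REQUIRED.match(LINE).groups()

# Replace the matched part with group 1 followed by a literal underscore.
stripped = REQUIRED.sub(r"\g<1>_", LINE)

print(optional_groups)
print(required_groups)
print(stripped)
```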

A lot of editors don't even implement the full regex spec, e.g. Notepad2 does a small subset which doesn't even handle the optional "?" correctly, and certainly doesn't support the (?| syntax.