Ignore \r in regex – EmEditor (Text Editor)

Viewing 7 posts - 1 through 7 (of 7 total)

August 24, 2011 at 5:10 pm #9594
Deipotent
Participant
Further to my post on a regular expression to delete duplicate lines, and your reply that \r is ignored, it would be useful if EmEditor stripped out \r from regexes. Obviously this is a bit more complicated than just removing “\r”, since the \r might be in parentheses or referred to afterwards (as is the case in the linked regex).
http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=3&forum=3&post_id=5812#forumpost5812
Also, how come a search with a regex of “\r” will match every character?
August 24, 2011 at 8:02 pm #9599
Stefan
Participant
> “it would be useful if EmEditor stripped out \r from regexes”
I think that would be too smart and unexpected for many.
The user just has to learn how it works.
If it isn't there already, the help should include an explanation with an example; that should be enough.
August 25, 2011 at 1:08 am #9602
Deipotent
Participant
I don’t mean strip it out of the edit box; I mean just strip it out of the regex that EmEditor uses internally, so the user wouldn’t notice any difference, but these valid regexes would work.
At the moment, some valid regexes are reported as incorrect by EmEditor (see the link to my other post for an example of finding duplicates).
August 25, 2011 at 12:54 pm #9603
Stefan
Participant
Deipotent wrote:
I don’t mean strip it out of the edit box, I mean just strip it out of the regex
Yes, basically the very same.
I (the user) try to do something, and the app does something else in the background.
And I (the user) get the impression that I did it right, which is not the truth.
I know that this is common (see IE, which automatically adds http:// in front of a URL, or how Win7 lies with UAC virtualization),
but I don't want to see it more often.
August 25, 2011 at 2:20 pm #9604
Deipotent
Participant
And I (the user) get the impression that I did it right, which is not the truth.
But you would have done it right, with a valid regex. Currently, EmEditor errors out when some valid regexes are used.
I think it’s preferable to make EmEditor adhere to standards, rather than making people who are familiar with regexes modify them just to get them to work with EmEditor.
To use your analogy, EmEditor currently behaves like old IE, where people have to put in workarounds to get a regex to work. My proposal is to make EmEditor more like the latest versions of IE, which are more standards compliant and so work with valid regexes.
August 25, 2011 at 5:46 pm #9605
Deipotent
Participant
After thinking about this again, I re-looked at the regex in my linked post for finding duplicate lines, and realised that there appears to be a bug in EmEditor’s regex parser.
My suggestion for ignoring \r already appears to be implemented. Let me give some examples to explain:
1) Even if EmEditor didn’t ignore \r, the following regex should work, as the part “\r?” says match zero or one occurrence of \r. So it should match whether there is a \r or not. But it doesn’t.
```
^(.*)(\r?\n\1)+$
```
2) However, if you remove the ‘?’, so the regex becomes as follows, it works, and matches duplicate lines:
```
^(.*)(\r\n\1)+$
```
3) It also works if you keep the original regex but replace “\r” with “(\r)”:
```
^(.*)((\r)?\n\1)+$
```
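For comparison, here is a minimal sketch of variants 1) and 3) in Python's standard `re` module, used only as a reference engine (an assumption: it is not EmEditor's engine). Both patterns compile and collapse runs of duplicate lines on LF and CRLF text alike. In Python, `.` also matches `\r`, so on CRLF text the `\r` ends up inside group 1 and `\r?` simply matches nothing. Variant 2) is left out because Python's `$` in MULTILINE mode only matches before `\n`, so `(\r\n\1)+$` fails at the trailing `\r`. The function name `collapse_duplicates` and the sample text are illustrative.

```python
import re

# Variants 1) and 3) from the post, written as raw strings so that
# \r, \n and \1 reach the regex engine unchanged.
OPTIONAL_CR = r"^(.*)(\r?\n\1)+$"
GROUPED_CR = r"^(.*)((\r)?\n\1)+$"


def collapse_duplicates(pattern: str, text: str) -> str:
    """Replace each run of identical consecutive lines with one copy (group 1)."""
    # re.MULTILINE makes ^ and $ match at line boundaries, as in an editor.
    return re.sub(pattern, r"\1", text, flags=re.MULTILINE)


crlf_text = "apple\r\napple\r\nbanana\r\ncherry\r\ncherry\r\ncherry\r\n"
lf_text = crlf_text.replace("\r\n", "\n")

for pattern in (OPTIONAL_CR, GROUPED_CR):
    print(repr(collapse_duplicates(pattern, crlf_text)))  # 'apple\r\nbanana\r\ncherry\r\n'
    print(repr(collapse_duplicates(pattern, lf_text)))    # 'apple\nbanana\ncherry\n'
```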
So the bug appears to be that EmEditor has a problem applying ‘?’ when it follows a \r. It already seems to just remove the “\r”, so the original regex ends up being turned into
```
^(.*)(?\n\1)+$
```
which doesn’t work.
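A quick way to see why that rewritten pattern cannot work: the sketch below applies a hypothetical naive preprocessing step (`naive_strip_cr`, not EmEditor's actual code) that deletes every `\r` escape from the pattern text, then asks Python's `re` to compile the result. Python rejects `(?` followed by `\n` as an unknown group extension; other engines may word the error differently.

```python
import re


def naive_strip_cr(pattern: str) -> str:
    # Hypothetical preprocessing: delete every "\r" escape from the pattern
    # text, roughly what the post suspects happens internally.
    return pattern.replace(r"\r", "")


original = r"^(.*)(\r?\n\1)+$"
stripped = naive_strip_cr(original)
print(stripped)  # ^(.*)(?\n\1)+$

re.compile(original)  # the unmodified pattern is valid

try:
    re.compile(stripped)
    print("compiled")
except re.error as exc:
    # "(?" must be followed by a group extension such as ":", "=", "!" or "P".
    print("invalid pattern:", exc)
```

Handling this correctly would mean removing the `?` (or the whole quantified atom) together with the `\r`, which is the "bit more complicated" part mentioned in the first post.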
September 22, 2011 at 11:37 pm #9663
CrashNBurn
Member
The (? syntax is something entirely different.
For example, given the following text:
This is a test sentence
This is a test sentence
of a regex string
of a regex string
Try this regex:
Search: ^(.*)(?|(a test )|(a regex ))(.*)$
Replace: \3
With the (?| syntax, there will only be 3 pieces (capture groups), since each alternative inside (?|…) reuses the same group number: (a test ) and (a regex ) are both group 2.
For the first line:
1: This is_
2: a test_
3: sentence
The lines will be replaced with “sentence” and “string”.
Now without the (?| syntax, e.g.:
Search: ^(.*)((a test )|(a regex ))(.*)$
That regex contains 5 pieces, referenced as \1 through \5, and some of them will be empty, depending on the line.
For the first line:
1: This is_
2: a test_
3: a test_
4:
5: sentence
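The group numbering above can be checked in Python's `re`, with one caveat: the standard `re` module does not support the `(?|...)` branch-reset syntax (the third-party `regex` package does, but it is not used here). For this simple example, a single capturing group around the alternation gives the same three-group numbering, so the sketch compares that against the five-group version.

```python
import re

lines = ["This is a test sentence", "of a regex string"]

# Without branch reset every "(" opens a new group: 5 groups in total.
five_groups = re.compile(r"^(.*)((a test )|(a regex ))(.*)$")

# One capturing group around the alternation: 3 groups, numbered like the
# (?|...) version in the post (Python's re has no branch reset).
three_groups = re.compile(r"^(.*)(a test |a regex )(.*)$")

for line in lines:
    print(five_groups.match(line).groups())
    print(three_groups.match(line).groups())
    print(three_groups.sub(r"\3", line))  # "sentence", then "string"
```

For the first line the five-group version yields `('This is ', 'a test ', 'a test ', None, 'sentence')`, matching the listing above, with `None` for the alternative that did not take part.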
The (? syntax becomes important with more complex regexes, because a regex will match as much as it possibly can when greedy constructs like “.*” are present.
Example 2A:
Search: ^(.*)( is a | a regex )?
Replace: \1_ (the _ marks a space, as in the listings above)
That regex will match the whole line, and the replace won't change anything: the .* consumes the whole line, and since the part in (brackets)? is optional, the greedy regex never needs it.
Example 2B:
Search: ^(.*)(?|( is a )|( a regex ))
Replace: \1_
This regex will actually remove “ is a ” and “ a regex ” from the lines in question.
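A minimal Python sketch of Examples 2A and 2B, assuming the `_` in `\1_` stands for a space (as in the group listings above). Because Python's `re` has no branch reset, 2B is written with a single capturing group `( is a | a regex )`, which captures the same group 1 here.

```python
import re

text = "This is a test sentence\nof a regex string"

# Example 2A: the trailing "?" makes group 2 optional, so the greedy (.*)
# takes the whole line and group 2 matches the empty string.
greedy_optional = re.sub(r"^(.*)( is a | a regex )?", r"\1 ", text, flags=re.MULTILINE)

# Example 2B without "?": (.*) has to backtrack until one of the
# alternatives can match, so the phrase is replaced by a single space.
required = re.sub(r"^(.*)( is a | a regex )", r"\1 ", text, flags=re.MULTILINE)

print(repr(greedy_optional))  # 'This is a test sentence \nof a regex string '
print(repr(required))         # 'This test sentence\nof string'
```

With a space in the replacement, 2A only appends a trailing space to each line; the rest of the text is untouched, while 2B actually removes the phrases.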
A lot of editors don’t even implement the full regex spec; e.g. Notepad2 supports only a small subset, which doesn’t even handle the optional “?” correctly and certainly doesn’t support the (?| syntax.