Viewing 7 posts - 1 through 7 (of 7 total)
  • Author
    Posts
  • #9594
    Deipotent
    Participant

    Further to my post on a regular expression to delete duplicate lines, and your reply that \r is ignored, it would be useful if EmEditor stripped out \r from regexes. Obviously this is a bit more complicated than just removing “\r”, since the \r might be in parentheses or referred to afterwards (as is the case in the linked regex).

    http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=3&forum=3&post_id=5812#forumpost5812

    Also, how come a search with a regex of “\r” matches every character?

    #9599
    Stefan
    Participant

    > “it would be useful if EmEditor stripped out \r from regexes”

    I think that would be too smart and unexpected for many.
    The user just has to learn how it works.
    If not already, there should be an explanation incl. example in the help, that should be enough.

    #9602
    Deipotent
    Participant

    I don’t mean strip it out of the edit box, I mean just strip it out of the regex that EmEditor will use internally, so the user wouldn’t notice any difference, but these valid regexes would work.

    At the moment, some valid regexes are reported as incorrect by EmEditor (see the link to my other post for an example of finding duplicates).

    #9603
    Stefan
    Participant

    Deipotent wrote:
    I don’t mean strip it out of the edit box, I mean just strip it out of the regex

    Yes, basically the very same.
    I (the user) try to do something, and the app does something else behind my back.
    And I (the user) get the impression that I did it right, which is not true.

    I know that this is common (see IE, which automatically adds http:// in front of a URL, or how Win7 lies with UAC virtualization),
    but I don’t want to see that more often.

    #9604
    Deipotent
    Participant

    > “And I (the user) get the impression that I did it right, which is not true.”

    But you would have done it right, with a valid regex. Currently, EmEditor errors out when some valid regexes are used.

    I think it’s preferable to make EmEditor adhere to standards, rather than making people who are familiar with regexes modify them just to get them to work with EmEditor.

    To use your analogy, EmEditor currently does it like IE of old, where people have to put in workarounds to get a regex to work. My proposal is to make EmEditor more like the latest versions of IE, which are more standards compliant, and so work with valid regexes.

    #9605
    Deipotent
    Participant

    After thinking about this again, I re-looked at the regex in my linked post for finding duplicate lines, and realised that there appears to be a bug in EmEditor’s regex parser.

    My suggestion for ignoring \r already appears to be implemented. Let me give some examples to explain:

    1) Even if EmEditor didn’t ignore \r, the following regex should work, as the part “\r?” is saying match zero or one occurrences of \r. So it should match whether there is a \r or not. But it doesn’t.

    ^(.*)(\r?\n\1)+$

    2) However, if you remove the ‘?’, so the regex becomes as follows, it works, and matches duplicate lines:

    ^(.*)(\r\n\1)+$

    3) It also works if you have the original regex, but replace “\r” with “(\r)”:

    ^(.*)((\r)?\n\1)+$

    So, the bug appears to be that EmEditor has a problem applying ‘?’ when it follows a \r. It already seems to just remove the “\r”, so the original regex ends up being turned into

    ^(.*)(?\n\1)+$

    which doesn’t work.
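
For comparison, here is a minimal sketch of how the same patterns behave in a different engine. It uses Python’s standard `re` module, which is not EmEditor’s regex engine, so it only illustrates what the patterns mean in one common implementation. The helper names `collapse_duplicates` and `is_valid` and the sample text are made up for illustration.

```python
import re

# Duplicate-line pattern from the post; MULTILINE makes ^ and $ anchor at line boundaries.
DUPLICATES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def collapse_duplicates(text: str) -> str:
    """Collapse each run of identical consecutive lines into one copy (hypothetical helper)."""
    return DUPLICATES.sub(r"\1", text)


def is_valid(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Sample text with \n line endings and with \r\n line endings.
print(repr(collapse_duplicates("alpha\nalpha\nbeta\nbeta\nbeta\ngamma")))
print(repr(collapse_duplicates("alpha\r\nalpha\r\nbeta\r\ngamma")))

# The original pattern, then the same pattern with the \r simply dropped,
# which leaves a group that opens with a bare "(?" followed by \n.
print(is_valid(r"^(.*)(\r?\n\1)+$"))
print(is_valid(r"^(.*)(?\n\1)+$"))
```

In Python, both sample texts collapse to one line each of alpha, beta and gamma, and the pattern with the \r dropped is rejected as a syntax error, which fits the explanation above for why it fails.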

    #9663
    CrashNBurn
    Member

    The (? syntax is something entirely different.

    Example, given the following Text

    This is a test sentence
    This is a test sentence
    of a regex string
    of a regex string

    Try this regex:
    Search: ^(.*)(?|(a test )|(a regex ))(.*)$
    Replace: \3

    With the (?| syntax, there will only be 3 pieces, since the section inside (?|…) isn’t counted as a bracket set.
    For the first line:
    1: This is_
    2: a test_
    3: sentence

    The lines will be replaced with “sentence” and “string”.

    Now without the (?| syntax, e.g.:
    Search: ^(.*)((a test )|(a regex ))(.*)$
    That regex would contain 5 pieces that would be matched with \1 thru \5, and some of them would be empty, depending on the line.

    For the first line:
    1: This is_
    2: a test_
    3: a test_
    4:
    5: sentence
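
The five-piece numbering can be checked with a short sketch in Python’s standard `re` module. This is only an illustration in a different engine: Python’s `re` does not support the (?| syntax, so only the version without it is shown, and Python reports a group that did not take part in the match as `None` rather than as an empty string. The helper name `pieces` is made up for this example.

```python
import re

# The five-group search pattern from the post (no branch reset).
FIVE_GROUPS = re.compile(r"^(.*)((a test )|(a regex ))(.*)$")


def pieces(line: str):
    """Return the captured groups 1..5 for a line, or None if it does not match."""
    match = FIVE_GROUPS.match(line)
    return match.groups() if match else None


for line in ("This is a test sentence", "of a regex string"):
    print(pieces(line))
```

For the first line, groups 2 and 3 both hold “a test ” and group 4 is unset; for the second line, group 3 is unset and group 4 holds “a regex ”. The trailing text lands in group 5 in both cases.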

    The (? syntax becomes important if you utilize more complex regexes due to the nature of regex to match everything it absolutely can when things like “.*” are present.

    Example 2A:
    Search: ^(.*)( is a | a regex )?
    Replace: \1_

    That regex will match the whole line, and the replace won’t change anything: the .* consumes the whole line and, since the part in (brackets)? is optional, the greedy regex never needs to match it.

    Example 2B:
    Search: ^(.*)(?|( is a )|( a regex ))
    Replace: \1_

    This regex will actually remove “ is a ” and “ a regex ” from the lines in question.
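
The effect of the optional group can be reproduced in Python’s standard `re` module, again only as a sketch in a different engine. Because Python’s `re` has no (?| syntax, the second pattern below is a stand-in that just drops the trailing ‘?’, and the replacement uses a plain space on the assumption that the `_` in the posts marks a space. The names `OPTIONAL`, `REQUIRED` and `remove_phrase` are hypothetical.

```python
import re

# Example 2A: the alternation is optional, so the greedy .* can take the whole line.
OPTIONAL = re.compile(r"^(.*)( is a | a regex )?")

# Stand-in for Example 2B: same alternation, but it must match.
REQUIRED = re.compile(r"^(.*)( is a | a regex )")


def remove_phrase(line: str) -> str:
    """Replace the matched prefix with group 1 plus a space (assumed meaning of '_')."""
    return REQUIRED.sub(r"\1 ", line, count=1)


line = "This is a test sentence"
print(OPTIONAL.match(line).groups())  # group 1 holds the full line, group 2 is unset
print(REQUIRED.match(line).groups())  # .* backtracks until the phrase can match
print(remove_phrase(line))
print(remove_phrase("of a regex string"))
```

With the optional version, group 1 swallows the full line and group 2 never participates. Once the phrase is required, the .* has to give characters back, so the phrase is captured and can be dropped in the replacement.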

    A lot of editors don’t even implement the full regex spec, e.g. Notepad2 does a small subset which doesn’t even handle the optional “?” correctly, and certainly doesn’t support the (?| syntax.
