Viewing 13 posts - 1 through 13 (of 13 total)
  • Author
    Posts
  • #7612
    Salabim
    Participant

    Hi,

    I have a problem with a macro I created which consists of 1 line :

    document.selection.Replace("(\\r\\n|-|,| )(denk|wil|dacht|kan|wilde|ben|zie)je( |,|\\.|!|\\?|(\\r)*\\n)","\\1\\2 je\\3",eeFindNext | eeReplaceAll | eeFindReplaceRegExp);

    … having the line :

    denkje wilje dachtje kanje wildeje zieje

    … I expected it would replace it like this :

    denk je wil je dacht je kan je wilde je zie je

    … but it only replaced the 1st, 3rd, and 5th match, like this :

    denk je wilje dacht je kanje wilde je zieje

    Why doesn’t it replace the 2nd (wilje) the 4th (kanje) and the 6th (zieje) item please ? :-(

    #7623
    Salabim
    Participant

    Really nobody ?

    #7624
    Yutaka Emura
    Keymaster

    Salabim wrote:
    Really nobody ?

    Is it possible to make your sample as simple as possible? That will make it easy to reproduce the issue. Thanks!

    #7625
    Salabim
    Participant

    Hello Yutaka,

    I’m trying to make it as easy as possible. I just want to check for any of these combos and put a space between the word and je, but only if the combination is surrounded by spaces or is at the beginning or end of a line.

    So…. I don’t think I can make it simpler than that. :-D

    document.selection.Replace("(\\r\\n| )(denk|wil|dacht|kan|wilde|ben|zie)je(\\r\\n| )","\\1\\2 je\\3",eeFindNext | eeReplaceAll | eeFindReplaceRegExp);
    #7628
    thr
    Member

    Try this:

    (\r\n| )(denk|wil|dacht|kan|wilde|ben|zie)je(?=\r\n| )

    The problem is that your original expression “consumes” both separators, at the beginning of a word and at the end of a word. If the separator at the end of the first word is “consumed” by the expression, it can’t be “consumed” by the expression that should match the second word. Using

    (?=\r\n| )

    performs a positive lookahead that only matches but doesn’t “consume” anything. You can write this even more simply, without a positive lookahead, by using word-boundary assertions (in this case, beginning of a word and end of a word):

    \<(denk|wil|dacht|kan|wilde|ben|zie)je\>
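To see the difference outside EmEditor, here is a minimal sketch in plain JavaScript (runnable in Node or a browser console; it is not an EmEditor macro). It assumes JavaScript's own regex engine, so `\b` stands in for the word-start and word-end assertions, and the sample line is wrapped in `\r\n` to imitate a line inside a document. All variable names are made up for illustration.

```javascript
// Sample line from the first post, wrapped in CRLF as if it sat inside a document.
const text = "\r\ndenkje wilje dachtje kanje wildeje zieje\r\n";
const words = "(denk|wil|dacht|kan|wilde|ben|zie)";

// 1) Both separators are consumed, so the space after one match is no longer
//    available as the leading separator of the next word.
const consuming = new RegExp("(\\r\\n| )" + words + "je(\\r\\n| )", "g");
const resultConsuming = text.replace(consuming, "$1$2 je$3");

// 2) The trailing separator is only checked by a lookahead, not consumed.
const lookahead = new RegExp("(\\r\\n| )" + words + "je(?=\\r\\n| )", "g");
const resultLookahead = text.replace(lookahead, "$1$2 je");

// 3) Word boundaries: no separator character is matched at all.
const boundary = new RegExp("\\b" + words + "je\\b", "g");
const resultBoundary = text.replace(boundary, "$1 je");

console.log(JSON.stringify(resultConsuming));
console.log(JSON.stringify(resultLookahead));
console.log(JSON.stringify(resultBoundary));
```

The first result leaves wilje, kanje and zieje untouched, as in the original report; the other two split all six words.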
    #7629
    Salabim
    Participant

    Aaaaagh! Exactly, shame on me that I didn’t think about that.
    After all, I’m a real newbie regarding regexp’s,
    Only problem: I use both the first (newline OR space) and the last (newline OR space) in the replacement as \1 and \3. Does that mean the ?= will NOT assign the following newline or space to \3 ?

    So with the replacement string \1\2 je\3, \3 will be empty even if there IS a space or newline after “je”, am I right ?

    Thanks for the help!

    #7630
    thr
    Member

    Yes, a positive lookahead only matches, it doesn’t “consume” anything. But you can simplify your whole construct like so:

    document.selection.Replace("(\<(denk|wil|dacht|kan|wilde|ben|zie)je\>","\1 je",eeFindNext | eeReplaceAll | eeFindReplaceRegExp);

    That way you don’t need to pay attention to matching the word separators at all. That gives you a lot more leeway. For example, you can then have something like

    denkje, wilje;dachtje-kanje
    wildeje/zieje

    and it will still work as intended, while keeping all word separators intact.

    Even simpler with the word shorthand class:

    document.selection.Replace("(\<(\w+)je\>","\1 je",eeFindNext | eeReplaceAll | eeFindReplaceRegExp);
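The claim that the separators stay intact can be checked with a small sketch in plain JavaScript (not an EmEditor macro). It assumes JavaScript's `\b` as a stand-in for the word-start and word-end assertions and uses the restricted word list from this thread; the variable names are hypothetical.

```javascript
// Sample with mixed separators, taken from the post above.
const sample = "denkje, wilje;dachtje-kanje\nwildeje/zieje";

// \b matches a position, not a character, so commas, semicolons,
// hyphens, slashes and line breaks are never part of the match.
const fixed = sample.replace(/\b(denk|wil|dacht|kan|wilde|ben|zie)je\b/g, "$1 je");

console.log(fixed);
```

Every listed word gets its space, and each separator character appears in the output exactly where it was in the input.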
    #7631
    Salabim
    Participant

    Thank you very much for the detailed explanation thr !
    I owe you one ! :-)

    #7633
    Salabim
    Participant

    EDIT:

    The second line errored out: missing ).
    Even when corrected, it doesn’t do anything.

    The first line also gave completely wrong results, even after I corrected thr’s suggestion to use double backslashes and balanced parentheses.

    It’s not about simplifying things, for example by matching any whole word followed by [je]. I need to match only the words from my top post, since those may NEVER be followed directly by [je]; there has to be a space in between.
    Other words can be followed directly by je, since I’m correcting Dutch texts, and there je at the end of a word means “little” in English.

    For example, translated from Dutch to English:
    Pak = Box
    Pakje = Little box
    Kind = Child
    Kindje = Little child
    Hond = Dog
    Hondje = Little dog
    … but
    Denk = Think
    Wil = Want
    Dacht = Thought
    … so the words above cannot be followed by [je], since that is not correct: there is no such thing as a little think, a little want, or a little thought.
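The difference between the restricted word list and a generic "any word ending in je" pattern can be sketched in plain JavaScript (not an EmEditor macro). This assumes JavaScript's `\b` and `\w` behave closely enough to the editor's word assertions for these ASCII samples; the sample sentence and variable names are made up.

```javascript
// Diminutives (pakje, kindje, hondje) mixed with verbs glued to "je".
const dutch = "pakje kindje hondje denkje wilje dachtje";

// Restricted list: only the verbs that must never be glued to "je".
const listed = dutch.replace(/\b(denk|wil|dacht|kan|wilde|ben|zie)je\b/g, "$1 je");

// Generic pattern: any word ending in "je", which also splits the diminutives.
const generic = dutch.replace(/\b(\w+)je\b/g, "$1 je");

console.log(listed);
console.log(generic);
```

The restricted list leaves the diminutives alone, while the generic pattern breaks them into "pak je", "kind je" and "hond je".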

    Please, can anyone help me out ? :-(

    P.S. Sorry for my bad English.

    #7634
    Salabim
    Participant

    Sorry if this all sounds so difficult. The biggest problem is my limited English, which makes it hard to explain the problem
    properly, and of course I’m far from a pro at brewing regular expressions.

    Perhaps I can give some kind of English example that gives an
    idea of what I mean.

    So, in English: is it possible to properly correct the following line
    with 1 single line of regexp ?

    English example :
    I wantto lookto the car, while doing a salto justto celebrate how great is it.

    With the example above, you cannot simply say: look for any word that ends with to and put
    a space in between, because then the word salto would be improperly torn apart. And of course,
    checking that the words to search for are indeed surrounded by a space, a comma, or a newline is not possible either,
    because in the English example I gave, the space after wantto is captured, so the regexp doesn’t correct
    the lookto right after it: it doesn’t see the preceding space,
    because that was already handled by the match before it.

    So what can I do? I guess I can only write each of my regexps twice, or is there a
    special switch to make it work correctly with 1 line of regexp ?

    EDIT: before you say look to is not correct and has to be look at, I know ! :-D It’s
    only meant as an example.
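For the English example, a single pass does seem to be enough once the surrounding separators are asserted instead of matched. The sketch below is plain JavaScript (not an EmEditor macro) and assumes `\b` as the word-boundary assertion; the word list (want|look|just) is an assumption taken from the sample sentence, and the variable names are hypothetical.

```javascript
const english = "I wantto lookto the car, while doing a salto justto celebrate how great is it.";

// Only the listed words are split; "salto" is not in the list, so it is left alone.
// Because \b consumes no characters, adjacent matches (wantto lookto) both succeed.
const corrected = english.replace(/\b(want|look|just)to\b/g, "$1 to");

console.log(corrected);
```

Both wantto and lookto are corrected in the same run, because neither match takes the space between them.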

    #7635
    thr
    Member

    Yes, there were some typos in my posts. Also, I only tested the regular expressions through the Replace dialog. But I just created a macro containing this:

    document.selection.Replace("\\<(denk|wil|dacht|kan|wilde|ben|zie)je\\>","\\1 je",eeFindNext | eeReplaceAll | eeFindReplaceRegExp);

    Here at least, it works as expected.

    #7637
    Salabim
    Participant

    It works PERFECTLY thr !

    Thank you so much !!!! :-)

    #7813
    TimGreen
    Member

    By the way: For anyone working with complex RegExes, Regex Buddy is an absolute must. It’s worth many times the $30 or so that it costs and comes with the best tutorial for RegEx that I’ve ever seen. It has fantastic tools to help you build, test and debug RegExes and apply them in most common languages and environments. I use it all the time for building the RegExes that I use in EmEditor and elsewhere and I learn something new every time I use it.

    http://www.regexbuddy.com

    I don’t have anything to do with the company that produces it, I’m just a very happy customer… :-D
