Viewing 5 posts - 1 through 5 (of 5 total)
  • #5068
    jugaor
    Participant

    Hi, I tried several versions (5 up to the 7 beta) and found the following ‘bugs’, in both manual and script searches (Spanish texts):

    a por (eeFindReplaceOnlyWord)
    matches “creería por”, “CAMPAÑA POR”, etc. (i.e., it breaks the words at the accented vowels or at “Ñ”/“ñ”)

    any accented vowel (eeFindReplaceOnlyWord)
    matches “diseñé”, “ENSEÑÓ”, etc. (i.e., it breaks words that end in an accented vowel at the “Ñ”/“ñ”)

    In manual searches (with an open document), it matches all the accented vowels inside words despite “Search Only Word” (i.e. it matches “cómprale”, “mamá”, “después”, etc.)

    (?!es |son )esta(s?)(!|?)
    ignores the leading negative subexpression (i.e., it matches “esta!” / “esta?” / “estas!” / “estas?”), even though I use the ‘eeFindReplaceRegExp Or eeFindReplaceOnlyWord’ options

    If I simplify the expression
    (?!es) esta(!|?)
    (?!es )esta(!|?)
    or
    (?!son) estas(!|?)
    (?!son )estas(!|?)

    it has the same behavior. However,
    (¡|¿)esta(s?)(?! es| son)
    excludes the correct ones.

    If you need more information, please email me.
    TIA.
    jugaor

    #5071
    Yutaka Emura
    Keymaster

    As for your first question, previous versions of EmEditor did not check Unicode characters (character code > U+0080) when testing word boundaries, for the sake of speed. In the next beta version, I will add a routine that checks some Latin characters (ch >= 0x00c0 && ch <= 0x02b8). This addition will not cover all Unicode characters, but it should improve "whole word" accuracy in most cases without sacrificing much speed.
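    As a rough illustration of the range check described above, here is a minimal Python sketch. It is not EmEditor's code: the names is_word_char and find_whole_word are hypothetical, and the only detail taken from this post is the range 0x00C0 to 0x02B8.

```python
def is_word_char(ch: str) -> bool:
    """Hypothetical word-character test (an assumption, not EmEditor's routine).

    ASCII letters, digits and underscore count as word characters, and so
    does anything in the Latin range U+00C0..U+02B8 mentioned in the post.
    """
    code = ord(ch)
    if code < 0x80:
        return ch.isalnum() or ch == "_"
    return 0x00C0 <= code <= 0x02B8


def find_whole_word(text: str, word: str) -> list[int]:
    """Return start offsets where `word` has no word character on either side."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits
```

    With this check, “a por” is no longer found inside “creería por”, because the “í” before it now counts as part of the word.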

    I am not sure about your second question, but there are two unnecessary spaces in your regular expression: (?!es |son )esta(s?)(!|?)

    One between “s” and “|”, and the other between ‘n’ and ‘)’.

    Does removing these spaces solve your issue?

    #5074
    jugaor
    Participant

    Hi, thank you very much for your response.

    1. In Spanish, the ‘special’ letters are ÁÉÍÓÚÜ, áéíóúü, Ñ, ñ. I presume that these Unicode chars cover them :)

    2. The spaces are needed, since they’re two whole words:
    “esta” = “this” / “estas” = “these”, both feminine.
    “es” = “is” (singular, verb to be)
    “son” = “are” (plural, verb to be)
    The strange thing is that EmEditor works correctly when the same subexpression comes after the word rather than before it (i.e. “(¡|¿)esta(s?)(?! es| son)” is correct).

    I have been trying to use EmEditor to automatically correct misspelled words in Spanish subtitle files. I wrote some complex VBEE scripts for that, and that is how I found the issues above.

    Thanks for your attention,
    jugaor

    PS: please let me know when the new beta is ready :)

    #5077
    Yutaka Emura
    Keymaster

    (?=pattern) (positive lookahead search) and (?!pattern) (negative lookahead search) look ahead from the position where search begins.

    For example, expression “(?=x)x” always matches, and expression “(?!x)x” never matches.

    So it doesn’t make sense to place (?=pattern) or (?!pattern) at the beginning of a search term.
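    The behavior above can be reproduced with a short Python sketch. Python's re module stands in for EmEditor's regex engine here, which is an assumption. The “?” in the final group is escaped as “\?” because Python requires it, and first_match is a hypothetical helper.

```python
import re


def first_match(pattern: str, text: str):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# A lookahead inspects the text that starts at the current position.
same_char_positive = first_match(r"(?=x)x", "x")    # lookahead agrees with the next token
same_char_negative = first_match(r"(?!x)x", "xxx")  # lookahead contradicts the next token

# A leading (?!es |son ) is tested at the position where "esta" begins,
# so it looks at "esta!" itself and never sees the preceding word.
leading = first_match(r"(?!es |son )esta(s?)(!|\?)", "es esta!")

# A trailing lookahead does see the following word, so it can exclude it.
trailing_excluded = first_match(r"(¡|¿)esta(s?)(?! es| son)", "¿esta es la casa")
trailing_kept = first_match(r"(¡|¿)esta(s?)(?! es| son)", "¿esta casa")
```

    In this sketch the leading lookahead still lets “esta!” through after “es ”, while the trailing lookahead rejects “¿esta” when “ es” follows.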

    I will release beta 41 today or tomorrow.

    #5080
    jugaor
    Participant

    THANK YOU VERY MUCH! I tried beta 41 and the ‘special chars’ issue is gone! :D
    Also, I saw that I misunderstood the “look ahead” expression :-?
    I needed to use the “look behind” one (?<!pattern). Excuse me!
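    For reference, a minimal sketch of the look-behind version in Python. Python's re module requires fixed-width look-behind, so the two words are written as two separate assertions; whether EmEditor's engine accepts the single alternation form is not verified here. The “\b” anchors and the escaped “\?” are additions for this sketch, and preceded_ok is a hypothetical helper.

```python
import re

# Reject "esta"/"estas" when the whole word "es" or "son" comes right before it.
# Two fixed-width look-behinds replace the variable-width (?<!es |son ).
PATTERN = re.compile(r"(?<!\bes )(?<!\bson )esta(s?)(!|\?)")


def preceded_ok(text: str):
    """Return the matched text, or None when "es "/"son " precedes the word."""
    m = PATTERN.search(text)
    return m.group(0) if m else None
```

    The “\b” keeps a word that merely ends in “es”, such as “quieres”, from triggering the exclusion.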

    Congratulations on your excellent work!
    jugaor
