Viewing 10 posts - 1 through 10 (of 10 total)
  • #20980
    LifeTimer
    Participant

    While trying to search for “lines not beginning with a tab and also followed by an empty line”, I found some strange behavior in the EmEditor Find function that clearly looks like bugs to me:

    The following regular expression works as expected (it finds all lines that do not begin with a tab and colors them green):

    ^[^\t].*

    As soon as I add a line break, however, the green coloring of search results stops working completely. The lines are still found, though (they are selected in grey and jumped to correctly when pressing “Find next”):

    ^[^\t].*\n

    If I add another line break to the regular expression, both the green coloring and the correct finding of the lines stop working completely, and seemingly random lines are identified as “found” (I tried bookmarking the found lines from the Find dialog, since with the highlighting not working this was the only way to identify them at all):

    ^[^\t].*\n\n

    I am not using any special search settings, and all checkboxes are unchecked in the “Advanced…” dialog of the Find dialog. I also tried both regular expression engines, with no difference.
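To check what the three patterns above match as plain regex semantics, here is a minimal Python sketch using the standard `re` module with `re.MULTILINE`. It is an assumption that this mirrors EmEditor at all: Python's engine is neither of EmEditor's two engines, and the sample text and the `find_all` helper are hypothetical.

```python
import re

# Hypothetical sample: "aaa", a tab-indented line, "ccc", an empty line, "ddd".
TEXT = "aaa\n\tbbb\nccc\n\nddd"


def find_all(pattern: str, text: str) -> list[str]:
    """Hypothetical helper: every non-overlapping match of `pattern`.

    re.MULTILINE makes ^ match at the start of every line, as in an editor.
    """
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]


for pattern in (r"^[^\t].*", r"^[^\t].*\n", r"^[^\t].*\n\n", r"^[^\t\n].*"):
    print(pattern, find_all(pattern, TEXT))

# Expected matches, in order:
#   ^[^\t].*      -> ['aaa', 'ccc', '\nddd']
#   ^[^\t].*\n    -> ['aaa\n', 'ccc\n']
#   ^[^\t].*\n\n  -> ['ccc\n\n']
#   ^[^\t\n].*    -> ['aaa', 'ccc', 'ddd']
```

The first result shows a pitfall: in Python's `re`, the negated class `[^\t]` also matches a newline, so a match can begin on an empty line and run into the next one. Excluding the newline as well (`[^\t\n]`) rules that out.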

    ADDITION:
    I just noticed that the failure to find the correct lines happens only when the filter is used (typically when a line that really follows a seemingly single line has been filtered away), whereas the missing green coloring happens whether or not a filter is applied, so it is a separate bug.

    Here is a test case:

    Try the regular expression ^[^\t].*\n\n on the following text, without any filtering, and you will see that the green coloring of search results is missing:

    aaa

    bbb

    ccc
    111

    ddd

    eee

    Now filter away the line containing “111” (with a tab before it) and search again with the same regexp: the “ccc” line is still not matched, although it should be now, since after the subsequent “111” line has been filtered away it is also a “single line”.

    #20981
    LifeTimer
    Participant

    I just noticed that the “111” line is not displayed with a TAB before it in the post above, but it does have a TAB in front of “111” in my test data. It does not really matter, though, since the example still shows that the “ccc” line is not “found” after the “111” line has been filtered away.

    #20987
    Yutaka Emura
    Keymaster

    Hello,

    EmEditor will not highlight (in green) a search term that contains a newline. The Filter feature will not search for a newline.

    I hope this clarifies your question. Please let me know if you really can’t find what you need with any regular expression.

    Thanks,

    #20991
    LifeTimer
    Participant

    Thanks for the reply about the green coloring! May I ask, though, why the green coloring is skipped when a search match contains a newline? Besides being a confusing inconsistency, coloring the matches is even more useful with a regexp (multi-line or not) than with static strings, since regexp results are harder to anticipate. Would you therefore consider implementing the coloring also for matches that include newlines?

    Regarding your reply to the second issue (“The Filter feature will not search for a newline”), I think you may have misunderstood my question. I’m not using the filter for this search; I’m only filtering out some rows first and then using the regular EmEditor search on the remaining data, which then fails as described above. So it still seems to me that this is a (quite serious) bug: multi-line searches fail when performed on filtered data.

    #20992
    Yutaka Emura
    Keymaster

    Hello,

    To highlight multi-line search terms, EmEditor would need additional flags for each line, which increases memory usage. It would also increase computation time and slow EmEditor down. Therefore, EmEditor does not highlight multi-line search terms, and it has been this way for many years.

    I understand what you mean by filtering out. Thanks for the clarification.

    #20995
    LifeTimer
    Participant

    OK, thanks, I understand about the green coloring then. (Although, wouldn’t single-line matches take up just as much memory if there are simply more of them? Would the memory problem really be that significant if the number of lines per match is kept reasonable? Perhaps there could be a maximum number of rows for matches to be colored green, for example 5 to 10 lines per match.)

    About the second issue (the failure to search filtered data correctly): do you mean that you acknowledge it as a bug that will be fixed?

    #20996
    Yutaka Emura
    Keymaster

    Yes, I try to make EmEditor use very little memory per line, and one flag per line is a very significant increase.

    I am not sure I understand your question about filtering. If this is a bug, could you please write a simple test case so that I can reproduce the issue?

    Thank you,

    #20998
    LifeTimer
    Participant

    OK, I will try to make my test case above even simpler, then (NOTE: although the description is a bit lengthy, the example itself is very simple; the detailed description is just for maximum clarity):

    Create a document containing the following exact contents:

    Line with empty row after it (1)
    
    Line NOT having an empty row after it
    This line makes the above line not match the regexp
    
    Line with empty row after it (2)
    
    Last line

    Now perform a normal search in EmEditor (i.e. via the normal “Find” dialog), using the following search term (a regexp that matches all lines that have an empty line right after them):

    ^.*\n\n

    (Leave all checkboxes unchecked in the Find dialog except “Use Regular Expressions”, and also leave all checkboxes unchecked in the “Advanced…” sub-dialog of the Find dialog, which is the default.)

    Now execute the search (i.e. press “Find next”).

    You will now see that, just as expected, the following three lines match the search (they are highlighted one at a time as you press the “Find next” button):

    Line with empty row after it (1)

    This line makes the above line not match the regexp

    Line with empty row after it (2)

    So far so good!

    Now filter away the line “This line makes the above line not match the regexp” by using the following filter expression in EmEditor (i.e. in the Filter box on the toolbar):

    This line makes the above line not match the regexp

    The displayed contents of the document are now as follows:

    Line with empty row after it (1)
    
    Line NOT having an empty row after it
    
    Line with empty row after it (2)
    
    Last line

    As you can see, the line “Line NOT having an empty row after it” now does HAVE an empty line after it, which sets the stage for the last step, showing the bug / unexpected behavior:

    Now perform the exact same search in the document as above again.

    Contrary to expectation and to correct regexp behavior, only the same lines that matched the first search match this second search, even though the line “Line NOT having an empty row after it” should now match too (again, because it now HAS an empty line after it).

    As a programmer, I understand the problem: you run the regular expression on the entire contents of the original document, even though some of its lines are filtered out of view, and then display only the matches located on lines that are still visible. BUT this obviously doesn’t work for regular expression searches involving line breaks (as in my example above).

    For this to work correctly, you must therefore either allocate a contiguous buffer containing the current filtered data and perform regexp searches on that (which of course costs both memory and CPU for really large files), or add a restriction (or at least a clear, visible warning) that regexp searches whose hits (or lookaheads/lookbehinds) span more than one row cannot be used when the document is filtered.
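The first option (a contiguous buffer of the filtered data) can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not how EmEditor is implemented: the document is a list of lines, `hidden` is a hypothetical set of 0-based indices of filtered-away lines, and `search_visible_text` is an illustrative name. The visible lines are copied into one buffer, the regexp runs there, and each match is mapped back to its original line number.

```python
import re

# The test document from the post above, one string per line (0-based indices).
LINES = [
    "Line with empty row after it (1)",
    "",
    "Line NOT having an empty row after it",
    "This line makes the above line not match the regexp",
    "",
    "Line with empty row after it (2)",
    "",
    "Last line",
]
PATTERN = r"^.*\n\n"  # a line followed by an empty line


def search_visible_text(lines: list[str], hidden: set[int], pattern: str) -> list[int]:
    """Hypothetical helper: search only the visible lines as one contiguous buffer.

    Returns the 0-based line number, in the ORIGINAL document, of each match start.
    """
    # visible[k] is the original line number of line k in the buffer.
    visible = [i for i in range(len(lines)) if i not in hidden]
    buffer = "\n".join(lines[i] for i in visible)
    hits = []
    for m in re.finditer(pattern, buffer, flags=re.MULTILINE):
        buffer_line = buffer.count("\n", 0, m.start())  # newlines before the match
        hits.append(visible[buffer_line])
    return hits


print(search_visible_text(LINES, set(), PATTERN))  # [0, 3, 5]
print(search_visible_text(LINES, {3}, PATTERN))    # [0, 2, 5]
```

With line 3 hidden, line 2 (“Line NOT having an empty row after it”) is now reported. The extra cost is the copied buffer plus one index per visible line, which is the memory/CPU trade-off described in the post.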

    Personally, I would prefer a compromise that allows such regexp searches of filtered data for documents that aren’t really big (say, no larger than a couple of hundred megabytes) and restricts them only for larger documents. (I would do the same for the green marking of multi-row hits, by the way, since the memory usage of an extra flag per row, as you mention, will only be a problem for really big documents there too, which are, after all, far less common than smaller ones.)

    I hope this makes the issue clear.

    #21005
    Yutaka Emura
    Keymaster

    Hello,

    I understand what you mean. However, the current behavior of EmEditor is correct. The hidden line is only HIDDEN, not DELETED. Therefore, when you do a search, EmEditor searches the real text. EmEditor skips hidden lines only when the search term does not begin on a visible line; once the visible portion matches, EmEditor also checks the hidden text. If you really want to delete those lines, you should extract the filtered text into another document (you can use the Extract All button on the Filter toolbar). The logic is similar to text hidden by the Outline Guide. It is also similar to the results of the “Compare” commands: after a comparison, some lines are separated by blank lines, but the search does not treat them as real blank lines.
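The rule described here can be modelled as a minimal Python sketch, under assumptions: the document is a list of lines, `hidden` is a hypothetical set of 0-based indices of hidden lines, and `search_real_text` is an illustrative name, not an EmEditor API. The regexp runs on the full text, hidden lines included, and a match is reported only if it starts on a visible line.

```python
import re

# The test document from the earlier post, one string per line (0-based indices).
LINES = [
    "Line with empty row after it (1)",
    "",
    "Line NOT having an empty row after it",
    "This line makes the above line not match the regexp",
    "",
    "Line with empty row after it (2)",
    "",
    "Last line",
]
PATTERN = r"^.*\n\n"  # a line followed by an empty line


def search_real_text(lines: list[str], hidden: set[int], pattern: str) -> list[int]:
    """Hypothetical helper: search the full text, hidden lines included, and
    return the 0-based line number of each match that starts on a visible line."""
    text = "\n".join(lines)
    hits = []
    for m in re.finditer(pattern, text, flags=re.MULTILINE):
        start_line = text.count("\n", 0, m.start())  # newlines before the match
        if start_line not in hidden:
            hits.append(start_line)
    return hits


print(search_real_text(LINES, set(), PATTERN))  # [0, 3, 5]
print(search_real_text(LINES, {3}, PATTERN))    # [0, 5]
```

Line 2 (“Line NOT having an empty row after it”) is never reported, because the hidden line 3 still sits between it and the empty line; searching a copy that contains only the visible lines would report it.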

    If this contradicts any other behaviors of EmEditor, please let me know.

    Thank you,

    #21021
    LifeTimer
    Participant

    OK, I see, thanks for your reply.

    It would be very useful to be able to perform multi-line searches on filtered text like this, though, so I would very much like to suggest adding such a feature to EmEditor, called, for example, “Search Filtered”.

    I understand that this would require copying the filtered contents to their own memory buffer before they can be regexp-searched (i.e. an initial CPU and memory hit for each such search), but that would absolutely be worth it. Unless the file is giant, the cost would be practically negligible anyway, and it would not be a problem at all as long as the user knows about it and has deliberately chosen this special kind of search.

    Any chance you can add such a feature?

    The filtering capabilities of EmEditor are extremely useful, but being unable to fully search the filtered data like this unfortunately limits the potential of this otherwise excellent filtering quite a bit for more advanced use. :-/
