
July 27, 2016 at 2:17 pm #20998


OK, I will try to make my test case above even simpler, then. (Note: even though the description is a bit lengthy, the example itself is very simple; the detailed description is just for maximum clarity.) Here it is:

Create a document with exactly the following contents:

Line with empty row after it (1)

Line NOT having an empty row after it
This line makes the above line not match the regexp

Line with empty row after it (2)

Last line

Now, perform a normal search in EmEditor (i.e. via the normal “Find” dialogue), using the following search term (which is a regexp that matches all lines that have an empty line right after them):

^.*\n\n

(Leave all checkboxes in the Find dialogue unchecked except “Use Regular Expressions”, and also leave all checkboxes in the “Advanced…” sub-dialogue of the Find dialogue unchecked, which is the default.)

Now execute the search (i.e. press “Find next”).

You will now see that, just as expected, the following three lines match the search (i.e. they are highlighted one at a time as you press the “Find next” button):

Line with empty row after it (1)

This line makes the above line not match the regexp

Line with empty row after it (2)
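To reproduce this expected result outside EmEditor, here is a minimal Python sketch. It assumes the document uses `\n` line endings and that `^` in EmEditor's Find matches at the start of every line, which corresponds to Python's `re.MULTILINE`; the variable names are illustrative only.

```python
import re

# The test document from the steps above, assuming "\n" line endings.
DOCUMENT = (
    "Line with empty row after it (1)\n"
    "\n"
    "Line NOT having an empty row after it\n"
    "This line makes the above line not match the regexp\n"
    "\n"
    "Line with empty row after it (2)\n"
    "\n"
    "Last line\n"
)

# ^.*\n\n : a whole line, its line break, and then an empty line.
# re.MULTILINE makes ^ match at the start of every line, not just the string.
PATTERN = re.compile(r"^.*\n\n", re.MULTILINE)

# Strip the trailing line breaks so only the matching line's text remains.
matches = [m.group().rstrip("\n") for m in PATTERN.finditer(DOCUMENT)]
for line in matches:
    print(line)
```

This prints the same three lines listed above.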

So far so good!

Now filter out the line “This line makes the above line not match the regexp” by using the following filter expression in EmEditor (i.e. in the Filter box on the toolbar):

This line makes the above line not match the regexp

The displayed contents of the document will now be as follows:

Line with empty row after it (1)

Line NOT having an empty row after it

Line with empty row after it (2)

Last line

As you can see, the line “Line NOT having an empty row after it” now DOES have an empty line after it, which sets the stage for the last step, showing the bug / unexpected behavior:

Now perform exactly the same search again.

Contrary to expectation and to correct regexp behavior, this second search matches only the same lines that matched the first search, even though the line “Line NOT having an empty row after it” should now match as well (again, because it now HAS an empty line after it).
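The expected result can be checked with a small Python sketch that runs the same regexp on exactly the text the filter leaves visible. It assumes `\n` line endings and models the filter as "hide every line containing the filter text"; this is an illustration, not EmEditor's actual filter logic.

```python
import re

PATTERN = re.compile(r"^.*\n\n", re.MULTILINE)

# Lines of the original document (assuming "\n" line endings).
lines = [
    "Line with empty row after it (1)",
    "",
    "Line NOT having an empty row after it",
    "This line makes the above line not match the regexp",
    "",
    "Line with empty row after it (2)",
    "",
    "Last line",
]

# Assumed filter model: hide every line that contains this text.
hidden_text = "This line makes the above line not match the regexp"
visible = [line for line in lines if hidden_text not in line]

# Search the text exactly as it is displayed after filtering.
displayed = "\n".join(visible) + "\n"
matches = [m.group().rstrip("\n") for m in PATTERN.finditer(displayed)]
print(matches)
```

On the displayed text, "Line NOT having an empty row after it" is among the matches, which is the behavior expected from the Find dialogue.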

As a programmer, I understand the problem: you run the regular expression on the entire contents of the original document, even though some of its lines are filtered out of view, and then display only the matches located in lines that are not filtered out. But this obviously doesn't work for regular expression searches involving line breaks (as in my example above).
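The suspected behavior can be modelled roughly as follows. This is only a Python model of the hypothesis above, not EmEditor's actual code; the list-of-lines representation and the `hidden` set are assumptions made for illustration.

```python
import bisect
import re

PATTERN = re.compile(r"^.*\n\n", re.MULTILINE)

lines = [
    "Line with empty row after it (1)",
    "",
    "Line NOT having an empty row after it",
    "This line makes the above line not match the regexp",
    "",
    "Line with empty row after it (2)",
    "",
    "Last line",
]
hidden_text = "This line makes the above line not match the regexp"
# Indices of the lines the filter hides (here only index 3).
hidden = {i for i, line in enumerate(lines) if hidden_text in line}

# Hypothesized step 1: search the FULL, unfiltered text.
full_text = "\n".join(lines) + "\n"
line_starts = []  # offset of each line inside full_text
offset = 0
for line in lines:
    line_starts.append(offset)
    offset += len(line) + 1

# Hypothesized step 2: keep only matches that start on a visible line.
found = []
for m in PATTERN.finditer(full_text):
    line_no = bisect.bisect_right(line_starts, m.start()) - 1
    if line_no not in hidden:
        found.append(lines[line_no])

print(found)  # "Line NOT having an empty row after it" is missing
```

Because the regexp still sees the hidden line in the full text, "Line NOT having an empty row after it" is never followed by `\n\n` there, so post-filtering the matches cannot recover it.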

This means that for this to work correctly, you must either allocate a contiguous buffer holding the currently filtered (visible) contents and run regexp searches on that (which will of course cost both memory and CPU time for really large files), or impose a restriction (or at least a clear and visible warning) that regexp searches whose matches (or lookaheads/lookbehinds) span more than one row cannot be used while the document is filtered.
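The first option (a contiguous buffer of the visible lines) could look roughly like the Python sketch below. The function name `search_visible` and the list-of-lines document model are hypothetical; the key step is mapping each match offset in the buffer back to the original line index, so the editor can highlight the right line.

```python
import bisect
import re

# Same pattern as in the Find dialogue; re.MULTILINE lets ^ match at each line start.
PATTERN = re.compile(r"^.*\n\n", re.MULTILINE)


def search_visible(lines, visible_indices, pattern):
    """Run `pattern` on a contiguous copy of the visible lines only.

    Returns the original (unfiltered) index of the line where each match starts.
    """
    if not visible_indices:
        return []
    starts = []  # offset of each visible line inside the buffer
    parts = []
    offset = 0
    for idx in visible_indices:
        starts.append(offset)
        parts.append(lines[idx] + "\n")
        offset += len(lines[idx]) + 1
    buffer = "".join(parts)  # the extra copy that costs memory and CPU time

    result = []
    for m in pattern.finditer(buffer):
        k = bisect.bisect_right(starts, m.start()) - 1  # visible line of the match
        result.append(visible_indices[k])  # map back to the original line index
    return result


lines = [
    "Line with empty row after it (1)",
    "",
    "Line NOT having an empty row after it",
    "This line makes the above line not match the regexp",
    "",
    "Line with empty row after it (2)",
    "",
    "Last line",
]
hidden_text = "This line makes the above line not match the regexp"
visible = [i for i, line in enumerate(lines) if hidden_text not in line]

result = search_visible(lines, visible, PATTERN)
for i in result:
    print(i, lines[i])
```

With this approach, lookaheads and lookbehinds also see only the visible lines, which matches what the user sees on screen.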

Personally, I would prefer a compromise: allow such regexp searches of filtered data for documents that just aren't really big (say, under a couple of hundred megabytes or so?), and apply the restriction only to documents larger than that. I would also do the same for the green markings of multi-row matches, by the way, since the memory cost of an extra flag per row, as you mention, only becomes a problem for really big documents there too, and those are, after all, far less common than much smaller documents.
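The proposed compromise amounts to a simple size check before a regexp search on filtered data. In this Python sketch the 200 MB limit is only an illustration of "a couple of hundred megabytes", the buffer size is measured in characters as a rough proxy for memory, and the names `FilteredSearchMode` and `choose_mode` are hypothetical.

```python
from enum import Enum


class FilteredSearchMode(Enum):
    CONTIGUOUS_BUFFER = "search a copy of the visible lines"
    SINGLE_LINE_ONLY = "warn: multi-row regexps unsupported while filtered"


# Hypothetical threshold, roughly "a couple of hundred megabytes" of text.
DEFAULT_LIMIT_CHARS = 200 * 1024 * 1024


def choose_mode(visible_lines, limit_chars=DEFAULT_LIMIT_CHARS):
    """Pick a strategy for regexp searches on a filtered document."""
    # Size of the contiguous buffer: each visible line plus its "\n".
    needed = sum(len(line) + 1 for line in visible_lines)
    if needed <= limit_chars:
        return FilteredSearchMode.CONTIGUOUS_BUFFER
    return FilteredSearchMode.SINGLE_LINE_ONLY


print(choose_mode(["short line"] * 3))
print(choose_mode(["x" * 10] * 10, limit_chars=50))
```

The same kind of threshold could gate the extra per-row flag for the green markings, so small documents get exact behavior and only very large ones fall back to the restriction.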

I hope this makes the issue clear enough?

Reply To: Bug in regular expression find, both in green-coloring and actual finding