Viewing 5 posts - 1 through 5 (of 5 total)
  • #22678

    sky
    Participant

    Test file a: 2,000,000 lines of “ab”, with regular expressions enabled.
    Replace All of “b\n” with “” (empty) takes only about 3 seconds, with either Onigmo or Boost.Regex.

    Test file b: 4,000,000 lines of “ab”, with regular expressions enabled.
    Replace All of “b\n” with “” (empty) takes about 30 seconds with Onigmo and about 22 seconds with Boost.Regex.
    File b is only twice as large, but the replacement takes about 7 to 10 times as long.
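For anyone who wants to reproduce the two test files, here is a minimal sketch of a generator script. The function name `write_ab_file` and the output file names are only illustrative, and the CRLF line ending is an assumption (it matches a 16 MB size for 4,000,000 lines of “ab”, but the original files may have been created differently).

```python
import os
import tempfile


def write_ab_file(path: str, line_count: int, newline: str = "\r\n") -> int:
    """Write line_count lines of "ab" to path and return the file size in bytes.

    The CRLF default is an assumption, not something confirmed for the original files.
    """
    line = ("ab" + newline).encode("ascii")
    chunk_lines = 100_000
    chunk = line * chunk_lines  # write in chunks to avoid millions of tiny writes
    full_chunks, remainder = divmod(line_count, chunk_lines)
    with open(path, "wb") as f:
        for _ in range(full_chunks):
            f.write(chunk)
        f.write(line * remainder)
    return os.path.getsize(path)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # File a: 2,000,000 lines; file b: 4,000,000 lines, as in the tests above.
    for name, count in (("a", 2_000_000), ("b", 4_000_000)):
        path = os.path.join(out_dir, f"test_{name}.txt")
        size = write_ab_file(path, count)
        print(f"{path}: {count} lines, {size} bytes")
```

Passing `newline="\n"` would produce LF-only files instead, which may be worth testing separately since the search pattern ends in “\n”.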

    Judging from the Replacing message box, the progress count seems to start from 2,000,000 lines (a buffer?).
    If the file has fewer than 2,000,000 lines, the replacement finishes within a few seconds.
    I therefore suggest increasing the replacement buffer size if possible, as the “Remove Newline Characters” command seems to do. Thanks.

    Additional tests for reference.
    Test file b: 4,000,000 lines of “ab”; select all, then run the Remove Newline Characters command.
    The progress count starts from 3,999,999 lines, and the command also finishes in about 3 seconds.

    Test file c: 10,000,000 lines of “ab”; select all, then run the Remove Newline Characters command.
    The progress count starts from 9,999,999 lines, and the command finishes in only about 7 seconds.
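To make the comparison easier to see, the timings reported in this post can be turned into rough throughput figures. This is just arithmetic on the approximate numbers above; the `measurements` list and the `throughput` helper are illustrative names.

```python
# (label, lines, seconds) taken from the approximate timings reported in this post.
measurements = [
    ("Regex replace, file a", 2_000_000, 3),
    ("Regex replace, file b (Boost.Regex)", 4_000_000, 22),
    ("Regex replace, file b (Onigmo)", 4_000_000, 30),
    ("Remove Newline Characters, file b", 4_000_000, 3),
    ("Remove Newline Characters, file c", 10_000_000, 7),
]


def throughput(lines: int, seconds: float) -> float:
    """Return lines processed per second."""
    return lines / seconds


if __name__ == "__main__":
    for label, lines, seconds in measurements:
        print(f"{label}: {throughput(lines, seconds):,.0f} lines/s")
```

If the cost were linear in file size, the lines-per-second figure would stay roughly constant between file a and file b. It holds up for Remove Newline Characters but drops sharply for the regex replace.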

    #22681

    Yutaka Emura
    Keymaster

    That is probably related to the settings in the Advanced page of the Customize dialog box.
    If you need to do many replacements, you might want to increase the value in the Minimum File Size to Use Temporary File text box so that your file is smaller than this value. EmEditor will then avoid using a temporary file and load the entire file into memory (RAM), so replacing many lines will be faster.
    Please also use the latest version of EmEditor.

    #22686

    sky
    Participant

    Test file: 4,000,000 lines of “ab”, file size 16 MB, tested on EmEditor v17.3.0 beta 1.

    I increased the value in the Minimum File Size to Use Temporary File text box to 3,000 MB,
    and also increased the value in the Memory Size text box to 26.

    The result is the same: the first 9% (1,800,000 lines) finishes within 1 second, almost immediately,
    but the remaining 91% (2,200,000 lines) takes more than 20 seconds.

    In this simple test case, running the “Remove Newline Characters” command first and then removing the “b” characters finishes quickly.
    But complex regular-expression replacements in really big files (200~400 MB with 4,000,000~20,000,000 lines) are not easy to handle that way.
    I therefore hope the “Replace” function can be optimized the way the “Remove Newline Characters” command is.
    Thanks.

    #22687

    Yutaka Emura
    Keymaster

    Since you are using a regular expression to replace, it won’t be as fast as the “Remove Newline Characters” command. You might want to try the “Use Escape Sequence” option instead of the “Use Regular Expressions” option. Nevertheless, the “Remove Newline Characters” command is the fastest of these three methods.
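As a general illustration of why a literal (escape-sequence style) search tends to be cheaper than a regular-expression search, here is a small sketch using Python's own `re.sub` and `str.replace`. It does not use or model EmEditor's engines (Onigmo, Boost.Regex), so the absolute numbers say nothing about EmEditor itself; the helper names are illustrative.

```python
import re
import time


def replace_regex(text: str) -> str:
    """Remove every "b" followed by a newline using a regular expression."""
    return re.sub(r"b\n", "", text)


def replace_literal(text: str) -> str:
    """Remove the same two-character sequence with a plain literal search."""
    return text.replace("b\n", "")


def timed(func, text: str):
    """Run func(text) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    text = "ab\n" * 1_000_000  # LF line endings assumed for this sketch
    regex_result, regex_time = timed(replace_regex, text)
    literal_result, literal_time = timed(replace_literal, text)
    assert regex_result == literal_result
    print(f"regex:   {regex_time:.3f} s")
    print(f"literal: {literal_time:.3f} s")
```

Both functions produce the same output for this pattern, so the literal form is a safe substitute whenever the search string contains no actual regex syntax.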

    #22772

    sky
    Participant

    Many thanks for the improvements in the new year.

    Test file b again with v17.4.0 beta 1: 4,000,000 lines of “ab”, with regular expressions enabled.
    Replace All of “b\n” with “” (empty) now takes only about 5 seconds, 4 times faster than v17.3.2.

    Test file c again with v17.4.0 beta 1: 10,000,000 lines of “ab”, with regular expressions enabled.
    Replace All of “b\n” with “” (empty) now takes only about 11 seconds, 9 times faster than v17.3.2!

    The result is very good~
