Viewing 14 posts - 1 through 14 (of 14 total)
  • #4957
    tungwaiyip
    Member

    I’m searching for a string among 2500 files spread across a directory tree. The performance numbers:

    Another program: 2 seconds
    EmEditor: 9 minutes

    This translates to 4.6 files per second for EmEditor vs. 1250 files per second for the other program.

    Another interesting observation: there are two types of files, one with the .rst extension and one with the .html extension. The search seems to go much faster with .html files. It takes 65 seconds to search through 1000 .html files, which translates to 15 files per second. Not great, but much better than average. The total file size of the .html files actually makes up 75% of the original search.
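
For anyone checking the arithmetic, here is a small sketch that recomputes the throughput figures from the timings quoted above. The last figure is derived rather than measured: it assumes the 1000 .html files are part of the same 2500-file run, which the post implies but does not state outright.

```python
def files_per_second(file_count: int, seconds: float) -> float:
    """Return search throughput in files per second."""
    return file_count / seconds


# Timings quoted in the post above.
other_program = files_per_second(2500, 2)       # whole tree, other program
emeditor_all = files_per_second(2500, 9 * 60)   # whole tree, EmEditor
emeditor_html = files_per_second(1000, 65)      # .html files only, EmEditor

# Assumption: the remaining 1500 files are the .rst files and account for
# the rest of the 9 minutes. This is an inference, not a measurement.
emeditor_rst = files_per_second(2500 - 1000, 9 * 60 - 65)

print(f"Other program:    {other_program:7.1f} files/s")
print(f"EmEditor (all):   {emeditor_all:7.1f} files/s")
print(f"EmEditor (.html): {emeditor_html:7.1f} files/s")
print(f"EmEditor (.rst):  {emeditor_rst:7.1f} files/s (derived)")
```

If that assumption holds, the .rst files are searched at roughly 3 files per second, several times slower than the .html files.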

    #4959
    Yutaka Emura
    Keymaster

    Thanks for the description, but it would be more helpful if you could tell me which encodings those files are written in, and whether you use regular expressions or escape sequences including newlines. A possible reason for the delay is that EmEditor works internally in Unicode, and it supports Unicode-aware regular expressions. Please write exactly what you search for and what the files are like; you can also email me sample files and more details at [email protected]
    Thanks!

    #4960
    Yutaka Emura
    Keymaster

    Also, make sure you write which options (such as Regular Expressions, Match Case, etc.) you checked in the Find in Files dialog box. Using different options can make a significant difference in search speed. Thanks!

    #4961
    tungwaiyip
    Member

    I’ve figured it out. The problem is in the “Encoding” listbox. The default “Configured Encoding” is the slowest. Choosing anything other than “Configured Encoding”, even if you pick something random like Korean, will give you good performance.

    The other options I have set are “Look in subfolder” and “Use Escape Sequence”. Unfortunately I cannot post my data files, but that is irrelevant: I repeated the search over 1000 files from an open source project and the result was the same.

    #4965
    Yutaka Emura
    Keymaster

    When “Configured Encoding” is selected in the “Find in Files” dialog box, EmEditor uses the encoding configured in the File tab of the configuration properties associated with the file extension you are searching. By default, the Text configuration uses the “System Default” encoding with UTF-8 and Unicode signature detection. The UTF-8 detection can make the search slower. If “Detect All” is also checked in the File tab of the configuration properties, it may become even slower. Please let me know which options are checked in the File tab of the associated configuration properties.
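
To illustrate why detection adds work, here is a minimal sketch of the general idea. It is not EmEditor's actual implementation: the function names `detect_encoding` and `search_bytes` are made up for illustration. Signature (BOM) checks only look at the first few bytes, while UTF-8 detection has to validate the file content before the search itself can start.

```python
import codecs
import locale
from typing import Optional


def detect_encoding(data: bytes, detect_utf8: bool = True) -> str:
    """Guess an encoding for raw file bytes (illustrative only)."""
    # Signature checks are cheap: they inspect only the first few bytes.
    # UTF-32 is tested first because its LE BOM begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if detect_utf8:
        # Without a signature, the content itself has to be validated,
        # which means an extra pass over the data.
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass
    # Stand-in for a "System Default" encoding.
    return locale.getpreferredencoding(False)


def search_bytes(data: bytes, needle: str, encoding: Optional[str] = None) -> bool:
    """Search with a fixed encoding if one is given, otherwise detect first."""
    chosen = encoding or detect_encoding(data)
    return needle in data.decode(chosen, errors="replace")
```

Passing an explicit `encoding` skips the detection step entirely, which is roughly what picking a specific encoding in the dialog corresponds to in this sketch.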

    Also, in what encoding are the files you are searching? UTF-8, Western European (CP:1252), or something else?

    #4966
    tungwaiyip
    Member

    In the File tab I have these options selected:

    * Prompt if null char found
    * Prompt if invalid char
    * show file name w/full path
    * Detect BOM
    * Detect UTF-8
    * Prompt at inconsistent returns
    * Opening Encoding: UTF-8

    The files I have searched should all be ASCII files.
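
One way to confirm that assumption is a quick script like the sketch below. The function name `non_ascii_files` and the default extension list are illustrative choices, not anything provided by EmEditor. It needs Python 3.7 or later for `bytes.isascii()`.

```python
from pathlib import Path
from typing import List, Sequence


def non_ascii_files(root: str, extensions: Sequence[str] = (".rst", ".html")) -> List[Path]:
    """Return files under root with the given extensions that contain non-ASCII bytes."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            # bytes.isascii() is True only if every byte is below 0x80.
            if not path.read_bytes().isascii():
                offenders.append(path)
    return offenders
```

An empty result for the directory being searched would mean every matching file really is plain ASCII.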

    Even if EmEditor is doing more detection, it just doesn’t sound right to me that one program can search all the files in 2 seconds while EmEditor takes 9 minutes. Also, if I select any specific encoding, say UTF-8, the search time drops dramatically to a few seconds.

    Of course now I have a workaround as stated in my previous sentence :-)

    #4967
    Yutaka Emura
    Keymaster

    You should change the Opening Encoding in the File tab of the configuration properties from “UTF-8” to “System Default Encoding”. That should solve this issue.

    #4968
    shaohao
    Member

    You’d better use Registry mode (all settings saved in the Registry) rather than INI file mode (all settings saved in .ini files).

    INI file mode will slow down searching in files.

    #4977
    tungwaiyip
    Member

    This surely is a bug, isn’t it? There is no explanation for why it would take so long with certain combinations of encoding settings.

    Besides, I tried your recommendation. It doesn’t help. Picking “utf-8” or any other encoding in the Find in Files dialog, however, does solve the problem.

    Is it true that choosing .ini for configuration slows things down? This is one of the things that makes me love EmEditor 7.

    #4978
    Yutaka Emura
    Keymaster

    Please try beta 32, and you should find better performance. Thanks!

    #4998
    tungwaiyip
    Member

    Upgraded to b32 and found the same issue.

    #4999
    Yutaka Emura
    Keymaster

    You should have seen at least some improvement. Can you please give more details that might help me reproduce your issue? Without them, I won’t be able to reproduce it. Thanks!

    #5000
    tungwaiyip
    Member

    The procedure is the same as what has been posted in this thread. I see no difference between b29 and b32.

    One more data point. I’m using the “exported to USB drive” version. If I just run the version installed in “C:\Program Files” it does not seem to have this problem.

    #5001
    Yutaka Emura
    Keymaster

    EmEditor is optimized for Registry usage, and the USB install is slower. If speed is a top priority, please consider using the Registry (without INI files). I will optimize a little more for INI files, but it won’t become as fast as the Registry version.
