Viewing 25 posts - 1 through 25 (of 30 total)
  • #27654
    spiros
    Participant

    Would diacritics-insensitive search and filtering be possible? It would be very useful for the many languages that use diacritics, and there are free libraries available that can do the job, such as Lucene.

    #27661
    Yutaka Emura
    Keymaster

    Can you give some compelling examples?

    #27684
    spiros
    Participant

    Many languages use diacritics: https://web.library.yale.edu/cataloging/music/diacrit
    For example, take a simple phrase in French: l’été arrive à la fin. If one does not type the diacritics (or types the wrong ones), nothing will be found.
    Or in polytonic Greek: ὃν οἱ θεοὶ φιλοῦσιν ἀποθνῄσκει νέος

    Some software already does this, for example dtSearch (a desktop search tool).

    It could be added as an extra toggle button like the “W” (for whole word search).

    #28347
    Yutaka Emura
    Keymaster

    Hello,

    v21.9.903+ includes the Fuzzy Matching feature. Please try it and let me know if you have any feedback.

    https://www.emeditor.com/forums/topic/emeditor-v22-0-beta-21-9-901/

    In the Find dialog box, select the Fuzzy Matching option, click the button next to it to open the Fuzzy Matching Options dialog box, and select the “Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten” option.
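For readers who want to see what "ignore nonspacing combining characters" means in practice, here is a minimal Python sketch of the general technique. It is not EmEditor's implementation, which is not described in this thread; the helper names `strip_marks`, `fold`, and `contains` are made up for illustration. The idea is to decompose each character (Unicode NFD), drop every code point in category Mn (nonspacing mark), and only then fold case.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD and drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def fold(text: str) -> str:
    """Strip marks first, then case-fold, so that marks cannot affect folding."""
    return strip_marks(text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Diacritics- and case-insensitive substring test."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    # The French and Greek phrases are the examples given earlier in the thread.
    print(contains("l’été arrive à la fin", "ete arrive a la fin"))
    print(contains("ὃν οἱ θεοὶ φιλοῦσιν ἀποθνῄσκει νέος", "αποθνησκει νεος"))
```

Both the search text and the document text go through the same `fold` function, which is what makes the comparison symmetric: typing the "wrong" diacritics in the query is as harmless as typing none.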

    #28369
    spiros
    Participant

    I just saw this. I can confirm that it works great for Greek and Ancient Greek. Thank you so much; this is an amazing feature for me.

    #28370
    spiros
    Participant

    The only issue I can see so far is speed. Not sure if it could be made faster. I guess some sort of complex regex runs in the background.

    #28371
    Yutaka Emura
    Keymaster

    Did you set Similarity to 100% and select only the “Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten” option?

    If so, and if it’s still slow, please describe your test conditions.

    #28372
    spiros
    Participant

    Similarity was left at the default; now that I have changed it to 100%, it is much faster. I guess the same settings are applied at the filter level?

    #28373
    Yutaka Emura
    Keymaster

    The same fuzzy matching options are applied to the filter and all other features.
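The thread does not say how EmEditor computes Similarity, so the following Python sketch is only an illustration of why a threshold below 100% tends to cost more. It assumes Levenshtein edit distance as a stand-in metric, and the names `levenshtein`, `similarity`, and `find_words` are hypothetical. At 100% a plain equality test is enough; below that, every candidate needs a distance computation that is quadratic in the word lengths.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 score (assumed definition, for illustration)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_words(words: list[str], needle: str, threshold: float) -> list[str]:
    """Return words whose similarity to needle meets the threshold."""
    if threshold >= 1.0:
        # Cheap path: an exact comparison, no distance computation at all.
        return [w for w in words if w == needle]
    return [w for w in words if similarity(w, needle) >= threshold]


if __name__ == "__main__":
    words = ["filter", "filters", "folder"]
    print(find_words(words, "filter", 1.0))
    print(find_words(words, "filter", 0.8))
```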

    #28375
    Yutaka Emura
    Keymaster

    I think the Fuzzy Matching feature is more stable now. Please try the latest beta version, and let me know if you have any other comments before I release the official version.

    #28376
    spiros
    Participant

    Emura-san, when I read this reply I immediately thought of an option to add characters that would be treated as equal to another character. After installing, I see that you have already done something like this. Congratulations! It is a truly amazing feature, and I think it is unique among text editors; you should highlight it in your advertising material.

    I am not yet clear whether I can do what I started to describe. For example, when the search string contains the character “κ”, I also want it to match the character “χ”, i.e. if I search for οικιστας I also want it to find οἰχιστάς (with diacritics-insensitive matching on, of course). How can I add this condition? In the Add dialog I see Minimum character and Maximum character, and when I enter something in Treat as, it is automatically copied to Ignore.

    Another point: it seems that when one selects options at the regex level, they do not apply to the filter. I am not sure what the best approach is, but it would be good to have an option to link the two (i.e. changing options at the regex level automatically applies the same options in the filter, and vice versa).

    #28377
    Yutaka Emura
    Keymaster

    If you want to treat κ as χ, click Add in the Fuzzy Matching Options dialog box, enter κ in both the Minimum character and Maximum character text boxes, and enter χ in the Treat as text box.

    I am not sure what you mean by regex-level, but if you mean the options you can set in the Advanced dialog box (such as Regular Expressions “.” Can Match Newline Characters option), you can’t just copy these options between Find and Filter. However, you can save all these options to macros except for the Additional Lines to Search for Regular Expressions option.
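The κ-to-χ rule from this reply can be pictured as one extra mapping step applied after the diacritics are stripped. The Python sketch below is an assumption about the general approach, not a description of EmEditor's internals; `TREAT_AS` and `fold` are illustrative names.

```python
import unicodedata

# Hypothetical "treat as" table: every κ is compared as if it were χ.
TREAT_AS = str.maketrans({"κ": "χ"})


def fold(text: str) -> str:
    """Strip nonspacing marks (category Mn), then apply the treat-as table."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return stripped.translate(TREAT_AS)


if __name__ == "__main__":
    # The two spellings from the question above fold to the same string.
    print(fold("οικιστας") == fold("οἰχιστάς"))
```

Because the table is applied to both the query and the text, searching for either κ or χ finds both spellings.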

    #28378
    spiros
    Participant

    1. Great, got it. Perhaps it could be made more intuitive or some context-sensitive help added? Also, when adding the character, in order to be entered as a fuzzy match condition, one needs to click the OK button of the Character range dialog. Perhaps, an extra button near the “Treat as” like Add or Save?

    2. I meant that after I changed the Fuzzy Matching settings in the Find/Replace dialog, they did not seem to apply in Filter, so one had to click the Fuzzy button in Filter to change them. Now that I have checked again, it seems that they are actually the same. So, problem solved.

    #28379
    Yutaka Emura
    Keymaster

    1. Great, got it. Perhaps it could be made more intuitive or some context-sensitive help added? Also, when adding the character, in order to be entered as a fuzzy match condition, one needs to click the OK button of the Character range dialog. Perhaps, an extra button near the “Treat as” like Add or Save?

    I am not sure if I understand correctly, but I don’t want to add an extra OK button to the Character range dialog box. There is already an OK button in the top-right corner, and the design is similar to many other dialog boxes.

    #28383
    tuska
    Participant

    Hi,
    Fuzzy Matching feature

    Text:	_Similitary Search_Ähnlichkeitssuche
    Filter: Ahn
    Result: Text will be hidden

    Click on button “Fuzzy Matching” in
    – emed64_21.9.901: Text is displayed and “Ähn” (without quotes) is marked
    – emed64_21.9.912: Text remains hidden

    German umlauts and special characters (ä,ö,ü,Ä,Ö,Ü,ß)
    http://www.cp1252.com/
    Ä Unicode 00C4 LATIN CAPITAL LETTER A WITH DIAERESIS

    Please check.
    Thanks!

    #28384
    spiros
    Participant

    I think we should note that ß should be considered equivalent to ss when the diacritics-insensitive option is checked.

    #28385
    tuska
    Participant

    2Spiros
    Thank you for this addition!

    I can’t find this option/setting: “diacritics-insensitive”.
    Can you please tell me where I can find it?

    #28386
    spiros
    Participant

    Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten

    #28387
    tuska
    Participant

    2spiros

    Thank you!

    I had made the “Fuzzy Matching Options” window (opened from the “Fuzzy Matching” button) too small to show the option. :(

    #28388
    tuska
    Participant

    Fuzzy Matching Options:
    ✅ Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten

    Text:
    a o u A O U
    ä ö ü Ä Ö Ü
    ß

    Filter  Finds
    a       a, A, ä, Ä
    o       o, O, ö, Ö
    u       u, U, ü, Ü
    A       -  -  -  -   <--
    O       -  -  -  -   <--
    U       -  -  -  -   <--
    ß       ß
    ss      -

    Find    Finds
    a       a, A, ä, Ä
    o       o, O, ö, Ö
    u       u, U, ü, Ü
    A       a, A, ä, Ä
    O       o, O, ö, Ö
    U       u, U, ü, Ü
    ß       ß
    ss      -

    #28389
    tuska
    Participant

    German umlauts and special characters (ä,ö,ü,Ä,Ö,Ü,ß)
    – Addition –

    Text    Fuzzy Matching
    ä       ae, Ae
    Ä       ae, Ae
    ö       oe, Oe
    Ö       oe, Oe
    ü       ue, Ue
    Ü       ue, Ue
    ß       ss      (already mentioned above by user spiros)

    #28393
    tuska
    Participant

    2Yutaka Emura

    https://www.emeditor.com/forums/topic/diacritics-insensitive-search-filtering/#post-28388

    Filter  Finds
    A	-  -  -  -   <--
    O	-  -  -  -   <--
    U	-  -  -  -   <--

    This is fixed in beta 13 (21.9.913) – September 29, 2022 …

    A       a  A  ä  Ä
    O       o  O  ö  Ö
    U       u  U  ü  Ü

    Thank you!

    By the way, regarding ß (ss), ä (aede, deae, deaede, Aede, deAe, deAEde), etc.:
    is it currently ONLY possible to enter A SINGLE CHARACTER in the “Treat as” field?
    If so, could you perhaps also allow multiple characters here? That would be very helpful!

    #28394
    spiros
    Participant

    I think a good way of handling these would be to provide an extra option for German variants, since they are German-specific (and, strictly speaking, they are not versions of the same character with or without diacritics).

    ä ae, Ae
    Ä ae, Ae
    ö oe, Oe
    Ö oe, Oe
    ü ue, Ue
    Ü ue, Ue
    ß ss

    #28395
    Yutaka Emura
    Keymaster

    v21.9.914 allows you to specify a string as well as a character range.
    Thanks,
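To illustrate why multi-character "Treat as" strings matter for the German cases discussed above, here is a small Python sketch. It is only an assumption about how such a mapping could behave, not EmEditor's implementation; `GERMAN_VARIANTS` and `fold_german` are hypothetical names, and the table is taken from the variants listed in this thread.

```python
import unicodedata

# Multi-character replacements for the German variants listed in the thread.
GERMAN_VARIANTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def fold_german(text: str) -> str:
    """Lower-case the text, then expand umlauts and ß to their digraphs."""
    # NFC first, so that "a" + combining diaeresis becomes a single "ä".
    text = unicodedata.normalize("NFC", text).lower()
    for src, dst in GERMAN_VARIANTS.items():
        text = text.replace(src, dst)
    return text


if __name__ == "__main__":
    print(fold_german("Ähnlichkeitssuche"))
    print(fold_german("Straße") == fold_german("STRASSE"))
```

Note that this is a different rule from stripping diacritics: under this mapping "Ä" matches "Ae" but no longer matches a bare "A", which is one reason to keep the German variants as a separate option.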
