Viewing 14 posts - 1 through 14 (of 14 total)
  • #30329
    Patrick C
    Participant

    I’ve posted this before, but it may have been overlooked because it appeared in a mixed-up context.

    The syntax highlighting issue arises when a regex uses quantifiers, i.e. * or + or {3,} etc.

    Example:
    Two regex matches are active; all string and comment matches are disabled.

    ″.*?″ regex match 1/2
    %%.*$ regex match 2/2

    Issue:
    EmEditor stops matching ″.*?″ as soon as it encounters %%.*$, even though the regex match for ″.*?″ has not yet been completed.
    [Image: syntax highlighter overlap conflict]

    Would this be difficult to fix?
    It is a serious limitation: without it, one could write significantly more accurate syntax highlighters than is currently possible.
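
    To make the overlap concrete, here is a minimal Python sketch. It is not EmEditor code; it only runs the two patterns from the example independently over one hypothetical sample line (the line and the helper `overlaps` are assumptions, and straight quotes stand in for the ″ characters used above).

```python
import re

# Hypothetical sample line: a string that happens to contain "%%",
# followed by a real "%%" comment.
line = 'x = "50%% off" %% trailing comment'

string_rule = re.compile(r'".*?"')    # regex match 1/2
comment_rule = re.compile(r'%%.*$')   # regex match 2/2

string_span = string_rule.search(line).span()
comment_span = comment_rule.search(line).span()

def overlaps(a, b):
    """Return True if the half-open spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

print("string :", string_span)
print("comment:", comment_span)
print("overlap:", overlaps(string_span, comment_span))
```

    The comment pattern first matches at the %% inside the quoted string, so the two spans overlap. A highlighter that applies each regex without knowing about the other has to choose a winner, and either choice can be wrong for some input.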

    #30330
    Yutaka Emura
    Keymaster

    Your observation is correct: highlighting with regex allows overlapping.

    #30338
    Patrick C
    Participant

    Is there any way to work around this?
    I often fall foul of it.

    #30339
    Yutaka Emura
    Keymaster

    If %% always appears at the beginning of a line, you can use:

    ^%%.*$

    Otherwise, I don’t think there’s a good way to write a regex for this case. I’ll consider adding an option to control how regex is applied in situations like this.

    #30340
    Patrick C
    Participant

    The %% is just for the sake of illustration.

    I got the idea of using regular expressions for highlighting from other highlighters.

    For JavaScript highlighting, these are:
    [1] https://github.com/pygments/pygments/blob/master/pygments/lexers/javascript.py
    [2] https://github.com/speed-highlight/core/blob/main/src/languages/js.js
    and several others.

    To highlight JavaScript regex literals, /…/, one can, for example, use (from [1]):
    \/((?!\/)[^\r\n\\]|\\.)+\/[dgimsuy]*
    And to highlight template literals, ‵…‵:
    ‵(?:(?!‵|\${).)*?(?:‵|\${)
    }(?:(?!‵|\${).)*?(?:‵|\${)

    The problem is that these two interfere:
    [Image: template literal in regex literal falsely highlighted]
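
    The interference can be reproduced outside EmEditor with a short Python sketch. The regex-literal pattern is the one quoted above; the template-literal pattern is a deliberately simplified stand-in (an assumption, using a real backtick instead of ‵), and the sample source line is hypothetical.

```python
import re

# Regex-literal pattern as quoted in this post (from [1]).
regex_literal = re.compile(r'\/((?!\/)[^\r\n\\]|\\.)+\/[dgimsuy]*')

# Simplified stand-in for the template-literal pattern (assumption):
# a backtick, anything that is not a backtick, then a closing backtick.
template_literal = re.compile(r'`[^`]*`')

# Hypothetical JavaScript line: the regex literal contains a backtick.
src = 'const re = /`+/g; const t = `hi`;'

regex_span = regex_literal.search(src).span()
template_span = template_literal.search(src).span()

print("regex literal   :", src[slice(*regex_span)])
print("template literal:", src[slice(*template_span)])

# The template match begins inside the regex literal.
starts_inside = regex_span[0] < template_span[0] < regex_span[1]
print("template starts inside regex literal:", starts_inside)
```

    Each pattern is fine on its own. The false highlight appears only because the template pattern cannot know that its opening backtick already belongs to a regex literal.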

    I’ve written several variable-length, regex-based highlighters for Python, JavaScript, PowerShell and more. These work well, but only as long as their matches don’t overlap with those of another variable-length regex highlight definition.

    “I’ll consider adding an option to control how regex is applied in situations like this.”

    This would be awesome 😃.
    I realise that I’m just one customer, so please first focus on what’s most important for EmEditor rather than my request. Should you find the time, then I’ll greatly appreciate the effort.
    Thank you, Yutaka!

    #30348
    Yutaka Emura
    Keymaster

    EmEditor v25.3 preview 1 (25.2.901) introduces experimental support for using special keywords in regex highlight strings to set additional conditions.

    (?#_text_c==n) applies the highlight only if the text at the start of the match is already using color ‘n’.
    (?#_text_c!=n) applies the highlight only if the text at the start of the match is *not* using color ‘n’.

    Here, n is an integer representing a specific color, as defined in plugin.h. For example:

    
    #define SMART_COLOR_NORMAL          0
    #define SMART_COLOR_HILITE_4        17
    // ...and so on
    

    To set a condition, one of these keywords must appear at the very beginning of your highlight string.

    For instance, instead of just writing %%.*$, you could use (?#_text_c!=17)%%.*$ to highlight everything from %% to the end of the line, but only if the text at the start of the match is not already using the color corresponding to n = 17 (SMART_COLOR_HILITE_4).
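
    As a rough mental model of these conditions, the following Python sketch simulates them on a single line. It is a toy under stated assumptions, not EmEditor's implementation: rules are applied in list order, a match whose first character fails the condition is skipped and the search resumes one character later, and the names `apply_rules`, `condition_holds`, `STRING_COLOR` and `COMMENT_COLOR` are invented for the illustration.

```python
import re

SMART_COLOR_NORMAL = 0   # value quoted from plugin.h above
STRING_COLOR = 17        # SMART_COLOR_HILITE_4; assumed color of the string rule
COMMENT_COLOR = 18       # arbitrary index chosen for this demo (assumption)

# Recognizes the keyword at the very beginning of a highlight string.
CONDITION = re.compile(r'^\(\?#_text_c(==|!=)(\d+)\)')

def condition_holds(cond, current_color):
    """Check a parsed (?#_text_c==n) / (?#_text_c!=n) keyword against a color."""
    if cond is None:
        return True
    op, n = cond.group(1), int(cond.group(2))
    return (current_color == n) == (op == '==')

def apply_rules(line, rules):
    """Return one color per character after applying (highlight, color) rules in order."""
    colors = [SMART_COLOR_NORMAL] * len(line)
    for highlight, color in rules:
        cond = CONDITION.match(highlight)
        pattern = re.compile(highlight[cond.end():] if cond else highlight)
        pos = 0
        while pos < len(line):
            m = pattern.search(line, pos)
            if m is None:
                break
            start, end = m.span()
            if end > start and condition_holds(cond, colors[start]):
                colors[start:end] = [color] * (end - start)
                pos = end
            else:
                pos = start + 1   # assumption: skip this match, retry further right
    return colors

line = 'x = "50%% off" %% trailing comment'
plain = apply_rules(line, [(r'".*?"', STRING_COLOR),
                           (r'%%.*$', COMMENT_COLOR)])
guarded = apply_rules(line, [(r'".*?"', STRING_COLOR),
                             (r'(?#_text_c!=17)%%.*$', COMMENT_COLOR)])
print(plain)
print(guarded)
```

    In this model the unguarded comment rule repaints the string from its inner %% onward, while the guarded rule leaves the string alone and colors only the trailing comment. How EmEditor treats a match that fails its condition may differ from the skip-and-retry behavior assumed here.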

    Hope this helps clarify how to use this new feature!

    #30350
    Patrick C
    Participant

    Wow 😃
    I’ll test this tomorrow (Friday) and give feedback.
    Thank you very much Yutaka!

    #30351
    Patrick C
    Participant

    Essentially I need the following (using the regex matching example at the top):

    (?#_text_c==0)″.*?″ 
    (?#_text_c==0)%%.*$

    I.e. apply the highlight only when the start of the match uses
    SMART_COLOR_NORMAL = 0

    This works fantastically well, but only on odd lines:
    [Image: highlight results]
    → Lines 1, 3, 5 and 7 are correct.

    My test file is UTF-8 with LF as line terminator (no CR).

    I cannot thank you enough for taking time for this!

    #30352
    Yutaka Emura
    Keymaster

    I’ve fixed this issue in preview 3 (25.2.903).

    #30353
    Patrick C
    Participant

    Thank you! 🙏
    I’ve just adapted my JavaScript highlighter template and the results are fantastic:
    [Image: JavaScript highlighter example]
    String and regex literals now render really well 😃.
    With respect to single-line highlighting, EmEditor is now perfect for my needs.

    The only thing I do not have a solution for is multiline matching.
    As an example: For the JavaScript multiline comment /*…*/ one could use the regex
    (?#_text_c==0)\/\*.*?\*\/ with the /s flag:
    [Image: multiline example]

    Should something like a /s flag or a directive
    #Keyword color=10, …, regexp=on, multiline=on
    be possible, then EmEditor’s highlighter would be one of the best I’ve ever seen.
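
    The effect of an /s (dot-matches-newline) flag can be illustrated with Python's `re.DOTALL`. This only demonstrates the regex semantics being requested; it says nothing about what EmEditor supports. The sample text is hypothetical, and the `(?#_text_c==0)` prefix is left out because it is an EmEditor keyword.

```python
import re

# Hypothetical JavaScript snippet with a comment spanning two lines.
source = "a = 1; /* first line\n   second line */ b = 2;"

comment = r'/\*.*?\*/'

# Without the flag, "." stops at the newline, so the comment is never closed.
single_line = re.search(comment, source)

# With DOTALL (the /s flag), "." also matches "\n".
dot_all = re.search(comment, source, flags=re.DOTALL)

print("without /s:", single_line)
print("with /s   :", repr(dot_all.group()))
```

    Without the flag the pattern finds nothing, because the opening /* and the closing */ sit on different lines. With it, the whole comment is matched as one span.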

    #30365
    Yutaka Emura
    Keymaster

    While we’re on the subject of the (?#_text_c==n) and (?#_text_c!=n) syntax, are there any known issues with it?

    #30366
    Patrick C
    Participant

    Yes, thank you for asking!
    While the formatting is a lot better, there is the following shortcoming:

    Rather than not applying a rule,
    (?^#_text_c==0) only postpones a rule’s formatting until c==0.
    This can lead to incorrect formatting.

    Example case (simplified regex):
    Rule 1) Format JavaScript strings (?^#_text_c==0)".*?"
    and
    Rule 2) Format JavaScript regex literals (?^#_text_c==0)\/.+?\/

    [Image: example]

    On line 2:
    The formatting between the «; "» is incorrect.
    And rule 1 is not applied to «"a string"»

    If it were possible to set (?^#_text_c==0) to ignore (i.e. not apply) the rule rather than just postpone its formatting, then this shortcoming would be solved.

    #30367
    Yutaka Emura
    Keymaster

    If (?^#_text_c==0) were ignored, the literal “a string” would also be ignored. One fix is to use the regex (?#_text_c==0)"[^/]*?".

    #30370
    Patrick C
    Participant

    (?#_text_c==0)"[^/]*?"
    Isn’t exactly a fix as it makes it impossible for the string to contain a /.

    Side note:
    The regexes I use in the example are intentionally simplified for the sake of illustration.
    The regex to match a string actually is (?#_text_c==0)".*?(?<!\\)", which in the example is simplified to (?#_text_c==0)".*?".
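
    The difference between the three string patterns discussed here can be checked with a small Python sketch. The EmEditor-specific `(?#_text_c==0)` prefix is omitted, and the two sample inputs are hypothetical.

```python
import re

no_slash = re.compile(r'"[^/]*?"')         # the suggested workaround
simplified = re.compile(r'".*?"')          # simplified pattern from the example
escape_aware = re.compile(r'".*?(?<!\\)"') # pattern with the lookbehind

with_slash = 's = "a/b";'
with_escapes = r's = "say \"hi\" now";'

# The workaround cannot match a string that contains "/".
slash_result = no_slash.search(with_slash)
print("no_slash on a/b     :", slash_result)
print("escape_aware on a/b :", escape_aware.search(with_slash).group())

# The simplified pattern stops at the first escaped quote;
# the lookbehind skips quotes that follow a backslash.
short = simplified.search(with_escapes).group()
full = escape_aware.search(with_escapes).group()
print("simplified   :", short)
print("escape_aware :", full)
```

    The lookbehind version still has a known gap: a string ending in an escaped backslash, such as "a\\", has a closing quote preceded by a backslash and would not be closed at the right place.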
