December 11, 2025 at 9:44 am #30496
Igi Noumi
Participant
Where can I find a specialist consultant for the program?
December 12, 2025 at 2:47 am #30497
David
Participant
That’s a great question. To point you in the most relevant direction, could you share a few more details?
December 14, 2025 at 12:09 pm #30498
Igi Noumi
Participant
Are there Telegram channels or other places where I can ask questions or get advice?
There isn’t much activity here in terms of answers.
December 14, 2025 at 7:59 pm #30499
David
Participant
As more and more people use AI to solve practical problems, fewer individuals engage in technical discussions in online communities, leaving tech-focused platforms looking rather quiet. This trend is not unique to this community; it can also be observed across other similar platforms.
In my opinion, this officially provided community is an ideal place to discuss issues related to EmEditor.
December 15, 2025 at 1:39 pm #30500
Igi Noumi
Participant
You can’t find an answer to this question on the internet; only those who have used it can answer.
December 15, 2025 at 4:06 pm #30501
Igi Noumi
Participant
For example, how can I categorize similar strings so that a column appears clearly showing the percentage similarity of the words?
That way I could quickly select the strings with 80 to 90 percent similarity.
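Once such a similarity column exists, picking the 80 to 90 percent band is just a range filter; producing the column is the expensive part. A minimal sketch in plain JavaScript, assuming a hypothetical layout of "text, tab, similarity percent" on each line (selectSimilarityBand is an illustrative name, not an EmEditor function):

```javascript
// Keep only lines whose similarity column lies in [minPct, maxPct].
// Hypothetical layout: "<text>\t<similarity percent>" on every line.
function selectSimilarityBand(lines, minPct, maxPct) {
  return lines.filter((line) => {
    const fields = line.split("\t");
    if (fields.length < 2) return false; // no similarity column on this line
    const pct = parseFloat(fields[1]);
    return !Number.isNaN(pct) && pct >= minPct && pct <= maxPct;
  });
}

const rows = [
  "colour\t92.5",
  "color\t85.0",
  "collar\t80",
  "cooler\t66.7",
  "a header line without a percentage",
];
// Prints only the "color" and "collar" rows.
console.log(selectSimilarityBand(rows, 80, 90).join("\n"));
```

Both bounds are inclusive here, so a value of exactly 80 or 90 is kept.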
December 16, 2025 at 6:10 am #30504
Patrick C
Participant
This sounds like an application for an AI, i.e. EmEditor’s ChatAI plugin.
In principle you can ask the AI to categorise the strings in EmEditor’s current document, add a tab-separated column showing the similarity, paste the result into a new EmEditor document and display it in tab-separated CSV mode.
Should you want to use an AI to perform this task, then my guess is that you might require an AI consultant rather than an EmEditor consultant.
December 17, 2025 at 4:59 am #30507
Igi Noumi
Participant
I’m asking AI about a macro that would process a document with tens of millions of rows and display the similarity percentage in a separate column. It says that would be very slow using the Levenshtein algorithm.
So I want to know how I can initially filter the database using the program’s powerful tools.
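For reference, a Levenshtein-based similarity percentage can be computed as below. This is a minimal plain-JavaScript sketch (not the js-levenshtein library and not an EmEditor API), assuming the common normalization similarity = 1 - distance / length of the longer string; other normalizations exist. Each comparison takes time proportional to the product of the two string lengths, which is why running it over many pairs of rows gets slow.

```javascript
// Levenshtein distance (insert, delete, substitute each cost 1),
// dynamic programming keeping only the previous row of the table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in percent: 100 * (1 - distance / length of the longer string).
function similarityPercent(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 100 : 100 * (1 - levenshtein(a, b) / maxLen);
}

console.log(levenshtein("kitten", "sitting"));                  // 3
console.log(similarityPercent("kitten", "sitting").toFixed(1)); // 57.1
```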
December 17, 2025 at 11:30 am #30508
Patrick C
Participant
“So I want to know how I can initially filter the database using the program’s powerful tools.”
Perhaps.
—————————
How do you want to compare the rows?
E.g.
[Case 1] Compare adjacent rows: 1 with 2, 3 with 4, 5 with 6
Or
[Case 2] Each row with all other rows?
1 with 2, 3, 4, 5, …
then 2 with 3, 4, 5, …
——————
I’m assuming your requirement is closer to [Case 2].
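Counting the comparisons shows why [Case 2] is the expensive one. The plain-JavaScript sketch below (helper names are hypothetical) enumerates the row pairs of both cases for small inputs and uses the closed forms n/2 and n(n - 1)/2 for large ones: one million rows need 500,000 comparisons in [Case 1] but 499,999,500,000 in [Case 2], and ten million rows need 49,999,995,000,000.

```javascript
// [Case 1] adjacent rows: (1,2), (3,4), (5,6), ...
function adjacentPairs(n) {
  const pairs = [];
  for (let i = 1; i + 1 <= n; i += 2) pairs.push([i, i + 1]);
  return pairs;
}

// [Case 2] each row with every later row: (1,2), (1,3), ..., (2,3), ...
function allPairs(n) {
  const pairs = [];
  for (let i = 1; i <= n; i++) {
    for (let j = i + 1; j <= n; j++) pairs.push([i, j]);
  }
  return pairs;
}

// Closed-form counts, so large n never has to be enumerated.
const countAdjacent = (n) => Math.floor(n / 2);
const countAll = (n) => (n * (n - 1)) / 2; // exact while n * (n - 1) < 2^53

console.log(JSON.stringify(adjacentPairs(6))); // [[1,2],[3,4],[5,6]]
console.log(allPairs(4).length);               // 6
console.log(countAdjacent(1000000));           // 500000
console.log(countAll(1000000));                // 499999500000
```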
EmEditor can sort by similarity.
What one could do is sort by similarity with EmEditor’s built-in algorithm (hopefully fast) and then use JavaScript to calculate the similarity between adjacent lines, i.e.
1 with 2,
2 with 3,
3 with 4,
etc.
December 17, 2025 at 2:06 pm #30509
Igi Noumi
Participant
Option 2
But how can I extract the most similar words from the ordered list in this case?
Or do you mean first sorting by similarity, and then running a macro that displays the similarity percentage in a separate column?
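The sort-then-compare-neighbours idea from Patrick’s previous reply can be sketched in plain JavaScript. This assumes the lines have already been sorted by similarity (for example with EmEditor’s built-in sort) and are available as an array; reading them from and writing them back to the document is left out because that depends on EmEditor’s macro API. It uses the Levenshtein-based percentage as the measure and appends a tab-separated column with each line’s similarity to the line above it, which costs n - 1 comparisons instead of n(n - 1)/2. annotateAdjacent is a hypothetical name.

```javascript
// Levenshtein distance via two-row dynamic programming.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in percent, normalized by the longer string's length.
function similarityPercent(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 100 : 100 * (1 - levenshtein(a, b) / maxLen);
}

// For lines already sorted by similarity, append "\t<percent>" giving each
// line's similarity to the preceding line. The first line has no
// predecessor, so its column is left empty.
function annotateAdjacent(sortedLines) {
  return sortedLines.map((line, i) => {
    if (i === 0) return line + "\t";
    return line + "\t" + similarityPercent(sortedLines[i - 1], line).toFixed(1);
  });
}

const sorted = ["color", "colour", "colours", "house", "houses"];
console.log(annotateAdjacent(sorted).join("\n"));
```

Rows with a high value are close to their predecessor; a low value (here "house" after "colours") marks the start of a different group. Only neighbours are compared, so two similar lines that the sort did not place next to each other are not detected.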
December 17, 2025 at 4:29 pm #30510
Patrick C
Participant
I’ve now spent some time thinking about this problem.
I do have an EmEditor macro that can calculate the Levenshtein distance / similarity between two strings.
It’s adapted from
[1] https://github.com/gustf/js-levenshtein
The problem is what you have already stated:
“macro that would process a document with tens of millions of rows and display the similarity percentage in a separate column […] would be very slow using the Levenshtein algorithm.”
What EmEditor can do is sort all lines by similarity with any given line.
I.e. against just one line:
E.g. line 1 with 2, 3, 4, 5, … 1’000’000 (a one-million-line example)
One could then, for example, write a macro that calculates the similarity with line 1 of the first 100 or 1000 lines (i.e. the lines that are now the most similar to line 1) and displays the similarity percentage in a separate column.
↳ I’ve already written a proof-of-concept hack of such a macro.
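A rough sketch of that idea in plain JavaScript (not the proof-of-concept macro mentioned here, whose code isn’t shown). Because the thread doesn’t say which measure EmEditor’s built-in similarity sort uses, the sketch substitutes the Levenshtein-based percentage as a stand-in, ranks all other lines against one reference line, and keeps the top k with a tab-separated percentage column. topKSimilar, refIndex and k are hypothetical names; the cost is one comparison per line plus a sort, not one comparison per pair.

```javascript
// Levenshtein distance via two-row dynamic programming.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function similarityPercent(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 100 : 100 * (1 - levenshtein(a, b) / maxLen);
}

// Score every other line against lines[refIndex] (stand-in for EmEditor's
// own similarity sort), sort descending, and return the top k rows as
// "<line>\t<percent>".
function topKSimilar(lines, refIndex, k) {
  const ref = lines[refIndex];
  const scored = [];
  for (let i = 0; i < lines.length; i++) {
    if (i === refIndex) continue;
    scored.push({ line: lines[i], pct: similarityPercent(ref, lines[i]) });
  }
  scored.sort((x, y) => y.pct - x.pct);
  return scored.slice(0, k).map((s) => s.line + "\t" + s.pct.toFixed(1));
}

const lines = ["colour", "house", "color", "colours", "mouse"];
console.log(topKSimilar(lines, 0, 2).join("\n"));
```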
↓
The problem is that you will then only know the similarity of line 1 with the most similar 1000 or so other lines.
Should you need to then compare
Line 2 with all others
Line 3 with all others
etc. up to line 1’000’000
would probably require a massively powerful computer, and on top of that, using EmEditor would no longer make sense.
So, regarding your question:
“I want to know how I can initially filter the database using the program’s [EmEditor’s] powerful tools.”
To filter the database, you would first need to know the similarity percentages. Once they are known, pre-filtering would no longer be necessary, as the final result is already there.
Hope this helps.
- The forum ‘General’ is closed to new topics and replies.