Viewing 14 posts - 1 through 14 (of 14 total)
  • #3834
    user
    Participant

    hello

    how do I delete duplicate lines?

    I mean lines that are identical

    thanks

    #3835
    Yutaka Emura
    Keymaster

    It isn’t easy to do with regular expressions, but how about a macro like this (JavaScript):


    // Create an array
    a = new Array();

    // Fill the array a with all lines (with returns) in the document.
    document.selection.StartOfDocument();
    for( ; ; ){
        y = document.selection.GetActivePointY( eePosLogical );
        document.selection.SelectLine();
        sLine = document.selection.Text;
        if( sLine == "" ) { // Reached the end of document, escape from the loop
            break;
        }
        a.push( sLine );
        document.selection.Collapse();
        if( document.selection.GetActivePointY( eePosLogical ) == y ) {
            // Reached the end of document (the last line without return), escape from the loop
            break;
        }
    }

    // Delete duplicate elements.
    for( i = 0; i < a.length; i++ ){
        sLine = a[i];
        for( j = i + 1; j < a.length; j++ ){
            if( sLine == a[j] ){
                a.splice( j, 1 );
                j--;
            }
        }
    }

    // Replace the entire document with new elements
    document.selection.SelectAll();
    document.selection.Text = a.join( "" );

    Please let me know if you have questions. :-)

    #3836
    user
    Participant

    thank you for your reply

    my congrats and best wishes with the new forums :)

    #7153
    Hellados
    Member

    This is a good macro for small files, but it is very slow for me :-(
    I have text files of 50-100 MB or more, and I need to remove duplicate lines (words) from more than 406,000 words, and this macro runs very slowly :(
    My PC’s performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
    What can I do?
    :-(

    #7156
    Yutaka Emura
    Keymaster

    Hellados wrote:
    This is a good macro for small files, but it is very slow for me :-(
    I have text files of 50-100 MB or more, and I need to remove duplicate lines (words) from more than 406,000 words, and this macro runs very slowly :(
    My PC’s performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
    What can I do?
    :-(

    I did some optimization. Please try this. This also shows the current status on the status bar.


    function Pair( i, s )
    {
        this.index = i;
        this.str = s;
    }

    nLines = document.GetLines();

    // Create an empty array; push() below appends one Pair per line.
    a = new Array();

    status = "Reading lines...";

    // Fill the array a with all lines (with returns) in the document.
    for( i = 1; i <= nLines; i++ ) {
        if( (i % 1000) == 0 ){
            status = "Reading lines: " + String(i) + "/" + String(nLines);
        }
        var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
        a.push( pair );
    }

    status = "Sorting lines...";

    // Sort by text so that duplicates become adjacent; ties keep document order.
    a.sort( function(p,q){
        if( p.str > q.str ){
            return 1;
        }
        if( p.str < q.str ){
            return -1;
        }
        return p.index - q.index;
    });

    // Mark duplicate elements (every copy after the first).
    for( i = 1; i < nLines; i++ ){
        if( (i % 10) == 0 ){
            status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
        }
        if( a[i].str == a[i-1].str ){
            a[i].index = 0; // disable
        }
    }

    status = "Sorting lines again...";

    // Restore document order; disabled entries (index 0) sort to the front.
    a.sort( function(p,q){
        return p.index - q.index;
    });

    var str = "";
    for( i = 0; i < nLines; i++ ){
        if( a[i].index != 0 ){
            if( (i % 1000) == 0 ){
                status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
            }
            str += a[i].str;
        }
    }

    // Replace the entire document with new elements
    document.selection.SelectAll();
    document.selection.Text = str;
    status = "Duplicate lines deleted.";

    #7994
    Salabim
    Participant

    Hi Yutaka,

    regarding the last (faster) duplicate line macro you posted, is it possible to change the code so that the last line…

    status = "Duplicate lines deleted."

    … could actually show how many duplicate lines were deleted ?

    Something like :
    “117 duplicate lines deleted.”

    #8009
    Yutaka Emura
    Keymaster

    Then how about this?


    function Pair( i, s )
    {
        this.index = i;
        this.str = s;
    }

    nLines = document.GetLines();

    // Create an empty array; push() below appends one Pair per line.
    a = new Array();

    status = "Reading lines...";

    // Fill the array a with all lines (with returns) in the document.
    for( i = 1; i <= nLines; i++ ) {
        if( (i % 1000) == 0 ){
            status = "Reading lines: " + String(i) + "/" + String(nLines);
        }
        var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
        a.push( pair );
    }

    status = "Sorting lines...";

    // Sort by text so that duplicates become adjacent; ties keep document order.
    a.sort( function(p,q){
        if( p.str > q.str ){
            return 1;
        }
        if( p.str < q.str ){
            return -1;
        }
        return p.index - q.index;
    });

    // Mark duplicate elements (every copy after the first).
    for( i = 1; i < nLines; i++ ){
        if( (i % 10) == 0 ){
            status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
        }
        if( a[i].str == a[i-1].str ){
            a[i].index = 0; // disable
        }
    }

    status = "Sorting lines again...";

    // Restore document order; disabled entries (index 0) sort to the front.
    a.sort( function(p,q){
        return p.index - q.index;
    });

    var str = "";
    n = 0; // number of duplicate lines removed
    for( i = 0; i < nLines; i++ ){
        if( a[i].index != 0 ){
            if( (i % 1000) == 0 ){
                status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
            }
            str += a[i].str;
        }
        else {
            n++;
        }
    }

    // Replace the entire document with new elements
    document.selection.SelectAll();
    document.selection.Text = str;
    status = n + " duplicate lines deleted.";

    #8015
    Salabim
    Participant

    Thanks a lot Yutaka ! :)

    #9409
    Monkeyman
    Member

    Thank you for the good macro. Removing duplicate lines is a very nice and useful feature, which EmEditor badly lacks. I hope you’ll add it in a future release.

    As for the JS macro provided, it has one small “glitch”. When the duplicate line is the last one in the document, the macro doesn’t recognize it. For example:

    Badger
    Eagle
    Simpsons
    Donkey
    Badger

    There’s no newline after the second “Badger”, so it doesn’t match the first “Badger” (which ends with a newline) and the macro won’t delete it.
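One possible workaround, sketched here as plain JavaScript outside EmEditor, is to compare lines with their trailing line terminator stripped. The names `stripNewline` and `removeDuplicateLines` are hypothetical helpers for illustration, not part of the macros posted above or of the EmEditor API, and the simple nested loop is chosen for clarity, not speed.

```javascript
// Strip one trailing "\r\n", "\n", or "\r" so that a final line without a
// terminator compares equal to earlier copies that still carry one.
function stripNewline(line) {
    return line.replace(/(\r\n|\n|\r)$/, "");
}

// Remove duplicate lines, keeping the first occurrence of each.
// "lines" holds every line including its terminator (if it has one).
function removeDuplicateLines(lines) {
    var kept = [];
    for (var i = 0; i < lines.length; i++) {
        var key = stripNewline(lines[i]);
        var duplicate = false;
        for (var j = 0; j < kept.length; j++) {
            if (stripNewline(kept[j]) === key) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) {
            kept.push(lines[i]);
        }
    }
    return kept;
}

// The example from this post: the last "Badger" has no newline.
var sample = ["Badger\n", "Eagle\n", "Simpsons\n", "Donkey\n", "Badger"];
var cleaned = removeDuplicateLines(sample).join("");
```

To apply the same idea in the sort-based macro, the stripped text would presumably have to be used for both the sort and the equality check, since a line with and without its terminator does not necessarily end up adjacent after sorting on the raw text.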

    #9517
    raikrivera
    Member

    Thank you so much, guys. This is exactly what I was searching for.

    #9531
    Deipotent
    Participant

    I needed this functionality recently and thought it could be done easily with regex. Google led me to http://www.regular-expressions.info/duplicatelines.html which said to sort the lines, and then search for the following:

    ^(.*)(\r?\n\1)+$

    and replace with:

    \1

    Unfortunately, I couldn’t get this to work in EmEditor, even after enabling the option to search past line boundaries.

    Can you add support for this type of regex to EmEditor ?

    PS. I haven’t tried the macro yet, and am sure it works fine, but it would be nice if it could be done with regex.

    #9536
    Yutaka Emura
    Keymaster

    In EmEditor, \r is ignored, so could you try using this instead?

    ^(.*)(\n\1)+$

    In future versions, I might add a new command to remove duplicate lines, so you won’t need to use regular expression replace in the future.

    Thanks!
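For anyone who wants to check the pattern outside the editor, here is a minimal sketch in plain JavaScript. It assumes the lines are already sorted and separated by "\n" only; `collapseSortedDuplicates` is a hypothetical helper name. Note that JavaScript writes the replacement back-reference as `$1`, where the Replace dialog discussed above uses `\1`.

```javascript
// Collapse each run of identical adjacent lines into a single line.
// The "m" flag makes ^ and $ match at line boundaries; "g" replaces every run.
function collapseSortedDuplicates(text) {
    return text.replace(/^(.*)(\n\1)+$/gm, "$1");
}

// The lines must be sorted first so that duplicates sit next to each other.
var sorted = ["Badger", "Badger", "Donkey", "Eagle", "Simpsons"].join("\n");
var result = collapseSortedDuplicates(sorted);
// result is "Badger\nDonkey\nEagle\nSimpsons"
```

Because the pattern only looks at adjacent lines, duplicates that are separated by other lines are left alone, which is why the sort step is required.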

    #9541
    Deipotent
    Participant

    Thanks Yutaka, that did the trick!

    A new command would be useful, particularly for people who don’t use scripting or RegEx, as a lot of other text editors include a simple menu option for removing duplicates. It would also allow you to highly optimise it, and to include an option to keep the original order (i.e. so you don’t have to sort the lines first), which would be useful.
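As an illustration of removing duplicates while keeping the original order and without sorting, here is a minimal sketch in plain JavaScript that remembers the lines already seen in a lookup object. This is not how EmEditor implements anything; `removeDuplicatesKeepOrder` is a hypothetical name, and the function works on an array of strings, not on the editor's document object.

```javascript
// Remove duplicate lines in one pass, keeping the first occurrence of each
// and preserving the original order. Returns the kept lines and the number
// of lines that were dropped.
function removeDuplicatesKeepOrder(lines) {
    var seen = {};
    var kept = [];
    var removed = 0;
    for (var i = 0; i < lines.length; i++) {
        // The "$" prefix avoids clashes with built-in property names
        // such as "__proto__" or "constructor".
        var key = "$" + lines[i];
        if (Object.prototype.hasOwnProperty.call(seen, key)) {
            removed++;
        } else {
            seen[key] = true;
            kept.push(lines[i]);
        }
    }
    return { lines: kept, removed: removed };
}

var outcome = removeDuplicatesKeepOrder(["b", "a", "b", "c", "a"]);
// outcome.lines is ["b", "a", "c"] and outcome.removed is 2
```

Each line is looked up once, so the work grows roughly linearly with the number of lines, at the cost of holding every distinct line in memory.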

    One of my other suggestions was for a simple RegEx library feature, so you can create or import regexes with a name, an optional description, and possibly find settings. This would allow you to select the relevant regex from the library list and then run it.

    #9544
    Yutaka Emura
    Keymaster

    We are working on such features for future versions.

    Thanks!
