EmEditor (text editor) Forum Index
   Regular Expressions
     how to delete duplicate lines
user
Posted on: 9/29/2006 7:27 am
Home away from home
Joined: 9/29/2006
From:
Posts: 212
how to delete duplicate lines
hello

how do I delete duplicate lines?

I mean lines that are identical

thanks
Yutaka
Posted on: 9/29/2006 11:56 am
Webmaster
Joined: 9/28/2006
From: Redmond
Posts: 2398
Re: how to delete duplicate lines
It isn't easy to do with regular expressions, but how about a macro like this (JavaScript):


// Create an array
a = new Array();

// Fill the array a with all lines (with returns) in the document.
document.selection.StartOfDocument();
for( ; ; ){
    y = document.selection.GetActivePointY( eePosLogical );
    document.selection.SelectLine();
    sLine = document.selection.Text;
    if( sLine == "" ) {   // Reached the end of document, escape from the loop
        break;
    }
    a.push( sLine );
    document.selection.Collapse();
    if( document.selection.GetActivePointY( eePosLogical ) == y ) {
        // Reached the end of document (the last line without return), escape from the loop
        break;
    }
}

// Delete duplicate elements.
for( i = 0; i < a.length; i++ ){
    sLine = a[i];
    for( j = i + 1; j < a.length; j++ ){
        if( sLine == a[j] ){
            a.splice( j, 1 );
            j--;
        }
    }
}

// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = a.join( "" );


Please let me know if you have questions.
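
A note on cost: the nested loops in the macro above compare each line with every later line, so the work grows roughly with the square of the line count. As a rough sketch of a single-pass alternative, the lines could be tracked in a lookup object instead. The helper name `dedupeLines` is hypothetical and not part of the EmEditor API; it works on a plain array of strings such as the array `a` built above, and it has not been measured against the macro.

```javascript
// Hypothetical helper (not an EmEditor function): keeps the first
// occurrence of each line and drops every later repeat.
function dedupeLines(lines) {
    var seen = {};      // lines already emitted, used as a set
    var result = [];
    for (var i = 0; i < lines.length; i++) {
        // The "$" prefix keeps line text from clashing with built-in
        // property names such as "constructor" or "__proto__".
        var key = "$" + lines[i];
        if (!Object.prototype.hasOwnProperty.call(seen, key)) {
            seen[key] = true;
            result.push(lines[i]);
        }
    }
    return result;
}

// Usage: the second "b\n" is dropped, original order is kept.
var unique = dedupeLines(["b\n", "a\n", "b\n"]);   // ["b\n", "a\n"]
```

In the macro, the result would replace the nested-loop step, with `a.join( "" )` applied to the returned array.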


----------------
Yutaka Emura
Developer of EmEditor
http://www.emeditor.com/

user
Posted on: 9/29/2006 12:42 pm
Home away from home
Joined: 9/29/2006
From:
Posts: 212
Re: how to delete duplicate lines
thank you for your reply

my congrats and best wishes with the new forums :)
Hellados
Posted on: 4/18/2009 1:10 am
Just popping in
Joined: 4/18/2009
From:
Posts: 1
Re: how to delete duplicate lines
This is a good macro for small files, but it is very slow for me.
I have text files of 50-100 MB and need to remove duplicate lines (words) from more than 406,000 words, and this macro runs very slowly :(
My PC's performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
What can I do?
Yutaka
Posted on: 4/18/2009 1:26 pm
Webmaster
Joined: 9/28/2006
From: Redmond
Posts: 2398
Re: how to delete duplicate lines
Quote:

Hellados wrote:
This is a good macro for small files, but it is very slow for me.
I have text files of 50-100 MB and need to remove duplicate lines (words) from more than 406,000 words, and this macro runs very slowly :(
My PC's performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
What can I do?


I did some optimization. Please try this. This also shows the current status on the status bar.

function Pair( i, s )
{
	this.index = i;
	this.str = s;
}

nLines = document.GetLines();

// Create an empty array. push() below appends, so preallocating nLines
// slots here would leave nLines undefined entries in the array.
a = new Array();

status = "Reading lines...";

// Fill the array a with all lines (with returns) in the document.
for( i = 1; i <= nLines; i++ ) {
	if( (i % 1000) == 0 ){
		// i is already 1-based here, so report it as is
		status = "Reading lines: " + String(i) + "/" + String(nLines);
	}
	var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
	a.push( pair );
}

status = "Sorting lines..."

a.sort( function(a,b){
	if( a.str > b.str ){
		return 1;
	}
	if( a.str < b.str ){
		return -1;
	}
	return a.index - b.index;
});

// Delete duplicate elements.
for( i = 1; i < nLines; i++ ){
	if( (i % 10) == 0 ){
		status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
	}
	if( a[i].str == a[i-1].str ){
		a[i].index = 0;  // disable
	}
}

status = "Sorting lines again..."

a.sort( function(a,b){
	return a.index - b.index;
});

var str = "";
for( i = 0; i < nLines; i++ ){
	if( a[i].index != 0 ){
		if( (i % 1000) == 0 ){
			status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
		}
		str += a[i].str;
	}
}

// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = str;
status = "Duplicate lines deleted.";
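
For readers who want to try the idea outside EmEditor, the sort, mark, and re-sort steps of the macro above can be sketched on a plain array of strings. This is only an illustration under assumptions: `dedupeBySorting` is a hypothetical name, and it uses a `keep` flag where the macro sets `index` to 0 to disable a line.

```javascript
// Hypothetical helper (not an EmEditor function): removes repeated lines
// while keeping the first occurrence of each and the original order.
function dedupeBySorting(lines) {
    var pairs = [];
    for (var i = 0; i < lines.length; i++) {
        pairs.push({ index: i, str: lines[i], keep: true });
    }
    // Sort by text; ties fall back to the original position, so the
    // first occurrence of each line comes first within its group.
    pairs.sort(function (x, y) {
        if (x.str > y.str) { return 1; }
        if (x.str < y.str) { return -1; }
        return x.index - y.index;
    });
    // Identical lines are now adjacent: mark every repeat.
    for (var j = 1; j < pairs.length; j++) {
        if (pairs[j].str === pairs[j - 1].str) {
            pairs[j].keep = false;
        }
    }
    // Restore the original order and collect the surviving lines.
    pairs.sort(function (x, y) { return x.index - y.index; });
    var out = [];
    for (var k = 0; k < pairs.length; k++) {
        if (pairs[k].keep) {
            out.push(pairs[k].str);
        }
    }
    return out;
}

// Usage: the second "Badger\n" is dropped.
var cleaned = dedupeBySorting(["Badger\n", "Eagle\n", "Badger\n", "Donkey\n"]);
```

Sorting brings identical lines next to each other, so one pass over neighbours finds every repeat; that is what replaces the nested loops of the first macro.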


Salabim
Posted on: 1/3/2010 10:33 am
Quite a regular
Joined: 9/5/2009
From: Ghent (Belgium)
Posts: 58
Re: how to delete duplicate lines
Hi Yutaka,

regarding the last (faster) duplicate line macro you posted, is it possible to change the code so that the last line...
status = "Duplicate lines deleted."


... could actually show how many duplicate lines were deleted ?

Something like :
"117 duplicate lines deleted."
Yutaka
Posted on: 1/4/2010 9:51 pm
Webmaster
Joined: 9/28/2006
From: Redmond
Posts: 2398
Re: how to delete duplicate lines
Then how about this?


function Pair( i, s )
{
	this.index = i;
	this.str = s;
}

nLines = document.GetLines();

// Create an empty array. push() below appends, so preallocating nLines
// slots here would leave nLines undefined entries in the array.
a = new Array();

status = "Reading lines...";

// Fill the array a with all lines (with returns) in the document.
for( i = 1; i <= nLines; i++ ) {
	if( (i % 1000) == 0 ){
		// i is already 1-based here, so report it as is
		status = "Reading lines: " + String(i) + "/" + String(nLines);
	}
	var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
	a.push( pair );
}

status = "Sorting lines..."

a.sort( function(a,b){
	if( a.str > b.str ){
		return 1;
	}
	if( a.str < b.str ){
		return -1;
	}
	return a.index - b.index;
});

// Delete duplicate elements.
for( i = 1; i < nLines; i++ ){
	if( (i % 10) == 0 ){
		status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
	}
	if( a[i].str == a[i-1].str ){
		a[i].index = 0;  // disable
	}
}

status = "Sorting lines again..."

a.sort( function(a,b){
	return a.index - b.index;
});

var str = "";
n = 0;
for( i = 0; i < nLines; i++ ){
	if( a[i].index != 0 ){
		if( (i % 1000) == 0 ){
			status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
		}
		str += a[i].str;
	}
	else {
		n++;
	}
}

// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = str;
status = n + " duplicate lines deleted.";


Salabim
Posted on: 1/5/2010 11:07 pm
Quite a regular
Joined: 9/5/2009
From: Ghent (Belgium)
Posts: 58
Re: how to delete duplicate lines
Thanks a lot Yutaka ! :)
Monkeyman
Posted on: 6/5/2011 12:47 pm
Just popping in
Joined: 9/3/2009
From:
Posts: 3
Re: how to delete duplicate lines
Thank you for the good macro. Removing duplicate lines is a very nice and useful feature, which EmEditor badly lacks. I hope you'll add it in a future release.

As for the JS macro provided, it has one small "glitch": when a duplicate line is the last line of the document, the macro doesn't recognize it. For example:

Badger
Eagle
Simpsons
Donkey
Badger

There's no newline after the second "Badger", so it doesn't compare equal to the first one and won't be deleted.
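
One possible way around this, sketched under assumptions rather than taken from the macro: compare lines with their trailing line ending stripped, so "Badger" followed by a newline and a bare "Badger" count as the same line. The helper name `dedupeIgnoringLineEnd` is hypothetical and not part of the EmEditor API; it also returns how many lines were dropped.

```javascript
// Hypothetical helper (not an EmEditor function): treats lines as equal
// when they differ only in their trailing line ending.
function dedupeIgnoringLineEnd(lines) {
    var seen = {};
    var kept = [];
    var removed = 0;
    for (var i = 0; i < lines.length; i++) {
        // Compare on the text without its trailing CR LF, LF, or CR.
        // The "$" prefix avoids clashes with built-in property names.
        var key = "$" + lines[i].replace(/(\r\n|\n|\r)$/, "");
        if (Object.prototype.hasOwnProperty.call(seen, key)) {
            removed++;
        } else {
            seen[key] = true;
            kept.push(lines[i]);
        }
    }
    return { lines: kept, removed: removed };
}

// Usage with the example above: the final "Badger" has no newline.
var result = dedupeIgnoringLineEnd(
    ["Badger\n", "Eagle\n", "Simpsons\n", "Donkey\n", "Badger"]);
// result.lines holds the first four entries; result.removed is 1
```

Note that when the dropped line was the unterminated last line, the joined text ends with the newline of the line before it.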
raikrivera
Posted on: 7/28/2011 1:36 am
Just popping in
Joined: 7/28/2011
From: USA
Posts: 1
Re: how to delete duplicate lines
Thank you so much, guys. This is exactly what I was searching for.