Yutaka
Posted on: 4/18/2009 1:26 pm
Webmaster
Joined: 9/28/2006
From: Redmond
Posts: 2401
Re: how to delete duplicate lines
Quote:

Hellados wrote:
This is a good macro for small files, but it is very slow for me.
I have 50-100 MB txt files, and I need to remove duplicate lines (words), more than 406,000 words, and this macro runs very slowly :(
My PC's performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
What can I do?


I did some optimization. Please try this. This also shows the current status on the status bar.

function Pair( i, s )
{
	this.index = i;
	this.str = s;
}

var nLines = document.GetLines();

// Start with an empty array; push() below appends one entry per line.
var a = new Array();

status = "Reading lines...";

// Fill the array a with all lines (with returns) in the document.
for( var i = 1; i <= nLines; i++ ) {
	if( (i % 1000) == 0 ){
		// i is already the 1-based line number here.
		status = "Reading lines: " + String(i) + "/" + String(nLines);
	}
	var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
	a.push( pair );
}

status = "Sorting lines...";

a.sort( function(a,b){
	if( a.str > b.str ){
		return 1;
	}
	if( a.str < b.str ){
		return -1;
	}
	return a.index - b.index;
});

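// Mark duplicate elements. After the sort, equal lines are adjacent and
// the earliest occurrence comes first, so every later copy is disabled.
for( i = 1; i < nLines; i++ ){
	if( (i % 1000) == 0 ){
		status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
	}
	if( a[i].str == a[i-1].str ){
		a[i].index = 0;  // disable
	}
}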

status = "Sorting lines again...";

a.sort( function(a,b){
	return a.index - b.index;
});

var str = "";
for( i = 0; i < nLines; i++ ){
	if( a[i].index != 0 ){
		if( (i % 1000) == 0 ){
			status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
		}
		str += a[i].str;
	}
}

// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = str;
status = "Duplicate lines deleted.";
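
The macro above boils down to three steps: sort by text (ties broken by original line number), disable every line that equals its predecessor, then sort back by line number. As a rough illustration only, here is the same idea on a plain array of strings, so it can be tried outside EmEditor. The function name dedupeKeepFirst is hypothetical and not part of the macro or of the EmEditor API, and this sketch collects the surviving lines in an array instead of growing one string.

```javascript
// Hypothetical helper (not part of EmEditor): removes duplicate lines,
// keeping the first occurrence of each and preserving the original order.
function dedupeKeepFirst(lines) {
  // Pair each line with its 1-based position, as the macro does.
  var pairs = [];
  for (var i = 0; i < lines.length; i++) {
    pairs.push({ index: i + 1, str: lines[i] });
  }

  // Sort by text, then by original position, so equal lines become
  // adjacent and the earliest occurrence comes first in each run.
  pairs.sort(function (a, b) {
    if (a.str > b.str) return 1;
    if (a.str < b.str) return -1;
    return a.index - b.index;
  });

  // Disable every line that equals its predecessor.
  for (var j = 1; j < pairs.length; j++) {
    if (pairs[j].str === pairs[j - 1].str) {
      pairs[j].index = 0; // 0 means "disabled"
    }
  }

  // Restore the original order; disabled entries sort to the front.
  pairs.sort(function (a, b) {
    return a.index - b.index;
  });

  // Collect the surviving lines.
  var kept = [];
  for (var k = 0; k < pairs.length; k++) {
    if (pairs[k].index !== 0) {
      kept.push(pairs[k].str);
    }
  }
  return kept;
}

var result = dedupeKeepFirst(["b\n", "a\n", "b\n", "c\n", "a\n"]);
console.log(JSON.stringify(result)); // ["b\n","a\n","c\n"]
```

Note that lines are compared together with their line endings, as in the macro, so a final line without a trailing return is not treated as a duplicate of the same text followed by a return.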


----------------
Yutaka Emura
Developer of EmEditor
http://www.emeditor.com/
