Viewing 11 posts - 1 through 11 (of 11 total)
  • #18566
    bbacher
    Participant

    I have a 580 MB text file, which represents about 130k lines of column-specific data – but all the data is on one line with no carriage returns. Sometimes I need to verify the format of the file, and the easiest way to do that is to break it down into 130k lines, and make sure all the columns line up properly. Currently, I’m using UltraEdit to do that, because it has a function to insert a character string after a specified number of bytes. I use that and insert a carriage return at 4,689 byte increments.

    Does EmEditor have any similar functionality? I prefer EmEditor, but I’m stuck still using UltraEdit because I don’t know how to do that in EmEditor.

    Thanks very much for any advice.

    #18567
    Yutaka Emura
    Keymaster

    Hello,

    For this purpose, a macro is the best way to go. You can record a macro (Right 4689 times, and press Return), and then “Run with Temporary Options” with repeat count 130K or more.

    #18568
    bbacher
    Participant

    Thanks, I’ll try that.

    #18576
    Stefan
    Participant

    RegEx could be a little bit slow (by its nature), but you may try this too:

    Search & Replace
    Find: (.{4689}\s)
    Replace: \1\n

     
    Explanation:
    The dot matches any single character (letter, digit, symbol), and {4689} repeats it exactly 4689 times.
    The \s requires a whitespace character right after those 4689 characters, so we break at whitespace only.
    (For comparison, {4689,} or the general {m,n} form would mean "at least 4689", matching further until a whitespace is found; that is not needed here.)
    It works fine.

    Then we replace the match with the contents of the (…) backreference group and append a line break. (\n is enough, even on DOS/Windows; see Help.)
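To see what the pattern does on a small scale, here is a minimal sketch in plain JavaScript (run outside EmEditor, e.g. in Node.js). The function name `breakEvery` and its `requireWhitespace` flag are made up for this illustration. JavaScript writes the backreference as `$1` where EmEditor uses `\1`. Note that with the `\s` variant nothing matches unless a whitespace character sits directly after the counted characters, which may be one reason a search finds nothing on purely fixed-width data.

```javascript
// Hypothetical helper: insert "\n" after every `width` characters.
// With requireWhitespace = true it mimics (.{N}\s): a break is only made
// where a whitespace character directly follows the N characters.
function breakEvery(text, width, requireWhitespace) {
  const pattern = requireWhitespace ? `(.{${width}}\\s)` : `(.{${width}})`;
  return text.replace(new RegExp(pattern, "g"), "$1\n");
}

// Fixed-width data without any whitespace, 4 characters per record.
const packed = "AAAABBBBCCCC";
console.log(JSON.stringify(breakEvery(packed, 4, false)));
console.log(JSON.stringify(breakEvery(packed, 4, true)));
```

The first call breaks the text after every 4 characters; the second leaves it unchanged because no whitespace follows any run of 4 characters.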

     
    HTH?

    #18580
    bbacher
    Participant

    Thanks, I’ll try that also. Sometime next week I’ll do some performance tests and post back.

    #18581
    Stefan
    Participant

    To create a macro, you could do this:

    – go to top of file
    – go to start of line
    – Macros > Start
    – press right arrow key one time
    – press Enter key
    – Macro > Stop
    – Macro > Edit (choose JavaScript)
    – change ‘1’ to ‘4689’ for CharRight
    – save the file (to a temporary file, or to your EmEditor folder for later reuse)
    – Macro > Select This
    – go back to your document
    – undo
    – execute macro with temporary options

    – – –
    To make the macro run much quicker, add this at the top of the macro:
    Redraw = false;

     
     

    #18582
    Stefan
    Participant

    AH, and I forgot the official way:

    – Tools > Properties for current Configuration
    – General > Wrap by Chars : 4689
    OK

     
    To make this permanent by inserting real line breaks:
    – Select All
    – Edit > Convert Selection > Split Lines

     
     
    If needed often, this could also be stored as a macro and added as a menu button or run via a shortcut key.

    #18583
    bbacher
    Participant

    That’s brilliant! Thank you

    #18587
    bbacher
    Participant

    The macro method is very, very slow. It ran for about five minutes with no visible results – then I closed the program.
    The RegEx Search/Replace searched to the end of the document and did nothing (did not find anything).

    #18588
    bbacher
    Participant

    I don’t MIND using UltraEdit for this process; I can continue doing that. The whole process runs in about 13 seconds, and produces 134,715 lines on today’s file.
    I was just hoping EmEditor had a similar function that I hadn’t found.

    #18589
    Stefan
    Participant

    Interesting.
    The ways shown above should do it. Well, at least with smaller file sizes.

    Unfortunately I can’t really test on my 32-bit system… I have too many things running and not enough free memory for testing.

     
    For a test I created a 581 MB text file with 130,000 x 4689 characters.
    Opening that in 32-bit EmEditor 14.4.0b2 shows this dialog:

    ---------------------------
    EmEditor
    ---------------------------
    This file contains a very long line. 
    Very long lines will be split into several lines while the document is open, 
    but will be recombined when saved. Highlighting very long lines is disabled, 
    and some other features such as find and replace text containing CR or LF 
    might not work correctly for very long lines.
    
    c:\temp\jsout.txt
    ---------------------------
    OK   Cancel   
    ---------------------------
    

     

    and I ended up with 5 lines:
    4x 134217729 chars
    1x 072699089 chars
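A quick arithmetic check of those numbers, as a sketch in plain JavaScript. The five line lengths are consistent with a split at 2^27 = 134,217,728 characters, with each reported count one higher than this arithmetic gives. That split limit is only an inference from the counts above; the dialog does not state it.

```javascript
// Figures taken from the test file described above.
const recordWidth = 4689;
const recordCount = 130000;
const totalChars = recordWidth * recordCount;

// Assumption: long lines are cut at 2^27 characters (inferred, not documented here).
const assumedLimit = 2 ** 27;
const fullLines = Math.floor(totalChars / assumedLimit);
const remainder = totalChars % assumedLimit;

console.log(totalChars);                       // 609570000
console.log((totalChars / 2 ** 20).toFixed(1)); // 581.3 (MB, binary)
console.log(fullLines, remainder);             // 4 72699088
```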

     

    “Wrap by Char” was disabled.
     
    RegEx search & replace allocated 1.2 GB, and I canceled after 3 minutes.

     

    So I just quickly split the lines with SED.exe
    c:\temp>sed -e "s/.\{4689\}/&\n/g" jsout.txt > jsoutSplitted.txt

    Unfortunately I also ran out of memory here, but with a smaller file this works well.
    So on a 64-bit system that should be no problem. I will try tomorrow, just out of curiosity.
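Since both the regex and sed attempts ran out of memory on one huge line, a streaming approach may be worth a try: read the file in fixed-size blocks and write each record followed by a line break, so that only one block is in memory at a time. Below is a minimal Node.js sketch of that idea. The function name `splitFixedWidthFile` and the `recordsPerRead` parameter are made up for this example, and it counts bytes rather than characters, so it assumes a single-byte encoding. It has not been timed against the 580 MB file.

```javascript
const fs = require("fs");

// Hypothetical helper: copy inPath to outPath, writing "\n" after every
// `width` bytes. Returns the number of records written.
function splitFixedWidthFile(inPath, outPath, width, recordsPerRead = 1024) {
  const inFd = fs.openSync(inPath, "r");
  const outFd = fs.openSync(outPath, "w");
  const buf = Buffer.alloc(width * recordsPerRead); // holds whole records only
  const newline = Buffer.from("\n");
  let records = 0;
  try {
    for (;;) {
      // Fill the buffer as far as possible; a single read may return fewer bytes.
      let filled = 0;
      while (filled < buf.length) {
        const n = fs.readSync(inFd, buf, filled, buf.length - filled, null);
        if (n === 0) break; // end of file
        filled += n;
      }
      if (filled === 0) break;
      // Emit each record followed by a line break (the last one may be short).
      const parts = [];
      for (let off = 0; off < filled; off += width) {
        parts.push(buf.subarray(off, Math.min(off + width, filled)), newline);
        records++;
      }
      fs.writeSync(outFd, Buffer.concat(parts));
      if (filled < buf.length) break; // buffer not full: end of file reached
    }
  } finally {
    fs.closeSync(inFd);
    fs.closeSync(outFd);
  }
  return records;
}
```

For the file discussed here, a call might look like `splitFixedWidthFile("C:\\temp\\jsout.txt", "C:\\temp\\jsoutSplitted.txt", 4689)`; the output path is just the name used in the sed command above.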

     

     

    Code to create a test file (due to little memory I had to do an additional step as a workaround)

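    // Build one 4689-character record.
    var line = "";
    for (var b = 1; b <= 4689; b++) {
      line += "X";
    }
    // Build a block of 13,000 records (workaround for limited memory).
    var block = "";
    for (var l = 1; l <= 13000; l++) {
      block += line;
    }
    // Write the block 10 times: 130,000 records in total, with no line breaks.
    var fso   = new ActiveXObject("Scripting.FileSystemObject");
    var oFile = fso.OpenTextFile("C:\\temp\\jsout.txt", 2, true); // 2 = ForWriting, true = create
    for (var f = 1; f <= 10; f++) {
      oFile.Write(block);
    }
    oFile.Close();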
    