Viewing 9 posts - 1 through 9 (of 9 total)
  • #17804
    no1
    Participant

    Hi all,

    Is there any way to split data into columns by byte count?
    Bytes, not characters: a non-English character may take more than one byte.
    My source data is laid out in fixed byte widths, so I have to split it this way.

    The width of each column, in bytes:
    8,50,30,30,30,50,50,50,50,30,15,8,1,6,25,25,55,55,55,55,12,30,11,3,30,24,55,55,55,55,12,30,11,3,30,55,55,55,55,3,30,30,10,3,1,1,1,30,1,10,4,1,6,3,8,10,6,30,10,6,6,1,1,1,1,1,1,1,1,30,30,30,1,8,8,8,8,8,8,8,8,8,8,8,8,8,1,10,3,30,4,5,7,7,4,3,30,5,19,19,19,19,20,7,8,9,19,19,8,10,30,50,5,35,3,30,3,30,4,30,4,30,25,4,50,3,30,3,30,30,40,30,40,30,40,30,40,30,40,30,40,6,8,50,50,50,5,25,6,6,70,20,1,5,5,3,10,30,25,5,30,8,3,3,8,8,3,5,3,8,8,6,8,2,2,8,12,3,2,8,1,5,5,1

    I want to turn it into TSV or CSV (fields separated by tabs or commas).

    Or is there another tool that can do this?

    TIA!

    #17812
    Yutaka Emura
    Keymaster

    Hello,

    How about opening a file as Binary (ASCII View)?

    #17817
    no1
    Participant

    But I need a macro or something…
    There are about 200 columns. And the file is large.

    #17837
    Stefan
    Participant

    I do not understand what you are after, no1.

     
    Can you explain in more detail? Ideally with before/after examples and the rules to apply.
     
    BEFORE:
    xxxxxx
    AFTER:
    xx; xx; xx; 
     

    Stefan

    #17843
    no1
    Participant

    I’ll explain more with some pictures later.

    For now, please read this:
    http://www.ultraedit.com/forums/viewtopic.php?f=2&t=14711

    Thank you all!

    #17849
    no1
    Participant

    Let me give an example with some pictures:

    The bytes are contiguous in the file. To make the picture clearer, I broke the stream at each 0D0A.

    The data fields have fixed lengths in bytes. The red solid lines mark where I want to insert the delimiter bytes.

    Ideally, I would insert the delimiter bytes directly into the byte stream, which is exactly what I want, but I don’t know of a way to do that. So for now I have to open the files in a text editor and process the text character by character.

    Although each field has a fixed length in bytes, the number of characters in it can vary. (I highlighted some corresponding bytes and characters in different colors.)

    UltraEdit’s Convert to Character Delimited command can only count characters. (And I don’t know whether regular expressions can count bytes.) So I insert ! after each multi-byte character, one for each extra byte.
    Now the number of characters equals the original number of bytes, so UltraEdit’s Convert to Character Delimited command inserts the delimiters at the right positions.
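
    The padding trick described above can be sketched in a few lines. This is only an illustration, assuming Node.js, UTF-8 data, and a tool that counts one position per code point; `padToByteCount` is a hypothetical helper name, not an UltraEdit or EmEditor function.

```javascript
// Sketch only: padToByteCount is a hypothetical helper illustrating the "!" trick.
// Assumes Node.js (for Buffer) and UTF-8 encoded source data.

function padToByteCount(text, filler = "!") {
  let out = "";
  // for...of walks code points, so surrogate pairs stay together.
  for (const ch of text) {
    const bytes = Buffer.byteLength(ch, "utf8");
    // A character of N bytes gets N - 1 fillers, so it fills N character positions.
    out += ch + filler.repeat(bytes - 1);
  }
  return out;
}

const original = "エトセトラ[10 bytes]";
const padded = padToByteCount(original);
console.log(padded); // each 3-byte kana is followed by two fillers
```

    The fillers have to be removed again after the conversion, which presumably is only safe when the filler character never occurs in the real data.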

    Any better solutions/tools are welcome.

    The example text and its Hex(UTF-8):
    (Column Width: 8,30,15,10,13 bytes)

    Field 1 Field 2 ăĕĭŏŭ âêîôû Field3(15bytes)Field 4   Field 5, etc.
    123     Any unicode string without tab诸如此类             Field 5, etc.
    12345678[---This field is 30 bytes---]エトセトラ[10 bytes]Field 5, etc.
    
    4669656C642031204669656C64203220C483C495C4ADC58FC5AD20C3A2C3AAC3AEC3B4C3BB204669656C64332831356279746573294669656C6420342020204669656C6420352C206574632E0D0A3132332020202020416E7920756E69636F646520737472696E6720776974686F757420746162E8AFB8E5A682E6ADA4E7B1BB202020202020202020202020204669656C6420352C206574632E0D0A31323334353637385B2D2D2D54686973206669656C642069732033302062797465732D2D2D5DE382A8E38388E382BBE38388E383A95B31302062797465735D4669656C6420352C206574632E0D0A
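
    For reference, the first record of this example can be cut at the stated widths (8,30,15,10,13) without going through characters at all. A minimal sketch, assuming Node.js and UTF-8 data; `splitByBytes` is a hypothetical helper, not part of any editor.

```javascript
// Sketch only: splitByBytes is a hypothetical helper, not an EmEditor or UltraEdit API.
// Assumes Node.js (for Buffer) and UTF-8 data, as in the example above.

// Cut one record into fields at fixed byte widths and decode each field.
function splitByBytes(record, widths) {
  const fields = [];
  let offset = 0;
  for (const width of widths) {
    // subarray() counts bytes, so multi-byte characters do not shift the cut.
    fields.push(record.subarray(offset, offset + width).toString("utf8"));
    offset += width;
  }
  return fields;
}

// First record of the example, copied from the hex dump (without the trailing 0D0A).
const hex =
  "4669656C642031204669656C64203220C483C495C4ADC58FC5AD20C3A2C3AAC3AEC3B4C3BB20" +
  "4669656C64332831356279746573294669656C6420342020204669656C6420352C206574632E";
const record = Buffer.from(hex, "hex");
const fields = splitByBytes(record, [8, 30, 15, 10, 13]);

// Join with tabs to get one TSV line.
console.log(fields.join("\t"));
```

    If a width ever cut through the middle of a multi-byte character, the decoded field would contain replacement characters, so the widths must match the real layout.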

    By the way, about the picture above, in case anyone is interested:
    All the colored highlighting and lines in the text were done within EmEditor, not with an image editor.

    Thank you, Yutaka, for the new features.
    The User-Defined Guides are useful.

    #17853
    no1
    Participant

    Drag and drop the picture to see it at its original size.

    #17854
    Yutaka Emura
    Keymaster

    Hi no1,

    After you open the file as Binary (ASCII View), you can record a macro. For example, with the cursor at the top-left corner, click the “Record All Except Mouse/Keyboard Activities” button on the toolbar and record your keystrokes like this:

    Press RIGHT 8 times, COMMA (,), RIGHT 30 times, COMMA (,), RIGHT 15 times, COMMA (,), RIGHT 10 times, COMMA (,), RIGHT 13 times, … , HOME, DOWN.

    Then stop recording the macro. This will record a macro like this (in JavaScript):

    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.Text=",";
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.Text=",";
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.Text=",";
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.Text=",";
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.CharRight(false,1);
    document.selection.Text=",";
    document.selection.StartOfLine(false,eeLineView | eeLineHomeText);
    document.selection.LineDown(false,1);
    

    Then you can choose Run with Temporary Options on the Macros menu and enter the number of lines as the “Repeat Count”. When that is done, save the file, reopen it, and switch to CSV mode. I hope this helps.
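
    With about 200 columns, the recorded macro could probably be shortened by hand into a loop over the column widths. The following is only a sketch: `insertDelimiters` is a hypothetical helper, and it assumes that the second argument of CharRight is a repeat count, as the recorded value 1 suggests. Like the recording, it types a delimiter after every width it is given.

```javascript
// Sketch only: insertDelimiters is a hypothetical helper, not part of EmEditor.
// It uses the same two calls as the recorded macro (CharRight and Text).

function insertDelimiters(selection, widths, delimiter) {
  for (var i = 0; i < widths.length; i++) {
    // Assumption: the second argument of CharRight is a repeat count.
    selection.CharRight(false, widths[i]);
    // Type the delimiter at the cursor, as the recorded macro does.
    selection.Text = delimiter;
  }
}

// Inside EmEditor, `document` is the active document; elsewhere this part is skipped.
if (typeof document !== "undefined" && document.selection) {
  insertDelimiters(document.selection, [8, 30, 15, 10, 13], ",");
  document.selection.StartOfLine(false, eeLineView | eeLineHomeText);
  document.selection.LineDown(false, 1);
}
```

    The macro still handles one line per run, so it would be repeated with Run with Temporary Options in the same way.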

    #17872
    no1
    Participant

    Thank you, Yutaka.

    I have many large files (thousands of lines or more per file, about 200 columns per line, with varying column widths). So I’m afraid recorded keystroke macros might not fit my case.

    I wrote a script macro with the help of others. Could you (or anyone) take a look and see whether anything in it should be changed, and whether the regular expressions could be optimized? (From the few I tested, shorter expressions seem to run faster.)
    The files are large, so efficiency and stability matter.

    The only remaining problem, but a fatal one, is this:
    The regular expression ^ also matches at the position right after this character:
    U+0085 <control> : NEXT LINE
    How can I resolve this?

    if(document.Encoding != eeEncodingBinary) {
    	alert("TSV ASCII\n\nUse this macro in Binary (ASCII View) mode only!");
    	Quit();
    };
    
    var str = "8,50,30,30,30,50,50,50,50,30,15,8,1,6,25,25,55,55,55,55,12,30,11,3,30,24,55,55,55,55,12,30,11,3,30,55,55,55,55,3,30,30,10,3,1,1,1,30,1,10,4,1,6,3,8,10,6,30,10,6,6,1,1,1,1,1,1,1,1,30,30,30,1,8,8,8,8,8,8,8,8,8,8,8,8,8,1,10,3,30,4,5,7,7,4,3,30,5,19,19,19,19,20,7,8,9,19,19,8,10,30,50,5,35,3,30,3,30,4,30,4,30,25,4,50,3,30,3,30,30,40,30,40,30,40,30,40,30,40,30,40,6,8,50,50,50,5,25,6,6,70,20,1,5,5,3,10,30,25,5,30,8,3,3,8,8,3,5,3,8,8,6,8,2,2,8,12,3,2,8,1,5,5,1";
    str = prompt("[TSV ASCII]    Enter width of each column:    (e.g. 10,5,1,7)", str);
    var arr = str.split(",");
    
    Redraw = false;
    
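    var n = 0;	// offset of the next tab, counted in the already-modified line
    for(var i = 0; i < arr.length - 1; i++) {
    	n = n + parseInt(arr[i], 10);
    	// Capture the first n characters and put them back, followed by a tab;
    	// replacing the match with "\t" alone would delete the matched data.
    	document.selection.Replace("^(.{" + n + "})", "\\1\\t", eeReplaceAll | eeFindReplaceRegExp);
    	n = n + 1;	// account for the tab just inserted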
    
    	//Monitor/Break:
    	document.HighlightFind = false;
    	Redraw = true;
    	if(!confirm("Monitor/Break:\n\n" + i + " " + arr[i] + " " + n + " done.\n\nContinue?")) Quit();
    	Redraw = false;
    };
    
    document.selection.Replace(" +\t", "\t", eeReplaceAll | eeFindReplaceRegExp);
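
    On the U+0085 question: one possible way around it, outside the regex engine, is to find the record boundaries in the raw bytes, so that a 0x85 byte is never treated as a line break. A minimal sketch, assuming Node.js and that every record ends with the two bytes 0D 0A; `splitRecords` is a hypothetical helper, not an EmEditor function.

```javascript
// Sketch only: splitRecords is a hypothetical helper, assuming Node.js and
// records that end with the two bytes 0D 0A.

function splitRecords(data) {
  const CRLF = Buffer.from([0x0d, 0x0a]);
  const records = [];
  let start = 0;
  let end;
  // Only the exact byte pair 0D 0A ends a record; a lone 0x85 byte is ordinary data.
  while ((end = data.indexOf(CRLF, start)) !== -1) {
    records.push(data.subarray(start, end));
    start = end + CRLF.length;
  }
  // Keep a final record that has no trailing 0D 0A.
  if (start < data.length) records.push(data.subarray(start));
  return records;
}

// U+00C5 is C3 85 in UTF-8, so the first record contains a 0x85 byte.
const sample = Buffer.from("\u00C51\r\nb2\r\n", "utf8");
const records = splitRecords(sample);
console.log(records.map((r) => r.toString("hex")));
```

    Each record obtained this way can then be cut at the fixed byte widths and written out with tabs between the fields.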
    