Viewing 8 posts - 1 through 8 (of 8 total)
  • #23651
    John Cordes
    Participant

    In the course of my genetic genealogy (DNA) research I have some CSV files, usually with thousands of rows and occasionally with many thousands of columns. Recently I was startled to discover what I felt to be quite strange behaviour when opening these files in EmEditor Pro v18.0.5 (since updated to 18.0.6 with no change in behaviour). This is on a Windows 10 Pro, 64 bit machine. The surprise is that when the file was initially opened, in CSV mode, only about 2,000 rows appeared. I knew there should be more data (later confirmed by opening the file in vim, and also by importing into Excel). When I turned off CSV mode, going to Normal Mode, the full content appeared, with over 10,000 rows. I have managed to reduce the original file to a test case with Normal Mode showing 109 lines, while CSV mode shows only 17 lines.

    The point which really troubles me is that at first I had no idea that by viewing in csv view mode I was not seeing the whole file. In fact when this first happened, I went back to the source and ran a separate process to download what I thought was the “missing data”; I only discovered later that the extra lines really were in the original file after all.

    One of my main reasons for getting EmEditor Pro (about a year or so ago) was for viewing large CSV files. It has been extremely convenient for that. But now I am a bit concerned. I will say that when in CSV mode some warnings appeared in the status area, such as “Inconsistent number of columns detected.” That is hardly surprising for a program (the source of the csv files) which emits thousands of columns. I’ve had a lot of experience (but not recently) with ‘fixing’ csv files when I used to have to convert them to HTML tables and use jQuery to make the tables sortable; it was necessary (unlike for most browsers when displaying non-sorting tables) that the rows all have exactly the same number of columns. But I did not expect this warning to mean that some data would not get displayed at all.

    Is there any way to have a warning appear in EE that not all data has been shown, while in CSV mode?

    Thanks,
    John Cordes

    #23652
    Yutaka Emura
    Keymaster

    Hello John,

    I apologize for the inconvenience. I received a similar report by email. I believe the number of visible lines drops when EmEditor encounters what it treats as embedded newlines, and in that case the Output bar should show “Invalid double quotes”.
    Older versions would continue normally when this happened, but v18.0.5 and v18.0.6 treat every newline after the offending line as embedded. The next version should fix this issue.

    If the source file does NOT contain any embedded newlines, you can clear the “Allow newlines in double quotes” check box in the CSV page of the Customize dialog box. This will fix the issue, and also will make EmEditor faster.

    I will release a minor update soon. If you want to make sure the new version works for you, I can either send you a test version to try, or you can send me a sample file so that I can test it myself. Please let me know which you would prefer. Again, I am sorry for the inconvenience.

    Thank you for reporting the issue.

    #23653
    John Cordes
    Participant

    Hello Yutaka,
    Thank you for the prompt response. I am glad to hear that the issue is understood and will soon be corrected. I would send you my reduced file, but it still contains data involving other people which should probably be kept private. Even though I have pared it down a lot from the original, it is not easy to work with (mostly because of the large number of columns still present), so is difficult to ‘clean’. I would be glad to try out a new test version, if you would care to send it to me when ready. I am not sure if the source file contains embedded newlines but will look into that later.

    #23654
    John Cordes
    Participant

    Brief follow-up: I have just tried your suggestion of clearing the “Allow newlines in double quotes” check box in the CSV page of the Customize dialog box. That did indeed make my test file display properly. I believe it is working properly for my original source file also but should check some details to be certain. It also made the display in csv mode *much* faster. Thank you for this suggestion. I’m not sure that I understand just what is happening though — have not been able to spend much time on this today.
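
For readers wondering the same thing, here is a minimal sketch of the general mechanism, using Python's standard `csv` module rather than EmEditor itself. The sample data and both helper names are hypothetical, and EmEditor's own parser may well behave differently in detail. When newlines are allowed inside double quotes, a single stray opening quote makes the parser treat everything after it, line breaks included, as part of one field, so the visible row count drops.

```python
import csv
import io

# Hypothetical sample: the row starting with "2" has a stray opening
# double quote that is never closed.
SAMPLE = 'id,name,note\n1,Ann,ok\n2,"Bob,unterminated\n3,Cy,ok\n4,Di,ok\n'


def parse_multiline(text):
    """Parse the whole text at once, so a quoted field may span newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


def parse_line_by_line(text):
    """Parse each physical line on its own, so a quote cannot cross a newline."""
    rows = []
    for line in text.splitlines():
        rows.extend(csv.reader([line]))
    return rows


if __name__ == "__main__":
    print(len(parse_multiline(SAMPLE)), "records when quotes may span newlines")
    print(len(parse_line_by_line(SAMPLE)), "records when each line stands alone")
```

In the first mode the last three physical lines collapse into a single record; in the second every physical line stays a separate row, which is roughly analogous to what clearing the check box does.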

    #23659
    Yutaka Emura
    Keymaster

    Hello John,

    From your follow-up description, the next version, v18.0.7, should work for you. We are already releasing v18.0.7, and you can update now.
    Thanks for the detailed report.

    #23664
    John Cordes
    Participant

    Hello Yutaka,
    Thank you for the quick update. I have installed 18.0.7 and tried it on a couple of the CSV files I described.
    I find myself somewhat puzzled by the results.
    First, when I load the smaller test file I mentioned, I do see 109 lines in both modes (csv and normal), no matter whether the “Allow newlines in double quotes” box is cleared or not.
    However, when I open the complete file (before I began deleting lines to try to get to a simple test file situation), things are more complicated.
    With the ‘newlines’ box cleared, I find the same results in both csv and normal mode: 10,284 lines. There are just 2 lines in the Output window (in csv mode):
    (2036): Invalid double quotes.
    (2036): Inconsistent number of columns detected.

    With the ‘newlines’ box checked, I see 10,284 lines in normal mode (expected result I think) but only 7,883 lines in csv mode. In csv mode for this case the Output window contains these lines:
    (2036): Invalid double quotes.
    (2036): Inconsistent number of columns detected.
    (2082): Invalid double quotes.
    (2085): Invalid double quotes.
    (2160): Invalid double quotes.
    (2170): Invalid double quotes.
    (2303): Invalid double quotes.
    (2765): Invalid double quotes.
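
One rough way to find candidate lines like those reported above, outside the editor, is to list every physical line that contains an odd number of double quotes. This is only a heuristic sketch in Python, not EmEditor's actual check; the function name and sample lines are hypothetical. A line that legitimately opens or closes a multi-line quoted field will also show up, so the results are places to inspect, not proven errors.

```python
def lines_with_odd_quotes(lines):
    """Return 1-based numbers of lines holding an odd count of double quotes."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count('"') % 2 == 1
    ]


if __name__ == "__main__":
    # Hypothetical sample lines; doubled quotes ("") inside a quoted
    # field keep the count even, so the last line is not flagged.
    sample = [
        'a,b',
        '1,"x"',
        '2,"y',
        '3,z"',
        '4,"he said ""hi"""',
    ]
    print(lines_with_odd_quotes(sample))
```

For a real file, the same function can be fed `open(path, encoding="utf-8", newline="")`, where `path` and the encoding are assumptions to adjust for the file at hand.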

    My reduced test file is 203 KB in size, the original file is 2.88 MB. I would be prepared to send either file to you if you would like to check them out yourself.

    #23665
    Yutaka Emura
    Keymaster

    I believe this is due to embedded newlines.
    You can email me a zipped file at tech @ emurasoft.com so that I can look into it for you.
    If you need to send a very large file, I will send you instructions for uploading it to cloud storage. So please email me.

    Thanks,

    #23666
    John Cordes
    Participant

    Thanks – I have submitted a zipped file. I presume it does contain embedded newlines but I am unclear just what to do about that.
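
As a possible first step in deciding what to do, it can help to see exactly which records a multi-line-aware parser considers to span several physical lines. The sketch below uses Python's `csv.reader` and its `line_num` attribute; the function name and sample text are hypothetical, and a different parser (EmEditor included) may group lines differently. A short span is probably a genuine embedded newline, while a very long one suggests a stray quote.

```python
import csv
import io


def multiline_records(text):
    """Return (first_line, last_line) for each record spanning several lines."""
    reader = csv.reader(io.StringIO(text, newline=""))
    spans = []
    previous_end = 0
    for _row in reader:
        # reader.line_num is the number of physical lines consumed so far.
        if reader.line_num - previous_end > 1:
            spans.append((previous_end + 1, reader.line_num))
        previous_end = reader.line_num
    return spans


if __name__ == "__main__":
    # Hypothetical sample: the second record holds a quoted field that
    # contains a real line break.
    sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'
    print(multiline_records(sample))
```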
