Viewing 3 posts - 1 through 3 (of 3 total)

    I’ve recently had a need to work with some enormous XML files, ranging from just over 1GB to well over 100GB in size. We process them on our servers by reading them a line at a time, storing some data in a staging database, and then extracting the data that we want via SQL and inserting it into our production database.
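    To give a rough idea of that kind of pipeline, here is a minimal sketch rather than the actual server code. It assumes Python with SQLite as the staging store, and the one-record-per-line `<item id="...">` layout, the table names, and the `stage_lines` and `promote` helpers are all hypothetical, made up purely for illustration.

```python
import io
import re
import sqlite3

# Hypothetical record layout: one <item id="..."> element per line.
ITEM_RE = re.compile(r'<item id="(\d+)">([^<]*)</item>')

def stage_lines(lines, conn):
    """Read the input one line at a time and stage matching records."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (id INTEGER, value TEXT)")
    for line in lines:  # a file object yields one line per iteration
        match = ITEM_RE.search(line)
        if match:
            conn.execute(
                "INSERT INTO staging VALUES (?, ?)",
                (int(match.group(1)), match.group(2)),
            )
    conn.commit()

def promote(conn):
    """Copy only the wanted rows from staging into the production table."""
    conn.execute("CREATE TABLE IF NOT EXISTS production (id INTEGER, value TEXT)")
    conn.execute(
        "INSERT INTO production SELECT id, value FROM staging WHERE value <> ''"
    )
    conn.commit()

# Small in-memory stand-in for a very large file.
sample = io.StringIO(
    '<items>\n<item id="1">alpha</item>\n<item id="2"></item>\n</items>\n'
)
conn = sqlite3.connect(":memory:")
stage_lines(sample, conn)
promote(conn)
rows = conn.execute("SELECT id, value FROM production ORDER BY id").fetchall()
```

    Only one line is held in memory at a time, which is why this approach copes with files of any size, but it also only works when each record happens to sit on a single line.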

    Reading a line at a time is all very well and good, but for development it’s really useful to be able to browse the file in an application on the desktop. EmEditor’s large-file-controller is excellent and has helped me no end (though I’ve noticed that in v12 the LFC doesn’t appear to be as necessary as in v11; presumably the temporary file support is new in v12?). There’s one thing missing that would be REALLY useful for me, though: more specific support for XML files (and similar well-formed [SGML-based] languages in general), such as:
    * Being able to automatically format the data in a more readable way (indenting, line breaking, etc.; basically reformatting)
    * Being able to support XPath queries to find specific data, or to get an XPath from a selected tag
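    To illustrate the second point, here is a small sketch of how a path could be derived for a tag while streaming through a file, so the whole document never has to be loaded. It assumes Python’s standard `xml.etree.ElementTree.iterparse` and has nothing to do with EmEditor’s plugin API; the `find_paths` helper and the sample data are hypothetical, and the path it builds is a simple slash-separated list of tag names, not a full XPath expression.

```python
import io
import xml.etree.ElementTree as ET

def find_paths(source, tag):
    """Yield (path, text) for each element named `tag`, streaming the input."""
    stack = []  # tag names from the root down to the current element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if elem.tag == tag:
                yield "/" + "/".join(stack), (elem.text or "").strip()
            stack.pop()
            # Free the text and children of the finished element; the
            # emptied element itself stays attached to its parent.
            elem.clear()

# Small in-memory stand-in for a very large file.
sample = io.BytesIO(b"<root><a><name>x</name></a><b><name> y </name></b></root>")
results = list(find_paths(sample, "name"))
```

    Because each element is cleared once it has been processed, memory use stays far below what a full in-memory tree would need, which is the property that matters for files in the 100GB range.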

    I completely understand that this is inching into the realm of a dedicated XML reader or an IDE, which I know EmEditor is not. However, since decent XML readers appear to have issues with large files, and EmEditor handles them so well, I think it would be an excellent addition as a plugin. I also see that there’s improved support for tag highlighting in HTML documents and the like in core, which could also be considered IDE territory. Frankly, I used to use EmEditor as my primary editor for web-dev (HTML, CSS, JS and PHP) and still do for many projects (I prefer to use a proper IDE for projects that I don’t already know, as their code-follow is usually better, but otherwise it’s EmEditor for me!), so it’s my development environment of choice anyway :)


    Did you consider using Perl and its XML package? If this type of processing is your day-in-day-out activity, I would have opted for that.


    We don’t have a problem processing XML a line at a time on the server. What I’m looking for is something that makes reading these files for investigation and development much simpler. At the moment I have a really nice reader for small files, and I use EmEditor to read and search the larger ones. The view in EmEditor isn’t the tidiest, though.
