    #28131
    Simon McMahon
    Participant

    Hi there, I would like to extract every occurrence of a URL string pattern (which can appear multiple times in a file) to build a list of all occurrences.
    I can already identify each occurrence with the Find in files feature, but I would like the Extract feature to list each occurrence on a new line. At the moment it lists each line that contains the string, and a line can contain the string multiple times.
    My goal is to get a list of every full URL that contains __data/assets/

    In the example below, __data/assets/ occurs 48 times.
    However, the extract returns only 44 lines, whereas I need to output all 48 occurrences (the full URL of each).
    I will be running this extract over 270 files in total.

    View source of this example webpage:
    https://www.walkerville.sa.gov.au/council/strategic-plans/2020-2024-living-in-the-town-of-walkerville-a-strategic-community-plan

    #28132
    Patrick C
    Participant

    Partial answer
    You probably cannot do it in one step, but you should* be able to do it in two:
    First, extract the matching lines with Find in files,
    and then use Find -> Extract with the following regex*
    /https:\/\/.*__data\/assets.*(?=")/gU

    *There is a problem here:
    the syntax above is not Perl-compatible, so it cannot be entered as written.

    Does anybody know how to apply the /gU regex flags?
    I would really appreciate it.
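
A note on the /gU question, for what it is worth: in PCRE-style notation the U flag makes quantifiers lazy by default, so much the same effect can usually be had without the flag by writing `.*?` instead of `.*`. The small Python sketch below only illustrates that difference; Python and the sample line (with its example.com URLs) are assumptions for illustration and are not part of the editor discussed in this thread.

```python
import re

# Made-up sample line containing two quoted URLs (illustration only).
line = ('<a href="https://example.com/__data/assets/a.pdf"> '
        '<a href="https://example.com/__data/assets/b.pdf">')

# Greedy quantifiers: one long match that swallows both URLs.
greedy = re.findall(r'https://.*__data/assets.*(?=")', line)

# Lazy quantifiers (what the U flag would give): one match per URL.
lazy = re.findall(r'https://.*?__data/assets.*?(?=")', line)

print(len(greedy), len(lazy))
for url in lazy:
    print(url)
```

Note that the lazy `.*?` in front of `__data/assets` can still run past a closing quote when the first URL on a line does not contain `__data/assets`; restricting that part to `[^"]*` avoids this.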

    #28133
    Patrick C
    Participant

    Figured it out:
    First, extract the matching lines via Find in files.

    Then Find → Extract with Regex:
    (?<=")[^"]*__data\/assets.*?(?=")

    🙂

    #28134
    Patrick C
    Participant

    Arghh, stupid me.
    You can do this in one step:

    Find in files → Extract with Regex:
    (?<=")[^"]*__data\/assets.*?(?=")

    Use the extract option to display matched strings only.
    😴
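
To sanity-check the pattern outside the editor, here is a minimal Python sketch of the same idea: one result per occurrence rather than per line. The function names `extract_asset_urls` and `extract_from_files`, the example.com URLs, and the sample text are all hypothetical and only for illustration; the regex is the one from the post above (the `\/` escapes are dropped because Python does not need them).

```python
import re
from pathlib import Path

# Same pattern as in the post: a quoted string containing __data/assets.
ASSET_URL = re.compile(r'(?<=")[^"]*__data/assets.*?(?=")')

def extract_asset_urls(text):
    """Return every double-quoted string containing __data/assets."""
    return ASSET_URL.findall(text)

def extract_from_files(paths):
    """Yield (path, url) for each occurrence in each file (hypothetical helper)."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for url in extract_asset_urls(text):
            yield str(path), url

# Made-up sample: two lines, but three occurrences.
sample = (
    '<a href="https://example.com/__data/assets/pdf_file/0001/a.pdf">A</a> '
    '<a href="https://example.com/__data/assets/pdf_file/0002/b.pdf">B</a>\n'
    '<img src="/__data/assets/image/0003/c.png">\n'
)

urls = extract_asset_urls(sample)
print(len(sample.splitlines()), "lines,", len(urls), "occurrences")
for url in urls:
    print(url)
```

Two caveats with this sketch: it only sees URLs inside double quotes, so single-quoted or unquoted attributes would be missed, and unquoted text sitting between two quoted strings could be picked up if it happens to contain `__data/assets`.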
