raviniral
Posted on: 6/18/2011 6:46 am
Just popping in
Joined: 6/18/2011
From:
Posts: 1
How to extract only one tag from the html file using regular expression?

I am not able to create a macro that extracts only one kind of tag from an HTML file.

My question is: I have HTML files and I want to extract only <a href="https://XXXXX/en/XXXXX"> tags. There are many tags in the file, but I want to extract only those whose URL contains /en/ or /us/.

Can anyone please help? It would help me work faster.

Stefan
Posted on: 7/12/2011 12:34 pm
Home away from home
Joined: 7/14/2008
From: Germany, EU
Posts: 265
Re: How to extract only one tag from the html file using regular expression?
I don't know if I'm doing it the elegant way, but at least it works:


Example text:
<a href="https://XXXoneXX/en/XXXXX">
<a href="https://XXtwoXXX/de/XXXXX">
<a href="https://XXXthreeXX/es/XXXXX">
<a href="https://XfourXX/en/XXXXX">
<a href="https://XXfiveXXX/de/XXXXX">
<a href="https://XXsixXXX/us/XXXXX">
<a href="https://XsevenXX/it/XXXXX">
<a href="https://XeightXX/tc/XXXXX">
<a href="https://XnineXX/en/XXXXX">


Result:
1: <a href="https://XXXoneXX/en/XXXXX">
4: <a href="https://XfourXX/en/XXXXX">
6: <a href="https://XXsixXXX/us/XXXXX">
9: <a href="https://XnineXX/en/XXXXX">


RegEx used to match wanted line:
"<a href=.https.+/(en|us)/.+>"
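The pattern above relies on ".+", which is greedy and can run across several tags if more than one sits on the same line. As a rough sketch only (this tighter pattern is an assumption, not the one used in the script below), [^"]* can be used so that a match cannot cross a double quote and stays inside a single href value:

```vbscript
' Sketch only: a tighter variant of the pattern (an assumption, not the original).
' Inside a VBScript string literal, a double quote is written as "".
Dim re
Set re = New RegExp
re.Pattern = "<a href=""https://[^""]*/(en|us)/[^""]*"">"
re.IgnoreCase = True

' Test returns True or False for each sample line.
alert re.Test("<a href=""https://XXXoneXX/en/XXXXX"">")   ' a wanted /en/ link
alert re.Test("<a href=""https://XXtwoXXX/de/XXXXX"">")   ' an unwanted /de/ link
```

The first sample line contains /en/ and should be reported as a match, the second contains only /de/ and should not.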




Script:

strSelectedText = document.selection.Text
If strSelectedText = "" Then 
	alert "Nothing selected. Script quits."
	Quit
Else
	'alert strSelectedText
End If


'regex search pattern:
strREPattern = "<a href=.https.+/(en|us)/.+>"


'split the selection into lines:
'(counting the array entries avoids missing the last line when the
' selection does not extend to the start of the following line)
LinesArray = Split(strSelectedText, vbCRLF)
LastLine = UBound(LinesArray)


'For each line do:
funcRegExReturn = ""
OutArray = ""
For LineNr = 0 To LastLine
	currLine = LinesArray(LineNr)
	'alert LineNr & ": " & currLine & ": " & strREPattern
	funcRegEx strREPattern, currLine
	'alert funcRegExReturn
	If funcRegExReturn = True Then
		'put a line break between results, but not after the last one
		If OutArray <> "" Then OutArray = OutArray & vbCRLF
		OutArray = OutArray & LineNr + 1 & ": " & currLine
	End If
Next

'TEST:
'alert OutArray



'Write OutArray to new document:
'if not editor.EnableTab then 
'   editor.EnableTab = True 
'end if 
editor.NewFile 
document.write OutArray



Function funcRegEx(needle, haystack)
	funcRegExReturn = false
	Set regEx = New RegExp
	regEx.Pattern = needle
	regEx.IgnoreCase = True
	regEx.Global = True
	if regEx.Test(haystack) Then funcRegExReturn = TRUE
End Function
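
The script above works line by line, so it reports whole lines. If a file has several tags on one line, a possible alternative is to let the RegExp object collect the matches itself with Execute. This is only a sketch under the assumption that the tighter [^"]* pattern suits the real files; the names re, matches, m and result are made up for the illustration:

```vbscript
' Sketch only: collect every matching tag from the selection,
' even when several tags share one line.
Dim re, matches, m, result
Set re = New RegExp
re.Pattern = "<a href=""https://[^""]*/(en|us)/[^""]*"">"
re.IgnoreCase = True
re.Global = True    ' needed so Execute returns all matches, not just the first

Set matches = re.Execute(document.selection.Text)
result = ""
For Each m In matches
	result = result & m.Value & vbCrLf   ' one extracted tag per output line
Next

' Write the extracted tags to a new document, as in the script above.
editor.NewFile
document.write result
```

Unlike the line-based loop, this variant outputs only the tags themselves, without line numbers.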




HTH?



---
@Emura

But I don't understand why the function doesn't return a value through the function name.

I think this should work:


call funcRegEx x y

If funcRegEx Then alert "OK"

Function funcRegEx(needle, haystack) 	
    Set regEx = New RegExp 	
    regEx.Pattern = needle 	
    regEx.IgnoreCase = True 	
    regEx.Global = True 	
    if regEx.Test(haystack) Then funcRegEx = TRUE 
End Function


but I have to use something like:

Dim 	funcRegExReturn

call funcRegEx x y
If funcRegExReturn Then alert "OK"

Function funcRegEx(needle, haystack)
	Set regEx = New RegExp
	regEx.Pattern = needle
	regEx.IgnoreCase = True
	regEx.Global = True
	if regEx.Test(haystack) Then funcRegExReturn = TRUE
End Function
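
For comparison, here is a minimal sketch of the return-by-name form. In VBScript, Call needs the arguments in parentheses and separated by a comma, and writing the bare name funcRegEx in an If statement calls the function again (without arguments) instead of reading the earlier result, which may be why the first variant fails. The sample pattern and URL below are placeholders for the illustration:

```vbscript
' Sketch only: the function returns its result through its own name,
' and the caller passes the arguments in parentheses.
If funcRegEx("/(en|us)/", "https://XXXoneXX/en/XXXXX") Then alert "OK"

Function funcRegEx(needle, haystack)
	Dim regEx
	Set regEx = New RegExp
	regEx.Pattern = needle
	regEx.IgnoreCase = True
	funcRegEx = regEx.Test(haystack)   ' assigning to the function name sets the return value
End Function
```

With this form the global helper variable funcRegExReturn is not needed.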


10.1.0, 32-bit portable on XP 32 SP3