Select lines containing text1 but not text2

Started by JJK, January 16, 2009, 08:03:42 PM

JJK

Hello Stefan and other regexp masters
I know how to find lines which contain both "text1" AND "text2" by searching "text1|text2" with regexp on.
But I am not able to find lines which contain "text1" but not "text2".
Any help ?

Stefan

;D thanks JJ.

With RegEx one would do this with negative lookbehind and lookahead, if they are implemented (not tested yet).
But most editors can't do that.

Vim has a "global" command that searches for lines and executes a command on the matching or non-matching ones:
:v/text2/s/text1/replacement/
where :v is the equivalent of :g for non-matching lines ("replacement" is only a placeholder).
And you can execute any command once you have a match: not only find or find & replace, but also deleting the line, for example.

But for editors other than Vim I don't know right now ... I will think about it.

Stefan, HippoEDIT beta tester 
HippoEDIT - the editor programmers wants to code thyself when they are dreaming.        -Don't just edit. HippoEDIT!-

Stefan

I'm using this post just as a note.
I'm only doing a few tests here.



I have this text:

1 text1 text1 text1 text1 text1 text1 text1 text1 text1
2 text1 text1 text1 text1 text1 text1 text1 text1 text1
3 text1 text1 text1 text1 text1 text1 text1 text1 text1
4 text1 text1 text2 text1 text1 text1 text1 text1 text1
5 text3 text3 text2 text3 text3 text3 text3 text3 text3
6 text1 text1 text1 text1 text1 text1 text1 text1 text1


I do a RegEx search for

.*(?<=text2).*text1.*

Explanation:
.*          = any character, zero or more of them
(?<=text2)  = zero-width positive lookbehind: text2 must appear just before this position in the line
.*          = any character, zero or more of them
text1       = text that must also match in the line
.*          = any character, zero or more of them


HippoEDIT matches line 4. Very good!


More explanations:
(?<=text)
Zero-width positive lookbehind.
Matches at a position to the left of which text appears.
Since regular expressions cannot be applied backwards, the test inside the lookbehind can only be plain text.
Some regex flavors allow alternation of plain text options in the lookbehind.
copied from http://www.regular-expressions.info/refadv.html

This example of zero-width positive lookbehind matches a line where text1 comes after text2.
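As a cross-check outside HippoEDIT, here is a minimal Python sketch of the same search. It assumes that Python's re module treats this pattern the same way the editor's engine does; the helper name matching_line_numbers is made up for this illustration.

```python
import re

# Sample text from this post (lines 1 to 6).
SAMPLE = """\
1 text1 text1 text1 text1 text1 text1 text1 text1 text1
2 text1 text1 text1 text1 text1 text1 text1 text1 text1
3 text1 text1 text1 text1 text1 text1 text1 text1 text1
4 text1 text1 text2 text1 text1 text1 text1 text1 text1
5 text3 text3 text2 text3 text3 text3 text3 text3 text3
6 text1 text1 text1 text1 text1 text1 text1 text1 text1
"""

# Positive lookbehind: some position must have "text2" directly before it,
# and "text1" must follow somewhere after that position.
PATTERN = re.compile(r".*(?<=text2).*text1.*")


def matching_line_numbers(pattern, text):
    """Return the 1-based numbers of the lines on which pattern matches."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


matched = matching_line_numbers(PATTERN, SAMPLE)
print(matched)  # [4]
```

Line 5 also contains text2 but has no text1 after it, so only line 4 is reported.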

Stefan

What you want, JJ, would be

(?<!text)
Zero-width negative lookbehind. Matches at a position if the text does not appear to the left of that position.
in combination with
(?!regex)
Zero-width negative lookahead. Identical to positive lookahead, except that the overall match
will only succeed if the regex inside the lookahead fails to match.  ( http://www.regular-expressions.info/refadv.html )


I tried this as
.*(?<!text2).*text1.*
but without success. Line 4 is matched too.


This example of zero-width negative lookbehind should match a line with text1 where there is no text2 in front.
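A short Python sketch of why line 4 is still matched, assuming Python's re module handles lookbehind the same way as the editor's engine for this pattern. The negative lookbehind is checked at a single position only, and the very start of the line already passes the check.

```python
import re

line4 = "4 text1 text1 text2 text1 text1 text1 text1 text1 text1"

# Position 0 has nothing in front of it, so "not preceded by text2" is
# already true there.
first_ok = re.search(r"(?<!text2)", line4).start()

# With the lookbehind satisfied, the surrounding .* are free to consume
# "text2" like any other text, so the whole line matches.
full = re.search(r".*(?<!text2).*text1.*", line4)

print(first_ok)       # 0
print(full.group(0))  # the complete line 4
```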

Stefan

I think this is not doable with RegEx alone, and HippoEDIT doesn't support it right now.
It would have to be coded by Alex himself, or done later when we get scripting, or by using external tools like SFK.

Pseudocode could be:

'Line(i) stands for the text of line i (pseudocode only)
For i = 1 To AllLinesCount
  boolCheck = RegEx.Match(Line(i), "text2")
  If boolCheck = True Then
    'do nothing, continue with the next line
  Else
    If RegEx.Match(Line(i), "text1") Then
      'mark the line and wait for the user
    End If
  End If
Next
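The same two-step check can be sketched as runnable Python. This is only an illustration of the idea, not HippoEDIT scripting; the function name lines_with_but_without and the sample list are assumptions made for this example.

```python
import re


def lines_with_but_without(lines, wanted, unwanted):
    """Yield (line_number, line) for lines matching wanted but not unwanted."""
    wanted_re = re.compile(wanted)
    unwanted_re = re.compile(unwanted)
    for number, line in enumerate(lines, start=1):
        if unwanted_re.search(line):
            continue  # contains the unwanted text: go on to the next line
        if wanted_re.search(line):
            yield number, line  # "mark" the line by reporting it


sample = [
    "1 text1 text1 text1 text1 text1 text1 text1 text1 text1",
    "2 text1 text1 text1 text1 text1 text1 text1 text1 text1",
    "3 text1 text1 text1 text1 text1 text1 text1 text1 text1",
    "4 text1 text1 text2 text1 text1 text1 text1 text1 text1",
    "5 text3 text3 text2 text3 text3 text3 text3 text3 text3",
    "6 text1 text1 text1 text1 text1 text1 text1 text1 text1",
]

hits = [number for number, _ in lines_with_but_without(sample, "text1", "text2")]
print(hits)  # [1, 2, 3, 6]
```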

JJK

Thanks, Stefan, for your efforts. So it doesn't seem simple, or even doable.
That's why I didn't succeed :) :) (no, that's not true: I just don't know regexp well)

But I realize only now that in my first post I wrote:
Quote: to find lines which contain both "text1" AND "text2" by searching "text1|text2" with regexp on
Of course I made a mistake. That finds lines containing text1 OR text2 (instead of text1 AND text2), or in other words lines containing text1 plus lines containing text2.

So another request: to find lines containing both text1 AND text2, I search "text1.*text2|text2.*text1" with regexp on. That seems a bit heavy to me. Do you know a more elegant solution?

BTW
Quote: or by using external tools like SFK
What is SFK ?

Stefan

Hi JJ,
your RegEx is good.

It searches for
text1   anything_or_nothing   text2
or for
text2   anything_or_nothing   text1

Whichever order matches first is taken.

-

I think this
(text1).*(text2) | \2.*\1
(without the spaces) should work too, but it doesn't


----
http://stahlworks.com/dev/index.php?tool=sfk
Swiss File Knife - the open source file tree processor is a free multi function command line tool

alex

Hello Guys,

I am also not much of an expert in regular expressions, which is why I have not responded. I have used them only for creating matching patterns for labels in schemas and a little for search and replace in daily work.

But I can give you a link to the documentation of the regular expression library used in HE (Boost regex):
http://www.boost.org/doc/libs/1_37_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
There you can find what is supported and what the correct syntax is.
And if something works incorrectly, I can check. It is probably a bug in my code.
One of the known bugs is that .* does not go beyond line boundaries.
HippoEDIT team
http://www.hippoedit.com/

JJK

Quote: (text1).*(text2) | \2.*\1
I also tried (text1).*(text2) | $2.*$1
because it seems that tagged expressions are $x, but no luck there either.

Stefan

AFAIK:
\1 reuses () back references in the search string.
$1 reuses () back references in the replace string.

To make it more complicated, some RegEx flavors also use \1 in the replace string, while others use $1 there.

http://www.regular-expressions.info/refreplace.html
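Python's re module is one of the flavors that uses \1 on both sides, so it makes a handy illustration. This is a small sketch with a made-up sample line; other engines, including the one in HippoEDIT, may behave differently in the replace string.

```python
import re

line = "4 text1 text1 text2 text1"

# In the search pattern, \1 must match again whatever group 1 matched.
repeated = re.search(r"(text\d).*\1", line)

# In the replacement string, Python also writes \1 and \2 ...
swapped = re.sub(r"(text1) (text2)", r"\2 \1", line)

# ... while $1 and $2 are not special here and are inserted literally.
literal = re.sub(r"(text1) (text2)", "$2 $1", line)

print(repeated.group(1))  # text1
print(swapped)            # 4 text1 text2 text1 text1
print(literal)            # 4 text1 $2 $1 text1
```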

alex

Hello Stefan,

I am not sure, but I think you have a mistake in your regexp...
(text1).*(text2) | \2.*\1 or
(text1).*(text2) | $2.*$1

because you can use \2 or \1 only if group 1 or 2 was matched BEFORE. So this means it would work only if you had something like this:
text1 text2 text2 text1

But this generally makes no sense in the context of the problem.
JJK, you can use \1 as a placeholder for a submatch made earlier.

I have tested the regexp here http://regexlib.com/RETester.aspx to check whether the HE engine has a bug, but the result is the same (by the way, a very good regexp library).

So, I think the only way is to do it like this:
(text1).*(text2) | (text2).*(text1)  ...

But maybe I am wrong. As I wrote, I am no big expert in regexp.
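The difference can be reproduced with a minimal Python sketch. It assumes Python's re module, where a back reference to a group that has not matched simply fails; the two sample strings are invented for the test.

```python
import re

# Back references in the second alternative point at groups that only
# exist in the first alternative, so they are never set there.
with_backrefs = re.compile(r"(text1).*(text2)|\2.*\1")

# Spelling out both orders needs no back references at all.
spelled_out = re.compile(r"text1.*text2|text2.*text1")

forward = "a text1 b text2 c"  # text1 comes before text2
reverse = "a text2 b text1 c"  # text2 comes before text1

print(bool(with_backrefs.search(forward)))  # True
print(bool(with_backrefs.search(reverse)))  # False
print(bool(spelled_out.search(forward)))    # True
print(bool(spelled_out.search(reverse)))    # True
```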

Stefan

>because you can use ... only if ... was found BEFORE
Yes, you're right. I had not thought about this little fact :D

>(text1).*(text2) | (text2).*(text1)  ...
Yes, I come back to this too. You're right.

However, the () are not always needed.
The () are needed only if you
a) have to group some words into one search string
b) need to refer back to the match (as I wanted to with \1 and \2)
c) want to apply a quantifier such as + or *

Or in the words of Jan G.:
Round brackets group the regex between them.
They capture the text matched by the regex inside them that can be reused in a backreference,
and they allow you to apply regex operators to the entire grouped regex. (http://www.regular-expressions.info/refadv.html)

alex

Yes, round brackets (they are probably called parentheses; I have seen this somewhere) are not necessary ;)
So: text1.*text2|text2.*text1 can be used.

Just to let you know, you can also use (?:expression) to group expressions without creating a submatch for them.
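A small Python sketch of the difference, assuming Python's re module (which supports the same (?:...) syntax). Both patterns find the same text, but only the first one creates submatches.

```python
import re

line = "4 text1 text1 text2 text1"

capturing = re.compile(r"(text1).*(text2)|(text2).*(text1)")
grouping_only = re.compile(r"(?:text1).*(?:text2)|(?:text2).*(?:text1)")

# Pattern.groups is the number of capturing groups in the pattern.
print(capturing.groups)      # 4
print(grouping_only.groups)  # 0

# The overall match is identical either way.
m1 = capturing.search(line)
m2 = grouping_only.search(line)
print(m1.group(0))  # text1 text1 text2
print(m2.group(0))  # text1 text1 text2
```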

Stefan

Hi JJK, maybe this will help you?  ==> http://forum.hippoedit.com/index.php/topic,143.msg496.html#msg496

use

Command: sfk
Arguments: filter %FileName% %Variable name%

If you are prompted, enter: ++text1 ++text2
and take a look at the output window to see what happens:

Command Line:
C:\WINDOWS\system32\sfk.exe filter F:\HippoEDIT\Untitled3.TXT ++text1 ++text2

4 text1 text1 text2 text1 text1 text1 text1 text1 text1

-------------------------------------------- Done --------------------------------------------

epfax

Hi, I found the regexp solution to the problem. Stefan, your attempt with negative lookbehind ".*(?<!text2).*text1.*" allowed "text2" to be matched by those many ".*", so line 4 matched too.

My expression tests every single character from the line start up to "text1", and after that up to the line end, with a negative lookahead:

^(.?(?!text2))*text1((?!text2).?)*$

You can test it with these 10 lines:

1 text1 text1 text1 text1 text1 text1 text1 text1 text1
2 text1 text1 text1 text1 text1 text1 text1 text1 text2
3 text3 text1 text1 text1 text1 text1 text1 text1 text1
4 text1 text1 text2 text3 text1 text1 text1 text1 text1
5 text1 text1 text3 text1 text1 text2text1 text1 text1
6 text1 text1 text3 text1 text1text2 text1 text1 text1
7 text1 text1 text3 text1 text1text2text1 text1 text1
8 text3 text3 text1 text3 text2 text3 text3 text3 text3
9 text3 text3 text1 text3 text3 text3 text3 text3 text3
10 text1 text1 text3 text1 text1 text1 text1 text1 text1

Every whole line containing "text1" but not "text2" will be found and marked.
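The ten test lines can be checked with a minimal Python sketch, assuming Python's re module behaves like the editor's engine for this pattern. The second pattern, ANCHORED, is not from this thread: it is a commonly used alternative that puts a single negative lookahead at the line start. It is included because the first lookahead in the expression above is only tested after a character has been consumed, so a line that begins directly with "text2" appears to slip through.

```python
import re

LINES = [
    "1 text1 text1 text1 text1 text1 text1 text1 text1 text1",
    "2 text1 text1 text1 text1 text1 text1 text1 text1 text2",
    "3 text3 text1 text1 text1 text1 text1 text1 text1 text1",
    "4 text1 text1 text2 text3 text1 text1 text1 text1 text1",
    "5 text1 text1 text3 text1 text1 text2text1 text1 text1",
    "6 text1 text1 text3 text1 text1text2 text1 text1 text1",
    "7 text1 text1 text3 text1 text1text2text1 text1 text1",
    "8 text3 text3 text1 text3 text2 text3 text3 text3 text3",
    "9 text3 text3 text1 text3 text3 text3 text3 text3 text3",
    "10 text1 text1 text3 text1 text1 text1 text1 text1 text1",
]

# Expression from the post: a negative lookahead around every character.
STEPWISE = re.compile(r"^(.?(?!text2))*text1((?!text2).?)*$")

# Alternative (assumption, not from the thread): reject the line up front
# if "text2" occurs anywhere in it, then require "text1".
ANCHORED = re.compile(r"^(?!.*text2).*text1.*$")


def hits(pattern, lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


print(hits(STEPWISE, LINES))  # [1, 3, 9, 10]
print(hits(ANCHORED, LINES))  # [1, 3, 9, 10]

# Edge case: "text2" as the very first characters of the line.
print(bool(STEPWISE.search("text2 text1")))  # True
print(bool(ANCHORED.search("text2 text1")))  # False
```

On the numbered test lines both patterns agree, because every line starts with a digit and "text2" can never sit at position 0.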