Author Topic: Regular Expression: greedy option  (Read 1394 times)

Offline Stefan

  • Administrator
  • Hero Member
  • *****
  • Posts: 774
  • Karma: +6/-0
    • View Profile
Regular Expression: greedy option
« on: November 30, 2008, 11:12:01 pm »
Hi alex,

RegEx search in HippoEDIT works lazily (non-greedy).

How can I search greedily?

For example:

I have this text:
Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


Test 1:

If I am on the line 'start'
and search for 'eat!'
with .+eat (RegEx and Ext. Selection are enabled),
I get only this selected:
Quote
start
HippoEDIT is great!

HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
This was lazy.


Test 2:

What should I do to search greedily and get:
Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


-----

I think

.*?eat!

should match lazily, like:

Quote
start
HippoEDIT is great!

HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!



and

.*eat!

should match greedily, like:

Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


---


What do you think?

May I suggest making the RegEx "standard", i.e. greedy by default,
and allowing a '?' sign to switch to non-greedy?
Stefan, HippoEDIT beta tester  (HippoEDIT News On Twitter: http://twitter.com/hippoedit/)
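
For comparison, here is a minimal sketch of the greedy and lazy behaviour described above, using Python's re module rather than HippoEDIT's own engine (which the thread does not name). The sample text mirrors the one in the post, and re.DOTALL is an assumption added here so that "." may cross line breaks.

```python
import re

# Sample text from the post: "start" followed by four identical lines.
text = "start\n" + "HippoEDIT is great!\n" * 4

# re.DOTALL lets "." match line breaks, so a match may span several lines.
lazy = re.search(r".*?eat!", text, re.DOTALL)
greedy = re.search(r".*eat!", text, re.DOTALL)

print(repr(lazy.group()))    # stops at the first "eat!"
print(repr(greedy.group()))  # extends to the last "eat!"
```

In this flavor the quantifier is greedy by default and the trailing '?' switches it to lazy, which is the convention suggested in the post.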

Offline alex

  • Developer
  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1712
  • Karma: +29/-2
    • View Profile
    • HippoEDIT
Re: Regular Expression: greedy option
« Reply #1 on: December 01, 2008, 05:02:27 pm »
Hello Stefan,

... unfortunately I think this is also a bug. Greedy search should work too, but with the current implementation algorithm it is not possible. I think I need to redo regexp searching, and this is not very easy...
HippoEDIT stores text as lines (without line breaks). Before a search, I analyze the regular expression to work out how many lines it can request, and then pass multiline blocks of that size to the search engine one by one. And of course it is not possible to predict how many lines would be necessary for an expression like ".+eat".

I think I need to switch to stream search for regexp. It would decrease the performance of regular expression search, but it would be less buggy and more flexible. When, I don't know yet. This would be a big change, so I prefer to move this topic to 1.5 and release 1.4 with quick fixes (to at least avoid crashes in multiline searches), because otherwise I would delay 1.4 and could introduce new bugs.
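
The difference between the two approaches can be sketched roughly as follows. This is an illustration in Python, not HippoEDIT's actual code: the function names search_line_blocks and search_stream, the block_size parameter, and the use of re.DOTALL are all assumptions made for the example.

```python
import re
from typing import List, Optional

def search_line_blocks(lines: List[str], pattern: str, block_size: int) -> Optional[str]:
    """Hypothetical line-based search: try windows of block_size lines, one by one."""
    rx = re.compile(pattern, re.DOTALL)
    for i in range(len(lines)):
        # Rebuild a small block from the stored lines (which hold no line breaks).
        block = "\n".join(lines[i:i + block_size])
        m = rx.search(block)
        if m:
            return m.group()
    return None

def search_stream(lines: List[str], pattern: str) -> Optional[str]:
    """Hypothetical stream search: run the pattern over the whole buffer at once."""
    m = re.search(pattern, "\n".join(lines), re.DOTALL)
    return m.group() if m else None

lines = ["start"] + ["HippoEDIT is great!"] * 4

one_line = search_line_blocks(lines, r".+eat", 1)  # match cannot leave its block
whole = search_stream(lines, r".+eat")             # match runs to the last "eat"
print(repr(one_line))
print(repr(whole))
```

With a block of one line the match is confined to a single line, however greedy the quantifier is, while the stream version pays for joining the whole buffer but lets the match grow as far as the pattern allows.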

Offline Stefan

Re: Regular Expression: greedy option
« Reply #2 on: December 04, 2008, 10:38:05 pm »
Another issue: RegEx Find with ^$ doesn't work.
HE says "Cannot find string '^$'".
It is expected to find an empty line.
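
As a point of reference, this is how the same pattern behaves in Python's re module (an assumption for illustration; HippoEDIT's engine may differ). The sample text is made up for the example.

```python
import re

text = "first line\n\nthird line"

# With re.MULTILINE, "^" and "$" match at the start and end of every line,
# so "^$" matches exactly where a line is empty.
positions = [m.start() for m in re.finditer(r"^$", text, re.MULTILINE)]
line_numbers = [text.count("\n", 0, pos) + 1 for pos in positions]
print(line_numbers)
```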

Offline alex

Re: Regular Expression: greedy option
« Reply #3 on: December 08, 2008, 03:08:52 pm »
Should be fixed with 550.

Offline Stefan

Re: Regular Expression: greedy option
« Reply #4 on: January 20, 2009, 02:57:47 pm »
Fixed: RegEx ^$ finds empty lines now. Thanks.
ToDo: correct greedy behaviour

Offline Stefan

Re: Regular Expression: greedy option
« Reply #5 on: January 24, 2009, 01:05:27 am »
I have rethought my explanation:
the problem is not a greedy issue as such, but an issue with "dot matches new line" (?m).

Because I now see that the greedy and lazy options work in HippoEDIT too:


I have this text:
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
---------

I search for RegEx .*eat and get

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

So RegEx greedy works.
---------

I search RegEx .*?eat and get
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

So RegEx lazy works too.
-----------

What I meant above was: search with RegEx (?m).*eat across multiple lines, to get
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!






EDIT:


^.*(\r|\n)*.*eat.*$
   .*(\r|\n)*.*eat.*

These match on two lines, not on three or four. Perhaps I can get it if I try harder?


HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
« Last Edit: January 24, 2009, 01:16:31 am by Stefan »
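
The observations in this post can be reproduced with Python's re module, used here only as a stand-in since the thread does not say which engine HippoEDIT uses. Note that in Python and in Perl-compatible flavors the dot-matches-newline flag is written (?s), while (?m) changes how ^ and $ behave; which spelling HippoEDIT expects is not stated. The last pattern is a suggestion added for this sketch, not something from the thread.

```python
import re

line = "HippoEDIT is great!HippoEDIT is great!"
text = "\n".join([line] * 4)

# Without a dot-matches-newline flag, both quantifiers stay on the first line.
greedy_one_line = re.search(r".*eat", text).group()
lazy_one_line = re.search(r".*?eat", text).group()

# (?s) is Python's inline dot-matches-newline flag: greedy reaches the last "eat".
greedy_all = re.search(r"(?s).*eat", text).group()

# One ".*", a run of line breaks, one more ".*": two non-empty lines at most.
two_lines = re.search(r".*(\r|\n)*.*eat.*", text).group()

# Repeating a "line plus one line break" group removes that limit.
many_lines = re.search(r"(?:.*[\r\n])*.*eat.*", text).group()

print(two_lines.count("\n") + 1)   # number of lines covered by the two-line pattern
print(many_lines.count("\n") + 1)  # number of lines covered by the repeated group
```

The two-line limit comes from the pattern itself: each ".*" stops at a line break, and the pattern contains only two of them, so no amount of greediness lets it cover a third non-empty line.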

Offline alex

Re: Regular Expression: greedy option
« Reply #6 on: January 31, 2009, 10:03:18 pm »
Hi Stefan,

yes, the problem is once more the line-based search in the current implementation. What I am doing is checking how many lines you want by counting \r\n in the search string, and then preparing a block for the search by combining N lines.
I see that this logic is incorrect and incomplete and will rework it in 1.50. Dirty work but necessary ;)
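
To show why counting line breaks in the search string falls short, here is a rough guess at such a heuristic in Python. The function estimate_block_lines is hypothetical and only approximates the logic described above; it counts written \n escapes and ignores corner cases such as an escaped backslash before an "n".

```python
import re

def estimate_block_lines(pattern: str) -> int:
    """Hypothetical heuristic: one line, plus one per "\\n" written in the pattern."""
    written_breaks = re.findall(r"\\n", pattern)
    return 1 + len(written_breaks)

# A literal two-line search string is estimated correctly.
print(estimate_block_lines(r"great!\r\nHippoEDIT"))

# A pattern that can span any number of lines still gets a small, fixed estimate.
print(estimate_block_lines(r"(?s).*eat"))
print(estimate_block_lines(r"^.*(\r|\n)*.*eat.*$"))
```

A count like this only sees the line breaks that are spelled out, so quantified line breaks or a dot that matches newlines are underestimated, which fits the two-line matches reported earlier in the thread.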

 
