Author Topic: Regular Expression: greedy option  (Read 4597 times)

Offline Stefan

  • Administrator
  • Hero Member
  • *****
  • Posts: 775
  • Karma: +6/-0
Regular Expression: greedy option
« on: November 30, 2008, 11:12:01 PM »
Hi alex,

RegEx search in HippoEDIT works lazily (non-greedy).

How can I search greedily?


For example:

I have this text
Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


Test 1:

If I am on the line 'start'
and want to search for 'eat!'
with .+eat (RegEx and Ext. Selection are enabled),
I get only this selected:
Quote
start
HippoEDIT is great!

HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
This was lazy.


Test 2:

What should I do to search greedily, so that I get:
Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


-----

I think

.*?eat!

should match lazily, like:

Quote
start
HippoEDIT is great!

HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!



and

.*eat!

should match greedily, like:

Quote
start
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!
HippoEDIT is great!


---


What do you think?

May I suggest making the RegEx "standard", i.e. greedy by default,
and allowing a '?' sign to switch to non-greedy?
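
For reference, the difference between the two patterns can be reproduced outside HippoEDIT. This is only a sketch using Python's re module, not HippoEDIT's engine. It assumes "dot matches line break" is switched on (re.DOTALL), and the names TEXT, lazy and greedy are made up for the illustration.

```python
import re

# Sample text from the post: a "start" line followed by four identical lines.
TEXT = "start\n" + "HippoEDIT is great!\n" * 4

# re.DOTALL lets "." match line breaks, so the quantifier can span lines.
lazy = re.search(r".*?eat!", TEXT, re.DOTALL)
greedy = re.search(r".*eat!", TEXT, re.DOTALL)

# Lazy stops at the first "eat!", greedy runs on to the last one.
print(lazy.group().count("\n"))    # 1 line break inside the lazy match
print(greedy.group().count("\n"))  # 4 line breaks inside the greedy match
```

With these assumptions the lazy pattern selects 'start' plus the first line, and the greedy pattern selects everything up to the last 'eat!', which is the behaviour asked for in Test 2.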
Stefan, HippoEDIT beta tester 
HippoEDIT - the editor programmers wants to code thyself when they are dreaming.        -Don't just edit. HippoEDIT!-

Offline alex

  • Developer
  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2224
  • Karma: +37/-3
    • HippoEDIT
Re: Regular Expression: greedy option
« Reply #1 on: December 01, 2008, 05:02:27 PM »
Hello Stefan,

... unfortunately I think this is also a bug. Greedy search should work as well, but with the current implementation algorithm it is not possible. I think I need to redo the regexp searching, and this is not very easy...
HippoEDIT stores text as lines (without line breaks). Before a search, I analyze the regular expression to work out how many lines it can request, and then provide multiline blocks of that size to the search engine one by one. And of course it is not possible to predict how many lines will be necessary for an expression like ".+eat".

I think I need to switch to stream search for regexp. It would decrease the performance of regular expression search, but it would be less buggy and more flexible. When, I don't know yet. This would be a big change, so I prefer to move this topic to 1.5 and release 1.4 with quick fixes (to at least avoid crashes in multiline searches), because otherwise I could delay 1.4 and introduce new bugs.
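
A rough idea of what "stream search" could look like, written in Python rather than in HippoEDIT's own code: the stored lines are joined into one buffer, the pattern is run once over it, and the match offsets are mapped back to (line, column) positions. The function name stream_search, the "\n" separator and the return format are assumptions made for this sketch only.

```python
import re
from bisect import bisect_right

def stream_search(lines, pattern, flags=0):
    """Search lines (stored without line breaks) as one joined buffer.

    Returns ((start_line, start_col), (end_line, end_col)) or None.
    """
    buffer = "\n".join(lines)
    # Offset in the buffer at which each line starts.
    starts = [0]
    for line in lines[:-1]:
        starts.append(starts[-1] + len(line) + 1)  # +1 for the "\n"

    match = re.search(pattern, buffer, flags)
    if match is None:
        return None

    def to_pos(offset):
        # Last line whose start offset is <= the given offset.
        line_no = bisect_right(starts, offset) - 1
        return line_no, offset - starts[line_no]

    return to_pos(match.start()), to_pos(match.end())

lines = ["start"] + ["HippoEDIT is great!"] * 4
result = stream_search(lines, r".+eat", re.DOTALL)
print(result)  # ((0, 0), (4, 18))
```

The cost mentioned above shows up here as well: the whole text is joined before every search, which is the price for not having to guess the number of lines in advance.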

Offline Stefan

Re: Regular Expression: greedy option
« Reply #2 on: December 04, 2008, 10:38:05 PM »
Another issue ==> RegEx Find: ^$ doesn't work.
HE says "Cannot find string '^$' ".
It is expected to find an empty line.
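
For comparison, this is how ^$ behaves in Python's re module when ^ and $ are allowed to match at every line boundary (re.MULTILINE). It is only a sketch of the expected result, not a statement about how HippoEDIT implements it; the sample text and variable names are made up.

```python
import re

text = "first line\n\nthird line\n\nfifth line"

# With re.MULTILINE, "^" and "$" match at every line boundary,
# so "^$" matches once at each empty line.
empty_line_numbers = [
    text.count("\n", 0, m.start()) + 1  # 1-based line number of the match
    for m in re.finditer(r"^$", text, re.MULTILINE)
]
print(empty_line_numbers)  # [2, 4]
```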

Offline alex

Re: Regular Expression: greedy option
« Reply #3 on: December 08, 2008, 03:08:52 PM »
Should be fixed with 550.

Offline Stefan

Re: Regular Expression: greedy option
« Reply #4 on: January 20, 2009, 02:57:47 PM »
Fixed: RegEx  ^$  finds empty lines now. Thanks.
ToDo: correct greedy behaviour

Offline Stefan

Re: Regular Expression: greedy option
« Reply #5 on: January 24, 2009, 01:05:27 AM »
I have rethought my explanation:
the problem is not a greedy issue as such, but an issue with "dot matches new line" (?m).

Because I see now that the greedy and lazy options work in HippoEDIT too:


I have this text:
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
---------

I search for RegEx .*eat and get

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

So RegEx greedy works.
---------

I search for RegEx .*?eat and get
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

So RegEx lazy works too.
-----------

What I meant above was: search with RegEx (?m).*eat across multiple lines, to get
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
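
The three cases above can be checked side by side in Python's re module. This is only a sketch and not HippoEDIT's engine. Note that in Python the "dot matches new line" switch is (?s) (re.DOTALL), while (?m) only changes how ^ and $ behave; other engines may assign these letters differently. The variable names are made up for the illustration.

```python
import re

LINE = "HippoEDIT is great!HippoEDIT is great!"
TEXT = LINE + "\n" + LINE

greedy = re.search(r".*eat", TEXT)         # stays on line 1, ends at its last "eat"
lazy = re.search(r".*?eat", TEXT)          # stays on line 1, ends at its first "eat"
multiline = re.search(r"(?m).*eat", TEXT)  # (?m) affects only ^ and $ in Python
dotall = re.search(r"(?s).*eat", TEXT)     # (?s) lets "." cross the line break

for name, m in [("greedy", greedy), ("lazy", lazy),
                ("multiline", multiline), ("dotall", dotall)]:
    print(name, len(m.group()), "\n" in m.group())
```

Only the (?s) variant produces a match that contains the line break and runs to the last "eat" of the second line.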






EDIT:


^.*(\r|\n)*.*eat.*$
   .*(\r|\n)*.*eat.*

match across two lines, not three or four. Perhaps I will get it if I try harder?


HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!

HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
HippoEDIT is great!HippoEDIT is great!
« Last Edit: January 24, 2009, 01:16:31 AM by Stefan »

Offline alex

Re: Regular Expression: greedy option
« Reply #6 on: January 31, 2009, 10:03:18 PM »
Hi Stefan,

yes, the problem is once more the line-based search in the current implementation. What I am doing is checking how many lines you want by counting \r\n in the search string, and then preparing a block for the search by combining N lines.
I see that this logic is incorrect and incomplete, and I will rework it in 1.50. Dirty work, but necessary ;)
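
To make the limitation concrete, here is a small Python sketch of the line-block approach as described above. It is a guess at the general idea, not HippoEDIT's actual code: the helper names lines_requested and line_block_search are hypothetical, and it simply assumes that every "\n" escape in the pattern asks for one more line.

```python
import re

def lines_requested(pattern):
    # Assumption: each "\n" escape in the pattern asks for one more line.
    return pattern.count(r"\n") + 1

def line_block_search(lines, pattern, flags=0):
    """Search blocks of N consecutive lines, one block per starting line."""
    n = lines_requested(pattern)
    for first in range(len(lines) - n + 1):
        block = "\n".join(lines[first:first + n])
        match = re.search(pattern, block, flags)
        if match:
            return first, match.group()
    return None

lines = ["start"] + ["HippoEDIT is great!"] * 4

# No "\n" in the pattern: every block is a single line, so even with
# re.DOTALL the match can never grow beyond one line.
one_line = line_block_search(lines, r".+eat", re.DOTALL)

# One "\n" in the pattern: blocks of two lines are searched.
two_lines = line_block_search(lines, r"start\n.+eat", re.DOTALL)

print(one_line)
print(two_lines)
```

Because the block size is fixed before the search starts, a pattern like .+eat never sees more than one line, no matter how greedy the quantifier is.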