Adding new Language Syntax

Started by alex, October 29, 2008, 01:10:00 AM


alex

INFO: Please keep this thread clean.
For discussion or questions, please use the "Adding new Language Syntax - Discuss thread".




Hello,

I will try to write some instructions here, based on your questions, and then move the thread to the FAQ branch.

The definition of a programming language (syntax) in HippoEDIT is based on two files: {lang_name}_spec.xml and {lang_name}_user.xml. The file names are not important: a definition is recognized by its XML header. {lang_name}_spec.xml contains the general definition of the syntax and is obligatory. {lang_name}_user.xml contains the user-specific settings for the language (such as code templates, language-specific tools, language help URLs etc.) and is optional. Definition files should be placed in the directory defined in Tools -> Options -> General -> Settings Path. By default this is {HippoEDIT_InstallDir}\data\syntax.
To create your own syntax, I suggest finding an existing schema for a language similar to the one you want, then copying and renaming its files.
Then open the new files and search for something similar to this:
Code (xml) Select
<SYNTAX id="asm" name="ASM" inherit="def_source" inherit_url="defsource_spec.xml">
Attributes:

  • id (any string, preferably lower case, without spaces or the symbol : ) – unique id of the language (obligatory)
  • name (any string) – description of the language, used in the UI
  • inherit (any id) – id of the base (parent) schema. The new schema inherits all settings and styles of the parent. Normally you need to inherit from def, def_text or def_source. These schemes contain the base definitions for styles; without them a lot of functionality would not be available.
  • inherit_url (file path, relative or absolute) – file name of the parent schema. Used only for navigation between schemes when opened in a browser (optional).
After creating the definition files and copying them into the {Data} folder, HippoEDIT should load the definitions and display them in the Available Languages list (Tools->Options->Available Languages).
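
As a rough sketch of how the pieces described in this thread fit together, the skeleton below puts the four nodes named here (SPECIFICATION, STYLES, SCOPES, LABELS) inside a SYNTAX node. The id mylang, the name My Language and the nesting itself are assumptions made for illustration. The XML header and the enclosing root element are not shown, so take those from an existing schema file.

```xml
<!-- Hypothetical skeleton: id, name and the nesting are illustrative assumptions. -->
<SYNTAX id="mylang" name="My Language" inherit="def_source" inherit_url="defsource_spec.xml">
   <SPECIFICATION>
      <!-- base settings: file pattern, operators, delimiters, ... -->
   </SPECIFICATION>
   <STYLES>
      <!-- Style nodes -->
   </STYLES>
   <SCOPES>
      <!-- Scope nodes -->
   </SCOPES>
   <LABELS>
      <!-- Label nodes -->
   </LABELS>
</SYNTAX>
```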

Code (xml) Select
<SPECIFICATION>
This node contains the base settings for the syntax (such as file pattern, braces list, case-sensitivity flag etc.).
Here:
Operators - set of single symbols which are treated as operator symbols by HE and displayed with the operator style, if it is defined/inherited for the current schema. The operator style is one of the special styles known to HE (like comments, for example).
Delimiters - set of single symbols which act as delimiters (symbols which stop/start a word). You need to place here only those symbols which are not already listed in Operators or in the OpenClose node. If they intersect, it is not a problem. But if some of the delimiter symbols are forgotten, HE will "stop" on them during next-word navigation, for example.
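
A minimal sketch of these settings might look as follows. It assumes that Operators and Delimiters are written as elements holding the symbols as text, in the same way as the Words element quoted later in this thread. The symbol sets are only examples, and the XML special characters are written as entities.

```xml
<SPECIFICATION>
   <!-- Assumed element form; the symbol sets are illustrative. -->
   <Operators>+-*/=&lt;&gt;&amp;|!</Operators>
   <Delimiters>,;:.</Delimiters>
   <Words>0-9A-Za-z_</Words>
</SPECIFICATION>
```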


STYLES

SCOPES

This node contains the scopes that are used when parsing outlining (folding). Inheritable from the parent.

Code (xml) Select
<Scope>
This node describes a specific scope. A scope should have exactly one open tag [1], one or more close tags [1..n] and zero or more middle tags [0..n].
Code (xml) Select
<Scope open="Class" close="End Class" has_name="true" separator="true"/>
Attributes:

  • open (any string) – the open tag
  • close (any string) – the close tag
  • has_name (true|false|0|1) – indicates that the name of the scope follows the open tag
  • separator (true|false|0|1) – draw a separator after the close tag (if enabled in editor settings)
  • strict (true|false|0|1) – when set to false, tells HE not to take a missing close tag for this scope seriously, so no error is displayed for the open tag. The attribute is also used for better resolving of outlining constructions (strict scopes have higher priority than non-strict ones).
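
To illustrate the attributes above, here is a small sketch of a SCOPES node with two scopes. The first line is the example already shown. The second scope (If / End If) is hypothetical and only demonstrates strict="false".

```xml
<SCOPES>
   <!-- Named scope with a separator line after the close tag. -->
   <Scope open="Class" close="End Class" has_name="true" separator="true"/>
   <!-- Hypothetical scope: a missing close tag is not reported as an error. -->
   <Scope open="If" close="End If" strict="false"/>
</SCOPES>
```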

LABELS

Code (xml) Select
<LABELS>
   <Label group="Subroutine" match="\&lt;(sub|function)[\s\[]+(\w+)[\s\]]*(\([^)]*\))" name="\2" descr="\1 \2 \3" image="8" scope="1">
      <Image if="\1" equal="sub" value="8"/>
   </Label>
   <Label group="Class" match="\&lt;(public|private)?\s+class\s+(\w+)\s*" name="\2" descr="Class \2" image="4" scope="1">
      <SubImage if="\1" equal="public" value="1"/>
   </Label>
</LABELS>


Labels give you a way of quick navigation inside the document with the Navigation Bar (also called the Function List). A label can be a method or function definition or an include definition. Generally, it can be any part of the code you want to refer to.
Labels are described with the help of regular expressions.
HippoEDIT uses the Boost regular expression engine, which uses Perl regular expression syntax.
For testing label definitions I use the RegexLib service.
Attributes:

  • group (any string) - group of the label. Currently not used, but will be used later for displaying labels in the Function List window grouped by label name.
  • match (any valid regular expression), obligatory - regular expression describing the label. The expression can be multi-line and greedy. You can use sub matches (grouping) for later reference.
  • name (any string) - any string describing the found label. name is displayed in the left filtered list in the Navigation Bar. You can refer to the results of the match using regular expression replace tags: \0 - complete match, \1 - first group, \2 - second group and so on.
  • descr (any string) - wider description of the found label. descr is displayed in the right (description) field in the Navigation Bar. You can refer to the results of the match using regular expression replace tags: \0 - complete match, \1 - first group, \2 - second group and so on.
  • descr_match (any valid regular expression), optional - additional regular expression for better resolving of the description. Applied to the result of match. If it exists, the results of descr_match are used for the description back references.
  • image (enumeration from 5 to 20, none=default=0) - image associated with the label. The image is displayed both in the left list of the Navigation Bar and in the right description field.
  • sub_image (enumeration from 2 to 4, none=default=0) - sub image associated with the label (an image drawn on top of the main image). Usually used for visualizing label visibility (public, protected, private).
  • scope (0|1|2) - defines the relation of the label to a scope. 0 - not related to a scope, 1 - label includes relevant scope start, 2 - scope includes relevant scope end. If the label is related to a scope (1|2), HippoEDIT tries to find the appropriate scope and associate the label with it. You then see the label description as the name of the scope while navigating in the Navigation Bar and in the Scroll Info Tip.
  • navigation (true|false, default = true) - if set, the label is not shown in the Navigation Bar but is used in Smart Navigate and when the Go button is pressed in the Navigation Bar. For example, to navigate to an include file.

In addition to the properties listed above, you can override some label properties depending on the regular expression match.

Skip condition:
Code (xml) Select
<Skip if="\2" equal="if"/>
If you want to skip some labels depending on the result of the match, you can use a Skip condition defined inside the Label node.
Attributes:

  • if (regular expression string with back references to match) - replace string with back references to match, whose result is compared to equal
  • equal - string to compare with the result of the regular expression replace from the if attribute
If the result of if is the same as equal, the label is skipped.
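
As a sketch of the Skip condition in context, the hypothetical label below matches "word word (" as a function header. Text such as else if ( also fits that pattern, with if captured as the second group, so the Skip condition filters it out. The group name and the pattern are invented for illustration.

```xml
<!-- Hypothetical label: \1 is the return type, \2 the function name. -->
<Label group="Function" match="\&lt;(\w+)\s+(\w+)\s*\(" name="\2" descr="\1 \2" scope="0">
   <!-- "else if (" also matches the pattern; skip it. -->
   <Skip if="\2" equal="if"/>
</Label>
```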

Image/SubImage condition:
Code (xml) Select
<Image if="\1" equal="sub" value="8"/>
<SubImage if="\1" equal="private" value="3"/>

If you want to adapt the image or sub image based on the result of the label match, you can use an Image/SubImage condition defined inside the Label node.
Attributes:

  • if (regular expression string with back references to match) - replace string with back references to match, whose result is compared to equal
  • equal - string to compare with the result of the regular expression replace from the if attribute
  • value - enumeration value of the image or sub image
If the result of if is the same as equal, the label gets the image or sub image defined in value.

To restrict a label to some style, you can also use a Containers node inside the Label node.
Code (xml) Select
<Containers open="preprocessor"/>

CONTAINERS - how to use

The logic is as follows:
1) You have styles. Styles are described by Blocks, Keywords and Words conditions and are applied to some interval of the text (text block).
2) Styles can be referred to by style id. If id is not defined, HE uses the name attribute.
3) When you define a container (or containers) for a Scope or another schema object, you allow this Scope, Style, Label etc. to be recognized only in the style specified by the container id.

For example, this means:
Code (xml) Select
<Scope open="{" close="}">
   <Containers open="test_container_name"/>
</Scope>
A Scope with the open tag "{" is allowed only in the style "test_container_name". If it is found in some other style, it is skipped. If you do not specify containers, HE assumes that the object (scope, style or label) is allowed only in the normal style (the style with id = normal, defined in def_spec.xml).

For objects such as Scope and Style you can define containers both for the open part and for the close part (for scopes also for the middle part). For example, a CSS block in HTML can be started in the normal style of HTML code and can be closed inside the CSS normal or comment style.
Code (xml) Select
<Containers>
   <Close id="css:normal"/>
   <Close id="css:comment"/>
</Containers>

If you do not define an open container, it defaults to the normal style of the current language.

If you want to refer to a style from another syntax schema, you can do this as well, by giving the syntax id before the style id, separated with ":":
Code (xml) Select
<Style id="style" name="CSS" include="css:normal" ...
This includes the CSS block in the HTML syntax schema.

HippoEDIT team
http://www.hippoedit.com/

alex

#1
IMAGES


[Image: the default image list, with positions numbered 2 to 20]

It is possible to customize the images for styles and the margin for every syntax schema or family of schemes (based on schema inheritance).
For this you can use the properties StyleBitmap and MarginBitmap. Both are inheritable.
Code (xml) Select
<SPECIFICATION>
    <StyleBitmap>images\vs_styles.bmp</StyleBitmap>
    <MarginBitmap>images\vs_margin.bmp</MarginBitmap>
</SPECIFICATION>

Each property is a path to an image list (both relative and absolute paths are supported) in 32-bit BMP format (with alpha channel).
An image list is a horizontal bitmap, 16 px high, consisting of smaller images 16 px wide each.
Images 2, 3 and 4 are normally used as sub images. Image 1 is not shown but is used as a switch between inherited and non-inherited members in the completion list.
If a property is empty or points to a missing file, the defaults are used. Images referenced in styles etc. are taken from the same positions in the image list.
Alternative image lists can also be included in syntax bundles.
As an example (comes with the default installation) you can check css_ms_spec.xml and c++_ms_spec.xml; the images are in the data\syntax\images sub folder.

I think a good new set of completion images would also be worth a free license ;)

alex

has_name
this flag was used, before labels were added, to extract the name of the scope from the text. If has_name is set to true, HE takes the next word after the open tag as the name of the scope (for example, in function foo(), if function is the open scope tag, foo is selected as the name).

separator
if this flag is set and you have Scope Separators enabled for the document/syntax, a horizontal line is drawn in the source code after a scope with this flag.

strict="false"
this flag tells HE not to take a missing close tag for this scope seriously. So you will not get an error displayed for the open tag; it is also used for more correct resolving of outlining constructions.

alex

No specific order of XML tags is necessary.
But for keywords it is better to keep alphabetical order: this way internal loading is faster.

All tags and attributes are CASE SENSITIVE.

All non-English text should be correctly encoded in UTF-8.

You can use XML comments, comment out some parts etc.

Because it is XML, some symbols should be converted to entities (even inside attributes):
<    =>     &lt;
>    =>     &gt;
&    =>     &amp;
"     =>     &quot;

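For instance, a block whose open and close tags contain angle brackets has to be written with entities inside the attribute values. The block below is a hypothetical illustration of that escaping, not a definition taken from a shipped schema.

```xml
<!-- Hypothetical block: the attribute values decode to the comment markers of HTML. -->
<Blocks>
   <Block open="&lt;!--" close="--&gt;"/>
</Blocks>
```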

If you want to use line breaks or tabs somewhere (they are not allowed everywhere), you can represent them as follows:
Any kind of line break (\r\n, \n\r, \r, \n)   =>   \n   (all line breaks end up DOS style, \r\n)
Tab character (0x09)                          =>   \t
Literal \n combination                        =>   \\n
Literal \t combination                        =>   \\t

alex

#4
STYLES

This node describes the styles that are used both for colorizing the text and for some UI elements, such as the editor indicator margin. Inheritable from the parent.

Code (xml) Select
<Style>
This node describes specific style.
Attributes (everything is case sensitive):

  • id (any string, preferably lower case, without spaces or the symbol : ) – unique id of the style, used to reference it from other areas of the schema. If not provided, the name attribute is used as id.
  • name (any string) – human-readable name of the style, displayed in the Options dialog etc. Obligatory if id is empty.
  • abstract (true|false) – if set, the style is not visible in user settings. Can be used, for example, if you only want to define a visibility range for some other style and do not want to syntax highlight it.
  • dbkclr (true|false) – technical attribute which informs HippoEDIT (HE) that background color selection should be disabled.
  • dstyle (true|false) – technical attribute which informs HE that style changing (bold, italic, underline) should be disabled.
  • bold/italic/underline (0|1|2) – style attributes of the text. 0 - false, 1 - true, 2 - undefined, which means inherit from the underlying style (not the parent syntax, but the style in the document).
  • hotspot (0|1|2) – HE shows a hand cursor when hovering over this style with Ctrl pressed, and tries to navigate to it if clicked. 0 - false, 1 - true, 2 - undefined (inherited).
  • overview (0|1|2|3) – whether to show this style in the overview bar. 0 - false, 1 - true, 2 - undefined, 3 - disable the check box in the UI.
  • extend = false (true|false) – technical attribute which tells HE that the parent style should not be overridden but only extended by some properties. Default is false. If extend = true, a style with the same id should exist in some parent syntax. In the UI you will see that the style still belongs to the parent syntax. If you define a style with the same name as one already defined in some parent and do not use extend=true, the UI shows the current syntax as the owner of the style, and all changes are applied to it, not to the parent.
  • exclude = false (true|false) – technical attribute which tells HE that the parent style should not be inherited or used in the current and all child schemes. Default = false. Can be used if you do not want to inherit some style from the parent syntax.
  • override = false (true|false) – technical attribute, mostly the same as extend, but it additionally resets the inherited keywords, blocks and words. Default = false. Can be used if you want to completely overwrite the definitions for a style from the parent schema.
  • include (any id of an existing style, with or without syntax prefix) – technical attribute which tells HE that the style should embed another style, either from this syntax (rare case) or from another existing syntax. Can be used if you want to embed another existing syntax in your syntax, for example JavaScript or CSS in HTML, or CDATA in XML.
  • inherit (any id of an existing style, with or without syntax prefix) – technical attribute which tells HE to copy information from another style to this one. Keywords, blocks and settings are copied, but not containers. This is better than duplicating keyword sets. An example can be found in css_spec.xml (inherit="html:elements").
  • text = 0 (0|1|2) – technical attribute which tells HE that the style should be treated as "not important" text. Default = 0. Used for string or comment styles, for example. HE uses this information to skip the collection of statistics (used for completion) for such styles and for some other purposes.
  • clr/bkclr (RGB|RGBA|Palette color|System color) – text (clr) or background (bkclr) color of the style. You can use an explicit value in RGB or RGBA (with alpha) format, like #FFAAFF (RGB, no transparency) or #FFFFFFFF (RGBA; the last FF is the transparency value: 00 - opaque, FF - 100% transparent). Color transparency is used when you have one style on top of another, for example the current line style on top of search results on top of .... You can also use a system color. It is specified like $0500 (Windows text color, opaque), where the first 05 is the system color value (corresponding to the Windows system color constants) and the second 00 is the transparency. Because no list of system colors has been published yet, it is better to use the configuration dialog for selecting them. The most preferable way is to use palette colors. Then the value of clr or bkclr should correspond to a named color from the current color scheme (found in HE data\colors, for example color_default.xml). With palette colors you can be sure that you will not get black on black if somebody selects a color scheme with a black background for the document.
  • image (enumeration from 5 to 20, none=default=0) - image associated with the text style. The image can be displayed in the Font & Colors dialog and in the Completion List.
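
Putting a few of these attributes together, a style definition could look roughly like the sketch below. The id, name, keywords and the explicit RGB color are invented for illustration. As noted above, a palette color would usually be preferable to an explicit RGB value.

```xml
<!-- Hypothetical style: id, name, color and keywords are illustrative. -->
<Style id="type" name="Types" bold="1" italic="0" underline="0" clr="#0000FF">
   <Keywords>
      <Keyword text="int"/>
      <Keyword text="long"/>
   </Keywords>
</Style>
```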

STYLE  -> Containers

To control the scope of a style (inside which other styles it can be used), you can use Containers. If no containers are defined for the style (including in the parent syntax), HE assumes that the style can appear only inside the normal style (defined in the def syntax).
Code (xml) Select
<Containers>
    <Open id="normal" exclude="true"/>
     <Open id="string"/>
     ....

Here <Open ..> says where the style can start, and <Close ...> where the style can end. Normally it is sufficient to define only Open containers. If no Close containers are defined, HE assumes that the style can be closed in the same style where it starts (this is not the case for embedded syntaxes).
If you want to exclude some parent definition of a container, add a container with the id defined in the parent syntax to the list and give it the attribute exclude (<Open id="string" exclude="true"/>).
It is also possible to address a style from another existing syntax. For this, add the syntax id before the style id, separated by : , like this: <Close id="js:normal"/>.
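
A complete version of such a Containers node, placed in a style, might look like this sketch. The style id escape and its name are hypothetical, and it assumes the parent syntax defines a style with id string.

```xml
<!-- Hypothetical style that is allowed to start only inside strings. -->
<Style id="escape" name="Escape sequence">
   <Containers>
      <!-- Drop the "normal" open container if the parent definition has one. -->
      <Open id="normal" exclude="true"/>
      <!-- Allow the style to start inside the string style. -->
      <Open id="string"/>
   </Containers>
</Style>
```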


STYLE -> Blocks

You can describe a style with so-called blocks:
Code (xml) Select
<Blocks>
    <Block open="{" close="}"/>
</Blocks>

These are used for matching the continuous span of text between the open and close tags.
Attributes:

  • start_pos. Indicates from which position the style should start (position of the open tag).
  • first_pos. Alternative to start_pos, used when you want to specify the first non-white-space position in the line; start_pos is absolute and counts all symbols. Can be used, for example, for specifying a comment style that starts only if found before any non-white-space character (inno_spec.xml).
  • noneof. Indicates that the block stops if the next character is not found in the provided set (like the regular expression [0-9]* ). This attribute can only be used together with a close tag, which you need to define as a node: <Close noneof="0-9"/>.
  • anyof. Indicates that the block stops if the next character is found in the provided set (like the regular expression [^a-z]* ). This attribute can only be used together with a close tag, which you need to define as a node: <Close anyof="a-z"/>.
  • text. Indicates that the block should be treated as a sequence (not as an open/close pair). In this case only the character sequence provided in the text attribute is highlighted. This can be used as a workaround if you want to define a "keyword" which contains characters not mentioned in the Words node. The attribute cannot be combined with open or close tags.
  • If no close tag is provided (neither as a close attribute nor as a Close node of the block), then the block is also treated as a sequence (see the text attribute). The open tag then has the same meaning as text. It is better to use the text attribute, because a missing close tag when open is provided is generally an error in the definition.

The open tag is obligatory and fixed (you cannot use any dynamic condition here so far). The open tag should be unique within the complete schema; otherwise the first definition wins.
There are several predefined attribute values:

  • If the close tag contains the symbol "\n", the block ends at the end of the line.
  • If the close tag is empty (close=""), the block is closed by the first delimiter (a space or one defined in the schema).
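
The variants above can be sketched as follows. The three blocks are hypothetical and are shown in one Blocks node only for brevity. In a real schema they would normally belong to different styles.

```xml
<Blocks>
   <!-- Line comment: the block ends at the end of the line. -->
   <Block open="//" close="\n"/>
   <!-- Block that stops at the first character that is not a digit (Close given as a node). -->
   <Block open="#">
      <Close noneof="0-9"/>
   </Block>
   <!-- Fixed character sequence, highlighted on its own. -->
   <Block text="::"/>
</Blocks>
```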

STYLE  -> Keywords

Or Keywords:
Code (xml) Select
<Keywords>
   <Keyword text="aaa"/>
   <Keyword text="aad"/> ...

These are used for matching keywords. A keyword should contain only characters defined in the node <Words>0-9A-Za-z_</Words> of the <SPECIFICATION> node, or spaces for a multi-word keyword.
Attributes:

  • text - keyword text. Case sensitivity depends on the corresponding setting in SPECIFICATION.
  • lead_with - the keyword text is preceded by this character sequence (case sensitive).
  • trail_with - the keyword text is followed by this character sequence (case sensitive).
  • descr - allows you to specify details for the keyword, displayed on hovering or on request, when QuickInfo (Ctrl+Shift+Space) is called on the keyword.
  • pattern - specifies how the keyword should be inserted by code completion. You can also use template tags inside. The %CurrentWord% tag in this context means the keyword text.
All attributes can also be placed on the aggregation level (the Keywords node). This is useful if you have the same definition for all or most keywords and do not want to duplicate entries for every keyword. In this case the properties on the aggregation level are treated as defaults, and you can then redefine them for specific keywords.
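
As a sketch of aggregation-level defaults, the Keywords node below sets descr and trail_with once and overrides descr for a single keyword. The keyword names and descriptions are invented for illustration.

```xml
<!-- Defaults on the Keywords node apply to every keyword unless redefined. -->
<Keywords descr="Built-in function" trail_with="(">
   <Keyword text="abs"/>
   <!-- This keyword redefines the default description. -->
   <Keyword text="max" descr="Returns the larger of two values"/>
   <Keyword text="min"/>
</Keywords>
```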

STYLE  -> Words

Or Words:
A Word definition catches any word (a continuous group of symbols from the Words node in SPECIFICATION) which meets the defined conditions.
Attributes:

  • lead_with - the word is preceded by this character sequence (case sensitive).
  • trail_with - the word is followed by this character sequence (case sensitive).
You can have more than one Word definition in the Words aggregation inside a style. Word definitions are inheritable and work as an OR condition. To get an AND condition, use lead_with and trail_with in the same Word definition.
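
The OR and AND behaviour can be sketched like this. The Word element name follows the wording above, and the lead and trail characters are hypothetical examples.

```xml
<Words>
   <!-- OR: any word that directly follows "$" is matched by this definition. -->
   <Word lead_with="$"/>
   <!-- AND: this definition matches only words enclosed in "%" on both sides. -->
   <Word lead_with="%" trail_with="%"/>
</Words>
```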