Topic: File encoding DOS OEM 850 852 and Windows 1250 1252 (Read 1955 times)

Stefan

  • Administrator
  • Hero Member
  • Posts: 774
  • Karma: +6/-0
File encoding DOS OEM 850 852 and Windows 1250 1252
« on: June 05, 2009, 05:00:42 pm »


I am puzzled.

I just had a problem with encodings:

     I have a DOS batch file with umlauts, written in code page 1250.
     I open it in HE and it shows code page 850.
     I add a space and press Ctrl+S (to save it in the "DOS" code page).
     But the file itself is still in 1250 (or at least, when I run the .cmd file, the umlauts still come out as Windows code page characters).


I have this file content:
ECHO üöäÜÖÄ??? written as 1252

Then I switch to 852 and add
ECHO converted to dos 852
PAUSE
and save it.



But I still have this file content:
üöäÜÖÄ??? written as 1252
converted to dos 852

As far as I can see on the status bar, HE converts to Unicode instead of to ANSI characters?

ECHO Ř÷ń▄Í─??? written as 1252

Status bar codes:
Ř 0344
÷ 0247
ń 0324
▄ 9604
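Those numbers are the decimal Unicode code points of the characters HE is displaying, not code page values. A minimal Python sketch, assuming the umlauts in the file are stored as Windows 1252 bytes that are being decoded with DOS 852, reproduces both the garbled line and the status bar codes:

```python
# Umlauts as stored in the file (Windows 1252 byte values).
raw = "üöäÜÖÄ".encode("cp1252")

# Decoding those bytes as DOS 852 gives the garbled characters HE shows.
shown = raw.decode("cp852")
print(shown)  # Ř÷ń▄Í─

# The status bar reports each character's Unicode code point in decimal.
for ch in shown[:4]:
    print(ch, f"{ord(ch):04d}")
```

The same bytes show different characters depending on the code page used to read them; the status bar number belongs to the character, not to the byte.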

The right numbers would be (DOS 850 = Windows 1252):
ä 132 = 228
Ä 142 = 196
ö 148 = 246
Ö 153 = 214
ü 129 = 252
Ü 154 = 220
ß 225 = 223
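The pairs above are the byte values of ä, Ä, ö, Ö, ü, Ü and ß in DOS 850 (left) and Windows 1252 (right). Python's standard codecs give the same numbers; this is only a quick cross-check, not anything HE does internally:

```python
# German umlauts and ß, in the same order as the list above.
chars = "äÄöÖüÜß"

dos = list(chars.encode("cp850"))   # byte values in DOS code page 850
win = list(chars.encode("cp1252"))  # byte values in Windows code page 1252

for ch, d, w in zip(chars, dos, win):
    print(f"{ch}: {d}={w}")
```

These seven letters sit at the same positions in DOS 852 as in 850, so the left-hand numbers also apply to 852.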


-----------

Question: shouldn't HE encode in 850 if that is the selected code page?
Stefan, HippoEDIT beta tester  (HippoEDIT News On Twitter: http://twitter.com/hippoedit/)

alex

  • Developer
  • Global Moderator
  • Hero Member
  • Posts: 1712
  • Karma: +29/-2
Re: File encoding DOS OEM 850 852 and Windows 1250 1252
« Reply #1 on: June 06, 2009, 02:19:16 pm »
Brr... OK, once more :)

HE always works with Unicode internally, so you can type whatever you like and then save it to a file using the selected encoding.
The encoding of a document can be selected automatically (then it is not stored between sessions; auto-detection is always used) or manually (from the View menu, from the status bar menu, or by choosing a specific encoding when opening the file with the File Open dialog). If the encoding is selected manually, it is saved between sessions and not auto-detected next time.
The encoding is used only when the document is opened or saved. Internally you are always working with Unicode.
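A minimal Python sketch of that model, where the code page is applied only when a file is opened (decode) and saved (encode). The `Document` class and its methods are hypothetical illustrations, not HE's actual implementation:

```python
class Document:
    """Hypothetical editor buffer: the file is bytes, the buffer is Unicode text."""

    def __init__(self, raw: bytes, encoding: str) -> None:
        self.encoding = encoding
        # Opening: the selected (or detected) code page is used once, to decode.
        self.text = raw.decode(encoding)

    def type_text(self, s: str) -> None:
        # Editing never touches the code page; it only changes the Unicode string.
        self.text += s

    def save(self) -> bytes:
        # Saving: the code page is used a second time, to encode.
        # Characters missing from the code page raise UnicodeEncodeError.
        return self.text.encode(self.encoding)


doc = Document("ECHO üöä\r\n".encode("cp850"), "cp850")
doc.type_text("PAUSE\r\n")
print(doc.save())
```

Between open and save, nothing in the buffer depends on the code page, which is why the status bar can only report Unicode values for the characters.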

This is how it should work.

I have not understood the example completely, but a possible reason is that you changed the encoding after modifying the text! The initial conversion was done with the wrong code page, so the characters were converted wrongly; then you added something and changed the encoding. In this case HE can no longer use the original file contents for the conversion on save, so it tries to convert the existing Unicode text (with the already wrongly converted symbols) to the new code page through a double conversion: UTF16 -> 1252, then 1252 -> 850, then 850 -> UTF16...
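A Python sketch of the wrong-initial-conversion case, assuming the umlauts were written with Windows 1252 (1250 uses the same byte values for these letters) and decoded with DOS 852 on open. Saving "in the DOS code page" then writes the wrong characters back as exactly the original bytes, which matches the file still looking like 1250/1252:

```python
# Bytes on disk: umlauts written with Windows 1252.
original = "ECHO üöäÜÖÄ\r\n".encode("cp1252")

# Wrong code page on open: the bytes are decoded as DOS 852.
text = original.decode("cp852")   # "ECHO Ř÷ń▄Í─\r\n"

# The user types a new line; the buffer still holds the mis-decoded Unicode.
text += "ECHO converted to dos 852\r\n"

# Saving in 852 turns Ř, ÷, ń ... back into the 1252 byte values.
saved = text.encode("cp852")
print(saved.startswith(original))  # True: the umlaut bytes are unchanged

# Re-decoding the untouched original bytes with the right code page is what fixes it.
print(original.decode("cp1252"))
```

Reloading the original bytes with the correct code page is only possible while those bytes are still available, that is, before the text has been modified.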

Stefan, can you please attach the original file here and repeat the instructions once more :/

Meanwhile, I will try to reproduce the "bug" with the existing example, if it is a bug.

Best regards,
Alex


Stefan
Re: File encoding DOS OEM 850 852 and Windows 1250 1252
« Reply #2 on: June 08, 2009, 03:22:49 pm »
> HE works internally with Unicode always.

But from the user's point of view, I am not interested in how HE works internally.
If I switch to 852, I want to SEE my characters with their numbers from the 852 code page, not with their Unicode code points:
     1252 (dec hex)   852 (dec hex)   Unicode shown by HE (dec)
ü    0252  FC         129  81         0344
Ü    0220  DC         154  9A         9604

And if I switch to 852, I want the characters to be encoded with the corresponding numbers of that code page.
I have attached a PDF (password protected) with some tests. EDIT: it's too big for an attachment; I will have to send it this evening.

I am always confused by these code pages :D and now, with HE's Unicode, it's even a bit worse :D;D
Maybe I am on a totally wrong track?
I don’t understand this.


I don't want to see Unicode characters and numbers
if I have ANSI 1252 or OEM 852 encoding enabled.

--
>that you have done conversion after modifying of text!

That should not be an issue for users.
Whenever I switch the code page, HE should convert the file and use that code page from then on.

So if I switch the code page and then type some text,
it should be the same as if I modify the text and then switch the code page.
And if I switch the code page again at the end and then save the file, HE should save it in that last code page.


Sorry if I didn't find the right words; I don't mean it that harshly.
« Last Edit: June 08, 2009, 03:50:07 pm by Stefan »

alex
Re: File encoding DOS OEM 850 852 and Windows 1250 1252
« Reply #3 on: June 09, 2009, 11:18:33 am »
Hello Stefan,

I think you are wrong here, and the logic you are requesting is not standard.

1) Check any Unicode editor, for example EmEditor or even Notepad. They work exactly like HE.
2) Conversion between code pages is not lossless. If the destination code page does not contain a corresponding symbol, you cannot convert it back. Only conversions between Unicode encodings work without destroying the text.
3) What do you suggest for typing a symbol that does not exist in the current code page?
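Points 2 and 3 can be checked with Python's standard codecs. In this sketch, Ř is just an example of a character that exists in DOS 852 but in neither Windows 1252 nor DOS 850; `errors="replace"` substitutes `?`, similar to how a non-Unicode editor shows missing symbols:

```python
text = "ECHO Ř ü ß"

# Ř exists in DOS 852 but not in Windows 1252 or DOS 850.
print(text.encode("cp852"))                      # every character fits
print(text.encode("cp1252", errors="replace"))   # Ř becomes b"?"

# Converting to 1252 and back cannot restore Ř: the information is gone.
round_trip = text.encode("cp1252", errors="replace").decode("cp1252")
print(round_trip)  # ECHO ? ü ß

# A Unicode encoding such as UTF-8 round-trips without loss.
assert text.encode("utf-8").decode("utf-8") == text
```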

Quote
So if I switch CP and then type some text
it should be the same as if I modify text and switch code page then.
No. Because of point 2, if you do not have the original and convert between non-Unicode code pages, you can never get the same result. HE tries to hide this by reloading the text from the file (which is the original), but this does not work once the text has been modified.

Maybe, as an alternative, I can think about showing the character code in the status bar in the destination code page.

Or you can use the non-Unicode version of HE. Then you would get the behavior you want (symbols missing from the current code page would be displayed as ?; this is how TextPad does it).
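A sketch of that alternative in Python: the hypothetical helper `status_bar_code` reports a character's byte value in the document's code page next to its Unicode code point, which yields the 852 and 1252 numbers Stefan listed. It assumes a single-byte code page and is not an existing HE feature:

```python
def status_bar_code(ch: str, codepage: str) -> str:
    """Hypothetical helper: describe a character's code in the document's code page."""
    try:
        value = ch.encode(codepage)[0]  # single-byte code pages: one byte per character
    except UnicodeEncodeError:
        return f"{ch}: not in {codepage}"
    return f"{ch}: {value} (0x{value:02X}) in {codepage}, U+{ord(ch):04X}"


for ch in "üÜŘ":
    print(status_bar_code(ch, "cp852"))
    print(status_bar_code(ch, "cp1252"))
```

Showing both values would let the user see the code page number without the editor giving up its Unicode buffer.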

Best regards,
Alex.

 
