Super Contributor.

Special Characters in text (non-unicode)

Using Extend 9.1.1 (or 9.2) on Windows.  We want to import a text document which contains non-English characters such as ç, é, ã.

We can type those characters in Extend, and they display fine, but when we import from a file (created with Excel or even with a text editor), the characters end up being corrupted.

I know Extend doesn't support Unicode, and I'm sure the corruption has something to do with that - does anyone know of a way to import such a text file into Extend and have the characters retained?

Thanks

Tony

Accepted Solutions
Super Contributor.

RE: Special Characters in text (non-unicode)

Thanks for the suggestions. It turns out that Extend uses extended ASCII codes, but Windows stores characters in Windows-1252 format. Even in Notepad, you can use an ASCII code (ALT+code) to enter a character, but that code is converted to Windows-1252 when the file is saved. For example, a ç character (ALT 135, hex 87) is displayed correctly in Notepad, but it is written to the file as hex E7. Converting the Windows-1252 codes to extended ASCII during the import solves the problem.
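
The conversion step described above could be sketched as a small preprocessing script run on the file before it is imported. This is only an illustration in Python, not something taken from Extend or from the original poster's code: the function name `cp1252_to_oem` is made up, and code page 850 is an assumption (it maps ç to hex 87 and also contains ã, but the code page that matches a given system may differ).

```python
def cp1252_to_oem(data: bytes, oem_codec: str = "cp850") -> bytes:
    """Re-encode Windows-1252 bytes as bytes in an OEM (extended ASCII) code page.

    oem_codec="cp850" is an assumption; substitute the code page that
    matches your system.
    """
    # Decode the bytes as Windows saved them (e.g. hex E7 for ç).
    # The few byte values undefined in Windows-1252 become U+FFFD here.
    text = data.decode("cp1252", errors="replace")
    # Encode to the OEM code page (e.g. hex 87 for ç).
    # Characters the target code page lacks are written as "?".
    return text.encode(oem_codec, errors="replace")


if __name__ == "__main__":
    print(cp1252_to_oem(b"gar\xe7on"))  # prints b'gar\x87on'
```

Plain ASCII bytes pass through unchanged, so only the accented characters are affected. Using `errors="replace"` keeps the import from failing on an unmappable character, at the cost of silently substituting a "?".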

Replies

Micro Focus Expert

RE: Special Characters in text (non-unicode)

It would be good to know the underlying data type these characters are stored in (in Excel) and what COBOL data type you are using. Have you exported them to XML and then read the XML? When you say import, are you using string or reference modification?

Super Contributor.

RE: Special Characters in text (non-unicode)

Everything we're working with is string data. Forgetting about Excel for a minute, I'm confused even by what Windows does with these special characters just using Notepad. For example, the ç character is obtained with the key sequence ALT 135, which works in both Extend and Notepad. However, if I take a text string entered in Extend, export it to a text file, and check it with a hex editor, the character is stored as hex 87 (decimal 135). In Notepad, however, the character displays differently. If I use the key sequence ALT 135 in Notepad, I get the ç character, but when I save the document and check it with a hex editor, the character is actually saved as hex E7 (decimal 231). There is clearly a conversion taking place somewhere (seemingly at the Windows level).
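
The two byte values seen in the hex editor can be reproduced outside Extend. The following Python sketch is not part of the original discussion: the helper name `byte_for` is hypothetical, and code page 850 stands in for whichever extended ASCII code page is actually in use. It encodes the same character under both code pages.

```python
def byte_for(char: str, codec: str) -> str:
    """Return the single-byte encoding of char in codec as two hex digits."""
    return char.encode(codec).hex().upper()


if __name__ == "__main__":
    print(byte_for("ç", "cp850"))   # extended ASCII (assumed code page 850)
    print(byte_for("ç", "cp1252"))  # Windows-1252, as saved by Notepad
```

Under these assumptions the first call gives 87 and the second gives E7, matching the values reported above.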

Micro Focus Expert

RE: Special Characters in text (non-unicode)

It was for this very reason that Unicode was developed.

I don't know the various details, but this has to do with code pages and OEM vs ANSI encoding. I suspect if you do enough Googling, you will find the answer.
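
As a rough illustration of the OEM vs ANSI mismatch, the same byte can be decoded under each code page. This is a Python sketch rather than anything from Extend: `read_as` is a hypothetical helper, and code page 850 is assumed as the OEM code page. A byte saved by a Windows editor only reads back as ç when it is interpreted as Windows-1252.

```python
def read_as(raw: bytes, codec: str) -> str:
    """Interpret raw bytes as text in the given code page."""
    return raw.decode(codec)


if __name__ == "__main__":
    saved_by_windows = b"\xe7"  # byte written for ç in Windows-1252
    print(read_as(saved_by_windows, "cp1252") == "ç")  # True: ANSI reading
    print(read_as(saved_by_windows, "cp850") == "ç")   # False: OEM reading
    print(read_as(b"\x87", "cp850") == "ç")            # True: OEM byte for ç
```

A program that expects OEM bytes but is handed an ANSI file will therefore show some other character in place of ç, which is consistent with the corruption described in the question.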
