Absent Member.

XML utf-8 encoding

[Migrated content. Thread originally posted on 16 May 2011]

Hi, I write an XML file using the C$XML API, with the encoding set to "UTF-8":
call "c$xml" using CXML-NEW-PARSER
            giving wh-descr-parser.
if we-runtime-major-version >= 8
   call "C$XML" using CXML-SET-ENCODING
                      wh-descr-parser
                      "UTF-8"   
end-if.

When I write an accented character such as "à" into it, the resulting XML is malformed: parsing it with "C$XML" CXML-PARSE-FILE returns CXML-PARSE-ERROR. Browsers (Chrome or Firefox) also refuse to open the XML and report that it is not well-formed.

I write the accented character with the following call:
call "c$xml" using CXML-ADD-CHILD
                   wh-gsdx-doc
                   w78-gsdx-file
                   w-str
            giving wh-gsdx-file.

where w-str is a pic x(250) field holding the data string, wh-gsdx-doc is the handle, and w78-gsdx-file is a level 78 constant.

Am I missing something when writing?
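One plausible cause, offered here as an assumption rather than a confirmed diagnosis: if w-str holds "à" as the single ISO-8859-1/Windows-1252 byte 0xE0 while the header declares UTF-8, the file's bytes are not valid UTF-8, and a strict parser rejects the document. The Python sketch below (an illustration, not ACUCOBOL code; the names `build` and `parses` are hypothetical) shows that the declared encoding must match the bytes actually written:

```python
import xml.etree.ElementTree as ET

TEXT = "tracciabilità"

def build(declared: str, actual: str) -> bytes:
    """Serialize a tiny document whose header says `declared` but whose bytes use `actual`."""
    doc = f'<?xml version="1.0" encoding="{declared}"?><r>{TEXT}</r>'
    return doc.encode(actual)

def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the bytes as well-formed."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Header says UTF-8, but "à" is the single Latin-1 byte 0xE0: rejected.
print(parses(build("UTF-8", "latin-1")))       # False
# Header and bytes agree (in UTF-8, "à" is the two bytes 0xC3 0xA0): accepted.
print(parses(build("UTF-8", "utf-8")))         # True
# Header and bytes agree (ISO-8859-1): also accepted.
print(parses(build("ISO-8859-1", "latin-1")))  # True
```

In other words, either the data must really be converted to UTF-8 before it is written, or the header must name the single-byte encoding the data actually uses.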

Absent Member.

RE: XML utf-8 encoding

I observed that with runtime 9 the XML is created correctly. However, when I parse it, a "+" sign is added before accented characters; for example, "tracciabilità" becomes "tracciabilit+à". How can I solve this?

Is there also a way to solve the encoding issue in runtimes earlier than version 9?
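A common source of an extra character in front of accented letters is UTF-8 data being read back, or displayed, with a single-byte code page. Each accented letter is two bytes in UTF-8, so it turns into two characters. Whether this accounts for the exact "+" reported here is an assumption: the glyphs shown depend on the code page and font. The Python sketch below (an illustration, not ACUCOBOL code; `misread` is a hypothetical helper) shows the effect for two single-byte code pages:

```python
TEXT = "tracciabilità"

def misread(text: str, codepage: str) -> str:
    """Encode text as UTF-8, then decode those bytes with a single-byte code page."""
    return text.encode("utf-8").decode(codepage)

print(TEXT.encode("utf-8")[-2:])  # b'\xc3\xa0': "à" takes two bytes in UTF-8

for codepage in ("latin-1", "cp850"):
    garbled = misread(TEXT, codepage)
    # Each byte becomes its own character, so one extra character appears.
    print(codepage, repr(garbled), len(garbled) - len(TEXT))
```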

Micro Focus Expert

RE: XML utf-8 encoding

Appendix H: Configuration Variables > H.2 Configuration Variables >

AXML_ENCODING
This variable is designed for use with AcuXML. Use it when you want to specify a character encoding method for the XML files that ACUCOBOL-GT creates. By default, the XML output generated by ACUCOBOL-GT is mapped to the UTF-8 encoding system (compatible with the US-ASCII character set). If you want to use a different encoding system, for instance a European encoding system that includes the British pound character (£), change this variable to reflect the new system name. For example:

AXML_ENCODING ISO-8859-1
This variable causes encoding information to be added to the header of XML files created by ACUCOBOL-GT. With the configuration file entry shown above, the following header would be included:

<?xml version="1.0" encoding="ISO-8859-1"?>

This header causes the ISO-8859-1 Latin encoding system to be applied to the data file as desired.

AcuXML supports the following encoding systems:

•UTF-8, default [8-bit Unicode Transformation Format, backwards compatible with US-ASCII]
•US-ASCII
•UTF-16 [16-bit Unicode Transformation Format]
•ISO-8859-1 [Latin 1, European encoding]
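To make the differences between the listed encodings concrete, the Python sketch below (an illustration, not part of ACUCOBOL-GT; `encode_or_none` is a hypothetical helper) shows the bytes each encoding uses for the pound sign mentioned above. UTF-16 is shown big-endian without a byte-order mark to keep the output short.

```python
SAMPLE = "£"  # the British pound sign from the AXML_ENCODING description (U+00A3)

def encode_or_none(text: str, encoding: str):
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for name in ("utf-8", "us-ascii", "utf-16-be", "iso-8859-1"):
    print(name, encode_or_none(SAMPLE, name))
# utf-8 b'\xc2\xa3'
# us-ascii None
# utf-16-be b'\x00\xa3'
# iso-8859-1 b'\xa3'
```

US-ASCII cannot represent "£" at all, and the same character is one byte in ISO-8859-1 but two in UTF-8. This is why the encoding named in the header has to match the bytes actually written.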


Absent Member.

RE: XML utf-8 encoding

Thank you for the reply. I'm not using AcuXML, though; I'm using the c$xml function.

New Member.

RE: XML utf-8 encoding

Sorry for replying to an old message, but I encountered the same problem with extend 10.1.0 using C$XML. I was convinced that the statement:

CALL "C$XML" USING CXML-SET-ENCODING XML-PARSER-HANDLE(XMLF),
"UTF-8".

would solve this problem.

Characters such as é, Ä, ä and è are left out of my XML file, so the name André appears as Andr.

Micro Focus Expert

RE: XML utf-8 encoding

Since you are using 10.1, have you set COBOL_CHARACTER_SET UTF-8 in your runtime configuration file?

New Member.

RE: XML utf-8 encoding

I added COBOL_CHARACTER_SET UTF-8 to my configuration file. That garbled the data in my entry-field.

The entry-field value was changed, and my XML file still contained Andr.

I then edited the data in the entry-field back to André (with COBOL_CHARACTER_SET UTF-8 still in my configuration file) and generated a new XML file. It still contained Andr.

Here is a part of my XML file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- C:/BED/LIN/Export/WERKN.xml - generated by ACUCOBOL-GT v10.1.0 + ECN-4454 + ECN-4463 + ECN-4464 + ECN-4465 on 2017/03/28 -->
<InstallWare>
  <Werknemers>
    <Werknemer
        Nummer="004020"
        Zoekcode="BOURGONJ"
        Initialen="ANBOU">
      <Naam>Bourgonje</Naam>
      <Voorletters>A.J.</Voorletters>
      <Roepnaam>Andr</Roepnaam>
      <Adressen>

New Member.

RE: XML utf-8 encoding

I made a stupid mistake in my program. It turned out that I did not need either the "UTF-8" encoding or COBOL_CHARACTER_SET UTF-8; it works fine without them. Now André is André again in my XML file 😉