UTF-8 to ANSI conversion code is not working as expected: sample and information


Problem:

Your UTF-8 to ANSI conversion code is not working as expected: characters are displayed incorrectly in the output.
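A typical symptom is that each accented character appears as two unexpected characters. The short sketch below shows this effect when UTF-8 bytes are displayed as ANSI without any conversion. It is Python rather than COBOL and models only what happens at the byte level. It assumes Windows-1252 as the ANSI code page, although the actual ANSI code page depends on the system locale.

```python
# UTF-8 bytes as they might be stored in a data file
utf8_bytes = "café".encode("utf-8")  # é is stored as two bytes: 0xC3 0xA9

# Interpreting those bytes directly as ANSI (assumed Windows-1252),
# with no conversion, produces the garbled output described above
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ©
```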

Resolution:

Note: the exact approach can vary depending on the product, operating system and system locale in use.

In the sample below, there are two important parts to consider.
First, the intrinsic functions used, as in this statement from the sample:

move function display-of (function national-of(dl-data, 1208)) to conv-data

Second, the call to PC_WIN_OEM_TO_CHAR to set the code page, e.g.:

call "PC_WIN_OEM_TO_CHAR" using test02 test02 by value 30 30.

In some scenarios only the MOVE statement is needed. The call to "PC_WIN_OEM_TO_CHAR" is needed to get the characters to display correctly, but this may vary. For example, if you are only working with file data that is never displayed, this routine may not be needed.
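A separate step can matter for display because Windows uses two legacy code pages. The ANSI code page is used for GUI text and the OEM code page for console text, and the same byte can map to different characters in each. The Python sketch below illustrates that mismatch. It assumes Windows-1252 as the ANSI code page and code page 850 as the OEM code page, and both depend on the system locale. It does not reproduce what PC_WIN_OEM_TO_CHAR does internally, and `oem_to_ansi` is a hypothetical helper.

```python
ANSI = "cp1252"  # assumed ANSI code page (Western European)
OEM = "cp850"    # assumed OEM code page (Western European console)

ansi_byte = "é".encode(ANSI)  # b'\xe9'

# The same byte read with the OEM code page is a different character
shown_as_oem = ansi_byte.decode(OEM)
print(shown_as_oem)  # Ú

def oem_to_ansi(data: bytes) -> bytes:
    """Hypothetical helper: reinterpret OEM-encoded bytes as ANSI bytes."""
    return data.decode(OEM).encode(ANSI)

print(oem_to_ansi("é".encode(OEM)))  # b'\xe9' (é is 0x82 in cp850)
```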

Sample code:
      $set nsymbol(national)
      $set sourceformat"free"
       Identification Division.
       Program-Id. utf8toansi.
       Environment Division.
       Configuration Section.
       Input-Output Section.
       File-Control.
           Select file-download assign to "utf8datafile.txt"
               organization is line sequential
               file status is stat-key.
       Data Division.
       File Section.
       FD file-download.
       01 dload-rec.
          05 dl-data  pic x(4096).
       Working-Storage Section.
       01 stat-key     pic x(2)    value space.
       01 conv-data    pic x(4096) value space.
       01 test-feld    pic x(30)   value space.
       01 MFTech       pic x(8).

       Procedure Division.
           display space at 0101
           open input file-download
           read file-download next
           *> set the code page used for display (may not be needed, see note above)
           call "PC_WIN_OEM_TO_CHAR" using by value 1
           *> UTF-8 -> UTF-16 (NATIONAL-OF with CCSID 1208) -> ANSI (DISPLAY-OF)
           move function display-of (
                function national-of(dl-data, 1208))
             to conv-data
           *> every receiver is test-feld, so it ends up holding
           *> the fourth "#"-delimited field (or the last one found)
           unstring conv-data delimited by "#"
               into test-feld test-feld test-feld test-feld
           display dl-data(32:18) at 0801  *> Input  field
           display test-feld       at 1001  *> Output field
           accept test-feld
           close file-download
           stop run.
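The data flow in the sample can also be modelled in Python, so that expected results can be checked before the COBOL program is run. This is a sketch only. It assumes Windows-1252 as the ANSI code page, and `sample_flow` is a hypothetical name. It follows the PIC sizes in the sample and the UNSTRING behaviour: all four receivers are test-feld, so each piece overwrites the previous one.

```python
ANSI = "cp1252"  # assumed ANSI code page; depends on the system locale

def sample_flow(record: bytes) -> bytes:
    """Hypothetical model of the sample: UTF-8 record -> ANSI -> test-feld."""
    # dl-data is PIC X(4096): the record is padded with spaces
    dl_data = record.ljust(4096, b" ")
    # NATIONAL-OF(dl-data, 1208) then DISPLAY-OF(...): UTF-8 -> UTF-16 -> ANSI
    converted = dl_data.decode("utf-8").encode(ANSI)
    # MOVE to conv-data (PIC X(4096)): truncate or pad with spaces
    conv_data = converted[:4096].ljust(4096, b" ")
    # UNSTRING ... DELIMITED BY "#" INTO test-feld (four times):
    # only the last piece actually received remains
    pieces = conv_data.split(b"#")[:4]
    # test-feld is PIC X(30): truncate or pad with spaces
    return pieces[-1][:30].ljust(30, b" ")

print(sample_flow("a#b#c#café#e".encode("utf-8")))
```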

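For ANSI to UTF-8 conversion, the same intrinsic functions are used, but with different arguments and in a different order. In the list below, n-data is a national (UTF-16) data item, x-data is an alphanumeric data item, and 1208 is the CCSID for UTF-8.
If running with the default ANSI character set (that is, no call to PC_WIN_SET_CHARSET has been made), then: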

  a) DISPLAY-OF(n-data)         - converts UTF-16 to ANSI
  b) DISPLAY-OF(n-data, 1208)   - converts UTF-16 to UTF-8
  c) NATIONAL-OF(x-data)        - converts ANSI to UTF-16
  d) NATIONAL-OF(x-data, 1208)  - converts UTF-8 to UTF-16

To convert UTF-8 to ANSI, apply d) followed by a).
To convert ANSI to UTF-8, apply c) followed by b).
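Both conversion chains above can be expressed as a Python sketch that keeps the intermediate UTF-16 step used by national data. It assumes Windows-1252 as the ANSI code page, and the function names are hypothetical. It also uses little-endian UTF-16, an arbitrary choice that does not affect the result. Python's codecs go through Unicode internally, so the explicit UTF-16 step only mirrors the order of operations.

```python
ANSI = "cp1252"  # assumed ANSI code page; depends on the system locale

def utf8_to_ansi(data: bytes) -> bytes:
    # d) NATIONAL-OF(x-data, 1208): UTF-8 -> UTF-16
    utf16 = data.decode("utf-8").encode("utf-16-le")
    # a) DISPLAY-OF(n-data): UTF-16 -> ANSI
    return utf16.decode("utf-16-le").encode(ANSI)

def ansi_to_utf8(data: bytes) -> bytes:
    # c) NATIONAL-OF(x-data): ANSI -> UTF-16
    utf16 = data.decode(ANSI).encode("utf-16-le")
    # b) DISPLAY-OF(n-data, 1208): UTF-16 -> UTF-8
    return utf16.decode("utf-16-le").encode("utf-8")

original = "Grüße".encode("utf-8")
ansi = utf8_to_ansi(original)
print(ansi)                # b'Gr\xfc\xdfe'
print(ansi_to_utf8(ansi))  # the original UTF-8 bytes again
```

In this sketch, characters with no equivalent in the ANSI code page (for example most CJK text under Windows-1252) raise a UnicodeEncodeError. This article does not cover how the COBOL run-time handles them.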

