Unicode Strings

I am working with a DLL where all the strings used in the function calls are defined as null-terminated Unicode strings. I have never had to work with Unicode before. Unicode seems to define each character as 2 bytes, and my calls to the functions using standard PIC X fields don't seem to work properly. Any ideas on how I can make my PIC X fields Unicode so that my function calls work?

  • When I looked into Unicode a few years back, I believe a Unicode character was 3 bytes and sometimes 4 bytes (4 being the maximum). You will need to experiment with pic xxx or pic x(4) to hold a single Unicode character and see if the DLL deals with it appropriately. I take it you're getting these Unicode characters from a file or some other input, not from an Acu GUI screen.

  • No, actually the DLL defines the parameters as "Pointer to a null terminated unicode string." I want to pass in a regular PIC X(40) field containing "HELLO WORLD", and it does not work :-( because it expects Unicode. I have tried converting my string "HELLO WORLD" to Unicode using the Kernel32.dll "MultiByteToWideChar" API call and passing that in, but it still does not work. The DLL has a very simple function that accepts one parameter, "Pointer to a null terminated unicode string," and it's supposed to display a dialog with the string you passed. Unfortunately I get nothing.. :-( Maybe I'm making the call incorrectly.

  • The resulting string from the "MultiByteToWideChar" API call looks to be "HELLO WORLD" where each character is separated by NULLS.

  • The resulting string from the "MultiByteToWideChar" API call looks to be "HELLO WORLD" where each character is separated by NULLS.

    That's how a string of ASCII characters is encoded in UTF-16, which is one of the Unicode encodings.

    Unicode is not a character encoding. It's a collection of code points for a great many characters, plus rules for a number of ways of encoding them. There is no such thing as a "Unicode string", so you should go back to the developer of the DLL and request clarification.

    Windows NT originally used a Unicode encoding called UCS-2 LE, which uses two bytes per character. The Unicode code points for ASCII characters are the same as their ASCII values, and UCS-2 LE puts the less-significant byte first, so for ASCII characters the UCS-2 representation is the ASCII value followed by a zero (nul) byte. Note this only applies to characters with values 0-127; code points 128-255 are not actually part of the ASCII specification.
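
    As a rough illustration of that byte layout, here is a small Python sketch (Python is used only because it is easy to run anywhere; it is not the language of the original program). It encodes "HELLO WORLD" as little-endian 16-bit code units and appends the two-byte terminator that a "null-terminated" wide string would normally carry. The variable names are made up for the example.

```python
# Sketch: how an ASCII string looks in a little-endian 16-bit Unicode encoding.
text = "HELLO WORLD"

# For ASCII characters, UTF-16LE and UCS-2 LE give the same bytes:
# the ASCII value followed by a zero byte.
wide = text.encode("utf-16-le")

# A null-terminated wide string ends with one 16-bit zero, i.e. two zero bytes.
wide_z = wide + b"\x00\x00"

print(wide[:10])                           # b'H\x00E\x00L\x00L\x00O\x00'
print(len(text), len(wide), len(wide_z))   # 11 22 24
```

    Note that a buffer holding this string needs 24 bytes, not 11 or 12, which matters when sizing a PIC X field to receive it.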

    UCS-2 is now deprecated, and the last several versions of Windows have used a Unicode encoding called UTF-16 ("UCS Transformation Format, 16-bit"). It's very similar to UCS-2, except when you start getting into characters that weren't in the original 16-bit Unicode set. The details are rather complicated.
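
    To give a feel for where the two encodings part ways, the following Python sketch compares a character inside the original 16-bit range with one outside it. The two sample characters are arbitrary choices for the example; UTF-16 represents the second one as a pair of 16-bit units (a surrogate pair), which UCS-2 cannot do.

```python
# Sketch: UTF-16 uses one 16-bit unit for characters in the original
# 16-bit range and two units (a surrogate pair) for everything else.
bmp = "\u00e9"          # U+00E9, fits in 16 bits
astral = "\U0001F600"   # U+1F600, does not fit in 16 bits

bmp_bytes = bmp.encode("utf-16-le")
astral_bytes = astral.encode("utf-16-le")

bmp_units = len(bmp_bytes) // 2
astral_units = len(astral_bytes) // 2

print(bmp_units, bmp_bytes.hex())        # 1 e900
print(astral_units, astral_bytes.hex())  # 2 3dd800de
```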

    The other popular Unicode encoding is UTF-8, which is widely used - it's the default encoding for modern versions of Java and many other programming languages, it's often used on Windows in .NET programs, it's the most common encoding on the Web, and so on. UTF-8 is a multibyte encoding. ASCII characters take one byte; other characters take anywhere from 2 to 4 bytes (older versions of UTF-8 allowed up to 6 bytes). ASCII characters are the same in UTF-8, so a valid ASCII string is a valid UTF-8 string.
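
    The variable width of UTF-8 can be seen with a short Python sketch. The four sample characters are arbitrary picks, one for each possible length, and the last line checks the point about ASCII strings being unchanged.

```python
# Sketch: UTF-8 byte lengths for characters from different ranges.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, e-acute, euro sign, emoji

lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)   # [1, 2, 3, 4]

# An ASCII string has exactly the same bytes in ASCII and in UTF-8.
ascii_same = "HELLO WORLD".encode("ascii") == "HELLO WORLD".encode("utf-8")
print(ascii_same)   # True
```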

    If the DLL you're calling didn't like your nul-delimited ASCII string (which is valid UTF-8), and didn't like the output of MultiByteToWideChar (which should be valid UTF-16), then either there's a bug in your program, or the DLL is expecting some weird encoding that's not commonly used in Windows. In any event, I suggest you go back to whoever wrote the DLL, because their documentation is incorrect (since "Unicode string" doesn't mean anything).

    There's a vast amount of information about Unicode available online. The Wikipedia pages aren't bad.

    As for how you do any of the Unicode encodings in an ACUCOBOL program - I'm afraid I don't know. My experience is with the non-ACU Micro Focus COBOL implementations.

  • Thanks for the information, Michael. I did email the developer, and it turns out I was calling the DLL incorrectly. After calling MultiByteToWideChar using the current thread's code page (I assume UTF-8) and passing the resulting pointer to the DLL, I got a proper successful response from the query. The response is also Unicode, so I have to call WideCharToMultiByte to get it back into a readable format for my use. Anyway, after a lot of mucking around, it seems to be working properly now.
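
    For anyone following the same route, the round trip described above can be sketched portably in Python. This is not the Win32 API itself: the helper names `to_wide_z` and `from_wide_z` are hypothetical stand-ins for what MultiByteToWideChar and WideCharToMultiByte do, and the code page `cp1252` is only an assumption for the example, since the thread's actual code page is not stated.

```python
# Sketch of the narrow -> wide -> narrow round trip, using Python codecs
# in place of the Win32 conversion calls. Helper names are hypothetical.

def to_wide_z(narrow: bytes, codepage: str = "cp1252") -> bytes:
    """Convert code-page bytes to a null-terminated UTF-16LE buffer."""
    return narrow.decode(codepage).encode("utf-16-le") + b"\x00\x00"

def from_wide_z(wide: bytes, codepage: str = "cp1252") -> bytes:
    """Convert a null-terminated UTF-16LE buffer back to code-page bytes."""
    # Look for the terminator on 16-bit boundaries only; a lone zero byte
    # is just the high half of an ASCII character.
    for i in range(0, len(wide) - 1, 2):
        if wide[i:i + 2] == b"\x00\x00":
            wide = wide[:i]
            break
    return wide.decode("utf-16-le").encode(codepage)

request = to_wide_z(b"HELLO WORLD")    # what would be passed to the DLL
reply = from_wide_z(request)           # what would be read back for display
print(len(request), reply)             # 24 b'HELLO WORLD'
```

    The terminator search steps two bytes at a time on purpose: scanning for the first single zero byte would cut "HELLO WORLD" off after the "H".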