Mikalodeon
New Member.

Unicode Strings

I am working with a DLL where all the strings used in the function calls are defined as null-terminated Unicode strings. I have never had to work with Unicode before. Unicode seems to define each character as 2 bytes, and my calls to the functions using standard PIC X fields don't seem to work properly. Any ideas on how I can make my PIC X fields Unicode so that my function calls work properly?

Micro Focus Expert

RE: Unicode Strings

When I looked into Unicode a few years back, I believe a Unicode character was 3 bytes and sometimes 4 bytes (4 being the maximum). You will need to experiment with pic xxx or pic x(4) to hold a single Unicode character and see if the DLL deals with it appropriately. I take it you're getting these Unicode characters from a file or some other input, not from an Acu GUI screen.

Mikalodeon
New Member.

RE: Unicode Strings

No, actually the DLL defines the parameters as "Pointer to a null terminated unicode string." I want to pass in a regular PIC X(40) field containing "HELLO WORLD", and it does not work 😞 because it expects Unicode. I have tried converting my string "HELLO WORLD" to Unicode using the Kernel32.dll "MultiByteToWideChar" API call and passing that in, but it still does not work. The DLL has a very simple function that accepts one parameter, "Pointer to a null terminated unicode string.", and it's supposed to display a dialog with the string you passed. Unfortunately I get nothing. 😞 Maybe I'm making the call incorrectly.

Mikalodeon
New Member.

RE: Unicode Strings

The resulting string from the "MultiByteToWideChar" API call looks to be "HELLO WORLD" where each character is separated by NULLs.

Micro Focus Expert

RE: Unicode Strings

[quote user="Mikalodeon"]

The resulting string from the "MultiByteToWideChar" API call looks to be "HELLO WORLD" where each character is separated by NULLs.

[/quote]

That's how a string of ASCII characters is encoded in UTF-16, which is one of the Unicode encodings.

Unicode is not a character encoding. It's a collection of code points for a great many characters, plus rules for a number of ways of encoding them. There is no such thing as a "Unicode string", so you should go back to the developer of the DLL and request clarification.

Windows NT originally used a Unicode encoding called UCS-2 LE, which uses two bytes per character. The Unicode code points for ASCII characters are the same as their ASCII values, and UCS-2 LE puts the less-significant byte first, so for ASCII characters the UCS-2 representation is the ASCII value followed by a zero (nul) byte. Note this only applies to characters with values 0-127; code points 128-255 are not actually part of the ASCII specification.
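
To make that byte layout concrete, here is a small Python sketch (Python is used purely for illustration and is not part of the original problem; the variable names are made up). It encodes an ASCII string as little-endian 16-bit units and appends the 16-bit nul terminator a wide-string API would expect.

```python
# Encode an ASCII string as UTF-16 LE, which matches UCS-2 LE for these characters.
text = "HELLO WORLD"
wide = text.encode("utf-16-le")

# Each ASCII character becomes its ASCII value followed by a zero byte.
print(wide[:4])  # b'H\x00E\x00'

# A nul-terminated wide string ends with a 16-bit zero, i.e. two zero bytes.
wide_z = wide + b"\x00\x00"
print(len(text), len(wide), len(wide_z))  # 11 22 24
```

This is why the converted buffer looks like "HELLO WORLD" with a NUL after every character when viewed byte by byte.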

UCS-2 is now deprecated, and the last several versions of Windows have used a Unicode encoding called UTF-16 ("UCS Transformation Format, 16-bit"). It's very similar to UCS-2, except when you start getting into characters that weren't in the original 16-bit Unicode set. The details are rather complicated.
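
As a rough illustration of where UTF-16 departs from UCS-2, the Python sketch below (illustrative only; the sample characters and names are my own choice) encodes one character from the original 16-bit range and one from outside it. The second one comes out as two 16-bit units, a so-called surrogate pair.

```python
import struct

bmp_char = "A"               # U+0041, inside the original 16-bit range
astral_char = "\U0001F600"   # U+1F600, outside the original 16-bit range

bmp_bytes = bmp_char.encode("utf-16-le")
astral_bytes = astral_char.encode("utf-16-le")

# Split the 4-byte result into its two little-endian 16-bit code units.
high, low = struct.unpack("<2H", astral_bytes)

print(len(bmp_bytes), len(astral_bytes))  # 2 4
print(hex(high), hex(low))                # 0xd83d 0xde00
```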

The other popular Unicode encoding is UTF-8, which is widely used - it's the default encoding for modern versions of Java and many other programming languages, it's often used on Windows in .NET programs, it's the most common encoding on the Web, and so on. UTF-8 is a multibyte encoding. ASCII characters take one byte; other characters take anywhere from 2 to 4 bytes (older versions of UTF-8 allowed up to 6 bytes). ASCII characters are the same in UTF-8, so a valid ASCII string is a valid UTF-8 string.
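
The variable-length behaviour is easy to check. The following Python sketch (illustrative only; the sample characters are arbitrary picks) measures the UTF-8 length of four characters and confirms that a plain ASCII string has identical ASCII and UTF-8 bytes.

```python
# One sample character for each UTF-8 length from 1 to 4 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]

# A valid ASCII string is byte-for-byte the same when encoded as UTF-8.
ascii_text = "HELLO WORLD"
same = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same)  # True
```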

If the DLL you're calling didn't like your nul-terminated ASCII string (which is valid UTF-8), and didn't like the output of MultiByteToWideChar (which should be valid UTF-16), then either there's a bug in your program, or the DLL is expecting some unusual encoding that's not commonly used in Windows. In any event, I suggest you go back to whoever wrote the DLL, because their documentation is incorrect (since "Unicode string" doesn't mean anything).

There's a vast amount of information about Unicode available online. The Wikipedia pages aren't bad.

As for how you do any of the Unicode encodings in an ACUCOBOL program - I'm afraid I don't know. My experience is with the non-ACU Micro Focus COBOL implementations.

Mikalodeon
New Member.

RE: Unicode Strings

Thanks for the information, Michael. I did email the developer, and it turns out I was calling the DLL incorrectly. After calling MultiByteToWideChar using the current thread's code page (I assume UTF-8) and passing the resulting pointer to the DLL, I got a proper successful response from the query. The response is also Unicode, so I have to call WideCharToMultiByte to get it back into a readable format for my use. Anyway, after a lot of mucking around, it seems to be working properly now.
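
For anyone following the same route, the round trip can be sketched in Python as a stand-in for the two Win32 calls. This does not call MultiByteToWideChar or WideCharToMultiByte; it only mimics their effect with Python's built-in codecs. The code page (cp1252) and the helper names to_wide_z and from_wide_z are assumptions made for illustration, and the real thread code page may differ.

```python
CODE_PAGE = "cp1252"  # assumption: stand-in for the thread's ANSI code page

def to_wide_z(narrow: bytes, code_page: str = CODE_PAGE) -> bytes:
    """Convert code-page bytes to a nul-terminated UTF-16 LE buffer."""
    return narrow.decode(code_page).encode("utf-16-le") + b"\x00\x00"

def from_wide_z(wide: bytes, code_page: str = CODE_PAGE) -> bytes:
    """Convert a nul-terminated UTF-16 LE buffer back to code-page bytes."""
    units = wide.decode("utf-16-le")
    # Keep only what precedes the first 16-bit nul terminator.
    return units.split("\x00", 1)[0].encode(code_page)

# A space-padded 40-byte field, similar in spirit to a PIC X(40).
field = b"HELLO WORLD".ljust(40)

wide = to_wide_z(field.rstrip())   # trim the padding before converting
back = from_wide_z(wide)
print(back)  # b'HELLO WORLD'
```

Trimming the trailing spaces first is a design choice in this sketch: otherwise the padding of the fixed-length field would be converted and passed along as part of the string.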
