

What about UTF-8 support?
 
 
  [ # 16 ]

Mayhap. How do I tell the system to use Lucida?

 

 
  [ # 17 ]

Sorry, Bruce. I was going to put that in my post, but got distracted (it’s trash day) and forgot.

Once you’ve got the command window open, click the icon in the upper-left corner of the window and select “Properties”. From there, you can change all sorts of settings for the window, including the font. In the “vanilla” configuration of Windows 7 there are only 3 fonts available: “Raster Fonts”, Consolas, and Lucida Console. As I said, Lucida Console seemed to handle the UTF-8 character set just fine.

If you use a shortcut (say, on your desktop) to access the command window, you can set this up permanently by right-clicking on that shortcut, selecting Properties, and making your changes to the shortcut itself. I’ve changed mine to display a larger font size, and also to display more columns and rows, making the window much more useful to me. :)

 

 
  [ # 18 ]

Cool, but no dice. Not only did it NOT display my character correctly, but even if it had, that doesn’t mean I have a way to do that from Visual Studio’s console window.

 

 
  [ # 19 ]

OK, a long, long time ago (10 years or longer) I did this in C++, and things are starting to bubble up again. If memory serves me well, it has something to do with wchar_t, or wide characters (a different type of string).
A quick search has given me this:
http://msdn.microsoft.com/en-us/library/ms235631(v=vs.80).aspx
So, I think the question is: how are you writing the strings to the screen?

 

 
  [ # 20 ]

I have a Visual Studio console app, so originally I was writing to the console using printf.
Trying to get UTF-8 to print, I tried this:

mainBuffer is a char* with UTF-8 characters in it.

std::wstring xx = Utf8ToUtf16(mainBuffer);
// %ls expects a wchar_t*, so pass c_str() rather than the wstring object itself
wprintf(L"%ls\r\n", xx.c_str());
but that didn’t improve things.

 

 
  [ # 21 ]

Mmm, probably best to do some tests. I know from past experience that this is not a trivial thing in C++. Perhaps the char* lost its UTF-8 marker (the byte order mark) at the front?

 

 
  [ # 22 ]

Yes, it is tricky. I’ve been reading a lot of forum posts without success so far.

The “char*” doesn’t ever have a header at the start… It merely has embedded multi-byte characters, correctly flagged.
And the system clearly sees there are multi-byte characters because it prints out funny stuff for them, just not the umlaut characters in this data:

topic: ~x []
u: (land) übel mitgespielt worden
u: (Österreich) Schi fahren?

When I open the file containing this from inside visual studio, it does display the characters correctly. It’s just sending these to the console that has issues.
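
Both points above (no header at the start, multi-byte characters flagged by their high bits) can be checked on the raw bytes. This is only a minimal sketch, assuming the buffer has been copied into a std::string; has_utf8_bom and count_code_points are hypothetical helper names made up for illustration, not functions from the project discussed here.

```cpp
#include <cstddef>
#include <string>

// True if the buffer starts with the optional UTF-8 byte order mark (EF BB BF).
// Hypothetical helper, named here for illustration only.
bool has_utf8_bom(const std::string& s) {
    return s.size() >= 3 &&
           static_cast<unsigned char>(s[0]) == 0xEF &&
           static_cast<unsigned char>(s[1]) == 0xBB &&
           static_cast<unsigned char>(s[2]) == 0xBF;
}

// Counts code points by skipping continuation bytes, which always have the
// bit pattern 10xxxxxx. Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the word "übel" stored as the bytes C3 BC 62 65 6C, the string has 5 bytes but count_code_points reports 4, and has_utf8_bom is false. Note that the literal has to be split as "\xC3\xBC" "bel" in source code, because 'b' and 'e' would otherwise be read as part of the hex escape.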

 

 
  [ # 23 ]

Perhaps this can help you:
http://stackoverflow.com/questions/2493785/how-i-can-print-the-wchar-t-values-to-console

 

 
  [ # 24 ]

I tried std::wcout directly, translating my string through Utf8ToUtf16. The output of Utf8ToUtf16 displays correctly as a watch value (unlike the original C-string variable with multi-byte characters, which is to be expected). But nothing at all displays via wcout. It prints absolutely nothing when given this:

std::wstring xx = Utf8ToUtf16(mainBuffer);
std::wcout << xx;

but displays the old stuff if I do
std::wcout << mainBuffer;
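
For comparison, here is a rough sketch of what a UTF-8 to UTF-16 conversion does with the bytes. utf8_to_utf16_sketch is a hypothetical stand-in written for illustration; it is not the Utf8ToUtf16 used above, and it returns std::u16string rather than std::wstring so the result does not depend on the platform's wchar_t width. It assumes mostly well-formed input and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical illustration of UTF-8 -> UTF-16 decoding (not the thread's
// Utf8ToUtf16). Malformed or truncated sequences become U+FFFD.
std::u16string utf8_to_utf16_sketch(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(char16_t(0xFFFD)); ++i; continue; }

        // Each continuation byte contributes its low 6 bits.
        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(char16_t(0xFFFD)); ++i; continue; }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Code points above the BMP are split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

With this sketch, "Österreich" (11 bytes in UTF-8) becomes 10 UTF-16 code units, the first being 0x00D6. If the converted string looks right in the debugger but wcout still prints nothing, it may be worth checking std::wcout.fail() after the write, since a stream that hits a character it cannot convert can stop producing output.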

 

 
  [ # 25 ]

Well, I’m out of ideas. Perhaps some good old trial and error testing: see which combination works, what doesn’t? Have fun.

 

 
  [ # 26 ]

Dear gentlemen,

In the name of all “UTF-8-people”:
thank you for your bid for UTF-8.
Have a nice Christmas time.

All the best
Andreas

P.S.: Unfortunately, I can’t contribute to this discussion here.
But I am beta-testing the User Manual
and will announce my first results after Christmas.

 

 
  [ # 27 ]

I have attempted various experiments based on various web pages.
I “think” that the system as a server manages UTF-8 correctly.
But I can’t get the local Windows console built into the stand-alone app to display UTF-8 correctly.

I can’t get any simple main program to display a static UTF-8 string of any complexity correctly.

I hereby abandon further attempts.
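
For anyone retrying the simple test program later, one variation is to switch the console's output code page to UTF-8 and write the bytes unchanged. This is only a sketch and may well not succeed where the experiments above failed: whether the glyphs render still depends on the console font and the Windows version. SetConsoleOutputCP and CP_UTF8 come from windows.h; prepare_console_for_utf8 and write_utf8_line are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Asks the Windows console to interpret output bytes as UTF-8 (code page 65001).
// On other platforms this does nothing and reports success.
bool prepare_console_for_utf8() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Writes the UTF-8 bytes as-is, with no wide-character conversion,
// followed by a newline. Returns the number of bytes handed to stdout.
std::size_t write_utf8_line(const std::string& utf8) {
    std::size_t n = std::fwrite(utf8.data(), 1, utf8.size(), stdout);
    n += std::fwrite("\n", 1, 1, stdout);
    std::fflush(stdout);
    return n;
}
```

A test program would call prepare_console_for_utf8() once at startup and then write_utf8_line("\xC3\x96" "sterreich"), with the console font set to Lucida Console as described earlier in the thread.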

 

 
  [ # 28 ]

If you want me to, Bruce, I can dig out my ChatScript GUI project and do some testing with it, to see if it displays UTF-8 characters correctly (it should, since it uses the server rather than a console window). If it works, I can add it to the SourceForge page for others to use. Up to you, though. :)

 

 
  [ # 29 ]

I accept all help.

 

 
  [ # 30 ]
Bruce Wilcox - Dec 30, 2011:

I accept all help.

Bruce, when you process the strings, do you work internally with 8-bit or 16-bit strings?
Whenever we had to support wide characters, everything had to be 16-bit internally. That’s also how other languages handle this (C#’s strings are all 16-bit; I don’t know about Java, but I suspect so as well).

 
