JSON parsing of strings
 
 

If I go and fetch the following JSON string from a remote web service

{"8758665P2f":{"surname":"McKenzie","forename":"Scott","preferredName":"Scott","fullName":"Mr Scott McKenzie"}} 

and parse it, then it gets turned into the nice set of facts:

create jo-1 surname Mckenzie 0x12110 Created 216982
create jo-1 forename Scott 0x12110 Created 216983
create jo-1 preferredName Scott 0x12110 Created 216984
create jo-1 fullName `Mr Scott McKenzie` 0x12110 Created 216985
create jo-0 8758665P2f jo-1 0x14110 Created 216986

The only problem is that the value of the surname property has been normalized (the uppercase K has become lowercase).

Is there anything I can do to influence the parsing to stop this from happening, or do I have to change the service to quote the value in the JSON string?

I note that the property names/verbs are not normalized.

 

 

 
  [ # 1 ]

Note, I’ve tried setting $cs_token to #UNTOUCHED_INPUT before the parse, to no avail.

 

 
  [ # 2 ]

The issue is that CS keeps only a single spelling for any capitalized word, and the dictionary data had this one as Mckenzie. I have modified the raw dictionary data to fix this spelling (and a bunch of other Mck names) for the next CS release. For now you can patch this specific case by going into the DICT/ENGLISH/m.txt file and globally changing Mck to McK, then erasing dict.bin. When you restart CS, it will rebuild dict.bin from those text sources, and your problem should be gone.
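
For anyone who would rather script that patch than edit by hand, here is a rough Python sketch of the two steps described above (replace Mck with McK in the source file, then delete the compiled dictionary). The function names are made up for illustration, and both paths are passed in by the caller because the location of dict.bin in your install is not stated here. It works on raw bytes so it makes no assumption about the file's encoding. Back up the file first.

```python
from pathlib import Path


def fix_mck_spelling(data: bytes) -> bytes:
    """Replace every 'Mck' with 'McK', like a global search-and-replace."""
    return data.replace(b"Mck", b"McK")


def patch_dictionary(source_file: Path, compiled_dict: Path) -> int:
    """Patch source_file in place and remove compiled_dict.

    source_file   -- the dictionary text file (e.g. DICT/ENGLISH/m.txt)
    compiled_dict -- the dict.bin file; its location is an assumption
                     the caller has to supply
    Returns the number of replacements made.
    """
    original = source_file.read_bytes()
    count = original.count(b"Mck")
    if count:
        source_file.write_bytes(fix_mck_spelling(original))
    # Delete the compiled dictionary so it is rebuilt from the text
    # sources on the next restart.
    compiled_dict.unlink(missing_ok=True)
    return count
```

Note that this blindly rewrites every occurrence of Mck in the file, exactly as the manual global change would, so any name that really is spelled with a lowercase k after Mc gets changed too.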

 

 
  [ # 3 ]

I guess it is the same for Mac* as well.
Would the Irish O’* be impacted also?

 

 
  [ # 4 ]

Mac* can't be fixed globally. Some names have an uppercase letter after the Mac and others don't.
O’* has to be fixed name by name as well, but I will do that in the dictionary for the next build, since the letter after O’ is universally uppercase.

 

 
  [ # 5 ]

There will always be issues because some people are Macpherson and some are MacPherson, etc.

If you could guarantee your service didn’t have multiple spellings, you could do ugly hacks like putting an * in front of the name so it would be unique in the dictionary, and removing the * at the last minute when you tried to use the value of the fact. But it is ugly.
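
To make the * hack concrete, here is a minimal Python sketch of the service-side half, assuming you control the JSON before CS parses it. The names MARKER, NAME_KEYS, protect_names and unprotect are all hypothetical, and the choice of which properties to mark is yours. In practice the stripping step would happen in your script at the point where you use the fact value; unprotect below only shows the idea.

```python
import json

MARKER = "*"             # prefix that makes the value unique in the dictionary
NAME_KEYS = {"surname"}  # hypothetical: properties whose case must survive


def protect_names(payload: str, keys=NAME_KEYS) -> str:
    """Return the JSON text with MARKER prepended to the chosen string values."""
    def mark(node):
        if isinstance(node, dict):
            return {
                key: MARKER + value
                if key in keys and isinstance(value, str)
                else mark(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [mark(item) for item in node]
        return node

    return json.dumps(mark(json.loads(payload)))


def unprotect(value: str) -> str:
    """Strip MARKER from a value at the last minute, just before using it."""
    return value[len(MARKER):] if value.startswith(MARKER) else value
```

With the JSON from the first post, protect_names would send the surname as *McKenzie and leave the other properties alone. As noted above, this only holds up if the service never returns two different capitalizations of the same name.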

 

 