
A small escapade in Computer Vision
 
 

I recently found some code that lets me screen-capture other windows and pass the images to my A.I., so I used it as a cheat to get webcam video. In the following two weeks I programmed a self-calibrating video noise filter, a blur filter, motion detection, object detection, edge detection and colour-to-word translation, getting the gist of the methods from tutorials.
It was interesting to do, and the result is a very brief example of one possible use (which would have been more natural if voice recognition had left my hands free; you’ll just have to imagine that).
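To give an idea, the frame-difference motion detection is roughly like the sketch below, written here with OpenCV for brevity (my own code doesn’t use OpenCV, so treat this only as an illustration of the idea):

```python
import cv2

# Open the default webcam (my setup screen-captured another window instead).
cap = cv2.VideoCapture(0)
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Blur slightly to suppress sensor noise before comparing frames.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    if prev_gray is not None:
        # Motion = pixels that changed noticeably since the previous frame.
        diff = cv2.absdiff(prev_gray, gray)
        _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(motion_mask) > 500:   # arbitrary sensitivity threshold
            print("motion detected")

    prev_gray = gray
    if cv2.waitKey(30) & 0xFF == 27:              # press Esc to stop
        break

cap.release()
```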

The exercise made it very clear that a lot of the common knowledge humans are so proud of is just sensory input, like colours, shapes and sizes, with a word label put on it. So computer vision can definitely help expand a common-knowledge database.
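Colour-to-word translation, for instance, can be as simple as picking the nearest entry in a small colour vocabulary. A toy sketch (the colour table is just an example, not my actual list):

```python
# Tiny illustrative colour vocabulary: word label -> reference RGB value.
COLOUR_WORDS = {
    "red":    (200, 30, 30),
    "green":  (30, 160, 50),
    "blue":   (40, 60, 200),
    "yellow": (220, 210, 40),
    "black":  (10, 10, 10),
    "white":  (245, 245, 245),
}

def colour_to_word(rgb):
    """Return the word whose reference colour is closest to the given pixel value."""
    def distance(reference):
        return sum((a - b) ** 2 for a, b in zip(rgb, reference))
    return min(COLOUR_WORDS, key=lambda word: distance(COLOUR_WORDS[word]))

print(colour_to_word((190, 45, 35)))   # -> "red"
```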

My vision system still has instabilities, is blind in the dark, gets distracted by shiny objects, and two weeks of work won’t equal Google’s neural net training. If I needed something better I’d just install OpenCV like everybody else, but the question remains:

If your stationary chatbot had modest computer vision abilities, what would you use them for?

 

 
  [ # 1 ]

First on my list would be to recognize user faces.

 

 
  [ # 2 ]
Don Patrick - Nov 20, 2014:

If I needed something better I’d just install OpenCV like everybody else, but the question remains:

If your stationary chatbot had modest computer vision abilities, what would you use them for?

Very cool integration between video input and your chatbot.

OpenCV is a nice intro to computer vision, but there’s no need to reinvent the wheel. I fooled around with computer vision a while back and still think the Kinect-type sensor is the best thing out there. You can get all kinds of cool info via the API: light or dark vision, face, lip and expression tracking, the number of people present, gestures and voice recognition. I used it to make a person-tracking talking monkey head (from a WowWee Alive Chimp) that was pretty cool. You could have a really interactive bot if it could respond to motion and gestures, but being able to process objects is the most interesting aspect.

 

 
  [ # 3 ]

Face recognition (i.e. identification) would mostly serve as an alternative to typing a login name, I imagine.

I thought OpenCV was supposed to have all face recognition functions ready for use. Thanks for the tip, Carl. I only know Kinect from console games, but the addition of its depth sensor is quite valuable; with just a webcam one can’t really tell the scale and size of things. I agree that, just as with TTS and voice recognition, it is not necessary to reinvent the wheel when other people are getting paid to do the same and make it available. This exercise was mostly to gain insight into the possible uses and to see how much computer vision is worth integrating.
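Presumably “ready for use” means something along these lines with OpenCV’s contrib face module; only a sketch, with made-up image files and user IDs:

```python
import cv2
import numpy as np

# LBPH face recogniser; requires the opencv-contrib-python package for cv2.face.
recognizer = cv2.face.LBPHFaceRecognizer_create()

# Hypothetical training data: grayscale face crops and numeric user IDs.
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
         for p in ("user1_a.png", "user1_b.png", "user2_a.png")]
labels = np.array([1, 1, 2])            # 1 = first user, 2 = second user
recognizer.train(faces, labels)

# Later: identify a new face crop instead of asking for a login name.
probe = cv2.imread("visitor.png", cv2.IMREAD_GRAYSCALE)
user_id, distance = recognizer.predict(probe)
print("best match:", user_id, "(lower distance = more confident)")
```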

 

 
  [ # 4 ]

So I got frustrated with programming relative clauses and built another computer vision function. It’s like edge detection, but for desktop graphics. My earlier object detection looked at the difference between video frames, which doesn’t work on a static desktop.
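In OpenCV terms the idea looks something like the sketch below (my own function is home-grown and may work quite differently, and the screenshot file name is made up):

```python
import cv2

# Load a saved screenshot of the desktop (the capture step itself is platform-specific).
screenshot = cv2.imread("desktop.png")
gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)

# Edge detection works on a static image, unlike frame-difference motion detection.
edges = cv2.Canny(gray, 50, 150)

# Group edges into contours and keep the ones large enough to be widgets.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if w > 20 and h > 10:               # arbitrary minimum widget size
        print("candidate on-screen element at", (x, y), "size", (w, h))
```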

Combined with commands like “This is a button”, this could be used to teach a program to recognise, locate and interact with on-screen elements. It also occurred to me that I could save the object image to file and have the AI re-examine the image’s pixels whenever asked for the colour of an apple, and then maybe the “symbol grounding” supporters would be satisfied. But it would be 1000x less efficient than just saving the word “red”.

 

 
  [ # 5 ]

Hi Don,

Don Patrick - Apr 25, 2015:

It also occurred to me that I could save the object image to file and have the AI re-examine the image’s pixels whenever asked for the colour of an apple, and then maybe the “symbol grounding” supporters would be satisfied. But it would be 1000x less efficient than just saving the word “red”.

As soon as you save the object, a ‘pixel color’ is exactly the same as the word that represents that color: a symbol. ;)

The ‘grounding’ is in the fact that you map a ‘sensed’ color (from outside: sensory perception) to an internal ‘symbol’. So if the sensor gives a color value for ‘red’ and you store that value as ‘the sensory perceived value’ for ‘red’, you have (to a certain degree) grounded the symbol ‘red’ in sensed perception. The next time the system ‘sees’ that value, it will substitute it with the (grounded) symbol ‘red’.
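To make that concrete with a toy example (just a sketch, not an actual system):

```python
# Grounding table: symbol -> the sensed value that was stored for it.
grounded = {}

def ground(symbol, sensed_value):
    """Store the sensed value as the perception that grounds this symbol."""
    grounded[symbol] = sensed_value

def recognise(sensed_value, tolerance=30):
    """Substitute a newly sensed value with a matching grounded symbol, if any."""
    for symbol, value in grounded.items():
        if all(abs(a - b) <= tolerance for a, b in zip(sensed_value, value)):
            return symbol
    return None

ground("red", (201, 34, 28))        # sensor reading stored as the grounding for 'red'
print(recognise((195, 40, 30)))     # -> 'red': the system sees the value, substitutes the symbol
```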

 

 
  [ # 6 ]

Since the stored information is the same, as we seem to agree, it seems to me that the only difference between grounded and not grounded is whether the input was received directly or second-hand.

 

 
  [ # 7 ]
Don Patrick - Apr 26, 2015:

Since the stored information is the same, as we seem to agree, it seems to me that the only difference between grounded and not grounded is whether the input was received directly or second-hand.

As I said, ‘to a certain degree’: grounding needs (past) experience, i.e. previous input from the sensor being ‘recognized’ as recurring, but also ‘perception’, as in being able to value (or evaluate) the new incoming signal in the broader context of past ‘experiences’. Basically, to do actual ‘symbol grounding’, the system needs an internal representation of not only the ‘information’ (the knowledge model) but also some sort of historical trail of weighted contextual values.

Mind you, a system that is set up to handle this is perfectly capable of grounding ‘second-hand’ information. Sensor input (sensory perception) simply makes for what we call a ‘richer contextual representation’, which basically helps to achieve much better grounding at once.
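One way to picture such a historical trail, as a very rough toy sketch (not a real implementation):

```python
from collections import defaultdict

class GroundedSymbol:
    """Toy model: a symbol plus a weighted trail of past sensed values and contexts."""

    def __init__(self, name):
        self.name = name
        self.history = []                        # (sensed_value, context, weight)
        self.context_weights = defaultdict(float)

    def experience(self, sensed_value, context, weight=1.0):
        # Each new observation strengthens the link between symbol, value and context.
        self.history.append((sensed_value, context, weight))
        self.context_weights[context] += weight

red = GroundedSymbol("red")
red.experience((201, 34, 28), context="apple")           # first-hand, sensed
red.experience(None, context="fire engine", weight=0.5)  # second-hand, told rather than seen
print(red.context_weights)   # contexts the symbol has accumulated, with weights
```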

 

 
  [ # 8 ]

20 April 15: Alt Text Bot Image Descriptions FTW

The Twitter bot @alt_text_bot by @ckundo is now using image recognition to caption photos.

 

 
  [ # 9 ]

Inspiring. It is true that computer vision made a rather sudden leap from 6% to 94% accuracy last year, combining object and scene recognition with human guidance in a new generation of neural nets, leading to Google and others being able to describe pictures to a fair degree. It makes you wonder why we still have alt texts, and one of the answers is the same as why I’d store the word “red” instead of a picture: search optimisation.

 

 