


Hi, I’m Harry,

I’m a professional programmer (boring insurance industry), so my home project/hobby is writing an AI system.

Looking forward to sharing ideas and seeing what other people are up to.


  [ # 1 ]


Hi Harry, am I right assuming this is your mindloop blog, above?


You can more or less find out where I’m at by reading through my Quora answers, above.

Lately, I’ve been thinking outside the box, the chatterbox that is, about what it would take to make a desktop AI that could both visually interpret the screen and manipulate the browser in the way that people do.  I don’t know of examples of anything quite like this at the moment.  I was reminded of this by your visual OCR experimentation.

Your graphical simulation efforts remind me of the current Internet of Things (IoT) hype, something like a GUI for the “Internet of Sensors”, with every mobile device a potential sensor.


I’ve recently seen the above simulation of a “Neural Network” going around, and hope to pair even a faux simulation of this kind with a conversational AI, like a voice GUI, or the beginning of a bona fide cortical reflection for natural language….  ;^)


  [ # 2 ]

Hi Marcus,

You’re right, that is my blog.

I’ll check out your links later (I’m at work).

You mentioned manipulating the browser - I suspect we have a lot to talk about. My OCR work was aimed directly at dealing with CAPTCHAs. I plan to use Java’s java.awt.Robot class, which can capture the screen and mimic keyboard and mouse input at the OS level.
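For anyone following along, here is a rough sketch of what driving java.awt.Robot might look like. The class name ScreenBot and its helper methods are made up for illustration, and the 800x600 capture area is an arbitrary assumption; only the Robot, Toolkit and Rectangle calls come from the standard library. It is a starting point under those assumptions, not a finished bot.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.awt.image.BufferedImage;

// Hypothetical helper class; the name and methods are illustrative only.
class ScreenBot {

    // Clip a requested capture area to the screen bounds.
    // The result can be empty if the two rectangles do not overlap.
    static Rectangle clampToScreen(Rectangle wanted, Rectangle screen) {
        return wanted.intersection(screen);
    }

    // Move the pointer to (x, y) and perform a left click.
    static void click(Robot robot, int x, int y) {
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    // Press and release a single key, e.g. KeyEvent.VK_A.
    static void tapKey(Robot robot, int keyCode) {
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }

    public static void main(String[] args) throws AWTException {
        // Robot needs a real display; bail out politely on a headless machine.
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; Robot cannot run here.");
            return;
        }
        Robot robot = new Robot();
        robot.setAutoDelay(50); // pause 50 ms after each generated event

        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        // Assumed capture area: the top-left 800x600 pixels, clipped to the screen.
        Rectangle area = clampToScreen(new Rectangle(0, 0, 800, 600), screen);
        BufferedImage shot = robot.createScreenCapture(area);
        System.out.println("Captured " + shot.getWidth() + "x" + shot.getHeight() + " pixels");

        // Uncomment to send real input to whatever window has focus:
        // click(robot, 100, 100);
        // tapKey(robot, KeyEvent.VK_A);
    }
}
```

The input calls are left commented out in main because they act on whatever happens to be under the pointer or in focus at the time.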

And while we’re thinking about it - why stop with the browser…



  [ # 3 ]

The current trend in mobile assistant apps does seem to be away from a defining emphasis on natural language toward more manipulation at the OS level.

I did recently run across a video, PIBOT : Humanoid Pilot Robot, showing a physical robot able to visually fly a simulator; though, I’m not thinking about physical keyboard manipulation.  Web browser automation tools for software testing, such as Selenium, seem to be commonly used for this purpose.

Interestingly, CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”; however, my limited understanding is that human powered APIs, something like “mechanical turks”, are often used to defeat CAPTCHA.


  [ # 4 ]

Sadly, there is loads of spam-ware that can read CAPTCHAs.

At least for my purposes I’m doing it for altruistic reasons :D

If I do get my bot to create (for example) a Google+ account on its own (solving the CAPTCHA), I’ll definitely count it as one test passed.

Selenium is a great technology for ‘real’ software testing - but it relies on looking behind the covers and uses the IDs / names of text boxes to identify them. I suspect we both want to do it with a screen-reading approach.
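To make the contrast concrete: a screen-reading approach has to find a control from its pixels alone, with no IDs to lean on. Below is a minimal sketch that searches a screenshot for an exact copy of a small reference image. The class name ScreenFinder and the brute-force exact-match strategy are assumptions for illustration; real screens with anti-aliasing, themes or scaling would likely need a tolerant comparison instead.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

// Hypothetical helper; exact pixel matching is a deliberately naive choice.
class ScreenFinder {

    // Return the top-left corner of the first exact match of template
    // inside screen (scanning row by row), or null if there is none.
    static Point find(BufferedImage screen, BufferedImage template) {
        int maxX = screen.getWidth() - template.getWidth();
        int maxY = screen.getHeight() - template.getHeight();
        for (int y = 0; y <= maxY; y++) {
            for (int x = 0; x <= maxX; x++) {
                if (matchesAt(screen, template, x, y)) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }

    // Compare every template pixel with the screen pixel at the given offset.
    static boolean matchesAt(BufferedImage screen, BufferedImage template, int x, int y) {
        for (int ty = 0; ty < template.getHeight(); ty++) {
            for (int tx = 0; tx < template.getWidth(); tx++) {
                if (screen.getRGB(x + tx, y + ty) != template.getRGB(tx, ty)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Synthetic stand-ins for a screenshot and a cropped button image.
        BufferedImage screen = new BufferedImage(40, 30, BufferedImage.TYPE_INT_RGB);
        BufferedImage button = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        for (int dy = 0; dy < 4; dy++) {
            for (int dx = 0; dx < 4; dx++) {
                int rgb = 0x3366CC + dx + 4 * dy; // slightly varied colours
                button.setRGB(dx, dy, rgb);
                screen.setRGB(12 + dx, 9 + dy, rgb); // paint the button onto the screen
            }
        }
        System.out.println("Button located at: " + find(screen, button));
    }
}
```

The returned point could then be handed to something like Robot.mouseMove to click the control, which is the part Selenium would normally do through the page's element names.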

