How can I extract info from a site and insert it into my bot?
 
 

Hi,
I am new to virtual chatbots and I like them very much. I am trying to make my own bot with Pandorabots.
I thought it would be nice if the answers were not static.
I mean, when a person asks the bot who Einstein is, for example, the bot searches a wiki site and retrieves the information.

Is it easy to do something like that?
What languages do I have to learn?

Is this called semantic search?

Thanks a lot

P.S.: I have heard that RDF helps, but I don't know exactly what it is.

 

 
  [ # 1 ]

Websites like YAGO2 and DBpedia would be good places to start. They’ve already done all the work of extracting useful information for you.

http://www.mpi-inf.mpg.de/yago-naga/yago/
http://dbpedia.org/About
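
For a single lookup like the Einstein example, DBpedia's public SPARQL endpoint at http://dbpedia.org/sparql is probably the quickest way in. Here is a rough Python sketch; the property URI is my best guess at the current schema, so verify it against the DBpedia documentation before relying on it:

    import json
    import urllib.parse
    import urllib.request

    # Ask DBpedia's public SPARQL endpoint for the English abstract of Albert Einstein.
    # The endpoint URL and the dbpedia.org/ontology/abstract property are assumptions
    # to be checked against the DBpedia docs.
    query = """
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Albert_Einstein>
          <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = "en")
    }
    """

    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })

    with urllib.request.urlopen("http://dbpedia.org/sparql?" + params) as response:
        results = json.loads(response.read().decode("utf-8"))

    for binding in results["results"]["bindings"]:
        print(binding["abstract"]["value"])

Your bot would then substitute whatever name the user asked about in place of the hard-coded Albert_Einstein resource.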

Don't expect it to be easy, however. The sheer volume of the data involved makes it difficult to do without advanced programming skills or some very expensive (fast) computer hardware.

Search engines like Google, Bing and Wolfram Alpha also provide web-based APIs for these sorts of things; however, accessing them requires some kind of commercial arrangement.

 

 
  [ # 2 ]
Andrew Smith - Oct 13, 2011:

Search engines like Google, Bing and Wolfram Alpha also provide web-based APIs for these sorts of things; however, accessing them requires some kind of commercial arrangement.

Wolfram Alpha provides an API for free for personal use.
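
From memory, the basic call is just an HTTP GET against their v2 query endpoint using the AppID you get when you sign up. Something along these lines; the parameter names are from memory, so double-check them against the official docs:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder AppID -- you get a real one when you register for the API.
    APP_ID = "YOUR-APPID-HERE"

    def ask_wolfram(question):
        # The v2 query endpoint returns XML "pods" describing the answer.
        params = urllib.parse.urlencode({"appid": APP_ID, "input": question})
        url = "http://api.wolframalpha.com/v2/query?" + params
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        # Collect the plain-text result from each pod, if present.
        return [pt.text for pt in tree.findall(".//plaintext") if pt.text]

    print(ask_wolfram("who is Albert Einstein"))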

 

 
  [ # 3 ]
Carl B - Oct 14, 2011:

Wolfram Alpha provides an API for free for personal use.

That's true, but my reading of the terms of use suggests that you would be forbidden from using it in an application like a chatbot in the way that Bufoss wants to. Maybe if it was strictly on your own computer or LAN…

Anyway, that reminds me, I've got another significant source of data right here, which I developed myself and have been operating for the last 6 years: http://tracktype.org, which provides a fast and powerful API for music metadata.

The data all comes from http://freedb.org but it undergoes a substantial amount of filtering and reprocessing to clean it up, which is unique to my service. My database also provides a much more powerful API which is upward compatible with the original CDDB API, but allows searching on names and parts of names. It is also hundreds of times faster than competing services and routinely handles hundreds of thousands of queries per day.

Anyone who wants to use the TrackType music database via their chatbot is welcome to do so. Hopefully the notes on the news page and the existing CDDB documentation and tools will be sufficient to get you started, but feel free to ask for more info if you need it.
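
For anyone who hasn't used CDDB before, a request is basically an HTTP GET against the cddb.cgi script with a hello string and a protocol level, roughly like the sketch below. The exact server path and the syntax of the name-search extension are the things to check against the news page, and the disc ID here is only a placeholder:

    import urllib.parse
    import urllib.request

    # Minimal sketch of a CDDB-over-HTTP request.  The cddb.cgi path and the
    # disc ID below are placeholders for illustration -- see the tracktype.org
    # news page for the actual details, including the name-search extension.
    def cddb_command(cmd):
        params = urllib.parse.urlencode({
            "cmd": cmd,                               # e.g. "cddb read rock <discid>"
            "hello": "user localhost mychatbot 0.1",  # ident string required by the protocol
            "proto": "6",                             # protocol level 6 = UTF-8 responses
        })
        url = "http://tracktype.org/~cddb/cddb.cgi?" + params
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")

    print(cddb_command("cddb read rock 940aa70b"))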

You can also download the latest version of the entire data set directly from

http://tracktype.org/archives/cddb-disc-20110901.7z

but be warned, it is 535 megabytes and my internet service isn’t really suitable for very large files so it will take a while.

 

 

 
  [ # 4 ]

Here’s another great source of live data.

http://www.freebase.com/

These people were the original developers of the product which is now being distributed as Google Refine.
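
At the time of writing Freebase also exposes an HTTP read service that takes MQL queries as JSON. A quick sketch; the endpoint URL and envelope format are my best recollection, so check the developer docs before using it:

    import json
    import urllib.parse
    import urllib.request

    # Rough sketch of an MQL read against Freebase's public API.  The endpoint
    # URL and query envelope are assumptions -- verify against the developer docs.
    mql = {"query": {"id": "/en/albert_einstein", "name": None, "type": []}}

    params = urllib.parse.urlencode({"query": json.dumps(mql)})
    url = "http://api.freebase.com/api/service/mqlread?" + params

    with urllib.request.urlopen(url) as response:
        result = json.loads(response.read().decode("utf-8"))

    print(result)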

 

 
  [ # 5 ]

Thanks a lot for your answers

 

 
  [ # 6 ]

You also might like to check out the AI Nexus forum. It has a lot of info for Pandorabots users and examples of how to integrate web search with your bot.
http://knytetrypper.proboards.com/index.cgi?action=recent

 

 
  [ # 7 ]

This sounds like a corpus-based approach! There are huge corpora available, like http://corpus.byu.edu/,
but I am not sure how to use them, or whether it is even possible to use them in chatbots! I am a beginner too :)

 

 
  [ # 8 ]

Can we make our own simple page scraper, or is it legal to make an API to search google.com and use the results in a program?

 

 
  [ # 9 ]

That corpus website is a great find. I'm going to investigate it immediately. Previously I'd only known about small corpora like Susanne, which is free, and things like the Penn Treebank and Brown Corpus, which are certainly not free.

As far as web crawling is concerned, don't try to use Google in any way that breaches the terms of service. I believe you can buy access to the API, but they'll probably do horrible things to your IP address if you try to circumvent that. Also check out DuckDuckGo and Bing.
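
If you just want something free to experiment with, DuckDuckGo has a zero-click info API that returns JSON and, as far as I know, doesn't need a key. A quick sketch; the parameter and field names are from memory, so check their API page:

    import json
    import urllib.parse
    import urllib.request

    # Sketch of a call to DuckDuckGo's zero-click info API.  Parameter and
    # field names should be verified against the API documentation.
    def duckduckgo_answer(query):
        params = urllib.parse.urlencode({
            "q": query,
            "format": "json",
            "no_html": "1",
        })
        url = "http://api.duckduckgo.com/?" + params
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read().decode("utf-8"))
        # The abstract fields hold the short summary when one is available.
        return data.get("AbstractText") or data.get("Abstract")

    print(duckduckgo_answer("Albert Einstein"))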

Then there is this:

http://www.commoncrawl.org/

They crawl five billion web pages for you and provide very cost-effective ways of using the resulting data for your own research. Probably out of the range of a student budget, but still better than trying to crawl that many pages yourself.

 

 