Sentences List for my Bot
 
 

Hi,
I am looking for a list of English sentences to use as a bot database.
I found only one list through Google, and it was not suitable for that purpose. I want a list of general (everyday) sentences.
Thanks,
Mahan Marwat

 

 
  [ # 1 ]

You want a list of English sentences? This forum is full of them!

 

 
  [ # 2 ]

@Steve Do you mean I have to scrape the forum with a scraping bot? I will do that as a last resort.
But it would be a lot easier for me if someone could point me to a place where I can find a good list of phrases. Word lists are already available, but I want sentences.

 

 
  [ # 3 ]

These have to be copy / pasted, but they meet your “everyday” test:

http://www.englishspeak.com/english-phrases.cfm

If you need much more than that, you should look for a sentence corpus, like the MASC sentence corpus here: http://www.anc.org/data/masc/corpus/

“MASC is a balanced subset of 500K words of written texts and transcribed speech drawn primarily from the Open American National Corpus (OANC).”

You can use their online tool to select the type of documents you want sentences from.

 

 
  [ # 4 ]

What are you trying to accomplish with your bot database?  Usually bots have a pattern that needs to be matched that goes along with the sentence that would be the response.  What are you trying to do with just the sentences?
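To make the pattern/response idea concrete, here is a minimal sketch in Python. The rules, the replies, and the `respond` name are made up for illustration; this is not how ChatScript or any particular bot engine is implemented, only the general shape of "match a pattern, return the response stored with it".

```python
import re

# Hypothetical pattern/response pairs; a real bot would carry many more.
RULES = [
    (r"\bwhat is your name\b", "I am a chatbot."),
    (r"\bhow are you\b", "I am fine, thanks."),
    (r"\b(hi|hello)\b", "Hello!"),
]

DEFAULT_REPLY = "I do not understand."


def respond(user_input: str) -> str:
    """Return the response paired with the first pattern that matches."""
    text = user_input.lower()
    for pattern, response in RULES:
        if re.search(pattern, text):
            return response
    return DEFAULT_REPLY
```

With a bare sentence list there is no pattern column, so the bot has nothing to decide which sentence fits which input.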

If you use ChatScript for your bot, it can read and process documents.  You determine what to do with the sentences in the documents.  You could store each sentence that is read as a database record.

I am currently trying to isolate sentences from web pages for use with my bot using a custom C# program.  I am extracting the paragraph tag contents and processing the text with regex matches and substitutions.  Currently I have 10-14 substitutions that give me a text file that is a list of sentences formatted as fact triples that ChatScript can read while it is running. A ChatScript chatbot can also execute a command-line utility, so it can call my custom C# code.  I would like my chatbot to respond to a question by having it “read” the internet and infer an intelligent response by parsing the sentences it reads.

I like to use Simple Wikipedia because the sentence structures are simpler.

Processing the URL: https://simple.wikipedia.org/wiki/Guitar yields:

( webquery answer The_guitar_is_a_string_instrument_which_is_played_by_plucking_the_strings. )
( webquery answer The_main_parts_of_a_guitar_are_the_body,_the_fretboard,_the_headstock_and_the_strings. )
( webquery answer Guitars_are_usually_made_from_wood_or_plastic. )
( webquery answer Their_strings_are_made_of_steel_or_nylon. )
( webquery answer The_guitar_strings_are_plucked_with_the_fingers_and_fingernails_of_the_right_hand_openparen_or_left_hand,_for_left_handed_players_closedparen_,_or_a_small_pick_made_of_thin_plastic. )
( webquery answer This_type_of_pick_is_called_a_"plectrum"_or_guitar_pick. )
( webquery answer The_left_hand_holds_the_neck_of_the_guitar_while_the_fingers_pluck_the_strings. )
( webquery answer Different_finger_positions_on_the_fretboard_make_different_notes. )
( webquery answer Guitar-like_plucked_string_instruments_have_been_used_for_many_years. )
( webquery answer In_many_countries_and_at_many_different_time_periods,_guitars_and_other_plucked_string_instruments_have_been_very_popular,_because_they_are_light_to_carry_from_place_to_place,_they_are_easier_to_learn_to_play_than_many_other_instruments. )
( webquery answer Guitars_are_used_for_many_types_of_music,_from_Classical_to_Rock. )
( webquery answer Most_pieces_of_popular_music_that_have_been_written_since_the_1950s_are_written_with_guitars. )
( webquery answer There_are_many_different_types_of_guitars,_classified_on_how_they_are_made_and_the_type_of_music_they_are_used_for. )
...
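
For anyone who wants to try the same idea without C#, here is a rough Python sketch that produces lines shaped like the ones above. It only uses the standard library. The substitutions (dropping footnote markers such as [1], spelling out parentheses, joining words with underscores) are guesses based on the sample output, not the actual 10-14 rules, and the names `ParagraphExtractor` and `to_fact_lines` are invented for this example.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside every <p> element of a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def _flush(self):
        if self._chunks:
            self.paragraphs.append("".join(self._chunks))
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a previous <p> that was never closed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def to_fact_lines(html: str) -> list[str]:
    """Turn paragraph text into '( webquery answer ... )' lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    facts = []
    for paragraph in parser.paragraphs:
        text = re.sub(r"\[\d+\]", "", paragraph)   # assumed: drop [1]-style markers
        text = re.sub(r"\s+", " ", text).strip()
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if not sentence:
                continue
            sentence = sentence.replace("(", " openparen ")
            sentence = sentence.replace(")", " closedparen ")
            joined = re.sub(r"\s+", "_", sentence.strip())
            facts.append(f"( webquery answer {joined} )")
    return facts
```

The sentence split is deliberately naive and will break on abbreviations like "e.g."; a real version would need more substitutions or a proper sentence tokenizer.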

ChatScript includes a POS tagger and can identify the main subject, main verb and main object of a sentence, so rather than just having the bot repeat static sentences verbatim, it should be possible to process the sentences for the meaning in them.

Parsing the html of websites is one way of gathering sentences on subjects you are interested in or that your bot is interested in.  There are automated ways of obtaining the sentences.

Good luck!

 

 

 
  [ # 5 ]

@Alaric @OliverL Thank you.
Python is king in web scraping. I scraped this site, http://www.english-for-students.com/Frequently-Used-Sentences.html, using requests and BeautifulSoup. If anybody needs it, the file is attached.
@Alaric I am a student and very new to chatterbots and AI. I did not know about pattern matching before. I was coding my first bot on my own idea: I split the sentence the user enters into words, pick the longest word in the list, find a line in the database that contains that word, and show that line to the user as the bot's reply.
I completed my first bot using this technique, but now my bot really sucks…
_: Hi,
BB-8: What? (bot reply)
_: How are you?
BB-8: What are you going to do?
_: What is your name?
BB-8: Sir, nobody is here by that name.
_: I said, what is your name?
BB-8: I am BB-8 and please your name?
_: I have a dog and a cat do you have any?
BB-8: Have you announced the entrance test results?
_: what the fuck, bb-8?
BB-8: I am BB-8 and please your name?
_: :( holy crap!
BB-8: Say it again!
_: holy crap!
BB-8: Say it again!
_: holy crap!
BB-8: What?
_: holy crap!
BB-8: What?
_: holy crap!
BB-8: Please Google it.
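For reference, the longest-word lookup described earlier in this post can be sketched roughly like this. It is a reconstruction from the description, not the actual bot code: the function names, the tie-breaking (first longest word wins), the substring match, and the "What?" fallback are all assumptions.

```python
import re


def longest_word(sentence: str) -> str:
    """Return the longest word in the sentence (first one wins on ties)."""
    words = re.findall(r"[A-Za-z0-9'-]+", sentence.lower())
    return max(words, key=len) if words else ""


def longest_word_reply(user_input: str, database: list[str],
                       fallback: str = "What?") -> str:
    """Reply with the first database line that contains the longest word."""
    key = longest_word(user_input)
    if not key:
        return fallback
    for line in database:
        # Plain substring test: "have" also matches "behave", and so on.
        if key in line.lower():
            return line
    return fallback


# A tiny sample database made of a few lines from the transcript above.
SAMPLE_DB = [
    "What are you going to do?",
    "I am BB-8 and please your name?",
    "Please Google it.",
]
```

Because the reply is chosen from a single keyword, the line that comes back often has nothing to do with what the user meant, which is probably a large part of why the conversation above goes wrong.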
Now I am going to study other techniques. Pattern matching does not seem like a good option to me, because the other bots I have tested suck too. Yes, ALICE is smart!

File Attachments
database_--mahan-marwat.txt  (File Size: 33KB - Downloads: 500)
 

 
  [ # 6 ]

But… ALICE mostly uses pattern matching.
http://www.alicebot.org/aiml.html

 

 