Best way to find “knowledge” in spreadsheet-type setup on the web
 
 

Hey folks,

I’m looking for lists of “knowledge” on the web.
Things like, lists of nouns, verbs, etc.  Ideally, in the “nouns” category, I get “is a” information, too.
Like:
cat is an animal
dog is an animal
etc

I have found a couple “short” lists of nouns… but that’s it. 

I’ve explored WordNet and OpenCyc… and unless I can export the list of knowledge into a spreadsheet (which I don’t seem to be able to do), it’s not helping me… although I think WordNet does contain the info I’m after.

Any ideas?

Thanks in advance!

-Adeena

 

 
  [ # 1 ]

Hey, Adeena. Good to hear from you.

I think I’ve still got the PHP/MySQL version of WordNet on my dev server, and if you like, I can see about writing a PHP script to extract the data you need. It may take me a couple of days to do it, but it shouldn’t be too much of a circus to do. :)

 

 
  [ # 2 ]

Hey Dave,

I know I’ve been a little MIA from bot-related things recently… it’s been a hard winter for me personally.

But… since the Loebner Contest is earlier this year I need to get ready!

I didn’t realize there was a PHP/MySQL version of WordNet… I had only downloaded the executable for Windows.  Maybe that’s my problem.  :)  I’m actually looking at their website now… I’m not seeing an easy reference to a PHP/MySQL version… is it the “Corpus”?  I need to spend some more time looking through the website, maybe!

-Adeena

 

 
  [ # 3 ]

I’ll look through my files, as well. Since my computer crashed back in October, I’ve simply not installed everything I once had, but I’m a “packrat” when it comes to downloaded files. I don’t remember exactly where I got the PHP version, but I do know it wasn’t from the WordNet site. I had to Google “WordNet PHP” to find it.

 

 
  [ # 4 ]

http://rtw.ml.cmu.edu/rtw/

http://openmind.media.mit.edu/

http://code.google.com/p/mindpix/source/browse/trunk/mpexport/mindpixels.txt

The last link to mindpixels.txt requires transforming questions (with scores of 1, say) from the form: “Is X Y?” or “Does X Y?” (or “Can X Y?”) to “X is Y”, or “X Verb Y”, (or “X can Y”,) etc.

This post shows how I’m trying to solve that particular problem, using a grammar parser to segment X and Y ...

This type of question transformation reminds me of my linguistic training:

“A typical transformation is the rule for forming questions, which requires that the normal subject—verb order is inverted so that the surface structure of Can I see you later? differs in order of elements from that of I can see you later.” (from http://www.encyclopedia.com/topic/transformational-generative_grammar.aspx)
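
The question-to-statement step can be roughed out without a parser for the simplest cases. This is only a minimal sketch, not the grammar-parser approach mentioned above: it assumes the subject X is a single word with an optional article, which will fail on longer subjects. The function name and the pattern are made up for illustration.

```python
import re

# Yes/no questions that open with one of these auxiliaries are handled.
# Assumption: the subject is one word, optionally preceded by an article.
_QUESTION = re.compile(
    r"^(is|are|does|do|can)\s+((?:(?:a|an|the)\s+)?\S+)\s+(.+?)\?*$",
    re.IGNORECASE,
)


def question_to_statement(question):
    """Turn 'Is X Y?' into 'X is Y'; return None if the pattern does not fit."""
    match = _QUESTION.match(question.strip())
    if match is None:
        return None
    aux, subject, rest = match.groups()
    aux = aux.lower()
    rest = rest.strip()
    if aux in ("does", "do"):
        # "Does X Y?" -> "X Y"; verb agreement is not repaired here.
        return f"{subject} {rest}"
    # "Is X Y?" -> "X is Y", "Can X Y?" -> "X can Y"
    return f"{subject} {aux} {rest}"


print(question_to_statement("Is a cat an animal?"))
print(question_to_statement("Can birds fly?"))
```

Anything the pattern cannot handle comes back as None, so those questions can be set aside for a real parser instead of being mangled.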

 

 
  [ # 5 ]

You could also extract this information from one of the DBPedia datasets (http://wiki.dbpedia.org/Datasets) although it would take some time to get familiar with the structure and then write a script to extract it into a spreadsheet.  Dave’s generous offer looks like a pretty attractive alternative!
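
For what it’s worth, such a script might look roughly like this. It is a sketch that assumes the dataset is a file of N-Triples lines and that the “is a” facts are the rdf:type statements; the two sample lines are invented for illustration and are not copied from a real DBPedia dump, so check the actual files before relying on it.

```python
import csv
import io
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def local_name(iri):
    """Return the part of an IRI after the last '/' or '#', underscores as spaces."""
    return re.split(r"[/#]", iri)[-1].replace("_", " ")


def isa_rows(lines):
    """Yield (thing, 'is a', type) for every rdf:type statement in the input."""
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            yield local_name(match.group(1)), "is a", local_name(match.group(3))


def to_csv(lines):
    """Write the is-a rows as CSV text that a spreadsheet can import."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(isa_rows(lines))
    return out.getvalue()


# Invented sample input, for illustration only.
SAMPLE = """\
<http://dbpedia.org/resource/Cat> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Animal> .
<http://dbpedia.org/resource/Cat> <http://www.w3.org/2000/01/rdf-schema#label> "Cat"@en .
"""

print(to_csv(SAMPLE.splitlines()))
```

Lines whose object is a literal (like the label line) are skipped, so only the type statements end up in the spreadsheet.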

 

 
  [ # 6 ]

And yet I seem to have mis-spoken a small amount in my earlier post. It wasn’t a PHP version of WordNet, but a MySQL version. I also found that the WordNet database is, indeed, no longer on the dev server, but I found the files to install it, so am doing so at this moment. From there, I’ll examine the structure, and see about extracting the necessary data.

 

 
  [ # 7 ]

Thanks everyone…
Definitely thanks for the mindpixel txt file.  I knew of the project in general, but didn’t realize this file was available. That helps a lot. 
And Dave… let me know what you come up with.  I’m MySQL knowledgeable, too, so if it’s something reasonably simple that you could pass on (since I don’t want to make you do a pile of my work for me!), I’d appreciate it or any help you can give!

:D

-Adeena

 

 
  [ # 8 ]

Well, I was able to get the WN database installed again, and that caused me to remember why I gave up on the project.

Being self-taught, as far as MySQL goes, I’ve got huge gaps in my experience, and the structure and layout of the WordNet database is very nearly incomprehensible to me. I mean I can read the contents of the various tables easily enough, but “complex” SQL statements that pull data from multiple tables using joins are a bit above my pay-grade, I’m afraid.

I did, however, find the website where I obtained the files to install WordNet as a database, at http://wnsql.sourceforge.net/ - there are both MySQL and PostgreSQL versions there. Also, there are some projects on Princeton’s Related Projects Page that you may find interesting/useful.

I’ll still keep poking at this, but it’s a lot more complex than I really have the skills for. :(
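
For anyone in the same boat, the kind of multi-table join involved can be shown on a toy database. This is only a sketch: the table and column names (words, senses, semlinks, and so on) are assumptions made up for the example, not a description of the real WordNet SQL schema, which would need to be inspected first. It uses Python’s built-in sqlite3 so it runs without a database server.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with a hypothetical WordNet-like layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE words    (wordid INTEGER PRIMARY KEY, lemma TEXT);
        CREATE TABLE senses   (wordid INTEGER, synsetid INTEGER);
        CREATE TABLE semlinks (synset1id INTEGER, synset2id INTEGER, link TEXT);
        INSERT INTO words    VALUES (1, 'cat'), (2, 'dog'), (3, 'animal');
        INSERT INTO senses   VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO semlinks VALUES (10, 30, 'hypernym'), (20, 30, 'hypernym');
    """)
    return db


# Walk word -> sense -> link -> sense -> word to get "child is a parent" pairs.
ISA_QUERY = """
    SELECT child.lemma, parent.lemma
    FROM semlinks AS l
    JOIN senses AS cs     ON cs.synsetid   = l.synset1id
    JOIN words  AS child  ON child.wordid  = cs.wordid
    JOIN senses AS ps     ON ps.synsetid   = l.synset2id
    JOIN words  AS parent ON parent.wordid = ps.wordid
    WHERE l.link = 'hypernym'
    ORDER BY child.lemma
"""


def isa_pairs(db):
    """Return a list of (word, broader word) tuples."""
    return db.execute(ISA_QUERY).fetchall()


for word, parent in isa_pairs(build_toy_db()):
    print(f"{word} is a {parent}")
```

The words table is joined twice under two aliases, once for each end of the link. That double join is usually the part that makes these queries look harder than they are.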

 

 
  [ # 9 ]

There’s also an mssql version of wordnet, which you can download from my download page (pretty big) (http://janbogaerts.name/index.php/downloads/).
The database itself doesn’t contain the queries; they are compiled into my application, but I can extract a couple of them if you want. (I figured out the structure by staring at the data for a long time.)
There’s also framenet and verbnet, which you can use as resources, but I’m not certain if they will be as helpful as some of the other sources.

 

 
  [ # 10 ]

Hey Jan,

So I grabbed the wordnet.zip from your website link.
Could you extract a couple queries?  I’m looking at the tables (actually in MS SQL Server Management Studio), and I can’t quite make sense of it…  I don’t know what table goes with what…

Thanks!

-Adeena

 

 
  [ # 11 ]

I’ve got a number of good word lists on my site at http://wixml.net. You would probably find lists like Moby and Scowl particularly suitable.

 

 
  [ # 12 ]
Adeena Mignogna - Jan 14, 2012:

Hey Jan,

So I grabbed the wordnet.zip from your website link.
Could you extract a couple queries?  I’m looking at the tables (actually in MS SQL Server Management Studio), and I can’t quite make sense of it…  I don’t know what table goes with what…

Thanks!

-Adeena

I know what you mean, it also took me a while. Here’s a set of queries that I use with these tables:

File Attachments
queries.sql  (File Size: 6KB - Downloads: 0)
 

 
  [ # 13 ]

Hey Andrew - thanks for the link!  I’m looking at it right now…

Hey Jan - thanks for the sql… but… argh.  For some reason I can’t download the file.  I click on the link and am greeted with a blank page.  Alternatively, I right click to do a “save as” and am asked to save “download.htm”.  I’ve tried it in multiple browsers…

-A

 

 
  [ # 14 ]

I downloaded YAWL.  If YAWL had a second column that said “verb”, “noun”, “adjective”, etc… it would be 80% of what I’m looking for!  :D

-Adeena

 

 
  [ # 15 ]
Adeena Mignogna - Jan 15, 2012:

I downloaded YAWL.  If YAWL had a second column that said “verb”, “noun”, “adjective”, etc… it would be 80% of what I’m looking for!  :D

-Adeena

It would be easiest to get that from the Moby word lists. http://wixml.net/moby.html

You could download the original package and use the part of speech file or you could download the version that I’ve corrected and converted to SQL. You can get the list that you want from the SQL database with a query like this:

SET SEARCH_PATH TO import_moby;

-- One row per word and word class, sorted by word, then rank, then class.
SELECT w.fText AS "word", c.fName[1] AS "class", wc.fRank
FROM tWord w, tClass c, tWordClass wc
WHERE wc.pWord = w.kWord AND wc.pClass = c.kClass
ORDER BY 1, 3, 2;

I also saved the output of that query as a comma separated values file which you can import directly to a spreadsheet, after unzipping it. You can download it from my server here (it was too big to add as a file attachment to this post):

http://asmith.id.au/files/moby-words-pos.zip

As there are over two hundred and seventy thousand words in that list, don’t expect to be able to handle them easily in a spreadsheet. The reason that I’ve been converting all these word lists to PostgreSQL is that it can easily handle such huge amounts of data.
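
If the full list is too much for a spreadsheet, one option is to cut it down first. Here is a minimal sketch that streams a CSV and keeps only one word class. It assumes the columns come in the same order as the query output (word, class, rank) and that the class names look something like “Noun”; check both against the actual file. The function name is made up for illustration.

```python
import csv
import io


def filter_by_class(src, dst, wanted_class):
    """Copy (word, class) for rows whose class matches; return how many were kept.

    src and dst are open text files (or anything file-like). Rows are read one
    at a time, so the whole list never has to fit in memory.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = 0
    for row in reader:
        # Assumed column order: word, class, rank.
        if len(row) >= 2 and row[1].strip().lower() == wanted_class.lower():
            writer.writerow(row[:2])
            kept += 1
    return kept


# Tiny made-up sample standing in for the real file.
sample = io.StringIO("cat,Noun,1\nrun,Verb,1\ndog,Noun,1\n")
nouns = io.StringIO()
print(filter_by_class(sample, nouns, "noun"), "rows kept")
print(nouns.getvalue())
```

With real files you would pass `open("in.csv", newline="")` and `open("nouns.csv", "w", newline="")` in place of the StringIO objects.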

Hope this helps.

 

 
