I’m confused about where to look for this missing ‘z’. I’m looking at the text files of WordNet included with NLTK. Searching index.noun, the letter ‘z’ is present and accounted for, between ‘yves_tanguy’ and ‘z-axis’.
If I look up roman_alphabet (indexed as 06497872) and then check that index in data.noun, I find an entry as follows:
06497872 10 n 02 Roman_alphabet 0 Latin_alphabet 0 028 @ 06497459 n 0000 @ 06825863 n 0000 %m 06831177 n 0000 %m 06831284 n 0000 %m 06831391 n 0000 %m 06831498 n 0000 %m 06831605 n 0000 %m 06831712 n 0000 %m 06831819 n 0000 %m 06831926 n 0000 %m 06832033 n 0000 %m 06832140 n 0000 %m 06832248 n 0000 %m 06832356 n 0000 %m 06832464 n 0000 %m 06832572 n 0000 %m 06832680 n 0000 %m 06832788 n 0000 %m 06832896 n 0000 %m 06833004 n 0000 %m 06833112 n 0000 %m 06833220 n 0000 %m 06833328 n 0000 %m 06833436 n 0000 %m 06833544 n 0000 %m 06833663 n 0000 %m 06833776 n 0000 %m 06833890 n 0000 | the alphabet evolved by the ancient Romans which serves for writing most of the languages of western Europe
The offset 06833890 corresponds to ‘z’ and 06831177 corresponds to ‘a’, and the entry carries 26 member-meronym (%m) pointers in total, so it looks like ‘z’ is included in my version.
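If anyone wants to check their own copy without counting pointers by hand, here is a rough sketch that splits a data.noun line into its fields. It assumes the layout I understand the WordNet data files to use (offset, lexicographer file number, part of speech, a two-digit hex word count, word/lex_id pairs, a three-digit decimal pointer count, then four-token pointers, then the gloss after the bar). The function name parse_data_line and the SAMPLE string are my own; SAMPLE is a shortened version of the entry above with the pointer count changed to 003 so that it stays readable.

```python
def parse_data_line(line):
    """Split one line of a WordNet data.<pos> file into its main fields."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, ss_type = tokens[0], tokens[2]
    w_cnt = int(tokens[3], 16)           # word count is written in hex
    pos = 4
    # Words alternate with their lex_id, so take every second token.
    words = [tokens[pos + 2 * i] for i in range(w_cnt)]
    pos += 2 * w_cnt
    p_cnt = int(tokens[pos])             # pointer count is written in decimal
    pos += 1
    pointers = []
    for i in range(p_cnt):
        start = pos + 4 * i
        symbol, target, target_pos, _source_target = tokens[start:start + 4]
        pointers.append((symbol, target, target_pos))
    return {
        "offset": offset,
        "pos": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# Shortened illustration of the Roman_alphabet entry (only 3 of its 28 pointers).
SAMPLE = (
    "06497872 10 n 02 Roman_alphabet 0 Latin_alphabet 0 003 "
    "@ 06497459 n 0000 %m 06831177 n 0000 %m 06833890 n 0000 "
    "| the alphabet evolved by the ancient Romans"
)

entry = parse_data_line(SAMPLE)
# Keep only the member meronyms (%m), i.e. the letters of the alphabet.
members = [target for symbol, target, _ in entry["pointers"] if symbol == "%m"]
```

Run against the full line from data.noun, members should contain one offset per letter, so checking whether 06833890 is in the list tells you whether ‘z’ is there.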
Now on to antonyms. Some antonyms are only defined at the level of lemmas. Here’s an example:
>>> from nltk.corpus import wordnet as wn
>>> alls = wn.synsets('all')
>>> alls[0].antonyms()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Synset' object has no attribute 'antonyms'
>>> alls[0].lemmas[0].antonyms()
[Lemma('no.a.01.no'), Lemma('some.a.01.some')]
As you can see, asking the synset for its antonyms fails, while asking one of its lemmas succeeds. Could this be why you’re not finding those antonyms? Here’s what I found for your examples (including all synsets and all lemmas):
all: no, some, partially
before: (none found); after: (none found)
create: (none found); destroy: (none found)
happy: unhappy
in: (none found); out: safe (an odd one; perhaps by way of “knocked out” or “forbidden”?)
loud: soft, piano, softly
neat: (none found); messy: (none found)
over: (none found); under: (none found)
real: unreal, nominal, insubstantial
right-side-up: (none found); upside-down: (none found)
tough: tender
ancient: (none found); modern: old style, nonmodern
every: (none found); none: (none found)
dumb: (none found); smart: stupid
crazy: (none found); sane: insane
behind: (none found); ahead: back, backward
arrive: leave
any: (none found); none: (none found)
The code snippet I wrote to search for these (if anyone’s interested) is as follows:
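# For input word 'wd' (example value shown)
from nltk.corpus import wordnet as wn
wd = 'loud'
# Gather every lemma of every synset of the word.
# Note: in NLTK 3+, lemmas() and name() are methods; older releases used attributes.
lemlist = []
for syn in wn.synsets(wd):
    lemlist.extend(syn.lemmas())
# Antonyms are attached to lemmas, not synsets.
antlist = []
for lem in lemlist:
    antlist.extend(lem.antonyms())
antwords = [ant.name() for ant in antlist]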
I imagine more antonyms could be found by expanding the base of synonyms. For example, if I include hyponyms in my search for “create”, I find the antonym “disassemble”. However, out of 58 hyponyms (and their 129 lemmas), that’s the only antonym that comes up. So clearly most synsets have not been tied to antonyms.
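For what it’s worth, the hyponym expansion can be sketched roughly like this. The function name antonym_names and the expand flag are my own, and I’m assuming only direct hyponyms are added (one level down), using the NLTK 3 method-style API (lemmas(), hyponyms(), antonyms(), name()). Treat it as an illustration, not a tuned search.

```python
def antonym_names(synsets, expand=False):
    """Collect antonym lemma names for the given synsets.

    With expand=True, the direct hyponyms of each synset are searched too.
    """
    pool = list(synsets)
    if expand:
        for syn in synsets:
            pool.extend(syn.hyponyms())
    names = []
    for syn in pool:
        for lemma in syn.lemmas():
            for ant in lemma.antonyms():
                # Keep first-seen order and drop repeats.
                if ant.name() not in names:
                    names.append(ant.name())
    return names


# Usage (requires NLTK with the WordNet corpus installed):
#   from nltk.corpus import wordnet as wn
#   print(antonym_names(wn.synsets('create')))
#   print(antonym_names(wn.synsets('create'), expand=True))
```

Going further down the hyponym tree, or sideways through similar-to links, would widen the net, at the cost of pulling in antonyms that are further removed from the original word.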