Saturday, September 23, 2006
Natural Language Toolkit for Python
This is just something I noticed while the re module has been driving me crazy. The download is almost 25 megs, probably more, because part of it is optional....
Version 0.6.5 (May)
Extended Shoebox support
Simple n-gram language modeling
Chinese treebank sample and corpus reader
Streamline Windows corpus installation
Words tutorial: Pywordnet discussion
Advanced programming tutorial: recommended methods for saving models (pickle?, shelve?); reading and writing XML; formatting output, tabulation; urllib
Parsing tutorial: discuss parse tree navigation
Version 0.7 (June)
Improved corpus modeling and interfaces
Feature-based grammars and parsers
Unit testing
Change corpora.basedir to be a search path
...
Unprioritised
Software
fix misc.quicksort
Marshalling
interpolated and backoff language models
port NLTK feature detectors and classifiers
integrate more student projects (incl TAG, textcat, paradigms)
add sequence values to FeatureStructure
decision list classifier
add OLAC support: read and write an OLAC static repository
port MXTerminator (incl James' fixes)
collocation support (chi-sq, PMI, Spearman rank correlation, etc)
new material on data modelling (interlinear text, paradigms)
maxent package and tutorial
lexical semantics
good off-the-shelf tagger and chunker
more contributed materials
material on writing (adapting?) a corpus reader
information extraction (e.g. from biomedical literature)
regular expressions for extracting temporal expressions
coverage of other languages
Corpora
port more NLTK corpus readers
more LDC corpus samples
Hindi materials?
add Li & Roth question classification data
SRL corpus and reader
Teaching materials
single PDF aggregate
collect all references into a final bibliography
instructor's manual?
produce latex-beamer slides for tutorials
more non-English examples and exercises throughout
Housekeeping
get epydoc docstrings to compile cleanly
Unicode compliance
permit multiple corpus locations; allow corpus readers to be pointed at user's local files
check graphical demos on Windows machines (add cf.mainloop()?)
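Two of the items above, simple n-gram language modeling and saving models with pickle, are easy to try out in plain Python. What follows is only a rough sketch under my own assumptions: it uses the standard library only, it is written for a current Python 3, and the names train_bigram_model and bigram_prob are made up for illustration. None of it is the toolkit's actual API.

```python
import pickle
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][cur] / total if total else 0.0


tokens = "the cat sat on the mat".split()
model = train_bigram_model(tokens)

# Round-trip the model through pickle, as one would when saving it to disk.
# Converting to a plain dict first keeps the saved object simple.
restored = pickle.loads(pickle.dumps(dict(model)))

print(bigram_prob(restored, "the", "cat"))
```

To keep the model on disk, pickle.dump and pickle.load with a file opened in binary mode do the same job as the in-memory round trip here. shelve would be the other option the roadmap mentions; I have not compared the two.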