Getting started with nltk
Remarks#
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
The book
Natural Language Processing with Python provides a practical introduction to programming for language processing. Written by the creators of NLTK, it guides the reader through the fundamentals of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure, and more. The book is being updated for Python 3 and NLTK 3. (The original Python 2 version is still available at https://nltk.org/book_1ed .)
Versions#
NLTK Version History
Version | Release Date |
---|---|
3.2.4 (latest) | 2017-05-21 |
3.2 | 2016-03-03 |
3.1 | 2015-10-15 |
With NLTK
You can use NLTK (especially, the nltk.tokenize
package) to perform sentence boundary detection:
import nltk
text = "This is a test. Let's try this sentence boundary detector."
text_output = nltk.tokenize.sent_tokenize(text)
print('text_output: {0}'.format(text_output))
Output:
text_output: ['This is a test.', "Let's try this sentence boundary detector."]
Installation or Setup
NLTK requires Python
versions 2.7 or 3.4+.
These instructions consider python
version - 3.5
-
Mac/Unix :
- Install NLTK: run
sudo pip install -U nltk
- Install Numpy (optional): run
sudo pip install -U numpy
- Test installation: run
python
then typeimport nltk
NOTE : For older versions of Python it might be necessary to install setuptools (see https://pypi.python.org/pypi/setuptools) and to install pip (sudo easy_install pip).
- Install NLTK: run
-
Windows :
These instructions assume that you do not already have Python installed on your machine.
32-bit binary installation
- Install Python 3.5: https://www.python.org/downloads/ (avoid the 64-bit versions)
- Install Numpy (optional): https://sourceforge.net/projects/numpy/files/NumPy/ (the version that specifies pythnon3.5)
- Install NLTK: https://pypi.python.org/pypi/nltk
- Test installation:
Start>Python35
, then typeimport nltk
-
Installing Third-Party Software :
Please see: https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
Reference : https://www.nltk.org/install.html
NLTK’s download function
You can install NLTK over pip
(pip install nltk
).After it is installed, many components will not be present, and you will not be able to use some of NLTK’s features.
From your Python shell, run the function ntlk.download()
to select which additional packages you want to install using UI. Alternatively, you can use python -m nltk.downloader [package_name]
.
-
To download all packages available.
nltk.download(‘all’)
-
To download specific package.
nltk.download(‘package-name’)
-
To download all packages of specific folder.
import nltk
dwlr = nltk.downloader.Downloader()
chunkers, corpora, grammars, help, misc,
models, sentiment, stemmers, taggers, tokenizers
for pkg in dwlr.packages(): if pkg.subdir== ‘taggers’: dwlr.download(pkg.id)
-
To download all packages except Corpora Folder.
import nltk
dwlr = nltk.downloader.Downloader()
for pkg in dwlr.corpora(): dwlr._status_cache[pkg.id] = ‘installed’
dwlr.download(‘all’)
NLTK installation with Conda.
To install NLTK with Continuum’s anaconda
/ conda
.
If you are using Anaconda, most probably nltk would be already downloaded in the root (though you may still need to download various packages manually).
Using conda
:
conda install nltk
To upgrade nltk
using conda
:
conda update nltk
With anaconda
:
If you are using multiple python envriroments in anaconda, first activate the enviroment where you want to install nltk. You can check the active enviroment using the command
conda info --envs
The enviroment with the * sign before the directory path is the active one. To change the active enviroment use
activate <python_version>
for eg. activate python3.5
Now check the list of packages installed in this enviroment using commnad
conda list
If you dont find ‘nltk’ in the list, use
conda install -c anaconda nltk=3.2.1
For further information, you may consult https://anaconda.org/anaconda/nltk.
To install mini-conda a.k.a. conda
: https://conda.pydata.org/docs/install/quick.html
To install anaconda
: https://docs.continuum.io/anaconda/install