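This document describes how to install Scrapy on Linux, Windows and Mac OS X systems. The installation consists of three steps: installing Python, installing the required third-party libraries (Twisted and libxml2, plus the optional pyOpenSSL and simplejson), and installing Scrapy itself.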
Scrapy works with Python 2.5 or 2.6. You can get it at http://www.python.org/download/
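To confirm that the interpreter on your path is one of the versions listed above, you can compare sys.version_info against them. This is only a minimal sketch: the helper name is_supported and the SUPPORTED list are made up for this example and are not part of Scrapy.

```python
import sys

# (major, minor) pairs this guide says Scrapy works with.
SUPPORTED = [(2, 5), (2, 6)]


def is_supported(version_info):
    """Return True if the (major, minor) pair is one this guide lists.

    Hypothetical helper for illustration only, not a Scrapy API.
    """
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    if is_supported(sys.version_info):
        print("Python %d.%d is supported" % current)
    else:
        print("Python %d.%d is not listed as supported" % current)
```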
The procedure for installing the required third-party libraries depends on the platform and operating system you use.
If you’re running Ubuntu/Debian Linux, run the following command as root:
apt-get install python-twisted python-libxml2
To install optional libraries:
apt-get install python-pyopenssl python-simplejson
If you are running Arch Linux, run the following command as root:
pacman -S twisted libxml2
To install optional libraries:
pacman -S pyopenssl python-simplejson
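Whichever package manager you used, you may want to check that the libraries can actually be imported by the Python interpreter you plan to run Scrapy with. The sketch below assumes the usual import names (twisted, libxml2, OpenSSL for pyOpenSSL, and simplejson); check_modules is a hypothetical helper written for this example.

```python
# Import names (not package names) of the libraries this guide installs.
REQUIRED = ["twisted", "libxml2"]
OPTIONAL = ["OpenSSL", "simplejson"]  # pyOpenSSL is imported as "OpenSSL"


def check_modules(names):
    """Map each module name to True if it can be imported, else False.

    Hypothetical helper for illustration only, not a Scrapy API.
    """
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for label, names in [("required", REQUIRED), ("optional", OPTIONAL)]:
        for name, found in sorted(check_modules(names).items()):
            print("%s %s: %s" % (label, name, found and "ok" or "missing"))
```

A missing optional library is not fatal, but a missing required one means the previous step did not complete for this interpreter.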
First, download Twisted for Mac.
Mac OS X ships a libxml2 version that is too old to be used by Scrapy, and, judging from reports on the web, installing a newer libxml2 on Mac OS X is a bit of a challenge. Here is one way to do it, though it is not an acceptable solution in the long run:
Fetch the following libxml2 and libxslt packages:
Extract, build and install them both with:
./configure --with-python=/Library/Frameworks/Python.framework/Versions/2.5/
make
sudo make install
Replace /Library/Frameworks/Python.framework/Versions/2.5/ with your current Python framework location.
Install the libxml2 Python bindings with:
cd libxml2-2.7.3/python
sudo make install
The libraries and modules should be installed in something like /usr/local/lib/python2.5/site-packages. Add it to your PYTHONPATH and you are done.
Check that the libxml2 library was installed properly with:
python -c 'import libxml2'
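If the import succeeds but you are not sure which copy of the library was picked up (the system one or the one you just built), you can look at the file the module was loaded from. The following is a small sketch; module_location is a hypothetical helper, and it returns None both for modules that cannot be imported and for built-in modules that have no file.

```python
def module_location(name):
    """Return the file a module was loaded from, or None if unavailable.

    Hypothetical helper for illustration only, not a Scrapy API.
    """
    try:
        module = __import__(name)
    except ImportError:
        return None
    # Built-in modules (such as sys) have no __file__ attribute.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    location = module_location("libxml2")
    if location is None:
        print("libxml2 could not be imported")
    else:
        print("libxml2 loaded from %s" % location)
```

If the printed path is not under the directory you added to PYTHONPATH, the older system copy is probably still being found first.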
Download and install:
There are three ways to download and install Scrapy:
Download Scrapy from the Download page. Scrapy is distributed in two ways: a source code tarball (for Unix and Mac OS X systems) and a Windows installer (for Windows). If you downloaded the tarball, you can install it like any other Python package using setup.py:
tar zxf scrapy-X.X.X.tar.gz
cd scrapy-X.X.X
python setup.py install
If you downloaded the Windows installer, just run it.
Warning
On Windows, you may need to add the C:\Python25\Scripts (or C:\Python26\Scripts) folder to the system path by adding that directory to the PATH environment variable from the Control Panel.
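To see whether the Scripts folder is already on your PATH, you can split the variable on semicolons and compare the entries case-insensitively. This is a sketch under those assumptions; path_contains is a hypothetical helper, and it uses the standard ntpath module so that the Windows comparison rules apply regardless of where the script runs.

```python
import ntpath
import os


def path_contains(path_value, directory):
    """Return True if directory is one of the entries in a Windows PATH string.

    Hypothetical helper for illustration only. Comparison ignores case,
    slash direction and trailing backslashes.
    """
    wanted = ntpath.normcase(directory).rstrip("\\")
    for entry in path_value.split(";"):
        if ntpath.normcase(entry.strip()).rstrip("\\") == wanted:
            return True
    return False


if __name__ == "__main__":
    scripts_dir = r"C:\Python25\Scripts"  # adjust to your Python version
    if path_contains(os.environ.get("PATH", ""), scripts_dir):
        print("%s is on PATH" % scripts_dir)
    else:
        print("%s is not on PATH" % scripts_dir)
```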
You can install Scrapy by running easy_install like this:
easy_install -U scrapy
Note
If you use the development version of Scrapy, you should subscribe to the mailing lists to get notified of any changes to the API.
Check out the latest development code from the Mercurial repository (you need to install Mercurial first):
hg clone http://hg.scrapy.org/scrapy scrapy-trunk
Add Scrapy to your Python path
If you’re on Linux, Mac or any Unix-like system, you can create a symbolic link to the scrapy package inside your system site-packages directory like this:
ln -s /path/to/scrapy-trunk/scrapy SITE-PACKAGES/scrapy
Where SITE-PACKAGES is the location of your system site-packages directory. To find it, run the following:
python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"
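The same lookup can be wrapped in a small script that also prints where the symbolic link would go. This is a sketch, not part of Scrapy: site_packages_dir and scrapy_link_path are hypothetical helpers, and the sysconfig module used first only exists on Python 2.7 / 3.2 and later (outside the versions this guide targets), so the code falls back to the distutils call shown above. On some distributions the two calls may report different directories.

```python
import os


def site_packages_dir():
    """Return the interpreter's site-packages directory as a string."""
    try:
        # sysconfig is only available on Python 2.7 / 3.2 and later.
        import sysconfig
        return sysconfig.get_path("purelib")
    except ImportError:
        # Older interpreters: the same call as the one-liner in this guide.
        from distutils.sysconfig import get_python_lib
        return get_python_lib()


def scrapy_link_path(site_packages):
    """Return the path the scrapy symlink would have (no file system access)."""
    return os.path.join(site_packages, "scrapy")


if __name__ == "__main__":
    target = site_packages_dir()
    print("site-packages: %s" % target)
    print("link would be created at: %s" % scrapy_link_path(target))
```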
Alternatively, you can define your PYTHONPATH environment variable so that it includes the scrapy-trunk directory. This solution also works on Windows systems, which don’t support symbolic links. (Environment variables can be defined on Windows systems from the Control Panel.)
Unix-like example:
export PYTHONPATH=/path/to/scrapy-trunk
Windows example (from command line, but you should probably use the Control Panel):
set PYTHONPATH=C:\path\to\scrapy-trunk
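After setting PYTHONPATH, you can check from a new Python session that the directory really ended up on the module search path. The sketch below compares normalized absolute paths; is_on_path is a hypothetical helper, and /path/to/scrapy-trunk is the same placeholder used above.

```python
import os
import sys


def is_on_path(directory, search_path=None):
    """Return True if directory is one of the entries in search_path.

    Hypothetical helper for illustration only. search_path defaults to
    sys.path; entries are compared as normalized absolute paths.
    """
    if search_path is None:
        search_path = sys.path
    wanted = os.path.normcase(os.path.abspath(directory))
    for entry in search_path:
        if os.path.normcase(os.path.abspath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    checkout = "/path/to/scrapy-trunk"  # placeholder, use your own checkout
    print("%s on sys.path: %s" % (checkout, is_on_path(checkout)))
```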
Make the scrapy-ctl.py script available
On Unix-like systems, create a symbolic link to the file scrapy-trunk/bin/scrapy-ctl.py in a directory on your system path, such as /usr/local/bin. For example:
ln -s `pwd`/scrapy-trunk/bin/scrapy-ctl.py /usr/local/bin
This simply lets you type scrapy-ctl.py from within any directory, rather than having to qualify the command with the full path to the file.
On Windows systems, the same result can be achieved by copying the file scrapy-trunk/bin/scrapy-ctl.py to somewhere on your system path, for example C:\Python25\Scripts, which is customary for Python scripts.
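On either platform, you can verify that the script is reachable by searching the directories listed in PATH for a file with that name. This is a minimal sketch; find_on_path is a hypothetical helper, and it only checks that the file exists, not that it is executable.

```python
import os


def find_on_path(filename, path_value=None):
    """Return the first PATH entry containing filename, joined with it.

    Hypothetical helper for illustration only. path_value defaults to the
    PATH environment variable; returns None if the file is not found.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, filename)
        # Skip empty PATH entries and anything that is not a regular file.
        if directory and os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_on_path("scrapy-ctl.py")
    if found is None:
        print("scrapy-ctl.py was not found on PATH")
    else:
        print("scrapy-ctl.py found at %s" % found)
```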