LinkChecker
To check a URL like http://www.example.org/ it is enough to type linkchecker www.example.org on the command line or type www.example.org in the GUI application. This will check the complete domain of http://www.example.org recursively. All links pointing outside of the domain are also checked for validity.
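The difference between links that are followed recursively and links that are only checked for validity can be sketched in Python. This is a minimal illustration, not LinkChecker's own code: the helper name `is_internal` and the rule "same host name means same domain" are assumptions.

```python
from urllib.parse import urlsplit

def is_internal(url, start_url):
    """Return True if url is on the same host as start_url.

    Hypothetical helper: comparing host names is an assumed rule for
    what counts as "inside the domain".
    """
    host = urlsplit(url).hostname          # lower-cased by urlsplit, None if absent
    start_host = urlsplit(start_url).hostname
    return host is not None and host == start_host

start = "http://www.example.org/"
for link in ("http://www.example.org/docs/", "http://other.example.com/"):
    action = "check and recurse" if is_internal(link, start) else "check only"
    print(link, "->", action)
```

Links without a host part, such as mailto: links, are treated as external by this sketch.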
All URLs have to pass a preliminary syntax test. Minor quoting mistakes will issue a warning, all other invalid syntax issues are errors. After the syntax check passes, the URL is queued for connection checking. All connection check types are described below.
HTTP links (http:, https:)
After connecting to the given HTTP server the given path or query is requested. All redirections are followed, and if user/password is given it will be used as authorization when necessary. Permanently moved pages issue a warning. All final HTTP status codes other than 2xx are errors.
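The result rules for HTTP links can be written down as a small Python function. This is a sketch under stated assumptions, not LinkChecker's implementation: the function name `classify_http` is made up, the input is the list of status codes seen while following redirections (final code last), and treating 308 as "permanently moved" alongside 301 is an assumption.

```python
def classify_http(status_chain):
    """Classify a non-empty chain of HTTP status codes, final code last.

    Returns "error", "warning" or "valid".
    """
    final = status_chain[-1]
    # Any final status outside 2xx is an error.
    if not 200 <= final <= 299:
        return "error"
    # A permanent redirect on the way issues a warning.
    # 301 is "Moved Permanently"; including 308 is an assumption.
    if any(code in (301, 308) for code in status_chain[:-1]):
        return "warning"
    return "valid"

print(classify_http([200]))
print(classify_http([301, 200]))
print(classify_http([302, 404]))
```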
Local files (file:)
A regular, readable file that can be opened is valid. A readable directory is also valid. All other files, for example unreadable, non-existing or device files, are errors.
File contents are checked for recursion. If they are parseable files (for example HTML files), all links in that file will be checked.
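The validity rule for local paths could look roughly like this in Python. The function name `classify_local_path` is hypothetical, and `os.access` is used as an approximation of "readable and can be opened"; a real checker may open the file instead.

```python
import os
import stat

def classify_local_path(path):
    """Return "valid" for readable regular files and directories, else "error"."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return "error"  # non-existing or otherwise inaccessible
    # Device files, sockets, FIFOs and so on fall through to "error".
    if (stat.S_ISREG(mode) or stat.S_ISDIR(mode)) and os.access(path, os.R_OK):
        return "valid"
    return "error"

print(classify_local_path(os.getcwd()))
print(classify_local_path(os.path.join(os.getcwd(), "no-such-file-for-this-sketch")))
```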
Mail links (mailto:)
A mailto: link resolves to a list of email addresses. If one address fails the whole list will fail. For each mail address the following things are checked:
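The all-or-nothing rule for address lists can be sketched in Python as follows. The per-address check is passed in as a function; `looks_like_address` is only a placeholder and does not reproduce the checks LinkChecker performs. All names are hypothetical, header fields in the query part (such as cc) are ignored, and treating an empty address list as a failure is an assumption.

```python
from urllib.parse import urlsplit, unquote

def mailto_addresses(url):
    """Split the address part of a mailto: URL into individual addresses."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto: URL")
    return [unquote(a).strip() for a in parts.path.split(",") if a.strip()]

def check_mailto(url, check_address):
    """The whole link fails as soon as one address fails."""
    addresses = mailto_addresses(url)
    # Assumption: a link without any address counts as a failure.
    return bool(addresses) and all(check_address(a) for a in addresses)

def looks_like_address(addr):
    """Placeholder check: text on both sides of a single "@"."""
    local, sep, domain = addr.partition("@")
    return bool(local and sep and domain) and "@" not in domain

print(check_mailto("mailto:a@example.org,b@example.org", looks_like_address))
print(check_mailto("mailto:a@example.org,broken", looks_like_address))
```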
FTP links (ftp:)
For FTP links the following is checked:
The default user is anonymous, the default password is anonymous@.
Telnet links (telnet:)
A connection to the given telnet server is attempted and, if user/password are given, a login is tried.
NNTP links (news:, snews:, nntp:)
A connection to the given NNTP server is attempted. If a news group or article is specified, it will be requested from the server.
Ignored links (javascript:, etc.)
An ignored link will print a warning, but no error. No further checking will be made.
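A rough Python sketch of this behavior: the scheme of a URL decides whether it gets a warning and no further checks. The set below holds only a handful of example schemes, and the function name `classify_scheme` and its return values are hypothetical.

```python
from urllib.parse import urlsplit

# A few ignored schemes, for illustration only.
IGNORED_SCHEMES = {"javascript", "data", "irc", "ldap", "rsync"}

def classify_scheme(url):
    """Return "warning" for an ignored link, "check" for everything else."""
    scheme = urlsplit(url).scheme  # lower-cased by urlsplit
    if scheme in IGNORED_SCHEMES:
        return "warning"  # recognized but ignored: no further checking
    return "check"

print(classify_scheme("javascript:void(0)"))
print(classify_scheme("http://www.example.org/"))
```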
Here is the complete list of recognized, but ignored links. The most prominent of them are JavaScript links.
acap: (application configuration access protocol)
afs: (Andrew File System global file names)
chrome: (Mozilla specific)
cid: (content identifier)
clsid: (Microsoft specific)
data: (data)
dav: (dav)
fax: (fax)
find: (Mozilla specific)
gopher: (Gopher)
imap: (internet message access protocol)
irc: (internet relay chat)
isbn: (ISBN (int. book numbers))
javascript: (JavaScript)
ldap: (Lightweight Directory Access Protocol)
mailserver: (Access to data available from mail servers)
mid: (message identifier)
mms: (multimedia stream)
modem: (modem)
nfs: (network file system protocol)
opaquelocktoken: (opaquelocktoken)
pop: (Post Office Protocol v3)
prospero: (Prospero Directory Service)
rsync: (rsync protocol)
rtsp: (real time streaming protocol)
service: (service location)
shttp: (secure HTTP)
sip: (session initiation protocol)
tel: (telephone)
tip: (Transaction Internet Protocol)
tn3270: (Interactive 3270 emulation sessions)
vemmi: (versatile multimedia interface)
wais: (Wide Area Information Servers)
z39.50r: (Z39.50 Retrieval)
z39.50s: (Z39.50 Session)

Before descending recursively into a URL, it has to fulfill several conditions. The conditions are checked in this order:
The maximum recursion level must not be exceeded. It can be set with the --recursion-level command line option, the recursion level GUI option, or through the configuration file. The recursion level is unlimited by default.
The URL must not match the list of ignored URLs. This list can be set with the --ignore-url command line option or through the configuration file.

Note that the local and FTP directory recursion reads all files in that directory, not just a subset like index.htm*.
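The recursion level and ignore-URL conditions can be sketched as an ordered check in Python. This is an illustration only: the function name `may_recurse` and its parameters are hypothetical, `None` stands for the unlimited default, and treating ignore entries as regular expressions matched anywhere in the URL is an assumption.

```python
import re

def may_recurse(url, depth, max_depth=None, ignore_patterns=()):
    """Check two recursion conditions in order; return True if both hold.

    depth is the recursion level of url, with the start URL at level 0.
    """
    # Condition: the recursion level must not be exceeded (None = unlimited).
    if max_depth is not None and depth >= max_depth:
        return False
    # Condition: the URL must not match any ignore pattern.
    if any(re.search(pattern, url) for pattern in ignore_patterns):
        return False
    return True

print(may_recurse("http://www.example.org/", 5))
print(may_recurse("http://www.example.org/", 2, max_depth=2))
print(may_recurse("http://www.example.org/private/x", 0, ignore_patterns=[r"/private/"]))
```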
Each user can edit a configuration file with advanced options for checking or filtering.
On Unix or OS X systems the user configuration file is at
~/.linkchecker/linkcheckerrc
On Windows the user configuration file is at
%HOMEPATH%\.linkchecker\linkcheckerrc
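A script that needs to locate the user configuration file could compute the path as in the following Python sketch. The function name `user_config_path` is hypothetical, and resolving ~ through the HOME environment variable is an assumption. The platform string and environment are passed in explicitly so both variants can be tried on any system.

```python
import ntpath
import posixpath

def user_config_path(platform, environ):
    """Build the user configuration file path for the given platform.

    platform is a value like sys.platform; environ is a mapping like os.environ.
    """
    if platform.startswith("win"):
        # %HOMEPATH%\.linkchecker\linkcheckerrc
        return ntpath.join(environ["HOMEPATH"], ".linkchecker", "linkcheckerrc")
    # ~/.linkchecker/linkcheckerrc, with ~ taken from HOME (assumption)
    return posixpath.join(environ["HOME"], ".linkchecker", "linkcheckerrc")

print(user_config_path("linux", {"HOME": "/home/alice"}))
print(user_config_path("win32", {"HOMEPATH": r"\Users\alice"}))
```

In a real script the arguments would typically be `sys.platform` and `os.environ`.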