wget
By default wget downloads into the current directory and creates subdirectories as needed
wget [options] url
wget [options] -i <file_with_list_of_urls>
wget [options] -F -B <url_to_use_for_rel_links> -i <html_file_with_urls>
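The -i form can be sketched like this (urls.txt and the URLs are made up; the wget call is commented out because it needs network access):

```shell
# build a list of URLs, one per line (these URLs are placeholders)
cat > urls.txt <<'EOF'
http://example.com/a.html
http://example.com/b.html
EOF

# fetch every URL in the list (uncomment to run; requires network)
# wget -i urls.txt
```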
useful wget command options
-r recursive
-l <n> recursion depth - fetch the start page plus <n> levels of links (default=5)
-p get all files needed to show the file
-k convert links - make links point to downloaded local files
-np no-parent - only follow links at or below the directory of the requested url
-L follow relative links only
-H span hosts - follow recursive links to other hosts
-K keep original versions of files (before -k option is done)
-E save HTML files that lack the suffix (e.g. .asp pages) with an .html suffix
-nH no host dirs - do not create a top-level directory named after the host in the url
-O <f> use file <f> as output file
-o <f> use file <f> as logfile (overwrite)
-a <f> use file <f> as logfile (append)
-nd put all files in current directory (do not create subdirs)
-P <d> save all to directory <d>
-A <l> <l> is comma separated list of suffixes or patterns to accept
-R <l> <l> is comma separated list of suffixes or patterns to reject
--domains=<l> <l> is comma separated list of domains to follow
--exclude-domains=<l> <l> is comma separated list of domains to NOT follow
--follow-ftp follow ftp links from html pages
--http-user=user for pages that require login
--http-password=password for pages that require login (--http-passwd in older wget)
--no-check-certificate accept bad certificates
--load-cookies <f> use cookies from file <f>
--save-cookies <f> save new cookies to file <f>
NOTE: wget needs cookies in Netscape text format; browsers keep theirs elsewhere, e.g.
~/.mozilla/firefox/5k9udbec.default/cookies.sqlite, and they must be exported first
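A minimal sketch of the cookie workflow (the domain, cookie name, and value below are made up):

```shell
# wget wants Netscape-format cookies: tab-separated fields
# domain, include-subdomains, path, secure, expiry, name, value
cat > cookies.txt <<'EOF'
# Netscape HTTP Cookie File
.example.com	TRUE	/	FALSE	0	session	abc123
EOF

# then reuse (and update) the cookies on a protected page (requires network):
# wget --load-cookies cookies.txt --save-cookies cookies.txt <url>
```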
EXAMPLES
--------
to get a page with all content:
wget -p -k <url>
ln -s <url> . # strip http:// from <url> first - links wget's local copy into current dir
to get a page recursively with all linked pages on same server <n> levels deep:
wget -r -p -k -l <n> <url>
ln -s <url> . # strip http:// from <url> first - links wget's local copy into current dir
to do this with all files in a new local subdirectory named ./foo add:
-nH --directory-prefix=foo
To do this but remove <m> levels of subdirectory names add:
--cut-dirs=<m>
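How -nH and --cut-dirs change the local path, sketched with a made-up URL (the wget call is commented out; it requires network access):

```shell
# for files under http://example.com/a/b/c/ :
#   (default)                -> example.com/a/b/c/file.html
#   -nH                      -> a/b/c/file.html
#   -nH --cut-dirs=2         -> c/file.html
#   -nH --cut-dirs=2 -P foo  -> foo/c/file.html
# wget -r -np -nH --cut-dirs=2 -P foo http://example.com/a/b/c/
```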
To get all .jpg files linked from a page:
wget -p -r -l 2 --accept='.jpg,.JPG,.Jpg' <url>