ahwget

By default, wget downloads into the current directory and creates subdirectories as needed

wget [options] url
wget [options] -i <file_with_list_of_urls>
wget [options] -F -B <url_to_use_for_rel_links> -i <html_file_with_urls>
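
For example (urls.txt and saved.html are placeholder names for a plain list of
URLs, one per line, and a locally saved HTML page):
    wget -i urls.txt
    wget -F -B http://example.com/dir/ -i saved.html
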
USEFUL OPTIONS
--------------

-r      recursive
-l <n>  recursion depth - follow links <n> levels from the start page (default 5)
-p      get all files needed to display the page (images, css, etc.)
-k      convert links - make links point to downloaded local files

-np     no-parent - only follow links at and below dir level of requested
-L      follow relative links only
-H      span hosts - follow recursive links to other hosts
-K      keep original versions of files (before -k option is done)
-E      save HTML pages (e.g. .asp files) with an .html suffix
-nH     no host dirs - do not create a top-level directory named after the host
-O <f>  use file <f> as output file
-o <f>  use file <f> as logfile (overwrite)
-a <f>  use file <f> as logfile (append)
-nd     put all files in current directory (do not create subdirs)
-P <d>  save all to directory <d>

-A <l>  <l> is comma separated list of suffixes or patterns to accept
-R <l>  <l> is comma separated list of suffixes or patterns to reject
--domains=<l>          <l> is comma separated list of domains to follow
--exclude-domains=<l>  <l> is comma separated list of domains to NOT follow
                            (see the domains example at the end of EXAMPLES)
--follow-ftp   follow ftp links from html pages
--http-user=user         for pages that require login
--http-password=password  for pages that require login
--no-check-certificate   accept bad certificates
--load-cookies <f>       use cookies from file <f>
--save-cookies <f>       save new cookies to file <f>
                            NOTE: the cookie file is a Netscape-format
                            cookies.txt; a browser store such as
                            ~/.mozilla/firefox/5k9udbec.default/cookies.sqlite
                            must be exported to that format before
                            --load-cookies can use it (see example below)
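
A sketch of the authentication options together (hostnames, paths, and
credentials are placeholders; the cookie file must be a Netscape-format
cookies.txt, e.g. exported from the browser):
    wget --http-user=me --http-password=secret http://example.com/private/page.html
    wget --load-cookies cookies.txt -p -k http://example.com/members/page.html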


EXAMPLES
--------
To get a page with all the content needed to display it:
    wget -p -k <url>
    ln -s <url> .           # use <url> with http:// removed; this links the
                            # downloaded local path into the current directory

To get a page recursively, with all linked pages on the same server, <n> levels deep:
    wget -r -p -k -l <n> <url>
    ln -s <url> .           # again, <url> with http:// removed

To do this with all files in a new local subdirectory named ./foo, add:
            -nH --directory-prefix=foo

To do this but remove <m> leading levels of subdirectory names, add:
            --cut-dirs=<m>
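
For example, with a hypothetical depth of 2, putting everything under ./foo
and dropping one leading remote directory level:
    wget -r -p -k -l 2 -nH --directory-prefix=foo --cut-dirs=1 <url>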

To get all .jpg files linked from a page:
    wget -p -r -l 2 --accept '.jpg,.JPG,.Jpg' <url>
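
To keep a recursive fetch on particular hosts (example.com and example.org are
placeholder domains), combine -H with --domains:
    wget -r -l 2 -p -k -H --domains=example.com,example.org <url>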