Skip to main content

Posts

Showing posts from April, 2010

Download website with wget in linux

wget is a nice tool for downloading resources from the internet. The basic usage is wget url:

wget http://linuxreviews.org/

Therefore, wget (manual page) + less (manual page) is all you need to surf the internet. The power of wget is that you may download sites recursive, meaning you also get all pages (and images and other data) linked on the front page:

wget -r http://linuxreviews.org/

But many sites do not want you to download their entire site. To prevent this, they check how browsers identify. Many sites refuses you to connect or sends a blank page if they detect you are not using a web-browser. You might get a message like:

Sorry, but the download manager you are using to view this site is not supported. We do not support use of such download managers as flashget, go!zilla, or getright

Wget has a very handy -U option for sites like this. Use -U My-browser to tell the site you are using some commonly accepted browser:

wget -r -p -U Mozilla http://www.stupidsite.com/restricedplace.h…