Recently I've been working with The Trade Desk API, whose documentation is hosted on a password-protected web site. You get the username and password from your TTD representative.
Now, the site is fine but very dense. In a few cases I wanted to search the entire site, and having a local copy would make that possible. This post describes how to download the site.
My first attempt at downloading with wget didn't work, because the site doesn't support Basic Authentication. This means that passing your username and password as parameters to wget will not work.
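For context, Basic Authentication just sends the credentials as a base64-encoded `Authorization` header with every request; a site that uses a form-based login simply ignores that header. A quick sketch of what wget would have sent (the user/pass values are placeholders):

```python
import base64

# Basic Auth is nothing more than "Basic " + base64("username:password").
# "user" and "pass" below are placeholders, not real TTD credentials.
credentials = base64.b64encode(b"user:pass").decode("ascii")
auth_header = "Basic " + credentials
print(auth_header)  # the header value wget --user/--password would send
```

Since the site's login flow never looks at this header, no combination of wget's `--user`/`--password` options can get past it.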
Notice that when logging in, the site asks for the username and then prompts for the password in a separate step. This means another technique will be needed.
One idea would be to build a crawler that knew how to navigate the login dialogs and then copy the site's files.
Another idea is to split the work between the browser and wget: log in using the browser so the site sets its cookies there, then borrow the cookie values and have wget use them to copy the files.
In short: install a cookie-export add-on, log in with Firefox, export the cookies, and then run wget to download the site using the cookies file.
The Export Cookies add-on is available here:
Install it into your Firefox browser.
Next, manually login to the site using Firefox. This will set the appropriate cookie values within the browser.
The site's url is:
Export the cookie file for the site to
cookies.txt: find the Export Cookies add-on's icon in the Firefox toolbar, click it, and export the cookies for the site's domain.
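The exported file is in the Netscape cookie format, which is exactly what wget's `--load-cookies` option expects. As a sanity check before running wget, you can parse the file with Python's standard library. A minimal sketch, where the domain, cookie name, and value are made-up placeholders rather than the site's real cookies:

```python
import http.cookiejar
import os
import tempfile

# A made-up cookies.txt in Netscape format; a real export from Firefox
# would contain the site's actual session cookies instead.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".thetradedesk.com\tTRUE\t/\tTRUE\t2147483647\tsession_id\tabc123\n"
)
path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(path, "w") as f:
    f.write(sample)

# MozillaCookieJar reads the same Netscape format as wget --load-cookies;
# load() raises LoadError if the file is malformed.
jar = http.cookiejar.MozillaCookieJar(path)
jar.load()
names = {c.name: c.value for c in jar}
print(names)
```

If the file parses here, wget should be able to read it too; if `load()` raises, the export went wrong and it's worth re-exporting before starting a long recursive download.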
Lastly, here is the wget invocation that copies the site, using the
cookies.txt file to gain access to the files.
wget --load-cookies cookies.txt --recursive --no-parent --convert-links --no-clobber --html-extension https://api.thetradedesk.com/v3/doc