On May 18th, 2017, the Yahoo ichart data download link http://ichart.finance.yahoo.com/table.csv stopped working. For many years this link had been a convenient access point for downloading historical quote data: you simply built up the URL with your symbol as a parameter.
Since this link sat behind the Download Data button on the https://finance.yahoo.com/ page (after you've submitted a symbol), the first place to look for a resolution was that same page, to see what Yahoo had replaced it with.
Using Google as an example, we can exercise the URL https://finance.yahoo.com/quote/GOOG?p=GOOG to look for the current Download Data button. Investigate the page and find the Historical Data tab. When you click it, the Download Data button appears above the default range of quote values.
Notice the link created for downloading the data:
https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1493836545&period2=1496514945&interval=1d&events=history&crumb=4c1fh7TK/VW
Two things are noticeable: the period values, which are likely the from and to dates, and the mysterious crumb value. Reviewing the page, you'll see the Time period date range. Converted to Unix (or epoch) time, these dates match the period values. The crumb is likely a unique value generated for our page.
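To check the period values yourself, you can convert them between epoch seconds and calendar dates. This is a minimal sketch that assumes GNU date (its -d option; BSD and macOS date use -r and -j -f instead), and the helper names to_epoch and from_epoch are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a UTC date/time string to Unix epoch seconds (GNU date assumed).
to_epoch() {
  date -u -d "$1" +%s
}

# Convert Unix epoch seconds back to an ISO-8601 UTC timestamp (GNU date assumed).
from_epoch() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# The period1 and period2 values from the download link above.
from_epoch 1493836545   # prints 2017-05-03T18:35:45Z
from_epoch 1496514945   # prints 2017-06-03T18:35:45Z
```

The two values are exactly 2,678,400 seconds (31 days) apart, which fits a from/to date range rather than, say, a count of records.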
Now, how can we verify this?
Viewing the source is a good next step, but you'll quickly find it unwieldy due to the size of the page. Instead, try curling or wget-ing the page https://finance.yahoo.com/quote/GOOG?p=GOOG to a file:
$ curl "https://finance.yahoo.com/quote/GOOG?p=GOOG" > goog.html
Then open the file and look for crumb. I found that the file is minified to a single line with 48 matches for crumb. To help, I used the occur command in Emacs.
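If you'd rather stay on the command line, grep -o prints each match on its own line, so piping it into wc -l counts every occurrence even when the whole page sits on a single line (plain grep -c would report 1). A minimal sketch, assuming the page was saved as goog.html; count_matches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count every occurrence of a pattern in a file, not just matching lines.
# grep -o emits one line per match; wc -l counts them; tr strips the
# padding some wc implementations add.
count_matches() {
  local pattern=$1 file=$2
  grep -o -- "$pattern" "$file" | wc -l | tr -d ' '
}
```

Usage: count_matches crumb goog.html. The match is case-sensitive; add -i to the grep call if you also want hits such as CrumbStore counted, so the total may differ from what an editor search reports.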
Now, 48 instances is a tedious number to look through, so I'll give you a hint: the value we are looking for is found by searching for CrumbStore. Here is what I found:
"CrumbStore":{"crumb":"sMt9UQ80bWV"}
So, that's step one. This value is used to build the download URL shown above, so we'll need to make a call to the finance.yahoo.com page and pull out the CrumbStore crumb value.
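Rather than scanning the matches by hand, grep -o with a pattern anchored on CrumbStore pulls out just that fragment. A minimal sketch, assuming the page is saved as goog.html and that the fragment has the same shape as the one shown above; extract_crumbstore is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print only the "CrumbStore":{"crumb":"..."} fragment(s) from a saved page.
# [^"]* stops at the closing quote of the crumb value; in a basic regex
# the braces are literal characters.
extract_crumbstore() {
  grep -o '"CrumbStore":{"crumb":"[^"]*"}' "$1"
}
```

Running extract_crumbstore goog.html should print a single line of the same form as the CrumbStore entry above.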
But is that enough? It turns out that this page drops a cookie, and we'll need that as well. Using curl, let's repeat the above exercise with this command:
curl -s --cookie-jar cookie.txt "https://finance.yahoo.com/quote/GOOG?p=GOOG" > goog.html
Looking at cookie.txt, you'll see something like the following:
# Netscape HTTP Cookie File
# http://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
.yahoo.com TRUE / FALSE 1528051708 B 68gt6hhcj613s&b=3&s=kq
The value of B is what you need. For our use here, we'll just let curl use the cookie.txt file. Other languages may need the specific B value, so I point it out here.
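If you do need the value itself, it can be read out of curl's cookie jar. In the Netscape cookie-file format each cookie line holds seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry, name, value), so the name is field 6 and the value is field 7. A minimal sketch; get_cookie_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of the named cookie from a Netscape-format cookie file.
# Fields are tab-separated: domain, include-subdomains, path, secure,
# expiry, name, value. Header comment lines contain no tabs, so they
# have fewer than seven fields and never match.
get_cookie_value() {
  local name=$1 file=$2
  awk -F '\t' -v name="$name" 'NF >= 7 && $6 == name { print $7 }' "$file"
}
```

For example, B=$(get_cookie_value B cookie.txt) captures the value, and curl can send it back directly with --cookie "B=$B" instead of reading the cookie file.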
Is that it? Well, while working through the above, I noticed that once in a while the crumb would contain an escaped character. For example:
"CrumbStore":{"crumb":"3OCj0.Pb\u002F4G"}
Here \u002F is a Unicode escape for a forward slash (/). Left as-is it would break things later on, so we need that value translated. A trick is to pass the curl output through echo -e, which expands such escapes. Here is the new command:
echo -en "$(curl -s --cookie-jar "$cookieJar" "https://finance.yahoo.com/quote/GOOG/?p=GOOG")"
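To see what the trick does in isolation: bash's built-in echo -e interprets backslash escapes, including \uHHHH Unicode escapes in bash 4.2 and later (older versions, such as the bash 3.2 shipped with macOS, leave them untouched). A minimal sketch; decode_escapes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Expand backslash escapes such as \u002F (a forward slash) with bash's
# built-in echo: -e enables escape processing, -n drops the trailing newline.
# \uHHHH support requires bash 4.2+.
decode_escapes() {
  echo -en "$1"
}

raw='3OCj0.Pb\u002F4G'
decoded=$(decode_escapes "$raw")
echo "raw:     $raw"       # prints raw:     3OCj0.Pb\u002F4G
echo "decoded: $decoded"   # prints decoded: 3OCj0.Pb/4G
```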
Considering the above, here is what we need to do to download quote data: call the finance.yahoo.com page while storing the dropped cookie and parsing the CrumbStore crumb value from the page, then request the download URL using that crumb and cookie.
Before we build a script to download quote data, let's make some assumptions to keep the script simple. Later we can expand on things, but for now we'll assume the symbol is passed as the first parameter to the script:
SYM=$1
if [[ -z $SYM ]]; then
  echo "Please enter a SYMBOL as the first parameter to this script" >&2
  exit 1
fi
Conveniently, we can ask for dates that are older than the available data. In practice many symbols don't have data going back to 1970, but it's simplest for us to just ask and get whatever Yahoo has. In Unix time, 1/1/1970 is 0, and the current time can be obtained with date +%s.
START_DATE=0
END_DATE=$(date +%s)
We'll want to store our cookie so we have access to it later. We'll also apply the echo trick mentioned above and, lastly, pull out the crumb value.
cookieJar=$(mktemp)
echo "COOKIEJAR: $cookieJar"
function getCrumb() {
  # Sometimes the value contains an escaped character such as \u002F
  # echo -en (bash 4.2+) will convert it
  # https://stackoverflow.com/a/28328480
  # curl the url then replace the } characters with line feeds. This takes the large one-line json and turns it into about 3000 lines
  # grep for the CrumbStore line
  # then copy out the value
  # lastly, remove any quotes
  echo -en "$(curl -s --cookie-jar "$cookieJar" "https://finance.yahoo.com/quote/$SYM/?p=$SYM")" | tr "}" "\n" | grep CrumbStore | cut -d':' -f 3 | sed 's+"++g'
}
crumb=$(getCrumb)
echo "CRUMB: $crumb"
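Because getCrumb both fetches and parses, it is hard to check without a live request. One option, sketched here as an assumption rather than the script's actual structure, is to move the parsing into a hypothetical parseCrumb function that reads page content on standard input, and feed it a sample string.

```shell
#!/usr/bin/env bash
# Parse the crumb out of page content read from stdin, using the same
# steps as getCrumb: split on }, keep the CrumbStore line, take the third
# :-separated field, and strip the quotes.
parseCrumb() {
  tr "}" "\n" | grep CrumbStore | cut -d':' -f 3 | sed 's+"++g'
}

# A small stand-in for the downloaded page, including an escaped slash.
sample='{"a":{"b":1}},"CrumbStore":{"crumb":"3OCj0.Pb\u002F4G"},"c":2'
echo -en "$sample" | parseCrumb   # prints 3OCj0.Pb/4G (bash 4.2+)
```

With that split, getCrumb's body could become echo -en "$(curl -s --cookie-jar "$cookieJar" "https://finance.yahoo.com/quote/$SYM/?p=$SYM")" | parseCrumb, keeping the network call and the parsing separate.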
BASE_URL="https://query1.finance.yahoo.com/v7/finance/download/$SYM?period1=$START_DATE&period2=$END_DATE&interval=1d&events=history"
echo "BASE_URL: $BASE_URL"
URL="$BASE_URL&crumb=$crumb"
echo "URL: $URL"
curl -s --cookie "$cookieJar" "$URL" > "$SYM.csv"
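The URL assembly is easy to get subtly wrong, so it can help to isolate it. Here is a minimal sketch of a hypothetical buildDownloadUrl function that produces the same query-string layout as BASE_URL plus the crumb. Feeding it the values from the download link near the top of this post reproduces that link exactly.

```shell
#!/usr/bin/env bash
# Assemble the v7 download URL from a symbol, start/end epoch times, and a crumb.
# The crumb is inserted as-is (not URL-encoded), matching the script above.
buildDownloadUrl() {
  local sym=$1 start=$2 end=$3 crumb=$4
  printf '%s\n' "https://query1.finance.yahoo.com/v7/finance/download/${sym}?period1=${start}&period2=${end}&interval=1d&events=history&crumb=${crumb}"
}

buildDownloadUrl GOOG 1493836545 1496514945 4c1fh7TK/VW
```

In the script, URL=$(buildDownloadUrl "$SYM" "$START_DATE" "$END_DATE" "$crumb") would replace the BASE_URL and URL assignments.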
My final version of the script discussed here is up on GitHub in the following repo: