Brad Lucas

Programming, Clojure and other interests
June 2, 2017

New Yahoo Finance Quote Download Url

Overview

On May 18th 2017 the Yahoo ichart data download link http://ichart.finance.yahoo.com/table.csv stopped working. This link has for many years been a convenient access point for downloading historical quote data. It was simple to get quotes by building up this url with your symbol as a parameter.

Since this link was under the Download Data button on the https://finance.yahoo.com/ page after you've submitted a symbol, the first place to look for a resolution was that page, to see what Yahoo had replaced it with.

Investigating

Using Google as an example, we can exercise this url https://finance.yahoo.com/quote/GOOG?p=GOOG to look for the current Download Data button. Investigate the page and find the Historical Data tab. When clicked, the Download Data button appears above the default range of quote values.

Notice the link created for downloading the data:

https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1493836545&period2=1496514945&interval=1d&events=history&crumb=4c1fh7TK/VW

Two things are noticeable: the period values, which are likely the from and to dates, and the mysterious crumb value. Reviewing the page you'll see the Time period date range. These dates, when converted to Unix (or Epoch) time, match the period values. The crumb is likely to be a unique value generated for our page.
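
One quick way to check the period values is to convert them back to calendar dates. The sketch below assumes GNU date (the usual date on Linux); epoch_to_date is just an illustrative helper name. On macOS/BSD the equivalent call would be date -u -r followed by the seconds.

```shell
# Convert a Unix timestamp to a UTC calendar date.
# Assumes GNU date; epoch_to_date is a hypothetical helper name.
epoch_to_date() {
  date -u -d "@$1" +%Y-%m-%d
}

epoch_to_date 1493836545   # period1 from the link above, prints 2017-05-03
epoch_to_date 1496514945   # period2 from the link above, prints 2017-06-03
```

The two values are exactly 31 days apart, which fits a default one-month range on the Historical Data tab.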

Now, how can we verify this?

Data in the page

Viewing the source is a good next step but you'll quickly find it unwieldy due to the size of the page data. Try curling or wget-ing the page https://finance.yahoo.com/quote/GOOG?p=GOOG to a file.

$ curl "https://finance.yahoo.com/quote/GOOG?p=GOOG" > goog.html

Then open the file and look for crumb. I found that the file is minified to a single line with 48 matches for crumb. To help I used the occur command in Emacs.

Now, 48 instances is a tedious number to look through so I'll give a hint and tell you the value we are looking for is found by searching for CrumbStore. Here I found the following:

"CrumbStore":{"crumb":"sMt9UQ80bWV"}

So, this is step one. This value is used to build the download url shown above. We'll need to make a call to the finance.yahoo.com page to pull out the CrumbStore crumb value.
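
As a small sketch of that extraction, the following pulls the crumb out of page text with grep and sed. The sample string is a shortened stand-in for the real page (its "context" and "other" keys are made up for illustration), and extract_crumb is a hypothetical helper name.

```shell
# Pull the CrumbStore crumb out of page text read from stdin.
# extract_crumb is a hypothetical helper name.
extract_crumb() {
  grep -o '"CrumbStore":{"crumb":"[^"]*"' | sed 's/.*"crumb":"\([^"]*\)"$/\1/'
}

# Stand-in for the real page, built around the fragment shown above.
sample='{"context":{},"CrumbStore":{"crumb":"sMt9UQ80bWV"},"other":{}}'
printf '%s\n' "$sample" | extract_crumb   # prints sMt9UQ80bWV
```

Matching on CrumbStore directly avoids wading through the other crumb matches in the page.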

But is that enough? It turns out that this page drops a cookie. We'll need that as well. Using curl, let's repeat the above exercise with this command.

curl -s --cookie-jar cookie.txt "https://finance.yahoo.com/quote/GOOG?p=GOOG" > goog.html

Looking at the cookie.txt you'll see something like the following:

# Netscape HTTP Cookie File
# http://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.yahoo.com	TRUE	/	FALSE	1528051708	B	68gt6hhcj613s&b=3&s=kq

The value of B is what you need. For our use here we'll just let curl use the cookie.txt file. Other languages may need the specific B value, so I point it out here.
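
If you do need the B value on its own, one possible way to read it is with awk, since the Netscape cookie file is tab-separated with the cookie name in the sixth field and the value in the seventh. This is only a sketch: cookie_b_value and demo_jar are hypothetical names, and the demo file simply recreates the sample line above.

```shell
# Print the value of the cookie named B from a Netscape-format cookie file.
# Fields are tab-separated: domain, subdomain flag, path, secure, expiry, name, value.
# cookie_b_value is a hypothetical helper name.
cookie_b_value() {
  awk -F '\t' '$6 == "B" { print $7 }' "$1"
}

# Recreate the sample cookie line shown above in a temporary file.
demo_jar=$(mktemp)
printf '# Netscape HTTP Cookie File\n.yahoo.com\tTRUE\t/\tFALSE\t1528051708\tB\t68gt6hhcj613s&b=3&s=kq\n' > "$demo_jar"
cookie_b_value "$demo_jar"   # prints 68gt6hhcj613s&b=3&s=kq
rm -f "$demo_jar"
```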

Is that it? Well, in working through the above I noticed that once in a while the crumb would contain an escaped character (a \u002F, which is the JSON escape for /). For example:

"CrumbStore":{"crumb":"3OCj0.Pb\u002F4G"}

This would break things later on, so we'll need to get that value translated. A trick is to pass the curl output through echo -e, which expands the escape. Here is the new command:

echo -en "$(curl -s --cookie-jar cookie.txt "https://finance.yahoo.com/quote/GOOG/?p=GOOG")"
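
The echo trick depends on the shell's echo understanding \uXXXX escapes (bash has done so since version 4.2, as far as I know). Where that isn't available, a narrower alternative is to replace just the \u002F sequence with sed. This is a different technique from the echo approach and only handles the one escape seen here; unescape_slash is a hypothetical helper name.

```shell
# Replace the JSON escape \u002F (or \u002f) with the slash it stands for.
# unescape_slash is a hypothetical helper name.
unescape_slash() {
  sed 's|\\u002[Ff]|/|g'
}

printf '%s\n' '3OCj0.Pb\u002F4G' | unescape_slash   # prints 3OCj0.Pb/4G
```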

Summary

Considering the above, here is what we need to do to download quote data.

  • Curl the finance.yahoo.com page, storing the dropped cookie and parsing the CrumbStore crumb value from the page
  • Build the download data link using the crumb value and appropriate period date range values

Building the script

Before we build a script to download quote data, let's make some assumptions to keep the script simple. Later, we can expand on things, but for now we'll assume the following:

  • The script will accept a SYMBOL as a single parameter
  • We'll ask for all data since the Unix epoch time (1/1/1970)
  • We'll have the script write its results to a local file named SYMBOL.csv

Symbol as a parameter

SYMBOL=$1
if [[ -z $SYMBOL ]]; then
  echo "Please enter a SYMBOL as the first parameter to this script"
  exit 1
fi

Max date range

Conveniently, Yahoo accepts a start date earlier than the first available quote. In practice many symbols don't have data going back to 1970, but for us it's simplest to just ask and get what Yahoo has. In Unix time 1/1/1970 is 0, and now can be gotten with date +%s.

START_DATE=0
END_DATE=$(date +%s)
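
If you later want a narrower range, a start date can be turned into Unix time in much the same way. The sketch below assumes GNU date; date_to_epoch is a hypothetical helper name and is not part of the script discussed here.

```shell
# Convert a YYYY-MM-DD date (midnight UTC) to Unix time.
# Assumes GNU date; date_to_epoch is a hypothetical helper name.
date_to_epoch() {
  date -u -d "$1" +%s
}

START_DATE=$(date_to_epoch 2017-01-01)
echo "$START_DATE"   # prints 1483228800
```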

Getting crumb value

We'll want to store our cookie and have access to it later. Also, we'll do the echo trick mentioned above, and lastly we'll need to pull out the crumb value.

cookieJar=$(mktemp)
echo "COOKIEJAR: $cookieJar"

Get the crumb value

function getCrumb() {
  # Sometimes the value contains an escaped character such as \u002F
  # echo -e will convert it
  # https://stackoverflow.com/a/28328480

  # curl the url then replace the } characters with line feeds. This takes the large one-line json and turns it into about 3000 lines
  # grep for the CrumbStore line
  # then copy out the value
  # lastly, remove any quotes
  echo -en "$(curl -s --cookie-jar "$cookieJar" "https://finance.yahoo.com/quote/$SYMBOL/?p=$SYMBOL")" | tr "}" "\n" | grep CrumbStore | cut -d':' -f 3 | sed 's+"++g'
}

crumb=$(getCrumb)

echo "CRUMB: $crumb"

  • Replace '}' with '\n' so the one string is converted to multiple lines
  • Grep for the CrumbStore and then cut out the third field delimited by colons
  • Lastly, remove any extraneous quotes

Download Data

BASE_URL="https://query1.finance.yahoo.com/v7/finance/download/$SYMBOL?period1=$START_DATE&period2=$END_DATE&interval=1d&events=history"
echo "$BASE_URL"

URL="$BASE_URL&crumb=$crumb"
echo "URL: $URL"

curl -s --cookie "$cookieJar" "$URL" > "$SYMBOL.csv"

  • The url is built up with the symbol, date range period values and the crumb value.
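
To make the pieces of the url explicit, here is a minimal sketch that assembles it from its four parts and reproduces the GOOG link seen earlier on the page. build_download_url is a hypothetical helper name and is not part of the script discussed here.

```shell
# Assemble the download url from symbol, start, end and crumb.
# build_download_url is a hypothetical helper name.
build_download_url() {
  local symbol=$1 start=$2 end=$3 crumb=$4
  printf 'https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s\n' \
    "$symbol" "$start" "$end" "$crumb"
}

# Rebuild the link from the Investigating section.
build_download_url GOOG 1493836545 1496514945 '4c1fh7TK/VW'
```

Keeping the url in quotes when it is passed to curl matters here, since an unquoted & would otherwise send the command to the background.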

Final Script

My final version of the script discussed here is up on GitHub in the following repo:

https://github.com/bradlucas/get-yahoo-quotes

Tags: bash yahoo quotes trading