Breaking Heron Foods website

By Cj Malone on

Recently I've been looking over supermarket websites to get a list of the stores they run. They all have "store finders"/"branch locators", which is what I want, except I want the data in an easier format to use. It kinda sucks that they don't publish a list of stores as Open Data. After all, having a store finder and publishing the data achieve the same thing: helping customers find your stores.

Google has already stolen the data for Google Maps. OpenStreetMap (OSM) can't, because of its stricter copyright rules, but if the data were published in a suitable format and under a suitable license, everyone could have a better experience.

Today I looked at heronfoods.com. One of the tricks I have been using is to check the sitemap.xml for a list of all stores, but they don't have one. 😀 Nor do they have a page that lists all the stores. 🙁
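For sites that do publish a sitemap, the trick only takes a few lines. A minimal sketch, assuming a standard sitemap in the sitemaps.org 0.9 namespace and assuming store pages can be recognised by a substring in their URL (the `/store` marker and the example URLs are hypothetical):

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def store_urls(sitemap_xml: str, marker: str = "/store") -> list[str]:
    """Return every <loc> URL in a sitemap that contains `marker`.

    `marker` is a hypothetical filter; each site structures its store URLs differently.
    """
    root = ET.fromstring(sitemap_xml)
    locs = (
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    )
    return [url for url in locs if marker in url]


# Made-up sitemap for illustration only.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/stores/leeds</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(store_urls(example))
```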

When you search in the store finder you get a long URL, and it's not a session ID. 😀

https://heronfoods.com/storelocator/eyJyZXN1bHRfcGFnZSI6InN0b3JlbG9jYXRvciIsImxvY2F0aW9uX2ZpZWxkIjoiUzEgMUFBIn0

I know what that looks like. Base64! Decoded (and pretty-printed), it's:

{
	"result_page": "storelocator",
	"location_field": "S1 1AA"
}
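The decoding can be reproduced in a few lines of Python. This is a sketch: the token in the URL has no trailing `=` padding, so it has to be restored before decoding, and the URL-safe alphabet is an assumption (this particular token contains none of the characters where the standard and URL-safe alphabets differ).

```python
import base64
import json


def decode_token(token: str) -> dict:
    """Decode the Base64 segment at the end of a store-locator URL into JSON."""
    # The URL drops the '=' padding, so pad back to a multiple of 4 characters.
    padded = token + "=" * (-len(token) % 4)
    # URL-safe alphabet assumed; it makes no difference for this token.
    return json.loads(base64.urlsafe_b64decode(padded))


token = "eyJyZXN1bHRfcGFnZSI6InN0b3JlbG9jYXRvciIsImxvY2F0aW9uX2ZpZWxkIjoiUzEgMUFBIn0"
print(json.dumps(decode_token(token), indent="\t"))
```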

So, what if we change that to a wildcard? Nope. How about a blank query?

{
	"result_page": "storelocator",
	"location_field": ""
}

And there we go. 296 results.
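Going the other way, the blank-query URL can be built by re-encoding the JSON. A minimal sketch, assuming the site wants compact JSON with the keys in this order, the URL-safe Base64 alphabet and no padding; all of that is inferred from the one URL above, not confirmed.

```python
import base64
import json

BASE_URL = "https://heronfoods.com/storelocator/"


def encode_query(location: str) -> str:
    """Build a store-locator URL for `location` by reversing the decoding step."""
    # Compact separators reproduce the whitespace-free JSON in the original token.
    payload = json.dumps(
        {"result_page": "storelocator", "location_field": location},
        separators=(",", ":"),
    )
    token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    # The original URL carried no '=' padding, so strip it here too (assumption).
    return BASE_URL + token.rstrip("=")


print(encode_query(""))
```

Passing `"S1 1AA"` reproduces the URL from the earlier search exactly, which is a quick sanity check on those assumptions.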

No links to a per store page. 🙁

Let's inspect element and see if they use a standard schema to define all the addresses, or even a non-standard one. Unfortunately it's not that easy: it's just a bunch of p tags. With a bit of effort I could parse it, but what's this? An HTML comment after the opening hours.

<!--p><a href="https://heronfoods.com/index.php/storelocator/store-details/linthorpe">View store</a></p-->
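Those commented-out links are easy to harvest. A sketch using Python's standard `html.parser`, whose `handle_comment` callback receives the body of each comment; the regex assumes every link looks like the one above, and the sample page is a cut-down stand-in for the real markup.

```python
import re
from html.parser import HTMLParser

# Matches the href of a store-details link inside a commented-out <a> tag.
HREF_RE = re.compile(r'href="([^"]*/store-details/[^"]*)"')


class StoreLinkParser(HTMLParser):
    """Collect store-details URLs that only appear inside HTML comments."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_comment(self, data: str) -> None:
        # `data` is the comment body without the <!-- and --> delimiters.
        self.links.extend(HREF_RE.findall(data))


# Stand-in page: a placeholder paragraph followed by the comment from above.
page = (
    "<p>(opening hours)</p>"
    '<!--p><a href="https://heronfoods.com/index.php/storelocator/store-details/linthorpe">'
    "View store</a></p-->"
)
parser = StoreLinkParser()
parser.feed(page)
parser.close()
print(parser.links)
```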

After a redirect I get what I want, a per store webpage. 😀

Unfortunately there is no latitude and longitude, just a postcode, so I can't transform it straight into a GeoJSON file. But it'll do for now.
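Once the postcodes have been turned into coordinates, the conversion itself is small. A sketch, assuming the scraped stores are dicts with hypothetical `name` and `postcode` keys and that a postcode-to-coordinates lookup comes from elsewhere (none is assumed here). Stores without a match keep a null geometry, which RFC 7946 allows for unlocated features.

```python
import json
from typing import Optional

# (latitude, longitude) pair from a hypothetical postcode lookup.
Coordinates = tuple[float, float]


def to_feature(name: str, postcode: str, coords: Optional[Coordinates]) -> dict:
    """Build a GeoJSON Feature (RFC 7946) for one store."""
    geometry = None  # Null geometry is allowed for features with no location yet.
    if coords is not None:
        lat, lon = coords
        # GeoJSON positions are [longitude, latitude], not the other way round.
        geometry = {"type": "Point", "coordinates": [lon, lat]}
    return {
        "type": "Feature",
        "geometry": geometry,
        "properties": {"name": name, "postcode": postcode},
    }


def to_feature_collection(stores: list[dict], lookup: dict[str, Coordinates]) -> dict:
    """Wrap every store in a FeatureCollection, geocoding via `lookup` where possible."""
    features = [
        to_feature(s["name"], s["postcode"], lookup.get(s["postcode"]))
        for s in stores
    ]
    return {"type": "FeatureCollection", "features": features}


# Placeholder store and an empty lookup: the feature gets a null geometry.
print(json.dumps(to_feature_collection([{"name": "Linthorpe", "postcode": "S1 1AA"}], {}), indent=2))
```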