Do you ever look at Zillow to see what houses might be in your price range? I recently went to Denver, CO to see a friend and got curious about house prices there. I looked up listings in the area that matched specific criteria, and I wanted to automate the process so I could run the program any time and see how the market has changed. The script exports the data to JSON and CSV so I can do a little real-world analysis.

Setting Up the Environment

As always, these are the libraries I used:

from dotenv import load_dotenv
import os
import pandas as pd
import requests
import warnings
from datetime import date
import json

# settings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", None)

dotenv: I store my passwords and API keys in a .env file.
os: For using operating system functionality, like reading environment variables.
pandas: Of course. A must-have, always.
requests: This is how you make HTTP requests to a website or API.
warnings: For hiding warnings that I know I'll get.
datetime: For working with dates. This is how I timestamp my files.
json: For working with JSON data.

Main Function and Configuration

All the major functions in my script:

def main():
	configure()
	api_key, listing_url = get_token()
	listing_response = get_listings(api_key, listing_url)
	process_listing(listing_response)
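
To make the script runnable from the command line, the usual Python entry-point guard goes at the bottom of the file (a trivial addition if it isn't already there):

if __name__ == "__main__":
	main()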

The configure() function loads environment variables:

def configure():
	load_dotenv()

Passwords and API keys are stored in a .env file for security reasons, keeping sensitive information out of the main script.
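
For reference, the .env file is just key=value pairs whose names match what get_token() reads below. The values here are placeholders, not my real key or search URL:

# .env (keep this out of version control)
api=your_scrapeak_api_key_here
url=https://www.zillow.com/...your-full-zillow-search-url...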

Fetching API Key and URL

The get_token() function retrieves the API key and URL from environment variables:

def get_token():
	api = os.getenv('api')
	url = os.getenv('url')
	return api, url
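
One gotcha: os.getenv() returns None when a variable is missing, so a typo in the .env file only shows up later as a confusing request error. A slightly more defensive sketch (not what my script currently does) would fail fast:

import os

def get_token():
	api = os.getenv('api')
	url = os.getenv('url')
	# fail immediately if the .env file is missing or misnamed
	if not api or not url:
		raise RuntimeError("Missing 'api' or 'url' in the .env file")
	return api, url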

Making the API Request

The get_listings() function makes a GET request to the Zillow scraper. I couldn't find an official Zillow API; I'm guessing Zillow makes too much money on its data to give it away for free. So I used Scrapeak, which pays for Zillow API access. I get 100 pulls per month on the free tier before I have to pay. Hopefully I never hit that threshold:

def get_listings(api_key, listing_url):
	# Scrapeak's Zillow listing endpoint
	url = "https://app.scrapeak.com/v1/scrapers/zillow/listing"

	querystring = {
		"api_key": api_key,
		"url": listing_url
	}

	return requests.get(url, params=querystring)

The API key and listing URL are passed into the function. The API key identifies my account, like a password. The listing URL is a Zillow search URL with the query criteria baked into it; for example, my search was for 3 bedrooms and 2 bathrooms, so the URL would look something like www.zillow.com?query=3bedrooms+2bathrooms (simplified; the real URL Zillow generates encodes the filters in its own format). That URL gets passed through to Scrapeak, and the response is returned from the function.
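
One thing main() doesn't do is check whether the request actually succeeded. Scrapeak can return an error (bad API key, exhausted quota), and the JSON parsing later would then fail in a confusing way. Here is a small guard I could add before processing; it's a sketch using the standard requests attributes ok and status_code:

api_key, listing_url = get_token()
listing_response = get_listings(api_key, listing_url)

# stop early if Scrapeak returned an HTTP error (bad key, quota exceeded, etc.)
if not listing_response.ok:
	print("Request failed:", listing_response.status_code)
	print(listing_response.text)
else:
	process_listing(listing_response)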

Processing the Response

The process_listing() function processes the API response:

def process_listing(listing_response):
	# timestamp for the output filenames (yy.mm.dd)
	today = date.today().strftime("%y.%m.%d")
	print(listing_response.json().keys())

	# total number of properties matching the search
	num_of_properties = listing_response.json()["data"]["categoryTotals"]["cat1"]["totalResultCount"]
	print("Count of properties:", num_of_properties)

	# flatten the nested listing results into a table and save it as CSV
	df_listings = pd.json_normalize(listing_response.json()["data"]["cat1"]["searchResults"]["mapResults"])
	df_listings.to_csv("zillow." + today + ".csv", index=False)

	# save the raw JSON response for further inspection
	text = json.dumps(listing_response.json(), sort_keys=True, indent=4)

	with open("zillow." + today + ".json", "w", encoding="utf-8") as f:
		f.write(text)

Today's Date: Used to timestamp the output files.
Count of Properties: Extracted and printed for reference.
Normalize JSON: Converts nested JSON data into a flat table.
Save to CSV: Saves the data to a CSV file.
Save to JSON: Saves the raw JSON response for further inspection.

Conclusion

This script provides a robust template for grabbing the same details over and over again. I could analyze the contents and use machine learning to give me a top 10, then email me the list once a week. That's on my to-do list. Happy coding!