Zillow is fun.
Do you ever look at Zillow to see what houses might be in your price range? I recently went to Denver, CO to visit a friend, and I was curious about house prices there. I looked up a guide to finding houses in the area that matched specific criteria, and I wanted to automate the process so I could run the program anytime, see how the market has changed, export the data to JSON and CSV, and do a little real-world analysis.
Setting Up the Environment
As always, these are the libraries I used:
from dotenv import load_dotenv
import os
import pandas as pd
import requests
import warnings
from datetime import date
import json
# settings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", None)
dotenv: I store my passwords in a .env file.
os: Lets me read environment variables and use other operating-system functionality in my code.
pandas: Of course. A must-have, always.
requests: This is how you make HTTP requests to a website or API.
warnings: For hiding warnings that I know I'll get.
datetime: For working with dates; this is how I timestamp my files.
json: For working with JSON data.
Main Function and Configuration
All the major functions in my script:
def main():
    configure()
    api_key, listing_url = get_token()
    listing_response = get_listings(api_key, listing_url)
    process_listing(listing_response)
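To make the script runnable from the command line, the usual entry-point guard goes at the bottom of the file (a small detail not shown above):

if __name__ == "__main__":
    main()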
The configure() function loads environment variables:
def configure():
    load_dotenv()
Passwords are stored in a .env file for security reasons, keeping sensitive information like API keys out of the main script.
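As a concrete (placeholder) example, the .env file only needs the two variables that get_token() reads below; never commit this file to version control:

# .env -- placeholder values, not real credentials
api=YOUR_SCRAPEAK_API_KEY
url=https://www.zillow.com/your-search-url-copied-from-the-browser/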
Fetching API Key and URL
The get_token() function retrieves the API key and URL from environment variables:
def get_token():
    api = os.getenv('api')
    url = os.getenv('url')
    return api, url
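A slightly more defensive variant (just a sketch, reusing the os import above): it fails loudly if either variable is missing from the .env file instead of silently returning None:

def get_token_strict():
    api = os.getenv('api')
    url = os.getenv('url')
    # os.getenv returns None for missing variables; catch that here
    missing = [name for name, value in (('api', api), ('url', url)) if not value]
    if missing:
        raise RuntimeError("Missing .env variables: " + ", ".join(missing))
    return api, url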
Making the API Request
The get_listings() function makes a GET request to the Zillow scraper. I couldn't find an official Zillow API; I'm guessing Zillow makes too much money on its data to give it away for free. So I used Scrapeak, which pays for Zillow API access. I get 100 pulls per month before I have to pay money; hopefully I never hit that threshold:
def get_listings(api_key, listing_url):
    url = "https://app.scrapeak.com/v1/scrapers/zillow/listing"
    querystring = {
        "api_key": api_key,
        "url": listing_url
    }
    return requests.request("GET", url, params=querystring)
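One optional hardening step (a sketch, not part of the function above): check the HTTP status before the response reaches process_listing(), so a bad key or an exhausted quota fails with a clear error instead of a confusing KeyError later:

def get_listings_checked(api_key, listing_url):
    url = "https://app.scrapeak.com/v1/scrapers/zillow/listing"
    querystring = {"api_key": api_key, "url": listing_url}
    response = requests.get(url, params=querystring)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response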
The API key and listing URL are passed into the function. The API key is tied to my account, like a password. The listing URL is the Zillow search URL with the query criteria in it. For example, my URL was searching for 3 bedrooms and 2 bathrooms, so it looked something like www.zillow.com?query=3bedrooms+2bathrooms in Zillow's syntax. That gets passed through the function, and the response is returned.
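If you're curious what actually goes over the wire, requests can build the request without sending it; the Zillow URL below is only a placeholder for whatever search URL you copy out of your browser:

# Illustration only: inspect the final URL requests would send
req = requests.Request(
    "GET",
    "https://app.scrapeak.com/v1/scrapers/zillow/listing",
    params={"api_key": "YOUR_KEY", "url": "https://www.zillow.com/denver-co/"},
).prepare()
print(req.url)  # the listing URL gets percent-encoded into the query string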
Processing the Response
The process_listing() function processes the API response:
def process_listing(listing_response):
    today = date.today().strftime("%y.%m.%d")
    print(listing_response.json().keys())
    num_of_properties = listing_response.json()["data"]["categoryTotals"]["cat1"]["totalResultCount"]
    print("Count of properties:", num_of_properties)
    df_listings = pd.json_normalize(
        listing_response.json()["data"]["cat1"]["searchResults"]["mapResults"]
    )
    df_listings.to_csv("zillow." + today + ".csv", index=False)
    text = json.dumps(listing_response.json(), sort_keys=True, indent=4)
    with open("zillow." + today + ".json", "w", encoding="utf-8") as f:
        f.write(text)
Today's Date: Used to timestamp the output files.
Count of Properties: Extracted and printed for reference.
Normalize JSON: Converts nested JSON data into a flat table.
Save to CSV: Saves the data to a CSV file.
Save to JSON: Saves the raw JSON response for further inspection.
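With the exports timestamped by day, a quick follow-up analysis is easy. This is only a sketch: unformattedPrice is an assumption about the flattened Zillow payload, so check df.columns against the CSV you actually get back:

import glob
import pandas as pd

# Load the most recent export written by process_listing()
latest = sorted(glob.glob("zillow.*.csv"))[-1]
df = pd.read_csv(latest)

print(df.shape)                          # listings x flattened columns
if "unformattedPrice" in df.columns:     # assumed column name
    print(df.sort_values("unformattedPrice").head(10))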
Conclusion
This script provides a robust template for grabbing the same details over and over again. I could analyze the contents, use machine learning to give me a top 10, and then email myself the list once a week. That's on my to-do list. Happy coding!