Notes from a pilot's kid

Finding Zip Codes in Nielsen DMAs

Diagram of zip codes that overlap with Atlanda Nielsen DMA

Here’s a common market research scenario: we need to find more about the people who live in the area where we’re launching a TV and/or radio campaign. Maybe we just want to know demographics. Or maybe we want to know what restaurants are nearby.

Those TV/radio areas are called DMAs (Designated Market Areas) by Nielsen. In theory, people living inside one DMA are exposed to the same television and radio.

A useful first step to learning more about the people within a DMA of interest is identifying which postal/zip codes fall inside the DMA. Knowing which zip codes are inside a DMA will let us slice the right census data to learn more basic demographic information, for example.

So let’s pick a few DMAs and we’ll find the zip codes that fall inside them:

  • Seattle (DMA 819)
  • Phoenix (DMA 753)
  • Houston (DMA 618)
  • Atlanta (DMA 524)

Crucial to this approach is acquiring spatial data for both zip codes and Nielsen DMAs. To make the data easier to work with, I am going to convert all spatial data to GeoJSON.

Geo data about zip codes is available from census.gov as a shapefile. As I learned in exploring my options for this approach, no geographical dataset can completely represent zip codes, since they’re designed as lists of addresses where one post office delivers mail and those addresses don’t always end up comprising a simple polygon on a map (more discussion about this here)

ogr2ogr can convert the shapefile provided by the US Census into a GeoJSON file (here’s a short tutorial):

ogr2ogr -f GeoJSON -lco COORDINATE_PRECISION=2 zip_codes.geojson [US CENSUS SHAPEFILE FILENAME].shp

Data about DMAs, since they’re from Nielsen, are probably available from them. In fact, specific data about which zip codes fall into DMAs of interest is also available from Nielsen, for a fee. But let’s assume in this research scenario either that we don’t have any money to spend on that or we want the flexibility to explore and visualize the data ourselves, so we want a dataset that’s free and gives us the shapes and locations of each DMA. I was able to track down a Topojson file on Github of each Nielsen DMA in the lower 48 states.

For the TopoJSON, there’s an easy online converter here. But since the file is a flavor of JSON already, it would also be easy to load without changing the format.

The Nielsen DMA TopoJSON file also contains metadata for each DMA, which is easier to reference if extracted and saved as a CSV. The metadata can be extracted using the following few lines of code:

import json
import numpy as np
import pandas as pd

topojson_filename = 'FILE NAME OF NIELSEN TOPOJSON'
nielsen = json.load(open(topojson_filename, 'rb'))

nielsendf = pd.DataFrame([i['properties'] for i in nielsen['objects']['nielsen_dma']['geometries']])
del nielsendf['name']
nielsendf.columns = ['adsperc', 'cableperc', 'dma_id', 'dma_name', 'latitude', 'longitude', 'tvperc']

nielsendf.to_csv('nielsenDMAchart.csv', index = False)

The next step is finding out which zip code polygons are contained in or overlap with our DMAs of interest. There are lots of ways to do this, but I’m going to use a Python library called Shapely to examine overlaps.

Our two GeoJSON files can be loaded as dictionaries of Shapely objects using the below code, which uses their identifying properties (the zip code and the DMA number) found in the structure of the JSON as dictionary keys. This requires the python library geojson.

import geojson, sys
from shapely.geometry import shape

zipshape_filename = 'FILE NAME OF ZIP CODE GEOJSON'
nielsenshape_filename = 'FILE NAME OF NIELSEN DMA GEOJSON'

#load geojson files as geojson objects
zips = geojson.load(open(zipshape_filename))
nielsen = geojson.load(open(nielsenshape_filename))


#for both zip codes and DMAs, extract properties and geometry from each geojson feature, add each one as a dictionary entry with the DMA ID or zip # as the key and a shapely object as the value
zips_ = {}
for i in zips.features:
    zips_[str(i['properties']['ZCTA5CE10'])] = shape(geojson.MultiPolygon(i['geometry']['coordinates']))

dmas_errors = []
dmas_ = {}
for i in nielsen.features:
    try:
        dmas_[str(i['properties']['id'])] = shape(geojson.Polygon(i['geometry']['coordinates']))
    except:
        try:
            dmas_[str(i['properties']['id'])] = shape(geojson.Polygon(i['geometry']['coordinates'][0]))
        except:
            print 'error'

A shapely method called ‘intersects’ can be used to identify the zip code shapes that intersect (are completely or partially contained within) the shapes of our Nielsen DMAs of interest. So for each shapely object representing a DMA, we can simply iterate through the zip code shapes to find those that intersect with it:

import numpy as np
import pandas as pd

#load Nielsen DMA metadata as Pandas df
dmametadata = pd.read_csv('FILE NAME OF DMA CSV', dtype = {'dma_id': 'str'})
dmametadata = dmametadata.set_index('dma_id')

target_dmas = {
 'Seattle': 819,
 'Phoenix': 753,
 'Houston': 618,
 'Atlanta', 524
}

def zips_in_DMA(one_dma_id):
 dmaziplist = []
 for i in zips_.iteritems():
 if dmas_[one_dma_id].intersects(i[1]):
 dmaziplist.append(i[0])
 else:
 pass

 return dmaziplist

dma_lists = {}

for place in target_dmas.iteritems():
 dma_lists[place[0]] = zips_in_DMA(place[1])

That’s it. We now have a list of US zip codes that overlap with or are contained in each of our four target Nielsen DMAs (the dictionary dma_lists). From here, we can get the right data from American FactFinder or other data sources to learn more about the people that live in the area where a marketing campaign is being launched.