The PAS Premium

Jeremy Leipzig
6/1/16

Background & Timeline

  • Sadie Tanner Mossell Alexander - envisaged in 1998, completed in 2001
  • 500 students
  • $1330 per student from UPenn
  • At capacity by 2011; 2013 winter debacle; lotteries conducted but no denials in 2014, 2015, 2016

PAS Premium Heuristics and Previous Studies

  • A “$100k premium”
  • “In fact, prices of homes can jump $50,000 or even $100,000 if they're in the Penn Alexander catchment—even if they're exactly the same kind of homes as the non-catchment homes across the street.” Curbed 1/16/13
  • Penn Urban Research
  • No pricing models I could find

Questions

  • Is there really a catchment premium?
  • Did it exist before the school was built?
  • Does it affect rentals?

Analysis strategy

  • Focus on border properties
  • Linear regression & conditional logistic regression modeling
  • Use Philadelphia Office of Property Assessment API and Casey Thomas' PHL-opa wrapper
  • Statistics and visualization in R with Leaflet for mapping

Property Data JSON

{"status"=>"success",
 "total"=>1,
 "data"=>
  {"properties"=>
    [{"property_id"=>"8871000921",
      "account_number"=>"461171000",
      "full_address"=>"921 S 46TH ST",
      "unit"=>"",
      "zip"=>"19143-3701",
      "address_match"=>
       {"original"=>"921 S 46th St",
        "standardized"=>"921 S 46TH ST",
        "similarity"=>100,
        "match_code"=>nil,
        "match_type"=>"Parcel"},
      "geometry"=>{"x"=>-75.21404898004198, "y"=>39.94828586007601},
      "ownership"=>
       {"owners"=>["MCCARTY MARY ELLEN", "LEIPZIG JEREMY"], "liaison"=>nil},
      "characteristics"=>
       {"description"=>"ROW 3 STY MASONRY",
        "land_area"=>903,
        "improvement_area"=>1662,
        "improvement_description"=>"",
        "building_code"=>"O50",
        "homestead"=>nil},
      "sales_information"=>
       {"sales_date"=>"/Date(1369713600000-0400)/",
        "sales_price"=>300000,
        "sales_type"=>"B"},
      "valuation_history"=>
       [{"certification_year"=>"2017",
         "market_value"=>294200,
         "land_taxable"=>44130,
         "land_exempt"=>0,
         "improvement_taxable"=>220070,
         "improvement_exempt"=>30000,
         "total_exempt"=>294200,
         "taxes"=>nil,
         "certified"=>"Y"}],
      "proposed_valuation"=>{}}]}}
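
The sales_date field in the record above is encoded as /Date(milliseconds, then a UTC offset)/. A minimal sketch of turning it into an R Date, assuming the leading integer is milliseconds since the Unix epoch and ignoring the trailing offset. parse_opa_date is a hypothetical helper for illustration, not part of the OPA API or the PHL-opa wrapper:

```r
# Hypothetical helper: convert a string such as "/Date(1369713600000-0400)/"
# to a Date. Assumes the leading integer is milliseconds since 1970-01-01 UTC.
parse_opa_date <- function(x) {
  # Keep only the millisecond count; drop the optional +hhmm / -hhmm offset.
  ms <- as.numeric(sub("^/Date\\((-?[0-9]+)([+-][0-9]{4})?\\)/$", "\\1", x))
  as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
}

sale_date <- parse_opa_date("/Date(1369713600000-0400)/")
format(sale_date)
```

For the record shown, this gives 2013-05-28, which is the value that would be used to assign a sale to a decade or to the PAS era.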

Choosing properties

main map

Clustering neighborhoods

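library(plyr)  # provides ldply()
# Pairwise distances between parcel label points (x, y)
dists <- dist(ldply(prop_xy@polygons, function(x) c(x@labpt[1], x@labpt[2])))
# Hierarchical clustering, cut into 8 neighborhood clusters
hc <- hclust(dists)
clust <- cutree(hc, 8)
prop_xy$clust <- clust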
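
To see what the dist / hclust / cutree sequence does, here is a small self-contained sketch on synthetic points. The data frame pts and the choice of two clusters are illustrative assumptions, not the parcel data used above:

```r
# Synthetic stand-in for parcel label points: two well-separated blobs.
set.seed(1)
pts <- data.frame(
  x = c(rnorm(10, mean = 0, sd = 0.1), rnorm(10, mean = 5, sd = 0.1)),
  y = c(rnorm(10, mean = 0, sd = 0.1), rnorm(10, mean = 5, sd = 0.1))
)

dists <- dist(pts)          # Euclidean distances between all pairs of points
hc <- hclust(dists)         # hierarchical clustering (complete linkage by default)
clust <- cutree(hc, k = 2)  # cut the tree into 2 groups

table(clust)
```

Each blob ends up in its own cluster. With longitude/latitude label points, Euclidean distance in degrees is only an approximation of ground distance, which is probably acceptable over a few city blocks.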

cluster map

Property breakdown

By side and type

catchment_side  type       count
inside          RENTAL        71
inside          RESIDENCE    137
outside         RENTAL        85
outside         RESIDENCE    165

By decade

decade  count
1980       45
1990       90
2000      178
2010      145

Sales

sales plot

Is there really a catchment premium?

Price per square foot

decade     inside    outside  wilcox.test.pval
1980     73.48566   55.94440              0.12
1990     63.30345   59.60292              0.90
2000    138.70073  116.62971              0.03
2010    159.14145  141.54575              0.04
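
A per-decade comparison like the one in this table can be sketched as follows. The sales data frame, its column names (decade, side, ppsf) and its values are made up for illustration, and the source does not show how its own p-values were computed:

```r
# Made-up price-per-square-foot values for two decades, five sales per side.
sales <- data.frame(
  decade = rep(c(1990, 2000), each = 10),
  side   = rep(rep(c("inside", "outside"), each = 5), times = 2),
  ppsf   = c(60, 62, 64, 66, 68,       59, 61, 63, 65, 67,
             130, 135, 140, 145, 150,  100, 105, 110, 115, 120)
)

# Wilcoxon rank-sum test of inside vs. outside, run separately for each decade.
pvals <- sapply(split(sales, sales$decade), function(d) {
  wilcox.test(ppsf ~ side, data = d)$p.value
})

round(pvals, 3)
```

The rank-sum test makes no normality assumption, which suits skewed price data. In the toy data the 1990 groups interleave (large p-value) while the 2000 groups are fully separated (small p-value).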

Can we model it?

model<-lm(adj_price ~ sqft + in_catchment*pas_era + type + caseshiller + clust, data=props)
summary(model)

Call:
lm(formula = adj_price ~ sqft + in_catchment * pas_era + type + 
    caseshiller + clust, data = props)

Residuals:
    Min      1Q  Median      3Q     Max 
-393314  -52877    4647   66183  283106 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  -3.544e+05  3.533e+04 -10.031  < 2e-16 ***
sqft                          9.442e+01  8.298e+00  11.380  < 2e-16 ***
in_catchmentTRUE             -2.983e+03  1.646e+04  -0.181 0.856213    
pas_eraTRUE                   1.767e+04  2.387e+04   0.740 0.459430    
typeRESIDENCE                 7.060e+04  1.061e+04   6.655 8.37e-11 ***
caseshiller                   2.299e+03  2.691e+02   8.542  < 2e-16 ***
clust2                        2.815e+04  3.166e+04   0.889 0.374424    
clust3                        7.389e+03  1.759e+04   0.420 0.674572    
clust4                        3.016e+04  1.708e+04   1.766 0.078076 .  
clust5                       -2.298e+04  1.629e+04  -1.410 0.159150    
clust6                        8.722e+04  1.563e+04   5.579 4.20e-08 ***
clust7                        2.748e+04  1.431e+04   1.921 0.055416 .  
clust8                        1.558e+05  2.822e+04   5.522 5.72e-08 ***
in_catchmentTRUE:pas_eraTRUE  7.075e+04  1.989e+04   3.556 0.000417 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 98940 on 444 degrees of freedom
Multiple R-squared:  0.6417,    Adjusted R-squared:  0.6312 
F-statistic: 61.16 on 13 and 444 DF,  p-value: < 2.2e-16

What does this say?

A 2,000 sq ft residence inside the catchment, in cluster 8, in 2016

`(Intercept)`+2000*sqft+in_catchmentTRUE+pas_eraTRUE+typeRESIDENCE+188.24*caseshiller+clust8+`in_catchmentTRUE:pas_eraTRUE`
[1] 579016.3

A 2,000 sq ft residence outside the catchment, in cluster 2, in 2016

`(Intercept)`+2000*sqft+typeRESIDENCE+188.24*caseshiller+clust2
[1] 365926.9

The PAS premium is the sum of the catchment and era (>= 2001) coefficients and their interaction coefficient

in_catchmentTRUE+pas_eraTRUE+`in_catchmentTRUE:pas_eraTRUE`
[1] 85439.43
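
The arithmetic on this slide can be reproduced from the printed coefficient table alone. The sketch below re-enters the rounded estimates by hand, so its results differ from the figures above by a few tens of dollars. predict_price is a hypothetical helper, and 188.24 is the Case-Shiller value used in the expressions above:

```r
# Rounded estimates copied from the summary(model) output.
b <- c("(Intercept)" = -3.544e+05, sqft = 9.442e+01,
       in_catchmentTRUE = -2.983e+03, pas_eraTRUE = 1.767e+04,
       typeRESIDENCE = 7.060e+04, caseshiller = 2.299e+03,
       clust2 = 2.815e+04, clust8 = 1.558e+05,
       "in_catchmentTRUE:pas_eraTRUE" = 7.075e+04)

# Hypothetical helper: multiply each named term by its coefficient and sum.
predict_price <- function(terms) sum(b[names(terms)] * terms)

# 2,000 sq ft residence inside the catchment, cluster 8, 2016
inside <- predict_price(c("(Intercept)" = 1, sqft = 2000, in_catchmentTRUE = 1,
                          pas_eraTRUE = 1, typeRESIDENCE = 1,
                          caseshiller = 188.24, clust8 = 1,
                          "in_catchmentTRUE:pas_eraTRUE" = 1))

# 2,000 sq ft residence outside the catchment, cluster 2, same terms as above
outside <- predict_price(c("(Intercept)" = 1, sqft = 2000, typeRESIDENCE = 1,
                           caseshiller = 188.24, clust2 = 1))

# Catchment + era + interaction
premium <- predict_price(c(in_catchmentTRUE = 1, pas_eraTRUE = 1,
                           "in_catchmentTRUE:pas_eraTRUE" = 1))

c(inside = inside, outside = outside, premium = premium)
```

With the fitted model object available, predict(model, newdata = ...) would give the unrounded values directly.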

Assumption problems

model<-lm(adj_price ~ sqft + in_catchment*pas_era + type + caseshiller + clust, data=props)
resids<-resid(model)
shapiro.test(resids)

    Shapiro-Wilk normality test

data:  resids
W = 0.9796, p-value = 4.881e-06
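
For reference, a small p-value from shapiro.test rejects the hypothesis that the residuals are normally distributed. A self-contained sketch of how the test behaves on a clearly normal-shaped and a clearly skewed sample. Both samples are synthetic and unrelated to the property data:

```r
n <- 200
normal_like <- qnorm(ppoints(n))  # evenly spaced normal quantiles
skewed      <- qexp(ppoints(n))   # evenly spaced exponential quantiles (right-skewed)

p_normal <- shapiro.test(normal_like)$p.value  # large: no evidence against normality
p_skewed <- shapiro.test(skewed)$p.value       # tiny: normality rejected

c(normal_like = p_normal, skewed = p_skewed)
```

A Q-Q plot of the model residuals (qqnorm(resids); qqline(resids)) is a common companion check, since with several hundred observations the test can flag even modest departures.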

Can this be modeled differently to get on better statistical footing?

Using conditional logistic regression, we can place each sale in a stratum of comparable properties

library(survival)  # provides clogit() and strata()
summary(clogit(in_catchment ~ adj_price + sqft + pas_era + type + caseshiller + strata(clust), data = props))
Call:
coxph(formula = Surv(rep(1, 458L), in_catchment) ~ adj_price + 
    sqft + pas_era + type + caseshiller + strata(clust), data = props, 
    method = "exact")

  n= 458, number of events= 208 

                    coef  exp(coef)   se(coef)      z Pr(>|z|)    
adj_price      4.593e-06  1.000e+00  1.054e-06  4.359 1.31e-05 ***
sqft           2.232e-04  1.000e+00  2.039e-04  1.094    0.274    
pas_eraTRUE   -3.845e-01  6.808e-01  4.766e-01 -0.807    0.420    
typeRESIDENCE -1.466e-01  8.636e-01  2.374e-01 -0.618    0.537    
caseshiller   -9.984e-03  9.901e-01  6.314e-03 -1.581    0.114    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              exp(coef) exp(-coef) lower .95 upper .95
adj_price        1.0000     1.0000    1.0000     1.000
sqft             1.0002     0.9998    0.9998     1.001
pas_eraTRUE      0.6808     1.4690    0.2675     1.733
typeRESIDENCE    0.8636     1.1579    0.5423     1.375
caseshiller      0.9901     1.0100    0.9779     1.002

Rsquare= 0.076   (max possible= 0.72 )
Likelihood ratio test= 36.36  on 5 df,   p=8.061e-07
Wald test            = 32.32  on 5 df,   p=5.141e-06
Score (logrank) test = 34.78  on 5 df,   p=1.666e-06
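
The exp(coef) for adj_price prints as 1.0000 because it is expressed per adjusted dollar. A sketch of rescaling it to a per-$100,000 odds ratio, using only the coef and se(coef) printed above. The $100,000 unit is an illustrative choice, and the interval is a standard Wald interval:

```r
coef_price <- 4.593e-06  # change in log-odds per adjusted dollar, from the table
se_price   <- 1.054e-06  # its standard error

scale <- 1e5  # re-express the effect per $100,000 of adjusted price

or_100k <- exp(coef_price * scale)
ci_100k <- exp((coef_price + c(-1, 1) * qnorm(0.975) * se_price) * scale)

round(c(odds_ratio = or_100k, lower = ci_100k[1], upper = ci_100k[2]), 3)
```

Read this way, within a cluster stratum and holding the other covariates fixed, each additional $100,000 of adjusted price is associated with roughly 1.6 times the odds that a sale lies inside the catchment.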

Acknowledgements