This notebook / document provides ground data analysis used to make or confirm certain modeling conjectures and assumptions of a Pets Retail Dynamics Model (PRDM), [AA1]. Seattle pets licensing data is used, [SOD2].
We want to provide answers to the following questions.
- Does the Pareto principle manifest for pet breeds?
- Does the Pareto principle manifest for ZIP codes?
- Is there an upward trend for becoming a pet owner?
All three questions have positive answers, assuming the retrieved data, [SOD2], is representative. See the last section for an additional discussion.
We also discuss pet adoption simulations that are done using Quantile Regression, [AA2, AAp1].
The pet licensing data was taken from this page: “Seattle Pet Licenses”, https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/data.
The ZIP code coordinates data was taken from a GitHub repository, “US Zip Codes from 2013 Government Data”, https://gist.github.com/erichurst/7882666.
Convert “Licence Issue Date” values into DateObjects.
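The conversion can be sketched as follows. This is a minimal illustration on two hypothetical rows, not the notebook's actual code: the column name follows the text above, and the "MonthName Day Year" string format is an assumption about how the dates appear in the data. The names dsSample, dsSample2, and toDateObject are introduced only for this example.

```mathematica
(* Hypothetical sample rows; the date string format is an assumption. *)
dsSample = Dataset[{
    <|"Licence Issue Date" -> "December 18 2015", "Species" -> "Dog"|>,
    <|"Licence Issue Date" -> "June 14 2016", "Species" -> "Cat"|>}];

(* Parse a date string by naming its elements explicitly. *)
toDateObject[s_String] := DateObject[{s, {"MonthName", "Day", "Year"}}];

(* Replace the string values of the date column with DateObjects. *)
dsSample2 = dsSample[All,
   Append[#, "Licence Issue Date" -> toDateObject[#["Licence Issue Date"]]] &]
```

Naming the date elements explicitly avoids relying on automatic date-string interpretation, which can be ambiguous and slow on large columns.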
Keep dogs and cats only
Since the number of animals that are neither cats nor dogs is very small, we remove them from the data in order to produce more concise statistics.
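A minimal sketch of this filtering step, on a small hypothetical dataset (the rows and the names dsSample and dsDogsCats are made up for illustration; only the "Species" and "ZIP Code" column names come from the code used later in this document):

```mathematica
(* Hypothetical sample rows. *)
dsSample = Dataset[{
    <|"Species" -> "Dog", "ZIP Code" -> "98115"|>,
    <|"Species" -> "Cat", "ZIP Code" -> "98103"|>,
    <|"Species" -> "Goat", "ZIP Code" -> "98108"|>,
    <|"Species" -> "Dog", "ZIP Code" -> "98103"|>}];

(* Species counts before filtering, to see how rare the other animals are. *)
dsSample[Counts, "Species"]

(* Keep only the rows for dogs and cats. *)
dsDogsCats = dsSample[Select[MemberQ[{"Dog", "Cat"}, #Species] &]]
```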
ZIP code geo-coordinates
Pareto principle adherence
In this section we apply the Pareto principle statistic to see whether the Pareto principle manifests over the different columns of the pet licensing data.
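One way to compute such a statistic is sketched below. It is an illustration under stated assumptions, not necessarily the function used in the notebook: the breed counts are hypothetical, and the names aCounts, lsSorted, lsCumFractions, and k are introduced only for this example. The idea is to sort the counts in descending order, accumulate them, and read off the fraction of all items covered by the top 20% of the unique values.

```mathematica
(* Hypothetical breed counts; in practice they come from tallying a column. *)
aCounts = <|"A" -> 60, "B" -> 20, "C" -> 6, "D" -> 5, "E" -> 3,
   "F" -> 2, "G" -> 1, "H" -> 1, "I" -> 1, "J" -> 1|>;

(* Sort the counts in descending order and accumulate their fractions of the total. *)
lsSorted = Sort[Values[aCounts], Greater];
lsCumFractions = Accumulate[lsSorted]/Total[lsSorted];

(* Fraction of all items covered by the top 20% of the unique values. *)
k = Ceiling[Length[lsSorted]/5];
N[lsCumFractions[[k]]]

(* Cumulative fraction of items against the fraction of unique values. *)
ListLinePlot[
 Transpose[{Range[Length[lsSorted]]/Length[lsSorted], lsCumFractions}],
 PlotRange -> All, GridLines -> {{0.2}, {0.8}},
 AxesLabel -> {"fraction of unique values", "fraction of items"}]
```

For these made-up counts the top two of ten values cover 80 of 100 items, so the statistic is 0.8. The grid lines at 0.2 and 0.8 make it easy to see whether a curve passes near the classic 20/80 point.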
We see a typical Pareto principle adherence for both dog breeds and cat breeds. For example, 20% of the dog breeds correspond to 80% of all registered dogs.
Note that the number of unique cat breeds is 4 times smaller than the number of unique dog breeds.
We see a typical Pareto principle adherence for the frequencies of the pet names. For dogs, 10% of the unique names correspond to ~65% of the pets.
We see a typical, even exaggerated, manifestation of the Pareto principle over the ZIP codes of the registered pets.
In this section we visualize the pets licensing geo-distribution with geo-histograms.
Both cats and dogs
Dogs and cats separately
(* city and opts are assumed to be defined earlier in the notebook. *)

(* Dogs: keep ZIP codes with at least 5 characters that have known coordinates. *)
lsCoords = Map[
   If[KeyExistsQ[aZipLatLon, #], aZipLatLon[#], Nothing] &,
   Select[
    ToString /@ Normal[dsPetLicenses[Select[#Species == "Dog" &], "ZIP Code"]],
    StringQ[#] && StringLength[#] >= 5 &]];
gr1 = GeoHistogram[lsCoords, GeoCenter -> city, GeoRange -> Quantity[20, "Miles"],
   PlotLegends -> Automatic, ColorFunction -> (Hue[2/3, 2/3, 1 - #] &), opts];

(* Cats: the same steps with the species filter changed. *)
lsCoords = Map[
   If[KeyExistsQ[aZipLatLon, #], aZipLatLon[#], Nothing] &,
   Select[
    ToString /@ Normal[dsPetLicenses[Select[#Species == "Cat" &], "ZIP Code"]],
    StringQ[#] && StringLength[#] >= 5 &]];
gr2 = GeoHistogram[lsCoords, GeoCenter -> city, GeoRange -> Quantity[20, "Miles"],
   PlotLegends -> Automatic, ColorFunction -> (Hue[2/3, 2/3, 1 - #] &), opts];
In this subsection we show the distribution of pet stores (in Seattle).
Instead of retrieving an image, it would be better to show the corresponding geo-markers in the geo-histograms above. (This is not considered that important in the first version of this notebook/document.)
In this section we visualize the time series corresponding to the pet registrations.
Time series objects
Here we make time series objects:
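A minimal sketch of how such a time series could be built from issue dates. The dates are hypothetical, and the names lsDates, lsDayCounts, and tsRegistrations are introduced only for this example; the notebook's own construction may differ.

```mathematica
(* Hypothetical issue dates; in practice these come from the date column. *)
lsDates = {DateObject[{2018, 1, 2}], DateObject[{2018, 1, 2}],
   DateObject[{2018, 1, 5}], DateObject[{2018, 1, 9}], DateObject[{2018, 1, 9}]};

(* Count the registrations per day and order the {time, count} pairs by time. *)
lsDayCounts = SortBy[Tally[AbsoluteTime /@ lsDates], First];

(* Build the time series of daily registration counts. *)
tsRegistrations = TimeSeries[lsDayCounts];

DateListPlot[tsRegistrations, Joined -> False, Filling -> Axis]
```

Doing the same per species (after filtering the rows for dogs or cats) would give one time series for each.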
Time series plots of all registrations
Here are time series plots corresponding to all registrations:
Time series plots of most recent registrations
It is an interesting question why the number of registrations is much higher in volume and frequency in the years 2018 and later.
Here we apply both Linear Regression and Quantile Regression:
We can see that there is a clear upward trend for both dogs and cats.
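The linear regression part of such a trend check can be sketched with the built-in LinearModelFit. The data points below are hypothetical and the names lsData, lmFit, and slope are introduced only for this example. The Quantile Regression part relies on the QRMon package, [AAp1], and is not reproduced here.

```mathematica
(* Hypothetical monthly registration counts: {month index, count}. *)
lsData = {{1, 120.}, {2, 135.}, {3, 128.}, {4, 150.}, {5, 161.}, {6, 158.}, {7, 175.}};

(* Ordinary least squares fit of a straight line. *)
lmFit = LinearModelFit[lsData, x, x];

(* The second best-fit parameter is the slope; a positive value indicates an upward trend. *)
slope = lmFit["BestFitParameters"][[2]];
slope > 0

(* Show the data together with the fitted line. *)
Show[ListPlot[lsData], Plot[lmFit[x], {x, 1, 7}]]
```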
Quantile regression application
In this section we investigate the possibility of simulating the pet adoption rate. We plan to use simulations of the pet adoption rate in PRDM.
We do that using the software monad QRMon, [AAp1]. A list of steps follows.
- Split the time series into windows corresponding to the years 2018 and 2019.
- Find the difference between the two years.
- Apply Quantile Regression to the difference using a reasonable grid of probabilities.
- Simulate the difference.
- Add the simulated difference to year 2019.
In this sub-section we simulate the differences between the time series for 2018 and 2019, then we add the simulated difference to the time series of the year 2019.
Take the simulated time series difference:
Add the simulated time series difference to year 2019, clip the values less than zero, shift the result to 2020:
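The add, clip, and shift steps can be sketched on plain lists as follows. All values are hypothetical; in particular lsSimDiff stands in for a difference simulated with QRMon, [AAp1], which is not reproduced here. The variable names are introduced only for this example.

```mathematica
(* Hypothetical daily values for 2019 and a hypothetical simulated difference. *)
lsDates2019 = {DateObject[{2019, 3, 1}], DateObject[{2019, 3, 2}], DateObject[{2019, 3, 3}]};
lsValues2019 = {5, 3, 8};
lsSimDiff = {-6, 1, 2};

(* Add the difference and replace negative values with zero. *)
lsValues2020 = Map[Max[0, #] &, lsValues2019 + lsSimDiff];

(* Shift the dates forward by one year. *)
lsDates2020 = Map[DatePlus[#, {1, "Year"}] &, lsDates2019];

(* Assemble the projected time series for 2020. *)
ts2020 = TimeSeries[MapThread[List, {lsDates2020, lsValues2020}]]
```

Clipping at zero is needed because a registration count cannot be negative, while a simulated difference can be.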
Plot all years together
This section has subsections that correspond to additional discussion questions. Not all questions are answered; the plan is to progressively answer them in subsequent versions of this notebook/document.
□ Too few pets
The number of registered pets seems too small. Seattle is a large city with more than 600,000 citizens, and approximately 50% of USA households have dogs; hence the number of registered pets (~50,000) is too small.
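The reasoning can be made explicit with a rough back-of-the-envelope computation. The population, the dog ownership fraction, and the number of registered pets are the approximate figures quoted in the text; the household size is an illustrative assumption that does not come from the data, and the result changes with it.

```mathematica
population = 600000;          (* from the text *)
dogHouseholdFraction = 0.5;   (* from the text *)
registeredPets = 50000;       (* from the text, approximate *)
personsPerHousehold = 2.5;    (* assumption for illustration only *)

(* Estimated number of households, and of households expected to have dogs. *)
households = population/personsPerHousehold;
expectedDogHouseholds = dogHouseholdFraction*households;

(* Ratio of registered pets (dogs and cats) to expected dog-owning households. *)
registeredPets/expectedDogHouseholds
```

Under this assumed household size there would be about 120,000 dog-owning households, so all registered pets together amount to less than half of that number.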
□ Why too few pets?
Is Seattle a high-tech city whose citizens are too busy to have pets?
Do most people not register their pets? (Very unlikely if they have used veterinary services.)
□ Registration rates
Why is the number of registrations much higher in volume and frequency in the years 2018 and later?
□ Adoption rates
Can we tell apart the adoption rates of pet-less people and people who already have pets?
[SOD1] Seattle Open Data, “Seattle Pet Licenses”, https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/data.