Pets licensing data analysis

Introduction

This notebook/document provides the ground data analysis used to make or confirm certain modeling conjectures and assumptions of the Pets Retail Dynamics Model (PRDM), [AA1]. Seattle pet licensing data is used, [SOD1].

We want to provide answers to the following questions.

  • Does the Pareto principle manifest for pet breeds?

  • Does the Pareto principle manifest for ZIP codes?

  • Is there an upward trend for becoming a pet owner?

All three questions have positive answers, assuming the retrieved data, [SOD1], is representative. See the last section for additional discussion.

We also discuss pet adoption simulations that are done using Quantile Regression, [AA2, AAp1].

This notebook/document is part of the “Pets retail dynamics” project, [AA1], hosted in SystemModeling at GitHub.

Data

The pet licensing data was taken from this page: “Seattle Pet Licenses”, https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/data.

The ZIP code coordinates data was taken from a GitHub gist, “US Zip Codes from 2013 Government Data”, https://gist.github.com/erichurst/7882666.

Animal licenses

image-3281001a-2f3d-4a8e-87b9-dc8a8b9803b3

Convert the “License Issue Date” values into DateObjects.
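For readers not using Mathematica, here is a minimal Python sketch of the same kind of conversion. The textual date format ("Month DD YYYY") and the sample values are assumptions made for illustration; they should be checked against the downloaded file.

```python
from datetime import date, datetime

# Assumed textual format of the issue-date column, e.g. "December 18 2015".
DATE_FORMAT = "%B %d %Y"


def parse_issue_date(text: str) -> date:
    """Convert one issue-date string into a datetime.date."""
    return datetime.strptime(text.strip(), DATE_FORMAT).date()


# Hypothetical sample values, not taken from the data set.
samples = ["December 18 2015", " June 14 2016 "]
parsed = [parse_issue_date(s) for s in samples]
```

If the real column uses a different format, only DATE_FORMAT needs to change.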

Summary

image-49aecba4-2b43-40d7-87ba-15ceb848898d

Keep dogs and cats only

Since the number of animals that are not cats or dogs is very small, we remove them from the data in order to produce more concise statistics.
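The filtering step can be sketched in Python as follows. The column name "Species" and the records are assumptions for illustration, not values copied from the data set.

```python
from collections import Counter

# Hypothetical records; "Species" is an assumed column name.
records = [
    {"Species": "Dog", "Name": "Rex"},
    {"Species": "Cat", "Name": "Tom"},
    {"Species": "Goat", "Name": "Billy"},
    {"Species": "Dog", "Name": "Luna"},
]

# Tally the species first, to confirm that the other animals are rare.
species_counts = Counter(r["Species"] for r in records)

# Keep only the two dominant species.
KEEP = {"Cat", "Dog"}
cats_and_dogs = [r for r in records if r["Species"] in KEEP]
```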

ZIP code geo-coordinates

Summary

image-572ef441-b14e-438d-b5b7-85f244aa1857
image-c0d4f154-ee22-457f-8a36-715b77c92e08

Pareto principle adherence

In this section we apply the Pareto principle statistic in order to see whether the Pareto principle manifests over the different columns of the pet licensing data.
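One way to compute such a statistic is to sort the unique values of a column by descending frequency and accumulate their shares. The Python sketch below does this under that assumption; the function names and the toy data are hypothetical, and the notebook's own Mathematica function may differ in details.

```python
from collections import Counter


def pareto_curve(values):
    """Return (fraction of unique values, cumulative fraction of records)
    pairs, with the unique values ordered by descending count."""
    counts = sorted(Counter(values).values(), reverse=True)
    total = sum(counts)
    curve, running = [], 0
    for i, c in enumerate(counts, start=1):
        running += c
        curve.append((i / len(counts), running / total))
    return curve


def share_of_top(values, top_fraction):
    """Fraction of all records covered by the most frequent
    top_fraction of the unique values."""
    curve = pareto_curve(values)
    k = max(1, round(top_fraction * len(curve)))
    return curve[k - 1][1]


# Hypothetical column: 5 unique values, 20 records.
breeds = ["A"] * 16 + ["B", "C", "D", "E"]
top_share = share_of_top(breeds, 0.2)  # 1 of 5 values covers 16 of 20 records
```

In this toy column the top 20% of the unique values cover 80% of the records, which is the pattern looked for in the plots of the following subsections.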

Breeds

We see a typical Pareto principle adherence for both dog breeds and cat breeds. For example, 20% of the dog breeds correspond to 80% of all registered dogs.

Note that the number of unique cat breeds is 4 times smaller than the number of unique dog breeds.

image-d1bac8f8-fe6c-42c0-8d52-45ed21ab6cc2
image-3c320985-1ed4-4d11-b983-29f87d4cdc7c

Animal names

We see a typical Pareto principle adherence for the frequencies of the pet names. For dogs, 10% of the unique names correspond to ~65% of the pets.

image-cb6368b6-b735-4f77-a3dd-bcb0be60f28e
image-bbcac6bb-5247-400c-a093-f3002206b5cf

ZIP codes

We see a typical, even exaggerated, manifestation of the Pareto principle over the ZIP codes of the registered pets.

image-72cae8dd-d342-4c90-a11d-11607545133e

Geo-distribution

In this section we visualize the geo-distribution of the pet licenses with geo-histograms.
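The counting behind a geo-histogram can be sketched in Python as below: each license is mapped to the coordinates of its ZIP code and tallied in a latitude/longitude grid cell. The function name, the cell size, and the coordinates are illustrative assumptions; the actual plots are made with Mathematica's geo-graphics functions.

```python
import math
from collections import Counter


def geo_histogram(license_zips, zip_coords, cell_deg=0.05):
    """Count licenses per (lat, lon) grid cell of side cell_deg degrees."""
    cells = Counter()
    for z in license_zips:
        if z not in zip_coords:
            continue  # skip ZIP codes without known coordinates
        lat, lon = zip_coords[z]
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell] += 1
    return cells


# Hypothetical ZIP code coordinates and license records.
zip_coords = {"98103": (47.67, -122.34), "98115": (47.69, -122.28)}
license_zips = ["98103", "98103", "98115", "99999"]
cells = geo_histogram(license_zips, zip_coords)
```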

Both cats and dogs

image-94ae1316-ada2-4195-b2fc-6864ff1fd642

Dogs and cats separately

image-836dff19-7000-45e0-b0a4-1f3fe4a066c9

Pet stores

In this subsection we show the distribution of pet stores (in Seattle).

Instead of retrieving an image, it would be better to show the corresponding geo-markers in the geo-histograms above. (This is not considered that important in the first version of this notebook/document.)

image-836dff19-7000-45e0-b0a4-1f3fe4a066c9

Time series

In this section we visualize the time series corresponding to the pet registrations.

Time series objects

Here we make time series objects:

image-49ae54cb-0644-427e-a015-0392284aaaa7
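A minimal Python sketch of building such time series, assuming each record has been reduced to a (species, issue date) pair and that a series is the number of licenses issued per day. The records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def make_time_series(records):
    """Map each species to a date-sorted list of (date, license count)."""
    counts = defaultdict(Counter)
    for species, issue_date in records:
        counts[species][issue_date] += 1
    return {sp: sorted(c.items()) for sp, c in counts.items()}


# Hypothetical (species, issue date) pairs.
records = [
    ("Dog", date(2019, 3, 1)),
    ("Dog", date(2019, 3, 1)),
    ("Cat", date(2019, 3, 1)),
    ("Dog", date(2019, 2, 27)),
]
series = make_time_series(records)
```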

Time series plots of all registrations

Here are time series plots corresponding to all registrations:

image-02632be6-ab52-41b8-959a-e200641fdd8f

Time series plots of most recent registrations

It is an interesting question why the number of registrations is much higher in volume and frequency in the years 2018 and later.

image-85ebeab1-cad5-4fe3-bd5d-c7c8c94a753e

Upward trend

Here we apply both Linear Regression and Quantile Regression:

image-6df4d9d2-e48a-4d63-885c-6ed5112c0f15
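To make the two kinds of fit concrete, here is a small Python sketch. It is not the notebook's code and not the QRMon implementation: it fits straight lines only, the quantile fit is a brute-force search that relies on the fact that some optimal linear quantile fit passes through two data points, and the data values are hypothetical.

```python
from itertools import combinations


def least_squares_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pinball_loss(residual, q):
    """Asymmetric absolute loss minimized by the q-th quantile."""
    return q * residual if residual >= 0 else (q - 1) * residual


def quantile_line(xs, ys, q):
    """Linear q-th quantile fit by trying every line through two data
    points with distinct x. Feasible for small data sets only."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(pinball_loss(y - (intercept + slope * x), q)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical registration counts over eight consecutive periods.
xs = list(range(8))
ys = [3, 5, 4, 6, 8, 7, 9, 11]
ls_intercept, ls_slope = least_squares_line(xs, ys)
med_intercept, med_slope = quantile_line(xs, ys, 0.5)
```

A positive slope in both the least squares line and the median line is the kind of evidence for an upward trend referred to here.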

We can see that there is a clear upward trend for both dogs and cats.

Quantile regression application

In this section we investigate whether the pet adoption rate can be simulated. We plan to use simulations of the pet adoption rate in PRDM.

We do that using the software monad QRMon, [AAp1]. A list of steps follows.

  • Split the time series into windows corresponding to the years 2018 and 2019.

  • Find the difference between the two years.

  • Apply Quantile Regression to the difference using a reasonable grid of probabilities.

  • Simulate the difference.

  • Add the simulated difference to year 2019.
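The difference, quantile, and simulation steps above can be sketched in Python as follows. This is a simplification and an assumption, not the QRMon method: it uses unconditional empirical quantiles of the differences instead of regression quantiles over time, takes the difference as 2019 minus 2018, and uses hypothetical, already aligned yearly values. The final addition to the 2019 series is a plain element-wise sum.

```python
import random


def empirical_quantiles(values, probs):
    """Quantile values at the given probabilities, using linear
    interpolation between order statistics."""
    s = sorted(values)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out


def sample_from_quantiles(probs, qvals, rng):
    """Inverse-transform sampling, interpolating linearly between
    consecutive (probability, quantile) pairs."""
    u = rng.uniform(probs[0], probs[-1])
    for i in range(1, len(probs)):
        if u <= probs[i]:
            t = (u - probs[i - 1]) / (probs[i] - probs[i - 1])
            return qvals[i - 1] + t * (qvals[i] - qvals[i - 1])
    return qvals[-1]


# Hypothetical aligned values for the two yearly windows.
y2018 = [4, 6, 5, 7, 9, 8, 6, 5]
y2019 = [6, 7, 9, 8, 12, 9, 10, 7]

diffs = [b - a for a, b in zip(y2018, y2019)]  # assumed direction: 2019 - 2018
probs = [0.1, 0.3, 0.5, 0.7, 0.9]              # grid of probabilities
qvals = empirical_quantiles(diffs, probs)

rng = random.Random(0)  # fixed seed for reproducibility
simulated_diff = [sample_from_quantiles(probs, qvals, rng) for _ in y2019]
```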

Simulation

In this subsection we simulate the differences between the time series for 2018 and 2019, and then add the simulated difference to the time series of the year 2019.

image-8f9e3af0-46b7-4417-bd1e-3201c1134f34
image-30b836dc-f166-4f21-9c0b-9cca922058e6
image-65e4d1bf-dfff-4073-88a0-63177eeed1b5
image-6d107cad-6fef-46c8-92a8-59ea78b5039f
image-d0d517e0-925b-486c-88fd-287cfe02e799

Take the simulated time series difference:

Add the simulated time series difference to year 2019, clip the values less than zero, shift the result to 2020:

image-2a29feca-73b8-4fce-8051-145d74ec499c
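The add, clip, and shift operations can be sketched in Python as below, assuming the 2019 series is a list of (date, value) pairs and the simulated difference is a list of values in the same order. The function name and the numbers are hypothetical.

```python
from datetime import date


def add_clip_shift(series_2019, simulated_diff, target_year=2020):
    """Add the simulated difference, replace negative values with zero,
    and move every date to target_year."""
    result = []
    for (d, value), delta in zip(series_2019, simulated_diff):
        new_value = max(0.0, value + delta)
        # Every 2019 date exists in 2020. In general, date.replace raises
        # ValueError when moving February 29 to a non-leap year.
        result.append((d.replace(year=target_year), new_value))
    return result


# Hypothetical inputs.
series_2019 = [(date(2019, 1, 1), 3), (date(2019, 1, 2), 1)]
simulated_diff = [2.5, -4.0]
series_2020 = add_clip_shift(series_2019, simulated_diff)
```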

Plot all years together

image-793f146a-07f9-455f-9bc7-2ef7d7897691

Discussion

This section has subsections that correspond to additional discussion questions. Not all questions are answered; the plan is to answer them progressively in subsequent versions of this notebook/document.

□ Too few pets

The number of registered pets seems too small. Seattle is a large city with more than 600,000 citizens, and approximately 50% of the USA households have dogs, yet only ~50,000 pets are registered.

□ Why too few pets?

Seattle is a high tech city and its citizens are too busy to have pets?

Most people do not register their pets? (Very unlikely if they have used veterinary services.)

Incomplete data?

□ Registration rates

Why is the number of registrations much higher in volume and frequency in the years 2018 and later?

□ Adoption rates

Can we tell apart the adoption rates of pet-less people and people who already have pets?

Preliminary definitions

References

[AA1] Anton Antonov, Pets retail dynamics project, (2020), SystemModeling at GitHub.

[AA2] Anton Antonov, A monad for Quantile Regression workflows, (2018), MathematicaForPrediction at WordPress.

[AAp1] Anton Antonov, Monadic Quantile Regression Mathematica package, (2018), MathematicaForPrediction at GitHub.

[SOD1] Seattle Open Data, “Seattle Pet Licenses”, https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb/data .
