Time series search engines over COVID-19 data

Introduction

In this article we proclaim the preparation and availability of interactive interfaces to two Time Series Search Engines (TSSEs) over COVID-19 data. One TSSE is based on Apple Mobility Trends data, [APPL1]; the other on The New York Times COVID-19 data, [NYT1].

Here are links to interactive interfaces of the TSSEs hosted (and publicly available) at shinyapps.io by RStudio:

Motivation: The primary motivation for making the TSSEs and their interactive interfaces is to use them as exploratory tools. Combined with relevant data analysis (e.g. [AA1, AA2]) the TSSEs should help to form better intuition and feel of the spread of COVID-19 and related data aggregation, public reactions, and government polices.

The rest of the article is structured as follows:

  1. Brief descriptions the overall process, the data
  2. Brief descriptions the search engines structure and implementation
  3. Discussions of a few search examples and their (possible) interpretations

The overall process

For both search engines the overall process has the same steps:

  1. Ingest the data
  2. Do basic (and advanced) data analysis
  3. Make (and publish) reports detailing the data ingestion and transformation steps
  4. Enhances the data with transformed versions of it or with additional related data
  5. Make a Time Series Sparse Matrix Recommender (TSSMR)
  6. Make a Time Series Search Engine Interactive Interface (TSSEII)
  7. Make the interactive interface easily accessible over the World Wide Web

Here is a flow chart that corresponds to the steps listed above:

TSSMRFlowChart

Data

The Apple data

The Apple Mobility Trends data is taken from Apple’s site, see [APPL1]. The data ingestion, basic data analysis, time series seasonality demonstration, (graph) clusterings are given in [AA1]. (Here is a link to the corresponding R-notebook .)

The weather data was taken using the Mathematica function WeatherData, [WRI1].

(It was too much work to get the weather data using some of the well known weather data R packages.)

The New York Times data

The New York Times COVID-19 data is taken from GitHub, see [NYT1]. The data ingestion, basic data analysis, and visualizations are given in [AA2]. (Here is a link to the corresponding R-notebook .)

The search engines

The following sub-sections have screenshots of the TSSE interactive interfaces.

I did experiment with combining the data of the two engines, but did not turn out to be particularly useful. It seems that is more interesting and useful to enhance the Apple data engine with temperature data, and to enhance The New Your Times engine with the (consecutive) differences of the time series.

Structure

The interactive interfaces have three panels:

  • Nearest Neighbors
    • Gives the time series nearest neighbors for the time series of selected entity.
    • Has interactive controls for entity selection and filtering.
  • Trend Finding
    • Gives the time series that adhere to a specified named trend.
    • Has interactive controls for trend curves selection and entity filtering.
  • Notes
    • Gives references and data objects summary.

Implementation

Both TSSEs are implemented using the R packages “SparseMatrixRecommender”, [AAp1], and “SparseMatrixRecommenderInterfaces”, [AAp2].

The package “SparseMatrixRecommender” provides functions to create and use Sparse Matrix Recommender (SMR) objects. Both TSSEs use underlying SMR objects.

The package “SparseMatrixRecommenderInterfaces” provides functions to generate the server and client functions for the Shiny framework by RStudio.

As it was mentioned above, both TSSEs are published at shinyapps.io. The corresponding source codes can be found in [AAr1].

The Apple data TSSE has four types of time series (“entities”). The first three are normalized volumes of Apple maps requests while driving, transit transport use, and walking. (See [AA1] for more details.) The fourth is daily mean temperature at different geo-locations.

Here are screenshots of the panels “Nearest Neighbors” and “Trend Finding” (at interface launch):

AppleTSSENNs

AppleTSSETrends

The New York Times COVID-19 Data Search Engine

The New York Times TSSE has four types of time series (aggregated) cases and deaths, and their corresponding time series differences.

Here are screenshots of the panels “Nearest Neighbors” and “Trend Finding” (at interface launch):

NYTTSSENNs

NYTTSSETrends

Examples

In this section we discuss in some detail several examples of using each of the TSSEs.

Apple data search engine examples

Here are a few observations from [AA1]:

  • The COVID-19 lockdowns are clearly reflected in the time series.
  • The time series from the Apple Mobility Trends data shows strong weekly seasonality. Roughly speaking, people go to places they are not familiar with on Fridays and Saturdays. Other work week days people are more familiar with their trips. Since much lesser number of requests are made on Sundays, we can conjecture that many people stay at home or visit very familiar locations.

Here are a few assumptions:

  • Where people frequently go (work, school, groceries shopping, etc.) they do not need directions that much.
  • People request directions when they have more free time and will for “leisure trips.”
  • During vacations people are more likely to be in places they are less familiar with.
  • People are more likely to take leisure trips when the weather is good. (Warm, not raining, etc.)

Nice, France vs Florida, USA

Consider the results of the Nearest Neighbors panel for Nice, France.

Since French tend to go on vacation in July and August ([SS1, INSEE1]) we can see that driving, transit, and walking in Nice have pronounced peaks during that time:

Of course, we also observe the lockdown period in that geographical area.

Compare those time series with the time series from driving in Florida, USA:

We can see that people in Florida, USA have driving patterns unrelated to the typical weather seasons and vacation periods.

(Further TSSE queries show that there is a negative correlation with the temperature in south Florida and the volumes of Apple Maps directions requests.)

Italy and Balkan countries driving

We can see that according to the data people who have access to both iPhones and cars in Italy and the Balkan countries Bulgaria, Greece, and Romania have similar directions requests patterns:

(The similarities can be explained with at least a few “obvious” facts, but we are going to restrain ourselves.)

The New York Times data search engine examples

In Broward county, Florida, USA and Cook county, Illinois, USA we can see two waves of infections in the difference time series:

References

Data

[APPL1] Apple Inc., Mobility Trends Reports, (2020), apple.com.

[NYT1] The New York Times, Coronavirus (Covid-19) Data in the United States, (2020), GitHub.

[WRI1] Wolfram Research (2008), WeatherData, Wolfram Language function.

Articles

[AA1] Anton Antonov, “Apple mobility trends data visualization (for COVID-19)”, (2020), SystemModeling at GitHub/antononcube.

[AA2] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub/antononcube.

[INSEE1] Institut national de la statistique et des études économiques, “En 2010, les salariés ont pris en moyenne six semaines de congé”, (2012).

[SS1] Sam Schechner and Lee Harris, “What Happens When All of France Takes Vacation? 438 Miles of Traffic”, (2019), The Wall Street Journal

Packages, repositories

[AAp1] Anton Antonov, Sparse Matrix Recommender framework functions, (2019), R-packages at GitHub/antononcube.

[AAp2] Anton Antonov, Sparse Matrix Recommender framework interface functions, (2019), R-packages at GitHub/antononcube.

[AAr1] Anton Antonov, Coronavirus propagation dynamics, (2020), SystemModeling at GitHub/antononcube.

NY Times COVID-19 data visualization (Update)

Introduction

This post is both an update and a full-blown version of an older post — “NY Times COVID-19 data visualization” — using NY Times COVID-19 data up to 2021-01-13.

The purpose of this document/notebook is to give data locations, data ingestion code, and code for rudimentary analysis and visualization of COVID-19 data provided by New York Times, [NYT1].

The following steps are taken:

  • Ingest data
    • Take COVID-19 data from The New York Times, based on reports from state and local health agencies, [NYT1].
    • Take USA counties records data (FIPS codes, geo-coordinates, populations), [WRI1].
  • Merge the data.
  • Make data summaries and related plots.
  • Make corresponding geo-plots.
  • Do “out of the box” time series forecast.
  • Analyze fluctuations around time series trends.

Note that other, older repositories with COVID-19 data exist, like, [JH1, VK1].

Remark: The time series section is done for illustration purposes only. The forecasts there should not be taken seriously.

Import data

NYTimes USA states data

dsNYDataStates = ResourceFunction["ImportCSVToDataset"]["https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"];
dsNYDataStates = dsNYDataStates[All, AssociationThread[Capitalize /@ Keys[#], Values[#]] &];
dsNYDataStates[[1 ;; 6]]
18qzu6j67rb6y
ResourceFunction["RecordsSummary"][dsNYDataStates]
0eh58fau8y8r1

NYTimes USA counties data

dsNYDataCounties = ResourceFunction["ImportCSVToDataset"]["https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"];
dsNYDataCounties = dsNYDataCounties[All, AssociationThread[Capitalize /@ Keys[#], Values[#]] &];
dsNYDataCounties[[1 ;; 6]]
1cpd9bx9xi71h
ResourceFunction["RecordsSummary"][dsNYDataCounties]
1elzwfv0fe32k

US county records

dsUSACountyData = ResourceFunction["ImportCSVToDataset"]["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Data/dfUSACountyRecords.csv"];
dsUSACountyData = dsUSACountyData[All, Join[#, <|"FIPS" -> ToExpression[#FIPS]|>] &];
dsUSACountyData[[1 ;; 6]]
0ycsuwd577vov
ResourceFunction["RecordsSummary"][dsUSACountyData]
0tqfkpq6gxui9

Merge data

Verify that the two datasets have common FIPS codes:

Length[Intersection[Normal[dsUSACountyData[All, "FIPS"]], Normal[dsNYDataCounties[All, "Fips"]]]]

(*3133*)

Merge the datasets:

dsNYDataCountiesExtended = Dataset[JoinAcross[Normal[dsNYDataCounties], Normal[dsUSACountyData[All, {"FIPS", "Lat", "Lon", "Population"}]], Key["Fips"] -> Key["FIPS"]]];

Add a “DateObject” column and (reverse) sort by date:

dsNYDataCountiesExtended = dsNYDataCountiesExtended[All, Join[<|"DateObject" -> DateObject[#Date]|>, #] &];
dsNYDataCountiesExtended = dsNYDataCountiesExtended[ReverseSortBy[#DateObject &]];
dsNYDataCountiesExtended[[1 ;; 6]]
09o5nw7dv2wba

Basic data analysis

We consider cases and deaths for the last date only. (The queries can be easily adjusted for other dates.)

dfQuery = dsNYDataCountiesExtended[Select[#Date == dsNYDataCountiesExtended[1, "Date"] &], {"FIPS", "Cases", "Deaths"}];
dfQuery = dfQuery[All, Prepend[#, "FIPS" -> ToString[#FIPS]] &];
Total[dfQuery[All, {"Cases", "Deaths"}]]

(*<|"Cases" -> 22387340, "Deaths" -> 355736|>*)

Here is the summary of the values of cases and deaths across the different USA counties:

ResourceFunction["RecordsSummary"][dfQuery]
1kdnmrlhe4srx

The following table of plots shows the distributions of cases and deaths and the corresponding Pareto principle adherence plots:

opts = {PlotRange -> All, ImageSize -> Medium};
Rasterize[Grid[
   Function[{columnName}, 
     {Histogram[Log10[#], PlotLabel -> Row[{Log10, Spacer[3], columnName}], opts], ResourceFunction["ParetoPrinciplePlot"][#, PlotLabel -> columnName, opts]} &@Normal[dfQuery[All, columnName]] 
    ] /@ {"Cases", "Deaths"}, 
   Dividers -> All, FrameStyle -> GrayLevel[0.7]]]
13l8k7qfbkr3q

A couple of observations:

  • The logarithms of the cases and deaths have nearly Normal or Logistic distributions.
  • Typical manifestation of the Pareto principle: 80% of the cases and deaths are registered in 20% of the counties.

Remark: The top 20% counties of the cases are not necessarily the same as the top 20% counties of the deaths.

Distributions

Here we find the distributions that correspond to the cases and deaths (using FindDistribution ):

ResourceFunction["GridTableForm"][List @@@ Map[Function[{columnName}, 
     columnName -> FindDistribution[N@Log10[Select[#, # > 0 &]]] &@Normal[dfQuery[All, columnName]] 
    ], {"Cases", "Deaths"}], TableHeadings -> {"Data", "Distribution"}]
10hkfowjmj6oh

Pareto principle locations

The following query finds the intersection between that for the top 600 Pareto principle locations for the cases and deaths:

Length[Intersection @@ Map[Function[{columnName}, Keys[TakeLargest[Normal@dfQuery[Association, #FIPS -> #[columnName] &], 600]]], {"Cases", "Deaths"}]]

(*516*)

Geo-histogram

lsAllDates = Union[Normal[dsNYDataCountiesExtended[All, "Date"]]];
lsAllDates // Length

(*359*)
Manipulate[
  DynamicModule[{ds = dsNYDataCountiesExtended[Select[#Date == datePick &]]}, 
   GeoHistogram[
    Normal[ds[All, {"Lat", "Lon"}][All, Values]] -> N[Normal[ds[All, columnName]]], 
    Quantity[150, "Miles"], PlotLabel -> columnName, PlotLegends -> Automatic, ImageSize -> Large, GeoProjection -> "Equirectangular"] 
  ], 
  {{columnName, "Cases", "Data type:"}, {"Cases", "Deaths"}}, 
  {{datePick, Last[lsAllDates], "Date:"}, lsAllDates}]
1egny238t830i

Heat-map plots

An alternative of the geo-visualization is to use a heat-map plot. Here we use the package “HeatmapPlot.m”, [AAp1].

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/HeatmapPlot.m"]

Cases

Cross-tabulate states with dates over cases:

matSDC = ResourceFunction["CrossTabulate"][dsNYDataStates[All, {"State", "Date", "Cases"}], "Sparse" -> True];

Make a heat-map plot by sorting the columns of the cross-tabulation matrix (that correspond to states):

HeatmapPlot[matSDC, DistanceFunction -> {EuclideanDistance, None}, AspectRatio -> 1/2, ImageSize -> 1000]
1lmgbj4mq4wx9

Deaths

Cross-tabulate states with dates over deaths and plot:

matSDD = ResourceFunction["CrossTabulate"][dsNYDataStates[All, {"State", "Date", "Deaths"}], "Sparse" -> True];
HeatmapPlot[matSDD, DistanceFunction -> {EuclideanDistance, None}, AspectRatio -> 1/2, ImageSize -> 1000]
0g2oziu9g4a8d

Time series analysis

Cases

Time series

For each date sum all cases over the states, make a time series, and plot it:

tsCases = TimeSeries@(List @@@ Normal[GroupBy[Normal[dsNYDataCountiesExtended], #DateObject &, Total[#Cases & /@ #] &]]);
opts = {PlotTheme -> "Detailed", PlotRange -> All, AspectRatio -> 1/4,ImageSize -> Large};
DateListPlot[tsCases, PlotLabel -> "Cases", opts]
1i9aypjaqxdm0
ResourceFunction["RecordsSummary"][tsCases["Path"]]
1t61q3iuq40zn

Logarithmic plot:

DateListPlot[Log10[tsCases], PlotLabel -> Row[{Log10, Spacer[3], "Cases"}], opts]
0r01nxd19xj1x

“Forecast”

Fit a time series model to log 10 of the time series:

tsm = TimeSeriesModelFit[Log10[tsCases]]
1gz0j2673707m

Plot log 10 data and forecast:

DateListPlot[{tsm["TemporalData"], TimeSeriesForecast[tsm, {10}]}, opts, PlotLegends -> {"Data", "Forecast"}]
10vx2ydgcpq0c

Plot data and forecast:

DateListPlot[{tsCases, 10^TimeSeriesForecast[tsm, {10}]}, opts, PlotLegends -> {"Data", "Forecast"}]
04qu24g27fzi6

Deaths

Time series

For each date sum all cases over the states, make a time series, and plot it:

tsDeaths = TimeSeries@(List @@@ Normal[GroupBy[Normal[dsNYDataCountiesExtended], #DateObject &, Total[#Deaths & /@ #] &]]);
opts = {PlotTheme -> "Detailed", PlotRange -> All, AspectRatio -> 1/4,ImageSize -> Large};
DateListPlot[tsDeaths, PlotLabel -> "Deaths", opts]
1uc6wpre2zxl3
ResourceFunction["RecordsSummary"][tsDeaths["Path"]]
1olawss0k1gvd

“Forecast”

Fit a time series model:

tsm = TimeSeriesModelFit[tsDeaths, "ARMA"]
0e5p4c2hxhahd

Plot data and forecast:

DateListPlot[{tsm["TemporalData"], TimeSeriesForecast[tsm, {10}]}, opts, PlotLegends -> {"Data", "Forecast"}]
06uurgguaxyg9

Fluctuations

We want to see does the time series data have fluctuations around its trends and estimate the distributions of those fluctuations. (Knowing those distributions some further studies can be done.)

This can be efficiently using the software monad QRMon, [AAp2, AA1]. Here we load the QRMon package:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MonadicProgramming/MonadicQuantileRegression.m"]

Fluctuations presence

Here we plot the consecutive differences of the cases:

DateListPlot[Differences[tsCases], ImageSize -> Large, AspectRatio -> 1/4, PlotRange -> All]
1typufai7chn8

Here we plot the consecutive differences of the deaths:

DateListPlot[Differences[tsDeaths], ImageSize -> Large, AspectRatio -> 1/4, PlotRange -> All]
0wqagqqfj3p7l

From the plots we see that time series are not monotonically increasing, and there are non-trivial fluctuations in the data.

Absolute and relative errors distributions

Here we take interesting part of the cases data:

tsData = TimeSeriesWindow[tsCases, {{2020, 5, 1}, {2020, 12, 31}}];

Here we specify QRMon workflow that rescales the data, fits a B-spline curve to get the trend, and finds the absolute and relative errors (residuals, fluctuations) around that trend:

qrObj = 
   QRMonUnit[tsData]⟹
    QRMonEchoDataSummary⟹
    QRMonRescale[Axes -> {False, True}]⟹
    QRMonEchoDataSummary⟹
    QRMonQuantileRegression[16, 0.5]⟹
    QRMonSetRegressionFunctionsPlotOptions[{PlotStyle -> Red}]⟹
    QRMonDateListPlot[AspectRatio -> 1/4, ImageSize -> Large]⟹
    QRMonErrorPlots["RelativeErrors" -> False, AspectRatio -> 1/4, ImageSize -> Large, DateListPlot -> True]⟹
    QRMonErrorPlots["RelativeErrors" -> True, AspectRatio -> 1/4, ImageSize -> Large, DateListPlot -> True];
0mcebeqra4iqj
0lz7fflyitth2
0ke1wkttei4a3
0smqxx82ytyjq
1ct1s3qemddsi

Here we find the distribution of the absolute errors (fluctuations) using FindDistribution:

lsNoise = (qrObj⟹QRMonErrors["RelativeErrors" -> False]⟹QRMonTakeValue)[0.5];
FindDistribution[lsNoise[[All, 2]]]

(*CauchyDistribution[6.0799*10^-6, 0.000331709]*)

Absolute errors distributions for the last 90 days:

lsNoise = (qrObj⟹QRMonErrors["RelativeErrors" -> False]⟹QRMonTakeValue)[0.5];
FindDistribution[lsNoise[[-90 ;; -1, 2]]]

(*ExtremeValueDistribution[-0.000996315, 0.00207593]*)

Here we find the distribution of the of the relative errors:

lsNoise = (qrObj⟹QRMonErrors["RelativeErrors" -> True]⟹QRMonTakeValue)[0.5];
FindDistribution[lsNoise[[All, 2]]]

(*StudentTDistribution[0.0000511326, 0.00244023, 1.59364]*)

Relative errors distributions for the last 90 days:

lsNoise = (qrObj⟹QRMonErrors["RelativeErrors" -> True]⟹QRMonTakeValue)[0.5];
FindDistribution[lsNoise[[-90 ;; -1, 2]]]

(*NormalDistribution[9.66949*10^-6, 0.00394395]*)

References

[NYT1] The New York Times, Coronavirus (Covid-19) Data in the United States, (2020), GitHub.

[WRI1] Wolfram Research Inc., USA county records, (2020), System Modeling at GitHub.

[JH1] CSSE at Johns Hopkins University, COVID-19, (2020), GitHub.

[VK1] Vitaliy Kaurov, Resources For Novel Coronavirus COVID-19, (2020), community.wolfram.com.

[AA1] Anton Antonov, “A monad for Quantile Regression workflows”, (2018), at MathematicaForPrediction WordPress.

[AAp1] Anton Antonov, Heatmap plot Mathematica package, (2018), MathematicaForPrediciton at GitHub.

[AAp2] Anton Antonov, Monadic Quantile Regression Mathematica package, (2018), MathematicaForPrediciton at GitHub.

A monad for Epidemiologic Compartmental Modeling Workflows

Version 0.8

Introduction

In this document we describe the design and demonstrate the implementation of a (software programming) monad, [Wk1], for Epidemiology Compartmental Modeling (ECM) workflows specification and execution. The design and implementation are done with Mathematica / Wolfram Language (WL). A very similar implementation is also done in R.

Monad’s name is “ECMMon”, which stands for “Epidemiology Compartmental Modeling Monad”, and its monadic implementation is based on the State monad package “StateMonadCodeGenerator.m”, [AAp1, AA1], ECMMon is implemented in the package [AAp8], which relies on the packages [AAp3-AAp6]. The original ECM workflow discussed in [AA5] was implemented in [AAp7]. An R implementation of ECMMon is provided by the package [AAr2].

The goal of the monad design is to make the specification of ECM workflows (relatively) easy and straightforward by following a certain main scenario and specifying variations over that scenario.

We use real-life COVID-19 data, The New York Times COVID-19 data, see [NYT1, AA5].

The monadic programming design is used as a Software Design Pattern. The ECMMon monad can be also seen as a Domain Specific Language (DSL) for the specification and programming of epidemiological compartmental modeling workflows.

Here is an example of using the ECMMon monad over a compartmental model with two types of infected populations:

0ejgq885m1unt 14u60cftusqlc 06grm5lhtjiv9

The table above is produced with the package “MonadicTracing.m”, [AAp2, AA1], and some of the explanations below also utilize that package.

As it was mentioned above the monad ECMMon can be seen as a DSL. Because of this the monad pipelines made with ECMMon are sometimes called “specifications”.

Contents description

The document has the following structure.

  • The sections “Package load” and “Data load” obtain the needed code and data.
  • The section “Design consideration” provide motivation and design decisions rationale.
  • The section “Single site models” give brief descriptions of certain “seed” models that can be used in the monad.
  • The section “Single-site model workflow demo”, “Multi-site workflow demo” give demonstrations of how to utilize the ECMMon monad .
    • Using concrete practical scenarios and “real life” data.
  • The section “Batch simulation and calibration process” gives methodological preparation for the content of the next two sections.
  • The section “Batch simulation workflow” and “Calibration workflow” describe how to do most important monad workflows after the model is developed.
  • The section “Future plans” outlines future directions of development.

Remark: One can read only the sections “Introduction”, “Design consideration”, “Single-site models”, and “Batch simulation and calibration process”. That set of sections provide a fairly good, programming language agnostic exposition of the substance and novel ideas of this document.

Package load

In this section we load packages used in this notebook:

Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/MonadicEpidemiologyCompartmentalModeling.m"];Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/MultiSiteModelSimulation.m"]Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MonadicProgramming/MonadicTracing.m"]Import["https://raw.githubusercontent.com/antononcube/ConversationalAgents/master/Packages/WL/ExternalParsersHookup.m"]

Remark: The import commands above would trigger some other package imports.

Data load

In this section we ingest data using the “umbrella function” MultiSiteModelReadData from [AAp5]:

Read data

AbsoluteTiming[  aData = MultiSiteModelReadData[];  ](*{38.8635, Null}*)

Data summaries

ResourceFunction["RecordsSummary"] /@ aData
01mnhk5boxlto

Transform data

Here we transform the population related data in a form convenient for specifying the simulations with it:

aPopulations = Association@Map[{#Lon, #Lat} -> #Population &, Normal[aData["CountyRecords"]]];aInfected = Association@Map[{#Lon, #Lat} -> #Cases &, Normal[aData["CasesAndDeaths"]]];aDead = Association@Map[{#Lon, #Lat} -> #Deaths &, Normal[aData["CasesAndDeaths"]]];

Geo-visualizations

Using the built-in function GeoHistogram we summarize the USA county populations, and COViD-19 infection cases and deaths:

Row@MapThread[GeoHistogram[KeyMap[Reverse, #1], Quantity[140, "Miles"], PlotLabel -> #2, PlotTheme -> "Scientific", PlotLegends -> Automatic, ImageSize -> Medium] &, {{aPopulations, aInfected, aDead}, {"Populations", "Infected", "Dead"}}]
1km2posdmqoic

(Note that in the plots above we have to reverse the keys of the given population associations.)

Using the function HextileHistogram from [AAp7 ] here we visualize USA county populations over a hexagonal grid with cell radius 2 degrees ((\approx 140) miles (\approx 222) kilometers):

HextileHistogram[aPopulations, 2, PlotRange -> All, PlotLegends -> Automatic, ImageSize -> Large]
0lqx1d453pio1

In this notebook we prefer using HextileHistogram because it represents the simulation data in geometrically more faithful way.

Design considerations

The big picture

The main purpose of the designed epidemic compartmental modeling framework (i.e. software monad) is to have the ability to do multiple, systematic simulations for different scenario play-outs over large scale geographical regions. The target end-users are decision makers at government level and researchers of pandemic or other large scale epidemic effects.

Here is a diagram that shows the envisioned big picture workflow:

Model-development-and-decision-making

Large-scale modeling

The standard classical compartmental epidemiology models are not adequate over large geographical areas, like, countries. We design a software framework – the monad ECMMon – that allows large scale simulations using a simple principle workflow:

  1. Develop a single-site model for relatively densely populated geographical area for which the assumptions of the classical models (approximately) hold.
  2. Extend the single-site model into a large-scale multi-site model using statistically derived traveling patterns; see [AA4].
  3. Supply the multi-site model with appropriately prepared data.
  4. Run multiple simulations to see large scale implications of different policies.
  5. Calibrate the model to concrete observed (or faked) data. Go to 4.

Flow chart

The following flow chart visualizes the possible workflows the software monad ECMMon:

ECMMon-workflow

Two models in the monad

  • An ECMMon object can have one or two models. One of the models is a “seed”, single-site model from [AAp1], which, if desired, is scaled into a multi-site model, [AA3, AAp2].
  • Workflows with only the single-site model are supported.
    • Say, workflows for doing sensitivity analysis, [AA6, BC1].
  • Scaling of a single-site model into multi-site is supported and facilitated.
  • Workflows for the multi-site model include preliminary model scaling steps and simulation steps.
  • After the single-site model is scaled the monad functions use the multi-site model.
  • The workflows should be easy to specify and read.

Single-site model workflow

  1. Make a single-site model.
  2. Assign stocks initial conditions.
  3. Assign rates values.
  4. Simulate.
  5. Plot results.
  6. Go to 2.

Multi-site model workflow

  1. Make a single-site model.
  2. Assign initial conditions and rates.
  3. Scale the single-site model into a multi-site model.
    1. The single-site assigned rates become “global” when the single-site model is scaled.
    2. The scaling is based on assumptions for traveling patterns of the populations.
    3. There are few alternatives for that scaling:
      1. Using locations geo-coordinates
      2. Using regular grids covering a certain area based on in-habited locations geo-coordinates
      3. Using traveling patterns contingency matrices
      4. Using “artificial” patterns of certain regular types for qualitative analysis purposes
  4. Enhance the multi-site traveling patterns matrix and re-scale the single site model.
    1. We might want to combine traveling patterns by ground transportation with traveling patterns by airplanes.
    2. For quarantine scenarios this might a less important capability of the monad.
      1. Hence, this an optional step.
  5. Assign stocks initial conditions for each of the sites in multi-scale model.
  6. Assign rates for each of the sites.
  7. Simulate.
  8. Plot global simulation results.
  9. Plot simulation results for focus sites.

Single-site models

We have a collection of single-site models that have different properties and different modeling goals, [AAp3, AA7, AA8]. Here is as diagram of a single-site model that includes hospital beds and medical supplies as limitation resources, [AA7]:

Coronavirus-propagation-simple-dynamics

SEI2HR model

In this sub-section we briefly describe the model SEI2HR, which is used in the examples below.

Remark: SEI2HR stands for “Susceptible, Exposed, Infected Two, Hospitalized, Recovered” (populations).

Detailed description of the SEI2HR model is given in [AA7].

Verbal description

We start with one infected (normally symptomatic) person, the rest of the people are susceptible. The infected people meet other people directly or get in contact with them indirectly. (Say, susceptible people touch things touched by infected.) For each susceptible person there is a probability to get the decease. The decease has an incubation period: before becoming infected the susceptible are (merely) exposed. The infected recover after a certain average infection period or die. A certain fraction of the infected become severely symptomatic. If there are enough hospital beds the severely symptomatic infected are hospitalized. The hospitalized severely infected have different death rate than the non-hospitalized ones. The number of hospital beds might change: hospitals are extended, new hospitals are build, or there are not enough medical personnel or supplies. The deaths from infection are tracked (accumulated.) Money for hospital services and money from lost productivity are tracked (accumulated.)

The equations below give mathematical interpretation of the model description above.

Equations

Here are the equations of one the epidemiology compartmental models, SEI2HR, [AA7], implemented in [AAp3]:

ModelGridTableForm[SEI2HRModel[t], "Tooltips" -> False]["Equations"] /. {eq_Equal :> TraditionalForm[eq]}
1fjs0wfx4mrpj

The equations for Susceptible, Exposed, Infected, Recovered populations of SEI2R are “standard” and explanations about them are found in [WK2, HH1]. For SEI2HR those equations change because of the stocks Hospitalized Population and Hospital Beds.

The equations time unit is one day. The time horizon is one year. In this document we consider COVID-19, [Wk2, AA1], hence we do not consider births.

Single-site model workflow demo

In this section we demonstrate some of the sensitivity analysis discussed in [AA6, BC1].

Make a single-site model, SEI2HR:

model1 = SEI2HRModel[t, "InitialConditions" -> True, "RateRules" -> True, "TotalPopulationRepresentation" -> "AlgebraicEquation"];

Make an association with “default” parameters:

aDefaultPars = <|    aip -> 26,     aincp -> 5,     \[Beta][ISSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],     \[Beta][INSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],     qsd -> 60,     ql -> 21,     qcrf -> 0.25,     \[Beta][HP] -> 0.01,     \[Mu][ISSP] -> 0.035/aip,     \[Mu][INSP] -> 0.01/aip,     nhbr[TP] -> 3/1000,     lpcr[ISSP, INSP] -> 1,     hscr[ISSP, INSP] -> 1    |>;

Execute the workflow multiple times with different quarantine starts:

qlVar = 56;lsRes =  Map[   ECMMonUnit[]     ECMMonSetSingleSiteModel[model1]     ECMMonAssignRateRules[Join[aDefaultPars, <|qsd -> #, ql -> qlVar|>]]     ECMMonSimulate[365]     ECMMonPlotSolutions[{"Infected Severely Symptomatic Population"}, 240,        "Together" -> True, "Derivatives" -> False,        PlotRange -> {0, 12000}, ImageSize -> 250,        Epilog -> {Orange, Dashed, Line[{{#1, 0}, {#1, 12000}}], Line[{{#1 + qlVar, 0}, {#1 + qlVar, 12000}}]},        PlotLabel -> Row[{"Quarantine start:", Spacer[5], #1, ",", Spacer[5], "end:", Spacer[5], #1 + qlVar}],        "Echo" -> False]     ECMMonTakeValue &, Range[25, 100, 5]];

Plot the simulation solutions for “Infected Severely Symptomatic Population”:

Multicolumn[#[[1, 1]] & /@ lsRes, 4]
1db54x7hgf4iw

Both theoretical and computational details of the workflow above are given [AA7, AA8].

Multi-site workflow demo

In this section we demonstrate the multi-site model workflow using COVID-19 data for USA, [WRI2, NYT1].

Here a “seed”, single-site model is created:

model1 = SEI2HRModel[t, "InitialConditions" -> True, "RateRules" -> True, "TotalPopulationRepresentation" -> "AlgebraicEquation"];

Here we specify a multi-site model workflow (the monadic steps are separated and described with purple print-outs):

ecmObj =    ECMMonUnit[]    ECMMonSetSingleSiteModel[model1]    ECMMonAssignRateRules[     <|      aip -> 26,       aincp -> 5,       \[Beta][ISSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],       \[Beta][INSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],       qsd -> 0,       ql -> 56,       qcrf -> 0.25,       \[Beta][HP] -> 0.01,       \[Mu][ISSP] -> 0.035/aip,       \[Mu][INSP] -> 0.01/aip,       nhbr[TP] -> 3/1000,       lpcr[ISSP, INSP] -> 1,       hscr[ISSP, INSP] -> 1      |>     ]    ECMMonEcho[Style["Show the single-site model tabulated form:", Bold, Purple]]    ECMMonEchoFunctionContext[Magnify[ModelGridTableForm[#singleSiteModel], 1] &]    ECMMonMakePolygonGrid[Keys[aPopulations], 1.5, "BinningFunction" -> Automatic]    ECMMonEcho[Style["Show the grid based on population coordinates:", Bold, Purple]]    ECMMonPlotGrid["CellIDs" -> True, ImageSize -> Large]    ECMMonExtendByGrid[aPopulations, 0.12]    ECMMonAssignInitialConditions[aPopulations, "Total Population", "Default" -> 0]    ECMMonAssignInitialConditions[DeriveSusceptiblePopulation[aPopulations, aInfected, aDead], "Susceptible Population", "Default" -> 0]    ECMMonAssignInitialConditions[<||>, "Exposed Population", "Default" -> 0]    ECMMonAssignInitialConditions[aInfected, "Infected Normally Symptomatic Population", "Default" -> 0]    ECMMonAssignInitialConditions[<||>, "Infected Severely Symptomatic Population", "Default" -> 0]    ECMMonEcho[Style["Show total populations initial conditions data:", Bold, Purple]]    ECMMonPlotGridHistogram[aPopulations, ImageSize -> Large, PlotLabel -> "Total populations"]    ECMMonEcho[Style["Show infected and deceased initial conditions data:", Bold,Purple]]    ECMMonPlotGridHistogram[aInfected, ColorFunction -> ColorData["RoseColors"], "ShowDataPoints" -> False, ImageSize -> Large, PlotLabel -> "Infected"]    ECMMonPlotGridHistogram[aDead, ColorFunction -> ColorData["RoseColors"], "ShowDataPoints" -> False, ImageSize -> Large, PlotLabel -> "Deceased"]    ECMMonEcho[Style["Simulate:", Bold, Purple]]    ECMMonSimulate[365]    ECMMonEcho[Style["Show global population simulation results:", Bold, Purple]]    ECMMonPlotSolutions[__ ~~ "Population", 365]    ECMMonEcho[Style["Show site simulation results for Miami and New York areas:", Bold, Purple]]    ECMMonPlotSiteSolutions[{160, 174}, __ ~~ "Population", 365]    ECMMonEcho[Style["Show deceased and hospitalzed populations results for Miami and New York areas:", Bold, Purple]]    ECMMonPlotSiteSolutions[{160, 174}, {"Deceased Infected Population", "Hospitalized Population","Hospital Beds"}, 300, "FocusTime" -> 120];
0y0cfyttq7yn6 0gh46eljka9v5 04xdyad4mmniu 1mn5x0eb0hbvl 009qnefuurlhw 09xeiox9ane2w 13kldiv0a47ua 0jjxrwhkaj7ys 1qgjcibkzvib0 1bm71msuv1s02 0l96q6lg8vg9n 1w1u7nso952k8 0gp4bff16s2rj 1pfvwkdwlm72j 0in55gql6v1yi 1fxxap0npojve 1e0ctc7lt1tbh

Theoretical and computational details about the multi-site workflow can be found in [AA4, AA5].

Batch simulations and calibration processes

In this section we describe the in general terms the processes of model batch simulations and model calibration. The next two sections give more details of the corresponding software design and workflows.

Definitions

Batch simulation: If given a SD model (M), the set (P) of parameters of (M), and a set (B) of sets of values (P), (B\text{:=}\left{V_i\right}), then the set of multiple runs of (M) over (B) are called batch simulation.

Calibration: If given a model (M), the set (P) of parameters of (M), and a set of (k) time series (T\text{:=}\left{T_i\right}_{i=1}^k) that correspond to the set of stocks (S\text{:=}\left{S_i\right}_{i=1}^k) of (M) then the process of finding concrete the values (V) for (P) that make the stocks (S) to closely resemble the time series (T) according to some metric is called calibration of (M) over the targets (T).

Roles

  • There are three types of people dealing with the models:
    • Modeler, who develops and implements the model and prepares it for calibration.
    • Calibrator, who calibrates the model with different data for different parameters.
    • Stakeholder, who requires different features of the model and outcomes from different scenario play-outs.
  • There are two main calibration scenarios:
    • Modeler and Calibrator are the same person
    • Modeler and Calibrator are different persons

Process

Model development and calibration is most likely going to be an iterative process.

For concreteness let us assume that the model has matured development-wise and batch simulation and model calibration is done in a (more) formal way.

Here are the steps of a well defined process between the modeling activity players described above:

  1. Stakeholder requires certain scenarios to be investigated.
  2. Modeler prepares the model for those scenarios.
  3. Stakeholder and Modeler formulate a calibration request.
  4. Calibrator uses the specifications from the calibration request to:
    1. Calibrate the model
    2. Derive model outcomes results
    3. Provide model qualitative results
    4. Provide model sensitivity analysis results
  5. Modeler (and maybe Stakeholder) review the results and decides should more calibration be done.
    1. I.e. go to 3.
  6. Modeler does batch simulations with the calibrated model for the investigation scenarios.
  7. Modeler and Stakeholder prepare report with the results.

See the documents [AA9, AA10] have questionnaires that further clarify the details of interaction between the modelers and calibrators.

Batch simulation vs calibration

In order to clarify the similarities and differences between batch simulation and calibration we list the following observations:

  • Each batch simulation or model calibration is done either for model development purposes or for scenario play-out studies.
  • Batch simulation is used for qualitative studies of the model. For example, doing sensitivity analysis; see [BC1, AA7, AA8].
  • Before starting the calibration we might want to study the “landscape” of the search space of the calibration parameters using batch simulations.
  • Batch simulation is also done after model calibration in order to evaluate different scenarios,
  • For some models with large computational complexity batch simulation – together with some evaluation metric – can be used instead of model calibration.

Batch simulations workflow

In this section we describe the specification and execution of model batch simulations.

Batch simulations can be time consuming, hence it is good idea to

In the rest of the section we go through the following steps:

  1. Make a model object
  2. Batch simulate over a few combinations of parameters and show:
    1. Plots of the simulation results for all populations
    2. Plots of the simulation results for a particular population
  3. Batch simulate over the Cartesian (outer) product of values lists of a selected pair of parameters and show the corresponding plots of all simulations

Model (object) for batch simulations

Here we make a new ECMMon object:

ecmObj2 =  ECMMonUnit[]   ECMMonSetSingleSiteModel[model1]   ECMMonAssignRateRules[aDefaultPars];

Direct specification of combinations of parameters

All populations

Here we simulate the model in the object of different parameter combinations given in a list of associations:

res1 =  ecmObj2   ECMMonBatchSimulate[___ ~~ "Population", {<|qsd -> 60, ql -> 28|>, <|qsd -> 55, ql -> 28|>, <|qsd -> 75, ql -> 21, \[Beta][ISSP] -> 0|>}, 240]   ECMMonTakeValue;

Remark: The stocks in the results are only stocks that are populations – that is specified with the string expression pattern ___~~”Population”.

Here is the shape of the result:

Short /@ res1
0a1hc3qgvuof7

Here are the corresponding plots:

ListLinePlot[#, PlotTheme -> "Detailed", ImageSize -> Medium, PlotRange -> All] & /@ res1
1veewx20b1778

Focus population

We might be interested in the batch simulations results for only one, focus populations. Here is an example:

res2 =    ecmObj2    ECMMonBatchSimulate["Infected Normally Symptomatic Population", {<|qsd -> 60, ql -> 28|>, <|qsd -> 55, ql -> 28|>, <|qsd -> 75, ql -> 21, \[Beta][ISSP] -> 0|>}, 240]    ECMMonTakeValue;

Here is the shape of the result:

Short /@ res2
0es5cj4hgrljm

Here are the corresponding plots:

Multicolumn[ KeyValueMap[  ListLinePlot[#2, PlotLabel -> #1, PlotTheme -> "Detailed",     Epilog -> {Directive[Orange, Dashed],       Line[{Scaled[{0, -1}, {#1[qsd], 0}], Scaled[{0, 1}, {#1[qsd], 0}]}],       Line[{Scaled[{0, -1}, {#1[qsd] + #1[ql], 0}], Scaled[{0, 1}, {#1[qsd] + #1[ql], 0}]}]},     ImageSize -> Medium] &, res2]]
17ofnpddvw8ud

Outer product of parameters

Instead of specifying an the combinations of parameters directly we can specify the values taken by each parameter using an association in which the keys are parameters and the values are list of values:

res3 =    ecmObj2    ECMMonBatchSimulate[__ ~~ "Population", <|qsd -> {60, 55, 75}, ql -> {28, 21}|>, 240]    ECMMonTakeValue;

Here is the shallow form of the results

Short /@ res3
1e260m2euunsl

Here are the corresponding plots:

Multicolumn[ KeyValueMap[  ListLinePlot[#2, PlotLabel -> #1, PlotTheme -> "Detailed",     Epilog -> {Directive[Gray, Dashed],       Line[{Scaled[{0, -1}, {#1[qsd], 0}], Scaled[{0, 1}, {#1[qsd], 0}]}],       Line[{Scaled[{0, -1}, {#1[qsd] + #1[ql], 0}], Scaled[{0, 1}, {#1[qsd] + #1[ql], 0}]}]},     ImageSize -> Medium] &, res3]]
1u4ta8aypszhz

Calibration workflow

In this section we go through the computation steps of the calibration of single-site SEI2HR model.

Remark: We use real data in this section, but the presented calibration results and outcome plots are for illustration purposes only. A rigorous study with discussion of the related assumptions and conclusions is beyond the scope of this notebook/document.

Calibration steps

Here are the steps performed in the rest of the sub-sections of this section:

  1. Ingest data for infected cases, deaths due to disease, etc.
  2. Choose a model to calibrate.
  3. Make the calibration targets – those a vectors corresponding to time series over regular grids.
    1. Consider using all of the data in order to evaluate model’s applicability.
    2. Consider using fractions of the data in order to evaluate model’s ability to predict the future or reconstruct data gaps.
  4. Choose calibration parameters and corresponding ranges for their values.
  5. If more than one target choose the relative weight (or importance) of the targets.
  6. Calibrate the model.
  7. Evaluate the fitting between the simulation results and data.
    1. Using statistics and plots.
  8. Make conclusions. If insufficiently good results are obtained go to 2 or 4.

Remark: When doing calibration epidemiological models a team of people it is better certain to follow (rigorously) well defined procedures. See the documents:

Remark: We plan to prepare have several notebooks dedicated to calibration of both single-site and multi-site models.

USA COVID-19 data

Here data for the USA COVID-19 infection cases and deaths from [NYT1] (see [AA6] data ingestion details):

lsCases = {1, 1, 1, 2, 3, 5, 5, 5, 5, 6, 7, 8, 11, 11, 11, 12, 12, 12,12, 12, 13, 13, 14, 15, 15, 15, 15, 25, 25, 25, 27, 30, 30, 30, 43, 45, 60, 60, 65, 70, 85, 101, 121, 157, 222, 303, 413, 530, 725,976, 1206, 1566, 2045, 2603, 3240, 4009, 5222, 6947, 9824, 13434, 17918, 23448, 30082, 37696, 46791, 59678, 73970, 88796, 103318, 119676, 139165, 160159, 184764, 212033, 241127, 262275, 288195, 314991, 341540, 370689, 398491, 423424, 445213, 467106, 490170, 512972, 539600, 566777, 590997, 613302, 637812, 660549, 685165, 714907, 747741, 777098, 800341, 820764, 844225, 868644, 895924, 927372, 953923, 977395, 998136, 1020622, 1043873, 1069587, 1095405,1118643, 1137145, 1155671, 1176913, 1196485, 1222057, 1245777, 1267911, 1285105, 1306316, 1326559, 1349019, 1373255, 1395981, 1416682, 1436260, 1455183, 1473813, 1491974, 1513223, 1536848, 1559020, 1578876, 1600414, 1620096, 1639677, 1660303, 1688335, 1709852, 1727711, 1745546, 1763803, 1784049, 1806724, 1831494, 1855870, 1874023, 1894074, 1918373, 1943743, 1970066, 2001470, 2031613, 2057493, 2088420, 2123068, 2159633, 2199841, 2244876, 2286401, 2324563, 2362875, 2411709, 2461341, 2514500, 2573030, 2622980, 2667278, 2713656, 2767129, 2825865, 2885325, 2952393, 3012349, 3069369, 3129738, 3194944, 3263077, 3338308, 3407962, 3469137, 3529938, 3588229, 3653114, 3721574, 3790356, 3862588, 3928575, 3981476, 4039440, 4101329, 4167741, 4235717, 4303663, 4359188, 4408708, 4455340, 4507370, 4560539, 4617036, 4676822, 4730639, 4777548, 4823529, 4876038, 4929115, 4981066, 5038637, 5089258, 5130147, 5166032, 5206970, 5251824, 5297150, 5344322, 5388034, 5419494, 5458726, 5497530, 5541830, 5586297, 5631403, 5674714, 5707327, 5742814, 5786178, 5817338, 5862014, 5917466, 5958619, 5988001, 6012054, 6040456, 6073671, 6110645, 6157050, 6195893, 6228601, 6264192, 6301923, 6341145, 6385411, 6432677, 6472474, 6507345, 6560827, 6597281, 6638806, 6682079, 6734971, 6776512, 6812354, 6847745, 6889421, 6930523, 6975693, 7027692, 7073962, 7107992, 7168298, 7210171, 7261433, 7315687, 7373073, 7422798, 7466501, 7513020, 7565839, 7625285, 7688761, 7757326, 7808123, 7853753, 7916879, 7976530, 8039653, 8113165, 8193658, 8270925, 8328369, 8401001, 8473618, 8555199, 8642599, 8737995, 8814233, 8892933, 8983153, 9074711, 9182627, 9301455, 9436244, 9558668, 9659817, 9784920, 9923082, 10065150, 10222441, 10405550, 10560047, 10691686, 10852769, 11011484, 11183982, 11367840, 11561152, 11727724, 11864571, 12039323, 12213742, 12397014, 12495699, 12693598, 12838076, 12972986, 13135728, 13315143, 13516558, 13728192, 13958512, 14158135, 14325555, 14519697, 14731424, 14954596, 15174109, 15447371, 15647963, 15826415, 16020169, 16218331, 16465552, 16697862, 16941306, 17132902, 17305013, 17498382, 17694678, 17918870, 18106293, 18200349, 18410644, 18559596, 18740591, 18932346, 19157710};
lsDeaths = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 6, 10, 12, 12, 15, 19, 22, 26, 31, 37, 43, 50, 59, 63, 84, 106, 137, 181, 223, 284, 335, 419, 535, 694, 880, 1181, 1444, 1598, 1955, 2490, 3117, 3904, 4601, 5864, 6408, 7376, 8850, 10159, 11415,12924, 14229, 15185, 16320, 18257, 20168, 21941, 23382, 24617, 26160, 27535, 29821, 31633, 33410, 35104, 36780, 37660, 38805, 40801, 42976, 44959, 46552, 48064, 49122, 50012, 52079, 54509, 56277, 57766, 59083, 59903, 60840, 62299, 63961, 65623, 67143, 68260, 68959, 69633, 71042, 72474, 73718, 74907, 75891, 76416, 76888, 77579, 79183, 80329, 81452, 82360, 82904, 83582, 84604, 85545, 86487, 87559, 88256, 88624, 89235, 90220, 91070, 91900, 92621, 93282, 93578, 93988, 94710, 95444, 96123, 96760, 97297, 97514, 97832, 98638, 99372, 101807, 102525, 103018, 103258, 103594,104257, 104886, 105558, 106148, 106388, 106614, 106998, 107921, 108806, 109705, 110522, 111177, 111567, 111950, 112882, 113856, 114797, 115695, 116458, 116854, 117367, 118475, 119606, 120684, 121814, 122673, 123105, 124782, 126089, 127465, 128720, 130131, 131172, 131584, 132174, 133517, 134756, 135271, 136584, 137547, 138076, 138601, 140037, 141484, 142656, 143793, 144819, 145315, 145831, 147148, 148517, 149550, 150707, 151641, 152065, 152551, 153741, 154908, 156026, 157006, 157869, 158235, 158714, 159781, 160851, 161916, 162869, 163573, 163970, 164213, 164654, 165811, 166713, 167913, 168594, 168990, 169423, 170674, 171663, 172493, 173408, 174075, 174281, 174689, 175626, 176698, 177559, 178390, 179145, 179398, 179736, 180641, 181607, 182455, 183282, 183975, 184298, 184698, 185414, 186359, 187280, 188168, 188741, 189165, 189501, 190312, 191267, 192077, 192940, 193603, 193976, 194481, 195405, 196563, 197386, 198271, 199126, 199462, 199998, 200965, 201969, 202958, 203878, 204691, 205119, 205623, 206757, 208331, 209417, 210829, 211838, 212277, 212989, 214442, 215872, 217029, 218595, 219791, 220402, 221165, 222750, 224641, 226589, 228452, 229868, 230695, 231669, 233856, 236127, 237284, 238636, 239809, 240607, 241834, 244430, 247258, 250102, 252637, 254784, 255857, 257282, 260038, 263240, 266144, 268940, 271172, 272499, 274082, 277088, 280637, 283899, 286640, 289168, 290552, 292346, 295549, 298953, 301713, 302802, 304403, 305581, 307396, 310994, 314654};

Remark: The COVID-19 data was ingested from [NYT1] on 2020-12-31,

Calibration targets

From the data we make the calibration targets association:

aTargets = <|{ISSP -> 0.2 lsCases, INSP -> 0.8 lsCases, DIP -> lsDeaths}|>;

Remark: Note that we split the infection cases into 20% severely symptomatic cases and 80% normally symptomatic cases.

Here is the corresponding plot:

ListLogPlot[aTargets, PlotTheme -> "Detailed", PlotLabel -> "Calibration targets", ImageSize -> Medium]
1u9akes6b4dx0

Here we prepare a smaller set of the targets data for the calibration experiments below:

aTargetsShort = Take[#, 170] & /@ aTargets;

Model creation

modelSEI2HR = SEI2HRModel[t, "TotalPopulationRepresentation" -> "AlgebraicEquation"];

Here are the parameters we want to experiment with (or do calibration with):

lsFocusParams = {aincp, aip, sspf[SP], \[Beta][HP], qsd, ql, qcrf, nhbcr[ISSP, INSP], nhbr[TP]};

Here we set custom rates and initial conditions:

aDefaultPars = <|    \[Beta][ISSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],     \[Beta][INSP] -> 0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],     qsd -> 60,     ql -> 8*7,     qcrf -> 0.25,     \[Beta][HP] -> 0.01,     \[Mu][ISSP] -> 0.035/aip,     \[Mu][INSP] -> 0.01/aip,     nhbr[TP] -> 3/1000,     lpcr[ISSP, INSP] -> 1,     hscr[ISSP, INSP] -> 1    |>;

Remark: Note the piecewise functions for (\beta [\text{ISSP}]) and (\beta [\text{INSP}]).

Calibration

Here is the USA population number we use for calibration:

usaPopulation = QuantityMagnitude@CountryData["UnitedStates", "Population"](*329064917*)

Here is we create a ECMMon object that has default parameters and initial conditions assigned above:

AbsoluteTiming[  ecmObj3 =     ECMMonUnit[]     ECMMonSetSingleSiteModel[modelSEI2HR]     ECMMonAssignInitialConditions[<|TP[0] -> usaPopulation, SP[0] -> usaPopulation - 1, ISSP[0] -> 1|>]     ECMMonAssignRateRules[KeyDrop[aDefaultPars, {aip, aincp, qsd, ql, qcrf}]]     ECMMonCalibrate[      "Target" -> KeyTake[aTargetsShort, {ISSP, DIP}],       "StockWeights" -> <|ISSP -> 0.8, DIP -> 0.2|>,       "Parameters" -> <|aip -> {10, 35}, aincp -> {2, 16}, qsd -> {60, 120}, ql -> {20, 160}, qcrf -> {0.1, 0.9}|>,       DistanceFunction -> EuclideanDistance,       Method -> {"NelderMead", "PostProcess" -> False},       MaxIterations -> 1000      ];  ](*{28.0993, Null}*)

Here are the found parameters:

calRes = ecmObj3ECMMonTakeValue(*{152516., {aip -> 10., aincp -> 3.67018, qsd -> 81.7067, ql -> 111.422, qcrf -> 0.312499}}*)

Using different minimization methods and distance functions

In the monad the calibration of the models is done with NMinimize. Hence, the monad function ECMMonCalibrate takes all options of NMinimize and can do calibrations with the same data and parameter search space using different global minima finding methods and distance functions.

Remark: EuclideanDistance is an obvious distance function, but use others like infinity norm and sum norm. Also, we can use a distance function that takes parts of the data. (E.g. between days 50 and 150 because the rest of the data is, say, unreliable.)

Verification of the fit

maxTime = Length[aTargets[[1]]];
ecmObj4 =    ECMMonUnit[]    ECMMonSetSingleSiteModel[modelSEI2HR]    ECMMonAssignInitialConditions[<|TP[0] -> usaPopulation, SP[0] -> usaPopulation - 1, ISSP[0] -> 1|>]    ECMMonAssignRateRules[Join[aDefaultPars, Association[calRes[[2]]]]]    ECMMonSimulate[maxTime]    ECMMonPlotSolutions[___ ~~ "Population" ~~ ___, maxTime, ImageSize -> Large, LogPlot -> False];
12rw9e4vl0vy0
aSol4 = ecmObj4ECMMonGetSolutionValues[Keys[aTargets], maxTime]ECMMonTakeValue;
Map[ListLogPlot[{aSol4[#], aTargets[#]}, PlotLabel -> #, PlotRange -> All, ImageSize -> Medium, PlotLegends -> {"Calibrated model", "Target"}] &, Keys[aTargets]]
10kkuh7578jjz

Conclusions from the calibration results

We see the that with the calibration found parameter values the model can fit the data for the first 200 days, after that it overestimates the evolution of the infected and deceased popupulations.

We can conjecture that:

  • The model is too simple, hence inadequate
  • That more complicated quarantine policy functions have to be used
  • That the calibration process got stuck in some local minima

Future plans

In this section we outline some of the directions in which the presented work on ECMMon can be extended.

More unit tests and random unit tests

We consider the preparation and systematic utilization of unit tests to be a very important component of any software development. Unit tests are especially important when complicated software package like ECMMon are developed.

For the presented software monad (and its separately developed, underlying packages) have implemented a few collections of tests, see [AAp10, AAp11].

We plan to extend and add more complicated unit tests that test for both quantitative and qualitative behavior. Here are some examples for such tests:

  • Stock-vs-stock orbits produced by simulations of certain epidemic models
  • Expected theoretical relationships between populations (or other stocks) for certain initial conditions and rates
  • Wave-like propagation of the proportions of the infected populations in multi-site models over artificial countries and traveling patterns
  • Finding of correct parameter values with model calibration over different data (both artificial and real life)
  • Expected number of equations for different model set-ups
  • Expected (relative) speed of simulations with respect to model sizes

Further for the monad ECMMon we plant to develop random pipeline unit tests as the ones in [AAp12] for the classification monad ClCon, [AA11].

More comprehensive calibration guides and documentation

We plan to produce more comprehensive guides for doing calibration with ECMMon and in general with Mathematica’s NDSolve and NMinimize functions.

Full correspondence between the Mathematica and R implementations

The ingredients of the software monad ECMMon and ECMMon itself were designed and implemented in Mathematica first. The corresponding design and implementation was done in R, [AAr2]. To distinguish the two implementations we call the R one ECMMon-R and Mathematica (Wolfram Language) one ECMMon-WL.

At this point the calibration is not implemented in ECMMon-R, but we plan to do that soon.

Using ECMMon-R (and the RStudio’s Shiny ecosystem) allows for highly shareable interactive interfaces to be programed. Here is an example: https://antononcube.shinyapps.io/SEI2HR-flexdashboard/ .

(With Mathematica similar interactive interfaces are presented in [AA7, AA8].)

Model transfer between Mathematica and R

We are very interested in transferring epidemiological models from Mathematica to R (or Python, or Julia.)

This can be done in two principle ways: (i) using Mathematica expressions parsers, or (ii) using matrix representations. We plan to investigate the usage of both approaches.

Conversational agent

Consider the making of a conversational agent for epidemiology modeling workflows building. Initial design and implementation is given in [AA13, AA14].

Consider the following epidemiology modeling workflow specification:

lsCommands = "create with SEI2HR;assign 100000 to total population;set infected normally symptomatic population to be 0;set infected severely symptomatic population to be 1;assign 0.56 to contact rate of infected normally symptomatic population;assign 0.58 to contact rate of infected severely symptomatic population;assign 0.1 to contact rate of the hospitalized population;simulate for 240 days;plot populations results;calibrate for target DIPt -> tsDeaths, over parameters contactRateISSP in from 0.1 to 0.7;echo pipeline value";

Here is the ECMMon code generated using the workflow specification:

ToEpidemiologyModelingWorkflowCode[lsCommands, "Execute" -> False, "StringResult" -> True](*"ECMMonUnit[SEI2HRModel[t]] ECMMonAssignInitialConditions[<|TP[0] -> 100000|>] ECMMonAssignInitialConditions[<|INSP[0] -> 0|>] ECMMonAssignInitialConditions[<|ISSP[0] -> 1|>] ECMMonAssignRateRules[<|\\[Beta][INSP] -> 0.56|>] ECMMonAssignRateRules[<|\\[Beta][ISSP] -> 0.58|>] ECMMonAssignRateRules[<|\\[Beta][HP] -> 0.1|>] ECMMonSimulate[\"MaxTime\" -> 240] ECMMonPlotSolutions[ \"Stocks\" -> __ ~~ \"Population\"] ECMMonCalibrate[ \"Target\" -> <|DIP -> tsDeaths|>, \"Parameters\" -> <|\\[Beta][ISSP] -> {0.1, 0.7}|> ] ECMMonEchoValue[]"*)

Here is the execution of the code above:

Block[{tsDeaths = Take[lsDeaths, 150]}, ToEpidemiologyModelingWorkflowCode[lsCommands]];
0cs73r03ki1mi 0227wiq3locdf

Different target languages

Using the natural commands workflow specification we can generate code to other languages, like, Python or R:

ToEpidemiologyModelingWorkflowCode[lsCommands, "Target" -> "Python"](*"obj = ECMMonUnit( model = SEI2HRModel())obj = ECMMonAssignInitialConditions( ecmObj = obj, initConds = [TPt = 100000])obj = ECMMonAssignInitialConditions( ecmObj = obj, initConds = [INSPt = 0])obj = ECMMonAssignInitialConditions( ecmObj = obj, initConds = [ISSPt = 1])obj = ECMMonAssignRateValues( ecmObj = obj, rateValues = [contactRateINSP = 0.56])obj = ECMMonAssignRateValues( ecmObj = obj, rateValues = [contactRateISSP = 0.58])obj = ECMMonAssignRateValues( ecmObj = obj, rateValues = [contactRateHP = 0.1])obj = ECMMonSimulate( ecmObj = obj, maxTime = 240)obj = ECMMonPlotSolutions( ecmObj = obj, stocksSpec = \".*Population\")obj = ECMMonCalibrate( ecmObj = obj,  target = [DIPt = tsDeaths], parameters = [contactRateISSP = [0.1, 0.7]] )"*)

References

Articles

[Wk1] Wikipedia entry, Monad.

[Wk2] Wikipedia entry, “Compartmental models in epidemiology”.

[Wk3] Wikipedia entry, “Coronavirus disease 2019”.

[BC1] Lucia Breierova, Mark Choudhari, An Introduction to Sensitivity Analysis, (1996), Massachusetts Institute of Technology.

[JS1] John D.Sterman, Business Dynamics: Systems Thinking and Modeling for a Complex World. (2000), New York: McGraw.

[HH1] Herbert W. Hethcote (2000). “The Mathematics of Infectious Diseases”. SIAM Review. 42 (4): 599–653. Bibcode:2000SIAMR..42..599H. doi:10.1137/s0036144500371907.

[AA1] Anton Antonov, ”Monad code generation and extension”, (2017), MathematicaForPrediction at GitHub/antononcube.

[AA2] Anton Antonov, “Coronavirus propagation modeling considerations”, (2020), SystemModeling at GitHub/antononcube.

[AA3] Anton Antonov, “Basic experiments workflow for simple epidemiological models”, (2020), SystemModeling at GitHub/antononcube.

[AA4] Anton Antonov, “Scaling of Epidemiology Models with Multi-site Compartments”, (2020), SystemModeling at GitHub/antononcube.

[AA5] Anton Antonov, “WirVsVirus hackathon multi-site SEI2R over a hexagonal grid graph”, (2020), SystemModeling at GitHub/antononcube.

[AA6] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub/antononcube.

[AA7] Anton Antonov, “SEI2HR model with quarantine scenarios”, (2020), SystemModeling at GitHub/antononcube.

[AA8] Anton Antonov, “SEI2HR-Econ model with quarantine and supplies scenarios”, (2020), SystemModeling at GitHub/antononcube.

[AA9] Anton Antonov, Modelers questionnaire, (2020), SystemModeling at GitHub/antononcube.

[AA10] Anton Antonov, Calibrators questionnaire, (2020), SystemModeling at GitHub/antononcube.

[AA11] Anton Antonov, A monad for classification workflows, (2018), MathematicaForPrediction at WordPress.

Repositories, packages

[WRI1] Wolfram Research, Inc., “Epidemic Data for Novel Coronavirus COVID-19”, WolframCloud.

[WRI2] Wolfram Research Inc., USA county records, (2020), System Modeling at GitHub.

[NYT1] The New York Times, Coronavirus (Covid-19) Data in the United States, (2020), GitHub.

[AAr1] Anton Antonov, Coronavirus propagation dynamics project, (2020), SystemModeling at GitHub/antononcube.

[AAr2] Anton Antonov, Epidemiology Compartmental Modeling Monad R package, (2020), ECMMon-R at GitHu/antononcube.

[AAp1] Anton Antonov, State monad code generator Mathematica package, (2017), MathematicaForPrediction at GitHub/antononcube.

[AAp2] Anton Antonov, Monadic tracing Mathematica package, (2017), MathematicaForPrediction at GitHub/antononcube.

[AAp3] Anton Antonov, Epidemiology models Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp4] Anton Antonov, Epidemiology models modifications Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp5] Anton Antonov, Epidemiology modeling visualization functions Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp6] Anton Antonov, System dynamics interactive interfaces functions Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp7] Anton Antonov, Multi-site model simulation Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp8] Anton Antonov, Monadic Epidemiology Compartmental Modeling Mathematica package, (2020), SystemModeling at GitHub/antononcube.

[AAp9] Anton Antonov, Hextile bins Mathematica package, (2020), MathematicaForPrediction at GitHub/antononcube.

[AAp10] Anton Antonov, Monadic Epidemiology Compartmental Modeling Mathematica unit tests, (2020), SystemModeling at GitHub/antononcube.

[AAp11] Anton Antonov, Epidemiology Models NDSolve Mathematica unit tests, (2020), SystemModeling at GitHub/antononcube.

[AAp12] Anton Antonov, Monadic contextual classification random pipelines Mathematica unit tests, (2018), MathematicaForPrediction at GitHub/antononcube.

[AAp13] Anton Antonov, Epidemiology Modeling Workflows Raku package, (2020), Raku-DSL-English-EpidemiologyModelingWorkflows at GitHu/antononcube.

[AAp14] Anton Antonov, External Parsers Hookup Mathematica package, (2019), ConversationalAgents at GitHub.

Generation of Random Bethlehem Stars

Introduction

This document/notebook is inspired by the Mathematica Stack Exchange (MSE) question “Plotting the Star of Bethlehem”, [MSE1]. That MSE question requests efficient and fast plotting of a certain mathematical function that (maybe) looks like the Star of Bethlehem, [Wk1]. Instead of doing what the author of the questions suggests, I decided to use a generative art program and workflows from three of most important Machine Learning (ML) sub-cultures: Latent Semantic Analysis, Recommendations, and Classification.

Although we discuss making of Bethlehem Star-like images, the ML workflows and corresponding code presented in this document/notebook have general applicability – in many situations we have to make classifiers based on data that has to be “feature engineered” through pipeline of several types of ML transformative workflows and that feature engineering requires multiple iterations of re-examinations and tuning in order to achieve the set goals.

The document/notebook is structured as follows:

  1. Target Bethlehem Star images
  2. Simplistic approach
  3. Elaborated approach outline
  4. Sections that follow through elaborated approach outline:
    1. Data generation
    2. Feature extraction
    3. Recommender creation
    4. Classifier creation and utilization experiments

(This document/notebook is a “raw” chapter for the book “Simplified Machine Learning Workflows”, [AAr3].)

Target images

Here are the images taken from [MSE1] that we consider to be “Bethlehem Stars” in this document/notebook:

imgStar1 = Import["https://i.stack.imgur.com/qmmOw.png"];
imgStar2 = Import["https://i.stack.imgur.com/5gtsS.png"];
Row[{imgStar1, Spacer[5], imgStar2}]
00dxgln7hhmjl

We notice that similar images can be obtained using the Wolfram Function Repository (WFR) function RandomMandala, [AAr1]. Here are a dozen examples:

SeedRandom[5];
Multicolumn[Table[MandalaToWhiterImage@ResourceFunction["RandomMandala"]["RotationalSymmetryOrder" -> 2, "NumberOfSeedElements" -> RandomInteger[{2, 8}], "ConnectingFunction" -> FilledCurve@*BezierCurve], 12], 6, Background -> Black]
0dwkbztss087q

Simplistic approach

We can just generate a large enough set of mandalas and pick the ones we like.

More precisely we have the following steps:

  1. We generate, say, 200 random mandalas using BlockRandom and keeping track of the random seeds
    1. The mandalas are generated with rotational symmetry order 2 and filled Bezier curve connections.
  2. We pick mandalas that look, more or less, like Bethlehem Stars
  3. Add picked mandalas to the results list
  4. If too few mandalas are in the results list go to 1.

Here are some mandalas generated with those steps:

lsStarReferenceSeeds = DeleteDuplicates@{697734, 227488491, 296515155601, 328716690761, 25979673846, 48784395076, 61082107304, 63772596796, 128581744446, 194807926867, 254647184786, 271909611066, 296515155601, 575775702222, 595562118302, 663386458123, 664847685618, 680328164429, 859482663706};
Multicolumn[
  Table[BlockRandom[ResourceFunction["RandomMandala"]["RotationalSymmetryOrder" -> 2, "NumberOfSeedElements" -> Automatic, "ConnectingFunction" -> FilledCurve@*BezierCurve, ColorFunction -> (White &), Background -> Black], RandomSeeding -> rs], {rs, lsStarReferenceSeeds}] /. GrayLevel[0.25`] -> White, 6, Appearance -> "Horizontal", Background -> Black]
1aedatd1zb3fh

Remark: The plot above looks prettier in notebook converted with the resource function DarkMode.

Elaborated approach

Assume that we want to automate the simplistic approach described in the previous section.

One way to automate is to create a Machine Learning (ML) classifier that is capable of discerning which RandomMandala objects look like Bethlehem Star target images and which do not. With such a classifier we can write a function BethlehemMandala that applies the classifier on multiple results from RandomMandala and returns those mandalas that the classifier says are good.

Here are the steps of building the proposed classifier:

  • Generate a large enough Random Mandala Images Set (RMIS)
  • Create a feature extractor from a subset of RMIS
  • Assign features to all of RMIS
  • Make a recommender with the RMIS features and other image data (like pixel values)
  • Apply the RMIS recommender over the target Bethlehem Star images and determine and examine image sets that are:
    • the best recommendations
    • the worse recommendations
  • With the best and worse recommendations sets compose training data for classifier making
  • Train a classifier
  • Examine classifier application to (filtering of) random mandala images (both in RMIS and not in RMIS)
  • If the results are not satisfactory redo some or all of the steps above

Remark: If the results are not satisfactory we should consider using the obtained classifier at the data generation phase. (This is not done in this document/notebook.)

Remark: The elaborated approach outline and flow chart have general applicability, not just for generation of random images of a certain type.

Flow chart

Here is a flow chart that corresponds to the outline above:

09agsmbmjhhv4

A few observations for the flow chart follow:

  • The flow chart has a feature extraction block that shows that the feature extraction can be done in several ways.
    • The application of LSA is a type of feature extraction which this document/notebook uses.
  • If the results are not good enough the flow chart shows that the classifier can be used at the data generation phase.
  • If the results are not good enough there are several alternatives to redo or tune the ML algorithms.
    • Changing or tuning the recommender implies training a new classifier.
    • Changing or tuning the feature extraction implies making a new recommender and a new classifier.

Data generation and preparation

In this section we generate random mandala graphics, transform them into images and corresponding vectors. Those image-vectors can be used to apply dimension reduction algorithms. (Other feature extraction algorithms can be applied over the images.)

Generated data

Generate large number of mandalas:

k = 20000;
knownSeedsQ = False;
SeedRandom[343];
lsRSeeds = Union@RandomInteger[{1, 10^9}, k];
AbsoluteTiming[
  aMandalas = 
    If[TrueQ@knownSeedsQ, 
     Association@Table[rs -> BlockRandom[ResourceFunction["RandomMandala"]["RotationalSymmetryOrder" -> 2, "NumberOfSeedElements" -> Automatic, "ConnectingFunction" -> FilledCurve@*BezierCurve], RandomSeeding -> rs], {rs, lsRSeeds}], 
    (*ELSE*) 
     Association@Table[i -> ResourceFunction["RandomMandala"]["RotationalSymmetryOrder" -> 2, "NumberOfSeedElements" -> Automatic, "ConnectingFunction" -> FilledCurve@*BezierCurve], {i, 1, k}] 
    ]; 
 ]

(*{18.7549, Null}*)

Check the number of mandalas generated:

Length[aMandalas]

(*20000*)

Show a sample of the generated mandalas:

Magnify[Multicolumn[MandalaToWhiterImage /@ RandomSample[Values@aMandalas, 40], 10, Background -> Black], 0.7]
1gpblane63eo9

Data preparation

Convert the mandala graphics into images using appropriately large (or appropriately small) image sizes:

AbsoluteTiming[
  aMImages = ParallelMap[ImageResize[#, {120, 120}] &, aMandalas]; 
 ]

(*{248.202, Null}*)

Flatten each of the images into vectors:

AbsoluteTiming[
  aMImageVecs = ParallelMap[Flatten[ImageData[Binarize@ColorNegate@ColorConvert[#, "Grayscale"]]] &, aMImages]; 
 ]

(*{16.0125, Null}*)

Remark: Below those vectors are called image-vectors.

Feature extraction

In this section we use the software monad LSAMon, [AA1, AAp1], to do dimension reduction over a subset of random mandala images.

Remark: Other feature extraction methods can be used through the built-in functions FeatureExtraction and FeatureExtract.

Dimension reduction

Create an LSAMon object and extract image topics using Singular Value Decomposition (SVD) or Independent Component Analysis (ICA), [AAr2]:

SeedRandom[893];
AbsoluteTiming[
  lsaObj = 
    LSAMonUnit[]⟹
     LSAMonSetDocumentTermMatrix[SparseArray[Values@RandomSample[aMImageVecs, UpTo[2000]]]]⟹
     LSAMonApplyTermWeightFunctions["None", "None", "Cosine"]⟹
     LSAMonExtractTopics["NumberOfTopics" -> 40, Method -> "ICA", "MaxSteps" -> 240, "MinNumberOfDocumentsPerTerm" -> 0]⟹
     LSAMonNormalizeMatrixProduct[Normalized -> Left]; 
 ]

(*{16.1871, Null}*)

Show the importance coefficients of the topics (if SVD was used the plot would show the singular values):

ListPlot[Norm /@ SparseArray[lsaObj⟹LSAMonTakeH], Filling -> Axis, PlotRange -> All, PlotTheme -> "Scientific"]
1sy1zsgpxysof

Show the interpretation of the extracted image topics:

lsaObj⟹
   LSAMonNormalizeMatrixProduct[Normalized -> Right]⟹
   LSAMonEchoFunctionContext[ImageAdjust[Image[Partition[#, ImageDimensions[aMImages[[1]]][[1]]]]] & /@ SparseArray[#H] &];
16h8a7jwknnkt

Approximation

Pick a test image that is a mandala image or a target image and pre-process it:

If[True, 
   ind = RandomChoice[Range[Length[Values[aMImages]]]]; 
   imgTest = MandalaToWhiterImage@aMandalas[[ind]]; 
   matImageTest = ToSSparseMatrix[SparseArray@List@ImageToVector[imgTest, ImageDimensions[aMImages[[1]]]], "RowNames" -> Automatic, "ColumnNames" -> Automatic], 
  (*ELSE*) 
   imgTest = Binarize[imgStar2, 0.5]; 
   matImageTest = ToSSparseMatrix[SparseArray@List@ImageToVector[imgTest, ImageDimensions[aMImages[[1]]]], "RowNames" -> Automatic, "ColumnNames" -> Automatic] 
  ];
imgTest
0vlq50ryrw0hl

Find the representation of the test image with the chosen feature extractor (LSAMon object here):

matReprsentation = lsaObj⟹LSAMonRepresentByTopics[matImageTest]⟹LSAMonTakeValue;
lsCoeff = Normal@SparseArray[matReprsentation[[1, All]]];
ListPlot[lsCoeff, Filling -> Axis, PlotRange -> All]
1u57b208thtfz

Show the interpretation of the found representation:

H = SparseArray[lsaObj⟹LSAMonNormalizeMatrixProduct[Normalized -> Right]⟹LSAMonTakeH];
vecReprsentation = lsCoeff . H;
ImageAdjust@Image[Rescale[Partition[vecReprsentation, ImageDimensions[aMImages[[1]]][[1]]]]]
1m7r3b5bx32ow

Recommendations

In this section we utilize the software monad SMRMon, [AAp3], to create a recommender for the random mandala images.

Remark: Instead of the Sparse Matrix Recommender (SMR) object the built-in function Nearest can be used.

Create SSparseMatrix object for all image-vectors:

matImages = ToSSparseMatrix[SparseArray[Values@aMImageVecs], "RowNames" -> Automatic, "ColumnNames" -> Automatic]
029x975bs3q7w

Normalize the rows of the image-vectors matrix:

AbsoluteTiming[
  matPixel = WeightTermsOfSSparseMatrix[matImages, "None", "None", "Cosine"] 
 ]
1k9xucwektmhh

Get the LSA topics matrix:

matH = (lsaObj⟹LSAMonNormalizeMatrixProduct[Normalized -> Right]⟹LSAMonTakeH)
05zsn0o1jyqj6

Find the image topics representation for each image-vector (assuming matH was computed with SVD or ICA):

AbsoluteTiming[
  matTopic = matPixel . Transpose[matH] 
 ]
028u1jz1hgzx9

Here we create a recommender based on the images data (pixels) and extracted image topics (or other image features):

smrObj = 
   SMRMonUnit[]⟹
    SMRMonCreate[<|"Pixel" -> matPixel, "Topic" -> matTopic|>]⟹
    SMRMonApplyNormalizationFunction["Cosine"]⟹
    SMRMonSetTagTypeWeights[<|"Pixel" -> 0.2, "Topic" -> 1|>];

Remark: Note the weights assigned to the pixels and the topics in the recommender object above. Those weights were derived by examining the recommendations results shown below.

Here is the image we want to find most similar mandala images to – the target image:

imgTarget = Binarize[imgStar2, 0.5]
1qdmarfxa5i78

Here is the profile of the target image:

aProf = MakeSMRProfile[lsaObj, imgTarget, ImageDimensions[aMImages[[1]]]];
TakeLargest[aProf, 6]

(*<|"10032-10009-4392" -> 0.298371, "3906-10506-10495" -> 0.240086, "10027-10014-4387" -> 0.156797, "8342-8339-6062" -> 0.133822, "3182-3179-11222" -> 0.131565, "8470-8451-5829" -> 0.128844|>*)

Using the target image profile here we compute the recommendation scores for all mandala images of the recommender:

aRecs = 
   smrObj⟹
    SMRMonRecommendByProfile[aProf, All]⟹
    SMRMonTakeValue;

Here is a plot of the similarity scores:

Row[{ResourceFunction["RecordsSummary"][Values[aRecs]], ListPlot[Values[aRecs], ImageSize -> Medium, PlotRange -> All, PlotTheme -> "Detailed", PlotLabel -> "Similarity scores"]}]
1kdiisj4jg4ut

Here are the closest (nearest neighbor) mandala images:

Multicolumn[Values[ImageAdjust@*ColorNegate /@ aMImages[[ToExpression /@ Take[Keys[aRecs], 48]]]], 12, Background -> Black]
096uazw8izidy

Here are the most distant mandala images:

Multicolumn[Values[ImageAdjust@*ColorNegate /@ aMImages[[ToExpression /@ Take[Keys[aRecs], -48]]]], 12, Background -> Black]
0zb7hf24twij4

Classifier creation and utilization

In this section we:

  • Prepare classifier data
  • Build and examine a classifier using the software monad ClCon, [AA2, AAp2], using appropriate training, testing, and validation data ratios
  • Build a classifier utilizing all training data
  • Generate Bethlehem Star mandalas by filtering mandala candidates with the classifier

As it was mentioned above we prepare the data to build classifiers with by:

  • Selecting top, highest scores recommendations and labeling them with True
  • Selecting bad, low score recommendations and labeling them with False
AbsoluteTiming[
  Block[{
    lsBest = Values@aMandalas[[ToExpression /@ Take[Keys[aRecs], 120]]], 
    lsWorse = Values@aMandalas[[ToExpression /@ Join[Take[Keys[aRecs], -200], RandomSample[Take[Keys[aRecs], {3000, -200}], 200]]]]}, 
   lsTrainingData = 
     Join[
      Map[MandalaToWhiterImage[#, ImageDimensions@aMImages[[1]]] -> True &, lsBest], 
      Map[MandalaToWhiterImage[#, ImageDimensions@aMImages[[1]]] -> False &, lsWorse] 
     ]; 
  ] 
 ]

(*{27.9127, Null}*)

Using ClCon train a classifier and show its performance measures:

clObj = 
   ClConUnit[lsTrainingData]⟹
    ClConSplitData[0.75, 0.2]⟹
    ClConMakeClassifier["NearestNeighbors"]⟹
    ClConClassifierMeasurements⟹
    ClConEchoValue⟹
    ClConClassifierMeasurements["ConfusionMatrixPlot"]⟹
    ClConEchoValue;
0jkfza6x72kb5
03uf3deiz0hsd

Remark: We can re-run the ClCon workflow above several times until we obtain a classifier we want to use.

Train a classifier with all prepared data:

clObj2 = 
   ClConUnit[lsTrainingData]⟹
    ClConSplitData[1, 0.2]⟹
    ClConMakeClassifier["NearestNeighbors"];

Get the classifier function from ClCon object:

cfBStar = clObj2⟹ClConTakeClassifier
0awjjib00ihgg

Here we generate Bethlehem Star mandalas using the classifier trained above:

SeedRandom[2020];
Multicolumn[MandalaToWhiterImage /@ BethlehemMandala[12, cfBStar, 0.87], 6, Background -> Black]
0r37g633mpq0y

Generate Bethlehem Star mandala images utilizing the classifier (with a specified classifier probabilities threshold):

SeedRandom[32];
KeyMap[MandalaToWhiterImage, BethlehemMandala[12, cfBStar, 0.87, "Probabilities" -> True]]
0osesxm4gdvvf

Show unfiltered Bethlehem Star mandala candidates:

SeedRandom[32];
KeyMap[MandalaToWhiterImage, BethlehemMandala[12, cfBStar, 0, "Probabilities" -> True]]
0rr12n6savl9z

Remark: Examine the probabilities in the image-probability associations above – they show that the classifier is “working.“

Here is another set generated Bethlehem Star mandalas using rotational symmetry order 4:

SeedRandom[777];
KeyMap[MandalaToWhiterImage, BethlehemMandala[12, cfBStar, 0.8, "RotationalSymmetryOrder" -> 4, "Probabilities" -> True]]
0rgzjquk4amz4

Remark: Note that although a higher rotational symmetry order is used the highly scored results still seem relevant – they have the features of the target Bethlehem Star images.

References

[AA1] Anton Antonov, “A monad for Latent Semantic Analysis workflows”, (2019), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, “A monad for classification workflows”, (2018)), MathematicaForPrediction at WordPress.

[MSE1] “Plotting the Star of Bethlehem”, (2020),Mathematica Stack Exchange, question 236499,

[Wk1] Wikipedia entry, Star of Bethlehem.

Packages

[AAr1] Anton Antonov, RandomMandala, (2019), Wolfram Function Repository.

[AAr2] Anton Antonov, IdependentComponentAnalysis, (2019), Wolfram Function Repository.

[AAr3] Anton Antonov, “Simplified Machine Learning Workflows” book, (2019), GitHub/antononcube.

[AAp1] Anton Antonov, Monadic Latent Semantic Analysis Mathematica package, (2017), MathematicaForPrediction at GitHub/antononcube.

[AAp2] Anton Antonov, Monadic contextual classification Mathematica package, (2017), MathematicaForPrediction at GitHub/antononcube.

[AAp3] Anton Antonov, Monadic Sparse Matrix Recommender Mathematica package, (2018), MathematicaForPrediction at GitHub/antononcube.

Code definitions

urlPart = "https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MonadicProgramming/";
Get[urlPart <> "MonadicLatentSemanticAnalysis.m"];
Get[urlPart <> "MonadicSparseMatrixRecommender.m"];
Get[urlPart <> "/MonadicContextualClassification.m"];
Clear[MandalaToImage, MandalaToWhiterImage];
MandalaToImage[gr_Graphics, imgSize_ : {120, 120}] := ColorNegate@ImageResize[gr, imgSize];
MandalaToWhiterImage[gr_Graphics, imgSize_ : {120, 120}] := ColorNegate@ImageResize[gr /. GrayLevel[0.25`] -> Black, imgSize];
Clear[ImageToVector];
ImageToVector[img_Image] := Flatten[ImageData[ColorConvert[img, "Grayscale"]]];
ImageToVector[img_Image, imgSize_] := Flatten[ImageData[ColorConvert[ImageResize[img, imgSize], "Grayscale"]]];
ImageToVector[___] := $Failed;
Clear[MakeSMRProfile];
MakeSMRProfile[lsaObj_LSAMon, gr_Graphics, imgSize_] := MakeSMRProfile[lsaObj, {gr}, imgSize];
MakeSMRProfile[lsaObj_LSAMon, lsGrs : {_Graphics}, imgSize_] := MakeSMRProfile[lsaObj, MandalaToWhiterImage[#, imgSize] & /@ lsGrs, imgSize]
MakeSMRProfile[lsaObj_LSAMon, img_Image, imgSize_] := MakeSMRProfile[lsaObj, {img}, imgSize];
MakeSMRProfile[lsaObj_LSAMon, lsImgs : {_Image ..}, imgSize_] := 
   Block[{lsImgVecs, matTest, aProfPixel, aProfTopic}, 
    lsImgVecs = ImageToVector[#, imgSize] & /@ lsImgs; 
    matTest = ToSSparseMatrix[SparseArray[lsImgVecs], "RowNames" -> Automatic, "ColumnNames" -> Automatic]; 
    aProfPixel = ColumnSumsAssociation[lsaObj⟹LSAMonRepresentByTerms[matTest]⟹LSAMonTakeValue]; 
    aProfTopic = ColumnSumsAssociation[lsaObj⟹LSAMonRepresentByTopics[matTest]⟹LSAMonTakeValue]; 
    aProfPixel = Select[aProfPixel, # > 0 &]; 
    aProfTopic = Select[aProfTopic, # > 0 &]; 
    Join[aProfPixel, aProfTopic] 
   ];
MakeSMRProfile[___] := $Failed;
Clear[BethlehemMandalaCandiate];
BethlehemMandalaCandiate[opts : OptionsPattern[]] := ResourceFunction["RandomMandala"][opts, "RotationalSymmetryOrder" -> 2, "NumberOfSeedElements" -> Automatic, "ConnectingFunction" -> FilledCurve@*BezierCurve];
Clear[BethlehemMandala];
Options[BethlehemMandala] = Join[{ImageSize -> {120, 120}, "Probabilities" -> False}, Options[ResourceFunction["RandomMandala"]]];
BethlehemMandala[n_Integer, cf_ClassifierFunction, opts : OptionsPattern[]] := BethlehemMandala[n, cf, 0.87, opts];
BethlehemMandala[n_Integer, cf_ClassifierFunction, threshold_?NumericQ, opts : OptionsPattern[]] := 
   Block[{imgSize, probsQ, res, resNew, aResScores = <||>, aResScoresNew = <||>}, 
     
     imgSize = OptionValue[BethlehemMandala, ImageSize]; 
     probsQ = TrueQ[OptionValue[BethlehemMandala, "Probabilities"]]; 
     
     res = {}; 
     While[Length[res] < n, 
      resNew = Table[BethlehemMandalaCandiate[FilterRules[{opts}, Options[ResourceFunction["RandomMandala"]]]], 2*(n - Length[res])]; 
      aResScoresNew = Association[# -> cf[MandalaToImage[#, imgSize], "Probabilities"][True] & /@ resNew]; 
      aResScoresNew = Select[aResScoresNew, # >= threshold &]; 
      aResScores = Join[aResScores, aResScoresNew]; 
      res = Keys[aResScores] 
     ]; 
     
     aResScores = TakeLargest[ReverseSort[aResScores], UpTo[n]]; 
     If[probsQ, aResScores, Keys[aResScores]] 
    ] /; n > 0;
BethlehemMandala[___] := $Failed

Making Graphs over System Dynamics Models

Introduction

In this document we give usage examples for the functions of the package, “SystemDynamicsModelGraph.m”, [AAp1]. The package provides functions for making dependency graphs for the stocks in System Dynamics (SD) models. The primary motivation for creating the functions in this package is to have the ability to introspect, proofread, and verify the (typical) ODE models made in SD.

A more detailed explanation is:

  • For a given SD system S of Ordinary Differential Equations (ODEs) we make Mathematica graph objects that represent the interaction of variable dependent functions in S.
  • Those graph objects give alternative (and hopefully convenient) way of visualizing the model of S.

Load packages

The following commands load the packages [AAp1, AAp2, AAp3]:

Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/WL/SystemDynamicsModelGraph.m"]
Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/EpidemiologyModels.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/CallGraph.m"]

Usage examples

Equations

Here is a system of ODEs of a slightly modified SEIR model:

lsEqs = {Derivative[1][SP][t] == -((IP[t] SP[t] \[Beta][IP])/TP[t]) - SP[t] \[Mu][TP], Derivative[1][EP][t] == (IP[t] SP[t] \[Beta][IP])/TP[t] - EP[t] (1/aincp + \[Mu][TP]), Derivative[1][IP][t] == EP[t]/aincp - IP[t]/aip - IP[t] \[Mu][IP], Derivative[1][RP][t] == IP[t]/aip - RP[t] \[Mu][TP], TP[t] == Max[0, EP[t] + IP[t] + RP[t] + SP[t]]};
ResourceFunction["GridTableForm"][List /@ lsEqs, TableHeadings -> {"Equations"}]
01xbi9kqh0cfv

Model graph

Here is a graph of the dependencies between the populations:

ModelDependencyGraph[lsEqs, {EP, IP, RP, SP, TP}, t]
08d1a9tfgog31

When the second argument given to ModelDependencyGraph is Automatic the stocks in the equations are heuristically found with the function ModelHeuristicStocks:

ModelHeuristicStocks[lsEqs, t]

(*{EP, IP, RP, SP, TP}*)

Also, the function ModelDependencyGraph takes all options of Graph:

ModelDependencyGraph[lsEqs, Automatic, t, 
  GraphLayout -> "GravityEmbedding", VertexLabels -> "Name", VertexLabelStyle -> Directive[Red, Bold, 16], EdgeLabelStyle -> Directive[Blue, 16], ImageSize -> Large]
0nbr2tt4704fd

Dependencies only

The dependencies in the model can be found with the function ModelDependencyGraphEdges:

lsEdges = ModelDependencyGraphEdges[lsEqs, Automatic, t]
0oqkkrnakv89r
lsEdges[[4]] // FullForm
0x9s286b3noms

Focus stocks

Here is a graph for a set of “focus” stocks-sources to a set of “focus” stocks-destinations:

gr = ModelDependencyGraph[lsEqs, {IP, SP}, {EP}, t]
13di08vbzgsyi

Compare with the graph in which the argument positions of sources and destinations of the previous command are swapped:

ModelDependencyGraph[lsEqs, {EP}, {IP, SP}, t]
009we6s5tmxek

Additional interfacing

The functions of this package work with the models from the package “EpidemiologyModels.m”, [AAp2].

Here is a model from [AAp2]:

model = SEIRModel[t, "TotalPopulationRepresentation" -> "AlgebraicEquation"];
ModelGridTableForm[model]
0aojbcw5zogfw

Here we make the corresponding graph:

ModelDependencyGraph[model, t]
1v1sbwz9d6peq

Generating equations from graph specifications

A related, dual, or inverse task to the generation of graphs from systems of ODEs is the generation of system of ODEs from graphs.

Here is a model specifications through graph edges (using DirectedEdge):

0qp613dyiglzo

Here is the corresponding graph:

grModel = Graph[lsEdges, VertexLabels -> "Name", EdgeLabels -> "EdgeTag", ImageSize -> Large]
1vrnyvwpgmcz9

Here we generate the system of ODEs using the function ModelGraphEquations:

lsEqsGen = ModelGraphEquations[grModel, t];
ResourceFunction["GridTableForm"][List /@ lsEqsGen, TableHeadings -> {"Equations"}]
1dl7z5ohgof6h

Remark: ModelGraphEquations works with both graph and list of edges as a first argument.

Here we replace the symbolically represented rates with concrete values:

08ewn5gxhx8d5
1e4wq9d04yhro

Here we solve the system of ODEs:

sol = First@NDSolve[{lsEqsGen2, SP[0] == 99998, EP[0] == 0, IP[0] == 1, RP[0] == 0,MLP[0] == 0, TP[0] == 100000}, Union[First /@ lsEdges], {t, 0, 365}]
1p9civying0hn

Here we plot the results:

ListLinePlot[sol[[All, 2]], PlotLegends -> sol[[All, 1]]]
13xvmif6i6o2n

Call graph

The functionalities provided by the presented package “SystemDynamicsModelGraph.m”, [AAp1], resemble in spirit those of the package “CallGraph.m”, [AAp3].

Here is call graph for the functions in the package [AAp1] made with the function CallGraph from the package [AAp3]:

CallGraph`CallGraph[Context[ModelDependencyGraph], "PrivateContexts" -> False, "UsageTooltips" -> True]
0c1vpbf585pe9

References

Packages

[AAp1] Anton Antonov, “System Dynamics Model Graph Mathematica package”, (2020), SystemsModeling at GitHub/antononcube.

[AAp2] Anton Antonov, “Epidemiology models Mathematica package”, (2020), SystemsModeling at GitHub/antononcube.

[AAp3] Anton Antonov, “Call graph generation for context functions Mathematica package”, (2018), MathematicaForPrediction at GitHub/antononcube.

Articles

[AA1] Anton Antonov, “Call graph generation for context functions”, (2019), MathematicaForPrediction at WordPress.

How to simplify Machine learning workflows specifications? (useR! 2020)

Introduction

This blog post is with the slides of my lighting talk for the useR! 2020 Conference, St. Louis, USA.

Here is the video recording:

How to simplify Machine Learning workflows specifications?

useR! 2020 Conference, lightning talk

Anton Antonov
Senior Research Scientist
Accendo Data LLC
https://github.com/antononcube

What is this about?

Rapid specification of Machine Learning (ML) workflows using natural language commands.

The easiest things to automate with ML are ML workflows.

This presentation demonstrates that with natural language interfaces to ML algorithms.

Motivation

Assume that:

  1. We want to create conversation agents that help Data Science (DS) and ML practitioners to quickly create first, initial versions of different DS and ML workflows for different programming languages and related packages.
  2. We expect that the initial versions of programming code are tweaked further. (In order to produce desired outcomes in the application area of interest.)

The workflows considered

In this presentation we focus on these three ML areas:

  • Quantile Regression (QR)
  • Latent Semantic Analysis (LSA)
  • Recommendations

First example data

Assume we have a data frame with temperature data. Here is a summary:

summary(dfTemperatureData)
##       Time            Temperature   
##  Min.   :3.629e+09   Min.   : 4.72  
##  1st Qu.:3.661e+09   1st Qu.:19.67  
##  Median :3.692e+09   Median :23.33  
##  Mean   :3.692e+09   Mean   :22.28  
##  3rd Qu.:3.724e+09   3rd Qu.:26.11  
##  Max.   :3.755e+09   Max.   :31.17
ggplot2::ggplot(dfTemperatureData) + ggplot2::geom_point( ggplot2::aes( x = Time, y = Temperature ) )

Quantile Regression workflow: first example

Here is a Quantile Regression (QR) workflow specification:

qrmon2 <- 
  eval( expr = to_QRMon_R_command( 
    "create from dfTemperatureData;
     compute quantile regression with 12 knots and probabilities 0.25, 0.5, and 0.75;
     show date list plot with date origin 1900-01-01;", parse = TRUE) )

How it is done?

For a given ML domain (like QR or LSA) we create two types of Domain Specific Languages (DSL’s):

  1. a software monad (i.e. programming language pipeline package) and
  2. a DSL that is a subset of a spoken language.

These two DSL’s are combined: the latter translates natural language commands into the former.

By executing those translations we interpret commands of spoken DSL’s into ML computational results.

Note, that we assume that there is a separate system that converts speech into text.

Development cycle

Here is a clarification diagram:

MonadicMakingOfMLConversationalAgents
MonadicMakingOfMLConversationalAgents

Grammars and parsers

For each type of workflow is developed a specialized DSL translation Raku module.

Each Raku module:

  1. Has grammars for parsing a sequence of natural commands of a certain DSL
  2. Translates the parsing results into corresponding software monad code

Different programming languages and packages can be the targets of the DSL translation.

(At this point are implemented DSL-translators to Python, R, and Wolfram Language.)

Here is an example grammar.

Quantile Regression workflow: translation

Here is how a sequence of natural commands that specifies a QR workflow is translated into code for a QR software monad:

qrExpr <- 
  to_QRMon_R_command( 
    "create from dfTemperatureData;
     compute quantile regression with knots 12 and probabilities 0.05, 0.95;
     find outliers;", parse = FALSE )
qrExpr
## [1] "QRMonUnit( data = dfTemperatureData) %>%"                            
## [2] "QRMonQuantileRegression(df = 12, probabilities = c(0.05, 0.95)) %>%"
## [3] "QRMonOutliers() %>% QRMonOutliersPlot()"

Quantile Regression workflow: evaluation

Here we evaluate the generated QR monad code:

qrmon2 <- eval(expr = parse( text = paste(qrExpr)))
## Warning in QRMonSetData(res, data): The argument data is expected to be a data frame with columns: { Regressor, Value }.
## Warning in QRMonSetData(res, data): Proceeding by renaming the first columm "Time" as "Regressor" and renaming the second columm "Temperature" as "Value".

Latent Semantic Analysis workflow: translation

Here is how a sequence of natural commands that specifies a LSA workflow is translated into code for a LSA software monad:

lsaExpr <- 
  to_LSAMon_R_command( 
    "create from textHamlet;
     make document term matrix with automatic stop words and without stemming;
     apply lsi functions global weight function idf, local term weight function none, normalizer function cosine;
     extract 12 topics using method SVD, max steps 120, and min number of documents per term 2;
     show thesaurus table for ghost and grave;", parse = FALSE  )
lsaExpr
## [1] "LSAMonUnit(textHamlet) %>%"                                                                                                         
## [2] "LSAMonMakeDocumentTermMatrix( stopWords = NULL, stemWordsQ = FALSE) %>%"                                                            
## [3] "LSAMonApplyTermWeightFunctions(globalWeightFunction = \"IDF\", localWeightFunction = \"None\", normalizerFunction = \"Cosine\") %>%"
## [4] "LSAMonExtractTopics( numberOfTopics = 12, method = \"SVD\",  maxSteps = 120, minNumberOfDocumentsPerTerm = 2) %>%"                  
## [5] "LSAMonEchoStatisticalThesaurus( words = c(\"ghost\", \"grave\"))"

Latent Semantic Analysis workflow: evaluation

Here we execute the generated LSA monad code:

lsamon2 <- eval(expr = parse( text = paste(lsaExpr)))
## Warning in NonNegativeMatrixFactorization::NearestWords(lsaObj$H, word, : More that one column name corresponds to the search word; the first match is used.
##    SearchTerm Word.Distance Word.Index Word.Word
## 1       ghost  0.0000000000        623     ghost
## 2       ghost  2.7104582353         27     again
## 3       ghost  3.0323549513        513    father
## 4       ghost  3.1750339154       1465     stage
## 5       ghost  3.2285642571       1644     under
## 6       ghost  3.2393248725        319     cries
## 7       ghost  3.3826537734       1312         s
## 8       ghost  3.4975768169       1512     swear
## 9       ghost  3.5204077364        935       mar
## 10      ghost  3.5868377279       1550      thee
## 11      ghost  3.5869119178        744       hor
## 12      ghost  3.5901934408       1587       thy
## 13      grave  0.0000000000        648     grave
## 14      grave  0.0002598782       1597    tongue
## 15      grave  0.0002828355        151    better
## 16      grave  0.0002891626       1317      said
## 17      grave  0.0003122740        741    honour
## 18      grave  0.0003327156       1419     sleep
## 19      grave  0.0003395627        897      long
## 20      grave  0.0003459771         60       any
## 21      grave  0.0004090686       1251    reason
## 22      grave  0.0004264220         58    answer
## 23      grave  0.0004381933        643     grace
## 24      grave  0.0004560758        429      each

Recommender workflow: data

Consider the making of a recommender over the Titanic data:

dfTitanic[sample(1:nrow(dfTitanic), 12), ]
##           id passengerClass passengerAge passengerSex passengerSurvival
## 1225 id.1225            3rd           20         male              died
## 443   id.443            2nd           20         male              died
## 761   id.761            3rd           30         male          survived
## 835   id.835            3rd           30         male              died
## 706   id.706            3rd           -1         male              died
## 339   id.339            2nd           30         male              died
## 515   id.515            2nd            0         male          survived
## 10     id.10            1st           70         male              died
## 579   id.579            2nd           30         male              died
## 1248 id.1248            3rd           -1       female          survived
## 673   id.673            3rd           -1         male              died
## 1205 id.1205            3rd           20         male              died
summary(as.data.frame(unclass(dfTitanic), stringsAsFactors = TRUE))
##        id       passengerClass  passengerAge   passengerSex passengerSurvival
##  id.1   :   1   1st:323        Min.   :-1.00   female:466   died    :809     
##  id.10  :   1   2nd:277        1st Qu.:10.00   male  :843   survived:500     
##  id.100 :   1   3rd:709        Median :20.00                                 
##  id.1000:   1                  Mean   :23.55                                 
##  id.1001:   1                  3rd Qu.:40.00                                 
##  id.1002:   1                  Max.   :80.00                                 
##  (Other):1303

Recommender workflow

Here are recommender workflow specification and evaluation results:

smrmon2 <- 
  eval( expr = to_SMRMon_R_command( 
    "create from dfTitanic; 
     apply the LSI functions inverse document frequency, term frequency, and cosine;
     compute the top 6 recommendations for the profile female=1, 30=1; 
     extend recommendations with dfTitanic;
     show pipeline value", parse = TRUE ) )
##   Score Index      id passengerClass passengerAge passengerSex passengerSurvival
## 1     2     1    id.1            1st           30       female          survived
## 2     2    33 id.1027            3rd           30       female          survived
## 3     2    68 id.1059            3rd           30       female              died
## 4     2    72 id.1062            3rd           30       female          survived
## 5     2    99 id.1087            3rd           30       female              died
## 6     2   108 id.1095            3rd           30       female          survived

Handling misspellings

The approach taken in the design and implementation of the natural language commands interpreters can handle misspellings:

smrmon2 <- 
  eval( expr = to_SMRMon_R_command( 
    "create from dfTitanic; 
     aply the LSI functions inverse document frequency, term frequency, and cosine;
     compute the top 6 recomendations for the profle female=1, 30=1; 
     extend recommendations with dfTitanic;
     show pipeline value" ) )
## [1] "Possible misspelling of 'apply' as 'aply'."                     
## [2] "Possible misspelling of 'recommendations' as 'recomendations'."
## [3] "Possible misspelling of 'profile' as 'profle'."                
##   Score Index      id passengerClass passengerAge passengerSex passengerSurvival
## 1     2     1    id.1            1st           30       female          survived
## 2     2    33 id.1027            3rd           30       female          survived
## 3     2    68 id.1059            3rd           30       female              died
## 4     2    72 id.1062            3rd           30       female          survived
## 5     2    99 id.1087            3rd           30       female              died
## 6     2   108 id.1095            3rd           30       female          survived

Not just ML workflows

Obviously this approach can be used for any type of computational workflows.

Here is an example of an Epidemiology Modeling workflow:

ecmCommands <- 
'create with the model susceptible exposed infected two hospitalized recovered;
 assign 100000 to the susceptible population;
 set infected normally symptomatic population to be 0;
 set infected severely symptomatic population to be 1;
 assign 0.56 to contact rate of infected normally symptomatic population;
 assign 0.58 to contact rate of infected severely symptomatic population;
 assign 0.1 to contact rate of the hospitalized population;
 simulate for 240 days;
 plot populations results;'
to_ECMMon_R_command(ecmCommands, parse = TRUE)
## expression(
##     ECMMonUnit(model = SEI2HRModel()) %>% 
##     ECMMonAssignInitialConditions(initConds = c(SPt = 1e+05)) %>% 
##     ECMMonAssignInitialConditions(initConds = c(INSPt = 0)) %>% 
##     ECMMonAssignInitialConditions(initConds = c(ISSPt = 1)) %>% 
##     ECMMonAssignRateValues(rateValues = c(contactRateINSP = 0.56)) %>% 
##     ECMMonAssignRateValues(rateValues = c(contactRateISSP = 0.58)) %>% 
##     ECMMonAssignRateValues(rateValues = c(contactRateHP = 0.1)) %>% 
##     ECMMonSimulate(maxTime = 240) %>% ECMMonPlotSolutions(stocksSpec = ".*Population")
## )
ecmmon2 <- eval( to_ECMMon_R_command(ecmCommands) )

References

Repositories

[AAr1] Anton Antonov, R-packages, (2019), GitHub.

[AAr2] Anton Antonov, Conversational Agents, (2017), GitHub.

Packages

[AAp1] Anton Antonov, Quantle Regression Monad in R, (2019), GitHub.

[AAp2] Anton Antonov, Latent Semantic Analysis Monad in R, (2019), R-packages at GitHub.

[AAp3] Anton Antonov, Sparse Matrix Recommender Monad in R, (2019), R-packages at GitHub.

[AAp4] Anton Antonov, Epidemiology Compartmental Modeling Monad in R, (2020), GitHub.

Investigating COVID-19 with R: data analysis and simulations

Methodological presentation
R-Ladies Miami Meetup, May 28th 2020

The extended abstract of the presentation was loosely followed. Here is the presentation mind-map:

MainMindMap

(Note that mind-map’s PDF has hyperlinks. Also, see the folder Presentation-aids. )

The organizers and I did a poll for what people want to hear. After discussing the results of the 15 votes from that poll we decided the presentation to be a methodological one instead of a know-how one.

Approximately 30% of the presentation was based on the R-project “COVID-19-modeling-in-R”, [AA1].

Approximately 30% of the presentation was based on an R-programmed software monad for epidemiology compartmental models, ECMMon-R, [AAr2].

For the rest were used frameworks, simulations, and graphics made with Mathematica, [AAr1].

The presentation was given online (because of COVID-19) using Zoom. 90 people registered. Nearly 40 showed up (and maybe 20 stayed throughout.)

Here is a link to the video recording.

Screenshots

Here are screenshots of statistics used in the introduction:

References

Coronavirus

[Wk1] Wikipedia entry, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

[Wk2] Wikipedia entry, Coronavirus disease 2019.

Modeling

[Wk3] Wikipedia entry, Compartmental models in epidemiology.

[Wk4] Wikipedia entry, System dynamics.

R code/software

[KS1] Karline Soetaert, Thomas Petzoldt, R. Woodrow Setzer, “deSolve: Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’)”, CRAN.

[AA1] Anton Antonov, “COVID-19-modeling-in-R”, 2020, SystemModeling at GitHub.

[AAr1] Anton Antonov, Coronavirus-propagation-dynamics, 2020, SystemModeling at GitHub.

[AAr2] Anton Antonov, Epidemiology Compartmental Modeling Monad in R, 2020, ECMMon-R at GitHub.

Apple mobility trends data visualization (for COVID-19)

Introduction

I this notebook/document we ingest and visualize the mobility trends data provided by Apple, [APPL1].

We take the following steps:

  1. Download the data

     

  2. Import the data and summarise it

  3. Transform the data into long form

  4. Partition the data into subsets that correspond to combinations of geographical regions and transportation types

  5. Make contingency matrices and corresponding heat-map plots

  6. Make nearest neighbors graphs over the contingency matrices and plot communities

  7. Plot the corresponding time series

Data description

From Apple’s page https://www.apple.com/covid19/mobility

About This Data The CSV file and charts on this site show a relative volume of directions requests per country/region or city compared to a baseline volume on January 13th, 2020. We define our day as midnight-to-midnight, Pacific time. Cities represent usage in greater metropolitan areas and are stably defined during this period. In many countries/regions and cities, relative volume has increased since January 13th, consistent with normal, seasonal usage of Apple Maps. Day of week effects are important to normalize as you use this data. Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers so Apple doesn’t have a profile of your movements and searches. Apple Maps has no demographic information about our users, so we can’t make any statements about the representativeness of our usage against the overall population.

Observations

The observations listed in this subsection are also placed under the relevant statistics in the following sections and indicated with “Observation”.

  • The directions request volumes reference date for normalization is 2020-01-13 : all the values in that column are 100.

     

  • From the community clusters of the nearest neighbor graphs (derived from the time series of the normalized driving directions requests volume) we see that countries and cities are clustered in expected ways. For example, in the community graph plot corresponding to “{city, driving}” the cities Oslo, Copenhagen, Helsinki, Stockholm, and Zurich are placed in the same cluster. In the graphs corresponding to “{city, transit}” and “{city, walking}” the Japanese cities Tokyo, Osaka, Nagoya, and Fukuoka are clustered together.

  • In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people are more familiar with their trips than say on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.

Load packages

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/DataReshape.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/HeatmapPlot.m"]

Data ingestion

Apple mobile data was provided in this WWW page: https://www.apple.com/covid19/mobility , [APPL1]. (The data has to be download from that web page – there is an “agreement to terms”, etc.)

dsAppleMobility = ResourceFunction["ImportCSVToDataset"]["~/Downloads/applemobilitytrends-2021-01-15.csv"]
1po4mftcckaca

Observation: The directions requests volumes reference date for normalization is 2020-01-13 : all the values in that column are 100.

Data dimensions:

Dimensions[dsAppleMobility]

(*{4691, 375}*)

Data summary:

Magnify[ResourceFunction["RecordsSummary"][dsAppleMobility], 0.6]

Number of unique “country/region” values:

Length[Union[Normal[dsAppleMobility[Select[#["geo_type"] == "country/region" &], "region"]]]]

(*63*)

Number of unique “city” values:

Length[Union[Normal[dsAppleMobility[Select[#["geo_type"] == "city" &], "region"]]]]

(*295*)

All unique geo types:

lsGeoTypes = Union[Normal[dsAppleMobility[All, "geo_type"]]]

(*{"city", "country/region", "county", "sub-region"}*)

All unique transportation types:

lsTransportationTypes = Union[Normal[dsAppleMobility[All, "transportation_type"]]]

(*{"driving", "transit", "walking"}*)

Data transformation

It is better to have the data in long form (narrow form). For that I am using the package “DataReshape.m”, [AAp1].

(*lsIDColumnNames={"geo_type","region","transportation_type"};*) (*For the initial dataset of Apple's mobility data.*)
  lsIDColumnNames = {"geo_type", "region", "transportation_type", "alternative_name", "sub-region", "country"}; 
   dsAppleMobilityLongForm = ToLongForm[dsAppleMobility, lsIDColumnNames, Complement[Keys[dsAppleMobility[[1]]], lsIDColumnNames]]; 
   Dimensions[dsAppleMobilityLongForm]

(*{1730979, 8}*)

Remove the rows with “empty” values:

dsAppleMobilityLongForm = dsAppleMobilityLongForm[Select[#Value != "" &]];
Dimensions[dsAppleMobilityLongForm]

(*{1709416, 8}*)

Rename the column “Variable” to “Date” and add a related “DateObject” column:

AbsoluteTiming[
  dsAppleMobilityLongForm = dsAppleMobilityLongForm[All, Join[KeyDrop[#, "Variable"], <|"Date" -> #Variable, "DateObject" -> DateObject[#Variable]|>] &]; 
 ]

(*{714.062, Null}*)

Add “day name” (“day of the week”) field:

AbsoluteTiming[
  dsAppleMobilityLongForm = dsAppleMobilityLongForm[All, Join[#, <|"DayName" -> DateString[#DateObject, {"DayName"}]|>] &]; 
 ]

(*{498.026, Null}*)

Here is sample of the transformed data:

SeedRandom[3232];
RandomSample[dsAppleMobilityLongForm, 12]

Here is summary:

ResourceFunction["RecordsSummary"][dsAppleMobilityLongForm]

Partition the data into geo types × transportation types:

aQueries = Association@Flatten@Outer[Function[{gt, tt}, {gt, tt} -> dsAppleMobilityLongForm[Select[#["geo_type"] == gt && #["transportation_type"] == tt &]]], lsGeoTypes, lsTransportationTypes];
aQueries = Select[aQueries, Length[#] > 0 &];
Keys[aQueries]

(*{{"city", "driving"}, {"city", "transit"}, {"city", "walking"}, {"country/region", "driving"}, {"country/region", "transit"}, {"country/region", "walking"}, {"county", "driving"}, {"county", "transit"}, {"county", "walking"}, {"sub-region", "driving"}, {"sub-region", "transit"}, {"sub-region", "walking"}}*)

Basic data analysis

We consider relative volume o directions requests for the last date only. (The queries can easily adjusted for other dates.)

lastDate = Last@Sort@Normal@dsAppleMobilityLongForm[All, "Date"]

(*"2021-01-15"*)
aDayQueries = Association@Flatten@Outer[Function[{gt, tt}, {gt, tt} -> dsAppleMobilityLongForm[Select[#["geo_type"] == gt && #Date == lastDate && #["transportation_type"] == tt &]]], lsGeoTypes, lsTransportationTypes];
Dimensions /@ aDayQueries

(*<|{"city", "driving"} -> {299, 10}, {"city", "transit"} -> {197, 10}, {"city", "walking"} -> {294, 10}, {"country/region", "driving"} -> {63, 10}, {"country/region", "transit"} -> {27, 10}, {"country/region", "walking"} -> {63, 10}, {"county", "driving"} -> {2090, 10}, {"county", "transit"} -> {152, 10}, {"county", "walking"} -> {396, 10}, {"sub-region", "driving"} -> {557, 10}, {"sub-region", "transit"} -> {175, 10}, {"sub-region", "walking"} -> {339, 10}|>*)

Here we plot histograms and Pareto principle adherence:

opts = {PlotRange -> All, ImageSize -> Medium};
Grid[
    Function[{columnName}, 
      {Histogram[#, 12, PlotLabel -> columnName, opts], ResourceFunction["ParetoPrinciplePlot"][#, PlotLabel -> columnName, opts]} &@Normal[#[All, "Value"]] 
     ] /@ {"Value"}, 
    Dividers -> All, FrameStyle -> GrayLevel[0.7]] & /@ aDayQueries
1mdtonh8hp7bw

 

Heat-map plots

We can visualize the data using heat-map plots. Here we use the package “HeatmapPlot.m”, [AAp2].

Remark: Using the contingency matrices prepared for the heat-map plots we can do further analysis, like, finding correlations or nearest neighbors. (See below.)

Cross-tabulate dates with regions:

aMatDateRegion = ResourceFunction["CrossTabulate"][#[All, {"Date", "region", "Value"}], "Sparse" -> True] & /@ aQueries;

Make a heat-map plot by sorting the columns of the cross-tabulation matrix (that correspond to countries):

aHeatMapPlots = Association@KeyValueMap[#1 -> Rasterize[HeatmapPlot[#2, PlotLabel -> #1, DistanceFunction -> {None, EuclideanDistance}, AspectRatio -> 1/1.6, ImageSize -> 1600]] &, aMatDateRegion]

(We use Rasterize in order to reduce the size of the notebook.)

Here we take closer look to one of the plots:

aHeatMapPlots[{"country/region", "driving"}]

Nearest neighbors graphs

Graphs overview

Here we create nearest neighbor graphs of the contingency matrices computed above and plot cluster the nodes:

Manipulate[
  Multicolumn[Normal@Map[CommunityGraphPlot@Graph@EdgeList@NearestNeighborGraph[Normal[Transpose[#SparseMatrix]], nns, ImageSize -> Medium] &, aMatDateRegion], 2, Dividers -> All], 
  {{nns, 5, "Number of nearest neighbors:"}, 2, 30, 1, Appearance -> "Open"}, SaveDefinitions -> True]

Closer look into the graphs

Here we endow each nearest neighbors graph with appropriate vertex labels:

aNNGraphs = Map[(gr = NearestNeighborGraph[Normal[Transpose[#SparseMatrix]], 4, GraphLayout -> "SpringEmbedding", VertexLabels -> Thread[Rule[Normal[Transpose[#SparseMatrix]], #ColumnNames]]];Graph[EdgeList[gr], VertexLabels -> Thread[Rule[Normal[Transpose[#SparseMatrix]], #ColumnNames]]]) &, aMatDateRegion];

Here we plot the graphs with clusters:

ResourceFunction["GridTableForm"][List @@@ Normal[CommunityGraphPlot[#, ImageSize -> 800] & /@ aNNGraphs], TableHeadings -> {"region & transportation type", "communities of nearest neighbors graph"}, Background -> White, Dividers -> All]

Observation: From the community clusters of the nearest neighbor graphs (derived from the time series of the normalized driving directions requests volume) we see that countries and cities are clustered in expected ways. For example in the community graph plot corresponding to “{city, driving}” the cities Oslo, Copenhagen, Helsinki, Stockholm, and Zurich are placed in the same cluster. In the graphs corresponding to “{city, transit}” and “{city, walking}” the Japanese cities Tokyo, Osaka, Nagoya, and Fukuoka are clustered together.

Time series analysis

Time series

In this section for each date we sum all cases over the region-transportation pairs, make a time series, and plot them.

Remark: In the plots the Sundays are indicated with orange dashed lines.

Here we make the time series:

aTSDirReqByCountry = 
  Map[
   Function[{dfQuery}, 
    TimeSeries@(List @@@ Normal[GroupBy[Normal[dfQuery], #DateObject &, Total[#Value & /@ #] &]]) 
   ], 
   aQueries 
  ]

Here we plot them:

opts = {PlotTheme -> "Detailed", PlotRange -> All, AspectRatio -> 1/4,ImageSize -> Large};
Association@KeyValueMap[
   Function[{transpType, ts}, 
    transpType -> 
     DateListPlot[ts, GridLines -> {AbsoluteTime /@ Union[Normal[dsAppleMobilityLongForm[Select[#DayName == "Sunday" &], "DateObject"]]], Automatic}, GridLinesStyle -> {Directive[Orange, Dashed], Directive[Gray, Dotted]}, PlotLabel -> Capitalize[transpType], opts] 
   ], 
   aTSDirReqByCountry 
  ]

Observation: In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people are more familiar with their trips than say on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.

“Forecast”

He we do “forecast” for code-workflow demonstration purposes – the forecasts should not be taken seriously.

Fit a time series model to the time series:

aTSModels = TimeSeriesModelFit /@ aTSDirReqByCountry
1v02kqhrfj7pk
1kp9msj22dd19

Plot data and forecast:

Map[DateListPlot[{#["TemporalData"], TimeSeriesForecast[#, {10}]}, opts, PlotLegends -> {"Data", "Forecast"}] &, aTSModels]
0axzczhqlntju

References

[APPL1] Apple Inc., Mobility Trends Reports, (2020), apple.com.

[AA1] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub.

[AAp1] Anton Antonov, Data reshaping Mathematica package, (2018), MathematicaForPrediciton at GitHub.

[AAp2] Anton Antonov, Heatmap plot Mathematica package, (2018), MathematicaForPrediciton at GitHub.

Coronavirus propagation modeling, useR! Boston April 2020

Tutorial presentation

The extended abstract of the presentation was loosely followed. Here is the (main) presentation mind-map:

MainMindMap

(Note that mind-map’s PDF has hyperlinks. Also, see the folder Presentation-aids. )

Approximately 70% of the presentation was based on an R-programmed software monad for epidemiology compartmental models, ECMMon-R, [AAr2]. For the rest were used frameworks, simulations, and graphics made with Mathematica, [AAr1], and Wolfram System Modeler .

The presentation was given online (because of COVID-19) using Zoom. 190 people registered. Nearly 70 showed up (and maybe 60 stayed throughout.)

Here is a link to the video recording.

A similar presentation was given two weeks prior for OMLDS.

References

Coronavirus

[Wk1] Wikipedia entry, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

[Wk2] Wikipedia entry, Coronavirus disease 2019.

Modeling

[Wk3] Wikipedia entry, Compartmental models in epidemiology.

[Wk4] Wikipedia entry, System dynamics.

[JD1] Jim Duggan, System Dynamics Modeling with R, 2016, Springer.

R code/software

[KS1] Karline Soetaert, Thomas Petzoldt, R. Woodrow Setzer, “deSolve: Solvers for Initial Value Problems of Differential Equations (‘ODE’, ‘DAE’, ‘DDE’)”, CRAN.

[JD2] Jim Duggan, “SDMR”, 2016, GitHub.
(Resources for text book “System Dynamics Modeling with R”.)

[AA1] Anton Antonov, “COVID-19-modeling-in-R”, 2020, SystemModeling at GitHub.

[AAr1] Anton Antonov, Coronavirus-propagation-dynamics, 2020, SystemModeling at GitHub.

[AAr2] Anton Antonov, Epidemiology Compartmental Modeling Monad in R, 2020, ECMMon-R at GitHub.

SEI2HR-Econ model with quarantine and supplies scenarios

Introduction

The epidemiology compartmental model, [Wk1], presented in this notebook – SEI2HR-Econ – deals with all three rectangles in this diagram:

ImageResize[Import["https://github.com/antononcube/SystemModeling/raw/master/Projects/Coronavirus-propagation-dynamics/Diagrams/Coronavirus-propagation-simple-dynamics.jpeg"], 900]
1nfurrmjzvnjk

“SEI2HR” stands for “Susceptible, Exposed, Infected two, Hospitalized, Recovered” (populations.) “Econ” stands for “Economic”.

In this notebook we also deal with both quarantine scenarios and medical supplies scenarios. In the notebook [AA4] we deal with quarantine scenarios over a simpler model, SEI2HR.

Remark: We consider the contagious disease propagation models as instances of the more general System Dynamics (SD) models. We use SD terminology in this notebook.

The models

SEI2R

The model SEI2R is introduced and explained in the notebook [AA2]. SEI2R differs from the classical SEIR model, [Wk1, HH1], with the following elements:

  1. Two separate infected populations: one is “severely symptomatic”, the other is “normally symptomatic”
  2. The monetary equivalent of lost productivity due to infected or died people is tracked

SEI2HR

For the formulation of SEI2HR we use a system of Differential Algebraic Equations (DAE’s). The package [AAp1] allows the use of a formulation that has just Ordinary Differential Equations (ODE’s).

Here are the unique features of SEI2HR:

  • People stocks
    • There are two types of infected populations: normally symptomatic and severely symptomatic.
    • There is a hospitalized population.
    • There is a deceased from infection population.
  • Hospital beds
    • Hospital beds are a limited resource that determines the number of hospitalized people.
    • Only severely symptomatic people are hospitalized according to the available hospital beds.
    • The hospital beds stock is not assumed constant, it has its own change rate.
  • Money stocks
    • The money from lost productivity is tracked.
    • The money for hospital services is tracked.

SEI2HR-Econ

SEI2HR-Econ adds the following features to SEI2HR:

  • Medical supplies
    • Medical supplies production is part of the model.
    • Medical supplies delivery is part of the model..
    • Medical supplies accumulation at hospitals is taken into account.
    • Medical supplies demand tracking.
  • Hospitalization
    • Severely symptomatic people are hospitalized according to two limited resources: hospital beds and medical supplies.
  • Money stocks
    • Money for medical supplies production is tracked.

SEI2HR-Econ’s place a development plan

This graph shows the “big picture” of the model development plan undertaken in [AAr1] and SEI2HR (discussed in this notebook) is in that graph:

0led86ishkred

Notebook structure

The rest of notebook has the following sequence of sections:

  • Package load section
  • SEI2HR-Econ structure in comparison of SEI2HR
  • Explanations of the equations of SEI2HR-Econ
  • Quarantine scenario modeling preparation
  • Medical supplies production and delivery scenario modeling preparation
  • Parameters and initial conditions setup
    • Populations, hospital beds, quarantine scenarios, medical supplies scenarios
  • Simulation solutions
  • Interactive interface
  • Sensitivity analysis

Load packages

The epidemiological models framework used in this notebook is implemented with the packages [AAp1-AAp4, AA3]; many of the plot functions are from the package [AAp5].

Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/EpidemiologyModels.m"];
Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/EpidemiologyModelModifications.m"];
Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/EpidemiologyModelingVisualizationFunctions.m"];
Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/Projects/Coronavirus-propagation-dynamics/WL/EpidemiologyModelingSimulationFunctions.m"];
Import["https://raw.githubusercontent.com/antononcube/SystemModeling/master/WL/SystemDynamicsInteractiveInterfacesFunctions.m"];

SEI2HR-Econ extends SEI2HR

The model SEI2HR-Econ is an extension of the model SEI2HR, [AA4].

Here is SEI2HR:

reprTP = "AlgebraicEquation";
lsModelOpts = {"Tooltips" -> True, 
   TooltipStyle -> {Background -> Yellow, CellFrameColor -> Gray, 
     FontSize -> 20}};
modelReference = 
  SEI2HRModel[t, "InitialConditions" -> True, "RateRules" -> True, 
   "TotalPopulationRepresentation" -> reprTP];
ModelGridTableForm[modelReference, lsModelOpts]
123yrlk2zl1zu

Here is SEI2HR-Econ:

modelSEI2HREcon = 
  SEI2HREconModel[t, "InitialConditions" -> True, "RateRules" -> True,
    "TotalPopulationRepresentation" -> reprTP];
ModelGridTableForm[modelSEI2HREcon, lsModelOpts]
10vmc7pwzubaa

Here are the “differences” between the two models:

ModelGridTableForm@
 Merge[{modelSEI2HREcon, modelReference}, 
  If[AssociationQ[#[[1]]], KeyComplement[#], Complement @@ #] &]
036glal3iv3uy

Equations explanations

In this section we provide rationale for the equations of SEI2HR-Econ.

The equations for Susceptible, Exposed, Infected, Recovered populations of SEI2R are “standard” and explanations about them are found in [WK1, HH1]. For SEI2HR those equations change because of the stocks Hospitalized Population and Hospital Beds. For SEI2HR-Econ the SEI2HR equations change because of the stocks Medical Supplies, Medical Supplies Demand, and Hospital Medical Supplies.

The equations time unit is one day. The time horizon is one year. Since we target COVID-19, [Wk2, AA1], we do not consider births.

Remark: For convenient reading the equations in this section have tooltips for the involved stocks and rates.

Verbalization description of the model

We start with one infected (normally symptomatic) person, the rest of the people are susceptible. The infected people meet other people directly or get in contact with them indirectly. (Say, susceptible people touch things touched by infected.) For each susceptible person there is a probability to get the decease. The decease has an incubation period: before becoming infected the susceptible are (merely) exposed. The infected recover after a certain average infection period or die. A certain fraction of the infected become severely symptomatic. The severely symptomatic infected are hospitalized if there are enough hospital beds and enough medical supplies. The hospitalized severely infected have different death rate than the non-hospitalized ones. The number of hospital beds might change: hospitals are extended, new hospitals are build, or there are not enough medical personnel or supplies.

The different types of populations (infected, hospitalized, recovered, etc.) have their own consumption rates of medical supplies. The medical supplies are produced with a certain rate (units per day) and delivered after a certain delay period. The hospitals have their own storage for medical supplies. Medical supplies are delivered to the hospitals only, non-hospitalized people go to the medical supplies producer to buy supplies. The hospitals have precedence for the medical supplies: if the medical supplies are not enough for everyone, the hospital needs are covered first (as much as possible.)

The medical supplies producer has a certain storage capacity (for supplies.) The medical supplies delivery vehicles have a certain – generally speaking, smaller – capacity. The hospitals have a certain capacity to store medical supplies. It is assumed that both producer and hospitals have initial stocks of medical supplies. (Following a certain normal, general preparedness protocol.)

The combined demand from all populations for medical supplies is tracked (accumulated.) The deaths from infection are tracked (accumulated.) Money for medical supplies production, money for hospital services, and money from lost productivity are tracked (accumulated.)

The equations below give mathematical interpretation of the model description above.

Code for the equations

Each equation in this section are derived with code like this:

ModelGridTableForm[modelSEI2HREcon, lsModelOpts]["Equations"][[1, 
 EquationPosition[modelSEI2HREcon, RP] + 1, 2]]

and then the output cell is edited to be “DisplayFormula” and have CellLabel value corresponding to the stock of interest.

The infected and hospitalized populations

SEI2HR has two types of infected populations: a normally symptomatic one and a severely symptomatic one. A major assumption for SEI2HR is that only the severely symptomatic people are hospitalized. (That assumption is also reflected in the diagram in the introduction.)

Each of those three populations have their own contact rates and mortality rates.

Here are the contact rates from the SEI2HR-Econ dictionary

ColumnForm@
 Cases[Normal@modelSEI2HREcon["Rates"], 
  HoldPattern[\[Beta][_] -> _], \[Infinity]]
1nlv3w1o6bvs1

Here are the mortality rates from the SEI2HR-Econ dictionary

ColumnForm@
 Cases[Normal@modelSEI2HREcon["Rates"], 
  HoldPattern[\[Mu][_] -> _], \[Infinity]]
1ile9wky2yqy5

Remark: Below with “Infected Population” we mean both stocks Infected Normally Symptomatic Population (INSP) and Infected Severely Symptomatic Population (ISSP).

Total Population

In this notebook we consider a DAE’s formulation of SEI2HR-Econ. The stock Total Population has the following (obvious) algebraic equation:

0z7878rbvh3ut
0z7878rbvh3ut

Note that with Max we specified that the total population cannot be less than 0.

Remark: As mentioned in the introduction, the package [AAp1] allows for the use of non-algebraic formulation, without an equation for TP.

Susceptible Population

The stock Susceptible Population (SP) is decreased by (1) infections derived from stocks Infected Populations and Hospitalized Population (HP), and (2) morality cases derived with the typical mortality rate.

1gercwr5szizt
1gercwr5szizt

Because we hospitalize the severely infected people only instead of the term

09xmdu8xq7e0p
09xmdu8xq7e0p

we have the terms

0dkc6td55qv7p
0dkc6td55qv7p

The first term is for the infections derived from the hospitalized population. The second term for the infections derived from people who are infected severely symptomatic and not hospitalized.

Births term

Note that we do not consider in this notebook births, but the births term can be included in SP’s equation:

Block[{m = SEI2HREconModel[t, "BirthsTerm" -> True]},
 ModelGridTableForm[m]["Equations"][[1, EquationPosition[m, SP] + 1, 
  2]]
 ]
16thktzpo0dk9
16thktzpo0dk9

The births rate is the same as the death rate, but it can be programmatically changed. (See [AAp2].)

Exposed Population

The stock Exposed Population (EP) is increased by (1) infections derived from the stocks Infected Populations and Hospitalized Population, and (2) mortality cases derived with the typical mortality rate. EP is decreased by (1) the people who after a certain average incubation period (aincp) become ill, and (2) mortality cases derived with the typical mortality rate.

083x1htqp0kkx
083x1htqp0kkx

Infected Normally Symptomatic Population

INSP is increased by a fraction of the people who have been exposed. That fraction is derived with the parameter severely symptomatic population fraction (sspf). INSP is decreased by (1) the people who recover after a certain average infection period (aip), and (2) the normally symptomatic people who die from the disease.

01cfzpu1jv3o4
01cfzpu1jv3o4

Infected Severely Symptomatic Population

ISSP is increased by a fraction of the people who have been exposed. That fraction is corresponds to the parameter severely symptomatic population fraction (sspf). ISSP is decreased by (1) the people who recover after a certain average infection period (aip), (2) the hospitalized severely symptomatic people who die from the disease, and (3) the non-hospitalized severely symptomatic people who die from the disease.

0zhs2d6lt6gz8
0zhs2d6lt6gz8

Note that we do not assume that severely symptomatic people recover faster if they are hospitalized, only that they have a different death rate.

Hospitalized Population

The amount of people that can be hospitalized is determined by the available Hospital Beds (HB) – the stock Hospitalized Population (HP) is subject to a resource limitation by the stock HB.

The equation of the stock HP can be easily understood from the following dynamics description points:

  • If the number of hospitalized people is less that the number of hospital beds we hospitalize the new ISSP people.
  • The Available Hospital Beds (AHB) are determined by the minimum of (i) the non-occupied hospital beds, and (ii) the hospital medical supplies divided by the ISSP consumption rate.
  • If the new ISSP people are more than AHB the hospital takes as many as AHB.
  • Hospitalized people have the same average infection period (aip).
  • Hospitalized (severely symptomatic) people have their own mortality rate.

Here is the HP equation:

1c35mhvi92ffe
1c35mhvi92ffe

Note that although we know that in a given day some hospital beds are going to be freed they are not considered in the hospitalization plans for that day. Similarly, we know that new medical supplies are coming but we do not include them into AHB.

Recovered Population

The stock Recovered Population (RP) is increased by the recovered infected people and decreased by mortality cases derived with the typical mortality rate.

04y15eqwki71f
04y15eqwki71f

Deceased Infected Population

The stock Deceased Infected Population (DIP) accumulates the deaths of the people who are infected. Note that we utilize the different death rates for HP and ISSP.

1ikrv4a51h9xo
1ikrv4a51h9xo

Hospital Beds

The stock Hospital Beds (HB) can change with a rate that reflects the number of hospital beds change rate (nhbcr) per day. Generally speaking, using nhbcr we can capture scenarios, like, extending hospitals, building new hospitals, recruitment of new medical personnel, loss of medical personnel (due to infections.)

1is45mamhsayd
1is45mamhsayd

Hospital Medical Supplies

The Hospital Medical Supplies (HMS) are decreased according to the medical supplies consumption rate (mscr) of HP and increased by a Medical Supplies (MS) delivery term (to be described next.)

The MS delivery term is build with the following assumptions / postulates:

  • Every day the hospital attempts to order MS that correspond to HB multiplied by mscr.
  • The hospital has limited capacity of MS storage, \kappa [\text{HMS}].
  • The MS producer has limited capacity for delivery, \kappa [\text{MDS}].
  • The hospital demand for MS has precedence over the demands for the non-hospitalized populations.
  • Hence, if the MS producer has less stock of MS than the demand of the hospital then MS producer’s whole amount of MS goes to the hospital.
  • The supplies are delivered with some delay period: the medical supplies delivery period (msdp).

Here is the MS delivery term:

1uxby6ipa49dw
1uxby6ipa49dw

Here is the corresponding HMS equation:

0ri4ohdddjxle
0ri4ohdddjxle

Medical Supplies

The equation of the Medical Supplies (MS) stock is based on the following assumptions / postulates:

  • The non-hospitalized people go to the MS producer to buy supplies. (I.e. MS delivery is to the hospital only.)
  • The MS producer vehicles have certain capacity, \kappa [\text{MSD}].
  • The MS producer has a certain storage capacity (for MS stock.)
  • Each of the populations INSP, ISSP, and HP has its own specific medical supplies consumption rate (mscr). EP, RP, and TP have the same mscr.
  • The hospital has precedence in its MS order. I.e. the demand from the hospital is satisfied first, and then the demand of the rest of the populations.

Here is the MS delivery term described in the previous section:

1dy38wd13b0h6
1dy38wd13b0h6

Here is the MS formula with the MS delivery term replaced with “Dlvr”:

ModelGridTableForm[modelSEI2HREcon, "Tooltips" -> False][
   "Equations"][[1, EquationPosition[modelSEI2HREcon, MS] + 1, 2]] /. 
 dlvr -> Dlvr
1r0u6ie10kbi4
1r0u6ie10kbi4

We can see from that equation that MS is increased by medical supplies production rate (mspr) with measuring dimension number of units per day. The production is restricted by the storage capacity, \kappa [\text{MS}]:

(*Min[mspr[HB], -MS[t] + \[Kappa][MS]]*)

MS is decreased by the MS delivery term and the demand from the non-hospitalized populations. Because the hospital has precedence, we use this term form in the equation:

(*Min[-Dlvr + MS[t], "non-hospital demand"]*)

Here is the full MS equation:

0xc7vujwb2npw
0xc7vujwb2npw

Medical Supplies Demand

The stock Medical Supplies Demand (MSD) simply accumulates the MS demand derived from population stocks and their corresponding mscr:

0ut8smszrobx2
0ut8smszrobx2

Money for Hospital Services

The stock Money for Hospital Services (MHS) simply tracks expenses for hospitalized people. The parameter hospital services cost rate (hscr) with unit money per bed per day simply multiplies HP.

0evzoim1tjwsj
0evzoim1tjwsj

Money from Lost Productivity

The stock Money from Lost Productivity (MLP) simply tracks the work non-availability of the infected and died from infection people. The parameter lost productivity cost rate (lpcr) with unit money per person per day multiplies the total count of the infected and dead from infection.

1fhewr6izy46n
1fhewr6izy46n

Quarantine scenarios

In order to model quarantine scenarios we use piecewise constant functions for the contact rates \beta [\text{ISSP}] and \beta [\text{INSP}].

Remark: Other functions can be used, like, functions derived through some statistical fitting.

Here is an example plot :

Block[{func = \[Beta]*
    Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1]},
 Legended[
  Block[{\[Beta] = 0.56, qsd = 60, ql = 8*7, qcrf = 0.25},
   ListLinePlot[Table[func, {t, 0, 365}], PlotStyle -> "Detailed"]
   ], func]]
1nf41pq3u0gvg

To model quarantine with a piecewise constant function we use the following parameters:

  • \text{qsd} for quarantine’s start
  • \text{ql} for quarantines duration
  • \text{qcrf} for the effect on the quarantine on the contact rate

Medical supplies scenarios

We consider three main scenarios for the medical supplies:

  1. Constant production rate and consistent delivery (constant delivery period)
  2. Constant production rate and disrupted delivery
  3. Disrupted production and disrupted delivery

The disruptions are simulated with simple pulse functions – we want to see how the system being modeled reacts to singular, elementary disruption.

Here is an example plot of a disruption of delivery period plot :

Block[{func = 
   dbp*Piecewise[{{1, t < dds}, {dsf, dds <= t <= dds + ddl}}, 1]},
 Legended[
  Block[{dbp = 1, dds = 70, ddl = 7, dsf = 1.8},
   ListLinePlot[Table[func, {t, 0, 365}], PlotStyle -> "Detailed"]
   ], func]]
093okrtfqf4bj
093okrtfqf4bj

To model disruption of delivery with a piecewise constant function we use the following parameters:

  • \text{dbp} for the delivery base period
  • \text{dds} for delivery disruption start
  • \text{ddl} for delivery disruption duration
  • \text{dsf} for how much slower or faster the delivery period becomes

Parameters and actual simulation equations code

Here are the parameters we want to experiment with (or do calibration with):

lsFocusParams = {aincp, aip, sspf[SP], \[Beta][HP], qsd, ql, qcrf, 
   nhbcr[ISSP, INSP], nhbr[TP], mspr[HB]};

Here we set custom rates and initial conditions:

population = 10^6;
modelSEI2HREcon =
  SetRateRules[
   modelSEI2HREcon,
   <|
    TP[0] -> population,
    \[Beta][ISSP] -> 
     0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],
    \[Beta][INSP] -> 
     0.5*Piecewise[{{1, t < qsd}, {qcrf, qsd <= t <= qsd + ql}}, 1],
    qsd -> 60,
    ql -> 8*7,
    qcrf -> 0.25,
    \[Beta][HP] -> 0.01,
    \[Mu][ISSP] -> 0.035/aip,
    \[Mu][INSP] -> 0.01/aip,
    nhbr[TP] -> 3/1000,
    lpcr[ISSP, INSP] -> 1,
    hscr[ISSP, INSP] -> 1,
    msdp[HB] -> 
     dbp*Piecewise[{{1, t < dds}, {dsf, dds <= t <= dds + ddl}}, 1],
    dbp -> 1,
    dds -> 70,
    ddl -> 7,
    dsf -> 2
    |>
   ];

Remark: Note the piecewise functions for \beta [\text{ISSP}], \beta [\text{INSP}], and \text{msdp}[\text{HB}].

Here is the system of ODE’s we use to do parametrized simulations:

lsActualEquations = 
  ModelNDSolveEquations[modelSEI2HREcon, 
   KeyDrop[modelSEI2HREcon["RateRules"], lsFocusParams]];
ResourceFunction["GridTableForm"][List /@ lsActualEquations]
1gymf7m70fgyh
1gymf7m70fgyh
lsActualEquations = 
  ModelNDSolveEquations[modelSEI2HREcon, modelSEI2HREcon["RateRules"]];
ResourceFunction["GridTableForm"][List /@ lsActualEquations]
1avljj6kz4oeo
1avljj6kz4oeo

Simulation

Instead of using ParametricNDSolve as in [AA4] in this notebook we use ModelNDSolve and SetRateRules from the package [AAp4].

Different quarantine starts

Here we compute simulation solutions for a set of quarantine starts:

AbsoluteTiming[
 aVarSolutions =
   Association@
    Map[
     Function[{qsdVar},
      qsdVar -> 
       Association[
        ModelNDSolve[
          SetRateRules[
           modelSEI2HREcon, <|ql -> 56, qsd -> qsdVar|>], {t, 365}][[
         1]]]
      ],
     Range[40, 120, 20]
     ];
 ]

(*{0.366168, Null}*)

Here we plot the results for ISSP only:

SeedRandom[2532]
aVals = #[ISSP][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, #1], #1, {If[#1 <= 70, 
      RandomInteger[{120, 200}], RandomInteger[{80, 110}]], Above}] &,
   aVals], PlotLegends -> 
  SwatchLegend[Keys[aVals], LegendLabel -> "Quarantine start"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> ISSP[t] /. modelSEI2HREcon["Stocks"]]
0idl7dzxetaox

Remark: We use the code in this section to do the computations in the section “Sensitivity Analysis”.

Interactive interface

Using the interface in this section we can interactively see the effects of changing parameters. (This interface is programmed without using parametrized NDSolve solutions in order to be have code that corresponds to the interface implementations in [AAr2].)

opts = {PlotRange -> All, PlotLegends -> None, 
   PlotTheme -> "Detailed", PerformanceGoal -> "Speed", 
   ImageSize -> 400};
lsPopulationKeys = {TP, SP, EP, ISSP, INSP, HP, RP, DIP, HB};
lsSuppliesKeys = {MS, MSD, HMS};
lsMoneyKeys = {MHS, MLP, MMSP};
Manipulate[
 DynamicModule[{modelLocal = modelSEI2HREcon, 
   aStocks = modelSEI2HREcon["Stocks"], aSolLocal = aParSol, 
   lsPopulationPlots, lsMoneyPlots, lsSuppliesPlots},
  
  modelLocal = 
   SetRateRules[
    modelLocal, <|aincp -> aincpM, aip -> aipM, 
     sspf[SP] -> sspfM, \[Beta][HP] -> crhpM, qsd -> qsdM, ql -> qlM, 
     qcrf -> qcrfM, nhbr[TP] -> nhbrM/1000, 
     nhbcr[ISSP, ISNP] -> nhbcrM, mspr[HB] -> msprM, 
     msdp[HB] -> msdpM|>];
  aSolLocal = Association[ModelNDSolve[modelLocal, {t, ndays}][[1]]];
  
  lsPopulationPlots =
   Quiet@ParametricSolutionsPlots[
     aStocks,
     KeyTake[aSolLocal, 
      Intersection[lsPopulationKeys, displayPopulationStocks]],
     None, ndays,
     "LogPlot" -> popLogPlotQ, "Together" -> popTogetherQ, 
     "Derivatives" -> popDerivativesQ, 
     "DerivativePrefix" -> "\[CapitalDelta]", opts, 
     Epilog -> {Gray, Dashed, 
       Line[{{qsdM, 0}, {qsdM, 1.5*population}}], 
       Line[{{qsdM + qlM, 0}, {qsdM + qlM, 1.5*population}}]}];
  
  lsSuppliesPlots =
   If[Length[
      KeyDrop[aSolLocal, Join[lsPopulationKeys, lsMoneyKeys]]] == 
     0, {},
    (*ELSE*)
    Quiet@ParametricSolutionsPlots[
      aStocks,
      KeyTake[KeyDrop[aSolLocal, Join[lsPopulationKeys, lsMoneyKeys]],
        displaySupplyStocks],
      None, ndays,
      "LogPlot" -> supplLogPlotQ, "Together" -> supplTogetherQ, 
      "Derivatives" -> supplDerivativesQ, 
      "DerivativePrefix" -> "\[CapitalDelta]", opts]
    ];
  
  lsMoneyPlots =
   Quiet@ParametricSolutionsPlots[
     aStocks,
     KeyTake[aSolLocal, Intersection[lsMoneyKeys, displayMoneyStocks]],
     None, ndays,
     "LogPlot" -> moneyLogPlotQ, "Together" -> moneyTogetherQ, 
     "Derivatives" -> moneyDerivativesQ, 
     "DerivativePrefix" -> "\[CapitalDelta]", opts];
  
  Multicolumn[Join[lsPopulationPlots, lsSuppliesPlots, lsMoneyPlots], 
   nPlotColumns, Dividers -> All, FrameStyle -> GrayLevel[0.8]],
  SaveDefinitions -> True
  ],
 {{ndays, 365, "Number of days"}, 1, 365, 1, Appearance -> {"Open"}},
 Delimiter,
 {{aincpM, 6., "Average incubation period (days)"}, 1, 60., 1, 
  Appearance -> {"Open"}},
 {{aipM, 21., "Average infectious period (days)"}, 1, 60., 1, 
  Appearance -> {"Open"}},
 {{sspfM, 0.2, "Severely symptomatic population fraction"}, 0, 1, 
  0.025, Appearance -> {"Open"}},
 {{crhpM, 0.1, "Contact rate of the hospitalized population"}, 0, 30, 
  0.1, Appearance -> {"Open"}},
 Delimiter,
 {{qsdM, 55, "Quarantine start days"}, 0, 365, 1, 
  Appearance -> {"Open"}},
 {{qlM, 8*7, "Quarantine length (in days)"}, 0, 120, 1, 
  Appearance -> {"Open"}},
 {{qcrfM, 0.25, "Quarantine contact rate fraction"}, 0, 1, 0.01, 
  Appearance -> {"Open"}},
 Delimiter,
 {{nhbrM, 2.9, "Number of hospital beds rate (per 1000 people)"}, 0, 
  100, 0.1, Appearance -> {"Open"}},
 {{nhbcrM, 0, "Number of hospital beds change rate"}, -0.5, 0.5, 
  0.001, Appearance -> {"Open"}},
 {{msprM, 200, "Medical supplies production rate"}, 0, 50000, 10, 
  Appearance -> {"Open"}},
 {{msdpM, 1.2, "Medical supplies delivery period"}, 0, 10, 0.1, 
  Appearance -> {"Open"}},
 Delimiter,
 {{displayPopulationStocks, lsPopulationKeys, 
   "Population stocks to display:"}, lsPopulationKeys, 
  ControlType -> TogglerBar},
 {{popTogetherQ, True, "Plot populations together"}, {False, True}},
 {{popDerivativesQ, False, "Plot populations derivatives"}, {False, 
   True}},
 {{popLogPlotQ, False, "LogPlot populations"}, {False, True}},
 Delimiter,
 {{displaySupplyStocks, lsSuppliesKeys, 
   "Supplies stocks to display:"}, lsSuppliesKeys, 
  ControlType -> TogglerBar},
 {{supplTogetherQ, True, "Plot supplies functions together"}, {False, 
   True}},
 {{supplDerivativesQ, False, 
   "Plot supplies functions derivatives"}, {False, True}},
 {{supplLogPlotQ, True, "LogPlot supplies functions"}, {False, 
   True}},
 Delimiter,
 {{displayMoneyStocks, lsMoneyKeys, "Money stocks to display:"}, 
  lsMoneyKeys, ControlType -> TogglerBar},
 {{moneyTogetherQ, True, "Plot money functions together"}, {False, 
   True}},
 {{moneyDerivativesQ, False, 
   "Plot money functions derivatives"}, {False, True}},
 {{moneyLogPlotQ, True, "LogPlot money functions"}, {False, True}},
 {{nPlotColumns, 1, "Number of plot columns"}, Range[5]},
 ControlPlacement -> Left, ContinuousAction -> False]
1nbd62e0zn26p

Sensitivity analysis

When making and using this kind of dynamics models it is important to see how the solutions react to changes of different parameters. For example, we should try to find answers to questions like “What ranges of which parameters bring dramatic changes into important stocks?”

Sensitivity Analysis (SA) is used to determine how sensitive is a SD model to changes of the parameters and to changes of model’s equations, [BC1]. More specifically, parameter sensitivity, which we apply below, allows us to see the changes of stocks dynamic behaviour for different sequences (and combinations) of parameter values.

Remark: This section to mirrors to a point the section with same name in [AA4], except in this notebook we are more interested in medical supplies related SA because quarantine related SA is done in [AA4].

Remark: SA shown below should be done for other stocks and rates. In order to keep this exposition short we focus on ISSP, DIP, and HP. Also, it is interesting to think in terms of “3D parameter sensitivity plots.” We also do such plots.

Evaluations by Area under the curve

For certain stocks we might be not just interested in their evolution in time but also in their cumulative values. I.e. we are interested in the so called Area Under the Curve (AUC) metric for those stocks.

There are three ways to calculate AUC for stocks of interest:

  1. Add aggregation equations in the system of ODE’s. (Similar to the stock DIP in SEI2HR.)
    • For example, in order to compute AUC for ISSP we can add to SEI2HR the equation:
(*aucISSP'[t] = ISSP[t]*)
- More details for such equation addition are given in [AA2].
  1. Apply NIntegrate over stocks solution functions.
  2. Apply Trapezoidal rule to stock solution function values over a certain time grid.

Below we use 1 and 3.

Variation of medical supplies delivery period

Here we calculate the solutions for a certain combination of capacities and rates:

AbsoluteTiming[
 aVarSolutions =
   Association@
    Map[
     Function[{msdpVar},
      model2 = SEI2HREconModel[t];
      model2 = 
       SetRateRules[
        model2, <|\[Kappa][MS] -> 10000, \[Kappa][HMS] -> 100, 
         mspr[HB] -> 100, msdp[HB] -> msdpVar|>];
      msdpVar -> Association[ModelNDSolve[model2, {t, 365}][[1]]]
      ],
     Union[Join[Range[0.2, 1, 0.2], Range[1, 3, 0.5]]]
     ];
 ]

(*{0.231634, Null}*)

As expected more frequent delivery results in fuller utilization of the non-occupied hospital beds:

SeedRandom[23532]
focusStock = HP;
aVals = #[focusStock][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, #1], #1, {If[#1 < 1, RandomInteger[{120, 150}], 
      RandomInteger[{160, 260}]], Above}] &, aVals], 
 PlotLegends -> 
  SwatchLegend[Keys[aVals], 
   LegendLabel -> "Medical supplies\ndelivery period"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
0xacdbpxbv6yo

Here are the corresponding AUC values:

aAUCs = TrapezoidalRule[Transpose[{Range[0, 365], #}]] & /@ aVals;
ResourceFunction["GridTableForm"][aAUCs]
1a3d3me9wh6bh
BarChart[aAUCs, ChartLabels -> Keys[aAUCs], ColorFunction -> "Pastel",
  PlotLabel -> 
  Row[{focusStock[t] /. modelSEI2HREcon["Stocks"], Spacer[5], "AUC"}]]
1ncranm3ufw0h

Variation of medical supplies production rate

In order to demonstrate the effect of medical supplies production rate (mspr) it is beneficial to eliminate the hospital beds availability restriction – we assume that we have enough hospital beds for all infected severely symptomatic people.

Here we calculate the solutions for a certain combination of capacities and rates:

AbsoluteTiming[
 aVarSolutions =
   Association@
    Map[
     Function[{msprVar},
      model2 = SEI2HREconModel[t];
      model2 = 
       SetRateRules[
        model2, <|\[Kappa][MS] -> 100000, \[Kappa][HMS] -> 
          10000, \[Kappa][MSD] -> 1000, mspr[HB] -> msprVar, 
         msdp[HB] -> 1.5, mscr[ISSP] -> 0.2, mscr[TP] -> 0.001, 
         mscr[ISSP] -> 1, nhbr[TP] -> 100/1000|>];
      msprVar -> Association[ModelNDSolve[model2, {t, 365}][[1]]]
      ],
     {20, 60, 100, 200, 300, 1000, 10000}
     ];
 ]

(*{0.156794, Null}*)

Hospitalized Population

As expected we can see that with smaller production rates we get less hospitalized people:

SeedRandom[1232]
focusStock = HP;
aVals = #[focusStock][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, #1], #1, {RandomInteger[{180, 240}], Above}] &, 
  aVals], PlotLegends -> 
  SwatchLegend[Keys[aVals], 
   LegendLabel -> "Medical supplies\nproduction rate"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
0vxim0nbznmmi

Here are the corresponding AUC values:

aAUCs = TrapezoidalRule[Transpose[{Range[0, 365], #}]] & /@ aVals;
ResourceFunction["GridTableForm"][aAUCs]
07nfkyx6s9mjo
BarChart[aAUCs, ChartLabels -> Keys[aAUCs], ColorFunction -> "Pastel",
  PlotLabel -> 
  Row[{focusStock[t] /. modelSEI2HREcon["Stocks"], Spacer[5], "AUC"}]]
08xdwgkj6ljeh

Medical Supplies

Here we plot the availability of MS at MS producer’s storage:

SeedRandom[821]
focusStock = MS;
aVals = #[MS][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, #1], #1, {RandomInteger[{100, 160}], Above}] &, 
  aVals], PlotLegends -> 
  SwatchLegend[Keys[aVals], 
   LegendLabel -> "Medical supplies\nproduction rate"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
05il7vfu2msga

Here are the corresponding AUC values:

aAUCs = TrapezoidalRule[Transpose[{Range[0, 365], #}]] & /@ aVals;
ResourceFunction["GridTableForm"][aAUCs]
03yccn85qjh1a
BarChart[aAUCs, 
 ChartLabels -> Map[Rotate[ToString[#], \[Pi]/6] &, Keys[aAUCs]], 
 ColorFunction -> "Pastel", 
 PlotLabel -> 
  Row[{focusStock[t] /. modelSEI2HREcon["Stocks"], Spacer[5], "AUC"}],
  ImageSize -> Medium]
0z5dthm5iuod6

Variation of delivery disruption starts

Here we compute simulation solutions for a set of delivery disruption starts using disruption length of 7 days and disruption “slowing down” factor 2:

AbsoluteTiming[
 aVarSolutions =
   Association@
    Map[
     Function[{ddsVar},
      ddsVar -> 
       Association[
        ModelNDSolve[
          SetRateRules[
           modelSEI2HREcon, <|\[Kappa][MS] -> 100000, \[Kappa][HMS] ->
              1000, mspr[HB] -> 100, ql -> 56, qsd -> 60, 
            nhbr[TP] -> 3/1000, dbp -> 1, dds -> ddsVar, ddl -> 7, 
            dsf -> 2|>], {t, 365}][[1]]]
      ],
     Append[Range[40, 120, 20], 365]
     ];
 ]

(*{0.45243, Null}*)

Note, that disruption start at day 365 means no disruption. Also, we use three hospital beds per thousand people.

Here we plot the results for HP only:

SeedRandom[009]
focusStock = HP;
aVals = #[focusStock][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, #1], #1, {RandomInteger[{60, 140}], Bottom}] &, 
  aVals], PlotLegends -> 
  SwatchLegend[Keys[aVals], 
   LegendLabel -> "Medical supplies\ndisruption start"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
1sdi97njh54gj
1sdi97njh54gj

Here are the corresponding AUC values:

aAUCs = TrapezoidalRule[Transpose[{Range[0, 365], #}]] & /@ aVals;
ResourceFunction["GridTableForm"][aAUCs]
0s1zcqcmvxbrs
0s1zcqcmvxbrs
BarChart[aAUCs, 
 ChartLabels -> Map[Rotate[ToString[#], \[Pi]/6] &, Keys[aAUCs]], 
 ColorFunction -> "Pastel", 
 PlotLabel -> 
  Row[{focusStock[t] /. modelSEI2HREcon["Stocks"], Spacer[5], "AUC"}],
  ImageSize -> Medium]
0vq0berxbjf9v

Combined variability of delivery start and disruption

Here we calculate the solutions for a set of combinations of delivery periods and delivery disruption starts:

AbsoluteTiming[
 aVarSolutions =
   Association@
    Map[
     Function[{par},
      model2 = modelSEI2HREcon;
      model2 = 
       SetRateRules[
        model2, <|\[Kappa][MS] -> 100000, \[Kappa][HMS] -> 10000, 
         mspr[HB] -> 1000, dbp -> par[[1]], dds -> par[[2]], ddl -> 7,
          dsf -> 4, nhbr[TP] -> 3/1000|>];
      par -> Association[ModelNDSolve[model2, {t, 365}][[1]]]
      ],
     Flatten[Outer[List, {0.5, 1, 1.5}, {60, 100, 365}], 1]
     ];
 ]

(*{0.759922, Null}*)

As expected more frequent, less disrupted delivery brings fuller utilization of the non-occupied hospital beds:

SeedRandom[3233]
focusStock = HP;
aVals = #[focusStock][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, ToString[#1]], 
    ToString[#1], {RandomInteger[{60, 160}], Left}] &, aVals], 
 PlotLegends -> 
  SwatchLegend[ToString /@ Keys[aVals], 
   LegendLabel -> 
    "Medical supplies\ndelivery period & disruption start"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
1qqm9hcj40qmx
1qqm9hcj40qmx

Here are the corresponding AUC values:

aAUCs = TrapezoidalRule[Transpose[{Range[0, 365], #}]] & /@ aVals;
ResourceFunction["GridTableForm"][aAUCs]
19kpmr551tt1c
19kpmr551tt1c
BarChart[aAUCs, 
 ChartLabels -> Map[Rotate[ToString[#], \[Pi]/6] &, Keys[aAUCs]], 
 ColorFunction -> "Pastel", 
 PlotLabel -> 
  Row[{focusStock[t] /. modelSEI2HREcon["Stocks"], Spacer[5], "AUC"}],
  ImageSize -> Medium]
1ia6ik6cr7kg5
1ia6ik6cr7kg5
SeedRandom[3233]
focusStock = DIP;
aVals = #[focusStock][Range[0, 365]] & /@ aVarSolutions;
ListLinePlot[
 KeyValueMap[
  Callout[Tooltip[#2, ToString[#1]], 
    ToString[#1], {RandomInteger[{60, 160}], Left}] &, aVals], 
 PlotLegends -> 
  SwatchLegend[ToString /@ Keys[aVals], 
   LegendLabel -> 
    "Medical supplies\ndelivery period & disruption start"], 
 PlotRange -> All, ImageSize -> Large, 
 PlotLabel -> focusStock[t] /. modelSEI2HREcon["Stocks"]]
0xwdkry34kcfq
0xwdkry34kcfq
ResourceFunction["GridTableForm"][Last /@ aVals]
1u293ap7k53o5
1u293ap7k53o5
BarChart[Last /@ aVals, 
 ChartLabels -> Map[Rotate[ToString[#], \[Pi]/6] &, Keys[aAUCs]], 
 ColorFunction -> "Pastel", 
 PlotLabel -> "Deceased Population at day 365", ImageSize -> Medium]
0m5va3qv461nx
0m5va3qv461nx

References

Articles

[Wk1] Wikipedia entry, “Compartmental models in epidemiology”.

[Wl2] Wikipedia entry, “Coronavirus disease 2019”.

[HH1] Herbert W. Hethcote (2000). “The Mathematics of Infectious Diseases”. SIAM Review. 42 (4): 599–653. Bibcode:2000SIAMR..42..599H. doi:10.1137/s0036144500371907.

[BC1] Lucia Breierova, Mark Choudhari, An Introduction to Sensitivity Analysis, (1996), Massachusetts Institute of Technology.

[AA1] Anton Antonov, “Coronavirus propagation modeling considerations”, (2020), SystemModeling at GitHub.

[AA2] Anton Antonov, “Basic experiments workflow for simple epidemiological models”, (2020), SystemModeling at GitHub.

[AA3] Anton Antonov, “Scaling of Epidemiology Models with Multi-site Compartments”, (2020), SystemModeling at GitHub.

[AA4] Anton Antonov, “SEI2HR model with quarantine scenarios”, (2020), SystemModeling at GitHub.

Repositories, packages

[WRI1] Wolfram Research, Inc., “Epidemic Data for Novel Coronavirus COVID-19”, WolframCloud.

[AAr1] Anton Antonov, Coronavirus propagation dynamics project, (2020), SystemModeling at GitHub.

[AAr2] Anton Antonov, “Epidemiology Compartmental Modeling Monad in R”, (2020), ECMon-R at GitHub.

[AAp1] Anton Antonov, “Epidemiology models Mathematica package”, (2020), SystemModeling at GitHub.

[AAp2] Anton Antonov, “Epidemiology models modifications Mathematica package”, (2020), SystemModeling at GitHub.

[AAp3] Anton Antonov, “Epidemiology modeling visualization functions Mathematica package”, (2020), SystemModeling at GitHub.

[AAp4] Anton Antonov, “Epidemiology modeling simulation functions Mathematica package”, (2020), SystemModeling at GitHub.

[AAp5] Anton Antonov, “System dynamics interactive interfaces functions Mathematica package”, (2020), SystemsModeling at GitHub.