| Title: | SOCSIM |
|---|---|
| Description: | Tools for preparing inputs, running SOCSIM (SOCial SIMulator) demographic kinship microsimulations, and reading simulation outputs from R. The package includes helpers for creating simulation folders, downloading demographic rate schedules, starting simulations, and loading population and marriage result files. |
| Authors: | Tom Theile [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-0573-9093>), Diego Alburez-Gutierrez [aut] (ORCID: <https://orcid.org/0000-0002-9823-5179>), Mallika Snyder [aut], Liliana P. Calderón-Bernal [aut] |
| Maintainer: | Tom Theile <[email protected]> |
| License: | GPL-3 |
| Version: | 1.9.18 |
| Built: | 2026-06-06 07:17:10 UTC |
| Source: | https://github.com/mpidr/rsocsim |
Create a two-level directory structure. If the first-level argument is NULL, we look for and, if needed, create the directory 'socsim' in the current temporary directory. If the second-level argument is NULL, we create a directory named like 'socsim_sim_' followed by a random component in the first-level directory.
create_simulation_folder(basedir = NULL, simdir = NULL)
basedir |
A string. Optional. First-level directory where the
simulation-specific directory will be created. Defaults to
|
simdir |
A string. Optional. Simulation-specific directory which will
be created within 'basedir'. Defaults to 'socsim_sim_' plus a random
component created with |
A string. The full path to the simulation-specific directory.
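The two-level default logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `sketch_simulation_folder` is a hypothetical helper, and the use of `tempfile()` to generate the random component is an assumption made for the example.

```r
# Hypothetical helper (not part of rsocsim) mirroring the documented defaults.
sketch_simulation_folder <- function(basedir = NULL, simdir = NULL) {
  if (is.null(basedir)) {
    # First level: a 'socsim' directory inside the session's temporary directory.
    basedir <- file.path(tempdir(), "socsim")
  }
  dir.create(basedir, recursive = TRUE, showWarnings = FALSE)
  if (is.null(simdir)) {
    # Assumption: tempfile() supplies the random part of the name.
    # tempfile() only generates a unique name; it does not create anything.
    simdir <- basename(tempfile(pattern = "socsim_sim_", tmpdir = basedir))
  }
  path <- file.path(basedir, simdir)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

sim_path <- sketch_simulation_folder()
dir.exists(sim_path)
```

Passing explicit values for both arguments skips the defaults and simply creates `basedir/simdir`.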
The simulation is only a simple one. The file will be saved into the directory 'simdir'.
create_sup_file(simdir, simname = "socsim")
simdir |
A string. The directory where the .sup file will be saved. |
simname |
A string. The base name of the simulation. Defaults to
|
A string. The basename of the created supervisory file, for example
"socsim.sup". The file is written to simdir, and the function also
copies the bundled rate and initial-population input files into simdir.
Given a population file ('opop') generated by rsocsim, the function estimates age-specific fertility rates.
estimate_fertility_rates( opop, final_sim_year, year_min, year_max, year_group = 5, age_min_fert = 15, age_max_fert = 50, age_group = 5 )
opop |
An R object from SOCSIM microsimulation output (population file). |
final_sim_year |
numeric. Final simulated year in 'real world' time (used to convert 'SOCSIM time' to 'real world' time). |
year_min |
numeric. Lower-bound year for which rate should be estimated. |
year_max |
numeric. Upper-bound year for which rate should be estimated. |
year_group |
numeric. Size of year groups to estimate rate (year_group=1 will produce single-year estimates) |
age_min_fert |
numeric. Lower-bound age of female reproductive period |
age_max_fert |
numeric. Upper-bound age of female reproductive period |
age_group |
numeric. Size of age groups to estimate rate (age_group=1 will produce single-age estimates) |
The final_sim_year can be obtained from the .sup file and must
refer to a real-world year.
Grouped year and age ranges (i.e., if year_group > 1 or age_group > 1)
are created as [year;year+year_group).
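To make the left-closed, right-open grouping concrete, here is a small base-R sketch using `cut()` with `right = FALSE`. The variable names and the sample years are made up for illustration, and the label format is the one `cut()` produces, which may differ from the labels the package emits.

```r
# Illustrative values only; not taken from a real simulation.
year_min   <- 1995
year_max   <- 2005
year_group <- 5
event_years <- c(1995, 1999, 2000, 2004)

# Breaks at 1995, 2000, 2005 define the intervals [1995,2000) and [2000,2005).
year_breaks <- seq(year_min, year_max, by = year_group)

# right = FALSE makes each interval include its lower bound and exclude its upper bound.
year_interval <- cut(event_years, breaks = year_breaks, right = FALSE, dig.lab = 4)
as.character(year_interval)
```

Note how the year 2000 falls into the second interval, not the first, because the upper bound is excluded.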
A data frame with columns year, age, and socsim. year is a
factor describing the grouped calendar-year interval, age is a factor
describing the maternal-age interval, and socsim is the estimated
fertility rate for that cell.
opop <- data.frame(
  pid = 1:6, fem = c(1, 0, 1, 0, 1, 1), group = 1, nev = 0,
  dob = c(120, 120, 336, 348, 180, 360),
  mom = c(0, 0, 1, 1, 0, 5), pop = c(0, 0, 2, 2, 0, 2),
  nesibm = 0, nesibp = 0, lborn = 0, marid = 0, mstat = 0,
  dod = c(0, 300, 0, 0, 0, 0), fmult = 0
)
asfr <- estimate_fertility_rates(opop = opop, final_sim_year = 2021, year_min = 1998, year_max = 2000, year_group = 5, age_min_fert = 15, age_max_fert = 50, age_group = 5)
head(asfr)
Given a population file ('opop') generated by rsocsim, the function estimates (yearly) age-specific mortality rates.
estimate_mortality_rates( opop, final_sim_year, year_min, year_max, year_group, age_max_mort, age_group )
opop |
An R object from SOCSIM microsimulation output (population file). |
final_sim_year |
numeric. Final simulated year in 'real world' time (used to convert 'SOCSIM time' to 'real world' time). |
year_min |
numeric. Lower-bound year for which rate should be estimated. |
year_max |
numeric. Upper-bound year for which rate should be estimated. |
year_group |
numeric. Size of year groups to estimate rate (year_group=1 will produce single-year estimates) |
age_max_mort |
numeric. Maximum age for estimating mortality. |
age_group |
numeric. Size of age groups to estimate rate (age_group=1 will produce single-age estimates) |
The final_sim_year can be obtained from the .sup file and must
refer to a real-world year.
Grouped year and age ranges (i.e., if year_group > 1 or age_group > 1)
are created as [year;year+year_group). For age_group > 1, mortality rates
are split into an infant group [0,1) and then grouped ages [1, age_group),
[age_group, age_group + age_group), and so on.
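The age grouping with a separate infant interval can be sketched in base R as follows. This is an illustration of the interval scheme only, with made-up ages and a small `age_max_mort`; it is not the package's code, and the labels are those generated by `cut()`.

```r
# Illustrative values only.
age_group    <- 5
age_max_mort <- 20

# Infant group [0,1), then [1, age_group), then steps of age_group.
# unique() keeps the breaks valid when age_group is 1.
age_breaks <- unique(c(0, 1, seq(age_group, age_max_mort, by = age_group)))

ages <- c(0, 0.5, 1, 4, 5, 19)
age_interval <- cut(ages, breaks = age_breaks, right = FALSE)
as.character(age_interval)
```

With `age_group = 5` the breaks are 0, 1, 5, 10, 15, 20, so ages 1 to 4 share the interval `[1,5)` while infants stay in `[0,1)`.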
A data frame with columns year, sex, age, and socsim.
year is a factor describing the grouped calendar-year interval,
sex is "male" or "female", age is a factor describing the age
interval, and socsim is the estimated mortality rate for that cell.
opop <- data.frame(
  pid = 1:6, fem = c(1, 0, 1, 0, 1, 1), group = 1, nev = 0,
  dob = c(120, 120, 336, 348, 180, 360),
  mom = c(0, 0, 1, 1, 0, 5), pop = c(0, 0, 2, 2, 0, 2),
  nesibm = 0, nesibp = 0, lborn = 0, marid = 0, mstat = 0,
  dod = c(0, 300, 0, 0, 0, 0), fmult = 0
)
asmr <- estimate_mortality_rates(opop = opop, final_sim_year = 2021, year_min = 1995, year_max = 2000, year_group = 5, age_max_mort = 100, age_group = 5)
head(asmr)
When fn contains multiple file paths, or when seed contains multiple
values and fn is NULL, the matching result files are read and row-bound
into a single data frame. To keep identifiers unique across simulations,
positive ID columns are offset by (index - 1) * id_offset, while sentinel
zeros remain unchanged.
read_omar( folder = NULL, supfile = "socsim.sup", seed = 42, suffix = "", fn = NULL, id_offset = 10000000L, quiet = FALSE )
folder |
simulation base folder ("~/socsim/simulation_235/") |
supfile |
name of supervisory-file ("socsim.sup") |
seed |
random number seed (42) |
suffix |
optional suffix for the results-directory (default="") |
fn |
complete path to the file. If not provided, it will be created from the other arguments |
id_offset |
positive integer stride used to offset IDs when combining multiple files. Ignored for single-file reads. Default is 10 million, which allows combining up to 214 files with a total population of 10 million each. |
quiet |
logical. If |
| 1 | mid | Marriage id number (unique sequential integer) |
| 2 | wpid | Wife’s person id |
| 3 | hpid | Husband’s person id |
| 4 | dstart | Date marriage began |
| 5 | dend | Date marriage ended or zero if still in force at end of simulation |
| 6 | rend | Reason marriage ended: 2 = divorce; 3 = death of one partner |
| 7 | wprior | Marriage id of wife’s next most recent prior marriage |
| 8 | hprior | Marriage id of husband’s next most recent prior marriage |
You can either provide the complete path to the file, or the folder, supervisory file name, seed, and suffix with which you
started the simulation.
A data frame with columns mid, wpid, hpid, dstart, dend,
rend, wprior, and hprior, matching the SOCSIM result.omar file.
If the file is missing or empty, a zero-row data frame with these columns
is returned.
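The ID offsetting applied when several marriage files are combined can be illustrated with a base-R sketch. Everything here is an assumption made for the example: `offset_ids` is a hypothetical helper, the two tiny data frames are invented, and the choice of which columns count as ID columns is a guess, since the text above does not list them.

```r
# Hypothetical helper: shift positive IDs, leave sentinel zeros untouched.
offset_ids <- function(ids, index, id_offset = 10000000L) {
  shift <- (index - 1L) * id_offset
  ifelse(ids > 0L, ids + shift, ids)
}

# Two invented result sets with identical IDs, as two seeds might produce.
sim1 <- data.frame(mid = 1:2, wpid = c(1L, 3L), hpid = c(2L, 4L), wprior = c(0L, 0L))
sim2 <- sim1
sims <- list(sim1, sim2)

# Assumption: these are the columns that hold IDs.
id_cols <- c("mid", "wpid", "hpid", "wprior")

combined <- do.call(rbind, lapply(seq_along(sims), function(i) {
  out <- sims[[i]]
  out[id_cols] <- lapply(out[id_cols], offset_ids, index = i)
  out
}))
combined

# Why 214 files: that is how many strides of 10 million fit into a 32-bit integer.
.Machine$integer.max %/% 10000000L
```

The first file keeps its IDs (offset 0), the second is shifted by 10 million, and `wprior` stays 0 in both because zeros are treated as sentinels.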
When fn contains multiple file paths, or when seed contains multiple
values and fn is NULL, the matching result files are read and row-bound
into a single data frame. To keep identifiers unique across simulations,
positive ID columns are offset by (index - 1) * id_offset, while sentinel
zeros remain unchanged.
read_opop( folder = NULL, supfile = "socsim.sup", seed = 42, suffix = "", fn = NULL, id_offset = 10000000L, quiet = FALSE )
folder |
simulation base folder ("~/socsim/simulation_235/") |
supfile |
name of supervisory-file ("socsim.sup") |
seed |
random number seed (42) |
suffix |
optional suffix for the results-directory (default="") |
fn |
complete path to the file. If not provided, it will be created from the other arguments |
id_offset |
positive integer stride used to offset IDs when combining multiple files. Ignored for single-file reads. Default is 10 million, which allows combining up to 214 files with a total population of 10 million each. |
quiet |
logical. If |
After the end of the simulation, SOCSIM writes every person of the simulation into a file called result.opop. |
| 1 | pid | Person id unique identifier assigned as integer in birth order |
| 2 | fem | 1 if female 0 if male |
| 3 | group | Group identifier 1..60 current group membership of individual |
| 4 | nev | Next scheduled event |
| 5 | dob | Date of birth integer month number |
| 6 | mom | Person id of mother |
| 7 | pop | Person id of father |
| 8 | nesibm | Person id of next eldest sibling through mother |
| 9 | nesibp | Person id of next eldest sibling through father |
| 10 | lborn | Person id of last born child |
| 11 | marid | Id of marriage in .omar file |
| 12 | mstat | Marital status at end of simulation (integer): 1 = single; 2 = divorced; 3 = widowed; 4 = married |
| 13 | dod | Date of death or 0 if alive at end of simulation |
| 14 | fmult | Fertility multiplier |
This table explains the columns of the opop file and the columns of the output data frame.
You can either provide the complete path to the file, or the folder, supervisory file name, seed, and suffix with which you
started the simulation.
A data frame with columns pid, fem, group, nev, dob,
mom, pop, nesibm, nesibp, lborn, marid, mstat, dod, and
fmult, matching the SOCSIM result.opop file. If the file is missing
or empty, a zero-row data frame with these columns is returned.
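Because dob and dod are stored as month numbers, and dod is 0 for people still alive at the end of the simulation, a common first step is to derive ages in years. The following base-R sketch uses a small invented data frame with only the relevant columns; it does not call the package.

```r
# Invented rows using the documented column meanings (months since simulation start).
opop <- data.frame(pid = 1:3, dob = c(120, 120, 336), dod = c(0, 300, 0))

# dod == 0 marks people alive at the end of the simulation.
alive <- opop$dod == 0

# Age at death in years; undefined (NA) for survivors.
age_at_death_years <- ifelse(alive, NA_real_, (opop$dod - opop$dob) / 12)
age_at_death_years
```

Person 2 was born in month 120 and died in month 300, which gives 180 months, or 15 years.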
Identify members of a kin network for an individual or individuals of interest.
retrieve_kin( opop, omar, pid, extra_kintypes = character(), kin_by_sex = FALSE, KidsOf = NULL )
opop |
An R object from SOCSIM microsimulation output (population file). Create this object with the function read_opop(). |
omar |
An R object from SOCSIM microsimulation output (marriage file). Create this object with the function read_omar(). |
pid |
A vector of person IDs, indicating persons of interest for whom these kin networks should be identified. |
extra_kintypes |
A vector of character values indicating which additional types of kin should be obtained. For reasons of computational efficiency, the function will by default only identify an individual's great-grandparents ("ggparents" in function output), grandparents ("gparents"), parents, siblings, spouse, children, and grandchildren ("gchildren"). However, by selecting one or more of the following kin types, the kin network generated will also include these individuals:
|
kin_by_sex |
A logical value indicating whether output should include kin relations additionally disaggregated by the sex of the relative. Setting this value to TRUE will result in additional objects being generated to identify individuals' relatives by sex. |
KidsOf |
An optional precomputed list object containing the children of
each person in the population. If |
A named list whose components are kinship categories such as
parents, siblings, or children. Each component is itself a named list
of integer person IDs, organized by relationship. These person ID values will be named
based on the person of interest with whom they are associated.
For example, for a list named "parents", the values will be person IDs of
the parents of individuals of interest. These values will be named according
to their children's IDs (given that their children are, in this case,
the persons of interest provided to the function input).
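To show how such relations follow from the opop columns, here is a base-R sketch that looks up the parents, mother, and siblings of one person directly from `mom` and `pop`. It is a simplified illustration, not the algorithm used by `retrieve_kin()`; it assumes that 0 marks an unknown parent and it treats anyone sharing at least one known parent as a sibling.

```r
# Invented population: persons 1 and 2 are the parents of persons 3 and 4.
opop <- data.frame(pid = 1:4, fem = c(1, 0, 1, 0),
                   mom = c(0, 0, 1, 1), pop = c(0, 0, 2, 2))

ego <- 3
ego_row <- opop[opop$pid == ego, ]

# Parents: the mom and pop IDs of ego, dropping zeros (assumed to mean unknown).
parents <- c(ego_row$mom, ego_row$pop)
parents <- parents[parents > 0]

# Mother: the parent whose fem flag is 1.
mother <- parents[opop$fem[match(parents, opop$pid)] == 1]

# Siblings: other persons sharing a known mother or a known father with ego.
shares_mom <- ego_row$mom > 0 & opop$mom == ego_row$mom
shares_pop <- ego_row$pop > 0 & opop$pop == ego_row$pop
siblings <- opop$pid[opop$pid != ego & (shares_mom | shares_pop)]

list(parents = parents, mother = mother, siblings = siblings)
```

For person 3 this yields parents 1 and 2, mother 1, and sibling 4.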
With kin_by_sex set to TRUE and extra_kintypes set to c("gunclesaunts",
"unclesaunts", "firstcousins", "niblings", "inlaws"),
the full list of kin relations identified is:
"ggparents": great-grandparents
"ggmothers": great-grandmothers
"ggfathers": great-grandfathers
"gparents": grandparents
"gmothers": grandmothers
"gfathers": grandfathers
"gunclesaunts": great-uncles and great-aunts
"guncles": great-uncles
"gaunts": great-aunts
"parents": parents
"mother": mother
"father": father
"unclesaunts": uncles and aunts (siblings of parents)
"uncles": uncles
"aunts": aunts
"siblings": siblings
"sisters": sisters
"brothers": brothers
"firstcousins": first cousins
"firstcousinsfemale": female first cousins
"firstcousinsmale": male first cousins
"children": children
"daughters": daughters
"sons": sons
"gchildren": grandchildren
"gdaughters": granddaughters
"gsons": grandsons
"niblings": nephews and nieces
"nieces": nieces
"nephews": nephews
"spouse": spouse (based on final marriage, in the case of multiple marriages)
"parentsinlaw": parents-in-law
"motherinlaw": mother-in-law
"fatherinlaw": father-in-law
"siblingsinlaw": brothers and sisters in law
"sistersinlaw": sisters-in-law
"brothersinlaw": brothers-in-law
opop <- data.frame(
  pid = 1:4, fem = c(1, 0, 1, 0), group = 1, nev = 0,
  dob = c(120, 120, 300, 300),
  mom = c(0, 0, 1, 1), pop = c(0, 0, 2, 2),
  nesibm = 0, nesibp = 0, lborn = 0,
  marid = c(1, 1, 0, 0), mstat = c(4, 4, 1, 1), dod = 0, fmult = 0
)
omar <- data.frame(
  mid = 1, wpid = 1, hpid = 2, dstart = 0, dend = 0,
  rend = 0, wprior = 0, hprior = 0
)
kin_network <- retrieve_kin(
  opop = opop, omar = omar, pid = 3,
  extra_kintypes = c("niblings", "inlaws"), kin_by_sex = TRUE
)
kin_network$parents[[1]]
Calculate for how many years the simulation ran
simulation_time_to_years(simulation_time, pre_simulation_time, start_year)
simulation_time |
An integer. The number of periods (months) the simulation ran. |
pre_simulation_time |
An integer. The number of periods (months) the simulation ran before getting to a stable population. This is subtracted from 'simulation_time' in order to arrive at the "real" simulation time |
start_year |
An integer. The year the simulation started. |
A numeric scalar giving the calendar year reached at the end of the
simulated period after subtracting pre_simulation_time / 12. The value
can include a fractional year.
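One reading of this description is the arithmetic below. The function `sketch_time_to_years` is a hypothetical stand-in written for illustration, and the formula is an assumption inferred from the argument descriptions, not a copy of the package source.

```r
# Hypothetical re-implementation based on the documented arguments.
sketch_time_to_years <- function(simulation_time, pre_simulation_time, start_year) {
  # Months are converted to years; the burn-in period is subtracted first.
  start_year + (simulation_time - pre_simulation_time) / 12
}

# 1500 months in total, 300 of them burn-in, starting in 1900.
end_year <- sketch_time_to_years(1500, 300, 1900)
end_year

# A month count that is not a multiple of 12 gives a fractional year.
sketch_time_to_years(1506, 300, 1900)
```

In the first call, 1200 effective months correspond to 100 years, so the result is 2000; the second call adds six months and returns 2000.5.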
Run a single Socsim simulation with a given supervisory file and directory
socsim( folder, supfile, seed = "42", process_method = "inprocess", compatibility_mode = "1", suffix = "" )
folder |
A string. This is the base directory of the simulation. Every .sup and rate file should be named relative to this directory. |
supfile |
A string. The name of the .sup file to start the simulation, relative to the directory. |
seed |
A string. The seed for the random number generator; it must represent an integer. Defaults to "42". |
process_method |
A string. Whether and how SOCSIM should be started in its own process or in the running R process. Defaults to "inprocess". Use one of:
|
compatibility_mode |
A string. |
suffix |
A string. |
Returns 1L when the simulation finishes successfully. If the
simulation errors before completion, the function returns NULL after
issuing warnings. Result files are written to the directory
sim_results_<basename(supfile)>_<seed>_<suffix> inside folder.
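The naming pattern for the results directory can be reproduced with a few lines of base R, which is handy when locating output files by hand. `sketch_results_dir` is a hypothetical helper that follows the pattern stated above literally; it does not run a simulation or check that the directory exists.

```r
# Hypothetical helper building the documented results-directory path.
sketch_results_dir <- function(folder, supfile, seed = "42", suffix = "") {
  dir_name <- paste0("sim_results_", basename(supfile), "_", seed, "_", suffix)
  file.path(folder, dir_name)
}

results_dir <- sketch_results_dir("~/socsim/simulation_235", "socsim.sup")
results_dir

# A non-empty suffix is appended after the final underscore.
sketch_results_dir("~/socsim/simulation_235", "socsim.sup", seed = "7", suffix = "test")
```

With the default empty suffix the directory name ends in a trailing underscore, as the pattern implies.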