Frequently Asked Questions
_______________________________________________________________________________
Publishers, Publications, Datasets & Vintages
Publishers issue publications. Each publication contains datasets, and each dataset is released periodically in vintages.
| | | like ... |
| | Publisher: | Census Bureau |
| issue | Publication: | Decennial Survey |
| of | Dataset: | (DP02) Selected Social Characteristics |
| over time | Vintage: | (DP02) Selected Social Characteristics for 1990, 2010, 2020 |
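The hierarchy above can be sketched as nested records. This is only an illustration in Python, not Open Environments' actual schema: the class names, field names and the `describe_vintages` helper are assumptions, and only the example values come from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    code: str
    title: str
    # Years in which this dataset was released (its vintages).
    vintages: list[int] = field(default_factory=list)


@dataclass
class Publication:
    name: str
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Publisher:
    name: str
    publications: list[Publication] = field(default_factory=list)


# Example values taken from the table above.
dp02 = Dataset("DP02", "Selected Social Characteristics", [1990, 2010, 2020])
decennial = Publication("Decennial Survey", [dp02])
census = Publisher("Census Bureau", [decennial])


def describe_vintages(publisher: Publisher) -> list[str]:
    """Return one label per vintage of every dataset the publisher issues."""
    return [
        f"({dataset.code}) {dataset.title} for {year}"
        for publication in publisher.publications
        for dataset in publication.datasets
        for year in dataset.vintages
    ]


for label in describe_vintages(census):
    print(label)
```

Keeping vintages as a list on the dataset reflects the idea that a vintage is the same dataset observed at a different time, rather than a separate dataset.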
_______________________________________________________________________________
Subjects
Subjects allow us to categorize Publications, Datasets and Models so they can be searched and compared.
Subjects might include:
- Consumers
- Geographies
- Economics
- Politics
- Financial Markets
_______________________________________________________________________________
Models
Open Environments applies machine learning methods to estimate or project new datasets.
A model is like a data publication, except that it is calculated rather than originally collected.
Open Environments' first models include:
- Projecting election results from voting precincts onto the geographies of the American Community Survey
- Using BLS Consumer Expenditure microdata to estimate spending for each Census block group
- Casting the CDC's Social Vulnerability Index from the Census tract level down to lower levels
- Merging TIGER/Line shapes with select ACS demographic results
Models can be found on this website, linked to the Open Environments service on Harvard University's Dataverse platform.
_______________________________________________________________________________
Reference Data (Codes!)
To give a dataset structure, a field may be restricted to a set of valid values, or codes.
To use that dataset, you will need an authoritative master list of those codes with a description of each.
Very often these code lists are administered by government agencies.
For example,
- Data with the FIPS County Code "019" and FIPS State Code "04" refers to Pima County, Arizona.
- Companies that provide NAPCS code "67119010202" are selling hay.
- If your doctor's visit is coded W61.62XD, you've had a follow-up visit after being attacked by a duck.
- There are nine codes for varieties of the Portuguese language.
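As a rough sketch of how a code list gets used, the Python below looks up the FIPS example from the list above. The two dictionaries are a tiny hand-written excerpt standing in for an agency's full published list, and the names `STATE_FIPS`, `COUNTY_FIPS` and `describe_county` are assumptions made for illustration.

```python
# Tiny excerpt of a code list. A real one would be loaded from the
# agency's published file. Codes are stored as strings so that leading
# zeros ("04", "019") are not lost.
STATE_FIPS = {"04": "Arizona"}
COUNTY_FIPS = {("04", "019"): "Pima County"}


def describe_county(state_code: str, county_code: str) -> str:
    """Translate a state/county FIPS pair into a readable place name."""
    # Restore leading zeros that spreadsheets and CSV parsers often strip.
    state_code = state_code.zfill(2)
    county_code = county_code.zfill(3)
    state = STATE_FIPS.get(state_code)
    county = COUNTY_FIPS.get((state_code, county_code))
    if state is None or county is None:
        raise KeyError(f"unknown FIPS code {state_code}{county_code}")
    return f"{county}, {state}"


print(describe_county("04", "019"))  # Pima County, Arizona
print(describe_county("4", "19"))    # Pima County, Arizona
```

County codes are only unique within a state, which is why the county lookup is keyed on the state and county codes together.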
_______________________________________________________________________________
Geographies
Public data is often collected for a specific geography, which makes it valuable for visualizing on a map.
In machine learning, you can add a person's geographic attributes to the properties input to the model.
In marketing terms, You Are Where You Live (YAWYL) analysis would associate your propensities with the data about your geography.
For example, an alumnus's likelihood to donate might be affected by their ZIP Code's income distribution.
Geographic attributes like these could be added at several levels:
- Population Density of the person's Zip Code
- Age of the person's County
- Unemployment Rate of the person's State
- Poverty Rate of the person's Nation
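A minimal sketch of this kind of enrichment in Python follows. Everything in it is hypothetical: the ZIP Codes, the attribute names, the numbers and the `add_geographic_features` helper are placeholders for illustration, not real statistics or an Open Environments API.

```python
# Hypothetical geography-level attributes keyed by ZIP Code.
# The values are placeholders, not real statistics.
ZIP_ATTRIBUTES = {
    "00001": {"population_density": 1200.0, "median_income": 58000.0},
    "00002": {"population_density": 85.0, "median_income": 41000.0},
}


def add_geographic_features(person: dict, zip_attributes: dict) -> dict:
    """Return a copy of the person record with its ZIP Code's attributes added.

    Added fields are prefixed with "zip_" so they cannot collide with the
    person's own fields. A ZIP Code missing from the table adds nothing.
    """
    enriched = dict(person)
    for name, value in zip_attributes.get(person["zip_code"], {}).items():
        enriched[f"zip_{name}"] = value
    return enriched


# Hypothetical person-level records.
alumni = [
    {"id": 1, "zip_code": "00001", "graduation_year": 1998},
    {"id": 2, "zip_code": "99999", "graduation_year": 2005},
]

# Each enriched record can then be passed to a model as one input row.
model_inputs = [add_geographic_features(person, ZIP_ATTRIBUTES) for person in alumni]
for row in model_inputs:
    print(row)
```

The same pattern would repeat for county, state and national attributes, each joined on its own key and given its own prefix.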
_______________________________________________________________________________