Mount Sinai GeoSpatial

What are typical use cases for GREENER data?

Researchers use GREENER to link environmental exposures — air pollution, greenspace, social vulnerability, climate — to health cohorts for epidemiological and translational studies. Common use cases include assessing long-term air quality exposure at participant residences, comparing neighborhood-level social vulnerability across study populations, and building multi-exposure models for chronic disease research.

Where does the data come from?

GREENER curates data from federal agencies (EPA, NASA, NOAA, CDC, USGS, HUD, USDA) and peer-reviewed academic models (Harvard, Stanford, Boston University, ECMWF). Each dataset is processed, harmonized to standard CONUS geographies, and validated for spatial and temporal completeness before publication on the platform. See the Data Sources section for the full list.

What is the geographic scope of GREENER?

All datasets are currently scoped to the Continental United States (CONUS), using 2020 TIGER/Line census tract boundaries as the standard geographic unit. Hawaii, Alaska, and U.S. territories are not currently supported.
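As an illustration of what the CONUS scope means for a tract identifier, the sketch below checks the state prefix of a 2020 tract GEOID. A tract GEOID is 11 digits: a 2-digit state FIPS code, a 3-digit county code, and a 6-digit tract code. The function name and the exclusion set are hypothetical helpers written for this example, not part of GREENER. They simply encode the exclusions stated above (Alaska, Hawaii, and the U.S. territories).

```python
# Hypothetical helper (not a GREENER API): does a 2020 census tract GEOID
# fall inside the Continental United States (CONUS)?
# Tract GEOID layout: state FIPS (2) + county FIPS (3) + tract code (6).

# State-level FIPS codes outside CONUS: Alaska (02), Hawaii (15), and the
# territories American Samoa (60), Guam (66), Northern Mariana Islands (69),
# Puerto Rico (72), and U.S. Virgin Islands (78).
NON_CONUS_STATE_FIPS = {"02", "15", "60", "66", "69", "72", "78"}


def is_conus_tract(geoid: str) -> bool:
    """Return True if an 11-digit tract GEOID lies in CONUS."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2] not in NON_CONUS_STATE_FIPS


print(is_conus_tract("36061000100"))  # New York County, NY
print(is_conus_tract("15003000100"))  # Honolulu County, HI
```

A participant geocoded to a tract that fails this check would fall into the "geographic edge cases" category rather than receiving an exposure value.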

How current is the data?

Coverage varies by dataset. High-resolution PM2.5 (GEEA Lab) is available through 2025. Harvard air quality datasets cover 2000–2016. NDVI (Landsat) runs through 2023. Social datasets such as SVI and COI are updated annually. See the Data Library for exact coverage per dataset.

Who can access the Data Library?

Public datasets in the GREENER Data Library are available to anyone without an account. To access restricted datasets or submit a cohort linkage request, you will need an approved GREENER account.

What is the difference between public and restricted data?

Public datasets, such as SVI, NDVI, and COI, can be browsed and downloaded by any visitor. Restricted datasets contain finer-grained or sensitive data (e.g., daily PM2.5 at 100 m resolution) and require an approved account and a brief data use justification. All requests are reviewed by the GREENER team prior to data delivery.

How do I request an account?

Accounts are available to any researcher. Submit a request through the registration page on the platform. Requests are reviewed by the GREENER team and typically processed within a few business days.

What file formats are used for data delivery?

Linked datasets are delivered as CSV or Apache Parquet files, depending on the size and structure of the request. All deliveries include a data dictionary. Cohort linkage requests also include a match rate report and QA flag columns documenting geocoding quality and data completeness per variable.

What is a "match rate" and why does it matter?

Match rate is the proportion of participants successfully linked to a given exposure dataset, calculated separately for each year. A match rate below 95% is flagged in the QA report, with a breakdown of unmatched records and their likely causes, such as poor address quality, geographic edge cases, or temporal gaps in the dataset.
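The per-year calculation and the 95% flag can be sketched as follows. This is a minimal illustration, not GREENER's QA code. It assumes linked records arrive as hypothetical (participant_id, year, value) tuples, where a value of None means the participant could not be linked for that year.

```python
from collections import defaultdict

MATCH_RATE_THRESHOLD = 0.95  # years below this rate are flagged for QA


def match_rates(records):
    """Compute the per-year match rate.

    records: iterable of (participant_id, year, value) tuples, where
    value is None when the participant could not be linked that year.
    Returns {year: matched / total}, sorted by year.
    """
    total = defaultdict(int)
    matched = defaultdict(int)
    for _participant_id, year, value in records:
        total[year] += 1
        if value is not None:
            matched[year] += 1
    return {year: matched[year] / total[year] for year in sorted(total)}


def flag_low_match(rates, threshold=MATCH_RATE_THRESHOLD):
    """Return the years whose match rate falls below the threshold."""
    return [year for year, rate in rates.items() if rate < threshold]


# Hypothetical cohort of 20 participants: all linked in 2015, 18 linked in 2016.
records = [(f"p{i}", 2015, 10.0) for i in range(20)]
records += [(f"p{i}", 2016, 10.0 if i < 18 else None) for i in range(20)]
rates = match_rates(records)
print(rates)                  # 2015 -> 1.0, 2016 -> 0.9
print(flag_low_match(rates))  # [2016]
```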

What is the difference between point-level and tract-level extraction?

Point-level extraction samples the raster value at the participant's geocoded residential coordinates. Tract-level extraction returns the mean of all raster pixels intersecting the participant's census tract. Point-level extraction is appropriate for individual-level exposure research; tract-level extraction is used for ecological analyses.
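A toy version of the two extraction modes is sketched below. It uses a small north-up raster stored as a list of rows, with a top-left origin and square pixels. The raster values, coordinates, and the set of pixels treated as "intersecting" the tract are all hypothetical. In practice, finding those pixels requires intersecting the tract polygon with the raster grid, which is assumed to be done beforehand here. Nodata handling is also omitted.

```python
import math
from statistics import fmean


def point_value(raster, x0, y0, size, x, y):
    """Point-level extraction: the value of the pixel containing (x, y).

    raster is a list of rows (north-up); (x0, y0) is the top-left corner
    and size is the pixel side length, all in the same projected units.
    """
    col = math.floor((x - x0) / size)
    row = math.floor((y0 - y) / size)  # rows increase southward
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("coordinate falls outside the raster")
    return raster[row][col]


def tract_mean(raster, pixels):
    """Tract-level extraction: unweighted mean over the (row, col) pixels
    assumed to intersect the tract (precomputed elsewhere)."""
    return fmean(raster[r][c] for r, c in pixels)


# Hypothetical 3x3 PM2.5 raster with 100 m pixels and its top-left corner at (0, 300).
raster = [
    [8.0, 9.0, 10.0],
    [7.0, 12.0, 11.0],
    [6.0, 5.0, 4.0],
]
residence = (150.0, 150.0)  # falls in the centre pixel
tract_pixels = [(0, 1), (0, 2), (1, 1), (1, 2)]

print(point_value(raster, 0.0, 300.0, 100.0, *residence))  # 12.0
print(tract_mean(raster, tract_pixels))                     # 10.5
```

The two results differ because the tract mean smooths over neighbouring pixels. This is why point-level values suit individual exposure assignment and tract-level values suit area-based comparisons.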

Can I request data for a longitudinal study where participants have multiple addresses?

Yes. GREENER supports multi-address longitudinal linkage. Provide a time-stamped address history and each address window will be linked to the corresponding exposure data for that period. Match rates and QA flags are generated per address window.
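The linkage of an address history can be sketched as below, under simplifying assumptions that are not taken from GREENER. Exposures are assumed to be annual values keyed by a hypothetical (location_id, year) pair. Each address window is linked to every calendar year it overlaps, so a year split between two addresses is linked to both windows, with no time weighting. The match rate is then computed per window.

```python
from datetime import date


def link_address_history(history, exposure):
    """Link each address window to annual exposure values.

    history: list of (participant_id, location_id, start, end) tuples
    with datetime.date bounds (inclusive).
    exposure: dict mapping (location_id, year) -> value.
    Returns {window: [(year, value_or_None), ...]} with one entry per
    calendar year the window overlaps.
    """
    linked = {}
    for participant_id, location_id, start, end in history:
        if end < start:
            raise ValueError("address window ends before it starts")
        window = (participant_id, location_id, start, end)
        linked[window] = [
            (year, exposure.get((location_id, year)))
            for year in range(start.year, end.year + 1)
        ]
    return linked


def window_match_rate(years):
    """Share of years in one window that received an exposure value."""
    return sum(value is not None for _, value in years) / len(years)


# Hypothetical participant who moved in mid-2012; no 2014 data at the new address.
history = [
    ("p1", "36061000100", date(2010, 3, 1), date(2012, 6, 30)),
    ("p1", "06037101110", date(2012, 7, 1), date(2014, 12, 31)),
]
exposure = {
    ("36061000100", 2010): 11.2, ("36061000100", 2011): 10.8,
    ("36061000100", 2012): 10.1, ("06037101110", 2012): 12.5,
    ("06037101110", 2013): 12.0,
}
for window, years in link_address_history(history, exposure).items():
    print(window[1], round(window_match_rate(years), 2))
```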

Frequently Asked Questions