Data Processing and Analysis in the NM EPHT Network
- Age Adjustment in Analysis
- Time Series Data and Analysis
- Linking Health and Environmental Data
- Satellite Data
Geocoding
Geocoding health outcome and environmental data assigns geographic locators to the data, which makes it possible to link multiple types of data. Much of our data are geocoded by assigning latitude-longitude coordinates to a street address. Street address, city, county, ZIP code, and Census tract are the fields typically used to geocode Tracking data.
This PDF report describes some of the details of geocoding cancer data and some of the issues associated with these efforts.
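Once a record has latitude-longitude coordinates, it can be linked to an area such as a Census tract or county by testing which area polygon contains the point. The sketch below uses the even-odd ray-casting test as one common way to do this; the tract names and square boundaries are hypothetical placeholders, not real Census geography, and points lying exactly on a boundary are not handled consistently. Production work would normally use real boundary files and a GIS library.

```python
# Minimal sketch: assign a geocoded point (longitude, latitude) to the first
# area polygon that contains it, using the even-odd ray-casting rule.
# Area IDs and coordinates are hypothetical placeholders, not real tracts.
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Return True if pt lies inside polygon (even-odd ray casting)."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (y1 > y) != (y2 > y):
            # Longitude where this edge crosses the horizontal line through pt.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


def assign_area(pt: Point, areas: Dict[str, List[Point]]) -> Optional[str]:
    """Return the ID of the first area containing pt, or None if none does."""
    for area_id, polygon in areas.items():
        if point_in_polygon(pt, polygon):
            return area_id
    return None


# Two hypothetical square "tracts" in longitude/latitude degrees.
areas = {
    "tract_A": [(-107.0, 35.0), (-106.5, 35.0), (-106.5, 35.5), (-107.0, 35.5)],
    "tract_B": [(-106.5, 35.0), (-106.0, 35.0), (-106.0, 35.5), (-106.5, 35.5)],
}

print(assign_area((-106.65, 35.08), areas))  # tract_A
print(assign_area((-106.20, 35.40), areas))  # tract_B
print(assign_area((-105.00, 36.00), areas))  # None
```

With an area ID attached, health and environmental records that were geocoded separately can share a common geographic key.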
Age Adjustment in Analysis
When rates are compared between groups, they may need to be adjusted for factors such as age. If the groups differ in their age distributions, their rates cannot be compared correctly unless age is adjusted for. For age, the general practice is to adjust rates to a standard population (e.g., the United States population from the 2000 Census). Using this practice, the mortality rates for Florida and Alaska would both be age-adjusted to the U.S. standard population, and the two adjusted rates could then be compared.
For mortality, age adjustment is calculated as follows:
- Choose a standard population with a known age distribution (U.S. Census 2000 Population).
- Calculate the age-specific death rates for the two populations (Alaska and Florida in this case).
- Calculate the age-specific expected number of deaths based on the standard population (multiply the age-specific death rates for the two states by the number of people in the respective age class in the standard population).
- For each state, add the expected numbers of deaths over all age classes. Divide the resulting total number of expected deaths by the total number of people in the standard population. These are the age-adjusted death rates.
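The four steps above can be sketched as a short function (this method is usually called direct age adjustment). This is an illustration only: the three age classes and all counts are hypothetical, whereas the actual U.S. 2000 standard population is divided into more age groups, and the function name is ours.

```python
# Minimal sketch of direct age adjustment following the four steps above.
# All counts are hypothetical and chosen only to illustrate the arithmetic.
from typing import List


def age_adjusted_rate(deaths: List[int], population: List[int],
                      standard_pop: List[int], per: int = 100_000) -> float:
    """Directly age-adjusted death rate per `per` persons."""
    if not (len(deaths) == len(population) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age class")
    expected_deaths = 0.0
    for d, p, s in zip(deaths, population, standard_pop):
        age_specific_rate = d / p               # step 2: age-specific death rate
        expected_deaths += age_specific_rate * s  # step 3: expected deaths in standard pop
    # Step 4: total expected deaths divided by the total standard population.
    return expected_deaths / sum(standard_pop) * per


# Three hypothetical age classes: 0-44, 45-64, 65+.
standard = [600_000, 250_000, 150_000]  # step 1: hypothetical standard population
state_a_deaths, state_a_pop = [100, 400, 3_000], [200_000, 100_000, 60_000]
state_b_deaths, state_b_pop = [90, 500, 6_000], [180_000, 120_000, 130_000]

rate_a = age_adjusted_rate(state_a_deaths, state_a_pop, standard)
rate_b = age_adjusted_rate(state_b_deaths, state_b_pop, standard)
print(round(rate_a, 1))  # 880.0
print(round(rate_b, 1))  # 826.5
```

In this made-up example, State B has the higher crude death rate (about 1532.6 versus 972.2 per 100,000) only because a larger share of its population is 65 or older; after adjustment to the same standard population, its rate is slightly lower than State A's.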
Age-adjusted rates are used for many of the diseases tracked on the NM EPHT site but are not always the preferred method for comparison.
Time Series Data and Analysis
A time series is a sequence of data points, typically measured at specified time intervals, used to understand trends in the data over time. These trends can sometimes be used to forecast future events from known past events, that is, to predict future data points before they are measured. The term "time series analysis" distinguishes this kind of trend analysis from analyses in which the individual observations have no natural ordering, and also from the spatial analyses we use to relate our data to geographic locations. A time series model will generally reflect the fact that observations close together in time are more closely related to each other than observations further apart in time. In addition, time series models often make use of the natural one-way ordering of time, so that the value in a series at a given time is expressed as deriving in some way from past values rather than from future values.
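One common way to measure how closely related nearby observations are is the sample autocorrelation at a given lag. The sketch below is illustrative; both series are hypothetical. A steadily rising series has positive lag-1 autocorrelation, while a series that flips up and down every step has negative lag-1 autocorrelation.

```python
# Minimal sketch: sample lag-k autocorrelation of an evenly spaced series.
# Positive values mean observations k steps apart tend to move together.
# Both example series are hypothetical.
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int = 1) -> float:
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Total variation around the mean (the normalizing denominator).
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Co-variation between each value and the value `lag` steps later.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


rising = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, -1, 1, -1, 1, -1]
print(round(autocorrelation(rising), 3))       # 0.625
print(round(autocorrelation(alternating), 3))  # -0.833
```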
The time series data presented on this site typically relate a health outcome, such as thyroid cancer, to time by plotting the health and time data on a graph. Thyroid cancer rates, for example, are increasing over time, whereas myocardial infarction rates are decreasing over time. Arsenic concentrations in drinking water, on the other hand, are generally constant over time. All of these time series can be evaluated to determine the statistical significance of the changes and the magnitude of the rate of change.
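As one simple illustration of evaluating a trend, an ordinary least-squares line can be fitted to yearly rates: the slope gives the magnitude of change per year, and the slope divided by its standard error gives a t statistic. A statistics package can convert that statistic to a p-value using a t distribution with n - 2 degrees of freedom. This is a sketch under stated assumptions: the rates are hypothetical, and it does not represent the specific trend method used for any NM EPHT indicator (tracking programs may use other approaches, such as modeling percent change).

```python
# Minimal sketch: estimate a linear trend (change per year) by ordinary least
# squares and compute the slope's t statistic. The rates are hypothetical.
import math
from typing import Sequence, Tuple


def linear_trend(years: Sequence[float],
                 values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, standard error of slope, t statistic)."""
    n = len(years)
    if n != len(values) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; t statistic is undefined")
    return slope, se, slope / se


years = [2000, 2001, 2002, 2003, 2004, 2005]
rates = [10.0, 10.8, 11.9, 12.4, 13.6, 14.1]  # hypothetical rates per 100,000
slope, se, t = linear_trend(years, rates)
print(round(slope, 2))  # 0.84
```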
Linking Health and Environmental Data
Linkage studies are investigations that connect environmental and health outcome data in time and place within a population. The primary goal of linkage studies is to improve understanding of the relationship between diseases and the environment. A good example of the impact such studies may have is the removal of lead from gasoline beginning in the 1970s, following a series of national linkage studies showing that blood lead levels decreased in direct relation to declining lead use in gasoline. In this instance, a clear cause-effect relationship existed between the environmental hazard and the health effect, and measuring the declining lead content of gasoline was relatively easy.
Unfortunately, for many diseases the cause-effect relationship is not clear, and environmental hazards cannot be measured as easily. Furthermore, many adverse health outcomes may result from exposures to several different hazards, some received in the short term and others over a longer, more protracted period. For these reasons, it is important that linkage be approached in a scientifically rigorous manner. Even then, caution is needed in interpreting results, because the environmental data used in linkage analysis ordinarily are not collected for individuals; rather, they are collected across broad populations, such as a county or a particular region within a state. Consequently, a study that does find a relationship between the level of an environmental hazard and the occurrence of a health outcome cannot be used to conclude that the hazard actually caused the health outcome, since it is not known who in the population actually received exposure at the levels of interest. Instead, the results of such studies can be used to generate hypotheses about causation, which can then be tested in more formal studies that recruit study subjects and collect data at the individual level.
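At its simplest, an area-level linkage joins health and environmental records that share the same place and time, then measures how they vary together. The sketch below joins on a (county, year) key and computes a Pearson correlation; the county names and values are hypothetical, and the correlation choice is just one illustrative measure. As the paragraph above explains, a correlation computed this way describes counties, not individuals, so it can suggest a hypothesis but cannot show that the exposure caused the outcome.

```python
# Minimal sketch of an area-level (ecological) linkage: join health rates and
# environmental measurements on a shared (county, year) key, then compute a
# Pearson correlation. County names and values are hypothetical.
import math
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (county, year)


def link(health: Dict[Key, float],
         env: Dict[Key, float]) -> List[Tuple[Key, float, float]]:
    """Inner join on (county, year); records without a match are dropped."""
    return [(k, health[k], env[k]) for k in sorted(health.keys() & env.keys())]


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


health_rates = {("County A", 2005): 12.0, ("County B", 2005): 15.0,
                ("County C", 2005): 9.0, ("County D", 2005): 20.0}
exposure = {("County A", 2005): 3.1, ("County B", 2005): 3.5,
            ("County C", 2005): 2.2, ("County E", 2005): 1.0}

linked = link(health_rates, exposure)  # County D and County E have no match
r = pearson([h for _, h, _ in linked], [e for _, _, e in linked])
print(len(linked))  # 3
```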
Detailed studies on health and environmental data linkages are presented at NM EPHT Health Effects: Health and Environment Linkage Studies.
Metadata
Metadata are "data about data." There are several types of metadata, and these can be broadly grouped into the categories Descriptive, Structural, and Administrative. (These definitions are taken from the CDC EPHT Metadata Workgroup's FAQ document.)
- Descriptive Metadata: Information that describes the content, quality, and context of a data resource for the purpose of facilitating identification and discovery. It may reference additional information like quality assurance documents and data dictionaries. Through descriptive metadata a user can learn the what, why, when, who, where, and how for a data resource.
- Structural Metadata: Information about how the item is put together or arranged such as the table of contents page, individual page numbers, or illustration. It basically describes the structure of an item, such as a book, so that all of the pages of that item can be displayed in the correct order. In the electronic world it facilitates navigation and presentation of electronic resources.
- Administrative Metadata: Includes information about resolution, bit depth, type of equipment used to produce the file, storage format, and file name and location. It can also include basic facts on ownership, rights, and reproduction information.
The Environmental Public Health Tracking Network makes extensive use of Descriptive Metadata. Metadata are considered the backbone of the EPHT Network.
Metadata Benefits
As more data are created and stored, there is a need to document data resources for future use and to improve accessibility. Creating Descriptive Metadata:
- Helps an organization arrange and maintain its data assets.
- Limits duplication of effort by ensuring that others in the organization are aware of the existence of data resources.
- Assists in both determining and improving the quality of data resources.
- Improves an organization's ability to comply with rules, regulations, and policies related to data access.
- Reduces the loss of institutional memory for data resources when key staff move on.
- Provides information about an organization's data holdings so that users can locate available resources relevant to an area of interest or study.
- Provides the ability to advertise and promote the availability of data resources via online services.
- Supplies the means to document limitations about the data resource or disclaimers that are important for potential users to be aware of.
The Centers for Disease Control and Prevention (CDC), through the efforts of EPHT grantees on the Metadata Workgroup and a contract with Northrop Grumman, developed the EPHT Metadata Creation Tool (MCT). The EPHT MCT generates customized metadata files that comply with Federal Geographic Data Committee (FGDC) standards from information entered into Web forms. A New Mexico version of this tool is provided on the NM EPHT Web server; anyone who wishes to use the MCT to create metadata must be assigned a username and password.
During this initial stage of New Mexico EPHT implementation, access to the MCT will be restricted to NM Tracking Team members, who may request an EPHT MCT username and password from the NM EPHT Webmaster, doh-eheb AT state DOT nm DOT us.
- Metadata Content Guidance Document, Version 1.0 (PDF): The EPHT Metadata Workgroup developed this document to explain metadata, show examples of environmental and health metadata records, and describe the Metadata Creation Tool.
- New Mexico EPHT Metadata Creation Tool (external Web site): Once you have received log-in information (username and password) from the NM EPHT Program, go to the Web site (http://epht-mct.unm.edu/), log in, and create your metadata file. To request an NM EPHT Metadata Creation Tool username and password (New Mexico Tracking Team only), write to doh-eheb AT state DOT nm DOT us.
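To show the general idea of turning Web-form entries into a structured metadata file, the sketch below builds a small XML record using a few element names from the FGDC Content Standard for Digital Geospatial Metadata (CSDGM). It is an illustration only: the form field names and values are hypothetical, the output is far from a complete or validated FGDC-compliant record, and it says nothing about how the EPHT MCT itself is implemented.

```python
# Minimal sketch: convert form-style field values into a small XML record using
# a few FGDC CSDGM element names. NOT a complete or validated FGDC record, and
# not the EPHT MCT's implementation. Form keys and values are hypothetical.
import xml.etree.ElementTree as ET
from typing import Dict


def build_metadata(form: Dict[str, str]) -> ET.Element:
    metadata = ET.Element("metadata")
    # Identification information: citation and description of the data set.
    idinfo = ET.SubElement(metadata, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "origin").text = form["originator"]
    ET.SubElement(citeinfo, "pubdate").text = form["publication_date"]
    ET.SubElement(citeinfo, "title").text = form["title"]
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = form["abstract"]
    ET.SubElement(descript, "purpose").text = form["purpose"]
    # Metadata reference information: which standard the record follows.
    metainfo = ET.SubElement(metadata, "metainfo")
    ET.SubElement(metainfo, "metstdn").text = (
        "FGDC Content Standard for Digital Geospatial Metadata")
    ET.SubElement(metainfo, "metstdv").text = "FGDC-STD-001-1998"
    return metadata


form = {  # hypothetical values, as if typed into a Web form
    "originator": "Example Program",
    "publication_date": "20080101",
    "title": "Example county-level data set",
    "abstract": "Hypothetical abstract text.",
    "purpose": "Hypothetical purpose text.",
}
record = build_metadata(form)
print(ET.tostring(record, encoding="unicode"))
```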
Satellite Data
Satellite data provide spectral information about the Earth's surface, geology, waters, and atmosphere on a regular basis. Researchers and applied scientists use satellite data and derived data products for synoptic, repeated analysis of specific issues and questions. For example, data from the visible red (Red) and near-infrared (NIR) spectral ranges can be combined to calculate an index of vegetation greenness, the Normalized Difference Vegetation Index (NDVI):
NDVI = (NIR - Red) / (NIR + Red)
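The NDVI formula can be applied pixel by pixel to a red band and a near-infrared band. The sketch below assumes the two bands are already co-registered grids of non-negative surface reflectance values (real workflows often start from calibrated reflectance rather than raw sensor values); the pixel values are hypothetical. With non-negative inputs NDVI falls between -1 and 1, with dense green vegetation near the high end and water typically negative.

```python
# Minimal sketch: per-pixel NDVI = (NIR - Red) / (NIR + Red) for two
# co-registered bands stored as nested lists. Pixel values are hypothetical
# surface reflectances; pixels with NIR + Red == 0 are returned as None.
from typing import List, Optional

Grid = List[List[float]]


def ndvi(nir: float, red: float) -> Optional[float]:
    total = nir + red
    if total == 0:
        return None  # undefined (e.g., no-data pixels)
    return (nir - red) / total


def ndvi_grid(nir_band: Grid, red_band: Grid) -> List[List[Optional[float]]]:
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


nir_band = [[0.50, 0.30], [0.02, 0.0]]
red_band = [[0.10, 0.30], [0.05, 0.0]]
result = ndvi_grid(nir_band, red_band)
# Row 1: green vegetation (about 0.67), bare surface (0.0)
# Row 2: water-like pixel (about -0.43), no-data pixel (None)
print(result)
```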
Vegetation greenness provides an indication of water content in plants (as with stages of agricultural crop growth), indicates areas with no vegetation (such as rock or snow cover), and shows changes in land cover and plant health over time (for example, comparing plant cover before a wildfire with the vegetation lost afterward shows the extent of the fire). The land cover classification process, which categorizes the type and extent of land cover, uses the vegetation index as one layer of information, in addition to the spectral satellite data and other identifying data. A land cover classification is in turn useful as a layer of information in many analyses; for example, it provides a way to assign values to plant types for how well they hold soil particles on the ground, and it is used in erosion and dust forecast models.
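The wildfire example can be sketched as simple change detection: subtract the pre-fire NDVI from the post-fire NDVI and flag pixels whose greenness dropped by more than a threshold. This is only an illustration of the idea; the NDVI grids and the 0.2 threshold are hypothetical, and the area estimate assumes 30 m pixels, the resolution of Landsat's multispectral bands.

```python
# Minimal sketch of NDVI change detection: flag pixels whose NDVI dropped by
# more than a threshold between two dates. The grids, the 0.2 threshold, and
# the 30 m pixel size are illustrative assumptions.
from typing import List

Grid = List[List[float]]


def vegetation_loss_mask(pre: Grid, post: Grid, drop: float = 0.2) -> List[List[bool]]:
    """True where NDVI fell by more than `drop` from pre to post."""
    return [[(b - a) < -drop for a, b in zip(pre_row, post_row)]
            for pre_row, post_row in zip(pre, post)]


pre_fire = [[0.60, 0.55], [0.20, 0.70]]   # hypothetical NDVI before the fire
post_fire = [[0.15, 0.50], [0.18, 0.20]]  # hypothetical NDVI after the fire

mask = vegetation_loss_mask(pre_fire, post_fire)
lost_pixels = sum(flag for row in mask for flag in row)
pixel_area_m2 = 30 * 30  # assumes 30 m x 30 m pixels
print(mask)                         # [[True, False], [False, True]]
print(lost_pixels * pixel_area_m2)  # 1800
```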
Much of the satellite data used in the NM EPHT Program comes from National Aeronautics and Space Administration (NASA) sensors. Learn more about NASA Earth programs or browse NASA image galleries (both are external sites). Landsat and MODIS data are commonly used in image analysis; browse the Landsat image gallery and the MODIS image gallery (both are external sites). There are numerous commercial satellites in orbit; the company GeoEye operates IKONOS and GeoEye-1. Browse the GeoEye image gallery (external site). This PDF file, TBudge_remotesensing.pdf, was created from a PowerPoint presentation developed by Tom Budge, Remote Sensing Manager at The University of New Mexico Earth Data Analysis Center. It describes many of the satellite sensors and their image products, the significance of pixel and temporal resolutions, and differences in information between 8-bit and 11-bit data. Note: The PDF file is 13.6 MB and will take a while to download or load into the browser, depending upon your Internet connection.
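The difference between 8-bit and 11-bit data comes down to how many brightness levels each pixel can record: 2^8 = 256 levels versus 2^11 = 2048 levels. The sketch below shows this arithmetic and one simple (assumed, not prescribed) way to rescale 11-bit values to 8 bits by dropping the three lowest bits; it illustrates how eight distinct 11-bit values collapse into a single 8-bit value. Other rescaling approaches, such as contrast stretches, are also used in practice.

```python
# Minimal sketch: number of levels per bit depth, and a simple linear rescale
# from 11-bit to 8-bit by dropping the lowest (source_bits - 8) bits.
# The rescaling choice is an illustrative assumption.


def levels(bits: int) -> int:
    """Number of distinct values a pixel of the given bit depth can hold."""
    return 2 ** bits


def rescale_to_8bit(value: int, source_bits: int = 11) -> int:
    if source_bits < 8:
        raise ValueError("source_bits must be at least 8")
    if not 0 <= value < levels(source_bits):
        raise ValueError("value out of range for the source bit depth")
    # 2048 / 256 = 8, so each 8-bit level covers 8 consecutive 11-bit values.
    return value >> (source_bits - 8)


print(levels(8), levels(11))                                # 256 2048
print([rescale_to_8bit(v) for v in (0, 7, 8, 2047)])        # [0, 0, 1, 255]
```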
The four images below illustrate slides from an EPHT PowerPoint presentation on using NASA satellite data products in the NM EPHT Program.