Report Information Index

What is Cancer Incidence Data?

Cancer incidence is the number of cases of cancer that are diagnosed during a given period of time. Typically the period of time is a calendar year. Incidence should not be confused with prevalence, which is the number of people who have cancer at a particular point in time. For example, suppose a person is diagnosed with cancer in 2003 and dies from it in 2005. That counts as an incidence case only in 2003, but it counts for prevalence at any point between the date of diagnosis and the date of death.
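The distinction can be sketched in a few lines of Python. This is only an illustration: the function names and the exact dates within 2003 and 2005 are assumptions made for the example, not part of any registry system.

```python
from datetime import date

def is_incident_in_year(diagnosis_date, year):
    """A case counts toward incidence only in the year it is diagnosed."""
    return diagnosis_date.year == year

def is_prevalent_on(diagnosis_date, death_date, day):
    """A case counts toward prevalence on any day from diagnosis through death."""
    if day < diagnosis_date:
        return False
    return death_date is None or day <= death_date

# Hypothetical dates matching the example: diagnosed in 2003, died in 2005.
diagnosed = date(2003, 6, 15)
died = date(2005, 3, 1)

incident_years = [y for y in range(2002, 2007) if is_incident_in_year(diagnosed, y)]
print(incident_years)                                      # [2003]
print(is_prevalent_on(diagnosed, died, date(2004, 1, 1)))  # True
print(is_prevalent_on(diagnosed, died, date(2006, 1, 1)))  # False
```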

It's also important to understand that the number of cases diagnosed is not necessarily the same as the number of patients diagnosed. It's possible for a patient to be diagnosed with two different types of cancer. For example, someone might be diagnosed with both prostate cancer and stomach cancer. This counts as two different cases, even though it's the same patient.

In addition, cancer incidence generally excludes cancers that are not invasive at the time of diagnosis. A cancer is not invasive (that is, it is in situ) if it hasn't spread beyond the layer of cells in which it originated. An exception is made for cancer of the urinary bladder because of the difficulty in ascertaining the correct stage, and because patients generally receive the same treatment for in situ and microinvasive tumors. Consequently, cases of in situ bladder cancer have traditionally been included in cancer incidence statistics.
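The counting rules in the two preceding paragraphs can be sketched as follows. The record layout, the site labels, and the sample records are hypothetical and exist only to illustrate the rules; an actual registry stores cases in a far more detailed form.

```python
from collections import namedtuple

# Hypothetical, simplified case record.
Case = namedtuple("Case", ["patient_id", "site", "invasive"])

def counts_toward_incidence(case):
    """Invasive cancers count; in situ cancers count only for the urinary bladder."""
    return case.invasive or case.site == "urinary bladder"

cases = [
    Case("P1", "prostate", True),          # same patient, two different cancers
    Case("P1", "stomach", True),
    Case("P2", "urinary bladder", False),  # in situ bladder cancer: included
    Case("P3", "colon", False),            # in situ, not bladder: excluded
]

counted = [c for c in cases if counts_toward_incidence(c)]
num_cases = len(counted)
num_patients = len({c.patient_id for c in counted})
print(num_cases, num_patients)  # 3 2
```

Three cases are counted even though only two patients are involved, because patient P1 contributes two cases.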

What is Cancer Mortality Data?

Cancer mortality is the number of people who die from cancer during a given period of time. Typically the period of time is a calendar year. A number of factors may contribute to a person's death, but only one is designated as the underlying cause, while the rest are listed as contributing causes. If cancer is a contributing cause in a particular case, but not the underlying cause, that case does not count toward cancer mortality. It counts only when cancer is the underlying cause.

What is Cancer Staging?

Staging is a system of describing the extent or spread of cancer at the time the patient is diagnosed. There are several different staging systems, but the one used here for creating reports is known as "summary staging." The rules for determining the summary stage have changed as knowledge has increased, so the stage assigned depends on the date of diagnosis. Nonetheless, the following categories are used regardless of the date:

What are Rates?

If you want to compare cancer incidence in two different areas, you cannot simply compare the number of cases in the two areas, since one area may have a much larger population than the other. Suppose, for example, you want to compare Brobdingnab County, which has a population of 600,000 people, with Lilliput County, which has only 25,000. Suppose that during a particular year 1,800 cases of cancer are diagnosed in Brobdingnab County while only 100 cases are diagnosed in Lilliput County. Looking only at the case counts, cancer incidence would seem to be worse in Brobdingnab County. But because Brobdingnab County has a much larger population, we would expect it to have many more cases. It would be more useful to know what percentage of the population the cases represent. In this case, the number of cases in Brobdingnab County is 0.3% of its population, while the number of cases in Lilliput County is 0.4% of its population.

Because incidence cases are nearly always less than 1% of the population, rather than using percentages, rates are usually expressed as the number of cases per 100,000 people in the population at risk. Thus rather than saying 0.3%, we would say the rate in Brobdingnab County is 300 per 100,000. Similarly the rate in Lilliput County (0.4%) is 400 per 100,000. Generally the words "per 100,000" are omitted.
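The conversion from case counts to rates per 100,000 can be checked with a short calculation. The function name below is illustrative; the numbers are the ones used for the two example counties.

```python
def rate_per_100k(cases, population):
    """Number of cases per 100,000 people in the population at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

brobdingnab = rate_per_100k(1_800, 600_000)
lilliput = rate_per_100k(100, 25_000)
print(brobdingnab, lilliput)  # 300.0 400.0
```

Although Brobdingnab County has 18 times as many cases, its rate is lower than Lilliput County's.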

Also note that even if the incidence rate is expressed as a percentage (per 100) rather than per 100,000, it's still not the percentage of people who are diagnosed with cancer. It reflects the number of cases diagnosed, not the number of people. Since a person can be diagnosed with more than one type of cancer, a single person can count as more than one case.

The rates calculated in the preceding paragraphs are crude rates. Calculating crude rates compensates for differences in population size. There is another kind of rate called age-adjusted, which compensates not only for differences in population size, but also for differences in the age-distribution of the population.

What is a Crude (or Age-Specific) Rate?

A crude rate is the number of cases diagnosed in a given period of time, usually a calendar year, for every 100,000 people in the population at risk. If we are interested in the rate among people in a particular age group, for example, people in their late seventies, then we refer to it as an age-specific rate. In that case, the population at risk is limited to the people in the age group under consideration. But other than this change in what the phrase "population at risk" refers to, the formula is the same.

What is an Age-Adjusted Rate?

An age-adjusted rate is a rate that has been adjusted to compensate for differences in age distribution in a population. The age distribution of a population is the proportion of people within each age group. For example, suppose the largest industry in Paleolithic County is retirement homes, while in Neolithic County it's amusement parks. If a lot of older people move to Paleolithic County to retire, and a lot of younger people move to Neolithic County to find work, the age distribution in Paleolithic County will be skewed toward elderly people and the age distribution in Neolithic County will be skewed toward young people. Since age-specific cancer rates increase with age, we would expect more cases of cancer to be diagnosed in Paleolithic County than in Neolithic County just because it has more older people.

In order to compare cancer rates in these two counties, the rates need to be adjusted to compensate for these differences in age distribution. This is done by using a formula that calculates the age-specific rate for each age group, multiplies it by the proportion of some standard population that falls in that age group, and then sums these adjusted age-specific rates. As a result, when a population has proportionately more people in an age group than the standard population does, the cancers diagnosed in that age group count for less: if the age-specific rate stayed the same but the age distribution matched the standard population, there would be fewer people in that group and hence fewer cancers. Similarly, when there are proportionately fewer people in an age group, the cancers diagnosed in that age group count for more.

For example, consider the following table showing the populations of the two counties split into four age groups. The columns labeled Cases give the number of cases diagnosed in each age group for each county. The columns labeled Population show the population for each age group in each county. The columns labeled Rate show the age-specific (or crude) rates for each age group. The columns labeled Adjusted show the adjusted values of the age-specific rates that are used to calculate the age-adjusted rate. The column labeled Standard Population shows the percent of each age group within the standard population. For example, if the total standard population is 300 million people, 40% of them--or 120 million--are younger than 25.

          | Neolithic County                           | Paleolithic County                         |
Age Group | Cases | Population | Rate[1] | Adjusted[2] | Cases | Population | Rate[1] | Adjusted[2] | Standard Population
0-24      |    20 |     50,000 |      40 |          16 |     8 |     20,000 |      40 |          16 | 40%
25-49     |    60 |     30,000 |     200 |          50 |    40 |     20,000 |     200 |          50 | 25%
50-74     |    75 |     15,000 |     500 |         100 |   100 |     25,000 |     400 |          80 | 20%
75+       |    60 |      5,000 |   1,200 |         180 |   350 |     35,000 |   1,000 |         150 | 15%
Total     |   215 |    100,000 |     215 |      346[3] |   498 |    100,000 |     498 |      296[3] | 100%

[1] The age-specific rate for an age group is calculated by dividing the number of cases by the population and then multiplying by 100,000.

[2] The adjustment to the age-specific rate is calculated by multiplying the rate by the standard population percent. The value is the number of cases per 100,000 people that would have been diagnosed if the age-specific rate for the age group remained the same but the percent of the county population within the age group were the same as in the standard population.

[3] The age-adjusted rate is calculated by adding up the adjusted age-specific rates given in the same column of the preceding rows.

The overall crude rate for each county is shown in the Total row under Rate. As you can see, the crude rate in Paleolithic County is 498, which is more than twice the crude rate in Neolithic County, which is only 215. But notice that the age-specific rate for each age group in Paleolithic County is either the same as or less than the corresponding rate in Neolithic County. This is reflected in the age-adjusted rate, which is shown in the Total row under Adjusted. The age-adjusted rate for Paleolithic County is 296, which is less than 346, the age-adjusted rate for Neolithic County.
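The arithmetic in the table can also be reproduced with a short script. This is a sketch in Python, not the ISCR Report Generator's code; the function and variable names are illustrative, and the data are the invented figures from the example.

```python
# Standard population proportions for the age groups 0-24, 25-49, 50-74, 75+.
STANDARD_WEIGHTS = [0.40, 0.25, 0.20, 0.15]

# (cases, population) for each age group, in the same order as the weights.
neolithic = [(20, 50_000), (60, 30_000), (75, 15_000), (60, 5_000)]
paleolithic = [(8, 20_000), (40, 20_000), (100, 25_000), (350, 35_000)]

def crude_rate(groups):
    """Total cases per 100,000 people, ignoring age distribution."""
    total_cases = sum(cases for cases, _ in groups)
    total_population = sum(population for _, population in groups)
    return total_cases * 100_000 / total_population

def age_adjusted_rate(groups, weights):
    """Sum of age-specific rates, each weighted by the standard population proportion."""
    return sum(
        weight * (cases * 100_000 / population)
        for (cases, population), weight in zip(groups, weights)
    )

for name, groups in [("Neolithic", neolithic), ("Paleolithic", paleolithic)]:
    print(name, round(crude_rate(groups)), round(age_adjusted_rate(groups, STANDARD_WEIGHTS)))
# Neolithic 215 346
# Paleolithic 498 296
```

The crude rates rank Paleolithic County higher, while the age-adjusted rates reverse the order, which is the point of the example.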

The following table shows what the preceding table would look like if entered into an Excel spreadsheet, except that the actual formulas are shown for the cells where the values are calculated. Any cell whose contents start with an equal sign (=) contains a formula. All other cells contain data values that are not calculated, but entered as is. This makes it easier to see how the age-specific and age-adjusted values are calculated. Notice there is an extra column showing the standard population counts. This shows how the standard population percentages are computed.

  | A     | B     | C          | D             | E           | F     | G          | H             | I           | J           | K
1 | Age   | Neolithic County                                 | Paleolithic County                               | Standard Population
2 | Group | Cases | Population | Rate          | Adjusted    | Cases | Population | Rate          | Adjusted    | Percent     | Count
3 | 0-24  | 20    | 50,000     | =B3/C3*100000 | =D3*J3      | 8     | 20,000     | =F3/G3*100000 | =H3*J3      | =K3/K7      | 120,000,000
4 | 25-49 | 60    | 30,000     | =B4/C4*100000 | =D4*J4      | 40    | 20,000     | =F4/G4*100000 | =H4*J4      | =K4/K7      | 75,000,000
5 | 50-74 | 75    | 15,000     | =B5/C5*100000 | =D5*J5      | 100   | 25,000     | =F5/G5*100000 | =H5*J5      | =K5/K7      | 60,000,000
6 | 75+   | 60    | 5,000      | =B6/C6*100000 | =D6*J6      | 350   | 35,000     | =F6/G6*100000 | =H6*J6      | =K6/K7      | 45,000,000
7 | Total | 215   | 100,000    | =B7/C7*100000 | =SUM(E3:E6) | 498   | 100,000    | =F7/G7*100000 | =SUM(I3:I6) | =SUM(J3:J6) | =SUM(K3:K6)
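The Percent column (J) is derived from the Count column (K) by dividing each count by the total. A minimal Python equivalent of those formulas, using the invented standard population counts from the example and illustrative variable names, might look like this:

```python
# Standard population counts from column K (invented for the example).
standard_counts = {
    "0-24": 120_000_000,
    "25-49": 75_000_000,
    "50-74": 60_000_000,
    "75+": 45_000_000,
}

total_count = sum(standard_counts.values())  # equivalent to =SUM(K3:K6)

# Equivalent to =K3/K7, =K4/K7, and so on.
standard_weights = {group: count / total_count for group, count in standard_counts.items()}

for group, weight in standard_weights.items():
    print(f"{group}: {weight:.0%}")
# 0-24: 40%
# 25-49: 25%
# 50-74: 20%
# 75+: 15%
```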

The ISCR Report Generator uses age groups covering 5-year spans rather than the 25-year spans shown in this example. It also uses the 2000 US Standard Population; the standard population in this example was invented solely for the purposes of illustration.

What is a Confidence Interval and a Standard Error?

Because of random variation, the calculated rate may not be the "true" rate, but it should be in the neighborhood of the true rate. Confidence intervals and standard errors are calculated values that give us an idea of what that neighborhood is. Confidence intervals are perhaps more intuitively understandable. A confidence interval is a range of values around the calculated rate such that there is a given probability that the true rate is in that range. For example, suppose the calculated rate is 400 cases per 100,000. It's possible to calculate two numbers, one below 400 and one above it, such that there's (say) a 95% probability the true rate lies between those two numbers. If, for example, those two numbers turned out to be 397 and 404, we would say we're 95% confident the true rate is between 397 and 404.

Knowing the confidence interval can be important when comparing two rates. For example, if calculated rates for Counties A and B are 400 and 420, respectively, it might seem reasonable to say County B has a higher rate. But if we know the 95% confidence intervals for the two rates are 360-445 and 385-460, respectively, we can see there's a fairly wide overlap between these two intervals. So it could turn out that County B actually has a lower rate.
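The overlap check described above is simple enough to express in code. The helper below is a hypothetical illustration, not part of any ISCR tool, and it only reports whether two intervals share any values; it is not a formal statistical test.

```python
def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share at least one value."""
    a_low, a_high = a
    b_low, b_high = b
    return a_low <= b_high and b_low <= a_high

county_a = (360, 445)  # 95% confidence interval around a rate of 400
county_b = (385, 460)  # 95% confidence interval around a rate of 420

print(intervals_overlap(county_a, county_b))      # True
print(intervals_overlap((397, 404), (410, 420)))  # False
```

When the intervals overlap, as they do for Counties A and B, the calculated rates alone are not enough to say which county has the higher true rate.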

Confidence intervals can be calculated for both crude rates and age-adjusted rates, but the formula for a crude rate confidence interval differs from the formula for an age-adjusted rate confidence interval.

The standard error for a rate is not so easily explained. Like the confidence interval, it's a value that identifies the neighborhood where the true rate lies, but this neighborhood is not as easily visualized.

The standard error can also be calculated for both crude rates and age-adjusted rates, and like confidence intervals, the formula for the standard error for a crude rate differs from the formula for the standard error for an age-adjusted rate.
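This page does not give the formulas themselves. As an illustration only, one widely used approximation for a crude rate treats the number of cases as a Poisson count, which gives a standard error equal to the rate divided by the square root of the number of cases, and an approximate 95% confidence interval of the rate plus or minus 1.96 standard errors. The sketch below assumes that approximation; it is not necessarily the method used by the ISCR Report Generator, and it does not apply to age-adjusted rates.

```python
import math

def crude_rate_with_ci(cases, population, z=1.96):
    """Crude rate per 100,000 with an approximate standard error and confidence interval.

    Assumes the case count is Poisson-distributed and uses a normal
    approximation; z=1.96 corresponds to a 95% interval.
    """
    if cases <= 0 or population <= 0:
        raise ValueError("cases and population must be positive")
    rate = cases * 100_000 / population
    standard_error = rate / math.sqrt(cases)
    interval = (rate - z * standard_error, rate + z * standard_error)
    return rate, standard_error, interval

# Lilliput County from the earlier example: 100 cases among 25,000 people.
rate, se, (low, high) = crude_rate_with_ci(100, 25_000)
print(rate, se, round(low, 1), round(high, 1))  # 400.0 40.0 321.6 478.4
```

With only 100 cases, the interval is wide, which shows why rates based on small case counts should be compared with caution.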