U. S. Department of Education Office of Planning, Evaluation and Policy Development



b. The Effect of Participation in Upward Bound (CACE)


For the average eligible applicant to Upward Bound, the pattern of statistically significant effects of actual participation (the CACE effects) is nearly identical to that for the effects of the opportunity to participate in Upward Bound (the ITT effects). However, the CACE estimates in Table III.2 are generally larger than the ITT estimates in Table III.1, reflecting the fact that approximately 15 percent of treatment group members did not participate in Upward Bound or Upward Bound Math-Science and about 14 percent of control group members did participate in Upward Bound or Upward Bound Math-Science.
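The relationship between the ITT and CACE estimates described above can be sketched with a standard Bloom-style instrumental-variables adjustment: the ITT effect is divided by the difference in participation rates between the treatment and control groups. A minimal sketch in Python, using the approximate participation rates reported in the text and a purely hypothetical ITT value:

```python
# Sketch: Bloom-style adjustment converting an intent-to-treat (ITT)
# estimate into a complier average causal effect (CACE) estimate.
# Participation rates come from the text (about 85 percent of the treatment
# group participated; about 14 percent of the control group crossed over);
# the ITT value of 1.5 percentage points is hypothetical.

def cace_from_itt(itt, p_treat_participate, p_control_participate):
    """Scale an ITT effect by the difference in participation rates
    between the treatment and control groups."""
    compliance_gap = p_treat_participate - p_control_participate
    if compliance_gap <= 0:
        raise ValueError("treatment group must participate at a higher rate")
    return itt / compliance_gap

itt = 1.5  # hypothetical ITT impact, in percentage points
cace = cace_from_itt(itt, p_treat_participate=0.85,
                     p_control_participate=0.14)
print(round(cace, 2))  # 1.5 / 0.71, about 2.11
```

With a compliance gap of roughly 0.71, each CACE estimate is about 1.4 times the corresponding ITT estimate, which is consistent with the CACE estimates in Table III.2 being generally larger than the ITT estimates in Table III.1.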

C. Summary of Sensitivity Analyses


Several important choices were made in designing the evaluation. One choice pertained to the length of the follow-up period. Considering the objective of Upward Bound to prepare students for entry into and success in postsecondary education, the Department of Education specified a long follow-up period that allowed sample members to be observed for many years beyond expected high school graduation. Although response rates to the evaluation’s follow-up surveys remained high, administrative data from the NSC and federal FSA files were obtained to assess and address the potential effects of survey nonresponse. While nonresponse is one potential limitation of survey data, measurement and coverage error are concerns with administrative data.

Measuring postsecondary outcomes in different ways can shed light on how the relative strengths and weaknesses of the data sources affect the findings of the evaluation. Therefore, we have conducted a set of sensitivity analyses to examine alternative ways of combining data from the available sources—surveys, NSC, and FSA—to measure postsecondary enrollment and completion.

Other important design choices pertained to the sample of projects that would be selected. In addition to specifying that the sample had to be nationally representative, the Department of Education required that the sample have substantial overrepresentation of some less common, but key types of projects, including, for example, projects serving predominantly Native American students. Attempting to balance the competing needs of the evaluation, the chosen design had much higher selection probabilities for these relatively rare projects than for more common types of projects. This led to substantial undersampling and underrepresentation of the latter and to very unequal weighting of projects in the evaluation sample.

One implication of the sample design was that some of the most common types of Upward Bound projects had low selection probabilities and were substantially undersampled. This is true of one set of projects in particular—projects that were medium-sized, located in an urban setting, hosted by a four-year public institution, and not serving a group of students that is predominantly Asian, Native American, or Latino. This stratum of projects ends up accounting for about 26 percent of all eligible Upward Bound applicants nationwide. The final sample selected for the impact evaluation included only one project out of 56 projects in this stratum. The main impact analyses weight the sample accordingly, and the sample members from this one project account for approximately 26 percent of the total weight.

Because one project and its students comprise such a large proportion of the weighted sample, two additional sets of analyses were conducted. The first examined whether this one sampled project—labeled Project 69—is an outlier or unusual in any way. The second reduced the relative weight given to Project 69 when estimating impacts.


1. Sensitivity Analyses Pertaining to the Measurement of Outcomes


There are many approaches to combining data from the available sources to measure postsecondary enrollment or completion, and many assumptions that can be made about enrollment or completion status when the data do not provide definitive evidence. In Appendix B, we discuss these issues, and describe many different measures for the postsecondary outcomes examined in this report. We present estimates for the measures in Appendix C.

Across a wide range of approaches to measuring postsecondary enrollment, the basic finding of no detectable impact holds up. For 27 different measures of postsecondary enrollment, the distribution of estimated impacts ranges from –2.4 to 2.8, with a mean of 1.3. None of the estimates are significant. For the impact on attending a four-year institution, the 27 estimates range from 0 to 5.5, with a mean of 1.8, and one estimate is significant.14

There are fewer ways to use the available data for measuring completion than there are for measuring enrollment, because FSA data do not provide any direct information about postsecondary completion. We considered nine different measures of completion (see Appendix B). Estimates for three of the nine indicate that the impact on completing any degree, certificate, or license is significant. The nine estimates range from 0.5 to 13.0 (the second largest is 3.7), with a mean of 3.5. When we examine the estimated impacts on receipt of a bachelor’s degree, we find that estimates for the nine measures range from –1.0 to 4.3, with a mean of 0.6, and one estimate is significant. These estimates and the estimates pertaining to the receipt of certificates or licenses as the highest degree completed suggest that if Upward Bound affects postsecondary completion, it might do so by increasing the likelihood of earning a certificate or license, as suggested previously by Table III.1.

Exploring results obtained with different measures of the outcomes is important because judgments are required to decide which results are most likely to reflect the overall national effects of the program. Although not all approaches are equally good, there are some reasonable alternatives to the methods underlying the main results reported here, and reporting the sensitivity of those findings to alternative approaches can help readers evaluate and interpret the results.


2. Sensitivity Analyses Pertaining to Sample Weighting


The sample design adopted for the evaluation has important consequences for the weighting of sample data. As noted above, the sample selection stratum composed of the most common type of project is represented by a single project, Project 69, and that project had a selection probability much lower than the average selection probability. It also had a large pool of eligible applicants. As a consequence, the students in Project 69 represent 26 percent of eligible applicants nationwide. In the main analyses, their data are weighted accordingly, consistent with the precision of the sample design, to measure the effect of the national Upward Bound program on the average eligible applicant.
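The mechanics behind this unequal weighting can be sketched as inverse-probability weighting at the project level: each sampled project's students carry a weight proportional to the number of applicants divided by the project's selection probability, so one rarely selected project can dominate the weighted sample. All project names, probabilities, and counts below are hypothetical, chosen only to illustrate how a share near 26 percent can arise:

```python
# Sketch: inverse-probability weighting across sampled projects. One
# project drawn from a 56-project stratum has selection probability 1/56,
# so its applicants are weighted up sharply relative to projects with
# higher selection probabilities. All numbers are hypothetical.

# project id -> (selection probability, eligible applicants in the sample)
projects = {"project_69": (1 / 56, 100)}
projects.update({f"project_{i}": (0.25, 200) for i in range(20)})

# Weight per project: applicants divided by selection probability.
weights = {pid: n / p for pid, (p, n) in projects.items()}
total_weight = sum(weights.values())
share_69 = weights["project_69"] / total_weight
print(f"Project 69 share of total weight: {share_69:.0%}")  # about 26%
```

The design choice at issue is visible here: the project sampled at probability 1/56 carries 56 times the weight per applicant of a project sampled with certainty, which is why a single project can account for roughly a quarter of the weighted sample.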

Because Project 69 and its students comprise such a large proportion of the weighted sample, we performed analyses to address two broad questions (see Appendix G for details):



  • Is this project an “outlier,” that is, unusual in some way?

  • Does this project have a large amount of influence on our results?

By the available measures, Project 69 is not an outlier. We find, for example, that it is similar in terms of project-level characteristics to the five projects from the same sample selection stratum that were selected for the grantee survey sample but not the impact study sample. Further analyses find that there are some significant differences between treatment and control groups in Project 69, as there are for other projects. Some such differences at the project level are expected to occur by chance. We adjust for these differences using regression methods, and include in our models covariates measuring student baseline characteristics, as well as interactions that capture the effects of these covariates specific to Project 69.

This examination of baseline differences between the treatment and control groups revealed that, as shown in Appendix G, treatment group members in Project 69 were more likely to have applied to the program in ninth grade, and less likely to have applied in tenth grade, than control group members were. These differences are not statistically significant. Nevertheless, although the evaluation has had a very long follow-up period for observing postsecondary outcomes, the treatment group members in Project 69 had somewhat less time, on average, to begin and complete postsecondary education. To assess the potential effects of this difference, we conducted additional sensitivity analyses. One analysis derived impacts using a regression model that controlled not only for grade at application (as in the main analysis) but also for expected year of high school graduation, including indicators for different years and estimating effects specific to Project 69. With one exception, the impacts obtained are numerically smaller than the impacts in Table III.1, and are not significant.15 For another sensitivity analysis, we constructed a standardized outcome measure—postsecondary enrollment within six years of the year of expected high school graduation. With this standardized measure, the impact on overall postsecondary enrollment is numerically larger than the impact from our main analysis—1.60 compared with 1.54. With p-values of 0.60 and 0.58, respectively, neither impact is significant.16
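The standardized outcome measure can be sketched as a simple recode; the function name and field values below are hypothetical illustrations, not the evaluation's actual variable definitions:

```python
# Sketch of the standardized outcome described above: a student counts as
# enrolled only if their first observed postsecondary enrollment falls
# within six years of their expected year of high school graduation.
# Students with no observed enrollment are coded 0. The function name and
# example years are hypothetical.

def enrolled_within_six_years(expected_hs_grad_year, first_enrollment_year):
    """Return 1 if first enrollment occurs within six years of expected
    high school graduation, else 0."""
    if first_enrollment_year is None:
        return 0  # never observed enrolling
    return int(first_enrollment_year - expected_hs_grad_year <= 6)

print(enrolled_within_six_years(1995, 1999))  # enrolled 4 years later -> 1
print(enrolled_within_six_years(1995, 2003))  # enrolled 8 years later -> 0
print(enrolled_within_six_years(1995, None))  # never enrolled -> 0
```

A measure defined relative to each student's expected graduation year, rather than a fixed calendar date, gives every sample member the same observation window and so removes the slight timing disadvantage of Project 69's treatment group.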

In addition to the analyses of project-level characteristics from the grantee survey and baseline differences between the treatment and control groups, we examined the distribution across projects of average baseline characteristics of sample members, no-show and crossover rates by treatment and control group members, mean outcomes of control group members as of the fifth follow up, and impacts on postsecondary outcomes. These analyses support the finding that Project 69 is not an outlier, although as would be expected for any project, it is sometimes in the lower or upper portion of the distribution and not right at the center.

We also conducted analyses to examine the influence of Project 69 on overall impacts and assess the robustness of the main findings. Detailed results and a detailed description of the analyses can be found in Appendix G.

In one analysis, we determined how much larger Project 69’s impact on each outcome would have to be for the overall impact of Upward Bound to become statistically significant when Project 69 retains its full weight and the standard errors correctly reflect the precision of the sample design. We find that Project 69’s impact would often have to move from the lower end of the distribution of project-level impacts to the upper end in order for the overall impact of Upward Bound to be significant. This implies that Project 69 and the other 55 projects in its selection stratum would have to have had larger impacts, on average, than all of the other Upward Bound projects; short of such a shift, the overall results would remain insignificant.

In contrast to this analysis, most of the sensitivity analyses involved changing weights to reduce the relative weight given to Project 69’s sample members. One such analysis adjusted the weights within each project to weight up to the number of funded slots rather than the number of applicants. This addresses concerns about not only the effects of typical year-to-year fluctuations in the number of applicants, but also whether the implementation of random assignment might have inflated the number of applicants differentially across projects. With this approach, Project 69 accounts for about 15 percent, rather than 26 percent, of the total weight—a much lower but still appropriately large fraction. The estimated impacts are generally somewhat bigger than those obtained in our main analyses. The pattern of significance levels is essentially the same. (See the last column of Table III.3.)
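The slot-based reweighting described above can be sketched as a within-project rescaling; the function name and all counts below are hypothetical:

```python
# Sketch: rescale each project's sample weights so they sum to the
# project's number of funded slots rather than its number of applicants.
# A project with an unusually large applicant pool relative to its slots
# (like Project 69) loses relative weight. All numbers are hypothetical.

def rescale_to_slots(project_weights, funded_slots):
    """Scale each project's weights so they total its funded slots."""
    rescaled = {}
    for pid, weights in project_weights.items():
        factor = funded_slots[pid] / sum(weights)
        rescaled[pid] = [w * factor for w in weights]
    return rescaled

# project_69: 100 sample members, each weighted 56 (large applicant pool);
# project_a: 150 sample members, each weighted 2.
project_weights = {"project_69": [56.0] * 100, "project_a": [2.0] * 150}
funded_slots = {"project_69": 60, "project_a": 50}

rescaled = rescale_to_slots(project_weights, funded_slots)
old_share = sum(project_weights["project_69"]) / sum(
    sum(ws) for ws in project_weights.values())
new_share = sum(rescaled["project_69"]) / sum(
    sum(ws) for ws in rescaled.values())
print(f"Project 69 share: {old_share:.0%} before, {new_share:.0%} after")
```

Because funded slots vary much less across projects than applicant pools do, this rescaling compresses the spread of project weights while still keeping every project in the sample, which is why Project 69 retains a sizable but reduced share.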

Most of the other analyses that reduce the relative weight given to Project 69’s sample members changed weights even more substantially. In these analyses, we examined impact estimates obtained by: combining project sampling strata in various ways; redistributing much of Project 69’s weight to various sets of projects that were most similar to Project 69 on a wide range of project- and student-level characteristics; and redistributing much of the weight of each Project 69 sample member to sample members in other projects with similar individual characteristics. We also ran unweighted analyses, and derived weighted estimates without Project 69. Results from a few of these analyses are shown in Table III.3. To facilitate comparisons of results, the first column of Table III.3 repeats the estimates from the main analysis, which were presented in Table III.1 above.

Many of these sensitivity analyses that changed sample weights substantially produced larger impacts for most outcomes compared with the findings from the main impact analysis, which weighted all sample members according to their actual selection probabilities. Many of the impacts from the analyses with large changes in weights are also significant. This suggests that the results are sensitive to such large changes in the weight of Project 69.

Because Project 69 had below average impacts for most outcomes, reducing its weight relative to other projects results in larger overall impacts. Reducing the weight of Project 69 also underestimates the standard errors associated with the impact estimates. With larger impact estimates and reduced standard errors, many impact estimates become statistically significant when the sample weight for Project 69 is substantially reduced. When the standard errors more accurately reflect the precision of the sample design, many of these impact estimates are not statistically significant. Furthermore, as shown in Appendix G, they become smaller and fewer are significant when other projects with relatively large weights are dropped from the analysis along with Project 69. This illustrates an important consideration—the potential for influencing the findings through post hoc adjustments that deviate from the chosen design.

Another important consideration in interpreting results from analyses that omit Project 69 or otherwise change the weights of projects in any substantial way is that the resulting sample no longer represents the actual universe of Upward Bound projects. In particular, the sample does not appropriately represent the most common stratum of Upward Bound projects. Thus, with the possible exception of the analysis that adjusts weights to the number of funded slots, such analyses do not answer the evaluation’s research questions about the impacts of the national Upward Bound program. Moreover, the estimates from such analyses do not generalize to urban projects, large projects, or any other well-defined subset of projects for which the findings might have policy implications.

In contrast, the findings from the main impact analyses, which include all projects weighted based on their selection probabilities, are intended to generalize to the national Upward Bound program. In assessing the implications of those findings, however, one statistical consideration is worth noting. Because a single project was selected from a large stratum (the stratum represented by Project 69), the estimates and inferences for that stratum, and therefore for the universe of projects, will generally be less robust than those that would be obtained under an alternative design with much less variable project selection probabilities and with several projects selected from the large stratum. The lower robustness of the chosen sample design, together with the results from the extensive sensitivity analyses, can be taken into account in determining the implications of the main findings.


