Category Archives: Research

Building a county-level labor force and unemployment panel

If you have ever looked for a way to get county-level unemployment rates for each county over time, there are probably a few ways to go about that task. Below is one way to do it in Stata.

This copies each file from 2000 through 2018, saves and cleans it to your local machine, and then stacks them all together in a panel format. I haven’t fully spot-checked, so please let me know if you see bugs.

Disclaimer: I have never figured out the whole Git Hub thing. I suspect if you’re reading this page there’s a good chance you haven’t either. So please don’t proselytize, just use this as a resource. WordPress doesn’t allow uploading .do files, but here’s what’s you could do:

// set local directory
cd C:\Users\BLS_LAUS

// copy data (2000 to 2018) from BLS website
local years 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
foreach y of local years {
import excel https://www.bls.gov/lau/laucnty`y’.xlsx
keep if substr(A, 1, 2) == “CN”
rename A laus_code
rename B state_fips
rename C county_fips_3
rename D county_name
rename E year
drop F
rename G labor_force
rename H employed
rename I unemployed
rename J unemp_rate
replace labor_force = “” if labor_force==”N.A.”
replace employed = “” if employed == “N.A.”
replace unemployed = “” if unemployed==”N.A.”
replace unemp_rate = “” if unemp_rate == “N.A.”
destring year-unemp_rate, replace
generate str county_fips = state_fips + county_fips_3
save county_20`y’.dta, replace
clear
}

// append each year starting with 2000
use county_2000.dta
forvalues y=2001/2018 {
append using county_`y’.dta
}
save county_panel_2000_2018.dta, replace

// table of total labor force, employed, and unemployed
table year, c(sum labor_force sum employed sum unemployed) f(%20.0fc)

State support for public higher education – trends and data

Here is a quick visualization of the relationship between state higher education appropriations and net tuition revenue per full-time equivalent student from fiscal year 1980 to 2018.

The y-axis represents inflation-adjusted dollars per FTE based on SHEEO’s State Higher Education Finance data (here). The x-axis represents fiscal years; the solid black line is net tuition; the dotted line represents appropriations.

Please use, copy, and improve upon the Stata code pasted below to replicate these charts. This type of chart is a bit overwhelming, probably too much to squeeze into a small space, but it helps me see trends across multiple states. For example, I am squinting at the steady downward appropriation trends in Wisconsin that seems to have flattened out or even reversed direction in other Midwestern states over the past several years.

Thanks to the SHEEO team for making this data so easy to use. Stata code below:

// import data from SHEEO SHEF page
import excel https://sheeo.org/wp-content/uploads/2019/04/SHEEO_SHEF_FY18_Nominal_Data.xlsx, cellrange(A17:Q2016)

// trim data to focus on public appropriations, net tuition, and fte
keep A B E L N Q
rename A state
rename B fy
rename E heca_index
rename L ed_approp // state and local support less RAM
rename N net_tuition // net tuition less federal and state grants
rename Q fte // fte net medical students

// adjust for inflation using HECA
gen approp_adj = ed_approp / (heca_index/1)
gen tuit_adj = net_tuition / (heca_index/1)

// scale per fte
gen approp_fte = approp_adj/fte
gen tuit_fte = tuit_adj/fte

// table over time
table fy if state~=”US”, c(sum approp_adj sum tuit_adj sum fte) format(%15.0fc)
drop if inlist(state,”US”,”Washington DC”)

// lattice charts
* rescaled y axis
xtline tuit_fte approp_fte, i(state) t(fy) scheme(s1mono) byopts(rows(5) rescale note(“”)) xlabel(1980[10]2020,angle(vertical)) xtitle(“”) ymtick(none) ylabel(, nogrid angle(horizontal)) legend(size(tiny) region(lwidth(none)) label(1 “Tuition”) label(2 “Appropriations”) symxsize(6))
* fixed y axis
xtline tuit_fte approp_fte, i(state) t(fy) scheme(s1mono) byopts(rows(5) note(“”)) xlabel(1980[10]2020,angle(vertical)) xtitle(“”) ymtick(none) ylabel(0[5000]25000,nogrid angle(horizontal)) legend(size(tiny) region(lwidth(none)) label(1 “Tuition”) label(2 “Appropriations”) symxsize(6))

Using difference-in-differences in higher education research

Difference-in-differences is gaining popularity in higher education policy research and for good reason. Under certain conditions, it can help us evaluate the effectiveness of policy changes.

The basic idea is that two groups were following similar trend lines for a period of time. But eventually, one group gets exposed to a new policy (the treatment) while the other group does not (the comparison). This change essentially splits time in two, where the treatment group’s exposure to the policy puts them on a new trend line. Had the policy never been adopted, the two groups would have continued on similar paths.

I made a data file and show the steps I took to conduct this analysis in Stata.

Step 1: Generate treatment variable

Let’s say College A is exposed to the policy change (treatment) and College B is not (comparison). We just need a simple dummy variable to categorize these two groups. Let’s call it “treat,” where College A gets a value of 1 and College B gets a value of 0.

Step 2: Generate “post” policy variable

In the previous step, we didn’t say when the policy change occurred, so we need to do that now. Let’s say it began in 2015, meaning all years from that point forward are “post-policy” while those prior are “pre-policy.”

Step 3: Examine trends for the two groups

Now that we’ve identified the treatment/control and the pre/post periods, we can put it all together in a simple graph. I like to use the user-written “lgraph” command (use “ssc install lgraph, replace” to get it).

We see here the two groups were following similar trends prior to the 2015 policy change, and then the treatment group started to increase at a higher rate while the comparison group did not.

Step 4: Difference-in-differences means table

The visual inspection looks like there’s probably a policy effect, but it’s hard to tell the magnitude. To get at that, we need to measure the difference in groups means before and after the policy:

Below is a simple table calculating the difference between the two groups before (-10) and after (5) the policy. It then calculates the difference before and after within each group (20 and 35, respectively).

	Pre	Post	Difference
Comparison	517.5	537.5	20
Treatment	507.5	542.5	35
Difference	-10	5	15

The comparison group increased by 20 units after the policy, while the treatment increased by 35. Similarly, the treatment group was 10 units lower than the comparison group prior to the policy, but was ahead by 5 units after. When we calculate the difference in the group differences, we get 15 (e.g., 35 minus 20, or 5 minus -10).

Step 5: Difference-in-differences regression

We can run a regression on the data using the two variables created in Steps 1 and 2. The only trick is we need to interact those two variables (treat x post) to get our difference-in-differences estimate.

Here, the “treat” dummy measures the treatment group’s pre-policy difference from the comparison group. And “post” is the comparison groups post-policy change. The interaction between the two variables (“treat x post”) is our average treatment effect of 15, the same number we saw in the previous step. The intercept is the comparison group’s pre-policy mean.

In a regression framework, we can easily add covariates, use multiple comparison groups for robustness checks, and address issues that may arise with respect to standard errors. Doing so can help rule out plausible alternative explanations to the findings, assuming other important considerations are also met.

My goal with this post was to break down the difference-in-differences approach to help make it a little more accessible and less intimidating to researchers/policy analysts. I am still leaning a lot about the technique, so please consider these steps some illustrative tips to get oriented/introduced.

Stata replication code:

// generate treatment variable (College A is the treated unit, College B is comparison)
gen treat = 1 if id==1
recode treat (.=0)
lab def treat_lab 0 “comparison” 1 “treatment”
lab val treat treat_lab
tabstat y, by(treat) stat(n mean min max sd)

// generate post-treatment period (2015 is start date)
gen post = 1 if year>=2015
recode post (.=0)
lab def post_lab 0 “pre” 1 “post”
lab val post post_lab
tabstat y, by(post) stat(n mean min max sd)

// descriptives of the two groups
table year treat, c(mean y)
ssc install lgraph, replace
lgraph y year, by(treat) stat(mean) xline(2015) ylab(, nogrid) scheme(s2mono)

// did means table
table treat post, c(mean y)

// did regression
xtset id year
xtreg y i.treat i.post i.treat#i.post

State higher education funding – motion charts

Each year, State Higher Education Executive Officers (SHEEO) produces the go-to guide for state-level trends in higher education finance, downloadable and interactive data here.

Using this data, I created the following motion chart where each bubble represents a state. Pressing the play button shows how appropriations (x-axis) and net tuition revenue (y-axis) change over time.

You can select individual states and change the bubble color/size. All financial data are presented in 2017 dollars based on CPI and the “tuition-to-appropriations” ratio is simply that: net tuition relative to net appropriations, where values over 1 indicate the state generates more revenue from tuition than it appropriates. For more details on the data, see their report.

I hope you find this tool helpful! I like using it as a teaching tool, so please feel free to use and share. I sometimes run into trouble depending on the browser I’m using, so just a heads-up on that front. Unfortunately, Google retired Motion Charts, but I was able to update this with the most recent data so hopefully it has some shelf life.

Please let me know if you see any bugs or if you find interesting patterns!

High school FAFSA filing update

In the first three months of the current (2018-19) FAFSA filing cycle, 1.275 million high school seniors completed the form. This is a 6% increase from last year, as shown in the chart below.

We monitored these weekly trends last year and found about 2.1 million high school seniors filed by the end of the school year. So within just the first three months of the new cycle, we are over half-way to last year’s filing levels.

These trends hold pretty steady across the states, where all but five (MT, OR, VT, WV, WY) are outpacing last year’s figures each week.

Several states are making really strong progress on high school FAFSA completions over last year’s cycle. Arizona, Kansas, Louisiana, Mississippi, and Washington, DC, stand out as having some of the highest growth rates over last year.

We will continue monitoring these trends, especially in early spring when some state aid programs have filing deadlines.

Below is an excel file behind these charts, listing the weekly number of completions for the two cycles, by each state.

State high school FAFSA completions (.xlsx): state_hs_fafsa_week_13

Enrollment trends in UW Colleges

The UW Colleges are a unique feature of the public higher education landscape. They are classified as “Associate’s: High-Transfer, High-Traditional” colleges, meaning they largely serve traditional-aged students (coming directly out of high school) with a goal of improving transfer from Associate’s to Bachelor’s degree programs.

Nationwide, there are about 160 of these colleges* and they enroll about 1.5 million undergraduates. Since the end of the Great Recession, enrollments have dropped and today’s enrollments are higher than the pre-Recession levels.^

Colleges located in Midwestern states are experiencing similar enrollment declines since 2010, and the following chart shows the four largest Midwestern Higher Education Compact states’ total enrollments for their “high-transfer, high-traditional” Associate’s degree granting colleges:

Using the UW System’s Accountability Dashboard, we can see how UW Colleges’ enrollment levels have changed relative to four-year universities in the system (this chart excludes UW-Madison and UW-Milwaukee). Approximately 12,000 students enrolled during the fall of 2016-17.**

UW Colleges enrolled approximately 7 percent of the system’s total undergraduates in 2016-17 and this share has hovered between 6 and 8 percent over time.

And a growing share of UW Colleges’ enrollments are from students who identify as Black, Hispanic, or Native American. Since 2010, the percentage of students classified into these three racial/ethnic groups has doubled within the UW Colleges.

As conversations about UW Colleges continue, it is also worth stating these two-year institutions serve different missions than the Wisconsin Technical College System (WTCS). To oversimplify the difference, technical colleges are focused more on vocational training while UW Colleges focuses more on transfer. Nevertheless, the final chart uses IPEDS enrollment data to summarize how enrollments in the technical colleges, university system, and UW Colleges have changed over time.

Notes:

* A “college” could include multiple campuses. For example, UW Colleges is reported as a single institution in IPEDS, but accounts for 13 campuses and an online presence.

^ In all charts, we would get slightly different enrollment trends when using 12-month headcounts from IPEDS, but I used fall enrollment (undergraduate total, degree-seeking and non-degree-seeking).

** Note the axis scale on the previous chart makes the recent enrollment drop look less steep and the prior (IPEDS) data only includes 2015 enrollment, not 2016.

Waffle chart in Excel

Below are two ways to display the same data.

The first is a trusty old pie chart.

The second is its cooler pastry cousin, the waffle chart.

These data are from IPEDS, just a quick count of public college undergraduates by their college’s selectivity.

We can see in both charts the majority enroll in open-access colleges. But the waffle is easier on the eye, I think.

Turns out, the choice between pies and waffles can be contentious.

If you’re like me, you probably want to use a waffle chart to display your data sometime.

You can make one of these in R using the “waffle” command…if you are familiar with that program. I, sheepishly, am not.

For those of us still in the old school, I made this waffle chart in Excel. Please feel free to use in your own work, but keep in mind this will only work for up to four slices of pie.

Download this Excel file: waffle template

Step one: Enter up to four category names in cells B3:B6

Step two: Enter percentages (high to low) in cells C3:C6. The template is just filling these with random numbers as a placeholder.

Step three: Choose which chart you like better, the square waffle or the rectangle waffle. Both charts link to the same data.

Note: You’ll see two other tabs in this file, “chart 1” and “chart 2,” which are the underlying formulas driving the chart. Just pretend they’re not there.

I created and modified this based on a helpful guide found here. If you end up using this file, please let me know if you detect any bugs. I think I’ve worked them out, but don’t hesitate to contact me if you see any or have suggestions.

Why we need comparison groups in PBF research

Tennessee began implementing performance-based funding in 2011 as part of its statewide effort to improve college completions. The figure below uses IPEDS completion data plotting the average number of bachelor’s degrees produced by each of the state’s 9 universities over time.

One could evaluate the policy by comparing outcomes in the pre-treatment years against those in the post-treatment years. Doing so would result in a statistically significant “impact,” where 320 more students graduated after the policy was in effect.

	Pre	Post	Difference
Tennessee	1940	2260	320

This Interrupted Times Series approach was recently used in a report concluding that 380 more students (Table 24) graduated because of the policy. My IPEDS estimate and the one produced by the evaluation firm use different data, but are in the same ballpark.

Anyway, simply showing a slope changed after a certain point in time is not strong enough evidence to make causal claims. In very limited conditions can one make causal claims with this approach. But this is uncommon, making interrupted time series a pretty uncommon technique to see in policy research.

A better approach would add a comparison group (a la Comparative Interrupted Time Series or its extensions). If one were to do that, then they would compare trends in Tennessee to other universities in the U.S. The graph below does that just for illustrative purposes:

By adding a comparison group, we can see that the gains experienced in Tennessee were pretty much on par with trends occurring elsewhere in the U.S.:

	Pre	Post	Difference
Tennessee	1940	2260	320
Other states	1612	1850	238
Difference	328	411	83

The difference-in-differences estimate, which is a more accurate measure of the policy’s impact, is 83. And if we run all this through a regression model, we can see if 83 is a significant difference between these groups.

It is not:

Using this more appropriate design would likely yield smaller impacts than those reported in the recent evaluation. And these small impacts likely wouldn’t be distinguishable from zero.

I wanted to share this brief back of the envelope illustration for two reasons. First, I am working on a project examining Tennessee’s PBF policy and the only “impact” we are seeing is in the community college sector (more certificates). We are not finding the same results in associate’s degree or bachelor’s degree production. Second, it gives me an opportunity to explain why research design matters in policy analysis. I don’t pretend to be a methodologist or economist; I am an applied education researcher trying my best to keep up with social science standards. Hopefully this quick post illustrates why that’s important.

June 30 FAFSA report

Below is a summary of high school FAFSA filing up to June 30 for the current and prior filing cycles.

June 30, 2016: 1,949,067
June 30, 2017: 2,128,524

This is a 9% boost in filing, or 179,457 more filers than last year!

You can download the raw data here or below.

We haven’t yet taken a close look at which high schools have shown the most growth, and I won’t pretend to know what these schools did to boost completions, but below is a quick look at the Top 20 in terms of largest raw number increase in FAFSAs. Way to go, Northside High School in Houston, TX, which saw the biggest jump in completions – going from 25 to 457!

	State	School Name	City	June 30 completions (16-17)	June 30 completions (17-18)	Change	Percent Change
1	TX	NORTHSIDE HIGH SCHOOL	HOUSTON	25	457	432	1728%
2	PA	PENN FOSTER HS	SCRANTON	911	1213	302	33%
3	FL	CYPRESS CREEK HIGH	ORLANDO	295	499	204	69%
4	IL	LINCOLN-WAY EAST HIGH SCHOOL	FRANKFORT	327	529	202	62%
5	FL	TIMBER CREEK HIGH	ORLANDO	396	572	176	44%
6	TX	ALLEN H S	ALLEN	674	842	168	25%
7	NC	ROLESVILLE HIGH	ROLESVILLE	95	258	163	172%
8	FL	WILLIAM R BOONE HIGH	ORLANDO	304	467	163	54%
9	CA	WARREN HIGH	DOWNEY	556	710	154	28%
10	CA	RANCHO VERDE HIGH	MORENO VALLEY	465	614	149	32%
11	IL	JONES COLLEGE PREP HIGH SCHOOL	CHICAGO	226	374	148	65%
12	FL	OLYMPIA HIGH	ORLANDO	318	460	142	45%
13	FL	FREEDOM HIGH	ORLANDO	374	515	141	38%
14	NY	NEW UTRECHT HIGH SCHOOL	BROOKLYN	372	511	139	37%
15	NY	BRENTWOOD HIGH SCHOOL	BRENTWOOD	491	630	139	28%
16	TX	THE WOODLANDS H S	THE WOODLANDS	419	555	136	32%
17	PA	PHILADELPHIA PERFORMING ARTS CS	PHILADELPHIA	7	142	135	1929%
18	TX	LOS FRESNOS H S	LOS FRESNOS	341	476	135	40%
19	UT	COPPER HILLS HIGH	WEST JORDAN	256	390	134	52%
20	CA	VALENCIA HIGH	VALENCIA	318	449	131	41%

And in Wisconsin, here’s a list of the Top 20 schools in terms of raw growth in completions:

	State	School Name	City	June 30 completions (16-17)	June 30 completions (17-18)	Change	Percent Change
1	WI	KING INTERNATIONAL	MILWAUKEE	201	278	77	38%
2	WI	BADGER HIGH	LAKE GENEVA	148	217	69	47%
3	WI	OCONOMOWOC HIGH	OCONOMOWOC	194	262	68	35%
4	WI	SUN PRAIRIE HIGH	SUN PRAIRIE	238	306	68	29%
5	WI	CASE HIGH	RACINE	160	227	67	42%
6	WI	EAST HIGH	APPLETON	170	235	65	38%
7	WI	REAGAN COLLEGE PREPARATORY HIGH	MILWAUKEE	206	269	63	31%
8	WI	RIVERSIDE HIGH	MILWAUKEE	183	243	60	33%
9	WI	MIDDLETON HIGH	MIDDLETON	239	296	57	24%
10	WI	DE PERE HIGH	DE PERE	180	235	55	31%
11	WI	BAY PORT HIGH	GREEN BAY	232	281	49	21%
12	WI	FRANKLIN HIGH	FRANKLIN	229	277	48	21%
13	WI	NORTH HIGH	WAUKESHA	119	166	47	39%
14	WI	HAMILTON HIGH	MILWAUKEE	128	174	46	36%
15	WI	EAST HIGH	MADISON	173	218	45	26%
16	WI	CENTRAL HIGH	SALEM	134	178	44	33%
17	WI	CENTRAL HIGH	LA CROSSE	128	171	43	34%
18	WI	MUSKEGO HIGH	MUSKEGO	220	263	43	20%
19	WI	WAUNAKEE HIGH	WAUNAKEE	154	193	39	25%
20	WI	WEST HIGH	WAUKESHA	149	188	39	26%

We will continue to analyze this data and plan to merge with other data sources to gain a better understanding of the variation that exists in filing rates.

We want to be sure to make this data available along the way, so please feel free to download and use the following high school and state-level data comparing the two cycles: FAFSA completions to June 30.xlsx

I wish I had time and resources to make this data more user-friendly and to share more widely. But until then, hopefully this good old fashioned Excel file is of use!

More College Scorecard code for Stata

This post provides a new way to import and manage College Scorecard data in Stata.

Alan Riley, Stata’s VP for software development, created the following code that creates the same panel as described in my previous post.

But this one gets the job done in about 15 minutes!

Alan reached out after reading my previous post and seeing some of our Twitter conversation. He didn’t like hearing how long it took me (and others) to run this. So, Alan took it as a challenge to figure out a better way.

And that he did!

I asked if I could share this code here on my blog and he eagerly agreed. He offered one comment for users:

Just one caveat:

I feel that I just made some quick tweaks to the code, and there are probably not only more optimizations that could be found, but it could also be made more robust to potential problems with more use of Stata’s -confirm- and -assert- commands in various places to ensure everything is as expected.

Nick Cox (yes, the same Nick Cox from the Stata Forum) also reached out with some suggestions on the -destring- command I was previously using. I was destringing a lot of unnecessary items, which Alan fixed and greatly speeds up the process. Kevin Fosnacht offered all the labeling code in the previous post. And Daniel Klasik noted some quirks between Mac and PC users, where the “unitid” variable may create an error for Mac users. Thanks, guys!

And here is a .do file where I copied and pasted Alan’s original code (updated 11/14/2017): https://uwmadison.box.com/s/wo6dtfp141x103cen3uq2zr8s09av610

Nick Hillman

Associate Professor, UW-Madison