Minister’s call on evaluation: Let’s see the evidence

Posted on 05 Mar 2024

By Matthew Schulz, journalist, SmartyGrants

As the anniversary of the federal government’s push to increase its commitment to good evaluation nears, the minister in charge of its implementation has defended a focus on “flagship evaluation”.

The Australian Centre for Evaluation (ACE) was formed with $10 million in funding over four years in the 2023 federal budget and now sits within the Treasury department.

At the time, the Assistant Minister for Competition, Charities and Treasury, Dr Andrew Leigh, said the purpose of the ACE was to “determine what programs work, and for whom”.

He said the work would involve using randomised controlled trials (RCTs) and “complementary” evaluation techniques to ensure taxpayers’ money was spent wisely.

While many grantmakers and those in the social sector will be familiar with a variety of evaluation methodologies, the minister has unapologetically kept his eye firmly on the higher-cost and more rigorous forms of evaluation.

Dr Leigh, a former professor of economics at ANU, has a close interest in the promise of randomised controlled trials, having written a book, Randomistas, on the subject.

Randomised controlled trials (RCTs) are a method of scientifically testing the effectiveness of an intervention and are often used in medical research, education and social sciences.

While time-consuming and resource-intensive, RCTs are widely considered to be the “gold standard” of evaluation, especially for clinical and behavioural research, partly because their emphasis on avoiding bias helps establish a strong connection between cause and effect.

According to a Deakin University summary of the method:

Researchers choose a sample of the population to participate in the trial
Participants are randomly allocated to either the experimental group or the control group to avoid bias and reduce the impact of variables outside the researchers’ control
Researchers control how participants are exposed to the intervention. In contrast, in “observational” studies, researchers do not have this level of control.
Only the experimental group is exposed to the intervention, while the control group is not. Researchers then compare the outcomes in the two groups.
Randomised controlled trials are often structured so that participants (in blind RCTs) and in some cases the researchers (in double-blind RCTs) do not know which participants are in which groups, to again reduce the potential for bias to influence a trial.
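For readers who want to see the logic in miniature, the steps above can be sketched as a small simulation. This is an illustrative sketch only, not anything used by the ACE or described by Deakin University: the function name, the outcome scale, the sample size and the assumed "true effect" are all hypothetical, and the data is randomly generated rather than real.

```python
import random
import statistics


def run_simulated_rct(n_participants=1000, true_effect=2.0, seed=42):
    """Simulate a simple two-arm randomised trial on made-up data.

    All numbers here are hypothetical and exist only to illustrate
    the allocation and comparison steps.
    """
    rng = random.Random(seed)

    # Step 1: choose a sample. Each participant has a baseline outcome
    # that the researchers do not control.
    baselines = [rng.gauss(50.0, 10.0) for _ in range(n_participants)]

    # Step 2: randomly allocate half the sample to the experimental group.
    indices = list(range(n_participants))
    rng.shuffle(indices)
    experimental_ids = set(indices[: n_participants // 2])

    # Steps 3 and 4: only the experimental group receives the intervention.
    experimental_outcomes = []
    control_outcomes = []
    for i, baseline in enumerate(baselines):
        noise = rng.gauss(0.0, 1.0)
        if i in experimental_ids:
            experimental_outcomes.append(baseline + true_effect + noise)
        else:
            control_outcomes.append(baseline + noise)

    # Compare average outcomes in the two groups.
    estimated_effect = (
        statistics.mean(experimental_outcomes)
        - statistics.mean(control_outcomes)
    )
    return estimated_effect, len(experimental_outcomes), len(control_outcomes)


if __name__ == "__main__":
    effect, n_exp, n_ctrl = run_simulated_rct()
    print(f"Experimental group: {n_exp}, control group: {n_ctrl}")
    print(f"Estimated effect of the intervention: {effect:.2f}")
```

Because allocation is random, differences in baseline outcomes tend to balance out across the two groups as the sample grows, so the gap between the group averages approximates the effect of the intervention itself.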

Recently, Dr Leigh spruiked the benefits of RCTs in an opinion piece for the Canberra Times, describing a randomised controlled trial that tested the effectiveness of sending text messages to patients encouraging them not to skip hospital appointments and explaining the consequences if they did.

By sending messages telling patients about the impact on the hospital budget and on other patients if they didn’t show up, the hospital slashed no-shows by 19%.

Dr Leigh said the ACE had entered into a partnership with the Employment and Workplace Relations department to conduct a series of trials to inform a revamp of that department.

The five RCTs will examine the effect of modifications to online employment services, including communication, support, and client tools.

The government was also exploring partnerships with other agencies, Dr Leigh said.

Asked what the ACE could practically achieve with its limited budget, Dr Leigh said it would be leading by example.

“We're focusing on flagship evaluations, aiming not only to talk about evaluation, but to do evaluation. Too much of the work in the past has been PowerPoints, rather than actually running evaluations, and we know that it's important to show that high quality, rigorous evaluations can be done within a modest budget in the Australian context.”

Quizzed about the focus on randomised controlled trials at the expense of other methodologies, Dr Leigh said he was focused on lifting standards.

“We're aiming to raise the quality of evaluation, and so we're unapologetic about looking to get more randomised trials done. There's a good reason why you can't get a drug on the pharmaceutical register without having a double-blind randomised trial, because it's regarded as the best way of getting a clear comparison group.

“What you’re trying to do in any evaluation is to have a credible counterfactual, and in a before-and-after study, that's assuming that the people would have been the same after as they were before. If you're doing a pure pilot [study], then sometimes you're not even thinking about a credible counterfactual, so randomisation is a powerful tool.”

While he accepted that not every evaluation could be randomised, “we’ll be looking to use it where we can”.

Asked whether grantmakers and other evaluators should be looking away from the kinds of measurement tools they were currently using, Dr Leigh suggested practitioners should at least consider RCTs.

“What we need is a mix of methodologies, but I think it's always worth asking the question, can we do a randomised trial, and if not, why not?

“There will certainly be contexts in which that isn't feasible, and in which we'll look at other evaluation techniques. But randomised trials have been vanishingly rare in the Australian context, which contrasts with other advanced countries and particularly developing countries, where they've been extremely common.”

He said that the Perry Preschool Project, for instance, an RCT conducted in the 1960s, was still a frequently cited intervention, despite being more than 60 years old.

Dr Leigh was noncommittal about adopting the SmartyGrants recommendation that 10% of a grant should be spent on evaluation, suggesting there were ways to reduce costs.

“Sometimes you’ll want to have a benchmark, but sometimes it's about trying to look at economical ways of carrying out the evaluation. One of the things we're really focused on is making sure we get better access to administrative data and then we can use that for good evaluation. If you don't have to run a survey afterwards, then the cost of the evaluation comes way down. And we've seen examples in which you're able to do rigorous evaluations for very modest costs.

“Just to give you one example, when I was writing a book a while back on inequality, my publisher said it should be called Fair Enough. My Mum said it should be called Battlers and Billionaires. We ran a set of Google ads randomly varying the title. Within a few hours and for less than $100, we had definitive evidence that my Mum's title was more popular than my publisher’s title. These sorts of evaluations are very easily done and don't need to cost a bomb.”
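A low-cost test like the one Dr Leigh describes can be analysed with very little code. The sketch below is an assumption-laden illustration, not his actual analysis: the article does not say how the ad results were compared, the two-proportion z-test is one common choice rather than the method he used, and the click and view counts are invented for the example.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare click-through rates for two randomly shown ad variants.

    Returns both rates, the z statistic (positive when variant B has
    the higher rate) and a two-sided p-value.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b

    # Pooled rate under the assumption that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / views_a + 1 / views_b)
    )

    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical counts, invented purely for illustration.
    rate_a, rate_b, z, p = two_proportion_z_test(
        clicks_a=20, views_a=1000,  # title A
        clicks_b=50, views_b=1000,  # title B
    )
    print(f"Title A click rate: {rate_a:.1%}")
    print(f"Title B click rate: {rate_b:.1%}")
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in click rates is unlikely to be down to chance alone, which is what makes a quick randomised comparison more persuasive than a hunch about which title is better.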

Dr Leigh agreed that training was critical to enable organisations, funders and bureaucrats to raise the standard of evaluation, but he said the experiments themselves would help that process.

“We want to make sure that the training is up to scratch. By building a base of evaluations, we're going to have good examples that people can look to in their own areas.

“This is commonplace within a business context. It's been said that every pixel on the Amazon homepage has had to justify its existence through a randomised trial, and that every time you use Google, you're participating in multiple randomised experiments. These firms have oodles of data, and yet they're still doing randomised trials, because they want to sort out the causal effects of their programs.

“So if that's good enough for some of the savviest companies in the world, it's a technique that we need to be looking at in government and also in the not-for-profit sector.”

Dr Leigh avoided buying into suggestions that evaluations would be sacrificed for political expediency and by those seeking positive headlines or leaning toward their favoured projects.

“I'm confident that the members of the Albanese government want to make a difference,” he responded.

He stressed that he was a member of a government that realised that “ensuring we're having a positive impact comes down to being able to measure it. ‘What gets measured gets managed’ as the old saying goes.”

“If we're able to rigorously measure causal impacts, we can do a better job of working out what works.”

He said there were many examples overseas of promising-sounding policies that had had little positive impact.

“I gave an example in a talk last year of 10 randomised trials of employment programs in the United States, of which only one had a positive effect. That program had a big positive effect and needs to be scaled up, but I don't think anyone could have guessed at the outset which of the 10 would work, and that nine out of the 10 wouldn't.”

Dr Leigh also implied that good evaluation would stop governments from spending in the wrong areas, even amid political turmoil. He said health spending had shown this to be true.

“In the health context [there] is an environment in which there is a strong commitment to evidence, and towards finding out what works. So when covid hit, we didn't immediately say, well, what does our gut tell us? Instead, we said, ‘let's have a look at treatments coming out of the lab. That looks promising. Let's run them through randomised trials. And then let's scale up the one that has the biggest impact.’

Image caption: Randomised controlled trials are often associated with medical research.

“That produced surprising results. A UQ [University of Queensland] vaccine that looked promising turned out not to work as intended, but the Pfizer and Moderna treatments worked very well. Hydroxychloroquine, which looked good in low-quality evaluations, turned out not to be effective when subjected to a randomised trial, and all of that process was done within a highly political and contested environment because the medical system has a strong focus on measurements and in determining what works.”

“Ultimately, I'm an Enlightenment guy. I believe that evidence doesn't always win, but in the long run it tends to prevail over bluster and bluff.”

Speaking at the Social Impact Measurement Network Australia awards last year, Dr Leigh said measuring social impact was “a critical component to achieving strong outcomes for Australian communities.”

“The Australian government is placing a greater focus on outcomes, partnerships, innovation and investment to bring about significant change in the way we deliver services and achieve outcomes for Australians who are experiencing disadvantage,” he told attendees.

“We're moving away from a top-down approach and embedding social impact and partnerships with communities at the heart of our investment decisions.”

Treasury last year released its Measuring What Matters framework, aimed at tracking five wellbeing themes and at guiding government policy.

SmartyGrants maintains a close interest in evaluation, especially of grants, and the platform's pioneering Outcomes Engine has already been adopted by more than 50 Australian grantmakers.

SmartyGrants recently lodged a detailed submission to the Department of Social Services’ community sector issues paper, calling on government to:

adopt clear definitions of terms such as “output” and “outcomes”
invest 10% of grant funds in the evaluation of results
treat grant acquittals “like gold” for the power of their insights.

SmartyGrants chief impact officer Jen Riley said SmartyGrants and its Outcomes Engine were “deliberately agnostic when it comes to evaluation” and were flexible enough to accommodate a range of evaluation approaches, including RCT reporting, results-based accountability and a range of realist and theory-based approaches.

More information

Australian Centre for Evaluation | Commonwealth evaluation policy

SmartyGrants’ view: Government measurement moves ‘a good first step’ | Improvements are needed to government grant processes

More resources: Best practice evaluation