What Is Driving NDIS Spend Growth?

ndis

health-policy

modelling

australia

A reproducible first pass at explaining NDIS spend growth using public participant, payment, budget, utilisation and regional data.

Author

Aydin

Published

May 26, 2026

The National Disability Insurance Scheme is usually discussed through one big number: total spend. That is not wrong, but it hides several different mechanisms. Spend can grow because more people enter the scheme, because average plan budgets rise, because participants use more of their approved supports, or because the participant mix changes toward groups with higher support needs.

This post uses the short public quarterly window to tell a descriptive spend-growth story:

How quickly is observed NDIS spend growing?
Is scheme participation growing faster than the Australian population?
Is spend pressure coming more from participant growth, average budgets, or utilisation?
Which disability groups appear largest, most expensive, and fastest growing?
What public explanatory factors help describe grouped quarterly spend?

The models are descriptive, not funding advice and not a participant-level forecast. I deliberately avoid lagged payment and lagged budget features in the overall spend model: those improve fit, but they mostly tell us that last quarter predicts this quarter. The aim here is to describe observable drivers.

Data Sources

The workflow downloads source files from the NDIS participant datasets, NDIS provider datasets, and NDIS payments datasets. It focuses on plan budgets, utilisation, providers, market concentration, payments, and LGA mapping files. It also uses ABS estimated resident population from National, state and territory population, then tries to add regional context from SEIFA 2021 and population age structure. If the ABS files change shape, the NDIS-only analysis still runs and reports the missing context.

Code

clean_names <- function(x) {
  x |>
    str_to_lower() |>
    str_replace_all("&", "and") |>
    str_replace_all("[^a-z0-9]+", "_") |>
    str_replace_all("^_|_$", "")
}

clean_df <- function(x) {
  names(x) <- clean_names(names(x))
  x
}

as_number <- function(x) {
  if (is.numeric(x)) return(x)
  x_chr <- as.character(x)
  out <- parse_number(x_chr, na = c("", "NA", "n/a", "Not applicable", "No data available"))
  ifelse(str_detect(x_chr, "^\\s*<"), NA_real_, out)
}

as_rate <- function(x) {
  out <- as_number(x)
  ifelse(out > 1.5, out / 100, out)
}

safe_sum <- function(x) {
  x <- x[is.finite(x)]
  if (length(x) == 0) return(NA_real_)
  sum(x)
}

safe_mean <- function(x) {
  x <- x[is.finite(x)]
  if (length(x) == 0) return(NA_real_)
  mean(x)
}

safe_median <- function(x) {
  x <- x[is.finite(x)]
  if (length(x) == 0) return(NA_real_)
  median(x)
}

safe_mode_character <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & x != "" & x != "ALL"]
  if (length(x) == 0) return(NA_character_)
  names(sort(table(x), decreasing = TRUE))[[1]]
}

collapse_top_values <- function(x, n = 4) {
  x <- as.character(x)
  x <- unique(x[!is.na(x) & x != "" & x != "ALL"])
  if (length(x) == 0) return(NA_character_)
  paste(head(sort(x), n), collapse = ", ")
}

safe_weighted_mean <- function(x, w) {
  x <- as.numeric(x)
  w <- as.numeric(w)
  keep <- is.finite(x) & is.finite(w) & w > 0
  if (any(keep)) return(weighted.mean(x[keep], w[keep]))
  safe_mean(x)
}

has_enough_data <- function(data, cols, min_rows = 2) {
  if (nrow(data) < min_rows) return(FALSE)
  all(map_lgl(cols, \(col) col %in% names(data) && sum(is.finite(data[[col]])) >= min_rows))
}

format_optional_number <- function(x, accuracy = 0.1, suffix = "") {
  ifelse(is.na(x) | !is.finite(x), "Not joined", paste0(number(x, accuracy = accuracy), suffix))
}

format_optional_percent <- function(x, accuracy = 0.1) {
  ifelse(is.na(x) | !is.finite(x), "Not joined", percent(x, accuracy = accuracy))
}

index_to_first <- function(x) {
  bases <- x[is.finite(x) & x > 0]
  base <- if (length(bases) > 0) bases[[1]] else NA_real_
  if (length(base) == 0 || is.na(base)) return(rep(NA_real_, length(x)))
  x / base - 1
}

pretty_feature <- function(x) {
  recode(
    x,
    quarter_index = "time",
    log_participants = "participant count",
    log_providers = "active providers",
    log_participants_per_provider = "participants per active provider",
    payment_share_top10 = "top-10 provider payment share",
    irsd_score = "SEIFA disadvantage score",
    irsd_decile = "SEIFA disadvantage decile",
    ier_score = "economic resources score",
    ier_decile = "economic resources decile",
    ieo_score = "education/occupation score",
    ieo_decile = "education/occupation decile",
    median_age = "median local age",
    pct_0_14 = "local population aged 0-14",
    pct_15_64 = "local population aged 15-64",
    pct_65_plus = "local population aged 65+",
    log_lga_population = "local population",
    remoteness_rank = "remoteness",
    remoteness_major_cities_share = "major-cities share",
    remoteness_inner_regional_share = "inner-regional share",
    remoteness_outer_regional_share = "outer-regional share",
    remoteness_remote_share = "remote share",
    remoteness_very_remote_share = "very-remote share",
    log_budget = "average support budget",
    state = "state",
    service_district = "service district",
    age_group = "age group",
    disability_group = "disability group",
    support_class = "support class",
    remoteness = "remoteness",
    .default = str_replace_all(x, "_", " ")
  )
}

top_rf_predictors <- function(model, n = 5) {
  importance(model) |>
    as.data.frame() |>
    tibble::rownames_to_column("feature") |>
    arrange(desc(IncNodePurity)) |>
    slice_head(n = n) |>
    pull(feature) |>
    pretty_feature() |>
    paste(collapse = ", ")
}

first_col <- function(data, patterns, required = TRUE) {
  hits <- names(data)[map_lgl(names(data), \(nm) any(str_detect(nm, regex(paste(patterns, collapse = "|"), ignore_case = TRUE))))]
  if (length(hits) == 0) {
    if (required) stop("Could not find column matching: ", paste(patterns, collapse = ", "))
    return(NA_character_)
  }
  hits[[1]]
}

sample_rows <- function(data, n = 8000) {
  data |> slice_sample(n = min(nrow(data), n))
}

normalise_service_district <- function(x) {
  x |>
    as.character() |>
    str_replace("^[A-Z]{2,3}~", "") |>
    str_squish()
}

read_page <- function(url) {
  paste(readLines(url, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

extract_links <- function(url) {
  html <- read_page(url)
  base_url <- str_replace(url, "^(https?://[^/]+).*$", "\\1")
  matches <- gregexpr("<a[^>]+href=[\"']([^\"']+)[\"'][^>]*>(.*?)</a>", html, perl = TRUE)
  pieces <- regmatches(html, matches)[[1]]
  if (length(pieces) == 0) return(tibble(label = character(), url = character()))
  tibble(raw = pieces) |>
    mutate(
      href = str_match(raw, "href=[\"']([^\"']+)[\"']")[, 2],
      label = raw |>
        str_replace_all("<[^>]+>", " ") |>
        str_squish(),
      url = case_when(
        str_detect(href, "^https?://") ~ href,
        str_detect(href, "^/") ~ paste0(base_url, href),
        TRUE ~ paste0(str_remove(url, "/[^/]*$"), "/", href)
      )
    ) |>
    select(label, url)
}

quarter_from_label <- function(label) {
  month <- str_match(label, "(March|June|September|December)")[, 2]
  year <- str_match(label, "(20[0-9]{2})")[, 2]
  tibble(quarter_label = paste(month, year)) |>
    left_join(analysis_quarters, by = "quarter_label")
}

download_one <- function(url, filename) {
  path <- file.path(raw_dir, filename)
  if (!file.exists(path) || (str_detect(filename, "\\.xlsx?$") && !is_xlsx_file(path))) {
    download.file(url, path, mode = "wb", quiet = TRUE)
  }
  path
}

download_quarterly <- function(page_url, label_regex, prefix) {
  links <- extract_links(page_url) |>
    filter(str_detect(label, regex(label_regex, ignore_case = TRUE))) |>
    mutate(row_id = row_number()) |>
    bind_cols(quarter_from_label(extract_links(page_url) |>
      filter(str_detect(label, regex(label_regex, ignore_case = TRUE))) |>
      pull(label))) |>
    select(-row_id) |>
    filter(!is.na(quarter_date), quarter_label %in% analysis_quarters$quarter_label) |>
    distinct(quarter_label, .keep_all = TRUE) |>
    arrange(quarter_date)

  missing <- setdiff(analysis_quarters$quarter_label, links$quarter_label)
  if (length(missing) > 0) {
    message("Missing visible links for ", prefix, ": ", paste(missing, collapse = ", "))
  }

  links |>
    mutate(
      path = map2_chr(url, quarter_label, \(u, q) {
        download_one(u, paste0(prefix, "_", str_replace_all(str_to_lower(q), " ", "_"), ".csv"))
      })
    )
}

is_xlsx_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 4)
  identical(as.integer(sig[1:2]), as.integer(charToRaw("PK")))
}

read_public_table <- function(path) {
  out <- if (is_xlsx_file(path) || str_detect(path, regex("\\.xls$|\\.xlsx$", ignore_case = TRUE))) {
    read_excel(path) |> clean_df()
  } else {
    read_csv(path, show_col_types = FALSE) |> clean_df()
  }
  out |> mutate(across(everything(), as.character))
}

read_quarterly_csvs <- function(links) {
  links |>
    mutate(data = map(path, read_public_table)) |>
    select(quarter_label, quarter_date, quarter_index, data) |>
    unnest(data)
}

download_named <- function(page_url, label_regex, filename) {
  link <- extract_links(page_url) |>
    filter(str_detect(label, regex(label_regex, ignore_case = TRUE))) |>
    slice(1)
  if (nrow(link) == 0) return(NA_character_)
  download_one(link$url[[1]], filename)
}

extension_from_url <- function(url, default = ".csv") {
  ext <- str_match(str_to_lower(url), "\\.(csv|xlsx|xls|zip)(\\?|$)")[, 2]
  ifelse(is.na(ext), default, paste0(".", ext))
}

download_matching <- function(page_url, label_regex, prefix, limit = Inf) {
  links <- extract_links(page_url) |>
    filter(str_detect(label, regex(label_regex, ignore_case = TRUE))) |>
    distinct(label, url) |>
    slice_head(n = limit)

  if (nrow(links) == 0) {
    return(tibble(label = character(), url = character(), path = character()))
  }

  links |>
    mutate(
      path = pmap_chr(list(url, label, row_number()), \(u, lab, i) {
        safe_label <- lab |>
          str_to_lower() |>
          str_replace_all("[^a-z0-9]+", "_") |>
          str_replace_all("^_|_$", "") |>
          str_trunc(70, ellipsis = "")
        download_one(u, paste0(prefix, "_", str_pad(i, 2, pad = "0"), "_", safe_label, extension_from_url(u)))
      })
    )
}

Code

budget_links <- download_quarterly(
  page_urls$participant,
  "^Participant numbers and plan budgets data",
  "participant_plan_budgets"
)

utilisation_links <- download_quarterly(
  page_urls$participant,
  "^Utilisation of Plan budgets data",
  "plan_utilisation"
)

provider_links <- download_quarterly(
  page_urls$provider,
  "^Active providers data",
  "active_providers"
)

budget_raw <- read_quarterly_csvs(budget_links)
utilisation_raw <- read_quarterly_csvs(utilisation_links)
provider_raw <- read_quarterly_csvs(provider_links)

lga_path <- download_named(page_urls$participant, "^Participants by LGA data", "participants_by_lga.csv")
service_lga_path <- download_named(page_urls$participant, "^Service District to LGA 2021 data", "service_district_to_lga_2021.csv")
market_path <- download_named(page_urls$provider, "^Market Concentration data", "market_concentration.csv")
payments_links <- download_quarterly(page_urls$payments, "^Payments data", "payments")
payments_raw <- read_quarterly_csvs(payments_links)

baseline_outcomes_path <- download_named(page_urls$participant, "^Baseline Outcomes data", "baseline_outcomes.csv")
longitudinal_outcomes_path <- download_named(page_urls$participant, "^Longitudinal Outcomes data", "longitudinal_outcomes.csv")
regional_outcome_links <- download_matching(page_urls$outcomes, "by Service District Accessible", "regional_outcomes_service_district")

Code

standardise_budget <- function(data) {
  state <- first_col(data, c("^state", "state_territory"), required = FALSE)
  if (is.na(state)) state <- first_col(data, c("state_cd", "statecd"), required = FALSE)
  service <- first_col(data, c("service_district", "srvc_dstrct_nm", "srvcdstrctnm"), required = FALSE)
  age <- first_col(data, c("^age", "age_group", "age_band", "age_bnd", "agebnd"), required = FALSE)
  disability <- first_col(data, c("disability", "dsblty_grp_nm", "dsbltygrpnm"), required = FALSE)
  support <- first_col(data, c("support_class", "support_category", "supp_class", "suppclass", "support"), required = FALSE)
  participants <- first_col(data, c("active.*participant", "actv_prtcpnt", "actvprtcpnt", "number.*participant", "participant.*count", "participants"), required = FALSE)
  budget <- first_col(data, c("average.*support", "average.*budget", "average.*annual", "avg.*annual", "avg_anlsd_cmtd_supp_bdgt", "avganlsdcmtdsuppbdgt", "committed"), required = TRUE)

  data |>
    transmute(
      quarter_label, quarter_date, quarter_index,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      service_district = if (!is.na(service)) normalise_service_district(.data[[service]]) else NA_character_,
      age_group = if (!is.na(age)) .data[[age]] else NA_character_,
      disability_group = if (!is.na(disability)) .data[[disability]] else NA_character_,
      support_class = if (!is.na(support)) .data[[support]] else NA_character_,
      participant_count = if (!is.na(participants)) as_number(.data[[participants]]) else NA_real_,
      avg_support_budget = as_number(.data[[budget]])
    ) |>
    filter(!is.na(avg_support_budget), state != "ALL", service_district != "ALL", age_group != "ALL", disability_group != "ALL", support_class != "ALL")
}

standardise_national_participants <- function(data) {
  state <- first_col(data, c("^state", "state_territory"), required = FALSE)
  if (is.na(state)) state <- first_col(data, c("state_cd", "statecd"), required = FALSE)
  service <- first_col(data, c("service_district", "srvc_dstrct_nm", "srvcdstrctnm"), required = FALSE)
  age <- first_col(data, c("^age", "age_group", "age_band", "age_bnd", "agebnd"), required = FALSE)
  disability <- first_col(data, c("disability", "dsblty_grp_nm", "dsbltygrpnm"), required = FALSE)
  support <- first_col(data, c("support_class", "support_category", "supp_class", "suppclass", "support"), required = FALSE)
  participants <- first_col(data, c("active.*participant", "actv_prtcpnt", "actvprtcpnt", "number.*participant", "participant.*count", "participants"), required = FALSE)

  if (is.na(participants)) {
    return(tibble(
      quarter_label = character(), quarter_date = as.Date(character()),
      quarter_index = numeric(), ndis_participants = numeric(),
      participant_basis = character()
    ))
  }

  data |>
    transmute(
      quarter_label, quarter_date, quarter_index,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      service_district = if (!is.na(service)) normalise_service_district(.data[[service]]) else NA_character_,
      age_group = if (!is.na(age)) .data[[age]] else NA_character_,
      disability_group = if (!is.na(disability)) .data[[disability]] else NA_character_,
      support_class = if (!is.na(support)) .data[[support]] else NA_character_,
      participant_count = as_number(.data[[participants]])
    ) |>
    filter(is.finite(participant_count), participant_count > 0) |>
    mutate(
      detail_count = rowSums(across(
        c(state, service_district, age_group, disability_group, support_class),
        \(x) !is.na(x) & x != "ALL"
      ))
    ) |>
    group_by(quarter_label, quarter_date, quarter_index) |>
    filter(detail_count == min(detail_count, na.rm = TRUE)) |>
    summarise(
      ndis_participants = sum(participant_count, na.rm = TRUE),
      detail_count = min(detail_count, na.rm = TRUE),
      participant_basis = if_else(detail_count == 0, "National aggregate row", "Least-detailed rows summed"),
      .groups = "drop"
    )
}

standardise_utilisation <- function(data) {
  state <- first_col(data, c("^state", "state_territory"), required = FALSE)
  if (is.na(state)) state <- first_col(data, c("state_cd", "statecd"), required = FALSE)
  service <- first_col(data, c("service_district", "srvc_dstrct_nm", "srvcdstrctnm"), required = FALSE)
  age <- first_col(data, c("^age", "age_group", "age_band", "age_bnd", "agebnd"), required = FALSE)
  disability <- first_col(data, c("disability", "dsblty_grp_nm", "dsbltygrpnm"), required = FALSE)
  support <- first_col(data, c("support_class", "support_category", "supp_class", "suppclass", "support"), required = FALSE)
  sil_sda <- first_col(data, c("sil_or_sda", "silorsda", "sil.*sda"), required = FALSE)
  utilisation <- first_col(data, c("utilisation", "utilization", "utlstn"), required = TRUE)

  data |>
    transmute(
      quarter_label, quarter_date, quarter_index,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      service_district = if (!is.na(service)) normalise_service_district(.data[[service]]) else NA_character_,
      age_group = if (!is.na(age)) .data[[age]] else NA_character_,
      disability_group = if (!is.na(disability)) .data[[disability]] else NA_character_,
      support_class = if (!is.na(support)) .data[[support]] else NA_character_,
      sil_or_sda = if (!is.na(sil_sda)) .data[[sil_sda]] else NA_character_,
      utilisation_rate = as_rate(.data[[utilisation]])
    ) |>
    filter(!is.na(utilisation_rate), state != "ALL", service_district != "ALL", age_group != "ALL", disability_group != "ALL", support_class != "ALL")
}

standardise_provider <- function(data) {
  state <- first_col(data, c("^state", "state_territory"), required = FALSE)
  if (is.na(state)) state <- first_col(data, c("state_cd", "statecd"), required = FALSE)
  age <- first_col(data, c("^age", "age_group", "age_band", "age_bnd", "agebnd"), required = FALSE)
  disability <- first_col(data, c("disability", "dsblty_grp_nm", "dsbltygrpnm"), required = FALSE)
  support <- first_col(data, c("support_class", "support_category", "supp_class", "suppclass", "support"), required = FALSE)
  providers <- first_col(data, c("active.*provider", "prvdr_cnt", "prvdrcnt", "number.*provider", "provider.*count", "providers"), required = TRUE)

  data |>
    transmute(
      quarter_label, quarter_date, quarter_index,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      age_group = if (!is.na(age)) .data[[age]] else NA_character_,
      disability_group = if (!is.na(disability)) .data[[disability]] else NA_character_,
      support_class = if (!is.na(support)) .data[[support]] else NA_character_,
      active_providers = as_number(.data[[providers]])
    ) |>
    filter(state != "ALL", age_group != "ALL", disability_group != "ALL", support_class != "ALL") |>
    group_by(quarter_label, quarter_date, quarter_index, state, age_group, disability_group, support_class) |>
    summarise(active_providers = max(active_providers, na.rm = TRUE), .groups = "drop")
}

standardise_payments <- function(data) {
  support_class <- first_col(data, c("^supp_class", "^suppclass"), required = FALSE)
  support_category <- first_col(data, c("supp_cat_nm", "suppcatnm", "support_category"), required = FALSE)
  support_item_number <- first_col(data, c("supp_item_nmbr", "suppitemnmbr", "item_number"), required = FALSE)
  support_item_desc <- first_col(data, c("supp_item_desc", "suppitemdesc", "item_desc"), required = FALSE)
  state <- first_col(data, c("rsds_in_state_cd", "rsdsinstatecd", "^state"), required = FALSE)
  service <- first_col(data, c("rsds_in_srvc_dstrct_nm", "rsdsinsrvcdstrctnm", "service_district"), required = FALSE)
  disability <- first_col(data, c("ndis_dsblty_grp_nm", "ndisdsbltygrpnm", "disability"), required = FALSE)
  age <- first_col(data, c("ndia_age_bnd", "ndiaagebnd", "age"), required = FALSE)
  payments <- first_col(data, c("pmt_amt", "pmtamt", "payment"), required = TRUE)
  participants <- first_col(data, c("count_of_participants", "countofparticipants", "participant"), required = FALSE)

  data |>
    transmute(
      quarter_label, quarter_date, quarter_index,
      support_class = if (!is.na(support_class)) .data[[support_class]] else NA_character_,
      support_category = if (!is.na(support_category)) .data[[support_category]] else NA_character_,
      support_item_number = if (!is.na(support_item_number)) .data[[support_item_number]] else NA_character_,
      support_item_desc = if (!is.na(support_item_desc)) .data[[support_item_desc]] else NA_character_,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      service_district = if (!is.na(service)) normalise_service_district(.data[[service]]) else NA_character_,
      disability_group = if (!is.na(disability)) .data[[disability]] else NA_character_,
      age_group = if (!is.na(age)) .data[[age]] else NA_character_,
      payment_amount = as_number(.data[[payments]]),
      payment_participants = if (!is.na(participants)) as_number(.data[[participants]]) else NA_real_
    ) |>
    mutate(payment_per_participant = payment_amount / payment_participants)
}

standardise_market <- function(path) {
  if (is.na(path) || !file.exists(path)) {
    return(tibble(
      market_quarter_date = as.Date(character()),
      state = character(),
      service_district = character(),
      support_class = character(),
      payment_share_top10 = numeric(),
      payment_band = character()
    ))
  }

  data <- read_public_table(path)
  report <- first_col(data, c("rprt_dt", "rprtdt"), required = FALSE)
  state <- first_col(data, c("^state", "state_cd", "statecd"), required = FALSE)
  service <- first_col(data, c("service_district", "srvc_dstrct_nm", "srvcdstrctnm"), required = FALSE)
  support <- first_col(data, c("support_class", "supp_class", "suppclass"), required = FALSE)
  share <- first_col(data, c("pymnt_share_of_top10", "pymntshareoftop10", "share.*top10"), required = FALSE)
  band <- first_col(data, c("pymnt_bnd", "pymntbnd", "payment.*band"), required = FALSE)

  data |>
    transmute(
      market_report = if (!is.na(report)) .data[[report]] else NA_character_,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      service_district = if (!is.na(service)) normalise_service_district(.data[[service]]) else NA_character_,
      support_class = if (!is.na(support)) .data[[support]] else NA_character_,
      payment_share_top10 = if (!is.na(share)) as_rate(.data[[share]]) else NA_real_,
      payment_band = if (!is.na(band)) .data[[band]] else NA_character_
    ) |>
    mutate(
      market_quarter_date = dmy(str_replace_all(market_report, "([0-9]{2})([A-Z]{3})([0-9]{4})", "\\1-\\2-\\3"), quiet = TRUE)
    ) |>
    filter(state != "ALL", service_district != "ALL", support_class != "ALL")
}

read_optional_table <- function(path) {
  if (is.na(path) || !file.exists(path)) return(tibble())
  tryCatch(read_public_table(path), error = \(e) tibble())
}

standardise_outcomes <- function(path, source_label) {
  data <- read_optional_table(path)
  if (nrow(data) == 0) {
    return(tibble(
      source = character(), state = character(), respondent_group = character(),
      domain = character(), indicator = character(), outcome_value = numeric(), sample_n = numeric()
    ))
  }

  state <- first_col(data, c("^state", "state_territory", "state_cd", "statecd"), required = FALSE)
  respondent <- first_col(data, c("questionnaire", "respondent", "cohort", "participant", "family", "carer"), required = FALSE)
  domain <- first_col(data, c("domain", "area", "outcome_area"), required = FALSE)
  indicator <- first_col(data, c("indicator_description", "indicator.*description", "indicator", "measure", "question", "outcome"), required = FALSE)
  value <- first_col(data, c("percent", "percentage", "proportion", "value", "result", "rate"), required = FALSE)
  sample <- first_col(data, c("sample", "denominator", "respondents", "^n$"), required = FALSE)

  if (is.na(value)) {
    numeric_candidates <- names(data)[map_lgl(data, \(x) any(!is.na(as_number(x))))]
    value <- numeric_candidates[[min(length(numeric_candidates), 1)]]
  }

  data |>
    transmute(
      source = source_label,
      state = if (!is.na(state)) .data[[state]] else NA_character_,
      respondent_group = if (!is.na(respondent)) .data[[respondent]] else NA_character_,
      domain = if (!is.na(domain)) .data[[domain]] else NA_character_,
      indicator = if (!is.na(indicator)) .data[[indicator]] else NA_character_,
      outcome_value = if (!is.na(value)) as_rate(.data[[value]]) else NA_real_,
      sample_n = if (!is.na(sample)) as_number(.data[[sample]]) else NA_real_
    ) |>
    mutate(
      domain = case_when(
        !is.na(domain) ~ domain,
        str_detect(str_to_lower(indicator), "choice|control|decide|say|want") ~ "Choice and control",
        str_detect(str_to_lower(indicator), "health|wellbeing|well-being") ~ "Health and wellbeing",
        str_detect(str_to_lower(indicator), "work|job|employment|learning|school|education") ~ "Work and learning",
        str_detect(str_to_lower(indicator), "community|social|relationship|friend|family") ~ "Community and relationships",
        str_detect(str_to_lower(indicator), "daily|self-care|home|living|independent") ~ "Daily living",
        TRUE ~ "Other outcomes"
      )
    ) |>
    filter(!is.na(outcome_value), outcome_value >= 0, outcome_value <= 1)
}

read_excel_all_sheets <- function(path) {
  tryCatch({
    sheets <- excel_sheets(path)
    map_dfr(sheets, \(sheet) {
      read_excel(path, sheet = sheet) |>
        clean_df() |>
        mutate(across(everything(), as.character), sheet = sheet, .before = 1)
    })
  }, error = \(e) tibble())
}

standardise_regional_outcomes_file <- function(path, label) {
  data <- read_excel_all_sheets(path)
  if (nrow(data) == 0) {
    return(tibble(
      source_file = character(), state = character(), service_district = character(),
      respondent_group = character(), domain = character(), indicator = character(),
      outcome_value = numeric(), sample_n = numeric()
    ))
  }

  service <- first_col(data, c("service_district", "srvc_dstrct", "district"), required = FALSE)
  state <- first_col(data, c("^state", "state_territory", "state_cd"), required = FALSE)
  respondent <- first_col(data, c("questionnaire", "respondent", "participant", "family", "carer"), required = FALSE)
  domain <- first_col(data, c("domain", "area", "outcome_area"), required = FALSE)
  indicator <- first_col(data, c("indicator", "measure", "question", "outcome"), required = FALSE)
  sample <- first_col(data, c("sample", "denominator", "respondents", "^n$"), required = FALSE)

  value_cols <- names(data)[map_lgl(data, \(x) any(!is.na(as_number(x))))] |>
    setdiff(c(sample))

  if (is.na(service) || length(value_cols) == 0) {
    return(tibble(
      source_file = character(), state = character(), service_district = character(),
      respondent_group = character(), domain = character(), indicator = character(),
      outcome_value = numeric(), sample_n = numeric()
    ))
  }

  data |>
    transmute(
      source_file = label,
      state = if (!is.na(state)) .data[[state]] else str_match(label, "Dashboard ([A-Z]{2,3}) by")[, 2],
      service_district = normalise_service_district(.data[[service]]),
      respondent_group = if (!is.na(respondent)) .data[[respondent]] else sheet,
      domain = if (!is.na(domain)) .data[[domain]] else sheet,
      indicator = if (!is.na(indicator)) .data[[indicator]] else NA_character_,
      sample_n = if (!is.na(sample)) as_number(.data[[sample]]) else NA_real_,
      across(all_of(value_cols), as_rate)
    ) |>
    pivot_longer(all_of(value_cols), names_to = "measure", values_to = "outcome_value") |>
    mutate(indicator = coalesce(indicator, measure)) |>
    select(-measure) |>
    filter(!is.na(service_district), service_district != "ALL", !is.na(outcome_value), outcome_value >= 0, outcome_value <= 1)
}

budget <- standardise_budget(budget_raw)
national_participants <- standardise_national_participants(budget_raw)
utilisation <- standardise_utilisation(utilisation_raw)
providers <- standardise_provider(provider_raw)
payments <- standardise_payments(payments_raw)
market_concentration <- standardise_market(market_path)
baseline_outcomes <- standardise_outcomes(baseline_outcomes_path, "Baseline outcomes")
longitudinal_outcomes <- standardise_outcomes(longitudinal_outcomes_path, "Longitudinal outcomes")
national_outcomes <- bind_rows(baseline_outcomes, longitudinal_outcomes)
regional_outcomes <- pmap_dfr(
  list(regional_outcome_links$path, regional_outcome_links$label),
  standardise_regional_outcomes_file
)

latest_market <- market_concentration |>
  filter(market_quarter_date == max(market_quarter_date, na.rm = TRUE)) |>
  select(state, service_district, support_class, payment_share_top10, payment_band)

utilisation_for_model <- utilisation |>
  filter(is.na(sil_or_sda) | sil_or_sda == "ALL")

model_base <- budget |>
  left_join(
    utilisation_for_model,
    by = c("quarter_label", "quarter_date", "quarter_index", "state", "service_district", "age_group", "disability_group", "support_class")
  ) |>
  left_join(
    providers,
    by = c("quarter_label", "quarter_date", "quarter_index", "state", "age_group", "disability_group", "support_class")
  ) |>
  left_join(
    latest_market,
    by = c("state", "service_district", "support_class")
  )

payments_support_class <- payments |>
  filter(
    state == "ALL", service_district == "ALL", disability_group == "ALL", age_group == "ALL",
    support_class != "ALL", support_category == "ALL", support_item_number == "ALL"
  )

payments_category <- payments |>
  filter(
    state == "ALL", service_district == "ALL", disability_group == "ALL", age_group == "ALL",
    support_class != "ALL", support_category != "ALL", support_item_number == "ALL"
  )

payments_item <- payments |>
  filter(
    state == "ALL", service_district == "ALL", disability_group == "ALL", age_group == "ALL",
    support_class != "ALL", support_category != "ALL", support_item_number != "ALL"
  )

payments_grouped <- payments |>
  filter(
    state != "ALL", service_district != "ALL", disability_group != "ALL", age_group != "ALL",
    support_class != "ALL", support_category == "ALL", support_item_number == "ALL"
  ) |>
  group_by(quarter_label, quarter_date, quarter_index, state, service_district, disability_group, age_group, support_class) |>
  summarise(
    payment_amount = sum(payment_amount, na.rm = TRUE),
    payment_participants = sum(payment_participants, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(payment_per_participant = payment_amount / payment_participants)

model_base <- model_base |>
  left_join(
    payments_grouped,
    by = c("quarter_label", "quarter_date", "quarter_index", "state", "service_district", "age_group", "disability_group", "support_class")
  ) |>
  mutate(
    active_providers_per100 = active_providers / participant_count * 100,
    participants_per_provider = if_else(active_providers > 0, participant_count / active_providers, NA_real_)
  )

The current public pages may not expose every historic quarter forever. The workflow records what it found at render time:

Code

tibble(
  dataset = c("Plan budgets", "Utilisation", "Active providers", "Payments"),
  quarters_found = c(
    paste(budget_links$quarter_label, collapse = ", "),
    paste(utilisation_links$quarter_label, collapse = ", "),
    paste(provider_links$quarter_label, collapse = ", "),
    paste(payments_links$quarter_label, collapse = ", ")
  )
) |>
  kable()

dataset	quarters_found
Plan budgets	June 2024, September 2024, December 2024, March 2025, June 2025, September 2025, December 2025
Utilisation	June 2024, September 2024, December 2024, March 2025, June 2025, September 2025, December 2025
Active providers	June 2024, September 2024, December 2024, March 2025, June 2025, September 2025, December 2025
Payments	June 2024, September 2024, December 2024, March 2025, June 2025, September 2025, December 2025

Appendix: ABS regional context build

ABS Regional Context

The NDIS plan budget data is organised around NDIA service districts. To bring in regional context, the workflow uses the NDIS service-district-to-LGA mapping and then adds ABS LGA indicators where available. For this first pass, the most useful lightweight context is socioeconomic position and age structure.

Code

read_optional_csv <- function(path) {
  if (is.na(path) || !file.exists(path)) return(tibble())
  read_csv(path, show_col_types = FALSE) |> clean_df()
}

service_lga <- read_optional_csv(service_lga_path)
participants_lga <- read_optional_csv(lga_path)

service_context <- tibble(
  state = character(),
  service_district = character(),
  lga_code = character(),
  lga_name = character(),
  remoteness = character()
)

if (nrow(service_lga) > 0) {
  state_col <- first_col(service_lga, c("^state", "state_territory", "state_cd", "statecd", "rsds.*state"), required = FALSE)
  service_col <- first_col(service_lga, c("service_district", "srvc_dstrct", "srvcdstrct", "rsds.*srvc.*dstrct"), required = FALSE)
  lga_col <- first_col(service_lga, c("lga.*code", "lga_code", "lgacd"), required = FALSE)
  lga_name_col <- first_col(service_lga, c("lga.*name", "lganm", "local.*government.*area", "area.*name"), required = FALSE)
  remote_col <- first_col(service_lga, c("remoteness", "remote"), required = FALSE)

  if (!is.na(service_col) && !is.na(lga_col)) {
    service_context <- service_lga |>
      transmute(
        state = if (!is.na(state_col)) .data[[state_col]] else NA_character_,
        service_district = normalise_service_district(.data[[service_col]]),
        lga_code = str_pad(as.character(as_number(.data[[lga_col]])), width = 5, side = "left", pad = "0"),
        lga_name = if (!is.na(lga_name_col)) .data[[lga_name_col]] else NA_character_,
        remoteness = if (!is.na(remote_col)) .data[[remote_col]] else NA_character_
      ) |>
      filter(!is.na(service_district), !is.na(lga_code)) |>
      distinct()
  }
}

service_lga_participants <- tibble(
  state = character(),
  service_district = character(),
  participant_lgas = numeric(),
  participant_lga_names = character(),
  lga_participants = numeric(),
  latest_lga_report = as.Date(character()),
  remoteness = character(),
  lga_avg_support_budget = numeric()
)

if (nrow(participants_lga) > 0) {
  report_col <- first_col(participants_lga, c("report", "rprt"), required = FALSE)
  service_col <- first_col(participants_lga, c("service_district", "srvc_dstrct", "srvcdstrct", "rsds.*srvc.*dstrct"), required = FALSE)
  lga_name_col <- first_col(participants_lga, c("lga.*name", "lganm", "local.*government.*area", "area.*name"), required = FALSE)
  state_col <- first_col(participants_lga, c("^state", "state_territory", "state_cd", "statecd"), required = FALSE)
  remote_col <- first_col(participants_lga, c("remoteness", "remote"), required = FALSE)
  participants_col <- first_col(participants_lga, c("active.*participant", "actv_prtcpnt", "actvprtcpnt", "participant.*count", "participants", "prtcpnt"), required = FALSE)
  budget_col <- first_col(participants_lga, c("average.*support", "average.*budget", "avg.*annual", "avg_anlsd", "committed", "cmtd"), required = FALSE)

  if (!is.na(service_col)) {
    service_lga_participants <- participants_lga |>
      transmute(
        latest_lga_report = if (!is.na(report_col)) dmy(.data[[report_col]], quiet = TRUE) else as.Date(NA),
        state = if (!is.na(state_col)) .data[[state_col]] else NA_character_,
        service_district = normalise_service_district(.data[[service_col]]),
        lga_name = if (!is.na(lga_name_col)) .data[[lga_name_col]] else NA_character_,
        remoteness = if (!is.na(remote_col)) .data[[remote_col]] else NA_character_,
        lga_participants = if (!is.na(participants_col)) as_number(.data[[participants_col]]) else NA_real_,
        lga_avg_support_budget = if (!is.na(budget_col)) as_number(.data[[budget_col]]) else NA_real_
      ) |>
      filter(!is.na(service_district), service_district != "ALL", service_district != "Other") |>
      group_by(state, service_district) |>
      filter(is.na(latest_lga_report) | latest_lga_report == max(latest_lga_report, na.rm = TRUE)) |>
      summarise(
        participant_lgas = n_distinct(lga_name),
        participant_lga_names = collapse_top_values(lga_name),
        lga_participants = sum(lga_participants, na.rm = TRUE),
        latest_lga_report = max(latest_lga_report, na.rm = TRUE),
        remoteness = safe_mode_character(remoteness),
        lga_avg_support_budget = safe_weighted_mean(lga_avg_support_budget, lga_participants),
        .groups = "drop"
      ) |>
      mutate(
        latest_lga_report = if_else(is.infinite(latest_lga_report), as.Date(NA), latest_lga_report),
        lga_participants = if_else(is.infinite(lga_participants), NA_real_, lga_participants),
        lga_avg_support_budget = if_else(is.nan(lga_avg_support_budget), NA_real_, lga_avg_support_budget)
      )
  }
}

abs_note <- "ABS context was not joined in this render."

seifa_file <- tryCatch(
  download_named(page_urls$seifa, "Local Government Area.*Indexes.*SEIFA|LGA.*Indexes.*SEIFA", "abs_seifa_lga.xlsx"),
  error = \(e) NA_character_
)

population_file <- tryCatch(
  download_named(page_urls$population_age, "Median age, sex ratio and broad age groups, by LGA.*2021.*Final", "abs_lga_age_structure.xlsx"),
  error = \(e) NA_character_
)

abs_lga <- tibble(
  lga_code = character(),
  irsd_score = numeric(),
  irsd_decile = numeric(),
  ier_score = numeric(),
  ier_decile = numeric(),
  ieo_score = numeric(),
  ieo_decile = numeric(),
  median_age = numeric(),
  pct_0_14 = numeric(),
  pct_15_64 = numeric(),
  pct_65_plus = numeric(),
  lga_population = numeric()
)

lga_remoteness <- tibble(
  lga_code = character(),
  remoteness = character(),
  remoteness_rank = numeric(),
  remoteness_major_cities_share = numeric(),
  remoteness_inner_regional_share = numeric(),
  remoteness_outer_regional_share = numeric(),
  remoteness_remote_share = numeric(),
  remoteness_very_remote_share = numeric()
)

seifa_file <- tryCatch(
  download_one(page_urls$seifa_lga, "abs_seifa_lga.xlsx"),
  error = \(e) seifa_file
)

if (!is.na(seifa_file) && file.exists(seifa_file) && is_xlsx_file(seifa_file)) {
  seifa <- read_excel(seifa_file, sheet = "Table 1", skip = 6, col_names = FALSE) |>
    clean_df()

  abs_lga <- seifa |>
    transmute(
      lga_code = str_pad(as.character(as_number(.data[["1"]])), width = 5, side = "left", pad = "0"),
      irsd_score = as_number(.data[["3"]]),
      irsd_decile = as_number(.data[["4"]]),
      ier_score = as_number(.data[["5"]]),
      ier_decile = as_number(.data[["6"]]),
      ieo_score = as_number(.data[["7"]]),
      ieo_decile = as_number(.data[["8"]])
    ) |>
    filter(!is.na(lga_code), is.finite(irsd_score)) |>
    distinct(lga_code, .keep_all = TRUE)
  abs_note <- "ABS SEIFA LGA context joined."
}

population_file <- tryCatch(
  download_one(page_urls$population_age_lga, "abs_lga_age_structure.xlsx"),
  error = \(e) population_file
)

if (!is.na(population_file) && file.exists(population_file) && is_xlsx_file(population_file)) {
  pop_lga <- read_excel(population_file, sheet = "Table 1", skip = 7, col_names = FALSE) |>
    clean_df() |>
    transmute(
      lga_code = str_pad(as.character(as_number(.data[["3"]])), width = 5, side = "left", pad = "0"),
      lga_population = as_number(.data[["7"]]),
      median_age = as_number(.data[["11"]]),
      pct_0_14 = as_number(.data[["13"]]),
      pct_15_64 = as_number(.data[["14"]]),
      pct_65_plus = as_number(.data[["15"]])
    ) |>
    filter(!is.na(lga_code), is.finite(lga_population)) |>
    distinct(lga_code, .keep_all = TRUE)

  abs_lga <- full_join(abs_lga, pop_lga, by = "lga_code")
  abs_note <- paste(abs_note, "ABS age-structure context also joined.")
}

mb_file <- tryCatch(
  download_one(page_urls$asgs_mb_2021, "abs_mb_2021_allocation.xlsx"),
  error = \(e) NA_character_
)

lga_allocation_file <- tryCatch(
  download_one(page_urls$asgs_lga_2021, "abs_lga_2021_aust.xlsx"),
  error = \(e) NA_character_
)

ra_file <- tryCatch(
  download_one(page_urls$asgs_ra_2021, "abs_ra_2021_aust.xlsx"),
  error = \(e) NA_character_
)

if (
  !is.na(mb_file) && file.exists(mb_file) && is_xlsx_file(mb_file) &&
  !is.na(lga_allocation_file) && file.exists(lga_allocation_file) && is_xlsx_file(lga_allocation_file) &&
  !is.na(ra_file) && file.exists(ra_file) && is_xlsx_file(ra_file)
) {
  mb_sa1 <- read_excel(mb_file, sheet = 1) |>
    clean_df() |>
    transmute(
      mb_code = as.character(mb_code_2021),
      sa1_code = as.character(sa1_code_2021)
    )

  lga_mb <- read_excel(lga_allocation_file, sheet = 1) |>
    clean_df() |>
    transmute(
      mb_code = as.character(mb_code_2021),
      lga_code = str_pad(as.character(as_number(lga_code_2021)), width = 5, side = "left", pad = "0"),
      lga_area_sqkm = as_number(area_albers_sqkm)
    ) |>
    filter(!is.na(lga_code), is.finite(lga_area_sqkm), lga_area_sqkm > 0)

  sa1_ra <- read_excel(ra_file, sheet = 1) |>
    clean_df() |>
    transmute(
      sa1_code = as.character(sa1_code_2021),
      remoteness_code = as_number(ra_code_2021),
      remoteness = str_remove(as.character(ra_name_2021), " of Australia$")
    ) |>
    filter(!is.na(sa1_code), !is.na(remoteness))

  lga_ra_long <- lga_mb |>
    left_join(mb_sa1, by = "mb_code") |>
    left_join(sa1_ra, by = "sa1_code") |>
    filter(!is.na(remoteness)) |>
    group_by(lga_code, remoteness, remoteness_code) |>
    summarise(area_sqkm = sum(lga_area_sqkm, na.rm = TRUE), .groups = "drop") |>
    group_by(lga_code) |>
    mutate(area_share = area_sqkm / sum(area_sqkm, na.rm = TRUE)) |>
    ungroup()

  lga_remoteness <- lga_ra_long |>
    group_by(lga_code) |>
    summarise(
      remoteness = remoteness[which.max(area_share)],
      remoteness_rank = weighted.mean(remoteness_code, area_share, na.rm = TRUE),
      remoteness_major_cities_share = sum(area_share[remoteness == "Major Cities"], na.rm = TRUE),
      remoteness_inner_regional_share = sum(area_share[remoteness == "Inner Regional"], na.rm = TRUE),
      remoteness_outer_regional_share = sum(area_share[remoteness == "Outer Regional"], na.rm = TRUE),
      remoteness_remote_share = sum(area_share[remoteness == "Remote"], na.rm = TRUE),
      remoteness_very_remote_share = sum(area_share[remoteness == "Very Remote"], na.rm = TRUE),
      .groups = "drop"
    )

  abs_note <- paste(abs_note, "ABS remoteness context also joined.")
}

service_abs <- service_context |>
  left_join(lga_remoteness, by = "lga_code", suffix = c("", "_abs")) |>
  mutate(remoteness = coalesce(remoteness, remoteness_abs)) |>
  left_join(abs_lga, by = "lga_code") |>
  group_by(state, service_district) |>
  summarise(
    irsd_score = safe_weighted_mean(irsd_score, lga_population),
    irsd_decile = safe_weighted_mean(irsd_decile, lga_population),
    ier_score = safe_weighted_mean(ier_score, lga_population),
    ier_decile = safe_weighted_mean(ier_decile, lga_population),
    ieo_score = safe_weighted_mean(ieo_score, lga_population),
    ieo_decile = safe_weighted_mean(ieo_decile, lga_population),
    median_age = safe_weighted_mean(median_age, lga_population),
    pct_0_14 = safe_weighted_mean(pct_0_14, lga_population),
    pct_15_64 = safe_weighted_mean(pct_15_64, lga_population),
    pct_65_plus = safe_weighted_mean(pct_65_plus, lga_population),
    lga_population = sum(lga_population, na.rm = TRUE),
    remoteness_rank = safe_weighted_mean(remoteness_rank, lga_population),
    remoteness_major_cities_share = safe_weighted_mean(remoteness_major_cities_share, lga_population),
    remoteness_inner_regional_share = safe_weighted_mean(remoteness_inner_regional_share, lga_population),
    remoteness_outer_regional_share = safe_weighted_mean(remoteness_outer_regional_share, lga_population),
    remoteness_remote_share = safe_weighted_mean(remoteness_remote_share, lga_population),
    remoteness_very_remote_share = safe_weighted_mean(remoteness_very_remote_share, lga_population),
    mapped_lgas = n_distinct(lga_code),
    mapped_lga_names = collapse_top_values(lga_name),
    remoteness = safe_mode_character(remoteness),
    .groups = "drop"
  ) |>
  left_join(service_lga_participants, by = c("state", "service_district"), suffix = c("", "_participant_file")) |>
  mutate(
    max_remoteness_share = pmax(
      remoteness_major_cities_share,
      remoteness_inner_regional_share,
      remoteness_outer_regional_share,
      remoteness_remote_share,
      remoteness_very_remote_share,
      na.rm = TRUE
    ),
    dominant_remoteness = case_when(
      !is.finite(max_remoteness_share) ~ NA_character_,
      remoteness_major_cities_share == max_remoteness_share ~ "Major Cities",
      remoteness_inner_regional_share == max_remoteness_share ~ "Inner Regional",
      remoteness_outer_regional_share == max_remoteness_share ~ "Outer Regional",
      remoteness_remote_share == max_remoteness_share ~ "Remote",
      remoteness_very_remote_share == max_remoteness_share ~ "Very Remote",
      TRUE ~ NA_character_
    ),
    remoteness = coalesce(dominant_remoteness, remoteness, remoteness_participant_file),
    mapped_lgas = if_else(is.finite(participant_lgas), pmax(mapped_lgas, participant_lgas, na.rm = TRUE), mapped_lgas),
    mapped_lga_names = coalesce(mapped_lga_names, participant_lga_names),
    lga_population = if_else(lga_population == 0, NA_real_, lga_population),
    lga_participants = if_else(lga_participants == 0, NA_real_, lga_participants),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, lga_avg_support_budget, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) ifelse(is.nan(x), NA_real_, x))
  )

model_data <- model_base |>
  left_join(service_abs, by = c("state", "service_district"))

model_service_keys <- model_data |>
  filter(!is.na(state), !is.na(service_district)) |>
  distinct(state, service_district)

service_context_covered <- service_context |>
  distinct(state, service_district) |>
  semi_join(model_service_keys, by = c("state", "service_district"))

service_abs_covered <- service_abs |>
  semi_join(model_service_keys, by = c("state", "service_district"))

join_diagnostics <- tibble(
  check = c(
    "Budget rows with utilisation",
    "Budget rows with active providers",
    "Budget rows with grouped payments",
    "Budget rows with market concentration",
    "Service districts with LGA mapping",
    "Service districts with ABS SEIFA/age context",
    "Service districts with remoteness context",
    "Service districts with LGA participant context"
  ),
  rows_or_groups = c(
    sum(!is.na(model_data$utilisation_rate)),
    sum(!is.na(model_data$active_providers)),
    sum(!is.na(model_data$payment_amount)),
    sum(!is.na(model_data$payment_share_top10)),
    nrow(service_context_covered),
    nrow(service_abs_covered |> filter(!is.na(irsd_score) | !is.na(median_age))),
    nrow(service_abs_covered |> filter(!is.na(remoteness))),
    nrow(service_abs_covered |> filter(!is.na(lga_participants)))
  ),
  denominator = c(
    nrow(model_data),
    nrow(model_data),
    nrow(model_data),
    nrow(model_data),
    nrow(model_service_keys),
    nrow(model_service_keys),
    nrow(model_service_keys),
    nrow(model_service_keys)
  )
) |>
  mutate(coverage = rows_or_groups / denominator)

outcome_spend_snapshot <- model_data |>
  filter(quarter_date == as.Date("2024-06-30")) |>
  group_by(state, service_district) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    utilisation = safe_weighted_mean(utilisation_rate, participant_count),
    payment_per_participant = safe_weighted_mean(payment_per_participant, participant_count),
    participants_per_provider = safe_weighted_mean(participants_per_provider, participant_count),
    payment_share_top10 = safe_mean(payment_share_top10),
    irsd_decile = safe_mean(irsd_decile),
    ier_decile = safe_mean(ier_decile),
    ieo_decile = safe_mean(ieo_decile),
    remoteness_rank = safe_mean(remoteness_rank),
    median_age = safe_mean(median_age),
    pct_0_14 = safe_mean(pct_0_14),
    pct_65_plus = safe_mean(pct_65_plus),
    lga_population = safe_mean(lga_population),
    .groups = "drop"
  ) |>
  mutate(across(c(avg_budget, utilisation, payment_per_participant, participants_per_provider, payment_share_top10, irsd_decile, ier_decile, ieo_decile, remoteness_rank, median_age, pct_0_14, pct_65_plus, lga_population), \(x) ifelse(is.nan(x), NA_real_, x)))

state_outcome_spend_snapshot <- model_data |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
  group_by(state) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    utilisation = safe_weighted_mean(utilisation_rate, participant_count),
    payment_per_participant = safe_weighted_mean(payment_per_participant, participant_count),
    participants_per_provider = safe_weighted_mean(participants_per_provider, participant_count),
    payment_share_top10 = safe_mean(payment_share_top10),
    .groups = "drop"
  ) |>
  filter(!is.na(state), state != "ALL")

regional_outcomes_spend <- regional_outcomes |>
  left_join(outcome_spend_snapshot, by = c("state", "service_district")) |>
  filter(!is.na(payment_per_participant), !is.na(outcome_value))

state_outcomes_spend <- national_outcomes |>
  filter(!is.na(state), state != "ALL") |>
  left_join(state_outcome_spend_snapshot, by = "state") |>
  filter(!is.na(payment_per_participant), !is.na(outcome_value))

state_outcome_domain <- state_outcomes_spend |>
  group_by(source, state, domain) |>
  summarise(
    indicators = n_distinct(indicator),
    outcome_value = safe_mean(outcome_value),
    participants = max(participants, na.rm = TRUE),
    avg_budget = safe_mean(avg_budget),
    utilisation = safe_mean(utilisation),
    payment_per_participant = safe_mean(payment_per_participant),
    participants_per_provider = safe_mean(participants_per_provider),
    payment_share_top10 = safe_mean(payment_share_top10),
    .groups = "drop"
  ) |>
  mutate(
    participants = if_else(is.infinite(participants), NA_real_, participants),
    outcome_per_1000_dollars = outcome_value / (payment_per_participant / 1000)
  )

outcome_spend_join <- tibble(
  check = c(
    "Service districts with dashboard outcomes",
    "Dashboard outcome rows joined to June 2024 district spend",
    "State outcome rows joined to latest state spend"
  ),
  rows_or_groups = c(
    n_distinct(regional_outcomes$service_district),
    nrow(regional_outcomes_spend),
    nrow(state_outcomes_spend)
  ),
  denominator = c(
    n_distinct(outcome_spend_snapshot$service_district),
    nrow(regional_outcomes),
    nrow(national_outcomes |> filter(!is.na(state), state != "ALL"))
  ),
  coverage = if_else(denominator > 0, rows_or_groups / denominator, NA_real_)
)

abs_note

[1] "ABS SEIFA LGA context joined. ABS age-structure context also joined. ABS remoteness context also joined."

Appendix: source rows and join diagnostics

Source Rows and Join Diagnostics

Code

tibble(
  table = c("Budget rows", "Utilisation rows", "Provider rows", "Payments rows", "Market concentration rows", "Model rows"),
  rows = c(nrow(budget), nrow(utilisation), nrow(providers), nrow(payments), nrow(market_concentration), nrow(model_data))
) |>
  kable()

table	rows
Budget rows	105041
Utilisation rows	223624
Provider rows	18349
Payments rows	816299
Market concentration rows	7228
Model rows	105041

Code

join_diagnostics |>
  mutate(coverage = percent(coverage, accuracy = 0.1)) |>
  kable()

check	rows_or_groups	denominator	coverage
Budget rows with utilisation	62709	105041	59.7%
Budget rows with active providers	61506	105041	58.6%
Budget rows with grouped payments	63823	105041	60.8%
Budget rows with market concentration	1112	105041	1.1%
Service districts with LGA mapping	80	84	95.2%
Service districts with ABS SEIFA/age context	80	84	95.2%
Service districts with remoteness context	80	84	95.2%
Service districts with LGA participant context	80	84	95.2%

Overall Spend Growth

Across the visible payments extracts, observed NDIS payments increased by 18.7%, equivalent to about 2.9% per quarter. This is a short window, so I treat it as the recent pace in the public files rather than a long-run trend.

Code

if (nrow(overall_payment_trend) > 1) {
  ggplot(overall_payment_trend, aes(quarter_date, total_payments)) +
    geom_line(linewidth = 0.9, colour = "#0b6b57") +
    geom_point(size = 2, colour = "#0b6b57") +
    scale_y_continuous(labels = dollar) +
    labs(x = NULL, y = "Total quarterly payments")
} else {
  tibble(note = "Not enough payment quarters to draw the total spend chart.") |> kable()
}

Total NDIS payments in the visible quarterly extracts.

How Fast Is Participation Growing?

Before modelling budgets or utilisation, the first question is scale: is NDIS participation growing faster than the Australian population? The table below compares national NDIS participant counts from the public participant budget extract with ABS estimated resident population. The ABS quarterly population release currently overlaps the NDIS analysis window through September 2025, so the December 2025 NDIS quarter is not used in this comparison.

Code

abs_population <- tibble::tribble(
  ~quarter_label, ~quarter_date, ~australian_population,
  "March 2024", "2024-03-31", 27113517,
  "June 2024", "2024-06-30", 27194286,
  "September 2024", "2024-09-30", 27301149,
  "December 2024", "2024-12-31", 27388133,
  "March 2025", "2025-03-31", 27531443,
  "June 2025", "2025-06-30", 27613654,
  "September 2025", "2025-09-30", 27724744
) |>
  mutate(
    quarter_date = as.Date(quarter_date),
    population_source = "ABS National, state and territory population, September 2025 release"
  )

population_participation <- national_participants |>
  inner_join(abs_population, by = c("quarter_label", "quarter_date")) |>
  arrange(quarter_date) |>
  mutate(
    ndis_share_population = ndis_participants / australian_population,
    ndis_participant_growth = ndis_participants / first(ndis_participants) - 1,
    population_growth = australian_population / first(australian_population) - 1,
    share_change_percentage_points = (ndis_share_population - first(ndis_share_population)) * 100
  )

participation_summary <- if (nrow(population_participation) > 0) {
  population_participation |>
    summarise(
      start_quarter = first(quarter_label),
      end_quarter = last(quarter_label),
      population_growth = last(population_growth),
      ndis_participant_growth = last(ndis_participant_growth),
      start_share = first(ndis_share_population),
      end_share = last(ndis_share_population)
    )
} else {
  tibble(
    start_quarter = NA_character_,
    end_quarter = NA_character_,
    population_growth = NA_real_,
    ndis_participant_growth = NA_real_,
    start_share = NA_real_,
    end_share = NA_real_
  )
}

if (nrow(population_participation) > 0) {
  population_participation |>
    transmute(
      quarter = quarter_label,
      australian_population = comma(australian_population),
      ndis_participants = comma(ndis_participants),
      population_growth = percent(population_growth, accuracy = 0.1),
      ndis_participant_growth = percent(ndis_participant_growth, accuracy = 0.1),
      ndis_share_population = percent(ndis_share_population, accuracy = 0.01),
      share_change_percentage_points = number(share_change_percentage_points, accuracy = 0.01)
    ) |>
    kable()
} else {
  tibble(note = "National NDIS participant counts could not be aligned to the ABS population denominator in this render.") |>
    kable()
}

quarter	australian_population	ndis_participants	population_growth	ndis_participant_growth	ndis_share_population	share_change_percentage_points
June 2024	27,194,286	661,267	0.0%	0.0%	2.43%	0.00
September 2024	27,301,149	680,123	0.4%	2.9%	2.49%	0.06
December 2024	27,388,133	692,822	0.7%	4.8%	2.53%	0.10
March 2025	27,531,443	716,998	1.2%	8.4%	2.60%	0.17
June 2025	27,613,654	739,414	1.5%	11.8%	2.68%	0.25
September 2025	27,724,744	751,446	2.0%	13.6%	2.71%	0.28

Over the overlapping ABS and NDIS window, the Australian population grew 2.0% while NDIS participant counts grew 13.6%. That lifted observed NDIS participation from 2.43% of the population in June 2024 to 2.71% by September 2025.

Code

if (nrow(population_participation) > 1) {
  population_participation |>
    select(quarter_date, population_growth, ndis_participant_growth) |>
    pivot_longer(
      c(population_growth, ndis_participant_growth),
      names_to = "series",
      values_to = "growth"
    ) |>
    mutate(
      series = recode(
        series,
        population_growth = "Australian population",
        ndis_participant_growth = "NDIS participants"
      )
    ) |>
    ggplot(aes(quarter_date, growth, colour = series)) +
    geom_hline(yintercept = 0, colour = "grey80") +
    geom_line(linewidth = 0.9) +
    geom_point(size = 2) +
    scale_y_continuous(labels = percent) +
    labs(x = NULL, y = "Growth since March 2024", colour = NULL)
} else {
  tibble(note = "Not enough overlapping population and NDIS quarters to draw the indexed growth chart.") |>
    kable()
}

NDIS participant growth compared with Australian population growth, indexed to March 2024.

Code

if (nrow(population_participation) > 1) {
  population_participation |>
    ggplot(aes(quarter_date, ndis_share_population)) +
    geom_line(linewidth = 0.9, colour = "#1b6ca8") +
    geom_point(size = 2, colour = "#1b6ca8") +
    scale_y_continuous(labels = percent) +
    labs(x = NULL, y = "NDIS participants as share of population")
} else {
  tibble(note = "Not enough overlapping population and NDIS quarters to draw the participation-share chart.") |>
    kable()
}

Share of the Australian population participating in the NDIS.

Code

budget_trend_data <- budget |>
  group_by(quarter_date) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    .groups = "drop"
  ) |>
  filter(!is.na(avg_budget))

if (has_enough_data(budget_trend_data, c("avg_budget"))) {
  ggplot(budget_trend_data, aes(quarter_date, avg_budget)) +
    geom_line(linewidth = 0.8, colour = "#1b6ca8") +
    geom_point(size = 2, colour = "#1b6ca8") +
    scale_y_continuous(labels = dollar) +
    labs(x = NULL, y = "Average annualised budget", title = "Average support budgets over time")
} else {
  tibble(note = "Not enough non-missing average budget data to draw the trend chart.") |> kable()
}

Average support budget and participant counts across the visible quarterly extracts.

Code

utilisation_trend_data <- utilisation |>
  group_by(quarter_date) |>
  summarise(avg_utilisation = safe_mean(utilisation_rate), .groups = "drop") |>
  arrange(quarter_date)

utilisation_summary <- utilisation_trend_data |>
  summarise(
    start_utilisation = first(avg_utilisation),
    end_utilisation = last(avg_utilisation),
    min_utilisation = min(avg_utilisation, na.rm = TRUE),
    max_utilisation = max(avg_utilisation, na.rm = TRUE)
  )

utilisation_trend_data |>
  ggplot(aes(quarter_date, avg_utilisation)) +
  geom_line(linewidth = 0.8, colour = "#7a4b9d") +
  geom_point(size = 2, colour = "#7a4b9d") +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = "Average utilisation", title = "Plan utilisation over time")

Plan utilisation across the visible quarterly extracts.

Payments are approximately participants multiplied by approved budgets multiplied by utilisation. In this short series, utilisation moves from 66.1% to 67.1%, with a range of 66.0% to 68.7%. That points the descriptive story toward participant growth and average budget growth rather than a large utilisation-rate shift.

Code

budget_pressure <- budget |>
  group_by(quarter_date) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    implied_budget_pool = sum(participant_count * avg_support_budget, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(quarter_date) |>
  mutate(
    participant_growth = participants / first(participants) - 1,
    avg_budget_growth = avg_budget / first(avg_budget) - 1,
    implied_pool_growth = implied_budget_pool / first(implied_budget_pool) - 1
  )

budget_pressure |>
  select(quarter_date, participant_growth, avg_budget_growth, implied_pool_growth) |>
  pivot_longer(-quarter_date, names_to = "measure", values_to = "growth") |>
  mutate(
    measure = recode(
      measure,
      participant_growth = "Participants",
      avg_budget_growth = "Average support budget",
      implied_pool_growth = "Participant x budget pool"
    )
  ) |>
  ggplot(aes(quarter_date, growth, colour = measure)) +
  geom_hline(yintercept = 0, colour = "grey80") +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = "Growth since first available quarter", colour = NULL)

Participant count and average support budget growth, indexed to the first available quarter.

Appendix: support-class budget snapshot

Code

budget |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
  group_by(support_class) |>
  summarise(
    rows = n(),
    usable_participants = sum(participant_count, na.rm = TRUE),
    median_avg_budget = safe_median(avg_support_budget),
    mean_avg_budget = safe_mean(avg_support_budget),
    .groups = "drop"
  ) |>
  left_join(
    payments_support_class |>
      filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
      group_by(support_class) |>
      summarise(total_payments = sum(payment_amount, na.rm = TRUE), .groups = "drop"),
    by = "support_class"
  ) |>
  arrange(desc(total_payments)) |>
  mutate(
    usable_participants = comma(usable_participants),
    median_avg_budget = dollar(median_avg_budget, accuracy = 1),
    mean_avg_budget = dollar(mean_avg_budget, accuracy = 1),
    total_payments = dollar(total_payments, accuracy = 1)
  ) |>
  kable()

support_class	rows	usable_participants	median_avg_budget	mean_avg_budget	total_payments
Core	5857	524,476	$93,000	$103,684	$38,509,796,000
Capital	3446	106,679	$19,500	$20,195	$1,419,899,000
CapacityBuilding	5949	633,591	$20,000	$20,354	NA

Suppressed participant counts such as <11 are treated as missing. Weighted summaries use non-suppressed participant counts where they are available and otherwise fall back to unweighted summaries.

Disability Mix and Payment Growth

The payments files are the most useful addition for spend analysis because they contain actual payment amounts by disability group as well as support class, support category, item, region and age band. The disability-group rows below use national aggregate rows where available, so nested ALL rows are not double-counted.

Appendix: support class, category and item payment growth

The support-level tables are still useful audit checks, but the main story below focuses on disability groups.

Code

payments_support_class |>
  group_by(quarter_date, support_class) |>
  summarise(payment_amount = sum(payment_amount, na.rm = TRUE), .groups = "drop") |>
  group_by(support_class) |>
  arrange(quarter_date, .by_group = TRUE) |>
  mutate(index = payment_amount / first(payment_amount) - 1) |>
  ungroup() |>
  ggplot(aes(quarter_date, index, colour = support_class)) +
  geom_hline(yintercept = 0, colour = "grey80") +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = "Growth since first available quarter", colour = "Support class")

Payment growth by support class, indexed to the first available quarter.

Code

first_payment_qtr <- min(payments_support_class$quarter_date, na.rm = TRUE)
last_payment_qtr <- max(payments_support_class$quarter_date, na.rm = TRUE)

payment_change <- function(data, group_vars) {
  if (nrow(data) == 0) {
    return(tibble(
      !!!set_names(rep(list(character()), length(group_vars)), group_vars),
      payment_amount_first = numeric(),
      payment_amount_last = numeric(),
      participants_first = numeric(),
      participants_last = numeric(),
      payment_change = numeric(),
      participant_change = numeric(),
      payment_per_participant_first = numeric(),
      payment_per_participant_last = numeric(),
      payment_per_participant_change = numeric()
    ))
  }

  wide <- data |>
    filter(quarter_date %in% c(first_payment_qtr, last_payment_qtr)) |>
    group_by(across(all_of(group_vars)), quarter_date) |>
    summarise(
      payment_amount = sum(payment_amount, na.rm = TRUE),
      participants = sum(payment_participants, na.rm = TRUE),
      .groups = "drop"
    ) |>
    mutate(period = if_else(quarter_date == first_payment_qtr, "first", "last")) |>
    select(-quarter_date) |>
    pivot_wider(
      names_from = period,
      values_from = c(payment_amount, participants),
      values_fill = 0
    )

  needed_cols <- c("payment_amount_first", "payment_amount_last", "participants_first", "participants_last")
  for (col in needed_cols) {
    if (!col %in% names(wide)) wide[[col]] <- 0
  }

  wide |>
    mutate(
      payment_amount_first = coalesce(payment_amount_first, 0),
      payment_amount_last = coalesce(payment_amount_last, 0),
      participants_first = coalesce(participants_first, 0),
      participants_last = coalesce(participants_last, 0),
      payment_change = payment_amount_last - payment_amount_first,
      participant_change = participants_last - participants_first,
      payment_per_participant_first = payment_amount_first / participants_first,
      payment_per_participant_last = payment_amount_last / participants_last,
      payment_per_participant_change = payment_per_participant_last - payment_per_participant_first
    ) |>
    arrange(desc(abs(payment_change)))
}

support_class_change <- payment_change(payments_support_class, "support_class")

support_class_change |>
  mutate(
    across(c(payment_amount_first, payment_amount_last, payment_change), \(x) dollar(x, accuracy = 1)),
    across(c(participants_first, participants_last, participant_change), comma),
    across(c(payment_per_participant_first, payment_per_participant_last, payment_per_participant_change), \(x) dollar(x, accuracy = 1))
  ) |>
  kable()

support_class	payment_amount_first	payment_amount_last	participants_first	participants_last	payment_change	participant_change	payment_per_participant_first	payment_per_participant_last	payment_per_participant_change
Core	$32,425,627,000	$38,509,796,000	512,210	552,928	$6,084,169,000	40,718	$63,305	$69,647	$6,342
Capacity Building	$7,802,961,000	$9,028,315,000	637,317	737,282	$1,225,354,000	99,965	$12,243	$12,245	$2
Capital	$1,018,264,000	$1,419,899,000	80,160	86,369	$401,635,000	6,209	$12,703	$16,440	$3,737
Missing	$7,371,000	$3,716,000	151,095	290,973	-$3,655,000	139,878	$49	$13	-$36

Code

category_change <- payment_change(payments_category, c("support_class", "support_category"))

category_change |>
  slice_max(abs(payment_change), n = 15) |>
  mutate(
    payment_change = dollar(payment_change, accuracy = 1),
    payment_amount_last = dollar(payment_amount_last, accuracy = 1),
    participant_change = comma(participant_change),
    payment_per_participant_change = dollar(payment_per_participant_change, accuracy = 1)
  ) |>
  select(support_class, support_category, payment_amount_last, payment_change, participant_change, payment_per_participant_change) |>
  kable()

support_class	support_category	payment_amount_last	payment_change	participant_change	payment_per_participant_change
Core	Daily Activities	$25,446,659,000	$4,005,204,000	29,401	$5,735
Core	Social Community and Civic Participation	$11,563,196,000	$2,090,588,000	23,533	$3,977
Capacity Building	CB Daily Activity	$5,790,549,000	$614,241,000	79,004	-$117
Capacity Building	CB Relationships	$954,703,000	$323,821,000	18,386	$1,801
Capital	Assistive Technology	$789,391,000	$203,350,000	6,627	$1,945
Capital	Home Modifications	$630,508,000	$198,285,000	1,070	$7,770
Capacity Building	Support Coordination	$1,198,990,000	$126,501,000	38,708	-$44
Capacity Building	CB Choice and Control	$645,490,000	$86,859,000	97,518	-$74
Core	Consumables	$620,377,000	-$64,787,000	-29,903	$15
Core	Transport	$879,563,000	$53,163,000	4,449	$146
Capacity Building	CB Employment	$177,050,000	$45,336,000	-4,672	$3,146
Capacity Building	CB Social Community and Civic Participation	$233,283,000	$35,722,000	-2,067	$915
Capacity Building	CB Health and Wellbeing	$26,683,000	-$7,512,000	-5,506	$161
Missing	Missing	$3,716,000	-$3,655,000	139,878	-$36
Capacity Building	CB Home Living	$1,216,000	$212,000	-43	$431

Code

item_change <- payment_change(payments_item, c("support_class", "support_category", "support_item_number", "support_item_desc"))

if (nrow(item_change) > 0) {
  item_change |>
    slice_max(abs(payment_change), n = 15) |>
    mutate(
      support_item_desc = str_trunc(support_item_desc, 70),
      payment_amount_last = dollar(payment_amount_last, accuracy = 1),
      payment_change = dollar(payment_change, accuracy = 1),
      participant_change = comma(participant_change),
      payment_per_participant_change = dollar(payment_per_participant_change, accuracy = 1)
    ) |>
    select(support_class, support_category, support_item_number, support_item_desc, payment_amount_last, payment_change, participant_change, payment_per_participant_change) |>
    kable()
} else {
  tibble(note = "The current public payments extract did not expose national item-level rows under the no-double-counting filters used here.") |>
    kable()
}

note
The current public payments extract did not expose national item-level rows under the no-double-counting filters used here.

Code

payments_disability_candidates <- payments |>
  filter(
    state == "ALL", service_district == "ALL", disability_group != "ALL", age_group == "ALL",
    support_category == "ALL", support_item_number == "ALL"
  )

if (any(payments_disability_candidates$support_class == "ALL", na.rm = TRUE)) {
  payments_disability <- payments_disability_candidates |>
    filter(support_class == "ALL") |>
    group_by(quarter_label, quarter_date, quarter_index, disability_group) |>
    summarise(
      payment_amount = sum(payment_amount, na.rm = TRUE),
      payment_participants = max(payment_participants, na.rm = TRUE),
      .groups = "drop"
    )
  disability_payment_basis <- "National disability-group aggregate rows"
} else {
  payments_disability <- payments_disability_candidates |>
    filter(support_class != "ALL") |>
    group_by(quarter_label, quarter_date, quarter_index, disability_group) |>
    summarise(
      payment_amount = sum(payment_amount, na.rm = TRUE),
      payment_participants = sum(payment_participants, na.rm = TRUE),
      .groups = "drop"
    )
  disability_payment_basis <- "Support-class rows summed within disability group"
}

payments_disability <- payments_disability |>
  mutate(
    payment_participants = if_else(is.infinite(payment_participants), NA_real_, payment_participants),
    payment_per_participant = payment_amount / payment_participants
  )

tibble(
  basis = disability_payment_basis,
  disability_groups = n_distinct(payments_disability$disability_group),
  quarters = paste(sort(unique(payments_disability$quarter_label)), collapse = ", ")
) |>
  kable()

basis	disability_groups	quarters
Support-class rows summed within disability group	18	December 2024, December 2025, June 2024, June 2025, March 2025, September 2024, September 2025

The participant-count view shows scale: groups that are large enough to move total spend even if average payments are moderate.

Code

largest_by_count |>
  transmute(
    disability_group,
    payment_participants = comma(payment_participants),
    total_payments = dollar(payment_amount, accuracy = 1),
    payment_per_participant = dollar(payment_per_participant, accuracy = 1)
  ) |>
  kable()

disability_group	payment_participants	total_payments	payment_per_participant
Autism	630,030	$10,565,618,000	$16,770
Intellectual Disability	213,590	$10,992,967,000	$51,468
Developmental delay	172,567	$891,824,000	$5,168
Psychosocial disability	144,402	$6,054,874,000	$41,931
Other Neurological	75,576	$4,158,314,000	$55,022
Hearing Impairment	58,663	$283,182,000	$4,827
Other Physical	57,391	$1,534,717,000	$26,741
ABI	51,816	$3,441,433,000	$66,416
Cerebral Palsy	49,537	$3,175,466,000	$64,103
Global developmental delay	38,648	$315,257,000	$8,157

The total-payments view combines scale and intensity: these are the groups contributing the largest payment dollars in the latest quarter.

Code

largest_by_spend |>
  transmute(
    disability_group,
    total_payments = dollar(payment_amount, accuracy = 1),
    payment_participants = comma(payment_participants),
    payment_per_participant = dollar(payment_per_participant, accuracy = 1)
  ) |>
  kable()

disability_group	total_payments	payment_participants	payment_per_participant
Intellectual Disability	$10,992,967,000	213,590	$51,468
Autism	$10,565,618,000	630,030	$16,770
Psychosocial disability	$6,054,874,000	144,402	$41,931
Other Neurological	$4,158,314,000	75,576	$55,022
ABI	$3,441,433,000	51,816	$66,416
Cerebral Palsy	$3,175,466,000	49,537	$64,103
Stroke	$1,746,058,000	30,248	$57,725
Down Syndrome	$1,612,746,000	26,879	$60,000
Other Physical	$1,534,717,000	57,391	$26,741
Other	$1,255,897,000	36,546	$34,365

The average-payment view is filtered to groups above the first quartile of participant counts, so it does not get dominated by very small groups with noisy averages.

Code

largest_by_average_spend |>
  transmute(
    disability_group,
    payment_per_participant = dollar(payment_per_participant, accuracy = 1),
    payment_participants = comma(payment_participants),
    total_payments = dollar(payment_amount, accuracy = 1)
  ) |>
  kable()

disability_group	payment_per_participant	payment_participants	total_payments
ABI	$66,416	51,816	$3,441,433,000
Cerebral Palsy	$64,103	49,537	$3,175,466,000
Stroke	$57,725	30,248	$1,746,058,000
Other Neurological	$55,022	75,576	$4,158,314,000
Intellectual Disability	$51,468	213,590	$10,992,967,000
Psychosocial disability	$41,931	144,402	$6,054,874,000
Multiple Sclerosis	$39,677	31,629	$1,254,956,000
Other	$34,365	36,546	$1,255,897,000
Other Physical	$26,741	57,391	$1,534,717,000
Autism	$16,770	630,030	$10,565,618,000

Code

disability_change <- payment_change(payments_disability, "disability_group")

total_disability_payment_change <- sum(disability_change$payment_change, na.rm = TRUE)

disability_driver <- disability_change |>
  mutate(
    participant_contribution = (participants_last - participants_first) *
      ((payment_per_participant_first + payment_per_participant_last) / 2),
    intensity_contribution = (payment_per_participant_last - payment_per_participant_first) *
      ((participants_first + participants_last) / 2),
    change_share = payment_change / total_disability_payment_change,
    stroke_flag = str_detect(str_to_lower(disability_group), "\\bstroke\\b")
  ) |>
  arrange(desc(abs(payment_change)))

if (nrow(disability_driver) > 0) {
  disability_driver |>
    slice_max(abs(payment_change), n = 15) |>
    transmute(
      disability_group,
      payment_amount_last = dollar(payment_amount_last, accuracy = 1),
      payment_change = dollar(payment_change, accuracy = 1),
      change_share = percent(change_share, accuracy = 0.1),
      participant_change = comma(participant_change),
      participant_contribution = dollar(participant_contribution, accuracy = 1),
      intensity_contribution = dollar(intensity_contribution, accuracy = 1),
      payment_per_participant_change = dollar(payment_per_participant_change, accuracy = 1)
    ) |>
    kable()
} else {
  tibble(note = "No disability-group payment rows were available under the no-double-counting filters used here.") |>
    kable()
}

disability_group	payment_amount_last	payment_change	change_share	participant_change	participant_contribution	intensity_contribution	payment_per_participant_change
Autism	$10,565,618,000	$2,293,245,000	29.7%	173,093	$3,018,221,951	-$724,976,951	-$1,334
Intellectual Disability	$10,992,967,000	$1,362,626,000	17.7%	19,472	$984,098,993	$378,527,007	$1,857
Other Neurological	$4,158,314,000	$844,032,000	10.9%	14,054	$765,191,910	$78,840,090	$1,150
Psychosocial disability	$6,054,874,000	$812,576,000	10.5%	10,473	$424,539,006	$388,036,994	$2,788
ABI	$3,441,433,000	$536,420,000	6.9%	6,727	$440,096,603	$96,323,397	$1,988
Stroke	$1,746,058,000	$409,681,000	5.3%	5,248	$291,735,853	$117,945,147	$4,270
Cerebral Palsy	$3,175,466,000	$406,537,000	5.3%	3,541	$220,077,145	$186,459,855	$3,904
Other	$1,255,897,000	$397,298,000	5.1%	11,275	$385,269,522	$12,028,478	$389
Multiple Sclerosis	$1,254,956,000	$177,850,000	2.3%	4,389	$173,845,483	$4,004,517	$136
Down Syndrome	$1,612,746,000	$167,282,000	2.2%	1,166	$67,753,650	$99,528,350	$3,785
Spinal Cord Injury	$1,132,552,000	$150,387,000	1.9%	2,184	$128,799,772	$21,587,228	$1,205
Other Physical	$1,534,717,000	$140,440,000	1.8%	4,015	$106,122,913	$34,317,087	$620
Developmental delay	$891,824,000	-$83,173,000	-1.1%	15,852	$90,272,802	-$173,445,802	-$1,053
Visual Impairment	$504,113,000	$62,762,000	0.8%	2,015	$41,165,690	$21,596,310	$934
Hearing Impairment	$283,182,000	$24,455,000	0.3%	5,694	$27,649,398	-$3,194,398	-$57

Code

top_driver_groups <- disability_driver |>
  filter(is.finite(payment_change)) |>
  slice_max(abs(payment_change), n = 7) |>
  pull(disability_group)

stroke_groups <- disability_driver |>
  filter(stroke_flag) |>
  pull(disability_group)

driver_groups <- unique(c(stroke_groups, top_driver_groups)) |>
  head(8)

disability_trend <- payments_disability |>
  filter(disability_group %in% driver_groups) |>
  arrange(disability_group, quarter_date) |>
  group_by(disability_group) |>
  mutate(
    payment_growth = index_to_first(payment_amount),
    participant_growth = index_to_first(payment_participants),
    payment_per_participant_growth = index_to_first(payment_per_participant)
  ) |>
  ungroup()

if (nrow(disability_trend) > 0) {
  disability_trend |>
    select(quarter_date, disability_group, payment_growth, participant_growth, payment_per_participant_growth) |>
    pivot_longer(
      ends_with("_growth"),
      names_to = "measure",
      values_to = "growth"
    ) |>
    mutate(
      measure = recode(
        measure,
        payment_growth = "Payments",
        participant_growth = "Participants",
        payment_per_participant_growth = "Payments per participant"
      )
    ) |>
    filter(is.finite(growth)) |>
    ggplot(aes(quarter_date, growth, colour = disability_group)) +
    geom_hline(yintercept = 0, colour = "grey80") +
    geom_line(linewidth = 0.8) +
    geom_point(size = 1.8) +
    facet_wrap(~ measure) +
    scale_y_continuous(labels = percent) +
    labs(x = NULL, y = "Growth since first available quarter", colour = "Disability group")
} else {
  tibble(note = "No stroke or top-driver disability-group trend rows were available to plot.") |>
    kable()
}

Payment, participant and payment-per-participant trends for stroke and the largest disability-group growth contributors.

Code

if (any(disability_driver$stroke_flag, na.rm = TRUE)) {
  disability_driver |>
    filter(stroke_flag) |>
    transmute(
      disability_group,
      payment_amount_first = dollar(payment_amount_first, accuracy = 1),
      payment_amount_last = dollar(payment_amount_last, accuracy = 1),
      payment_change = dollar(payment_change, accuracy = 1),
      change_share = percent(change_share, accuracy = 0.1),
      participants_first = comma(participants_first),
      participants_last = comma(participants_last),
      participant_change = comma(participant_change),
      payment_per_participant_change = dollar(payment_per_participant_change, accuracy = 1)
    ) |>
    kable()
} else {
  tibble(note = "Stroke was not exposed as a separate disability group in the current public payment extract.") |>
    kable()
}

disability_group	payment_amount_first	payment_amount_last	payment_change	change_share	participants_first	participants_last	participant_change	payment_per_participant_change
Stroke	$1,336,377,000	$1,746,058,000	$409,681,000	5.3%	25,000	30,248	5,248	$4,270

The payment tables are useful, but the time window is short. These are growth-contribution signals, not stable long-run trends. The disability-group decomposition separates participant growth from payment intensity, but it still reflects aggregate public payment rows rather than participant-level need or plan history.

Appendix: budgets, payments and utilisation detail

Budgets, Payments and Utilisation

Budget, utilisation and payments answer related but different questions. A high annualised support budget does not necessarily mean a high payment amount in the observed quarter, and low utilisation can reflect need, timing, provider availability or administrative barriers.

Code

latest_model_data <- model_data |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
  filter(!is.na(avg_support_budget), !is.na(utilisation_rate), !is.na(payment_amount), !is.na(payment_per_participant)) |>
  mutate(
    utilisation_rate = pmin(utilisation_rate, 1),
    high_budget_cut = quantile(avg_support_budget, 0.75, na.rm = TRUE),
    low_utilisation_cut = quantile(utilisation_rate, 0.25, na.rm = TRUE),
    high_utilisation_cut = quantile(utilisation_rate, 0.75, na.rm = TRUE),
    high_payment_cut = quantile(payment_per_participant, 0.75, na.rm = TRUE),
    signal = case_when(
      avg_support_budget >= high_budget_cut & utilisation_rate <= low_utilisation_cut ~ "High budget, low utilisation",
      utilisation_rate >= high_utilisation_cut & payment_per_participant >= high_payment_cut ~ "High utilisation, high payment",
      payment_per_participant >= high_payment_cut & avg_support_budget < high_budget_cut ~ "High payment, moderate budget",
      TRUE ~ NA_character_
    )
  )

Code

tibble(
  signal = c("High budget, low utilisation", "High utilisation, high payment", "High payment, moderate budget"),
  definition = c(
    "Average support budget is in the top quartile while utilisation is in the bottom quartile.",
    "Utilisation and payments per participant are both in the top quartile.",
    "Payments per participant are in the top quartile while average support budget is below the top quartile."
  )
) |>
  kable()

signal	definition
High budget, low utilisation	Average support budget is in the top quartile while utilisation is in the bottom quartile.
High utilisation, high payment	Utilisation and payments per participant are both in the top quartile.
High payment, moderate budget	Payments per participant are in the top quartile while average support budget is below the top quartile.

Code

signal_data <- latest_model_data |>
  filter(!is.na(signal))

signal_data |>
  group_by(signal) |>
  summarise(
    service_districts = n_distinct(service_district),
    grouped_cells = n(),
    participants = sum(participant_count, na.rm = TRUE),
    total_payments = sum(payment_amount, na.rm = TRUE),
    median_payment_per_participant = safe_median(payment_per_participant),
    median_utilisation = safe_median(utilisation_rate),
    median_avg_budget = safe_median(avg_support_budget),
    .groups = "drop"
  ) |>
  mutate(
    participants = comma(participants),
    total_payments = dollar(total_payments, accuracy = 1),
    median_payment_per_participant = dollar(median_payment_per_participant, accuracy = 1),
    median_utilisation = percent(median_utilisation, accuracy = 0.1),
    median_avg_budget = dollar(median_avg_budget, accuracy = 1)
  ) |>
  kable()

signal	service_districts	grouped_cells	participants	total_payments	median_payment_per_participant	median_utilisation	median_avg_budget
High budget, low utilisation	69	152	5,833	$783,146,000	$111,263	62.0%	$159,000
High payment, moderate budget	46	71	5,321	$624,586,000	$111,333	79.0%	$127,000
High utilisation, high payment	74	876	60,014	$10,258,221,000	$172,543	88.0%	$194,000

Code

signal_data |>
  group_by(signal, state, service_district, support_class) |>
  summarise(
    grouped_cells = n(),
    participants = sum(participant_count, na.rm = TRUE),
    total_payments = sum(payment_amount, na.rm = TRUE),
    median_utilisation = safe_median(utilisation_rate),
    median_payment_per_participant = safe_median(payment_per_participant),
    .groups = "drop"
  ) |>
  arrange(signal, desc(participants)) |>
  group_by(signal) |>
  slice_head(n = 5) |>
  ungroup() |>
  mutate(
    participants = comma(participants),
    total_payments = dollar(total_payments, accuracy = 1),
    median_utilisation = percent(median_utilisation, accuracy = 0.1),
    median_payment_per_participant = dollar(median_payment_per_participant, accuracy = 1)
  ) |>
  kable()

signal	state	service_district	support_class	grouped_cells	participants	total_payments	median_utilisation	median_payment_per_participant
High budget, low utilisation	QLD	Maroochydore	Core	4	441	$54,727,000	65.0%	$110,685
High budget, low utilisation	NSW	Murrumbidgee	Core	5	332	$37,025,000	62.0%	$117,062
High budget, low utilisation	NSW	North Sydney	Core	4	319	$47,653,000	64.5%	$143,835
High budget, low utilisation	SA	Southern Adelaide	Core	3	221	$30,030,000	62.0%	$130,688
High budget, low utilisation	VIC	Inner East Melbourne	Core	4	220	$26,553,000	60.5%	$100,802
High payment, moderate budget	NSW	Hunter New England	Core	2	768	$83,485,000	83.0%	$108,090
High payment, moderate budget	VIC	Bayside Peninsula	Core	1	463	$48,582,000	84.0%	$107,482
High payment, moderate budget	NSW	Nepean Blue Mountains	Core	3	416	$43,775,000	81.0%	$107,688
High payment, moderate budget	QLD	Beenleigh	Core	3	378	$39,623,000	69.0%	$108,160
High payment, moderate budget	WA	South East Metro	Core	2	339	$35,984,000	76.0%	$110,640
High utilisation, high payment	NSW	South Western Sydney	Core	41	8,489	$1,216,462,000	89.0%	$152,151
High utilisation, high payment	NSW	Western Sydney	Core	32	4,943	$781,416,000	89.5%	$173,531
High utilisation, high payment	NSW	Hunter New England	Core	14	2,784	$447,264,000	90.0%	$185,507
High utilisation, high payment	VIC	Southern Melbourne	Core	25	2,619	$403,094,000	90.0%	$170,656
High utilisation, high payment	SA	Northern Adelaide	Core	22	2,402	$392,635,000	88.0%	$168,278

Appendix: provider market context

Provider Market Context

Provider-market context is one of the few public levers that can plausibly help explain utilisation. The measures below are deliberately rough. participants_per_provider is a demand-pressure proxy, while payment_share_top10 is a concentration proxy. Neither measures provider quality, available hours, wait time, travel burden or workforce depth.

Code

latest_model_data |>
  group_by(support_class) |>
  summarise(
    grouped_cells = n(),
    participants = sum(participant_count, na.rm = TRUE),
    median_utilisation = safe_median(utilisation_rate),
    median_participants_per_provider = safe_median(participants_per_provider),
    median_payment_share_top10 = safe_median(payment_share_top10),
    median_payment_per_participant = safe_median(payment_per_participant),
    .groups = "drop"
  ) |>
  mutate(
    participants = comma(participants),
    median_utilisation = percent(median_utilisation, accuracy = 0.1),
    median_participants_per_provider = number(median_participants_per_provider, accuracy = 0.1),
    median_payment_share_top10 = percent(median_payment_share_top10, accuracy = 0.1),
    median_payment_per_participant = dollar(median_payment_per_participant, accuracy = 1)
  ) |>
  kable()

support_class	grouped_cells	participants	median_utilisation	median_participants_per_provider	median_payment_share_top10	median_payment_per_participant
Capital	2475	94,295	72.0%	1.5	76.0%	$15,870
Core	5504	521,695	78.0%	1.1	29.0%	$77,362

Code

provider_quadrant_data <- latest_model_data |>
  filter(is.finite(participants_per_provider), is.finite(payment_share_top10)) |>
  mutate(
    provider_pressure = if_else(participants_per_provider >= safe_median(participants_per_provider), "High provider pressure", "Low provider pressure"),
    concentration = if_else(payment_share_top10 >= safe_median(payment_share_top10), "High concentration", "Low concentration"),
    quadrant = paste(provider_pressure, concentration, sep = " / ")
  )

provider_quadrant_data |>
  group_by(quadrant) |>
  summarise(
    service_districts = n_distinct(service_district),
    grouped_cells = n(),
    participants = sum(participant_count, na.rm = TRUE),
    median_utilisation = safe_median(utilisation_rate),
    median_payment_per_participant = safe_median(payment_per_participant),
    median_payment_share_top10 = safe_median(payment_share_top10),
    .groups = "drop"
  ) |>
  mutate(
    participants = comma(participants),
    median_utilisation = percent(median_utilisation, accuracy = 0.1),
    median_payment_per_participant = dollar(median_payment_per_participant, accuracy = 1),
    median_payment_share_top10 = percent(median_payment_share_top10, accuracy = 0.1)
  ) |>
  kable()

quadrant	service_districts	grouped_cells	participants	median_utilisation	median_payment_per_participant	median_payment_share_top10
High provider pressure / High concentration	1	43	5,630	70.0%	$22,174	29.0%
Low provider pressure / High concentration	1	43	2,509	80.0%	$108,350	29.0%

SIL and SDA Baseline

The utilisation extract includes a SILorSDA field. That makes it possible to set a pre-reform descriptive baseline before mandatory registration for supported independent living and platform providers begins on 1 July 2026. This is not an evaluation of that change; it is a baseline view of utilisation patterns before the reform takes effect.

Code

sil_sda_utilisation <- utilisation |>
  filter(!is.na(sil_or_sda), sil_or_sda != "ALL", support_class != "ALL") |>
  group_by(quarter_date, sil_or_sda, support_class) |>
  summarise(utilisation = safe_mean(utilisation_rate), rows = n(), .groups = "drop")

sil_sda_utilisation |>
  ggplot(aes(quarter_date, utilisation, colour = sil_or_sda)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  facet_wrap(~ support_class) +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = "Average utilisation", colour = "SIL/SDA flag")

Code

sil_sda_payment_items <- payments_item |>
  filter(str_detect(str_to_lower(paste(support_category, support_item_desc)), "supported independent living|\\bsil\\b|specialist disability accommodation|\\bsda\\b"))

if (nrow(sil_sda_payment_items) > 0) {
  sil_sda_payment_items |>
    group_by(quarter_date, support_class, support_category) |>
    summarise(payment_amount = sum(payment_amount, na.rm = TRUE), participants = sum(payment_participants, na.rm = TRUE), .groups = "drop") |>
    arrange(desc(payment_amount)) |>
    mutate(
      payment_amount = dollar(payment_amount, accuracy = 1),
      participants = comma(participants)
    ) |>
    slice_head(n = 15) |>
    kable()
} else {
  tibble(note = "No payment item descriptions matched SIL/SDA keywords in the national item-level extract.") |>
    kable()
}

note
No payment item descriptions matched SIL/SDA keywords in the national item-level extract.

Appendix: regional context and join quality

Regional Context and Join Quality

The regional layer is deliberately conservative. It uses the NDIS service-district-to-LGA mapping, then joins ABS LGA SEIFA, age-structure, population, and remoteness allocation files where the source workbooks resolve cleanly. Remoteness is allocated from ABS mesh-block and SA1 correspondence files up to LGA, then aggregated to service districts with population weights. The table below keeps the join quality visible so the model does not quietly imply more geographic precision than the data supports.

Code

join_diagnostics |>
  mutate(coverage = percent(coverage, accuracy = 0.1)) |>
  kable()

check	rows_or_groups	denominator	coverage
Budget rows with utilisation	62709	105041	59.7%
Budget rows with active providers	61506	105041	58.6%
Budget rows with grouped payments	63823	105041	60.8%
Budget rows with market concentration	1112	105041	1.1%
Service districts with LGA mapping	80	84	95.2%
Service districts with ABS SEIFA/age context	80	84	95.2%
Service districts with remoteness context	80	84	95.2%
Service districts with LGA participant context	80	84	95.2%

Code

model_data |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
  group_by(state, service_district) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    utilisation = safe_weighted_mean(utilisation_rate, participant_count),
    payment_per_participant = safe_weighted_mean(payment_per_participant, participant_count),
    participants_per_provider = safe_weighted_mean(participants_per_provider, participant_count),
    payment_share_top10 = safe_mean(payment_share_top10),
    remoteness = safe_mode_character(remoteness),
    irsd_decile = safe_mean(irsd_decile),
    median_age = safe_mean(median_age),
    pct_65_plus = safe_mean(pct_65_plus),
    mapped_lgas = max(mapped_lgas, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(participants)) |>
  slice_head(n = 20) |>
  mutate(
    participants = comma(participants),
    avg_budget = dollar(avg_budget, accuracy = 1),
    utilisation = percent(utilisation, accuracy = 0.1),
    payment_per_participant = dollar(payment_per_participant, accuracy = 1),
    participants_per_provider = number(participants_per_provider, accuracy = 0.1),
    payment_share_top10 = percent(payment_share_top10, accuracy = 0.1),
    irsd_decile = number(irsd_decile, accuracy = 0.1),
    median_age = number(median_age, accuracy = 0.1),
    pct_65_plus = number(pct_65_plus, accuracy = 0.1, suffix = "%"),
    mapped_lgas = comma(mapped_lgas)
  ) |>
  kable()

state	service_district	participants	avg_budget	utilisation	payment_per_participant	participants_per_provider	payment_share_top10	remoteness	irsd_decile	median_age	pct_65_plus	mapped_lgas
NSW	Hunter New England	67,783	$44,696	78.1%	$60,500	6.8	NA	Outer Regional Australia	5.7	41.3	20.6%	22
NSW	South Western Sydney	65,541	$45,184	85.5%	$67,085	9.3	NA	Major Cities	3.5	36.4	14.1%	7
QLD	Brisbane	53,192	$46,187	75.8%	$58,075	17.5	NA	Major Cities	9.0	35.9	13.2%	1
NSW	Western Sydney	49,424	$46,861	85.1%	$69,767	8.6	NA	Major Cities	6.5	35.2	12.0%	4
VIC	Bayside Peninsula	43,744	$42,201	77.1%	$52,385	15.5	NA	Major Cities	9.2	41.2	18.6%	7
SA	Northern Adelaide	41,658	$43,685	75.6%	$59,811	48.5	NA	Major Cities	3.8	37.1	15.6%	4
VIC	Southern Melbourne	36,206	$42,944	82.0%	$61,780	14.6	NA	Major Cities	6.0	34.7	12.0%	3
VIC	Western Melbourne	34,330	$40,186	79.7%	$51,890	17.8	NA	Major Cities	8.2	34.4	10.3%	5
QLD	Beenleigh	33,961	$44,387	76.4%	$59,177	19.2	NA	Major Cities	4.6	37.2	15.3%	2
VIC	North East Melbourne	33,581	$44,364	80.0%	$58,280	11.6	NA	Major Cities	8.0	37.3	14.3%	5
QLD	Caboolture/Strathpine	32,442	$43,681	76.7%	$56,970	16.2	NA	Major Cities	7.0	39.0	17.3%	1
QLD	Robina	29,218	$44,825	77.1%	$59,060	13.8	NA	Major Cities	8.0	39.2	17.3%	1
SA	Southern Adelaide	26,816	$43,756	72.1%	$54,782	28.4	NA	Major Cities	7.5	41.5	20.1%	4
VIC	Brimbank Melton	26,518	$36,866	79.1%	$48,447	37.2	NA	Major Cities	3.9	35.4	12.5%	2
QLD	Maroochydore	26,333	$44,352	75.8%	$55,140	12.5	NA	Inner Regional Australia	7.5	44.6	22.4%	3
NSW	North Sydney	25,978	$49,891	80.9%	$72,548	2.6	NA	Major Cities	10.0	40.6	17.2%	9
VIC	Barwon	25,904	$38,949	71.4%	$46,723	13.8	NA	Inner Regional Australia	8.0	40.2	19.7%	4
NSW	South Eastern Sydney	25,679	$47,941	81.1%	$66,487	3.3	NA	Major Cities	9.0	38.2	16.4%	6
VIC	Outer East Melbourne	25,171	$39,244	75.6%	$49,929	10.2	NA	Major Cities	9.0	40.0	17.2%	3
NSW	Illawarra Shoalhaven	24,407	$43,976	79.7%	$58,256	4.1	NA	Inner Regional Australia	6.7	41.9	20.9%	4

Code

latest_service_geo <- model_data |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE)) |>
  group_by(state, service_district) |>
  summarise(
    participants = sum(participant_count, na.rm = TRUE),
    avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
    utilisation = safe_weighted_mean(utilisation_rate, participant_count),
    payment_per_participant = safe_weighted_mean(payment_per_participant, participant_count),
    participants_per_provider = safe_weighted_mean(participants_per_provider, participant_count),
    payment_share_top10 = safe_mean(payment_share_top10),
    remoteness = safe_mode_character(remoteness),
    mapped_lgas = max(mapped_lgas, na.rm = TRUE),
    example_lgas = safe_mode_character(mapped_lga_names),
    irsd_decile = safe_mean(irsd_decile),
    median_age = safe_mean(median_age),
    pct_65_plus = safe_mean(pct_65_plus),
    .groups = "drop"
  ) |>
  mutate(
    mapped_lgas = if_else(is.infinite(mapped_lgas), NA_real_, as.numeric(mapped_lgas)),
    participant_rank = percent_rank(participants),
    budget_rank = percent_rank(avg_budget),
    payment_rank = percent_rank(payment_per_participant),
    utilisation_rank = percent_rank(utilisation),
    provider_pressure_rank = percent_rank(participants_per_provider),
    concentration_rank = percent_rank(payment_share_top10),
    geography_pressure_score = rowMeans(
      cbind(payment_rank, utilisation_rank, provider_pressure_rank, concentration_rank),
      na.rm = TRUE
    ),
    geography_pressure_score = if_else(is.nan(geography_pressure_score), NA_real_, geography_pressure_score)
  )

if (nrow(latest_service_geo) > 0) {
  latest_service_geo |>
    arrange(desc(geography_pressure_score), desc(participants)) |>
    slice_head(n = 15) |>
    transmute(
      state,
      service_district,
      remoteness,
      mapped_lgas = comma(mapped_lgas),
      participants = comma(participants),
      avg_budget = dollar(avg_budget, accuracy = 1),
      utilisation = percent(utilisation, accuracy = 0.1),
      payment_per_participant = dollar(payment_per_participant, accuracy = 1),
      participants_per_provider = number(participants_per_provider, accuracy = 0.1),
      payment_share_top10 = percent(payment_share_top10, accuracy = 0.1),
      irsd_decile = number(irsd_decile, accuracy = 0.1)
    ) |>
    kable()
} else {
  tibble(note = "No latest-quarter service-district geography profile could be built.") |>
    kable()
}

state	service_district	remoteness	mapped_lgas	participants	avg_budget	utilisation	payment_per_participant	participants_per_provider	payment_share_top10	irsd_decile
NT	Darwin Urban	Outer Regional Australia	6	4,793	$56,479	81.2%	$95,172	97.5	NA	8.7
NT	Central Australia	Very Remote Australia	3	1,181	$116,076	83.4%	$206,752	5.9	NA	5.4
VIC	Southern Melbourne	Major Cities	3	36,206	$42,944	82.0%	$61,780	14.6	NA	6.0
NSW	South Western Sydney	Major Cities	7	65,541	$45,184	85.5%	$67,085	9.3	NA	3.5
NSW	Western Sydney	Major Cities	4	49,424	$46,861	85.1%	$69,767	8.6	NA	6.5
QLD	Beenleigh	Major Cities	2	33,961	$44,387	76.4%	$59,177	19.2	NA	4.6
SA	Northern Adelaide	Major Cities	4	41,658	$43,685	75.6%	$59,811	48.5	NA	3.8
QLD	Robina	Major Cities	1	29,218	$44,825	77.1%	$59,060	13.8	NA	8.0
VIC	North East Melbourne	Major Cities	5	33,581	$44,364	80.0%	$58,280	11.6	NA	8.0
QLD	Caboolture/Strathpine	Major Cities	1	32,442	$43,681	76.7%	$56,970	16.2	NA	7.0
VIC	Hume Moreland	Major Cities	2	23,331	$40,516	82.6%	$55,848	11.2	NA	5.1
QLD	Brisbane	Major Cities	1	53,192	$46,187	75.8%	$58,075	17.5	NA	9.0
VIC	Western Melbourne	Major Cities	5	34,330	$40,186	79.7%	$51,890	17.8	NA	8.2
SA	Eastern Adelaide	Major Cities	7	7,975	$47,136	75.3%	$68,222	8.7	NA	9.0
WA	South East Metro	Major Cities	4	12,801	$54,248	75.9%	$67,261	6.7	NA	6.6

Code

remoteness_profile <- latest_service_geo |>
  filter(!is.na(remoteness))

if (nrow(remoteness_profile) > 0) {
  remoteness_profile |>
    group_by(remoteness) |>
    summarise(
      service_districts = n_distinct(service_district),
      participants = sum(participants, na.rm = TRUE),
      avg_budget = safe_weighted_mean(avg_budget, participants),
      utilisation = safe_weighted_mean(utilisation, participants),
      payment_per_participant = safe_weighted_mean(payment_per_participant, participants),
      participants_per_provider = safe_weighted_mean(participants_per_provider, participants),
      payment_share_top10 = safe_mean(payment_share_top10),
      median_irsd_decile = safe_median(irsd_decile),
      .groups = "drop"
    ) |>
    arrange(median_irsd_decile, desc(participants)) |>
    mutate(
      participants = comma(participants),
      avg_budget = dollar(avg_budget, accuracy = 1),
      utilisation = percent(utilisation, accuracy = 0.1),
      payment_per_participant = dollar(payment_per_participant, accuracy = 1),
      participants_per_provider = number(participants_per_provider, accuracy = 0.1),
      payment_share_top10 = percent(payment_share_top10, accuracy = 0.1),
      median_irsd_decile = number(median_irsd_decile, accuracy = 0.1)
    ) |>
    kable()
} else {
  tibble(note = "Remoteness was not available in the downloaded LGA mapping or participants-by-LGA file for this render.") |>
    kable()
}

remoteness	service_districts	participants	avg_budget	utilisation	payment_per_participant	participants_per_provider	payment_share_top10	median_irsd_decile
Remote Australia	3	8,446	NA	NA	NA	NA	NA	3.5
Outer Regional Australia	17	155,850	NA	NA	NA	NA	NA	4.8
Very Remote Australia	9	65,617	NA	NA	NA	NA	NA	5.1
Inner Regional Australia	22	308,107	NA	NA	NA	NA	47.5%	6.6
Major Cities	29	726,626	NA	NA	NA	NA	NA	7.3

Appendix: richer geography context

Richer Geography Context

The richer geography layer adds two lenses: socioeconomic decile where the ABS join works, and support-class market concentration. Equity-specific signals are left out of this version so the main post stays focused on spend growth mechanics.

Code

seifa_summary <- model_data |>
  filter(quarter_date == max(quarter_date, na.rm = TRUE), !is.na(irsd_decile))

if (nrow(seifa_summary) > 0) {
  seifa_summary |>
    mutate(irsd_decile = round(irsd_decile)) |>
    group_by(irsd_decile) |>
    summarise(
      participants = sum(participant_count, na.rm = TRUE),
      avg_budget = safe_weighted_mean(avg_support_budget, participant_count),
      utilisation = safe_weighted_mean(utilisation_rate, participant_count),
      payment_per_participant = safe_weighted_mean(payment_per_participant, participant_count),
      participants_per_provider = safe_weighted_mean(participants_per_provider, participant_count),
      payment_share_top10 = safe_mean(payment_share_top10),
      .groups = "drop"
    ) |>
    arrange(irsd_decile) |>
    mutate(
      participants = comma(participants),
      avg_budget = dollar(avg_budget, accuracy = 1),
      utilisation = percent(utilisation, accuracy = 0.1),
      payment_per_participant = dollar(payment_per_participant, accuracy = 1),
      participants_per_provider = number(participants_per_provider, accuracy = 0.1),
      payment_share_top10 = percent(payment_share_top10, accuracy = 0.1)
    ) |>
    kable()
} else {
  tibble(note = "SEIFA was not joined in this render, so the SEIFA decile summary is skipped rather than showing an NA-heavy table.") |>
    kable()
}

irsd_decile	participants	avg_budget	utilisation	payment_per_participant	participants_per_provider	payment_share_top10
1	971	$41,837	68.9%	$32,431	14.6	NA
2	14,228	$48,098	75.7%	$66,012	2.4	NA
3	80,224	$44,776	82.9%	$65,833	8.6	NA
4	104,874	$42,036	76.3%	$56,692	32.6	NA
5	177,249	$44,154	74.9%	$58,817	8.4	NA
6	197,676	$42,288	75.6%	$57,515	12.3	NA
7	201,112	$44,378	79.0%	$60,664	7.7	NA
8	200,022	$42,906	76.0%	$54,773	15.1	NA
9	239,409	$44,985	76.1%	$58,183	12.2	NA
10	48,881	$44,640	77.6%	$64,913	37.9	47.5%

Code

market_concentration |>
  filter(market_quarter_date == max(market_quarter_date, na.rm = TRUE), support_class != "ALL") |>
  group_by(support_class) |>
  summarise(
    service_districts = n_distinct(service_district),
    median_top10_share = safe_median(payment_share_top10),
    high_concentration_share = safe_mean(as.numeric(payment_share_top10 >= 0.7)),
    .groups = "drop"
  ) |>
  arrange(desc(median_top10_share)) |>
  mutate(
    median_top10_share = percent(median_top10_share, accuracy = 0.1),
    high_concentration_share = percent(high_concentration_share, accuracy = 0.1)
  ) |>
  kable()

support_class	service_districts	median_top10_share	high_concentration_share
Capital	88	62.0%	35.2%
Core	89	34.0%	13.5%
Capacity Building	89	23.0%	4.5%

Outcomes, Spend and Service Access

The outcomes analysis has been moved into a separate companion post: NDIS Outcomes, Spend and Service Access. That page inspects the regional dashboard workbook schema directly, chooses a small set of high-value outcome families, and treats outcomes as possible investment channels rather than only as cost offsets.

Appendix: average budget and utilisation model detail

Model 1: Average Support Budget

The first model asks whether grouped participant characteristics, region, support class, provider supply, and ABS context explain differences in average support budgets. The expanded model now carries through service district plus joined LGA-level SEIFA, age-structure and population measures where the joins succeed. The test period is the latest available quarter in the study window, so the model is judged on a future quarter rather than a random split.

Code

budget_model_data <- model_data |>
  filter(!is.na(avg_support_budget), !is.na(participant_count), participant_count > 0) |>
  mutate(
    across(c(state, service_district, age_group, disability_group, support_class, remoteness), \(x) fct_na_value_to_level(fct_lump_n(as.factor(x), n = 20), level = "Missing")),
    log_budget = log1p(avg_support_budget),
    log_participants = log1p(participant_count),
    log_providers = log1p(coalesce(active_providers, 0)),
    log_participants_per_provider = log1p(coalesce(participants_per_provider, 0)),
    log_lga_population = log1p(coalesce(lga_population, 0)),
    payment_share_top10 = coalesce(payment_share_top10, median(payment_share_top10, na.rm = TRUE)),
    payment_share_top10 = ifelse(is.na(payment_share_top10) | is.nan(payment_share_top10), 0, payment_share_top10),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) replace_na(x, median(x, na.rm = TRUE))),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) ifelse(is.na(x) | is.nan(x), 0, x))
  )

test_quarter <- max(budget_model_data$quarter_date, na.rm = TRUE)
budget_train <- filter(budget_model_data, quarter_date < test_quarter)
budget_test <- filter(budget_model_data, quarter_date == test_quarter)

budget_baseline <- lm(log_budget ~ quarter_index + log_participants, data = budget_train)

factor_terms <- c("state", "service_district", "age_group", "disability_group", "support_class", "remoteness")
factor_terms <- factor_terms[map_int(budget_train[factor_terms], \(x) n_distinct(x, na.rm = TRUE)) > 1]
numeric_terms <- c("quarter_index", "log_participants", "log_providers", "log_participants_per_provider", "payment_share_top10", "irsd_score", "irsd_decile", "ier_score", "ier_decile", "ieo_score", "ieo_decile", "median_age", "pct_0_14", "pct_15_64", "pct_65_plus", "log_lga_population", "remoteness_rank", "remoteness_major_cities_share", "remoteness_inner_regional_share", "remoteness_outer_regional_share", "remoteness_remote_share", "remoteness_very_remote_share")
numeric_terms <- numeric_terms[map_lgl(budget_train[numeric_terms], \(x) isTRUE(sd(x, na.rm = TRUE) > 0))]
budget_formula <- as.formula(paste("log_budget ~", paste(c(numeric_terms, factor_terms), collapse = " + ")))

budget_expanded <- lm(budget_formula, data = budget_train)

set.seed(20260526)
budget_rf_train <- budget_train |>
  drop_na(log_budget) |>
  slice_sample(n = min(nrow(drop_na(budget_train, log_budget)), 8000))

budget_rf <- randomForest(
  budget_formula,
  data = budget_rf_train,
  ntree = 60,
  importance = TRUE
)

score_regression <- function(actual, predicted) {
  tibble(
    rmse = sqrt(mean((actual - predicted)^2, na.rm = TRUE)),
    mae = mean(abs(actual - predicted), na.rm = TRUE)
  )
}

partial_dependence_data <- function(model, data, features, response_transform = identity,
                                    x_transforms = list(), feature_labels = NULL,
                                    grid_n = 25, sample_n = 1500) {
  pd_sample <- data |>
    slice_sample(n = min(nrow(data), sample_n))

  map_dfr(features, \(feature) {
    if (!feature %in% names(pd_sample) || !is.numeric(pd_sample[[feature]])) {
      return(tibble())
    }

    x <- pd_sample[[feature]]
    x <- x[is.finite(x)]
    if (length(unique(x)) < 2) return(tibble())

    if (length(unique(x)) <= grid_n) {
      grid <- sort(unique(x))
    } else {
      grid <- quantile(x, probs = seq(0.02, 0.98, length.out = grid_n), na.rm = TRUE)
      grid <- sort(unique(as.numeric(grid)))
    }

    x_transform <- x_transforms[[feature]]
    if (is.null(x_transform)) x_transform <- identity
    feature_label <- feature_labels[[feature]]
    if (is.null(feature_label)) feature_label <- feature

    map_dfr(grid, \(value) {
      newdata <- pd_sample
      newdata[[feature]] <- value
      tibble(
        feature = feature,
        feature_label = feature_label,
        feature_value = x_transform(value),
        predicted_response = mean(response_transform(predict(model, newdata = newdata)), na.rm = TRUE)
      )
    })
  })
}

plot_partial_dependence <- function(pdp_data, y_label, line_colour) {
  if (nrow(pdp_data) == 0) {
    return(tibble(note = "No partial dependence data could be computed for the selected features.") |> kable())
  }

  ggplot(pdp_data, aes(feature_value, predicted_response)) +
    geom_line(linewidth = 0.8, colour = line_colour) +
    facet_wrap(~ feature_label, scales = "free_x", labeller = label_wrap_gen(width = 26)) +
    theme(axis.text.x = element_text(size = 8)) +
    labs(x = NULL, y = y_label)
}

budget_scores <- bind_rows(
  score_regression(budget_test$log_budget, predict(budget_baseline, newdata = budget_test)) |> mutate(model = "Baseline linear"),
  score_regression(budget_test$log_budget, predict(budget_expanded, newdata = budget_test)) |> mutate(model = "Expanded linear"),
  score_regression(budget_test$log_budget, predict(budget_rf, newdata = budget_test)) |> mutate(model = "Random forest")
) |>
  select(model, rmse, mae)

budget_scores |> kable(digits = 3)

model	rmse	mae
Baseline linear	1.053	0.843
Expanded linear	0.539	0.398
Random forest	0.289	0.187

Code

budget_predictions <- budget_test |>
  mutate(
    pred_log_budget = predict(budget_rf, newdata = budget_test),
    pred_budget = expm1(pred_log_budget),
    budget_residual = avg_support_budget - pred_budget
  )

ggplot(budget_predictions, aes(pred_budget, avg_support_budget)) +
  geom_abline(linetype = "dashed", colour = "grey50") +
  geom_point(aes(size = participant_count), alpha = 0.35, colour = "#1b6ca8") +
  scale_x_continuous(labels = dollar) +
  scale_y_continuous(labels = dollar) +
  scale_size_area(labels = comma) +
  labs(x = "Predicted average support budget", y = "Observed average support budget", size = "Participants")

Predicted versus observed average support budgets in the holdout quarter.

Code

importance(budget_rf) |>
  as.data.frame() |>
  tibble::rownames_to_column("feature") |>
  arrange(desc(IncNodePurity)) |>
  slice_head(n = 12) |>
  ggplot(aes(reorder(feature, IncNodePurity), IncNodePurity)) +
  geom_col(fill = "#1b6ca8") +
  coord_flip() +
  labs(x = NULL, y = "Random forest importance", title = "Most useful predictors for average support budget")

The clearest partial-dependence shape is provider pressure. In this fitted model, higher participants per active provider is associated with lower predicted average support budgets after a small low-pressure bump. The age-structure curves are weaker, but older local age profiles also lean slightly lower after the rest of the grouped data is averaged out.

Partial dependence: average support budget random forest

These curves hold the rest of the training distribution in place and vary one predictor at a time. They are useful for shape, not causality. Provider pressure is shown on its original scale even though the model uses log1p(participants_per_provider).

Code

budget_pdp <- partial_dependence_data(
  budget_rf,
  budget_rf_train,
  features = c("log_participants_per_provider", "median_age", "pct_65_plus", "irsd_decile", "payment_share_top10"),
  response_transform = expm1,
  x_transforms = list(
    log_participants_per_provider = expm1,
    pct_65_plus = \(x) x * 100,
    payment_share_top10 = \(x) x * 100
  ),
  feature_labels = list(
    log_participants_per_provider = "Participants per active provider",
    median_age = "Median LGA age",
    pct_65_plus = "LGA population aged 65+ (%)",
    irsd_decile = "SEIFA disadvantage decile",
    payment_share_top10 = "Top 10 provider payment share (%)"
  )
)

plot_partial_dependence(budget_pdp, y_label = "Predicted average support budget", line_colour = "#1b6ca8") +
  scale_y_continuous(labels = dollar)

Partial dependence curves for selected average support budget predictors.

Model 2: Plan Utilisation

Utilisation is a bounded rate, so this first model keeps the interpretation simple: it predicts the observed utilisation rate for grouped cells and checks where predictions differ most from the latest quarter. The same joined geography fields used in the budget model are available here when they have non-missing variation.

Code

util_model_data <- model_data |>
  filter(!is.na(utilisation_rate), !is.na(avg_support_budget), utilisation_rate >= 0, utilisation_rate <= 1.5) |>
  mutate(
    utilisation_rate = pmin(utilisation_rate, 1),
    across(c(state, service_district, age_group, disability_group, support_class, remoteness), \(x) fct_na_value_to_level(fct_lump_n(as.factor(x), n = 20), level = "Missing")),
    log_budget = log1p(avg_support_budget),
    log_participants = log1p(coalesce(participant_count, 0)),
    log_providers = log1p(coalesce(active_providers, 0)),
    log_participants_per_provider = log1p(coalesce(participants_per_provider, 0)),
    log_lga_population = log1p(coalesce(lga_population, 0)),
    payment_share_top10 = coalesce(payment_share_top10, median(payment_share_top10, na.rm = TRUE)),
    payment_share_top10 = ifelse(is.na(payment_share_top10) | is.nan(payment_share_top10), 0, payment_share_top10),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) replace_na(x, median(x, na.rm = TRUE))),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) ifelse(is.na(x) | is.nan(x), 0, x))
  )

util_test_quarter <- max(util_model_data$quarter_date, na.rm = TRUE)
util_train <- filter(util_model_data, quarter_date < util_test_quarter)
util_test <- filter(util_model_data, quarter_date == util_test_quarter)

util_baseline <- lm(utilisation_rate ~ quarter_index + log_budget, data = util_train)
util_factor_terms <- c("state", "service_district", "age_group", "disability_group", "support_class", "remoteness")
util_factor_terms <- util_factor_terms[map_int(util_train[util_factor_terms], \(x) n_distinct(x, na.rm = TRUE)) > 1]
util_numeric_terms <- c("quarter_index", "log_budget", "log_participants", "log_providers", "log_participants_per_provider", "payment_share_top10", "irsd_score", "irsd_decile", "ier_score", "ier_decile", "ieo_score", "ieo_decile", "median_age", "pct_0_14", "pct_15_64", "pct_65_plus", "log_lga_population", "remoteness_rank", "remoteness_major_cities_share", "remoteness_inner_regional_share", "remoteness_outer_regional_share", "remoteness_remote_share", "remoteness_very_remote_share")
util_numeric_terms <- util_numeric_terms[map_lgl(util_train[util_numeric_terms], \(x) isTRUE(sd(x, na.rm = TRUE) > 0))]
util_formula <- as.formula(paste("utilisation_rate ~", paste(c(util_numeric_terms, util_factor_terms), collapse = " + ")))

util_expanded <- lm(util_formula, data = util_train)

set.seed(20260526)
util_rf_train <- util_train |>
  drop_na(utilisation_rate) |>
  slice_sample(n = min(nrow(drop_na(util_train, utilisation_rate)), 8000))

util_rf <- randomForest(
  util_formula,
  data = util_rf_train,
  ntree = 60,
  importance = TRUE
)

util_scores <- bind_rows(
  score_regression(util_test$utilisation_rate, predict(util_baseline, newdata = util_test)) |> mutate(model = "Baseline linear"),
  score_regression(util_test$utilisation_rate, predict(util_expanded, newdata = util_test)) |> mutate(model = "Expanded linear"),
  score_regression(util_test$utilisation_rate, predict(util_rf, newdata = util_test)) |> mutate(model = "Random forest")
) |>
  select(model, rmse, mae)

util_scores |> kable(digits = 3)

model	rmse	mae
Baseline linear	0.176	0.131
Expanded linear	0.172	0.127
Random forest	0.163	0.114

Code

util_predictions <- util_test |>
  mutate(
    pred_utilisation = pmin(pmax(predict(util_rf, newdata = util_test), 0), 1),
    utilisation_residual = utilisation_rate - pred_utilisation
  )

ggplot(util_predictions, aes(pred_utilisation, utilisation_rate)) +
  geom_abline(linetype = "dashed", colour = "grey50") +
  geom_point(aes(size = participant_count), alpha = 0.35, colour = "#7a4b9d") +
  scale_x_continuous(labels = percent) +
  scale_y_continuous(labels = percent) +
  scale_size_area(labels = comma) +
  labs(x = "Predicted utilisation", y = "Observed utilisation", size = "Participants")

Predicted versus observed utilisation in the holdout quarter.

Code

importance(util_rf) |>
  as.data.frame() |>
  tibble::rownames_to_column("feature") |>
  arrange(desc(IncNodePurity)) |>
  slice_head(n = 12) |>
  ggplot(aes(reorder(feature, IncNodePurity), IncNodePurity)) +
  geom_col(fill = "#7a4b9d") +
  coord_flip() +
  labs(x = NULL, y = "Random forest importance", title = "Most useful predictors for plan utilisation")

The utilisation PDPs tell a different story from the budget model. Average support budget has the strongest visible positive shape, while participants per provider is close to flat after the rest of the model context is averaged over. That is a useful caution: provider pressure may matter in particular regions or supports, but it is not a simple national monotonic utilisation effect in this grouped model.

Partial dependence: plan utilisation random forest

These plots are most useful for asking access questions. For example, if predicted utilisation falls as participants per provider rises, that is consistent with provider pressure constraining use of plans. It is still not enough to prove access failure, because participant mix and support needs are only partly observed in the public files.

Code

util_pdp <- partial_dependence_data(
  util_rf,
  util_rf_train,
  features = c("log_participants_per_provider", "median_age", "log_budget", "pct_0_14", "irsd_decile", "payment_share_top10"),
  response_transform = \(x) pmin(pmax(x, 0), 1),
  x_transforms = list(
    log_participants_per_provider = expm1,
    log_budget = \(x) expm1(x) / 1000,
    pct_0_14 = \(x) x * 100,
    payment_share_top10 = \(x) x * 100
  ),
  feature_labels = list(
    log_participants_per_provider = "Participants per active provider",
    median_age = "Median LGA age",
    log_budget = "Average support budget ($000s)",
    pct_0_14 = "LGA population aged 0-14 (%)",
    irsd_decile = "SEIFA disadvantage decile",
    payment_share_top10 = "Top 10 provider payment share (%)"
  )
)

plot_partial_dependence(util_pdp, y_label = "Predicted utilisation", line_colour = "#7a4b9d") +
  scale_y_continuous(labels = percent)

Partial dependence curves for selected plan utilisation predictors.

The supporting models help separate the mechanics behind spend growth:

Average support budget: the strongest random-forest predictors are support class, disability group, age group, active providers, participants per active provider.
Plan utilisation: the strongest random-forest predictors are average support budget, disability group, support class, age group, active providers.

Model 3: Overall Spend

The main model asks how well public explanatory factors describe actual payment dollars for each grouped quarterly cell. I exclude lagged payments and lagged budgets, and I also keep current average budget and utilisation out of this model. Those variables are useful components of spend, but including them here would make the model less useful for describing upstream drivers such as participant scale, disability mix, age mix, geography and provider-market context.

Code

spend_model_data <- model_data |>
  filter(
    !is.na(payment_amount), payment_amount > 0,
    !is.na(participant_count), participant_count > 0
  ) |>
  mutate(
    across(c(state, service_district, age_group, disability_group, support_class, remoteness), \(x) fct_na_value_to_level(fct_lump_n(as.factor(x), n = 25), level = "Missing")),
    log_payment = log1p(payment_amount),
    log_participants = log1p(participant_count),
    log_providers = log1p(coalesce(active_providers, 0)),
    log_participants_per_provider = log1p(coalesce(participants_per_provider, 0)),
    log_lga_population = log1p(coalesce(lga_population, 0)),
    payment_share_top10 = coalesce(payment_share_top10, median(payment_share_top10, na.rm = TRUE)),
    payment_share_top10 = ifelse(is.na(payment_share_top10) | is.nan(payment_share_top10), 0, payment_share_top10),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) replace_na(x, median(x, na.rm = TRUE))),
    across(c(irsd_score, irsd_decile, ier_score, ier_decile, ieo_score, ieo_decile, median_age, pct_0_14, pct_15_64, pct_65_plus, remoteness_rank, remoteness_major_cities_share, remoteness_inner_regional_share, remoteness_outer_regional_share, remoteness_remote_share, remoteness_very_remote_share), \(x) ifelse(is.na(x) | is.nan(x), 0, x))
  )

spend_test_quarter <- max(spend_model_data$quarter_date, na.rm = TRUE)
spend_train <- filter(spend_model_data, quarter_date < spend_test_quarter)
spend_test <- filter(spend_model_data, quarter_date == spend_test_quarter)

spend_baseline <- lm(log_payment ~ quarter_index + log_participants, data = spend_train)

spend_factor_terms <- c("state", "service_district", "age_group", "disability_group", "support_class", "remoteness")
spend_factor_terms <- spend_factor_terms[map_int(spend_train[spend_factor_terms], \(x) n_distinct(x, na.rm = TRUE)) > 1]
spend_numeric_terms <- c(
  "quarter_index", "log_participants",
  "log_providers", "log_participants_per_provider", "payment_share_top10",
  "irsd_score", "irsd_decile", "ier_score", "ier_decile", "ieo_score", "ieo_decile",
  "median_age", "pct_0_14", "pct_15_64", "pct_65_plus", "log_lga_population",
  "remoteness_rank", "remoteness_major_cities_share", "remoteness_inner_regional_share",
  "remoteness_outer_regional_share", "remoteness_remote_share", "remoteness_very_remote_share"
)
spend_numeric_terms <- spend_numeric_terms[map_lgl(spend_train[spend_numeric_terms], \(x) isTRUE(sd(x, na.rm = TRUE) > 0))]
spend_expanded_formula <- as.formula(paste("log_payment ~", paste(c(spend_numeric_terms, spend_factor_terms), collapse = " + ")))
spend_expanded <- lm(spend_expanded_formula, data = spend_train)

set.seed(20260526)
spend_rf_train <- spend_train |>
  drop_na(log_payment) |>
  slice_sample(n = min(nrow(drop_na(spend_train, log_payment)), 10000))

spend_rf <- randomForest(
  spend_expanded_formula,
  data = spend_rf_train,
  ntree = 80,
  importance = TRUE
)

spend_scores <- bind_rows(
  score_regression(spend_test$log_payment, predict(spend_baseline, newdata = spend_test)) |> mutate(model = "Participant-scale linear"),
  score_regression(spend_test$log_payment, predict(spend_expanded, newdata = spend_test)) |> mutate(model = "Explanatory linear"),
  score_regression(spend_test$log_payment, predict(spend_rf, newdata = spend_test)) |> mutate(model = "Explanatory random forest")
) |>
  select(model, rmse, mae)

spend_scores |> kable(digits = 3)

model	rmse	mae
Participant-scale linear	1.355	1.176
Explanatory linear	0.501	0.361
Explanatory random forest	0.366	0.270

Code

spend_predictions <- spend_test |>
  mutate(
    pred_log_payment = predict(spend_rf, newdata = spend_test),
    pred_payment = expm1(pred_log_payment),
    payment_residual = payment_amount - pred_payment
  )

ggplot(spend_predictions, aes(pred_payment, payment_amount)) +
  geom_abline(linetype = "dashed", colour = "grey50") +
  geom_point(aes(size = participant_count), alpha = 0.35, colour = "#0b6b57") +
  scale_x_continuous(labels = dollar) +
  scale_y_continuous(labels = dollar) +
  scale_size_area(labels = comma) +
  labs(x = "Predicted grouped payments", y = "Observed grouped payments", size = "Participants")

Predicted versus observed grouped quarterly payments in the holdout quarter.

Code

importance(spend_rf) |>
  as.data.frame() |>
  tibble::rownames_to_column("feature") |>
  arrange(desc(IncNodePurity)) |>
  slice_head(n = 15) |>
  ggplot(aes(reorder(feature, IncNodePurity), IncNodePurity)) +
  geom_col(fill = "#0b6b57") +
  coord_flip() +
  labs(x = NULL, y = "Random forest importance", title = "Most useful predictors for grouped payment spend")

For overall spend, the strongest predictors in the random forest are support class, participant count, disability group, active providers, age group, participants per active provider. Fit is expected to be weaker than a lagged model, but the trade-off is interpretability: the model is forced to lean on observable mix, scale, geography and market variables rather than yesterday’s spend.

The partial dependence curves below are descriptive shape checks. They hold the rest of the training distribution in place and vary one numeric predictor at a time, so they are useful for asking “what does the fitted model tend to do as this factor changes?”, not for claiming causality.

Code

spend_pdp <- partial_dependence_data(
  spend_rf,
  spend_rf_train,
  features = c("log_participants", "log_providers", "log_participants_per_provider", "median_age", "pct_65_plus", "payment_share_top10"),
  response_transform = expm1,
  x_transforms = list(
    log_participants = expm1,
    log_providers = expm1,
    log_participants_per_provider = expm1,
    pct_65_plus = \(x) x * 100,
    payment_share_top10 = \(x) x * 100
  ),
  feature_labels = list(
    log_participants = "Participants",
    log_providers = "Active providers",
    log_participants_per_provider = "Participants per active provider",
    pct_65_plus = "LGA population aged 65+ (%)",
    payment_share_top10 = "Top 10 provider payment share (%)",
    median_age = "Median LGA age"
  )
)

plot_partial_dependence(spend_pdp, y_label = "Predicted grouped payments", line_colour = "#0b6b57") +
  scale_y_continuous(labels = dollar)

Partial dependence curves for selected grouped payment spend predictors.

Reading the Top Predictors

The top categorical predictors are not just statistical conveniences. support_class tells the model what kind of support is being paid for, disability_group captures broad differences in support needs and service patterns, and age_group captures lifecycle differences in both participant numbers and support intensity. The tables below are latest-quarter grouped payment summaries, so they should be read as model-cell summaries rather than unique participant counts.

For numeric predictors, the partial-dependence chart gives the main shape. Participant count is the scale variable: larger grouped cells have higher predicted payments. Active providers also tends to act as a market-size proxy, while participants per provider is more about local pressure or thinness. The age-structure and provider-concentration curves are weaker, but they help the model distinguish otherwise similar regions and support markets.

Support class

Code

format_spend_factor_summary(spend_support_summary) |>
  kable()

category	total_payments	paid_participants	payment_per_paid_participant	grouped_rows	median_cell_payment
Core	$33,376,937,000	445,707	$74,885	4,383	$3,661,000
Capital	$1,110,695,000	65,554	$16,943	2,363	$353,000

Disability group

Code

format_spend_factor_summary(spend_disability_summary) |>
  kable()

category	total_payments	paid_participants	payment_per_paid_participant	grouped_rows	median_cell_payment
Intellectual Disability	$9,013,455,000	89,121	$101,137	875	$2,755,000
Autism	$5,670,942,000	186,734	$30,369	535	$2,952,000
Psychosocial disability	$3,806,680,000	47,455	$80,217	345	$4,756,000
Other Neurological	$3,531,125,000	32,773	$107,745	647	$1,911,000
ABI	$2,885,983,000	22,634	$127,507	505	$2,748,000
Cerebral Palsy	$2,528,100,000	23,565	$107,282	755	$1,239,000
Stroke	$1,444,029,000	13,869	$104,119	337	$1,895,000
Other Physical	$1,239,948,000	24,950	$49,697	584	$964,000
Down Syndrome	$1,056,436,000	8,893	$118,794	312	$2,877,500
Multiple Sclerosis	$1,019,682,000	14,974	$68,097	375	$1,350,000

Age group

Code

format_spend_factor_summary(spend_age_summary) |>
  kable()

category	total_payments	paid_participants	payment_per_paid_participant	grouped_rows	median_cell_payment
55 to 64	$8,055,101,000	85,518	$94,192	1,337	$1,891,000
45 to 54	$5,436,150,000	57,052	$95,284	1,124	$1,563,000
65+	$5,167,903,000	53,947	$95,796	1,087	$1,701,000
25 to 34	$4,821,438,000	51,027	$94,488	678	$1,742,000
35 to 44	$4,459,941,000	44,441	$100,356	873	$1,730,000
19 to 24	$3,322,372,000	47,501	$69,943	451	$1,614,000
9 to 14	$1,485,000,000	86,229	$17,222	466	$603,500
15 to 18	$1,199,646,000	38,795	$30,923	308	$701,500
0 to 8	$540,081,000	46,751	$11,552	422	$355,000

Useful next explanatory data would include functional-capacity measures, plan duration and review timing, informal-support availability, support hours, provider wait times, workforce depth, plan-management type, local therapy availability, and direct measures of unmet demand. Those would help distinguish genuine support need from access friction and administrative timing.

Caveats

These are aggregate public datasets. They do not contain participant-level need, functional capacity, goals, informal supports or local service availability.
A plan budget is not the same thing as a payment, and a payment is not the same thing as a good outcome.
Disability-group payment trends use public aggregate rows. Where a national disability total is not exposed, support-class rows are summed and treated as a directional spend signal rather than a unique participant count.
Utilisation can be low because a participant does not need a support, cannot find a provider, faces administrative barriers, or has timing issues inside a plan. The public data cannot separate all of these explanations.
Regional ABS joins are approximate because NDIS service districts and ABS geographies do not always align perfectly.
Outcomes are handled in the companion post because the dashboard workbook structure and investment interpretation need their own workflow.

Next Steps

The next useful extension would be to connect NDIS service access with MBS/PBS or hospital-pressure indicators, especially for regions where utilisation, provider concentration and outcomes all point in the same direction.