I have 5 variables:
Academicyear,
dept,
InstrGID
curenr
credit_hours
I need the mean of credit hours grouped by Academic year, dept, and InstrGID. The mean must exclude the zeros when "curenr" equals 0. However, the zero values should remain in the newly created dataset. The below script runs the average excluding the zeros, but the created dataset now deletes the 0 values.
How I can create the mean variable without deleting the 0 values from the dataset?
Dataset example:
| Academicyear | Dept | INSTRGID | curenr | Credit_hours | mean_credit_hours |
|---|---|---|---|---|---|
| 2022 | CS | G11111111 | 10 | 20 | 25 |
| 2022 | CS | G11111111 | 10 | 30 | 25 |
| 2023 | GDA | G11111111 | 20 | 10 | 50 |
| 2023 | GDA | G11111111 | 20 | 90 | 50 |
| 2023 | GDA | G22222222 | 0 | 0 | - |
| 2023 | GDA | G22222222 | 20 | 30 | 35 |
| 2023 | GDA | G22222222 | 20 | 40 | 35 |
My current script:
proc sql;
create table avg_cred_hrs as
select *,
mean(Credit_Hours) as mean_credit_hours
from work.QUERY_FOR_XLSTCOUNT
where CURENR >0
group by AcademicYear, dept , INSTRGID;
quit;
Use a
CASEstatement that returns missing whenCURENRis 0, andCredit_Hoursotherwise.mean()will exclude all missing values.