Handling Repeated Courses in GPA Calculation Script in Bash


I'm working on a Bash script to calculate GPA for students, taking into account courses that might have been repeated. My goal is to ensure that if a student retakes a course, only the latest grade is considered in the GPA calculation. Additionally, I want to exclude subjects without a grade, as these represent courses that are currently in progress.

I'm using Bash version 5, so associative arrays are available to me, and I've attempted to utilize them to track the latest attempt for each course.

However, my script is not functioning as expected—it considers both attempts of a repeated course instead of just the latest one. Below is the relevant portion of my script:

#!/bin/bash

# Define a function to convert grades to points
grade_to_points() {
  case $1 in
    A) echo 4 ;;
    A-) echo 3.7 ;;
    B+) echo 3.3 ;;
    B) echo 3 ;;
    B-) echo 2.7 ;;
    C+) echo 2.3 ;;
    C) echo 2 ;;
    D) echo 1 ;;
    F) echo 0 ;;
    *) echo -1 ;; # for subjects with no grade
  esac
}

# Extract the list of subject files and student IDs
subject_files=()
student_ids=()
read_subjects=true
for arg in "$@"; do
  if [[ $arg == "student" ]]; then
    read_subjects=false
    continue
  fi
  if $read_subjects; then
    subject_files+=($arg)
  else
    student_ids+=($arg)
  fi
done

# Loop through each student ID to generate the transcript
for student_id in "${student_ids[@]}"; do
  # Get the student's name from student.dat
  student_name=$(grep "^$student_id" student.dat | cut -d ' ' -f 2-)
  echo "Transcript for $student_id $student_name"
  
  total_points=0
  subjects_count=0
  
  # Loop through each subject file to find grades for the student
  for subject_file in "${subject_files[@]}"; do
    if grep -q "^$student_id" $subject_file; then
      # Extract subject details and grade
      subject_code=$(head -n 1 $subject_file | cut -d ' ' -f 2)
      academic_year=$(head -n 1 $subject_file | cut -d ' ' -f 3)
      semester=$(head -n 1 $subject_file | cut -d ' ' -f 4)
      grade=$(grep "^$student_id" $subject_file | awk '{print ($2=="") ? "" : $2}')
      
      # Print subject details
      echo "$subject_code $academic_year Sem $semester $grade"
      
      # Calculate GPA if grade is present
      if [[ $grade != "" ]]; then
        points=$(grade_to_points $grade)
        if [[ $points != -1 ]]; then
          total_points=$(echo "$total_points + $points" | bc)
          ((subjects_count++))
        fi
      fi
    fi
  done
  
  # Calculate and print GPA if there are graded subjects
  if [[ $subjects_count -gt 0 ]]; then
    gpa=$(echo "scale=2; $total_points / $subjects_count" | bc)
    echo "GPA for $subjects_count subjects $gpa"
  else
    echo "No graded subjects found."
  fi
  echo # New line for separation
done

Wrong Output I get for Input:

transcript COMP* student 1236 1234 1223
Transcript for 1236 peter
COMP1011 2021 Sem 2 A
COMP2411 2022 Sem 1 A
COMP2432 2022 Sem 2 
GPA for 2 subjects 4.00

Transcript for 1234 john
COMP1011 2021 Sem 2 B
COMP2411 2022 Sem 1 B-
GPA for 2 subjects 2.85

Transcript for 1223 bob
COMP1011 2021 Sem 2 F
COMP1011 2022 Sem 1 B
COMP2411 2022 Sem 1 C+
COMP2432 2022 Sem 2 
GPA for 3 subjects 1.76

Correct Output I should get for Input:

transcript COMP* student 1236 1234 1223
Transcript for 1236 peter
COMP1011 2021 Sem 2 A
COMP2411 2022 Sem 1 A
COMP2432 2022 Sem 2
GPA for 2 subjects 4.00

Transcript for 1234 john
COMP1011 2021 Sem 2 B
COMP2411 2022 Sem 1 B
GPA for 2 subjects 2.85

Transcript for 1223 bob
COMP1011 2021 Sem 2 F
COMP1011 2022 Sem 1 B
COMP2411 2022 Sem 1 C+
COMP2432 2022 Sem 2
GPA for 2 subjects 2.65

Files and content I used:

student.dat

1223 bob
1224 kevin
1225 stuart
1226 otto
1234 john
1235 mary
1236 peter
1237 david
1238 alice

COMP101121S2.dat

Subject COMP1011 2021 2
1223 F
1234 B
1235 B+
1236 A

COMP101122S1.dat 

Subject COMP1011 2022 1
1223 B
1224 B+
1225 B
1238 C+

COMP241122S1.dat 

Subject COMP2411 2022 1
1223 C+
1234 B
1235 B
1236 A

COMP243222S2.dat

Subject COMP2432 2022 2
1223
1235
1236
1237

Here's what I've tried:

  1. Using associative arrays to track the most recent attempt for each course. Unfortunately, this doesn't seem to be working as expected.

  2. Ensuring my Bash version supports associative arrays, which it does since I'm on version 5.

My requirements are:

  1. If a student fails and retakes the same subject, consider only the latest grade in the GPA calculation.

  2. Ignore failed grades if a retake is present, but include them if there's no retake.

  3. Do not count subjects with no grade, as they represent ongoing courses.

Can anyone advise on how to modify my script to meet these requirements? Any help would be greatly appreciated!


There are 2 best solutions below

Ed Morton

Here is a significant starting point for how to do this using GNU awk for arrays of arrays, gensub(), PROCINFO["sorted_in"], and \S/\s:

$ cat tst.sh
#!/usr/bin/env bash

awk '
    function grade_to_points(grade) {
        if ( dfltPts == "" ) {
            grade2pts["A"]  = 4
            grade2pts["A-"] = 3.7
            grade2pts["B+"] = 3.3
            grade2pts["B"]  = 3
            grade2pts["B-"] = 2.7
            grade2pts["C+"] = 2.3
            grade2pts["C"]  = 2
            grade2pts["D"]  = 1
            grade2pts["F"]  = 0
            dfltPts         = -1 # for subjects with no grade
        }
        return ( grade in grade2pts ? grade2pts[grade] : dfltPts )
    }

    NR == FNR {
        studId = $1
        studName = gensub(/^\S+\s+/,"",1)
        studId2Name[studId] = studName
        next
    }
    FNR == 1 {
        courseSubj = $2
        courseYear = $3
        courseSem  = $4
        next
    }
    {
        studId = $1
        grade = $2
        grades[studId][courseSubj][courseYear][courseSem] = grade
    }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for ( id in grades ) {
            totPoints = totSubjs = 0
            print "Transcript for", id, studId2Name[id]
            for ( subj in grades[id] ) {
                totSubjs ++
                for ( year in grades[id][subj] ) {
                    for ( sem in grades[id][subj][year] ) {
                        grade = grades[id][subj][year][sem]
                        totPoints += grade_to_points(grade)
                        print subj, year, "Sem", sem, grade
                    }
                }
            }
            gpa = ( totSubjs ? totPoints / totSubjs : 0 )
            printf "GPA for %d subjects %0.2f\n\n", totSubjs, gpa
        }
    }
' student.dat COMP101121S2.dat COMP101122S1.dat COMP241122S1.dat COMP243222S2.dat

$ ./tst.sh
Transcript for 1223 bob
COMP1011 2021 Sem 2 F
COMP1011 2022 Sem 1 B
COMP2411 2022 Sem 1 C+
COMP2432 2022 Sem 2
GPA for 3 subjects 1.43

Transcript for 1224 kevin
COMP1011 2022 Sem 1 B+
GPA for 1 subjects 3.30

Transcript for 1225 stuart
COMP1011 2022 Sem 1 B
GPA for 1 subjects 3.00

Transcript for 1234 john
COMP1011 2021 Sem 2 B
COMP2411 2022 Sem 1 B
GPA for 2 subjects 3.00

Transcript for 1235 mary
COMP1011 2021 Sem 2 B+
COMP2411 2022 Sem 1 B
COMP2432 2022 Sem 2
GPA for 3 subjects 1.77

Transcript for 1236 peter
COMP1011 2021 Sem 2 A
COMP2411 2022 Sem 1 A
COMP2432 2022 Sem 2
GPA for 3 subjects 2.33

Transcript for 1237 david
COMP2432 2022 Sem 2
GPA for 1 subjects -1.00

Transcript for 1238 alice
COMP1011 2022 Sem 1 C+
GPA for 1 subjects 2.30

It doesn't do what you want regarding ignoring failed grades if there's a retake or ignoring subjects with no grade (got to leave something for you to do!) but hopefully you'll find the above much easier to understand and modify to do whatever it is you want to do than your existing bash script. If you can't figure out how to do it yourself you can always ask a new question using an awk script as your code sample instead of your bash script.
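For anyone landing here later, below is one possible way to handle the two parts left as an exercise, as a rough sketch rather than a drop-in replacement for the script above. It assumes the input has been flattened to "id subject year sem [grade]" records on stdin (not the per-subject file layout from the question), it treats the attempt with the highest year/semester as the latest one, and the function name latest_gpa is made up for illustration. It uses only POSIX awk features.

```bash
#!/usr/bin/env bash
# Sketch only. Reads flattened records "id subject year sem [grade]" on stdin
# (an assumed layout, not the per-subject files from the question).
latest_gpa() {
    awk '
        BEGIN {
            pts["A"] = 4; pts["A-"] = 3.7; pts["B+"] = 3.3
            pts["B"] = 3; pts["B-"] = 2.7; pts["C+"] = 2.3
            pts["C"] = 2; pts["D"]  = 1;   pts["F"]  = 0
        }
        !($5 in pts) { next }               # missing or unknown grade: skip it
        {
            key  = $1 SUBSEP $2             # student id + subject code
            when = $3 * 10 + $4             # year and semester as one number
            if ( !(key in latest) || when > latest[key] ) {
                latest[key] = when          # remember the newest attempt seen
                grade[key]  = $5
            }
        }
        END {
            for ( key in grade ) {
                split(key, k, SUBSEP)
                tot[k[1]] += pts[grade[key]]
                cnt[k[1]]++
            }
            for ( id in cnt )
                printf "%s GPA for %d subjects %.2f\n", id, cnt[id], tot[id] / cnt[id]
        }
    ' | sort
}

latest_gpa <<'EOF'
1223 COMP1011 2021 2 F
1223 COMP1011 2022 1 B
1223 COMP2411 2022 1 C+
1223 COMP2432 2022 2
1234 COMP1011 2021 2 B
1234 COMP2411 2022 1 B
EOF
```

Because the newest attempt is chosen by comparing year/semester rather than by input order, the records can arrive in any order. A failed grade with no later retake is still counted, since it is then the latest attempt for that subject.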

markp-fuso

Focusing solely on a 'quick fix' for OP's current code so that GPA is only calculated on the latest grade for a course that's been taken more than once ...


OP's current code fails to ignore an earlier grade when the student retakes the same course (and presumably gets a better grade). In other words, the current code calculates GPA on all grades for that course instead of just the latest one.

One idea for keeping only the latest grade for a given course:

  • for a given student ...
  • process all grades in year/semester order (for OP's example it looks like printf "%s\n" COMP* | sort -V should be sufficient to ensure files are processed in year/semester order)
  • save grades in an array that is indexed on the course (aka subject_code) (eg, grades[COMP1011]='F')
  • when a grade is processed for a course that's been retaken, the new grade will overwrite the older grade for the same course (eg, grades[COMP1011]='B' will overwrite the previous grades[COMP1011]='F')
  • once the grades[] array has been processed for all of a student's courses then calculate GPA based on the contents of the array
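The overwrite behaviour described in the bullets above can be seen in isolation with a tiny sketch. The inline records are student 1223's grades from the question, already listed in year/semester order; feeding them through a here-document is just for the demo and is not how the real script reads its data.

```bash
#!/usr/bin/env bash
# Minimal sketch: a later attempt overwrites the earlier one for the same course.
declare -A grades=()

while read -r subject_code grade; do
    grades[$subject_code]=$grade      # same key, so the newer grade replaces the older
done <<'EOF'
COMP1011 F
COMP1011 B
COMP2411 C+
COMP2432
EOF

# Associative arrays have no defined iteration order, hence the sort.
for subject_code in "${!grades[@]}"; do
    printf '%s -> %s\n' "$subject_code" "${grades[$subject_code]:-(no grade)}"
done | sort
```

This should leave three entries, with COMP1011 mapped to B rather than F and COMP2432 holding an empty string for the in-progress course.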

Modifications to OP's current code:

Ensure input files are sorted in year/semester order

############
### replace this:

for subject_file in "${subject_files[@]}"; do
...
done

### with this:

while read -r subject_file; do
...
done < <( printf "%s\n" "${subject_files[@]}" | sort -V )

NOTE: assumes no file names contain embedded linefeeds; this appears to be the case with OP's data so I'm ignoring this as a potential issue
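If file names with embedded linefeeds ever did become a concern, one option would be a NUL-delimited variant of the same loop. This is only a sketch and it assumes GNU sort (for -z and -V); the sorted_files array and the three sample file names are there just to make the example self-contained.

```bash
#!/usr/bin/env bash
# Sketch: NUL-delimited version of the sorted read loop (assumes GNU sort).
subject_files=(COMP241122S1.dat COMP101122S1.dat COMP101121S2.dat)   # sample names only

sorted_files=()
while IFS= read -r -d '' subject_file; do
    sorted_files+=("$subject_file")   # the real script would process the file here
done < <( printf '%s\0' "${subject_files[@]}" | sort -zV )

printf '%s\n' "${sorted_files[@]}"
```

For the sample names this should print COMP101121S2.dat, COMP101122S1.dat, COMP241122S1.dat, ie, the two COMP1011 attempts in year/semester order.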

Populate the grades[] (associative) array

############
### replace this:

for student_id in "${student_ids[@]}"; do

### with this:

unset      grades
declare -A grades

for student_id in "${student_ids[@]}"; do
    grades=()          

############
### after subject_code and grade have been determined add the following:

grades[${subject_code}]="${grade}"

Move the points/total_points/subjects_count calculations out of the loop over the subject files (they stay inside the for student_id in "${student_ids[@]}" loop) and place them before the if/else/fi block where gpa is calculated and printed. Feed the points/total_points/subjects_count calculations from the grades[] array

############
### new block of code:

for grade in "${grades[@]}"
do
   if [[ $grade != "" ]]; then
      points=$(grade_to_points $grade)
      if [[ $points != -1 ]]; then
        total_points=$(echo "$total_points + $points" | bc)
        ((subjects_count++))
      fi
   fi
done

Rolling all of these changes into OP's current script:

#!/bin/bash

# Define a function to convert grades to points
grade_to_points() {
  case $1 in
    A) echo 4 ;;
    A-) echo 3.7 ;;
    B+) echo 3.3 ;;
    B) echo 3 ;;
    B-) echo 2.7 ;;
    C+) echo 2.3 ;;
    C) echo 2 ;;
    D) echo 1 ;;
    F) echo 0 ;;
    *) echo -1 ;; # for subjects with no grade
  esac
}

# Extract the list of subject files and student IDs
subject_files=()
student_ids=()
read_subjects=true
for arg in "$@"; do
  if [[ $arg == "student" ]]; then
    read_subjects=false
    continue
  fi
  if $read_subjects; then
    subject_files+=($arg)
  else
    student_ids+=($arg)
  fi
done

unset      grades                                                      # new
declare -A grades                                                      # new

# Loop through each student ID to generate the transcript
for student_id in "${student_ids[@]}"; do
  grades=()                                                            # new; reset array for new student

  # Get the student's name from student.dat
  student_name=$(grep "^$student_id" student.dat | cut -d ' ' -f 2-)
  echo "Transcript for $student_id $student_name"

  total_points=0
  subjects_count=0

  # Loop through each subject file to find grades for the student
  while read -r subject_file; do                                       # modified
    if grep -q "^$student_id" $subject_file; then
      # Extract subject details and grade
      subject_code=$(head -n 1 $subject_file | cut -d ' ' -f 2)
      academic_year=$(head -n 1 $subject_file | cut -d ' ' -f 3)
      semester=$(head -n 1 $subject_file | cut -d ' ' -f 4)
      grade=$(grep "^$student_id" $subject_file | awk '{print ($2=="") ? "" : $2}')

      # Print subject details
      echo "$subject_code $academic_year Sem $semester $grade"

      grades[${subject_code}]="${grade}"                               # new

    fi
  done < <( printf "%s\n" "${subject_files[@]}" | sort -V )            # modified

  for grade in "${grades[@]}"                                          # new
  do                                                                   # |
     if [[ $grade != "" ]]; then                                       # moved
        points=$(grade_to_points $grade)                               # |
        if [[ $points != -1 ]]; then                                   # |
          total_points=$(echo "$total_points + $points" | bc)          # |
          ((subjects_count++))                                         # |
        fi                                                             # |
     fi                                                                # |
  done                                                                 # new

  # Calculate and print GPA if there are graded subjects
  if [[ $subjects_count -gt 0 ]]; then
    gpa=$(echo "scale=2; $total_points / $subjects_count" | bc)
    echo "GPA for $subjects_count subjects $gpa"
  else
    echo "No graded subjects found."
  fi
  echo # New line for separation
done

Taking for a test drive with OP's example run:

$ ./transcript2 COMP* student 1236 1234 1223
Transcript for 1236 peter
COMP1011 2021 Sem 2 A
COMP2411 2022 Sem 1 A
COMP2432 2022 Sem 2
GPA for 2 subjects 4.00

Transcript for 1234 john
COMP1011 2021 Sem 2 B
COMP2411 2022 Sem 1 B
GPA for 2 subjects 3.00

Transcript for 1223 bob
COMP1011 2021 Sem 2 F
COMP1011 2022 Sem 1 B
COMP2411 2022 Sem 1 C+
COMP2432 2022 Sem 2
GPA for 2 subjects 2.65

NOTES:

  • while the script now generates OP's desired output there are several performance issues with the current/new script
  • two major performance issues are a) scanning files more than once and b) excessive subshell invocations (eg, $(grep/head ... | awk/cut ...))
  • a complete rewrite of the script to eliminate these two performance issues should improve performance by an order of magnitude for the sample data set (even more for larger data sets)
  • performance could be improved further with a complete rewrite in a language/tool more suited to text processing (eg, awk, python, perl, etc)
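
As a rough illustration of the subshell point, here is a sketch of the grade lookup and GPA arithmetic done entirely inside bash, with no $(grade_to_points ...) or bc calls. It is not a rewrite of the whole script: the tenths array, the gpa_of function and the idea of storing points as tenths (A- = 37) are all assumptions made for this example. The final division truncates to two decimals, as bc does with scale=2.

```bash
#!/usr/bin/env bash
# Sketch: grade points as integer tenths, so bash arithmetic replaces bc.
declare -A tenths=( ["A"]=40 ["A-"]=37 ["B+"]=33 ["B"]=30 ["B-"]=27
                    ["C+"]=23 ["C"]=20 ["D"]=10 ["F"]=0 )

# gpa_of GRADE...  sets the global variables gpa and subjects_count;
# returns non-zero when no graded subject was found.
gpa_of() {
    local grade total=0 hundredths
    subjects_count=0
    gpa=""
    for grade in "$@"; do
        [[ -n $grade ]] || continue                    # in-progress course
        [[ -n ${tenths[$grade]+set} ]] || continue     # unknown grade
        total=$(( total + ${tenths[$grade]} ))
        subjects_count=$(( subjects_count + 1 ))
    done
    (( subjects_count > 0 )) || return 1
    hundredths=$(( total * 10 / subjects_count ))      # truncates, like bc scale=2
    printf -v gpa '%d.%02d' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Student 1223's latest grades; the empty string stands for in-progress COMP2432.
gpa_of B C+ ""
echo "GPA for $subjects_count subjects $gpa"
```

For these inputs the last line should read "GPA for 2 subjects 2.65". In the full script the arguments would come from "${grades[@]}".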