Excel Python How to group names alphabetically section wise such that each group has equal number of Alphabets

Question

I'm working with a university and have an Excel dummy sheet with 879 students on it that I need to analyze some data within Python. Several columns in this Excel document include:

Student name
Section
% of attendance
Enrollment No.

But the names of the students are not correctly arranged alphabetically.

I wish to divide the students evenly by "Section" so that each Section has an equal amount of students whose names begin with each letter.

I've tried sorting the data in Excel alphabetically, but it results in the opposite of what I want: an alphabetical order for the entire column. Instead, I would prefer the data to be organized "Section"-wise, with students in each section whose names begin with each alphabet, or with none if all the names beginning with that alphabet have already been used in prior sections. There would be an equal number of students in each section, if not nearly so.

For example:
the dataset 12 sections with 879 students:
Section A has 74 students
Section B has 74 students
Section C has 74 students
Section D has 73 students
Section E has 73 students
Section F has 73 students
Section G has 73 students
Section H has 73 students
Section I has 73 students
Section J has 73 students
Section K has 73 students
Section L has 73 students

Number of students having first character A is 89
Number of students having first character B is 47
Number of students having first character C is 7
    :         :            :              :
    :         :            :              :  
Number of students having first character Y is 1
Number of students having first character Z is 2

My goal:

Section A will have: 
 - 7 students whose name start with A (89/12 = 7 students)
 - 3 students whose name start with B (47/12 = 3 students) 
 - 1 student whose name start with C  (As 7<12, so cant put all students in all sections, so 1)
    :         :            :             
    :         :            :                

 - 1 student whose name start with Y   
 - 1 student whose name start with Z

Section B will have: 
- 7 students whose name start with A
  :    :      :     :     :    :
  :    :      :     :     :    :
- 0 students whose name start with Y  (As no student left with alphabet starting with Y)
- 1 student whose name start with Z

Is there a way to achieve this using Excel or Python Pandas Library query?
Here is my Excel Sheet

narikkadan · Answer 1 · Dec 10, 2022

Here is a flimsy answer. The number of times a specific name appears among all names beginning with the same letter should be compared to the number of names beginning with that letter that should be in each section before assigning a section number. :

=LET(names,E5:E883,
rows,ROWS(names),
sections,12,
seq,SEQUENCE(rows,1,0),
startrow,XLOOKUP(LEFT(names,1)&"*",names,seq,,2),
counts,COUNTIF(names,LEFT(names,1)&"*"),
countspersection,counts/sections,
occurrence,seq-startrow,
section,QUOTIENT(occurrence+0.5,countspersection),
section)

Then sort on the assigned section number:

=LET(names,E5:E883,
rows,ROWS(names),
sections,12,
seq,SEQUENCE(rows,1,0),
startrow,XLOOKUP(LEFT(names,1)&"*",names,seq,,2),
counts,COUNTIF(names,LEFT(names,1)&"*"),
countspersection,counts/sections,
occurrence,seq-startrow,
section,QUOTIENT(occurrence+0.5,countspersection),
SORTBY(names,section,1,names,1))

The size of each section will vary depending on how many of the numbers assigned are underestimates and how many overestimates compared to the theoretical value (for example, there should be 7.4 A's in each section, but in practice, there can only be 7 or 8). This is because the number of students can only be a whole number. I've done some research on this, and the results show that the group sizes are as follows:

It's quite difficult to get both the group sizes and numbers per letter exactly right - you would need an iterative method to take this further I think.