I'm working with a university and have an Excel dummy sheet with 879 students on it that I need to analyze some data within Python. Several columns in this Excel document include:

1. Student name
2. Section
3. % of attendance
4. Enrollment No.

But the names of the students are not correctly arranged alphabetically.

I wish to divide the students evenly by "Section" so that each Section has an equal amount of students whose names begin with each letter.

I've tried sorting the data in Excel alphabetically, but it results in the opposite of what I want: an alphabetical order for the entire column. Instead, I would prefer the data to be organized "Section"-wise, with students in each section whose names begin with each alphabet, or with none if all the names beginning with that alphabet have already been used in prior sections. There would be an equal number of students in each section, if not nearly so.

```For example:
the dataset 12 sections with 879 students:
Section A has 74 students
Section B has 74 students
Section C has 74 students
Section D has 73 students
Section E has 73 students
Section F has 73 students
Section G has 73 students
Section H has 73 students
Section I has 73 students
Section J has 73 students
Section K has 73 students
Section L has 73 students

Number of students having first character A is 89
Number of students having first character B is 47
Number of students having first character C is 7
:         :            :              :
:         :            :              :
Number of students having first character Y is 1
Number of students having first character Z is 2
```

My goal:

```Section A will have:
- 7 students whose name start with A (89/12 = 7 students)
- 3 students whose name start with B (47/12 = 3 students)
- 1 student whose name start with C  (As 7<12, so cant put all students in all sections, so 1)
:         :            :
:         :            :

Section B will have:
:    :      :     :     :    :
:    :      :     :     :    :
- 0 students whose name start with Y  (As no student left with alphabet starting with Y)

Is there a way to achieve this using Excel or Python Pandas Library query?
Here is my Excel Sheet

Dec 9, 2022 in Others 66 views

## 1 answer to this question.

Here is a flimsy answer. The number of times a specific name appears among all names beginning with the same letter should be compared to the number of names beginning with that letter that should be in each section before assigning a section number. :

```=LET(names,E5:E883,
rows,ROWS(names),
sections,12,
seq,SEQUENCE(rows,1,0),
startrow,XLOOKUP(LEFT(names,1)&"*",names,seq,,2),
counts,COUNTIF(names,LEFT(names,1)&"*"),
countspersection,counts/sections,
occurrence,seq-startrow,
section,QUOTIENT(occurrence+0.5,countspersection),
section)
```

Then sort on the assigned section number:

```=LET(names,E5:E883,
rows,ROWS(names),
sections,12,
seq,SEQUENCE(rows,1,0),
startrow,XLOOKUP(LEFT(names,1)&"*",names,seq,,2),
counts,COUNTIF(names,LEFT(names,1)&"*"),
countspersection,counts/sections,
occurrence,seq-startrow,
section,QUOTIENT(occurrence+0.5,countspersection),
SORTBY(names,section,1,names,1))
``` The size of each section will vary depending on how many of the numbers assigned are underestimates and how many overestimates compared to the theoretical value (for example, there should be 7.4 A's in each section, but in practice, there can only be 7 or 8). This is because the number of students can only be a whole number. I've done some research on this, and the results show that the group sizes are as follows: It's quite difficult to get both the group sizes and numbers per letter exactly right - you would need an iterative method to take this further I think.

• 53,520 points

## Python - how to read contents of an excel file

xlwings is an excellent library to interact with ...READ MORE

## Group by Sum in Excel

It is very easy and you can ...READ MORE

## What are the RGB and HEX codes of the 3 color conditional format in Excel?

In Excel 2016 at least the colors ...READ MORE

## Convert column in excel date format (DDDDD.tttt) to datetime using pandas

Given # s = df['date'] s 0 ...READ MORE