I’ve linked the poster that my team presented at the SURP Symposium below. It gives a quick overview of the project, while the writeup beneath it goes into more depth, so it’s a bit of a pick-your-poison decision.

Intro / Goals

The problem statement for this project was clear: we wanted to make the tedious and often conflict-inducing process of assigning students to project teams easy. The tool had to respect who wants which project and who wants to work together (or not), and keep group sizes even, all at once. We also wanted it to take a survey of project preferences and “who to team with” responses as input, since that was the existing input for this process. The tool would then automatically propose good team assignments and let instructors or admins explore trade-offs instead of guessing in a spreadsheet.

I worked from an existing codebase that had grown out of architecture-design work (Vassar/Jess and MATLAB) from my advisor/mentor's own collegiate work and research. The goal on my side was to adapt and extend it into a usable auto-grouping tool that had better default parameters, clearer handling of group-size constraints, and a more practical workflow. This included making a GUI for loading rules, running the optimizer, and inspecting results.

What the Tool Does

AutoGroup is a MATLAB application that assigns students to project teams using a multi-objective genetic algorithm. It optimizes two objectives at once:

  • Project preference score: how well each student’s assigned project matches their stated preferences (from the survey).
  • Member synergy score: bonuses for placing “preferred teammates” together and penalties for placing “do not team with” pairs in the same group (also derived from the survey).
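As a rough illustration of the second objective, here is a minimal Python sketch (the tool itself is MATLAB, and this is not its code). It assumes a hypothetical square `synergy` matrix where a positive entry means "student i wants to work with student j" and a negative entry means "do not team with"; the actual bonus and penalty values used by AutoGroup are not shown here.

```python
from itertools import combinations

def synergy_score(assignment, synergy):
    """Sum synergy entries over every pair of students placed on the same team.

    assignment[i] is the project index of student i (hypothetical encoding).
    synergy[i][j] is student i's stated feeling about teaming with student j.
    """
    score = 0
    for i, j in combinations(range(len(assignment)), 2):
        if assignment[i] == assignment[j]:
            # Count both directions, since preferences need not be mutual.
            score += synergy[i][j] + synergy[j][i]
    return score

# Student 0 prefers student 1; student 1 does not want to team with student 2.
example_synergy = [
    [0, 1, 0],
    [0, 0, -1],
    [0, 0, 0],
]
together_01 = synergy_score([1, 1, 2], example_synergy)  # students 0 and 1 share a team
together_12 = synergy_score([1, 2, 2], example_synergy)  # students 1 and 2 share a team
```

In this toy case the first assignment earns the bonus and the second incurs the penalty, which is the behavior the objective is meant to reward and punish.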

Group sizes are enforced via a penalty so that teams stay close to an ideal size set by the user (e.g., 4). The solver produces a set of Pareto-optimal assignments, and the user picks one from a scatter plot in the GUI to balance preference satisfaction against team-member synergy.

Input is an Excel rulesheet (students × projects, preferences, and an optional “member_pref” sheet for preferred/do-not-team pairs). Output is the same data plus the Pareto solutions, with interactive selection and export available in the GUI.

Under the hood, the project preference score is computed by a Jess rule engine (Vassar-style). The spreadsheet is turned into rules, and each candidate assignment is asserted as a fact and evaluated. That keeps the logic flexible and consistent with the original research stack.
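To make "Pareto-optimal" concrete, here is a small Python sketch of a non-dominated filter over (preference, synergy) score pairs. It is only an illustration: it assumes both scores are to be maximized, whereas MATLAB's gamultiobj minimizes its objectives, so the real code may negate them. The function name is hypothetical.

```python
def pareto_front(points):
    """Return the points that no other point beats on both objectives.

    Each point is a (preference_score, synergy_score) pair; higher is
    assumed to be better for both.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front

candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(candidates)  # (2, 2) and (1, 1) are dominated and dropped
```

Every point left on the front is a different trade-off between the two scores, which is why the final choice is left to the user.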

Tech Stack

Category               | Technology
Language / environment | MATLAB
Optimization           | Multi-objective GA (gamultiobj), integer chromosome (student → project index)
Preference scoring     | Jess rule engine (Vassar templates.clp, generated prefrules.clp / aggrules.clp)
Input / output         | Excel rulesheets (preferences + optional synergy sheet)
UI                     | MATLAB App (scatter plot, team builder table, load/run/export)

Changes I Made (Contributions)

Below are the main areas of the tool that I improved.

1. Research-based default parameters

When a rulesheet is loaded in the GUI, population size, number of generations, and penalty weight are no longer fixed magic numbers. They are set from formulas tied to the size of the problem (number of students and projects), as suggested in the research materials:

  • Population size: 125 × log10(students) × log10(projects)
  • Generations: 20 × log10(students) × projects
  • Penalty weight: 14 × (students − projects)

So the GA scales with class and project count instead of using one-size-fits-all defaults. This lives in loadRules.m (under the “Continue” branch when loading a new rule file).
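The three formulas above can be sketched in a few lines of Python (an illustration only; the real logic is MATLAB code in loadRules.m). The function name is hypothetical, and rounding to the nearest integer is my assumption, since the formulas do not say how fractional values are handled.

```python
import math

def default_ga_parameters(num_students, num_projects):
    """Compute GA defaults from problem size using the formulas listed above."""
    log_students = math.log10(num_students)
    log_projects = math.log10(num_projects)
    return {
        # Rounding is an assumption; population and generations must be integers.
        "population_size": round(125 * log_students * log_projects),
        "generations": round(20 * log_students * num_projects),
        "penalty_weight": 14 * (num_students - num_projects),
    }

params = default_ga_parameters(100, 10)
# 100 students, 10 projects -> population 250, generations 400, penalty weight 1260
```

One edge case worth noting: with a single project, log10(projects) is 0, so the population formula evaluates to zero and would presumably need a lower bound in practice.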

2. Stronger, cubic penalty for group-size violation

Group-size feasibility is enforced by penalizing solutions whose team sizes deviate from the ideal. I replaced the earlier penalty with a cubic deviation term so that large deviations are punished much more than small ones:

  • For each team, compute deviation from ideal size (only for non-empty teams).
  • Penalty term: (sum(deviation³) + 3) × penalty_weight / number_of_teams.

This is applied both in the main GA loop (GA_autoGroup.m) and in the fitness evaluation (fordff.m). I also added bounds checks so that group indices stay valid (1 to number of projects) and don’t cause silent indexing errors.
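Here is a minimal Python sketch of the penalty term and the bounds check (the actual code is MATLAB, in GA_autoGroup.m and fordff.m). Two things are assumptions on my part: the deviation is taken as an absolute value before cubing, and "number of teams" is read as the number of non-empty teams. The function name is hypothetical.

```python
from collections import Counter

def size_penalty(assignment, num_projects, ideal_size, penalty_weight):
    """Cubic group-size penalty for one candidate assignment.

    assignment[i] is the 1-based project index of student i.
    """
    # Bounds check: fail loudly instead of indexing silently out of range.
    for group in assignment:
        if not 1 <= group <= num_projects:
            raise ValueError(f"group index {group} outside 1..{num_projects}")

    team_sizes = Counter(assignment)  # only non-empty teams appear here
    cubed = sum(abs(size - ideal_size) ** 3 for size in team_sizes.values())
    # Assumption: divide by the count of non-empty teams.
    return (cubed + 3) * penalty_weight / len(team_sizes)

balanced = size_penalty([1, 1, 1, 1, 2, 2, 2, 2], 3, 4, 10)  # sizes 4 and 4
lopsided = size_penalty([1, 1, 1, 1, 1, 1, 2, 2], 3, 4, 10)  # sizes 6 and 2
```

With these inputs the balanced split costs 15.0 and the lopsided one 95.0, showing how quickly the cubic term grows once a team drifts two members from the ideal.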

3. Feasibility when class size doesn’t divide evenly

When the number of students isn’t a multiple of the ideal group size, you can’t have every team exactly at the ideal. The code now treats a solution as feasible when:

  • If there is a remainder (students mod ideal ≠ 0): every non-empty team is either at the ideal size or one member short (ideal − 1), as needed to account for the remainder.
  • If there is no remainder: every non-empty team is exactly the ideal size.

So “one member away” is explicitly allowed when the math requires it. This logic appears in GA_autoGroup.m and evaluate_score.m and is used to mark solutions as feasible and to highlight them in the UI (e.g., stars on the scatter plot).
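The rule can be sketched in Python as follows (illustrative only; the real checks are MATLAB code in GA_autoGroup.m and evaluate_score.m). The function name is hypothetical, and this simplified version only checks that each team size is allowed; it does not verify that the number of short teams is exactly what the remainder requires.

```python
def is_feasible(team_sizes, ideal_size):
    """Check the group-size feasibility rule for one candidate solution.

    team_sizes holds the member count per project; empty teams are ignored.
    """
    non_empty = [size for size in team_sizes if size > 0]
    total_students = sum(non_empty)
    if total_students % ideal_size != 0:
        # Remainder: teams may be at the ideal size or one member short.
        allowed = {ideal_size, ideal_size - 1}
    else:
        # No remainder: every team must hit the ideal size exactly.
        allowed = {ideal_size}
    return all(size in allowed for size in non_empty)

eleven_students = is_feasible([4, 4, 3, 0], 4)  # True: one team is a member short
twelve_uneven = is_feasible([4, 5, 3], 4)       # False: 12 divides evenly by 4
```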

4. GUI and workflow

I extended the GUI so that the user can:

  • Load a rulesheet from a rules/*.xlsx file and have the research-based parameters filled in automatically.
  • See a Team Builder table: rows = students, columns = projects, with optional project ID shortening (e.g., strip suffix after _ for readability).
  • View a scatter plot of the two objectives (preference vs. synergy), with feasible solutions highlighted and click-to-select to see the corresponding assignment in the table.
  • Use a toggle (e.g., “All” vs. filtered) to show only solutions with low penalty when desired.
  • Reset state (clear population, scores, plot, table) before loading new rules or re-running.

Styling in the Team Builder (e.g., green for “correct group size,” blue for preference cells, and labels like “(n=4)” and “(pref=3)”) was added so users can quickly see how well a selected solution satisfies preferences and group sizes. The scatter plot’s click callback finds the nearest point and updates the builder table to that architecture.
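The nearest-point lookup behind click-to-select can be sketched in Python like this (the real callback lives in the MATLAB app). It assumes plain Euclidean distance in score coordinates; the actual callback may scale the axes differently. The function name is hypothetical.

```python
import math

def nearest_point_index(points, click):
    """Return the index of the plotted solution closest to the click location."""
    return min(
        range(len(points)),
        key=lambda i: math.hypot(points[i][0] - click[0], points[i][1] - click[1]),
    )

plotted = [(1.0, 1.0), (5.0, 5.0), (9.0, 2.0)]
selected = nearest_point_index(plotted, (4.2, 4.9))  # closest to the second point
```

The returned index would then be used to look up the matching assignment and refresh the table.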

5. Synergy preprocessing and name matching

The synergy matrix (who wants to work with whom) is built from a “member_pref” sheet (e.g., PreferredTeam, DoNotTeam columns). Names in that sheet often don’t match the main roster exactly, so I used strnearest (Levenshtein-based, with optional case-insensitive matching) to map free-text names to the canonical student list before building the synergy matrix in createSynergy.m. That makes the tool more robust to real-world data with typos or formatting issues.
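To show the idea behind the name matching, here is a small Python sketch of Levenshtein-based nearest-name lookup. It is a stand-in for what strnearest does in the MATLAB code, not a port of it; the function names and the whitespace trimming are my own assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute if different
            ))
        previous = current
    return previous[-1]

def nearest_name(raw_name, roster, case_sensitive=False):
    """Map a free-text name to the closest canonical roster entry."""
    def normalize(text):
        text = text.strip()
        return text if case_sensitive else text.lower()
    return min(roster, key=lambda name: levenshtein(normalize(raw_name), normalize(name)))

roster = ["John Smith", "Jane Smythe", "Joan Smits"]
matched = nearest_name("jon smth", roster)  # two edits away from "John Smith"
```

A production version would likely also want a maximum-distance cutoff so that a name missing from the roster is flagged instead of being silently mapped to the wrong student.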

Workflow / Outcome

In practice, a professor or admin can:

  1. Export survey data into the Excel rulesheet format (and optionally the member_pref sheet).
  2. Open the app, load the rulesheet, and (optionally) tweak ideal group size or penalty.
  3. Run the GA; the app shows the Pareto front and feasible solutions.
  4. Click points on the scatter plot to inspect different trade-offs and pick an assignment.
  5. Export the chosen assignment (e.g., to Excel) for publishing or further editing.

The combination of multi-objective search, explicit feasibility handling, and research-based defaults makes it possible to get good, balanced team assignments without hand-building groups and re-tuning for every new class. The Jess integration keeps the preference logic transparent and easy to extend (e.g., different scoring schemes) without rewriting the whole project.

Conclusion

This project was a massive undertaking for me at an early point in my career, but I consider it one of the greatest growth opportunities I've had thus far. Getting to work hand in hand with a mentor/advisor and build upon the work of another developer (in this case that very same mentor) was an experience that has been valuable in my industry work so far and likely will continue to be. Much of the core work that we developers do amounts to standing on the shoulders of those who came before us and placing another brick on the foundation they laid. This was the most important lesson I took away from this project.