IPE-24 Classroom

Production

2025-06-15 · 18 min

Next.js, PostgreSQL, Prisma, Redis, Telegram, FastAPI

IPE-24 Classroom: Building a Zero-Cost Operating System for a University Batch

Executive Summary

IPE-24 Classroom started as a class portal, but that description is too small for what the system actually became. The real problem was not "students need a website." The real problem was that one batch of students was coordinating academic life across too many inconsistent channels: verbal updates, WhatsApp threads, Google Drive folders, routine changes, exam notices, Discord messages, and Class Representative memory.

The system had to deliver three things:

One trusted source of truth for class state.
Fast publishing across multiple communication channels.
Strong enough guardrails that the wrong person, wrong request, or wrong automation could not corrupt the batch's academic workflow.

What follows is the real case study: the problem, the system I designed, every major feature, the technical hassle behind each layer, the security posture, and the recurring architectural patterns I learned to recognize.

The Problem I Was Actually Solving

At surface level, the product looks familiar: announcements, routine, resources, exams, chatbot, admin dashboard.

At system level, the product solves a much nastier problem: academic coordination in an environment where information changes frequently, trust is role-sensitive, and the communication surface is fragmented.

The class had several recurring pain points:

Announcements were getting buried in chat streams.
Routine changes were temporal, not permanent, so a simple timetable table was never enough.
Files existed in Google Drive, but discoverability was weak and folder hygiene degraded over time.
Exam information was time-sensitive and needed structured metadata, not just message text.
The Class Representative needed to publish quickly from mobile, not from a laptop admin panel every time.
Students kept asking the same contextual questions: what classes are today, what changed this week, where is the file, what exam is next.

The product therefore had to behave less like a content site and more like a lightweight academic operations platform.

That shift changed every design decision.

The First Important Realization: This Was Not a Website

The first serious architecture lesson was that the website could not be the system. It could only be one projection of the system.

The actual system had four planes:

| Plane | Responsibility |
|---|---|
| Authoritative state | PostgreSQL via Prisma for announcements, exams, routines, files, users, audit, knowledge, notifications |
| Operator interface | Next.js admin panel plus Telegram command center for the CR |
| Distribution | Website, push notifications, Discord, WhatsApp |
| Derived acceleration | Redis cache, SSE invalidation, service worker cache, IndexedDB, AI context assembly |

Once I started thinking in those planes, the codebase got more coherent. Features stopped being isolated pages and became state transitions with downstream fan-out.

That mental model is the backbone of the whole product.

Architecture in Its Final Shape

The system lives in a monorepo, but operationally it is split between Vercel and a self-hosted VPS:

apps/web: Next.js 14 app serving the student UI, admin UI, and versioned API routes.
apps/bot: WhatsApp delivery service using Baileys.
services/telegram-bot: the CR's command center, including classification and approval flow.
services/discord-bot: outbound Discord announcement delivery.
services/discord-listener: inbound Discord ingestion for approved messages and knowledge capture.
services/transcriber: Python FastAPI service for faster-whisper transcription and text embeddings.
PostgreSQL 16 + pgvector: durable state plus semantic search substrate.
Redis: rate limiting, cache, pub/sub, and event fan-out.

The most important boundary is that the web app is the authority, while the bots are controlled writers through internal APIs. That means the bots do not mutate the database directly. They authenticate with an internal secret and write through narrowly scoped routes like:

/api/v1/internal/announcements
/api/v1/internal/exams
/api/v1/internal/routine/overrides
/api/v1/internal/files

This was a deliberate decision. It kept business rules centralized, cache invalidation consistent, and auditing possible.
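A minimal sketch of what the shared-secret gate on those internal routes could look like. The header name `x-internal-secret` and the helper names are assumptions for illustration; the real routes may differ. The comparison avoids an early exit so that timing does not leak how much of the secret matched.

```typescript
// Hypothetical header name; the case study only says "internal secret".
const INTERNAL_SECRET_HEADER = "x-internal-secret";

// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  const length = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

type HeaderBag = Record<string, string | undefined>;

// Returns true only when the caller presented the expected secret.
function isInternalCaller(headers: HeaderBag, expectedSecret: string): boolean {
  if (expectedSecret.length === 0) return false; // refuse to run unconfigured
  const presented = headers[INTERNAL_SECRET_HEADER];
  if (presented === undefined) return false;
  return constantTimeEqual(presented, expectedSecret);
}
```

In a Node runtime, `crypto.timingSafeEqual` is the more usual choice for the comparison step; the hand-rolled loop here only keeps the sketch dependency-free.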

The Story of the Features

1. Announcements: The Simplest Feature That Was Not Actually Simple

Announcements look trivial until you ask what "published" means.

In this system, an announcement is not just text in a table. It carries:

semantic type (general, exam, routine_update, urgent, event, course_update)
author identity
publication state
downstream delivery state for WhatsApp and Discord
optional course scoping through a join table

The deeper problem was delivery integrity. A post could originate from the admin panel or from Telegram. Either way, once persisted, it had to trigger cache invalidation, student visibility, and optionally cross-channel distribution without duplicating logic.

This is where a recurring pattern emerged:

ingest -> validate -> persist -> audit -> invalidate -> publish

I started seeing that same pipeline show up in almost every feature.
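The pipeline can be sketched in a few lines. This is an illustration with in-memory arrays standing in for PostgreSQL, the audit log, the cache, and the channel fan-out; every function and key name below is hypothetical, and only the announcement type values come from the system itself.

```typescript
type AnnouncementType =
  | "general" | "exam" | "routine_update" | "urgent" | "event" | "course_update";

interface AnnouncementInput { type: string; body: string; authorId: string; }
interface Announcement { id: number; type: AnnouncementType; body: string; authorId: string; }

const ANNOUNCEMENT_TYPES: AnnouncementType[] = [
  "general", "exam", "routine_update", "urgent", "event", "course_update",
];

// In-memory stand-ins for the database, audit log, cache, and delivery queue.
const store: Announcement[] = [];
const auditLog: string[] = [];
const invalidatedKeys: string[] = [];
const outbox: string[] = [];

function validate(input: AnnouncementInput): AnnouncementType {
  const match: AnnouncementType | undefined =
    ANNOUNCEMENT_TYPES.filter((t) => t === input.type)[0];
  if (match === undefined) throw new Error(`unknown type: ${input.type}`);
  if (input.body.trim().length === 0) throw new Error("empty body");
  return match;
}

// Same path regardless of whether the input came from the admin panel or Telegram.
function publishAnnouncement(input: AnnouncementInput): Announcement {
  const type = validate(input);                                   // validate
  const row: Announcement = {
    id: store.length + 1, type, body: input.body.trim(), authorId: input.authorId,
  };
  store.push(row);                                                // persist
  auditLog.push(`announcement.create:${row.id}:${row.authorId}`); // audit
  invalidatedKeys.push("announcements:feed");                     // invalidate
  outbox.push(`discord:${row.id}`, `whatsapp:${row.id}`);         // publish
  return row;
}
```

Because validation throws before anything is persisted, a rejected input leaves no row, no audit entry, and no outbound message behind.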

2. Routine: Modeling Time Correctly Was Harder Than Rendering It

The routine system taught me one of the biggest system design lessons in this project: temporal exceptions deserve first-class modeling.

If I had modeled the routine as one mutable timetable, I would have created a maintenance nightmare. Real class schedules are a mix of:

stable baseline slots
A/B week parity
target-group-specific classes
lab vs theory distinctions
one-off overrides for cancellations, makeup classes, room shifts, or teacher changes

So the system split routine data into:

BaseRoutine
RoutineOverride
RoutineWeek

That split matters.

BaseRoutine stores the canonical recurring schedule. RoutineOverride captures time-bounded deviations. RoutineWeek tracks calendar week semantics, including week type and skipped weeks. This kept the model resilient against the most common academic scheduling problem: exceptions outgrowing the original schedule.

The student UI only shows the result. The real engineering work is in preserving the distinction between stable truth and temporary mutation.
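To make the split concrete, here is a rough sketch of how a single day could be resolved from the three models. The model names match the system, but the fields, the two override kinds, and `resolveDay` itself are assumptions for illustration, not the actual schema.

```typescript
type WeekType = "A" | "B";

// Field names are hypothetical; only the three model names come from the system.
interface BaseRoutine {
  id: string; weekday: number; start: string; course: string; room: string;
  parity: WeekType | "both";
}
interface RoutineOverride {
  baseSlotId: string; date: string; kind: "cancel" | "room_change"; room?: string;
}
interface RoutineWeek { weekStart: string; weekType: WeekType; skipped: boolean; }
interface ResolvedSlot { course: string; start: string; room: string; overridden: boolean; }

// Baseline slots are never mutated; overrides are applied at read time.
function resolveDay(
  date: string,
  weekday: number,
  week: RoutineWeek,
  base: BaseRoutine[],
  overrides: RoutineOverride[],
): ResolvedSlot[] {
  if (week.skipped) return []; // a skipped week has no classes at all
  const resolved: ResolvedSlot[] = [];
  for (const slot of base) {
    if (slot.weekday !== weekday) continue;
    if (slot.parity !== "both" && slot.parity !== week.weekType) continue;
    const applied: RoutineOverride | undefined =
      overrides.filter((o) => o.baseSlotId === slot.id && o.date === date)[0];
    if (applied !== undefined && applied.kind === "cancel") continue;
    const room = applied !== undefined && applied.room !== undefined ? applied.room : slot.room;
    resolved.push({ course: slot.course, start: slot.start, room, overridden: applied !== undefined });
  }
  // Zero-padded "HH:MM" strings sort correctly as text.
  return resolved.sort((a, b) => a.start.localeCompare(b.start));
}
```

An override only matches one date, so the following week falls back to the baseline without any cleanup step. Makeup classes would need an additional override kind that adds a slot rather than modifying one.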

3. Exams and Assignments: Deadlines Need Structure, Not Messages

Exam tracking became its own bounded context because informal notices were not enough. Students needed:

countdown visibility
course linkage
room metadata
syllabus/instructions
assignment-vs-exam distinction
submission state per student

That is why Exam and AssignmentSubmission exist separately instead of treating everything as announcement text. Once data became structured, the system could support timelines, filters, upcoming windows, and student-specific submission states.
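As an illustration of what structure buys, the sketch below computes an upcoming window with countdowns and per-student submission state. The model names mirror the ones above, but the fields and the `upcomingFor` helper are assumed for the example.

```typescript
// Hypothetical fields; dueAt is an ISO 8601 timestamp in UTC.
interface Exam { id: string; course: string; kind: "exam" | "assignment"; dueAt: string; room?: string; }
interface AssignmentSubmission { examId: string; studentId: string; submitted: boolean; }
interface UpcomingItem { examId: string; daysLeft: number; needsSubmission: boolean; }

const DAY_MS = 24 * 60 * 60 * 1000;

// Items due between `now` and `now + windowDays`, soonest first.
function upcomingFor(
  studentId: string,
  exams: Exam[],
  submissions: AssignmentSubmission[],
  now: Date,
  windowDays: number,
): UpcomingItem[] {
  const nowMs = now.getTime();
  const horizonMs = nowMs + windowDays * DAY_MS;
  return exams
    .filter((e) => {
      const due = Date.parse(e.dueAt);
      return due >= nowMs && due <= horizonMs;
    })
    .sort((a, b) => Date.parse(a.dueAt) - Date.parse(b.dueAt))
    .map((e) => {
      const submitted = submissions.some(
        (s) => s.examId === e.id && s.studentId === studentId && s.submitted,
      );
      return {
        examId: e.id,
        daysLeft: Math.ceil((Date.parse(e.dueAt) - nowMs) / DAY_MS),
        // Only assignments carry a submission state; exams never "need submission".
        needsSubmission: e.kind === "assignment" && !submitted,
      };
    });
}
```

None of this is possible when the deadline only exists as a sentence inside an announcement body.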

This is one of those places where "normalizing the domain" mattered more than adding UI polish.

4. Resource Library and Google Drive: Metadata Was the Real Product

The files feature looks like upload/download, but the real product is metadata discipline on top of external storage.

The source of bytes is Google Drive. The source of truth for discoverability is the database.

The file layer tracks:

driveId
driveUrl
optional downloadUrl
MIME type and size
course association
uploader
drive connection lineage
folder and subfolder visibility rules

The actual hard problems were:

dealing with multiple auth sources for Drive access
supporting connected drives and shared-drive folders
keeping subfolders soft-hidden instead of hard-deleted
making uploads resumable
avoiding local buffering when streaming to Drive
preserving referential cleanup semantics when files are removed

This feature changed my understanding of integrations. The API client is easy. The lifecycle rules are the real engineering.

5. Admin Operations: CRUD Was the Least Interesting Part

The admin surface includes:

announcements
exams
routine and overrides
files
shared drives
courses
users and roles
audit log
knowledge base
settings
Telegram bot config

The interesting part is not that these pages exist. It is that they all sit on top of the same permission spine and side-effect model.

Every admin mutation has to answer four questions:

Is the actor allowed to do this?
What exact fields are mutable?
What dependent caches or client views become stale?
What audit artifact must survive even if the UI changes later?

That is why centralized guards and audit logging mattered more than page generation speed.
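The second question, which fields are mutable, is the one most easily skipped. A minimal sketch of explicit field whitelisting, assuming a hypothetical exam update with three editable string fields:

```typescript
// Hypothetical shape: the only fields an admin may change on an exam.
interface ExamPatch { title?: string; room?: string; syllabus?: string; }

const MUTABLE_EXAM_FIELDS: (keyof ExamPatch)[] = ["title", "room", "syllabus"];

// Copy only whitelisted, correctly typed fields out of an untrusted request body.
// Anything else in the body (id, role, authorId, ...) is silently dropped.
function pickMutableFields(body: Record<string, unknown>): ExamPatch {
  const patch: ExamPatch = {};
  for (const field of MUTABLE_EXAM_FIELDS) {
    const value = body[field];
    if (typeof value === "string") patch[field] = value;
  }
  return patch;
}
```

The resulting patch is what would be handed to the persistence layer, instead of spreading the raw request body into the update.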

6. Telegram Command Center: The Highest-Leverage Feature in the Whole Product

If I had to name the feature with the highest operational leverage, it would be the Telegram bot.

This is where the product stopped being a portal and became an operating system for class communication.

The CR can send text or voice. The bot transcribes if needed, classifies the content, sends back a preview, waits for explicit approval, and only then writes through internal APIs and triggers downstream fan-out.

That approval loop is not cosmetic. It is a trust boundary.

The flow is intentionally human-in-the-loop:

Telegram input -> optional whisper transcription -> AI classification/formatting -> preview -> human confirm -> internal API write -> invalidate cache -> distribute

This pattern reduced the blast radius of misclassification and made automation usable in a real academic workflow. Full automation would have been faster, but materially less trustworthy.
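The approval loop can be modeled as a small state machine. The sketch below is a simplified stand-in, not the bot's actual code: the class, its method names, and the injected `write` callback (which represents the internal API call) are all assumptions.

```typescript
type DraftState = "awaiting_approval" | "approved" | "rejected";

interface Draft { id: number; chatId: string; kind: string; text: string; state: DraftState; }

class ApprovalQueue {
  private drafts: { [id: number]: Draft } = {};
  private nextId = 1;
  private readonly write: (draft: Draft) => void;

  // `write` stands in for the internal API call that persists the content.
  constructor(write: (draft: Draft) => void) {
    this.write = write;
  }

  // Called after classification; the returned draft is what gets previewed.
  propose(chatId: string, kind: string, text: string): Draft {
    const draft: Draft = { id: this.nextId++, chatId, kind, text, state: "awaiting_approval" };
    this.drafts[draft.id] = draft;
    return draft;
  }

  // The only path that reaches `write`.
  confirm(id: number, chatId: string): Draft {
    const draft = this.pending(id, chatId);
    this.write(draft);
    draft.state = "approved"; // marked only after the write succeeded
    return draft;
  }

  reject(id: number, chatId: string): Draft {
    const draft = this.pending(id, chatId);
    draft.state = "rejected";
    return draft;
  }

  private pending(id: number, chatId: string): Draft {
    const draft: Draft | undefined = this.drafts[id];
    if (draft === undefined || draft.chatId !== chatId) throw new Error("draft not found for this chat");
    if (draft.state !== "awaiting_approval") throw new Error(`draft already ${draft.state}`);
    return draft;
  }
}
```

Two properties matter here: nothing is written until `confirm` is called, and a draft can be confirmed at most once, so a repeated tap on the approve button cannot publish twice.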

7. Discord, WhatsApp, and Inbound Automation

Outbound Discord and WhatsApp matter because the portal cannot assume students will check the web app first. The system therefore pushes updates to where attention already exists.

But there is also an inbound side: the Discord listener can observe configured channels and route approved content back into the web platform as structured knowledge or announcements.

That gave the system an interesting duality:

website as source of truth
chat platforms as both distribution channels and context collection surfaces

This is one of the first places where I began to recognize bidirectional integration as a category with very different failure modes from simple outbound automation.

8. Virtual CR: The Most Ambitious Feature, and the One That Taught Me the Most Humility

The AI Virtual CR was supposed to be a clean RAG feature: ingest documents, chunk them, embed them, retrieve relevant context, answer with Gemini, done.

Reality was messier.

The codebase clearly shows both the intended vector-search architecture and the operational compromises:

KnowledgeDocument and KnowledgeChunk models
chunking with overlap
pgvector similarity search
embedding generation
chunk caps to protect free-tier storage
prompt-building with source grounding
prompt-injection heuristics
live context assembly from announcements, routine, exams, files, and course catalog
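Chunking with overlap and a chunk cap, two of the items above, fit in one small function. This is a character-based sketch under assumed parameters; the real pipeline may split on tokens or sentence boundaries instead.

```typescript
// Split text into fixed-size windows that share `overlap` characters with the
// previous window, and stop after `maxChunks` to bound storage use.
function chunkWithOverlap(
  text: string,
  chunkSize: number,
  overlap: number,
  maxChunks: number,
): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length && chunks.length < maxChunks; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

For example, `chunkWithOverlap("abcdefghij", 4, 1, 10)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.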

The deep lesson here was that retrieval quality is only one part of the problem. Production viability is shaped by:

Vercel duration ceilings
Gemini quota behavior
embedding latency
cold starts
context-window budgeting
prompt abuse
user expectation mismatch

That is why the system evolved toward a hybrid model. Instead of relying only on vector recall, the chatbot can assemble live database context for high-frequency questions such as:

what classes are today
what exams are coming
what changed in the routine
where is a file
what recent announcements exist

This was an architectural concession to reality, and a good one. A pure semantic retrieval pipeline is elegant on paper. A bounded live-context assistant is often more reliable under free-tier constraints.
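One simple way to express the hybrid model is a router that tries cheap live-context intents first and falls back to vector search. The patterns and source names below are illustrative guesses, not the production classifier.

```typescript
type ContextSource =
  | "routine_today" | "upcoming_exams" | "routine_changes"
  | "files" | "recent_announcements" | "vector_search";

// Hypothetical keyword patterns; order matters because the first match wins.
const LIVE_INTENTS: { source: ContextSource; pattern: RegExp }[] = [
  { source: "routine_today", pattern: /\b(class|classes)\b.*\btoday\b/i },
  { source: "upcoming_exams", pattern: /\b(exam|exams|quiz|quizzes)\b/i },
  { source: "routine_changes", pattern: /\b(changed|rescheduled|cancelled|canceled)\b/i },
  { source: "files", pattern: /\b(file|slide|slides|pdf|notes)\b/i },
  { source: "recent_announcements", pattern: /\b(announcement|announcements|notice)\b/i },
];

// High-frequency questions get live database context; everything else
// falls through to semantic retrieval over the knowledge chunks.
function pickContextSource(question: string): ContextSource {
  for (const intent of LIVE_INTENTS) {
    if (intent.pattern.test(question)) return intent.source;
  }
  return "vector_search";
}
```

The live paths cost a database query instead of an embedding call, which is the main reason this shape holds up better under quota and duration limits.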

9. Student Experience Features

The student-facing surface is broader than the obvious core pages. It includes:

dashboard
announcements
routine
exams
resources
chat
polls
study groups
notifications
search
profile
settings

Some of these are fully central to the product. Some are secondary but important because they reduce the need to leave the platform. Some are partially built and need more proof in live usage.

Polls and study groups are good examples. Technically they exist and are integrated into the role model and API structure. Product-wise, they are more experimental. I would rather say that honestly than oversell them in a portfolio.

10. The Small Features That Make the System Feel Complete

Some of the most important features are not architecturally glamorous, but they reduce friction enough that the system starts feeling cohesive:

the dashboard compresses announcements, routine, and upcoming exams into one operational snapshot
push notifications make the portal behave like an active channel instead of a passive destination
notifications history gives students a second chance after missing a real-time update
search reduces dependency on remembering where a file or post originally appeared
profile and settings keep the identity layer useful instead of purely decorative

These are not "extra pages." They are the glue that reduces context switching.

11. Peripheral and Transitional Features

A mature case study should also acknowledge the edges:

search exists as a surface, but needs clearer validation as a daily-use feature
the notifications history complements push but may overlap with it in value
2FA and password flows exist, but the primary login path is still Google OAuth
a Discord admin page exists without equivalent backend depth
Google Sheets and n8n appear as evolutionary artifacts from earlier architecture phases
a reschedule feature exists as a specialized adjunct to the routine domain

These are not failures. They are evidence of a live system that evolved through real constraints rather than clean-room planning.

The Technical Hassles That Actually Shaped the Product

Free Tier Economics Affected Core Architecture

This product was designed under an aggressive cost constraint: recurring cost had to stay effectively zero.

That single constraint influenced:

choice of Gemini free-tier usage patterns
local transcription and embedding
storage caps for knowledge chunks
resumable uploads instead of heavyweight file proxying
serverless-compatible cache choices
selective reliance on VPS-hosted services

This is one of the strongest examples in the project of product constraints becoming architectural constraints.

Cache Invalidation Was a First-Class Problem

The app is not fast because Next.js is fast. It is fast because the data plane is layered:

client SWR cache
persistent browser cache
service worker
server ETags
SSE invalidation
Redis server cache

That is a serious system, not a frontend trick.

The important insight is that invalidation was not generic. It had to be domain-aware:

announcements invalidate announcement feeds
exams invalidate exam windows and dashboard projections
routine overrides invalidate temporal views, not just raw routine records
file changes can invalidate global resources or course-scoped subsets
chat history invalidation is user-scoped

If I had treated caching as a library concern instead of a domain concern, the app would have become both stale and unpredictable.
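Domain-aware invalidation can be written as a mapping from a typed change to the cache keys it makes stale. The key names and the `DomainChange` union are hypothetical; the mapping itself follows the list above.

```typescript
type DomainChange =
  | { kind: "announcement" }
  | { kind: "exam" }
  | { kind: "routine_override"; date: string }
  | { kind: "file"; courseId?: string }
  | { kind: "chat_history"; userId: string };

// Each mutation names exactly the projections it makes stale.
function keysToInvalidate(change: DomainChange): string[] {
  switch (change.kind) {
    case "announcement":
      return ["announcements:feed"];
    case "exam":
      return ["exams:upcoming", "dashboard:snapshot"];
    case "routine_override":
      // Temporal views, not just the raw override record.
      return [`routine:day:${change.date}`, "routine:week", "dashboard:snapshot"];
    case "file":
      return change.courseId !== undefined
        ? [`files:course:${change.courseId}`]
        : ["files:all"];
    case "chat_history":
      return [`chat:history:${change.userId}`];
  }
}
```

Keeping this in one function means a new mutation type fails type-checking until someone decides what it invalidates.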

Security Work Was Not an Afterthought

Security was not a later polish pass. It was embedded into the design because the system mixes personal data, privileged actors, internal service routes, and AI-driven behavior.

The meaningful controls include:

strict @iut-dhaka.edu domain gating for primary auth
multi-layer role enforcement in middleware, route guards, query scoping, and UI
explicit field whitelisting to block mass assignment
raw SQL discipline around pgvector queries
XSS mitigation through sanitization and content policy discipline
origin checks and CSRF protection on state-changing routes
prompt-injection detection for chatbot input
Redis-backed rate limiting
internal-route secret headers for bot-to-web writes
audit logs for every meaningful admin action

The more senior lesson was that security in a system like this is mostly about trust-boundary clarity. Once the boundaries are fuzzy, every feature starts leaking risk into the next one.

Production Safety Needed Runtime Guards, Not Good Intentions

One of the hardest lessons in any real system is that dangerous operations should not merely be discouraged; they should be technically blocked.

In this project, destructive database operations, especially anything capable of truncation or unscoped deletion, became a design-level concern. That is the right instinct. In a product with live student data, "be careful" is not a control. Environment guards are controls.

This also applies to test isolation. If a test can accidentally point at a production-like Supabase URL, the test harness is under-designed.
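A runtime guard of this kind can be very small. The sketch below assumes a hypothetical `RuntimeEnv` object and treats any database URL that is not on localhost as production-like; the actual guard conditions in the project may differ.

```typescript
// Hypothetical environment snapshot, passed in explicitly so the guard is testable.
interface RuntimeEnv { nodeEnv: string; databaseUrl: string; allowDestructive: boolean; }

// Matches postgres URLs whose host is localhost or 127.0.0.1, with optional credentials and port.
const LOCAL_DB = /^postgres(?:ql)?:\/\/(?:[^@\/]+@)?(?:localhost|127\.0\.0\.1)(?::\d+)?\//;

// Throws unless the environment is clearly local AND destructive work was opted into.
function assertDestructiveAllowed(env: RuntimeEnv, operation: string): void {
  const looksLikeProduction = env.nodeEnv === "production" || !LOCAL_DB.test(env.databaseUrl);
  if (looksLikeProduction) {
    throw new Error(`refusing ${operation}: environment looks like production`);
  }
  if (!env.allowDestructive) {
    throw new Error(`refusing ${operation}: destructive operations are not enabled`);
  }
}
```

Called at the top of a truncate helper or a test setup hook, this turns "be careful" into a failure that happens before any data is touched.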

Drive Integration Was Mostly About Failure Modes

Google Drive looked easy at first and then expanded into one of the more complex subsystems in the codebase.

The real work was in:

auth fallback strategy
refresh token lineage
service-account fallback
multi-drive indexing
public link semantics
resumable upload sessions
stream-oriented transfer
delete ordering between Drive and DB
subfolder visibility state

The pattern I learned here is that third-party integration bugs rarely come from the happy path. They come from the states in between: expired tokens, partial uploads, stale folder topology, revoked access, duplicate indexing, and inconsistent deletion order.
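Delete ordering is a good example. The sketch below shows one possible ordering, chosen for illustration and not confirmed as the one the system uses: mark the row first, remove the bytes, and drop the metadata last. It is synchronous for brevity, whereas real Drive calls are asynchronous, and `deleteFromDrive` is an injected stand-in for the Drive client.

```typescript
type FileState = "active" | "deleting";
interface FileRow { id: string; driveId: string; state: FileState; }

// Assumed ordering: hide, delete bytes, then delete metadata.
function deleteFile(
  rows: { [id: string]: FileRow },
  id: string,
  deleteFromDrive: (driveId: string) => void,
): "deleted" | "pending_retry" {
  const row: FileRow | undefined = rows[id];
  if (row === undefined) throw new Error(`unknown file: ${id}`);
  row.state = "deleting";          // 1. stop showing it to students
  try {
    deleteFromDrive(row.driveId);  // 2. remove the bytes from Drive
  } catch (_err) {
    return "pending_retry";        // the marked row lets a retry job find it later
  }
  delete rows[id];                 // 3. drop metadata only once the bytes are gone
  return "deleted";
}
```

With this ordering a Drive failure leaves a recoverable marker instead of an orphaned Drive file with no database record pointing at it.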

Security Model as a System Design Story

The easiest way to understand the security model is to treat each class of actor as potentially honest-but-bounded or malicious-with-opportunity:

outsiders should not authenticate
students should not escalate privilege
admins should not exceed their scope
bots should not bypass business rules
AI should not become an unrestricted answer engine
background automation should not become a silent corruption vector

That threat framing led to a layered design:

Identity and Access

Google OAuth for domain-bound identity
optional password and TOTP path for privileged admin access
role hierarchy: student, admin, super_admin
fresh role checks at session usage time, not just first login
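A minimal sketch of the hierarchy and the fresh check, assuming the three roles form a simple linear ranking. `lookupRole` is a hypothetical stand-in for a database read performed at request time.

```typescript
type Role = "student" | "admin" | "super_admin";

// Assumes a strictly linear hierarchy: higher rank includes lower privileges.
const ROLE_RANK: Record<Role, number> = { student: 0, admin: 1, super_admin: 2 };

function hasAtLeast(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Re-reads the role at use time instead of trusting what the session recorded at login.
function authorize(
  userId: string,
  required: Role,
  lookupRole: (userId: string) => Role | undefined,
): boolean {
  const current = lookupRole(userId);
  return current !== undefined && hasAtLeast(current, required);
}
```

The point of the lookup is demotion: if an admin is downgraded to student, their existing session stops passing admin checks on the next request.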

Data Integrity

Prisma as the default query surface
tagged raw SQL only where vector operations require it
audit logs as durable mutation history
explicit mutation fields instead of blind request-body spread

Service Trust

internal bot routes require shared secret headers
bots do not write directly to the database
the web app remains the policy enforcement point

Content and AI Safety

sanitization for rendered rich text
prompt-injection pattern detection
refusal posture for off-domain questions
bounded context sources for AI answers
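Prompt-injection pattern detection is, at its simplest, a list of regular expressions run over normalized input. The patterns below are illustrative examples only, not the system's actual list, and a heuristic like this is one layer rather than a complete defense.

```typescript
// Hypothetical patterns for common override attempts.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /disregard (the |your )?(system|previous) prompt/i,
  /reveal (the |your )?(system prompt|hidden instructions)/i,
  /you are now (in )?(developer|dan|jailbreak) mode/i,
];

// Collapse whitespace first so extra spaces or newlines do not dodge the patterns.
function looksLikeInjection(input: string): boolean {
  const normalized = input.replace(/\s+/g, " ").trim();
  return INJECTION_PATTERNS.some((pattern) => pattern.test(normalized));
}
```

A flagged message can be refused before it reaches the model, which also saves quota on requests that were never legitimate questions.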

From a system design perspective, the most important idea is not any single defense. It is the decision to avoid single-point trust.

Pattern Recognition: The Deeper Architecture Lessons

This project taught me to recognize repeating patterns much earlier.

Pattern 1: Most Features Were Just Specialized State Pipelines

Announcements, exams, routine overrides, files, and knowledge ingestion all eventually followed the same shape:

trusted input -> schema validation -> permission gate -> durable write -> audit -> cache invalidation -> downstream projection

Once I recognized that, implementation got simpler because I stopped treating every feature as unique.

Pattern 2: Temporal Exceptions Should Usually Be Separate Models

Routine changes were the clearest example, but the principle generalizes. Systems become fragile when transient exceptions are merged into baseline truth.

The fix is not smarter conditionals. The fix is better domain separation.

Pattern 3: AI Features Become Stable Only When the Non-AI Context Is Strong

The Virtual CR did not improve because the prompt got prettier. It improved when the surrounding system got better at:

supplying current state
limiting topic scope
rejecting hostile input
capping storage and latency
choosing operationally cheap context paths

That is a strong pattern I now look for in every AI feature: model quality is downstream of systems quality.

Pattern 4: Caching Is Really a Consistency Strategy

Redis, SSE, ETags, service workers, and SWR are not performance decorations. Together they define how stale the user is allowed to be, for how long, and under what mutation events.

That framing made the caching layer far more coherent.

Pattern 5: Honest Documentation Is Part of System Design

The codebase contains fully active features, yellow-flag features, abandoned paths, and legacy artifacts. Naming those states accurately is part of engineering maturity. A system becomes easier to evolve when its ambiguity is documented rather than hidden.

What This Product Proved to Me

IPE-24 Classroom proved that I can design beyond interface-level development.

I did not just assemble pages. I had to reason about:

trust boundaries
temporal data modeling
integration failure modes
human-in-the-loop automation
AI guardrails
multi-channel delivery
cache coherence
role-based operational control
free-tier systems economics

The most important outcome was not the number of features. It was that the system developed an internal logic. Each new capability either fit the architecture or exposed where the architecture was weak.

That is the shift from "self-taught person who can code" to "developer who can shape a system."

Final Reflection

If I had to summarize this case study in one sentence, it would be this:

I set out to build a class portal, but I ended up building a trust-aware, multi-runtime academic coordination system where the hardest engineering problems were not UI problems at all.

The portfolio value of this product is not that it has announcements, exams, files, or a chatbot. Many projects can list those features.

The real value is that this system forced me to confront the parts of software that only start to matter once features interact:

what is the source of truth
who is allowed to mutate it
how changes propagate
how you stay fast without lying
how you automate without losing human control
how you use AI without turning the product into a hallucination engine

That is the level at which I now try to think about software.