Hey everyone, I’m finalizing the architecture for a high-concurrency ticket reservation system (think concert tickets or limited sneaker drops).
The Constraints:
- Traffic: We expect spikes of up to 50,000 concurrent users hitting the "Buy" button within the first 30 seconds of a drop.
- Inventory: Only 5,000 items available. Overbooking is absolutely not an option.
- Latency: The response needs to feel instant to the user: either a confirmed lock on the item for 5 minutes so they can complete checkout, or an immediate "Sold Out."
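For a sense of scale, here is a rough back-of-envelope calculation from the numbers above. It assumes each user sends exactly one request and that requests are spread evenly over the 30 seconds. Neither will hold in practice, since retries and the stampede at the start of the drop push the real peak higher. The type and method names are made up for illustration.

```go
package main

import "fmt"

// DropLoad holds the figures from the constraints list.
// Assumption: one request per user, spread evenly over the window.
type DropLoad struct {
	Users         int
	Items         int
	WindowSeconds float64
}

// AvgRequestsPerSecond is the mean request rate over the window.
func (d DropLoad) AvgRequestsPerSecond() float64 {
	return float64(d.Users) / d.WindowSeconds
}

// SoldOutShare is the fraction of users who must receive "Sold Out".
func (d DropLoad) SoldOutShare() float64 {
	return float64(d.Users-d.Items) / float64(d.Users)
}

func main() {
	d := DropLoad{Users: 50000, Items: 5000, WindowSeconds: 30}
	fmt.Printf("average rate: %.0f req/s\n", d.AvgRequestsPerSecond())
	fmt.Printf("sold-out share: %.0f%%\n", d.SoldOutShare()*100)
}
```

So even under the most forgiving assumptions, the system has to sustain well over a thousand requests per second, and nine out of ten of those requests end in a rejection that still has to come back quickly.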
I'm going with a vertically scaled, containerized Go API talking directly to a massive Redis instance, with PostgreSQL acting only as the final persistent truth after checkout.
- All inventory is loaded into Redis before the sale starts.
- The Go backend handles the queue and uses Redis atomic operations (DECR and Lua scripts) to lock inventory.
- Postgres only gets written to once the payment actually clears.
I explicitly rejected the popular "modern" approach of Serverless (AWS Lambda + API Gateway + SQS + DynamoDB).
- Why I hate it for this: Lambda cold starts and API Gateway latency add too much overhead when 50,000 users hit it at exactly 12:00:00. Plus, provisioning DynamoDB write capacity for a traffic spike that lasts literally 30 seconds feels like a complete waste of money and a recipe for throttling disasters.
I know a single beefy Redis instance is technically a single point of failure, but I feel like keeping it simple is safer than managing a massive distributed event-driven mess.
Acting as a senior engineer, tear this apart:
- Where is my blind spot?
- When this inevitably crashes under load, what is the exact component that catches fire first?
- Prove to me why I should have gone with a distributed/serverless stack instead.