A high-performance distributed task scheduler built with Go, capable of handling millions of scheduled tasks with fault tolerance and horizontal scalability.
Overview
The scheduler is designed to schedule and execute tasks reliably across a distributed system. It provides exactly-once execution guarantees and automatic failover.
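One common way to approximate exactly-once execution is to deliver tasks at least once and have workers claim an idempotency key before running. The sketch below illustrates that idea only; it is not taken from this project's code. The names `claimRegistry`, `newClaimRegistry`, and `tryClaim` are hypothetical, and the in-memory map stands in for whatever durable store a real deployment would use.

```go
package main

import (
	"fmt"
	"sync"
)

// claimRegistry records which task runs have already been claimed.
// Hypothetical type: a real system would persist claims durably.
type claimRegistry struct {
	mu      sync.Mutex
	claimed map[string]bool
}

func newClaimRegistry() *claimRegistry {
	return &claimRegistry{claimed: make(map[string]bool)}
}

// tryClaim reports true only for the first caller that presents a given key.
func (r *claimRegistry) tryClaim(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.claimed[key] {
		return false
	}
	r.claimed[key] = true
	return true
}

func main() {
	reg := newClaimRegistry()
	fmt.Println(reg.tryClaim("report-2024-01-01")) // true: first delivery runs
	fmt.Println(reg.tryClaim("report-2024-01-01")) // false: duplicate is skipped
}
```

A worker would run the task only when `tryClaim` returns true, so a redelivered task is dropped instead of being executed twice.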
Features
- High Throughput: Process 100k+ tasks per second
- Fault Tolerant: Automatic failover and task reassignment
- Flexible Scheduling: Cron expressions, one-time, and recurring tasks
- Priority Queues: Execute critical tasks first
- Monitoring: Real-time metrics and alerting
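The priority queue feature can be illustrated with Go's standard `container/heap` package. This is a minimal sketch, assuming a hypothetical `Task` type where a higher `Priority` value means the task runs earlier; the project's actual task model may differ.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Task is a hypothetical unit of work; the field names are illustrative.
type Task struct {
	ID       string
	Priority int // higher value runs first (assumption)
}

// taskHeap implements heap.Interface as a max-heap on Priority.
type taskHeap []Task

func (h taskHeap) Len() int           { return len(h) }
func (h taskHeap) Less(i, j int) bool { return h[i].Priority > h[j].Priority }
func (h taskHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *taskHeap) Push(x any)        { *h = append(*h, x.(Task)) }
func (h *taskHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

// drainByPriority returns task IDs in the order they would be executed.
func drainByPriority(tasks []Task) []string {
	h := &taskHeap{}
	for _, t := range tasks {
		heap.Push(h, t)
	}
	order := make([]string, 0, len(tasks))
	for h.Len() > 0 {
		order = append(order, heap.Pop(h).(Task).ID)
	}
	return order
}

func main() {
	fmt.Println(drainByPriority([]Task{
		{ID: "cleanup", Priority: 1},
		{ID: "billing", Priority: 9},
		{ID: "report", Priority: 5},
	})) // prints [billing report cleanup]
}
```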
Technical Highlights
Architecture
- Leader election using Raft consensus
- Sharding for horizontal scalability
- Message queue for task distribution
- State management with PostgreSQL
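As an illustration of the sharding idea above, the sketch below maps a task ID to a shard by hashing it with FNV-1a from the standard library. The function name `shardFor` and the simple modulo scheme are assumptions for illustration. The README does not say how shards are actually assigned, and a production system might prefer consistent hashing to limit task movement when the shard count changes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a task ID to a shard index in [0, numShards).
// numShards must be positive; this is a hypothetical helper.
func shardFor(taskID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(taskID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, id := range []string{"task-1", "task-2", "task-3"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 8))
	}
}
```

Because the mapping depends only on the task ID and the shard count, any node can compute the owner of a task without coordination.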
Performance Optimizations
- Connection pooling and reuse
- Batch processing for database operations
- In-memory caching with Redis
- Worker pool with dynamic sizing
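The worker pool item can be sketched with goroutines and a channel. This example uses a fixed pool size for brevity and does not implement dynamic sizing. `runPool` is a hypothetical helper, not part of the project's API.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool applies fn to every job using at most `workers` goroutines and
// returns the results in the same order as jobs.
func runPool(jobs []int, workers int, fn func(int) int) []int {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker would read
	}
	results := make([]int, len(jobs))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				results[i] = fn(jobs[i]) // each index is written by exactly one worker
			}
		}()
	}
	for i := range jobs {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	squares := runPool([]int{1, 2, 3, 4}, 2, func(n int) int { return n * n })
	fmt.Println(squares) // prints [1 4 9 16]
}
```

A dynamically sized pool would add or retire workers based on queue depth; the dispatch loop and the `sync.WaitGroup` shutdown would stay the same.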
Metrics
- Latency: P99 < 50ms for task submission
- Availability: 99.99% uptime
- Scale: Tested with 10M+ concurrent scheduled tasks
- Recovery: < 30 seconds failover time
Use Cases
- ETL Pipelines: Schedule data extraction and transformation jobs
- Notifications: Send time-based alerts and reminders
- Maintenance: Automated cleanup and backup tasks
- Reporting: Generate scheduled reports
Lessons Learned
- Importance of observability in distributed systems
- Trade-offs between consistency and availability
- Effective testing strategies for concurrent systems