Two years ago, I started building Best Algo Trading's core system: a real-time trading scanner that monitors 200+ stocks, runs 8 different strategies, and sends signals to users within milliseconds of detecting an opportunity. We built it on a bootstrap budget. No venture capital. No enterprise infrastructure.
In this post, I'll share the architectural decisions we made, the trade-offs we accepted, and the lessons that matter for anyone building fintech or trading systems. This is the real story—not the glossy demo, but the decisions that keep the system alive in production.
The Core Architecture: Python + Flask + Real-Time Data
Most people assume high-frequency trading systems need Rust, C++, or Java. We chose Python + Flask. Here's why, along with the caveats.
Why Python?
- Speed to market: We needed to move fast. Python let us prototype strategies in days, not months.
- Data ecosystem: pandas, numpy, scipy, TA-Lib are the lingua franca of quantitative finance. Fighting against the ecosystem wastes time.
- Talent: Finding a Rust developer who knows options strategies is hard. Finding a Python developer is trivial.
- Cost: Serverless Python functions (AWS Lambda, Google Cloud Functions) are cheap. Maintaining bare-metal C++ infrastructure isn't.
Where Python Fails (And How We Fixed It)
Problem 1: The GIL (Global Interpreter Lock)
Python's GIL lets only one thread execute Python bytecode at a time, so threads give you no parallelism for CPU-bound work. If you're processing multiple stock streams simultaneously, your threads contend for the same lock.
Solution: We use multiprocessing instead of threading. Each worker process gets its own Python interpreter, bypassing the GIL. Trade-off: higher memory overhead, but acceptable for our scale (200 stocks = 4-6 worker processes).
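As a minimal sketch of the multiprocessing approach (illustrative, not the production code): the symbol universe is split into shards and each shard goes to its own worker process. The names `shard_symbols`, `scan_shard`, and `scan_all` are hypothetical, and the per-symbol work is a placeholder.

```python
from multiprocessing import Pool


def shard_symbols(symbols, n_workers):
    """Split symbols into n_workers roughly equal shards (round-robin)."""
    return [symbols[i::n_workers] for i in range(n_workers)]


def scan_shard(shard):
    """Hypothetical per-worker job. It runs in a separate process,
    so it has its own interpreter and its own GIL."""
    # Placeholder for CPU-bound strategy code.
    return [(symbol, "scanned") for symbol in shard]


def scan_all(symbols, n_workers=4):
    """Fan shards out to worker processes and flatten the results."""
    with Pool(processes=n_workers) as pool:
        per_shard = pool.map(scan_shard, shard_symbols(symbols, n_workers))
    return [row for shard in per_shard for row in shard]


if __name__ == "__main__":
    universe = [f"STOCK{i}" for i in range(200)]
    print(len(scan_all(universe)))
```

Arguments and results are pickled between processes, which is part of the memory and CPU overhead mentioned above, so shards should carry small payloads.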
Problem 2: Latency
Python is slower than compiled languages. For a retail trading system, we need sub-100ms latency from data receipt to signal send.
Solution: We don't do heavy computation in the main loop. Strategy calculations run once per minute (on pre-computed buffers), not on every tick. This works because most strategies don't need tick-level granularity—they work on 1-minute or 5-minute candles.
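One way to picture the "pre-computed buffers" idea is a small aggregator that folds each tick into a 1-minute candle, so the per-tick path stays cheap and strategies only read closed candles. This is a sketch under assumptions: `Candle` and `CandleBuffer` are hypothetical names, timestamps are epoch seconds, and ticks are assumed to arrive in order.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    minute: int   # epoch minute (timestamp // 60)
    open: float
    high: float
    low: float
    close: float


class CandleBuffer:
    """Folds ticks into 1-minute candles; strategies read only `closed`."""

    def __init__(self):
        self.current = None   # candle still being built
        self.closed = []      # finished candles, oldest first

    def on_tick(self, ts, price):
        minute = int(ts // 60)
        if self.current is None or minute != self.current.minute:
            # A new minute started: finalize the previous candle.
            if self.current is not None:
                self.closed.append(self.current)
            self.current = Candle(minute, price, price, price, price)
            return
        candle = self.current
        candle.high = max(candle.high, price)
        candle.low = min(candle.low, price)
        candle.close = price
```

The per-tick cost is a few comparisons; the once-per-minute strategy run would then operate on `closed` only.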
"The choice isn't between fast languages and slow languages. It's between doing heavy work efficiently and doing light work frequently."
Data Pipeline: Real-Time Streaming vs. Polling
This is where most bootstrap systems fail. You need to decide: Do you stream data from the broker's API, or poll it periodically?
Streaming (WebSocket/gRPC)
Pros: Instant data, true real-time.
Cons: Brokers often have unstable WebSockets. Connection drops = signal delays. You need resilience, reconnection logic, and buffer management.
Polling (REST API)
Pros: Simple, reliable, forgiving of failures.
Cons: Latency. If you poll every 1 second, you miss intra-second moves.
Our Hybrid Approach
We use a primary WebSocket connection for low-latency data, with an automatic fallback to REST polling if the WebSocket drops. The key insight: Most trading signals don't require sub-second precision. A 2-3 second delay is acceptable if it means we never lose data.
This single decision saved us 20+ hours of debugging during broker API outages.
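The fallback logic can be sketched without any real broker client. In this illustration, `stream_read` and `poll_read` are hypothetical callables standing in for a WebSocket read and a REST request, and `retry_after` (an assumed parameter) controls how many polls are served before the stream is tried again.

```python
class HybridFeed:
    """Prefer the streaming source; fall back to polling while it is down."""

    def __init__(self, stream_read, poll_read, retry_after=3):
        self.stream_read = stream_read   # returns a quote or raises ConnectionError
        self.poll_read = poll_read       # REST-style call returning a quote
        self.retry_after = retry_after   # polls to serve before retrying the stream
        self.polls_since_drop = None     # None means the stream is healthy

    def next_quote(self):
        """Return (quote, source) where source is "stream" or "poll"."""
        stream_due = (
            self.polls_since_drop is None
            or self.polls_since_drop >= self.retry_after
        )
        if stream_due:
            try:
                quote = self.stream_read()
                self.polls_since_drop = None
                return quote, "stream"
            except ConnectionError:
                self.polls_since_drop = 0
        self.polls_since_drop += 1
        return self.poll_read(), "poll"
```

Returning the source alongside the quote makes it easy to log how long the system ran on the fallback path during an outage.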
Strategy Execution: Decoupling Signals from Orders
Here's a critical lesson: Your signal generation system and your order execution system must be completely decoupled.
Why? Because signals are fast and deterministic. Order execution is slow and error-prone. If a signal generation failure crashes your entire system, you miss trades. If an order execution failure crashes the system, you might place duplicate trades.
We have:
- Signal Engine: Runs strategies, generates buy/sell signals. Output: a queue of signals with metadata.
- Order Manager: Reads signals from queue, applies risk checks, places orders via broker API. Output: execution logs, order confirmations.
- Message Queue (Redis/RabbitMQ): Decouples the two. Even if Order Manager is down, Signal Engine keeps running.
This architecture means we can deploy updates to our signal logic without touching the order execution. We can test new strategies in "dry run" mode (generating signals without placing orders) before turning them live. Most importantly, a broker API outage doesn't crash the entire system.
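A compact sketch of this separation, using the standard library's `queue.Queue` as a stand-in for Redis or RabbitMQ. The `Signal` fields, the `OrderManager` interface, and the `max_orders` risk check are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    symbol: str
    side: str       # "buy" or "sell"
    strategy: str


class OrderManager:
    """Consumes signals from a queue; never calls into the signal engine."""

    def __init__(self, signals, place_order, max_orders=10, dry_run=True):
        self.signals = signals          # shared queue (stand-in for a broker queue)
        self.place_order = place_order  # hypothetical broker call
        self.max_orders = max_orders    # toy risk check: cap on live orders
        self.dry_run = dry_run
        self.placed = 0

    def drain(self):
        """Process every queued signal and report what happened to each."""
        log = []
        while True:
            try:
                signal = self.signals.get_nowait()
            except queue.Empty:
                return log
            if self.placed >= self.max_orders:
                log.append(("rejected", signal))
            elif self.dry_run:
                log.append(("dry_run", signal))   # recorded, never sent
            else:
                self.place_order(signal)
                self.placed += 1
                log.append(("placed", signal))
```

Because the signal engine only ever calls `signals.put(...)`, it keeps producing while the order manager is stopped, restarted, or running in dry-run mode.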
Handling 200+ Stocks in Real-Time
Here's the practical constraint: We can't afford to do deep computation on every stock every second. So we tiered our processing:
Tier 1: Fast Scan (Every 10 Seconds)
Price checks, simple volatility metrics, entry/exit conditions. Ultra-lightweight, runs across all 200 stocks.
Tier 2: Medium Computation (Every Minute)
Technical indicators (RSI, MACD, Bollinger Bands), IV percentile rankings, cross-asset correlations. Runs on stocks that passed Tier 1.
Tier 3: Heavy Computation (Every 5 Minutes)
Options Greeks, portfolio risk metrics, Monte Carlo simulations for complex strategies. Only runs on highest-conviction setups.
This tiering means we do 90% less computation while catching 95% of tradeable setups. The cost? We miss some ultra-fast reversals. Worth it.
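The tiering can be sketched as a scheduler in which each tier runs on its own period and only sees symbols that passed the tier before it. The periods come from the text above; `TieredScanner` and the three check callables are hypothetical.

```python
TIER_PERIODS = {1: 10, 2: 60, 3: 300}   # seconds between runs, per tier


class TieredScanner:
    def __init__(self, symbols, fast, medium, heavy):
        self.symbols = list(symbols)
        self.checks = {1: fast, 2: medium, 3: heavy}   # symbol -> bool
        self.passed = {1: set(), 2: set(), 3: set()}

    def tick(self, elapsed):
        """Call once per second with whole elapsed seconds.
        Returns the list of tiers that ran on this tick."""
        ran = []
        for tier, period in TIER_PERIODS.items():
            if elapsed % period != 0:
                continue
            # Tier 1 scans everything; later tiers scan the previous tier's survivors.
            pool = self.symbols if tier == 1 else self.passed[tier - 1]
            self.passed[tier] = {s for s in pool if self.checks[tier](s)}
            ran.append(tier)
        return ran
```

The savings come from the shrinking pools: the expensive checks only run on whatever the cheaper tiers let through.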
Docker Deployment and Observability
We containerize everything with Docker. Each component is a separate container:
- Data ingestion service
- Signal engine service
- Order manager service
- Logging/monitoring service
- Backtest engine service
On failure, Kubernetes (or simple Docker restart logic) relaunches the failed container. This is critical because downtime during market hours means missed trades. In the past 18 months, we've had ~4 unplanned outages (all caused by broker API changes, not our code).
For observability, we log everything: every signal generated, every order placed, every error. We use Prometheus for metrics and Grafana for dashboards. When something breaks at 4 AM, we can trace exactly what happened.
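"Log everything" is easier to trace later if each record is one JSON object per line. The sketch below uses only the standard library; `JsonFormatter`, `make_logger`, and the `fields` key passed through `extra` are conventions invented for this example, not part of the `logging` module itself.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line."""

    def format(self, record):
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields arrive via logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger
```

A call such as `log.info("signal_generated", extra={"fields": {"symbol": "ABC", "side": "buy"}})` then produces a line that can be filtered by symbol or event name when reconstructing an incident.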
The Lessons
- Don't over-engineer early: We started with basic REST polling. WebSocket came only after we proved the concept worked.
- Embrace asynchronous processing: Don't wait for slow operations. Queue them, process them in background, check status later.
- Decouple everything: Signal generation, order execution, logging—treat them as separate systems. Failures in one shouldn't bring down others.
- Test in production (safely): Dry-run mode is your friend. New strategies should run in dry-run for days before touching real capital.
- Monitor ruthlessly: You can't fix what you don't measure. Invest in logging and dashboards early.
- Respect the broker API: Broker infrastructure will fail. Plan for it. Have fallbacks. Expect rate limits.
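For the last lesson, a common pattern is retrying with exponential backoff when the broker rate-limits or drops a request. This is a generic sketch: `RateLimited` is a hypothetical exception a broker client might raise, and the delay values are arbitrary.

```python
import time


class RateLimited(Exception):
    """Hypothetical error for a rate-limit response (for example HTTP 429)."""


def call_with_backoff(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request(); on rate limits or connection errors, wait and retry.
    The delay doubles after each failure: base_delay, 2x, 4x, ..."""
    for attempt in range(max_attempts):
        try:
            return request()
        except (RateLimited, ConnectionError):
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.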
What We'd Change
If I were rebuilding from scratch, I'd consider:
- Rust for the core data pipeline: Not for speed (Python is fast enough), but for confidence. Compile-time checks and explicit error handling matter when losing data costs money.
- Event sourcing: Store every signal, every quote, every order as immutable events. Makes auditability and debugging trivial.
- Kafka instead of Redis: For higher volumes. Though for 200 stocks, Redis works fine.
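To make the event sourcing idea concrete, here is a minimal in-memory sketch: events are only ever appended, and current state is derived by replaying them. `Event`, `EventLog`, and `net_position` are hypothetical names, and a real system would persist the log rather than keep it in a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int     # position in the log, assigned on append
    kind: str    # "quote", "signal", or "order"
    data: dict


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, kind, data):
        event = Event(len(self._events), kind, dict(data))
        self._events.append(event)
        return event

    def replay(self, kind=None):
        """Return events in order, optionally filtered by kind."""
        return [e for e in self._events if kind is None or e.kind == kind]


def net_position(log):
    """Rebuild net quantity per symbol purely from order events."""
    position = {}
    for event in log.replay("order"):
        qty = event.data["qty"]
        signed = qty if event.data["side"] == "buy" else -qty
        symbol = event.data["symbol"]
        position[symbol] = position.get(symbol, 0) + signed
    return position
```

Since state is always recomputed from the log, answering "what did the system believe at 4 AM?" becomes a replay up to a given `seq`.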
The Bottom Line
Building a production trading system is hard. It's not hard because of the algorithms—most strategies are simple. It's hard because of infrastructure, reliability, and operations. An 80/20 system (simple signal, robust infrastructure) beats a complex algorithm on shaky infrastructure every single time.
We chose Python + Flask not because it's the fastest, but because it let us focus on what matters: reliable signal generation and rock-solid risk management.