I Heart Logs
What Does "Log" Mean to You?
Log - System Perspective
- A Log is a commit log or journal.
- Append-only sequence of records ordered by time.
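The definition above can be sketched as a tiny in-memory structure. This is only an illustration: the class name `AppendOnlyLog` and its methods are hypothetical, and a real log would persist records to disk.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are only ever added at the end."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset (its position)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
first = log.append({"op": "set", "key": "a", "value": 1})
second = log.append({"op": "set", "key": "a", "value": 2})
```

The offset doubles as a logical timestamp: a record with a smaller offset was written earlier.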
Log in a database
- Write-ahead logging (WAL)
- Durability
- Atomicity
- Replicating data between databases
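A minimal sketch of the write-ahead idea, assuming a toy key-value store: each change is written and fsynced to the log before the in-memory state is updated, so the state can be rebuilt by replaying the log after a crash. `TinyKV` and the file name `wal.log` are made up for this example, and a torn final line is not handled here.

```python
import json
import os
import tempfile


class TinyKV:
    """Hypothetical key-value store that logs every write before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild state from the log on startup
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk (durability).
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value

    def close(self):
        self._wal.close()


wal_file = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyKV(wal_file)
store.put("a", 1)
store.put("a", 2)
store.close()

# Simulate a restart: a fresh instance recovers its state from the log alone.
recovered = TinyKV(wal_file)
recovered_value = recovered.data["a"]
recovered.close()
```

Shipping the same log file to another machine and replaying it there is the basic idea behind log-based replication.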
Things to Consider in a Distributed System
- Consistency
- Availability
- Partition Tolerance
Log in a Distributed System
Log-Centric Design
Act as Consensus
- Paxos
- Raft
Data Integration
Data Integration Pain Points
- Data is More Diverse
  - Every kind of data you can think of
- Explosion of Specialized Data Systems
  - OLAP
  - Search
  - Online Storage
  - Batch Processing
  - Stream Processing
  - Graph Analysis
Log-Structured Data Flow
Fully Connected Architecture
Log-Centric Architecture
Scaling a Log
- Partitioning the Log
- Optimize by batching read and write
- Avoid needless data copy
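The first two points can be sketched as follows, assuming records carry a string key. The partition count, the CRC32-based hash, and the function names are assumptions made for this example, not something the slides specify.

```python
import zlib
from typing import List, Tuple

NUM_PARTITIONS = 4  # assumed value for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_batch(partitions: List[list], batch: List[Tuple[str, str]]) -> None:
    """Append a whole batch in one call; order is kept within each partition."""
    for key, value in batch:
        partitions[partition_for(key, len(partitions))].append((key, value))


partitions = [[] for _ in range(NUM_PARTITIONS)]
append_batch(
    partitions,
    [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")],
)
```

Records with the same key always land in the same partition, so per-key ordering holds even though there is no global order across partitions.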
Stream Processing
A stream processing job will be anything that reads from logs and writes output to logs or other systems.
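As a rough sketch of that definition, the job below reads an input log from a given offset, transforms each record, and appends the results to an output log. Plain Python lists stand in for logs, and `run_job` and `only_checkout` are hypothetical names.

```python
def run_job(input_log, output_log, start_offset, transform):
    """Process input records from start_offset; return the next offset to read."""
    offset = start_offset
    while offset < len(input_log):
        result = transform(input_log[offset])
        if result is not None:  # a transform may drop a record
            output_log.append(result)
        offset += 1
    return offset


def only_checkout(event):
    """Keep checkout page views and drop everything else."""
    return event if event["page"] == "/checkout" else None


page_views = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/checkout"},
    {"user": "u1", "page": "/checkout"},
]
checkouts = []
next_offset = run_job(page_views, checkouts, 0, only_checkout)
```

Because the output is itself a log, another job can consume `checkouts` in exactly the same way.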
Logs and Stream Processing
- Logs make each data set multi-subscriber
- Ensure each consumer processes records in the order they were written
- Provide isolation and buffering
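These properties can be illustrated with consumers that each track their own offset into a shared log. This is a sketch only: the `Consumer` class and the consumer names are invented, and a list stands in for the log.

```python
class Consumer:
    """Hypothetical consumer that remembers how far it has read."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0  # position of the next record to read

    def poll(self, max_records=100):
        """Read the next batch in log order and advance this consumer's offset."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


shared_log = ["e0", "e1", "e2", "e3"]
fast = Consumer("search-indexer", shared_log)
slow = Consumer("batch-loader", shared_log)

fast_seen = fast.poll()               # reads everything at once
slow_seen = slow.poll(max_records=2)  # falls behind, the log buffers the rest
slow_seen += slow.poll(max_records=2)
```

Both consumers see the same records in the same order, and the slow one never holds back the fast one.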
Lambda Architecture
- Good:
  - Reprocessing is possible as the system evolves
- Bad:
  - Code must be maintained in two different complex systems
An Alternative
Log Compaction
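One common form of compaction keeps only the latest record for each key. The sketch below assumes records are `(key, value)` pairs; the function name `compact` is hypothetical.

```python
def compact(log):
    """Keep only the most recent record per key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset
    ]


changelog = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2)]
compacted = compact(changelog)
```

Replaying the compacted log produces the same final state as replaying the full one, with fewer records to store and read.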
The Log's Place in System Architecture
- Handle data consistency
- Provide data replication between nodes
- Provide “commit” semantics to the writer
- Provide an external data subscription feed
- Provide the ability to recover from failure
- Handle rebalancing of data between nodes
References
- https://en.wikipedia.org/wiki/Write-ahead_logging
- https://en.wikipedia.org/wiki/CAP_theorem
- https://en.wikipedia.org/wiki/Paxos_(computer_science)
- https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs