What is a log, in your mind?

img

Log - System Perspective

  • A log is also known as a commit log or journal.
  • An append-only sequence of records, totally ordered by time.

img
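To make the definition concrete, here is a minimal in-memory sketch in Python. The class name `Log` and its methods are illustrative, not any particular system's API. The offset returned by `append` acts as a logical timestamp, so "ordered by time" becomes "ordered by offset", and the class deliberately offers no way to modify or delete a record.

```python
from typing import Any, List


class Log:
    """Minimal in-memory append-only log; a sketch, not a storage engine."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset (its position)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after `offset`, in append order."""
        return list(self._records[offset:])

    def __len__(self) -> int:
        return len(self._records)


log = Log()
log.append({"user": "alice", "action": "login"})
log.append({"user": "bob", "action": "login"})
log.append({"user": "alice", "action": "logout"})
print(log.read_from(1))  # bob's login, then alice's logout
```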

Logs in a Database

  • Write-ahead logging (WAL)
    • Durability
    • Atomicity
  • Replicating data between databases
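A toy sketch of write-ahead logging, assuming a single process and a JSON-lines file as the log (the class `WALStore` and its `commit` method are hypothetical). Each batch of changes is written and `fsync`ed as one record before it is applied in memory, which gives durability. On recovery, a record without its trailing newline is treated as torn and discarded, so a batch is applied entirely or not at all, which is the atomicity half. Real databases add checksums, log sequence numbers, and checkpoints on top of this.

```python
import json
import os
import tempfile
from typing import Dict


class WALStore:
    """Toy key-value store using write-ahead logging (illustrative sketch)."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data: Dict[str, str] = {}
        self._recover()

    def _recover(self) -> None:
        """Rebuild state by replaying every complete record in the log."""
        if not os.path.exists(self.wal_path):
            return
        valid_bytes = 0
        with open(self.wal_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # torn record from a crash mid-write: discard it
                self.data.update(json.loads(line)["changes"])
                valid_bytes += len(line)
        # Cut off any torn tail so later appends start on a clean line.
        with open(self.wal_path, "r+b") as f:
            f.truncate(valid_bytes)

    def commit(self, changes: Dict[str, str]) -> None:
        """Durably log a batch of changes, then apply it in memory."""
        record = json.dumps({"changes": changes}) + "\n"
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # durability: the record reaches disk first
        self.data.update(changes)  # only now does the change become visible


path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = WALStore(path)
store.commit({"a": "1", "b": "2"})
store.commit({"a": "3"})
recovered = WALStore(path)  # simulate a restart by replaying the log
print(recovered.data)  # {'a': '3', 'b': '2'}
```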

Things to Consider in a Distributed System

  • Consistency
  • Availability
  • Partition Tolerance

Logs in Distributed Systems

  • Log-Centric Design

    img

  • Serve as the Core of Consensus Algorithms

    • Paxos
    • Raft
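Paxos and Raft are protocols for getting replicas to agree on the contents of a log. What the agreed log buys you afterwards is that deterministic replicas applying the same entries in the same order end up in the same state. The sketch below shows only that second part (state machine replication) and simply assumes the agreed log is given as a Python list; it does not implement Paxos or Raft. `Replica` and `catch_up` are hypothetical names.

```python
from typing import Dict, List, Tuple

# A command is (key, delta); every replica sees the same agreed sequence.
Command = Tuple[str, int]


class Replica:
    """Deterministic state machine that applies log entries in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, int] = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log: List[Command]) -> None:
        """Apply every entry this replica has not applied yet, in log order."""
        for key, delta in log[self.applied:]:
            self.state[key] = self.state.get(key, 0) + delta
        self.applied = len(log)


# Assumption: a consensus protocol (e.g. Paxos or Raft) has already agreed
# on this sequence; here it is just a shared list.
agreed_log: List[Command] = [("x", 5), ("y", 2), ("x", -1)]

a, b = Replica("a"), Replica("b")
a.catch_up(agreed_log[:2])  # replica a lags behind...
b.catch_up(agreed_log)
a.catch_up(agreed_log)      # ...then resumes from where it left off
print(a.state == b.state, a.state)  # True {'x': 4, 'y': 2}
```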

Data Integration

img

Data Integration Pain Points

  • Data is More Diverse
    • Every kind of data you can think of
  • Explosion of Specialized Data Systems
    • OLAP
    • Search
    • Online Storage
    • Batch Processing
    • Stream Processing
    • Graph Analysis

Log-Structured Data Flow

img

Fully Connected Architecture

img

Log-Centric Architecture

img
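A back-of-the-envelope way to compare the fully connected and log-centric diagrams, assuming every source system feeds every destination system: point-to-point integration needs one pipeline per pair, while a central log needs one connection per system. The function name is illustrative.

```python
from typing import Tuple


def connection_counts(sources: int, destinations: int) -> Tuple[int, int]:
    """Return (point-to-point pipelines, connections via a central log)."""
    fully_connected = sources * destinations  # one custom pipeline per pair
    log_centric = sources + destinations      # each system connects once, to the log
    return fully_connected, log_centric


print(connection_counts(10, 10))  # (100, 20)
```

Under this assumption the point-to-point count grows with the product of the two sides, while the log-centric count grows only with their sum.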

Scaling a Log

  • Partitioning the log
  • Batching reads and writes
  • Avoiding needless data copies
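A sketch of the first two techniques, assuming records are (key, value) pairs and that ordering only matters per key. The hash function (`zlib.crc32`) is an arbitrary stable choice for illustration, not what any particular system uses, and `PartitionedLog` is hypothetical. Avoiding data copies happens at the I/O layer (for example, serving reads with the `sendfile` system call) and is not modeled here.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class PartitionedLog:
    """A log split into partitions; order is guaranteed only within a partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash, so a given key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append_batch(self, batch: List[Record]) -> None:
        """Group a batch by partition, then do one append per partition."""
        grouped: Dict[int, List[Record]] = {}
        for key, value in batch:
            grouped.setdefault(self.partition_for(key), []).append((key, value))
        for partition, records in grouped.items():
            self.partitions[partition].extend(records)


log = PartitionedLog(num_partitions=4)
log.append_batch([("alice", "login"), ("bob", "login"), ("alice", "click"), ("alice", "logout")])
alice_partition = log.partitions[log.partition_for("alice")]
print([v for k, v in alice_partition if k == "alice"])  # ['login', 'click', 'logout']
```

Partitions can live on different machines and be written in parallel; the trade-off is that there is no total order across partitions, only per key.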

Stream Processing

A stream processing job is anything that reads from logs and writes its output to logs or other systems.

img
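A minimal sketch of such a job, with plain Python lists standing in for the input and output logs. The job remembers the offset it has processed up to, so it can stop and resume without reprocessing or skipping records. `run_job` and the page-view `tag_slow` transform are hypothetical examples.

```python
from typing import Any, Callable, List, Optional


def run_job(
    input_log: List[Any],
    output_log: List[Any],
    offset: int,
    transform: Callable[[Any], Optional[Any]],
) -> int:
    """Process input records from `offset` on; return the next offset to read."""
    for record in input_log[offset:]:
        result = transform(record)
        if result is not None:  # the transform may filter a record out
            output_log.append(result)
        offset += 1
    return offset


def tag_slow(event: dict) -> Optional[dict]:
    """Keep only page views and mark those slower than 500 ms."""
    if event.get("type") != "page_view":
        return None
    return {**event, "slow": event["ms"] > 500}


page_events = [{"type": "page_view", "ms": 120}, {"type": "heartbeat"}, {"type": "page_view", "ms": 900}]
enriched: List[dict] = []
next_offset = run_job(page_events, enriched, 0, tag_slow)

page_events.append({"type": "page_view", "ms": 50})  # new data arrives later
next_offset = run_job(page_events, enriched, next_offset, tag_slow)
print(next_offset, [e["slow"] for e in enriched])  # 4 [False, True, False]
```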

Logs and Stream Processing

  • Logs make each data set multi-subscriber
  • Logs preserve record order in the processing done by each consumer
  • Logs provide isolation and buffering between producers and consumers
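The sketch below shows these three properties with a single in-memory log: each subscriber keeps its own offset (multi-subscriber), reads records in log order (ordering), and can fall behind without affecting producers or other subscribers, because unread records simply wait in the log (isolation and buffering). `Topic`, `poll`, and the consumer names are illustrative assumptions.

```python
from typing import Any, Dict, List


class Topic:
    """One log, many subscribers, each tracking its own read position."""

    def __init__(self) -> None:
        self.records: List[Any] = []
        self.offsets: Dict[str, int] = {}

    def publish(self, record: Any) -> None:
        self.records.append(record)  # producers never wait for consumers

    def subscribe(self, consumer: str) -> None:
        self.offsets.setdefault(consumer, 0)  # new consumers start at the beginning

    def poll(self, consumer: str, max_records: int = 10) -> List[Any]:
        """Return the next records for `consumer` and advance only its offset."""
        start = self.offsets[consumer]
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


topic = Topic()
topic.subscribe("search-indexer")
topic.subscribe("analytics")
for i in range(5):
    topic.publish(f"event-{i}")

print(topic.poll("search-indexer", 2))  # ['event-0', 'event-1']
print(topic.poll("analytics"))          # all five events
print(topic.poll("search-indexer"))     # ['event-2', 'event-3', 'event-4']
```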

Lambda Architecture

img

  • Good:
    • Reprocessing is possible as the system evolves
  • Bad:
    • Code must be maintained in two different complex systems

An Alternative

img
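Assuming the alternative here is to keep a single stream processing system and reprocess by replaying the retained log (an approach sometimes called the Kappa architecture), the idea can be sketched as: run the new version of the job from offset 0 into a new output table, switch readers over once it has caught up, then drop the old table. The URL-count job and table names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Job = Callable[[str], str]  # maps an event to the key being counted


def run_from_start(log: List[str], key_of: Job) -> Dict[str, int]:
    """Replay the whole retained log through one version of the job."""
    counts: Counter = Counter()
    for event in log:  # offset 0 up to the current end of the log
        counts[key_of(event)] += 1
    return dict(counts)


log = ["GET /home", "GET /Home", "GET /cart"]

tables = {"page_counts_v1": run_from_start(log, lambda e: e.split()[1])}
serving = "page_counts_v1"

# New code version normalizes paths: reprocess into a new table...
tables["page_counts_v2"] = run_from_start(log, lambda e: e.split()[1].lower())
# ...then, once it has caught up, point readers at it and drop the old one.
serving = "page_counts_v2"
del tables["page_counts_v1"]

print(tables[serving])  # {'/home': 2, '/cart': 1}
```

Only one code base exists; the "batch" path is simply the same stream job run over older offsets.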

Log Compaction

img
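A sketch of key-based log compaction over a list of (key, value) records, where a value of `None` is treated as a delete marker (tombstone). Only the latest record for each key survives, in original log order, so replaying the compacted log still rebuilds the final state. This is a simplification: Kafka's compaction, for instance, also keeps each surviving record's original offset and retains tombstones for a configurable period before removing them. The function name and sample data are illustrative.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); value None = tombstone (delete)


def compact(log: List[Record], drop_tombstones: bool = True) -> List[Record]:
    """Keep only the latest record per key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i and not (drop_tombstones and value is None)
    ]


log = [
    ("u1", "alice@example.com"),
    ("u2", "bob@example.com"),
    ("u1", "alice@new.example.com"),
    ("u2", None),  # u2 was deleted
    ("u3", "carol@example.com"),
]
print(compact(log))  # [('u1', 'alice@new.example.com'), ('u3', 'carol@example.com')]
```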

The Place of the Log in System Architecture

  • Handle data consistency
  • Provide data replication between nodes
  • Provide “commit” semantics to writers
  • Provide external data subscription feed
  • Provide capability to restore from failure
  • Handle rebalancing of data between nodes

-End-