Database Specific and Papers
- Ceph - Scalable and Distributed Object Store
- Paper WiscKey - separating keys from values in SSD
- Paper The Log-Structured Merge Tree
- WiscKey: Separating Keys from Values in SSD-conscious Storage usenix.org
- CockroachDB Engineering Blog
Database Concepts
- A minimal distributed key-value database with Hashicorp’s Raft library notes.eatonphil.com
- An entire db tutorial made with C.
- https://jepsen.io/consistency#
- Database Systems - CMU Andy Pavlo
- CMU Databaseology 2015
- CS186 Database Systems and Chapters
- A basic intro to product quantization
- How query engines work howqueryengineswork.com
- Awesome Database Development
- https://planetscale.com/blog/btrees-and-database-indexes
- https://planetscale.com/blog/database-sharding
- HyperLogLog
- Getting started with Databases
- German Strings: https://cedardb.com/blog/german_strings/
- Database Page Layout
- Thomas Write Rule
- https://github.com/marvin-j97
- https://github.com/fjall-rs
- LSM in a week
- Write your own vector DB
- chidb
- CS186 Introduction to DBS Berkeley Assignments
- Phil Eaton blogs tagged databases notes.eatonphil.com
- KV Separation in LSM Trees
- SIP Hashing
- https://abseil.io/about/design/swisstables
- https://go.dev/blog/swisstable
- The Databaseology Lectures (CMU)
- Database Systems (CMU)
- Introduction to Database Systems (Berkeley) (See the assignments)
- chidb
- Let’s Build a Simple Database
- Build your own disk based KV store
- Let’s build a database in Rust
- Let’s build a distributed Postgres proof of concept
- LSM Tree: Data structure powering write heavy storage engines
- MemTable, WAL, SSTable, Log Structured Merge(LSM) Trees
- Btree vs LSM
- Modern B-Tree Techniques
- Organization and maintenance of large ordered indices (Original paper)
- Architecture of a Database System
- Awesome Database Development (Not your average awesome X page, genuinely good)
- The Third Manifesto Recommends
- The Design and Implementation of Modern Column-Oriented Database Systems
- CMU Database Group Interviews
- Database Programming Stream (CockroachDB)
- Murat Demirbas
- Ayende (CEO of RavenDB)
Uncategorised links
- Justin Jaffray blog posts justinjaffray.com
- Mark Callaghan, Small Datum blogs smalldatum.blogspot.com
- Tanel Poder Blog tanelpoder.com
- Redpanda Engineering Blog redpanda.com
- Andy Grove blogs andygrove.io
- Jamie Brandon blog scattered-thoughts.net
- Distributed Computing Musings distributed-computing-musings.com
- Alex Chi Z. blogs skyzh.dev
- MIT 6.824 Lecture 2: RPC and Threads youtube.com
- Decoding Atomicity - The A in ACID arpitbhayani.me
- Decoding Consistency - The C in ACID arpitbhayani.me
- Decoding Isolation - The I in ACID arpitbhayani.me
- Difference between “read commited” and “repeatable read” in SQL Server stackoverflow.com
- Consistency Models jepsen.io
- Decoding Durability - The D in ACID arpitbhayani.me
- An Illustrated Proof of the CAP Theorem mwhittaker.github.io
- How Paxos and Two-Phase Commit Differ predr.ag
- Deterministic simulation testing for async Rust s2.dev
- Colossus under the hood: How we deliver SSD performance at HDD prices cloud.google.com
- CacheSack: Admission Optimization for Google Datacenter Flash Caches usenix.org
- Data Replication Design Spectrum transactional.blog
- Cornus: Atomic Commit for a Cloud DBMS with Storage Disaggregation vldb.org
- Magma: A High Data Density Storage Engine Used in Couchbase vldb.org
- Build Your Own Database From Scratch in Go build-your-own.org
- What I Learned Building a Storage Engine That Outperforms RocksDB tidesdb.com
- How AWS S3 is built youtube.com
- TidesDB Bengaluru Systems Presentation youtube.com
- Re-Designing Data-Intensive Applications: The Shift to Cloud-Native Storage youtube.com
- Fixing five “two-year” bugs per day youtube.com
- SQLite Internals: Pages & B-trees fly.io
- SQLite File Format Viewer sqlite-internal.pages.dev
- The Rollback Journal sqlite.org
- Write-Ahead Logging sqlite.org
- Database File Format sqlite.org
- How does SQLite store data? michalpitr.substack.com
- SQLite Internals: How The World’s Most Used Database Works compileralchemy.com
- Learning What the Heck is Inside SQLite ryanisaacg.com
- Pushdown Automaton wikipedia.org
- The Lemon LALR(1) Parser Generator sqlite.org
- A sqlite basic file parser github.com/bwaklog
- SQLite Varint Decoding Stage SZ4 forum.codecrafters.io
- vu128: Efficient variable-length integers john-millikin.com
- Reading Sqlite Schema Tables the Hard Way www.philosophicalhacker.com
- Hosting SQLite databases on Github Pages phiresky.github.io
- RocksDB Overview github.com/facebook/rocksdb
- RocksDB MemTable github.com/facebook/rocksdb
- https://www.youtube.com/watch?v=5vL6aCvgQXU
- Two years of vector search at Notion: 10x scale, 1/10th cost notion.com
- To BLOB or Not to BLOB: Large Object Storage in a Database or a Filesystem paper
- NULLS! Revisiting NULL Value Representation in Modern Columnar Formats [paper](https://db.cs.cmu.edu/papers/2024/zeng-damon24.pdf)
- https://transactional.blog/notes-on/disaggregated-oltp
- video by matklad @ tigerbeetle youtube.com
- Writing a SQL Database, takes two: Zig and Rocks DB notes.eatonphil.com
- Building a serverless ACID database with one neat trick (atomic PutIfAbsent) notes.eatonphil.com
- Implementing MVCC and major SQL transaction isolation levels [notes.eatonphil.com](https://notes.eatonphil.com/2024-05-16-mvcc.html)
- Pushing boundaries: Quantum-Enhanced Leader Election and Limits of Consensus arxiv.org
- But how, exactly, databases use mmap brunocalza.medium.com (supporting hackernews article 25881911)
- How does one do Raw IO on Mac OS X? (ie. equivalent to Linux’s O_DIRECT flag) stackoverflow.com
- darwin-xnu/bsd/vfs/vfs_cluster.c comment on MIN_DIRECT_WRITE_SIZE, set to 16384, governing how much I/O should be allowed before considering to allow the caller to bypass the buffer cache; I/O smaller than 16k is not allowed to bypass the UBC github.com/apple/darwin-xnu
- (got above from this) Issue: OSX fcntl(fd, F_NOCACHE, 1) is not the same as O_DIRECT on Linux axboe/fio/issues/48
- Vectored I/O (wikipedia) for scatter gather operations, with sys/uio.h, readv, writev pubs.opengroup.org
- Userland Disk I/O transactional.blog
- Darwins deceptive durability transactional.blog
- Files are hard (consistency) danluu.com
- Different I/O Access Methods for Linux, What We chose for ScyllaDB, and why scylladb.com
- An Introduction to Distributed Systems github.com/aphyr/distsys-class and labs (based on maelstrom: https://github.com/jepsen-io/maelstrom/blob/main/doc/02-echo/index.md)
- Foundation DB - Simulation and Testing apple.github.io/foundationdb
- Tigerbeetle DST and VOPR github.com/tigerbeetle
- Tigerbeetle VSR github.com/tigerbeetle
- Foundation DB technical overview apple.github.io
- Please stop calling databases CP or AP martin.kleppmann.com
- Amazon Aurora: Design considerations for high throughput cloud-native relational databases amazon.science
- A Database Without Dynamic Memory Allocation tigerbeetle.com
- Protocol-aware deterministic simulation testing | Chaitanya Bhandari | Bug Bash 2026 youtube.com
- Will Wilson on Swarm Testing — Papers We Love SF March 2026 youtube.com
- Swarm Testing paper users.cs.utah.edu
- BugBash Session antithesis.com
- Web Browser Engineering by Pavel Panchekha & Chris Harrelson browser.engineering
- BPF-DB: A Kernel-Embedded Transactional Database Management System For eBPF Applications dl.acm.org
- Carnegie Mellon Database Group publications db.cs.cmu.edu
- Atproto for distributed systems engineers atproto.com
- Understanding Atproto atproto.com
- Bluesky and the AT Protocol: Usable Decentralized Social Media arxiv.org
- Martin Kleppmann Thinking in Events: from databases to distributed collaboration youtube.com
- The AT Protocol docs.bsky.app
- Open social overreacted.io
- How I learned about Merklix trees (without having to become a cryptocurrency enthusiast) decomposition.al
- Linux: When to use scatter/gather IO (readv,writev) vs large buffer with fread stackoverflow.com
- vishnujayadevan.com
- Hands-on lab for Unikernels labs.iximiuz.com
- Why now article on turbopuffer - a cache memory whynowtech.substack.com
- Object storages in database architecture pingcap.com
- Vector and FTS on object storage turbopuffer.com
- blog.cloudflare.com
- youtube.com
- youtube.com
- Future of distributed systems and object storages blog.colinbreck.com
- RFC on Delay-Tolerant Network Architectures datatracker.ietf.org
- Amazon Aurora DSQL blog post series by Marc Bowes marc-bowes.com
- Distributed Task Queue with Celery docs.celeryq.dev
- Quorum Systems by Yale www.cs.yale.edu
- Relaxed Paxos paper dl.acm.org
- Raft consensus with a “minority” with finite padhye.org
- A guide for C and C++ programmers for understanding linker essentials www.lurklurk.org
- Article on Mach-O binary format on OS X and how applications are executed 0xfe.blogspot.com
- A video on UDP Hole Punching youtube.com
- Efficient IO with io_uring by kernel developer Jens Axboe kernel.dk
- LWN article for io_uring lwn.net
- Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines arxiv.org
- Discovering hard disk physical geometry through physical benchmarking https://blog.stuffedcow.net/2019/09/hard-disk-geometry-microbenchmarking/
- FROST: Fingerprinting remotely using OPFS-based SSD timings hannesweissteiner.com
- S3 Files are the changing face of S3 www.allthingsdistributed.com