On data indexing

Introduction

Most people don’t need to think about “indexing”. When you build an app, you decide how you structure your data. If I’m building X (formerly Twitter), I have a list of all my users and a list of all their tweets. These two lists tie together so I know who tweeted what, and I can present that in my app. Things get more complicated from there: I need to map usernames to display names, manage media like photos and videos, and more. But generally, I can set things up how I want, and I can use the right tool for each job:

to store text and search it, I use one type of database
to store images and media, I use another one
for analytics, yet another one

What I described above is not how things work in crypto. The basis of blockchains is that everyone works off a shared database. Regardless of which one you pick (Ethereum, Bitcoin, Solana, etc.), blockchains are great at letting a lot of different people run the database at the same time and come to consensus on what it says, but they’re really bad at most other things.

Problems with crypto data

#1: Generic structure

Blockchains were not designed for app-friendly reads. They expose a handful of primitive data structures. For example, Ethereum’s data is typically exposed as four raw tables:

blocks
logs
traces
transactions

No “users”, “posts”, “media”, etc. These basic structures work well for maintaining a ledger, but they’re terrible for anything even one layer of abstraction away. If you’re building an NFT app, you don’t want “logs”, you want “NFT transfers”. If you’re building a decentralized exchange, you don’t want “traces”. You want “swaps”, “pools”, “positions”, and “24h volume”. But the chain doesn’t give you these things.
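To make the gap between “logs” and “NFT transfers” concrete, here is a minimal sketch in Python. It assumes a simplified log shape (a dict with `address`, `topics`, and `data`) and the standard ERC-721 `Transfer(address,address,uint256)` event, where the sender, recipient, and token ID are all indexed topics. The `NftTransfer` class, the function names, and the sample addresses are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# keccak256("Transfer(address,address,uint256)"): the standard Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


@dataclass
class NftTransfer:
    contract: str
    sender: str
    recipient: str
    token_id: int


def topic_to_address(topic: str) -> str:
    # An indexed address is left-padded to 32 bytes; the last 20 bytes (40 hex chars) are the address.
    return "0x" + topic[-40:]


def decode_nft_transfer(log: dict) -> Optional[NftTransfer]:
    topics = log["topics"]
    # An ERC-721 Transfer carries 4 topics: signature, from, to, tokenId.
    if len(topics) != 4 or topics[0] != TRANSFER_TOPIC:
        return None  # some other event; not an NFT transfer
    return NftTransfer(
        contract=log["address"],
        sender=topic_to_address(topics[1]),
        recipient=topic_to_address(topics[2]),
        token_id=int(topics[3], 16),
    )


# A hypothetical raw log: what the chain gives you.
raw_log = {
    "address": "0x" + "ab" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,  # from (padded)
        "0x" + "00" * 12 + "22" * 20,  # to (padded)
        "0x" + format(42, "064x"),     # tokenId
    ],
    "data": "0x",
}

# The decoded record: what your app actually wants.
transfer = decode_nft_transfer(raw_log)
print(transfer)
```

Nothing in the raw log says “this is an NFT transfer”. You only know that because you brought the event definition with you.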

#2: Noisy

In a normal app, your database contains your app’s data. On a blockchain, your app’s activity lives in the same global history as everything else. Your NFT app shares the same underlying “book” as every trade that ever happened. If you want to answer a simple product question (“who owns these NFTs?”), you’re forced to sift through the entire world’s activity to find the tiny slice you care about. Imagine if Facebook had to filter out every Amazon product and order just to render your news feed.
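As a rough illustration of the noise problem, the sketch below scans a mixed list of logs and keeps only the ones emitted by one contract. The log shape, the addresses, and the function name are assumptions for the example, not a real node API.

```python
def logs_for_contract(all_logs, contract_address):
    """Scan every log and keep only those emitted by one contract."""
    wanted = contract_address.lower()
    return [log for log in all_logs if log["address"].lower() == wanted]


# Hypothetical contract address for "my" NFT app.
MY_NFT_CONTRACT = "0x" + "ab" * 20

# A tiny stand-in for the global history: mostly other people's activity.
all_logs = [
    {"address": "0x" + "01" * 20, "topics": [], "data": "0x"},  # someone's swap
    {"address": MY_NFT_CONTRACT, "topics": [], "data": "0x"},   # mine
    {"address": "0x" + "02" * 20, "topics": [], "data": "0x"},  # someone's loan
    {"address": "0x" + "AB" * 20, "topics": [], "data": "0x"},  # mine, different casing
    {"address": "0x" + "03" * 20, "topics": [], "data": "0x"},  # someone's token transfer
]

mine = logs_for_contract(all_logs, MY_NFT_CONTRACT)
print(f"{len(mine)} of {len(all_logs)} logs belong to my app")
```

On a real chain the ratio is far more lopsided than in this toy list, which is the whole problem.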

#3: Unoptimized

Even worse: the underlying databases used by blockchain nodes are primarily built to support the network’s write/verification needs (executing transactions, keeping state), not read workloads. The data is stored in a layout that is inefficient to query, often hashed or compactly encoded rather than human-readable, and the key information required to decode it doesn’t sit in the database at all.

Solution

Indexing is the missing middle layer that solves these 3 problems, turning the encoded, noisy, and generic data from a blockchain into contextual and specific databases that you can actually build with. Indexers sit between:

what you’re building (apps, dashboards, alerts)
and how blockchain data actually exists (blocks/logs/traces/transactions)

Their job is to turn “raw chain exhaust” into “app-ready data”. In practice, that means:

extracting only the events/state you care about
decoding and normalizing it into readable fields
organizing it into useful entities (e.g. swaps)
storing it somewhere queryable
keeping it updated as new transactions happen
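The five steps above can be sketched as a toy indexer in Python. This is a simplified illustration under stated assumptions, not how Goldsky or any particular indexer is implemented: the log shape, the addresses, and every function name here are made up, and an in-memory SQLite database stands in for “somewhere queryable”.

```python
import sqlite3

# keccak256("Transfer(address,address,uint256)"): the standard Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
# Hypothetical addresses, made up for this example.
MY_COLLECTION = "0x" + "ab" * 20
ZERO, ALICE, BOB = "0x" + "00" * 20, "0x" + "11" * 20, "0x" + "22" * 20


def extract(logs):
    """Extract: keep only Transfer logs emitted by our collection."""
    return [
        log for log in logs
        if log["address"].lower() == MY_COLLECTION
        and len(log["topics"]) == 4
        and log["topics"][0] == TRANSFER_TOPIC
    ]


def decode(log):
    """Decode: turn 32-byte hex topics into readable fields."""
    _, sender, recipient, token_id = log["topics"]
    return {
        "block": log["block"],
        "sender": "0x" + sender[-40:],
        "recipient": "0x" + recipient[-40:],
        # Token IDs are uint256 and can overflow SQLite integers, so keep them as text.
        "token_id": str(int(token_id, 16)),
    }


def open_store():
    """Store: an in-memory SQLite database stands in for a real one."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transfers (block INTEGER, sender TEXT, recipient TEXT, token_id TEXT)")
    db.execute("CREATE TABLE owners (token_id TEXT PRIMARY KEY, owner TEXT)")
    return db


def index_block(db, logs):
    """Organize and update: record each transfer and refresh the current owner."""
    for t in map(decode, extract(logs)):
        db.execute("INSERT INTO transfers VALUES (:block, :sender, :recipient, :token_id)", t)
        db.execute("INSERT OR REPLACE INTO owners VALUES (?, ?)", (t["token_id"], t["recipient"]))
    db.commit()


def owner_of(db, token_id):
    """Query: the product question, answered with a single lookup."""
    row = db.execute("SELECT owner FROM owners WHERE token_id = ?", (str(token_id),)).fetchone()
    return row[0] if row else None


def transfer_log(block, sender, recipient, token_id):
    """Build a simplified raw log (loosely modeled on what a node returns)."""
    pad = lambda addr: "0x" + "00" * 12 + addr[2:]
    return {
        "address": MY_COLLECTION,
        "block": block,
        "topics": [TRANSFER_TOPIC, pad(sender), pad(recipient), "0x" + format(token_id, "064x")],
    }


db = open_store()
noise = {"address": "0x" + "99" * 20, "block": 1, "topics": ["0x" + "77" * 32]}
index_block(db, [noise, transfer_log(1, ZERO, ALICE, 7)])  # token 7 is minted to Alice
index_block(db, [transfer_log(2, ALICE, BOB, 7)])          # Alice sends it to Bob
print(owner_of(db, 7))
```

Once the `owners` table exists, “who owns this NFT?” is one cheap query instead of a scan over the whole chain. Each new block just means another call to `index_block`.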

If you like metaphors: blockchains are like a giant book, written in hieroglyphics by many authors at once, all writing unrelated sentences one after the other. Indexers translate those hieroglyphics into English, and pull out only the sentences you care about. They add in an index, table of contents, clean chapters, and even summaries and statistics. They do this following the exact instructions you give them.

Conclusion

Blockchains are not application databases. They’re global, write-optimized ledgers with a minimal, awkward read interface. Indexers exist because the raw chain is not a usable application database, and people building apps need:

structured, domain-shaped data
fast and flexible queries
real-time updates
reliability at scale

There are many different approaches to indexing. Goldsky is one of them (in my unbiased opinion, the best), but there are others. Going through the options is probably a job for a separate follow-up post. And why this effort is worth the trouble is a whole separate question as well. But for now, I hope this gives you a clearer picture of what “indexing” is, and what problem Goldsky is solving.

Date: December 23rd, 2025

Tl;dr: Blockchains are strange "shared databases" that come with data challenges most companies never face. This post outlines these complexities and how a new category of infrastructure called indexers solves them.

Meta: Many of my friends and family ask me, "what does Goldsky do?" This is my attempt to answer that question.