
· 4 min read

We’re excited to announce that we’re rolling out a new version of atproto repositories that removes history from the canonical structure of repositories, and replaces it with a logical clock. We’ll start rolling out this update next week (August 28, 2023).

For most developers with projects subscribed to the firehose, such as feed generators, this change shouldn’t affect you. It will only affect you if you’re doing commit-aware repo sync (a good rule of thumb is whether you’ve ever passed earliest or latest to the com.atproto.sync.getRepo method) or are explicitly checking the repo version when processing commits.

Removing Repository History

Repositories on the AT Protocol are like Git repositories, but for structured records. Just like Git, each commit to an atproto repository currently includes a pointer to the previous commit. However, this approach has caused a couple of pain points:

  • Record deletions are difficult to process. If a user deletes a record, that commit needs to be erased from their repository to match their intent.
  • Increased storage cost. Maintaining repo history can cause anywhere from a 5-10x increase in repo size.

We attempted to resolve both of these in the current model through rebases (discrete moments when the history of a repository is deleted/mutated, like in Git). However, this is a tricky and sensitive operation that is expensive to conduct and complex to communicate across the network.

Using a Logical Clock for Repositories

To address the above issues, we’re replacing the prev pointer in commits with a logical clock. We originally published our intention to do so a few weeks ago. These are the changes we’re making to the way we handle repository history:

  • Incrementing the repo version to 3
  • Making the prev field on repo commits optional
  • Adding a new required rev (revision) field which is a logical clock
  • Removing or adjusting commit-aware repo sync mechanisms

Note: If you explicitly verify the version of a repo commit or do strict type checking on repo commits (which you shouldn’t, since the spec allows unspecified fields!), you will need to make that check inclusive of version 3.

To facilitate backwards compatibility with software that is still running repo v2, we will continue setting the prev field on commits in the interim.

Even though we are setting the prev field, this can be considered a “hint” and the history is no longer considered a canonical part of the repository.
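As a rough illustration of the note above, a tolerant commit check might look like the following. This is only a sketch: check_commit and SUPPORTED_REPO_VERSIONS are hypothetical names, and it assumes the commit has already been decoded into a plain dict with version, rev, and prev keys.

```python
from typing import Any, Dict, Optional

# Assumption: this consumer wants to accept v2 during the transition and v3 after.
SUPPORTED_REPO_VERSIONS = {2, 3}


def check_commit(commit: Dict[str, Any]) -> Optional[str]:
    """Validate the fields discussed above and return the commit's rev, if any.

    Unknown extra fields are deliberately ignored rather than rejected.
    """
    version = commit.get("version")
    if version not in SUPPORTED_REPO_VERSIONS:
        raise ValueError(f"unsupported repo version: {version!r}")

    rev = commit.get("rev")
    if version == 3 and not isinstance(rev, str):
        raise ValueError("v3 commits require a rev (revision) field")

    # prev is optional and only a hint, so it is never required here
    return rev
```

The key design choice is that the check is inclusive (a set of accepted versions) and never fails on fields it does not recognize.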


Repository Revisions

The new sync semantics for the repository rely on a logical clock included in each signed commit.

This “revision” takes the form of a TID and must be monotonically increasing.

The included revision serves a few functions:

Ordering

The clock provides a simple ordering mechanism for encountered repos or commits. If a consumer encounters the same repo from two different sources, each with a valid signature and structure, the revision gives a simple mechanism to determine which is the most recent repository.
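A minimal sketch of that comparison, assuming each candidate has already passed signature and structure checks and that revs are equal-length TID strings, which are designed to sort lexicographically. The function name most_recent and the dict shape are illustrative only.

```python
from typing import Dict, Iterable


def most_recent(commits: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Pick the commit with the highest rev.

    Assumes every commit was verified beforehand and carries a TID-style
    "rev" string, so plain string comparison gives the clock ordering.
    """
    return max(commits, key=lambda commit: commit["rev"])


# Two copies of the same repo seen from different sources (made-up data)
from_source_a = {"did": "did:plc:example", "rev": "3k43tv4rft22g"}
from_source_b = {"did": "did:plc:example", "rev": "3k4duaz5vfs2b"}
latest = most_recent([from_source_a, from_source_b])
```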

Sync

When syncing a repository, revisions give a series of signposts that allow you to request everything from a given repo since a previously seen version. Because revisions are ordered and monotonically increasing, the provider does not necessarily need the exact revision that the consumer is asking for (as it would with a commit hash). Instead, it can provide all repo contents since the latest version of the repo that it remembers that is before the requested revision.

The PDS for instance will track the revision at which each repo block or record was introduced into a repository. If a consumer asks for every block or record since a given revision, the PDS has a simple mechanism by which to give that information, without needing a complicated sync algorithm.
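To make the idea concrete, here is a toy in-memory index. It is not how the PDS is implemented; RevIndex and its methods are hypothetical, and it assumes revs compare correctly as strings.

```python
from typing import Dict, List, Tuple


class RevIndex:
    """Toy index: record path -> rev at which the record was last written."""

    def __init__(self) -> None:
        self._rev_by_path: Dict[str, str] = {}

    def put(self, path: str, rev: str) -> None:
        self._rev_by_path[path] = rev

    def since(self, rev: str) -> List[Tuple[str, str]]:
        """Return (path, rev) pairs written strictly after rev, oldest first."""
        newer = [(p, r) for p, r in self._rev_by_path.items() if r > rev]
        return sorted(newer, key=lambda item: item[1])


index = RevIndex()
index.put("app.bsky.feed.post/a", "3k43tv4rft22g")
index.put("app.bsky.feed.post/b", "3k44deefqdk2g")
index.put("app.bsky.feed.post/c", "3k4duaz5vfs2b")

# The requested rev does not have to be one the provider has ever seen
changed = index.since("3k44aaaaaaaaa")
```

Because the lookup is a simple "greater than" filter, the consumer can ask with any rev it last saw, which is the property a commit-hash-based sync lacks.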

Stale Reads

Finally, a logical clock on the repo gives us a mechanism through which we can detect stale reads. (We actually already snuck this in with an optional revision field on v2 repos!)

Repo revisions may be returned in response headers to most requests. A client will know their own repo’s current revision and can compare that with the upstream service’s revision.

We use this today on the PDS to paper over some read-after-write concerns that are inherent in eventually consistent architectures. Some clients may use these headers to alert their users that their PDS is “out of sync” with other services in the network (for instance an AppView).

Available sync methods


If you have questions about these changes, join us on GitHub Discussions here.

· 11 min read

This blog post may become outdated as new features are added to atproto and the Bluesky application schemas.


First, you'll need a Bluesky account. We'll create a session with HTTPie (brew install httpie).

http post https://bsky.social/xrpc/com.atproto.server.createSession \
identifier="$BLUESKY_HANDLE" \
password="$BLUESKY_APP_PASSWORD"

Now you can create a post by sending a POST request to the createRecord endpoint.

http post https://bsky.social/xrpc/com.atproto.repo.createRecord \
Authorization:"Bearer $AUTH_TOKEN" \
repo="$BLUESKY_HANDLE" \
collection=app.bsky.feed.post \
record:="{\"text\": \"Hello world! I posted this via the API.\", \"createdAt\": \"`date -u +"%Y-%m-%dT%H:%M:%SZ"`\"}"

Posts can get a lot more complicated with replies, mentions, embedding images, and more. This guide will walk you through how to create these more complex posts in Python, but there are many API clients and SDKs for other programming languages and Bluesky PBC publishes atproto code in TypeScript and Go as well.

Skip the steps below and get the full script here. It was tested with Python 3.11, with the requests and bs4 (BeautifulSoup) packages installed.


Authentication

Posting on Bluesky requires account authentication. Have your Bluesky account handle and App Password handy.

import requests

BLUESKY_HANDLE = "example.bsky.social"
BLUESKY_APP_PASSWORD = "123-456-789"

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.server.createSession",
    json={"identifier": BLUESKY_HANDLE, "password": BLUESKY_APP_PASSWORD},
)
resp.raise_for_status()
session = resp.json()
print(session["accessJwt"])

The com.atproto.server.createSession API endpoint returns a session object containing two API tokens: an access token (accessJwt) which is used to authenticate requests but expires after a few minutes, and a refresh token (refreshJwt) which lasts longer and is used only to update the session with a new access token. Since we're just publishing a single post, we can get away with a single session and not bother with refreshing.
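For longer-running scripts, it can help to know how long an access token has left. The sketch below assumes accessJwt is a standard three-part JWT whose payload carries an exp claim (seconds since the epoch); that is an assumption about the token format, not something this post specifies. It only decodes the payload and does not verify the signature. The helper name jwt_seconds_left is hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_left(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's exp claim (negative if already expired)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    if now is None:
        now = time.time()
    return claims["exp"] - now
```

A script could call jwt_seconds_left(session["accessJwt"]) before each request and refresh the session when the result gets close to zero.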

Post Record Structure

Here is what a basic post record should look like, as a JSON object:

{
"$type": "app.bsky.feed.post",
"text": "Hello World!",
"createdAt": "2023-08-07T05:31:12.156888Z"
}

Bluesky posts are repository records with the Lexicon type app.bsky.feed.post — this just defines the schema for what a post looks like.

Each post requires these fields: text and createdAt (a timestamp).

The script below will create a simple post with just a text field and a timestamp. It uses the json and datetime modules, which are part of the Python standard library, so nothing extra needs to be installed.

import json
from datetime import datetime, timezone

# Fetch the current time
# Using a trailing "Z" is preferred over the "+00:00" format
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

# Required fields that each post must include
post = {
    "$type": "app.bsky.feed.post",
    "text": "Hello World!",
    "createdAt": now,
}

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.createRecord",
    headers={"Authorization": "Bearer " + session["accessJwt"]},
    json={
        "repo": session["did"],
        "collection": "app.bsky.feed.post",
        "record": post,
    },
)
print(json.dumps(resp.json(), indent=2))
resp.raise_for_status()

The full repository path (including the auto-generated rkey) will be returned as a response to the createRecord request. It looks like:

{
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k4duaz5vfs2b",
"cid": "bafyreibjifzpqj6o6wcq3hejh7y4z4z2vmiklkvykc57tw3pcbx3kxifpm"
}
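The returned uri has the shape at://<repo>/<collection>/<rkey>. The reply example later in this guide calls a parse_uri helper that is not shown in this post; the version below is only a guess at a minimal implementation that returns the parameters com.atproto.repo.getRecord expects, and it does not handle other URL forms.

```python
from typing import Dict


def parse_uri(uri: str) -> Dict[str, str]:
    """Split at://<repo>/<collection>/<rkey> into getRecord-style params."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://repo/collection/rkey, got: {uri}")
    repo, collection, rkey = parts
    return {"repo": repo, "collection": collection, "rkey": rkey}
```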

Setting the Post's Language

Setting the post's language helps custom feeds or other services filter and parse posts.

This snippet sets the text and langs value of a post to be Thai and English.

# an example with Thai and English (US) languages
post["text"] = "สวัสดีชาวโลก!\nHello World!"
post["langs"] = ["th", "en-US"]

The resulting post record object looks like:

{
    "$type": "app.bsky.feed.post",
    "text": "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e0a\u0e32\u0e27\u0e42\u0e25\u0e01!\nHello World!",
    "createdAt": "2023-08-07T05:44:04.395087Z",
    "langs": [ "th", "en-US" ]
}

The langs field indicates the post language, which can be an array of strings in BCP-47 format.

You can include multiple values in the array if there are multiple languages present in the post. The Bluesky Social client auto-detects the languages in each post and sets them as the default langs value, but a user can override the configuration on a per-post basis.

Mentions and links are annotations that point into the text of a post. They are actually part of a broader system for rich-text "facets." Facets only support links and mentions for now, but can be extended to support features like bold and italics in the future.

Suppose we have a post:

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.

Our goal is to turn the handle (@atproto.com) into a mention and the URL (https://en.wikipedia.org/wiki/CBOR) into a link. To do that, we grab the starting and ending locations of each "facet".

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.
@atproto.com: start=23, end=35; https://en.wikipedia.org/wiki/CBOR: start=74, end=108 (UTF-8 byte offsets)

We then identify them in the facets array, using the mention and link feature types. (You can view the schema of a facet object here.) The post record will then look like this:

{
"$type": "app.bsky.feed.post",
"text": "\u2728 example mentioning @atproto.com to share the URL \ud83d\udc68\u200d\u2764\ufe0f\u200d\ud83d\udc68 https://en.wikipedia.org/wiki/CBOR.",
"createdAt": "2023-08-08T01:03:41.157302Z",
"facets": [
{
"index": {
"byteStart": 23,
"byteEnd": 35
},
"features": [
{
"$type": "app.bsky.richtext.facet#mention",
"did": "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
}
]
},
{
"index": {
"byteStart": 74,
"byteEnd": 108
},
"features": [
{
"$type": "app.bsky.richtext.facet#link",
"uri": "https://en.wikipedia.org/wiki/CBOR"
}
]
}
]
}
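Note that byteStart and byteEnd count UTF-8 bytes, not characters, which matters as soon as the text contains emoji or other non-ASCII characters. The small sketch below shows the difference for the example post; byte_span is a hypothetical helper, and the text is written with escapes so the emoji sequence is unambiguous.

```python
from typing import Tuple


def byte_span(text: str, substring: str) -> Tuple[int, int]:
    """Return (byteStart, byteEnd) of the first occurrence, in UTF-8 bytes."""
    char_start = text.index(substring)  # raises ValueError if absent
    byte_start = len(text[:char_start].encode("utf-8"))
    byte_end = byte_start + len(substring.encode("utf-8"))
    return byte_start, byte_end


text = (
    "\u2728 example mentioning @atproto.com to share the URL "
    "\U0001F468\u200d\u2764\ufe0f\u200d\U0001F468 https://en.wikipedia.org/wiki/CBOR."
)

print(text.index("@atproto.com"))                               # 21 (characters)
print(byte_span(text, "@atproto.com"))                          # (23, 35)
print(byte_span(text, "https://en.wikipedia.org/wiki/CBOR"))    # (74, 108)
```

The mention starts at character 21 but byte 23, because the leading sparkle emoji takes three bytes in UTF-8.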

You can programmatically set the start and end points of a facet with regexes. Here's a script that parses mentions and links:

import re
from typing import List, Dict

def parse_mentions(text: str) -> List[Dict]:
    spans = []
    # regex based on: https://atproto.com/specs/handle#handle-identifier-syntax
    # (?:^|\W) also lets a mention match at the very start of the text
    mention_regex = rb"(?:^|\W)(@([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)"
    # match against UTF-8 bytes so that the offsets are byte offsets
    text_bytes = text.encode("UTF-8")
    for m in re.finditer(mention_regex, text_bytes):
        spans.append({
            "start": m.start(1),
            "end": m.end(1),
            # drop the leading "@" to get the bare handle
            "handle": m.group(1)[1:].decode("UTF-8")
        })
    return spans

def parse_urls(text: str) -> List[Dict]:
    spans = []
    # partial/naive URL regex based on: https://stackoverflow.com/a/3809435
    # tweaked to disallow some trailing punctuation
    url_regex = rb"(?:^|\W)(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*[-a-zA-Z0-9@%_\+~#//=])?)"
    text_bytes = text.encode("UTF-8")
    for m in re.finditer(url_regex, text_bytes):
        spans.append({
            "start": m.start(1),
            "end": m.end(1),
            "url": m.group(1).decode("UTF-8"),
        })
    return spans

Once the facet segments have been parsed out, we can then turn them into app.bsky.richtext.facet objects.

# Parse facets from text and resolve the handles to DIDs
def parse_facets(text: str) -> List[Dict]:
    facets = []
    for m in parse_mentions(text):
        resp = requests.get(
            "https://bsky.social/xrpc/com.atproto.identity.resolveHandle",
            params={"handle": m["handle"]},
        )
        # If the handle can't be resolved, just skip it!
        # It will be rendered as text in the post instead of a link
        if resp.status_code == 400:
            continue
        did = resp.json()["did"]
        facets.append({
            "index": {
                "byteStart": m["start"],
                "byteEnd": m["end"],
            },
            "features": [{"$type": "app.bsky.richtext.facet#mention", "did": did}],
        })
    for u in parse_urls(text):
        facets.append({
            "index": {
                "byteStart": u["start"],
                "byteEnd": u["end"],
            },
            "features": [
                {
                    "$type": "app.bsky.richtext.facet#link",
                    # NOTE: URI ("I") not URL ("L")
                    "uri": u["url"],
                }
            ],
        })
    return facets

The list of facets gets attached to the facets field of the post record:

post["text"] = "✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR."
post["facets"] = parse_facets(post["text"])

Replies, Quote Posts, and Embeds

Replies and quote posts contain strong references to other records. A strong reference is a combination of:

  • AT URI: indicates the repository DID, collection, and record key
  • CID: the hash of the record itself

Posts can have several types of embeds: record embeds, images, and external embeds (like link/webpage cards, which are the previews that show up when you post a URL).

Replies

A complete reply post record looks like:

{
"$type": "app.bsky.feed.post",
"text": "example of a reply",
"createdAt": "2023-08-07T05:49:40.501974Z",
"reply": {
"root": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
"cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
},
"parent": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
"cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
}
}
}

Since threads of replies can get pretty long, reply posts need to reference both the immediate "parent" post and the original "root" post of the thread.

Here's a Python script to find the parent and root values:

# Resolve the parent record and copy whatever the root reply reference there is
# If none exists, then the parent record was a top-level post, so that parent reference can be reused as the root value
def get_reply_refs(parent_uri: str) -> Dict:
    # parse_uri is assumed to split an AT URI into the repo, collection, and rkey params
    uri_parts = parse_uri(parent_uri)

    resp = requests.get(
        "https://bsky.social/xrpc/com.atproto.repo.getRecord",
        params=uri_parts,
    )
    resp.raise_for_status()
    parent = resp.json()

    parent_reply = parent["value"].get("reply")
    if parent_reply is not None:
        root_uri = parent_reply["root"]["uri"]
        root_repo, root_collection, root_rkey = root_uri.split("/")[2:5]
        resp = requests.get(
            "https://bsky.social/xrpc/com.atproto.repo.getRecord",
            params={
                "repo": root_repo,
                "collection": root_collection,
                "rkey": root_rkey,
            },
        )
        resp.raise_for_status()
        root = resp.json()
    else:
        # The parent record is a top-level post, so it is also the root
        root = parent

    return {
        "root": {
            "uri": root["uri"],
            "cid": root["cid"],
        },
        "parent": {
            "uri": parent["uri"],
            "cid": parent["cid"],
        },
    }

The root and parent refs are stored in the reply field of posts:

post["reply"] = get_reply_refs("at://atproto.com/app.bsky.feed.post/3k43tv4rft22g")

Quote Posts

A quote post embeds a reference to another post record. A complete quote post record would look like:

{
"$type": "app.bsky.feed.post",
"text": "example of a quote-post",
"createdAt": "2023-08-07T05:49:39.417839Z",
"embed": {
"$type": "app.bsky.embed.record",
"record": {
"uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k44deefqdk2g",
"cid": "bafyreiecx6dujwoeqpdzl27w67z4h46hyklk3an4i4cvvmioaqb2qbyo5u"
}
}
}

The record embedded here is the post that's getting quoted. The post record type is app.bsky.feed.post, but you can also embed other record types in a post, like lists (app.bsky.graph.list) and feed generators (app.bsky.feed.generator).
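In Python, building this embed is just a matter of wrapping the strong reference. The helper below is a sketch with a hypothetical name (make_quote_embed); the uri and cid could come from a com.atproto.repo.getRecord response, as in the reply example.

```python
from typing import Dict


def make_quote_embed(uri: str, cid: str) -> Dict:
    """Wrap a strong reference (AT URI + CID) as an app.bsky.embed.record embed."""
    return {
        "$type": "app.bsky.embed.record",
        "record": {"uri": uri, "cid": cid},
    }


quote_embed = make_quote_embed(
    "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k44deefqdk2g",
    "bafyreiecx6dujwoeqpdzl27w67z4h46hyklk3an4i4cvvmioaqb2qbyo5u",
)
```

The result would then be assigned to post["embed"], the same field used by the other embed types.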

Image Embeds

Images are also embedded objects in a post. This example code demonstrates reading an image file from disk and uploading it, capturing a blob in the response:

IMAGE_PATH = "./example.png"
IMAGE_MIMETYPE = "image/png"
IMAGE_ALT_TEXT = "brief alt text description of the image"

with open(IMAGE_PATH, "rb") as f:
    img_bytes = f.read()

# this size limit is specified in the app.bsky.embed.images lexicon
if len(img_bytes) > 1000000:
    raise Exception(
        f"image file size too large. 1000000 bytes maximum, got: {len(img_bytes)}"
    )

# TODO: strip EXIF metadata here, if needed

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
    headers={
        "Content-Type": IMAGE_MIMETYPE,
        "Authorization": "Bearer " + session["accessJwt"],
    },
    data=img_bytes,
)
resp.raise_for_status()
blob = resp.json()["blob"]

The blob object, as JSON, would look something like:

{
"$type": "blob",
"ref": {
"$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
},
"mimeType": "image/png",
"size": 760898
}

The blob is then included in an app.bsky.embed.images array, along with an alt-text string. The alt field is required for each image. Pass an empty string if there is no alt text available.

post["embed"] = {
"$type": "app.bsky.embed.images",
"images": [{
"alt": IMAGE_ALT_TEXT,
"image": blob,
}],
}

A complete post record, containing two images, would look something like:

{
"$type": "app.bsky.feed.post",
"text": "example post with multiple images attached",
"createdAt": "2023-08-07T05:49:35.422015Z",
"embed": {
"$type": "app.bsky.embed.images",
"images": [
{
"alt": "brief alt text description of the first image",
"image": {
"$type": "blob",
"ref": {
"$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
},
"mimeType": "image/webp",
"size": 760898
}
},
{
"alt": "brief alt text description of the second image",
"image": {
"$type": "blob",
"ref": {
"$link": "bafkreif3fouono2i3fmm5moqypwskh3yjtp7snd5hfq5pr453oggygyrte"
},
"mimeType": "image/png",
"size": 13208
}
}
]
}
}

Each post can contain up to four images. Each image can have its own alt text and is limited to 1,000,000 bytes in size. Image files are referenced by posts but are not included in the post record itself (e.g., as base64-encoded bytes). Instead, the image files are first uploaded as "blobs" using com.atproto.repo.uploadBlob, which returns a blob metadata object that is then embedded in the post record.
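These limits can be checked on the client before the record is created. The following is a sketch only: build_images_embed, MAX_IMAGES, and MAX_IMAGE_BYTES are hypothetical names, and it assumes each blob is the dict returned by uploadBlob, including its size field.

```python
from typing import Dict, List, Tuple

MAX_IMAGES = 4               # per-post limit described above
MAX_IMAGE_BYTES = 1_000_000  # per-image size limit described above


def build_images_embed(images: List[Tuple[Dict, str]]) -> Dict:
    """Build an app.bsky.embed.images embed from (blob, alt_text) pairs."""
    if not 1 <= len(images) <= MAX_IMAGES:
        raise ValueError(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    for blob, _alt in images:
        if blob["size"] > MAX_IMAGE_BYTES:
            raise ValueError(f"image too large: {blob['size']} bytes")
    return {
        "$type": "app.bsky.embed.images",
        # alt is required for every image; pass "" when none is available
        "images": [{"alt": alt, "image": blob} for blob, alt in images],
    }
```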

Stripping image metadata before upload is a strongly recommended best practice. The server (PDS) may become stricter about blocking uploads that contain such metadata in the future, but for now it is the responsibility of clients (and apps) to sanitize files before upload.

Website Card Embeds

A website card embed, often called a "social card," is the rendered preview of a website link. A complete post record with an external embed, including image thumbnail blob, looks like:

{
"$type": "app.bsky.feed.post",
"text": "post which embeds an external URL as a card",
"createdAt": "2023-08-07T05:46:14.423045Z",
"embed": {
"$type": "app.bsky.embed.external",
"external": {
"uri": "https://bsky.app",
"title": "Bluesky Social",
"description": "See what's next.",
"thumb": {
"$type": "blob",
"ref": {
"$link": "bafkreiash5eihfku2jg4skhyh5kes7j5d5fd6xxloaytdywcvb3r3zrzhu"
},
"mimeType": "image/png",
"size": 23527
}
}
}
}

Here's an example of embedding a website card:

from bs4 import BeautifulSoup

def fetch_embed_url_card(access_token: str, url: str) -> Dict:

    # the required fields for every embed card
    card = {
        "uri": url,
        "title": "",
        "description": "",
    }

    # fetch the HTML
    resp = requests.get(url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # parse out the "og:title" and "og:description" HTML meta tags
    title_tag = soup.find("meta", property="og:title")
    if title_tag:
        card["title"] = title_tag["content"]
    description_tag = soup.find("meta", property="og:description")
    if description_tag:
        card["description"] = description_tag["content"]

    # if there is an "og:image" HTML meta tag, fetch and upload that image
    image_tag = soup.find("meta", property="og:image")
    if image_tag:
        img_url = image_tag["content"]
        # naively turn a "relative" URL (just a path) into a full URL, if needed
        if "://" not in img_url:
            img_url = url + img_url
        resp = requests.get(img_url)
        resp.raise_for_status()

        blob_resp = requests.post(
            "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
            headers={
                # NOTE: reuses IMAGE_MIMETYPE from the image example above,
                # so this assumes the fetched image has that type
                "Content-Type": IMAGE_MIMETYPE,
                "Authorization": "Bearer " + access_token,
            },
            data=resp.content,
        )
        blob_resp.raise_for_status()
        card["thumb"] = blob_resp.json()["blob"]

    return {
        "$type": "app.bsky.embed.external",
        "external": card,
    }

An external embed is stored under embed like all the others:

post["embed"] = fetch_embed_url_card(session["accessJwt"], "https://bsky.app")
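The card example resolves relative og:image paths by simple string concatenation, which only works when the page URL has no path of its own. One possible alternative, shown here as a sketch with a hypothetical helper name, is urljoin from the Python standard library, which resolves a reference against a base URL the way a browser would.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url: str, img_url: str) -> str:
    """Resolve an og:image value (absolute or relative) against the page URL."""
    return urljoin(page_url, img_url)


# Root-relative path on a page that has its own path
print(resolve_image_url("https://example.com/blog/post", "/img/card.png"))
# https://example.com/img/card.png

# Already-absolute URLs are returned unchanged
print(resolve_image_url("https://bsky.app", "https://cdn.example.com/card.png"))
# https://cdn.example.com/card.png
```

With plain concatenation, the first case would produce https://example.com/blog/post/img/card.png, which is likely not where the image lives.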

On Bluesky, each client fetches and embeds this card metadata, including blob upload if needed. Embedding the card content in the record ensures that it appears consistently to everyone and reduces waves of automated traffic being sent to the referenced website, but it does require some extra work by the client.

Putting It All Together

A complete script, with command-line argument parsing, is available from this Git repository.

As mentioned at the beginning, we expect most folks will use SDKs or libraries for their programming language of choice to help with most of the details described here. But sometimes it is helpful to see what is actually going on behind the abstractions.

· 6 min read

SkyFeed is a third-party client built by redsolver. Users can create a dashboard out of their feeds, profiles, and more. Additionally, while custom feeds currently take some developer familiarity to build from scratch, SkyFeed allows Bluesky users to easily build their own custom feeds based off of regexes or lists.

You can try SkyFeed yourself here, and follow SkyFeed’s Bluesky account for updates.

Screenshot of SkyFeed


Can you share a bit about yourself and your background?

Hi, I’m redsolver, a developer from Germany. In the past I tried building a decentralized social network twice, but both times it failed, most recently due to the decentralized storage layer (Skynet) just shutting down completely. So last year I started working on a new content-addressed storage network myself with all features needed for a truly reliable social network. I'm still actively working and building open-source apps like an end-to-end-encrypted cloud storage app on top of it, but instead of building yet another social network from scratch, I decided to focus on building cool stuff for atproto/Bluesky. The AT Protocol shares many ideas with my previous attempts (like decentralized identity) and is already a lot more mature.

What is SkyFeed?

There's the SkyFeed app, which is a third-party web client (cross-platform soon) for using Bluesky. Some users compare the experience to TweetDeck. A unique feature is that it subscribes to a custom minimal version of the Bluesky firehose (all events happening on the network). This makes it possible to have all like/reply/repost counts update in real-time and new posts pop up in near real-time everywhere in the app! Another cool feature is the collapsible thread view which makes following big discussions a lot easier.

But most users are using the app because of the integrated SkyFeed Builder, a tool to make building feeds easier for both developers and non-developers. It's really exciting watching a very diverse set of users build the over 6,000 feeds that are already published using the builder! The SkyFeed web app is available at https://skyfeed.app/.

Screenshot of SkyFeed Builder

What inspired you to build SkyFeed?

As mentioned earlier, I've been really interested in decentralized social networks for a while. After getting a Bluesky invite and reading the atproto docs, the tech really caught my interest.

There were already quite a few third-party clients, but none of them were written in Flutter (my favorite framework). So I started working on a new one, both for getting a better feel of the Bluesky internals and because I wanted a desktop client that I personally enjoy using daily. Even though the first release was missing quite a lot of important features (like notifications), the positive feedback motivated me to continue building.

When the Bluesky team published the custom feed spec and the feed generator starter kit, things really took off. I made some feeds and added experimental support for using them to the SkyFeed app. They are an awesome concept and in my opinion really give Bluesky the edge over competing networks. It makes content discovery so much easier, because no algorithm or AI has more relevant suggestions than highly engaged users building elaborate feeds for any and all niche interests they have. So the reason I made the SkyFeed Builder was to give this power to as many people as possible. And what inspires me to continue building and improving SkyFeed is all the positive feedback and happy users :)

What tech stack is SkyFeed built on?

The SkyFeed app is built using the Flutter framework and the Dart programming language. I'm using the excellent Dart atproto/Bluesky packages, created by Shinya Kato. Most of the backend is written in Dart and running on some Hetzner servers; the feed generator proxy and cache were recently moved to fly.io for better scalability. I'm running multiple open-source indexers which listen to the entire network firehose and store everything in an instance of SurrealDB. SurrealDB is still in beta, but it's fun to work with! And apart from some performance issues, it has been pretty reliable. The query engine for the SkyFeed Builder feeds is written in Rust and is open-source too. It keeps all posts from the last 7 days and their metadata in memory and then executes all of the SkyFeed Builder steps/blocks. Additional metadata (like the full post history for a single user) is fetched on demand from SurrealDB.

What's in the future for SkyFeed?

  • New "Remix" feature to edit, improve and re-publish any SkyFeed Builder feed (as long as it has an open license)
  • Make it easier to self-host the SkyFeed Builder infrastructure and get some third-party providers online. This will give users more choice and make the whole feed ecosystem more reliable and robust
  • Add support for personalized feeds and SurrealQL queries to the builder, but they are very resource-intensive so will likely be invite-only (but self-hosting always works of course!)
  • Improve the SkyFeed app, get a nice new logo, fully open-source it and release cross-platform (Android, iOS, Linux, Windows, macOS)
  • Support for videos, audio and other media content with a new custom lexicon in a backwards-compatible way. They will use the storage network I'm working on, but with an atproto-compatible blob format. The main difference is that it uses the BLAKE3 hash function instead of SHA256 and has no file size limit
  • A self-hosted proxy which bridges other social networks (Mastodon, Nostr, RSS, Hacker News) and makes them available in any Bluesky client. Reddit and "X" might be supported too, but with a bring-your-own-API-key requirement. The proxy also adds more features like advanced (word) muting, an audit log to see exactly which changes third-party apps made to your account and the option to use a self-hosted "App View" (basically the SkyFeed Indexer with SurrealDB)
  • A new List Builder (based on profile, name, follower count and more) as soon as lists other than Mute Lists are supported

In summary: Make SkyFeed (apps, builder and more) the ultimate power user experience, while open-sourcing everything and keeping the option to self-host all components.


You can follow redsolver on Bluesky here, SkyFeed for project updates here, and be sure to try out SkyFeed yourself here.

Note: Use an App Password when logging in to third-party tools for account security and read our disclaimer for third-party applications.

· One min read

Bluesky is an open social network built on the AT Protocol, a flexible technology that will never lock developers out of the ecosystems that they help build. With atproto, third-party integration can be as seamless as first-party through custom feeds, federated services, clients, and more.

If you're a developer interested in building on atproto, we'd love to email you an invite code. Simply share your GitHub (or similar) profile with us via this form.

Read more about the AT Protocol here and check out some third-party developer projects here.

· 7 min read

Welcome to the atproto federation developer sandbox! ✨

This is a completely separate network from our production services that allows us to test out the federation architecture and wire protocol.

The federation sandbox environment is an area set up for exploration and testing of the technical components of the AT Protocol distributed social network. It is intended for developers and self-hosters to test out data availability in a federated environment.

To maintain a positive and productive developer experience, we've established this Code of Conduct that outlines our expectations and guidelines. This sandbox environment is initially meant to test the technical components of federation.

Given that this is a testing environment, we will be defederating from any instances that do not abide by these guidelines, or that cause unnecessary trouble, and will not be providing specific justifications for these decisions.

Using the sandbox environment means you agree to adhere to our Guidelines. Please read the following carefully:

Post responsibly. The sandbox environment is intended to test infrastructure, but user content may be created as part of this testing process. Content generation can be automated or manual. Do not post content that requires active moderation or violates the Bluesky Community Guidelines.

Keep the emphasis on testing. We’re striving to maintain a sandbox environment that fosters learning and technical growth. We will defederate with instances that recruit users without making it clear that this is a test environment.

Do limit account creation. We don't want any one server using a majority of the resources in the sandbox. To keep things balanced, to start, we’re only federating with Personal Data Servers (PDS) with up to 1000 accounts. However, we may change this if needed.

Don’t expect persistence or uptime. We will routinely be wiping the data on our infrastructure. This is intended to reset the network state and to test sync protocols. Accounts and content should not be mirrored or migrated between the sandbox and real-world environments.

Don't advertise your service as being "Bluesky." This is a developer sandbox and is meant for technical users. Do not promote your service as being a way for non-technical users to use Bluesky.

Do not mirror sandbox did:plcs to production.

Status and Wipes

🐉 Beware of dragons!

This hasn’t been production tested yet. It seems to work pretty well, but who knows what’s lurking under the surface — that's what this sandbox is for! Have patience with us as we prep for federation.

On that note, please give us feedback either in Issues (actual bugs) or Discussions (higher-level questions/discussions) on the atproto repo.

🗓 Routine wipes

As part of the sandbox, we will be doing routine wipes of all network data.

We expect to perform wipes on a weekly or bi-weekly basis, though we reserve the right to do a wipe at any point.

When we wipe data, we will be wiping it on all services (BGS, App View, PLC). We will also mark any existing DIDs as “invalid” & will refuse to index those accounts in the next epoch of the network to discourage users from attempting to “rollover” their accounts across wipes.

Getting started ✨

Now that you've read the sandbox guidelines, you're ready to self-host a PDS in the developer sandbox. For complete instructions on getting your PDS set up, check out the README.

To access your account, you’ll log in with the client of your choice in the exact same way that you log into production Bluesky, for instance the Bluesky web client. When you do so, please provide the URL of your PDS as the service that you wish to log in to.

Auto-updates

We’ve included Watchtower in the PDS distribution. Every day at midnight PST, this will check our GitHub container registry to see if there is a new version of the PDS container & update it on your service.

This will allow us to rapidly iterate on protocol changes, as we’ll be able to push them out to the network on a daily basis.

When we do routine network wipes, we will be pushing out a database migration to participating PDS that wipes content and accounts.

You are within your rights to disable Watchtower auto-updates, but we strongly encourage their use and will not be providing support if you decide not to run the most up-to-date PDS distribution.

Odds & Ends & Warnings & Reminders

🧪 Experiment & have fun!

🤖 Run feed generators. They should work the exact same way as production - be sure to adjust your env to listen to Sandbox BGS!

🌈 Feel free to run your own AppView or BGS - although it’s a bit more involved & we’ll be providing limited support for this.

👤 Your PDS will provide your handle by default. Custom domain handles should work exactly the same in sandbox as they do on production Bluesky. However, you will not be able to reuse your handle from production Bluesky, as you can only have one DID set per handle.

🚨 If you follow the self-hosted PDS setup instructions, you’ll have private key material in your env file - be careful about sharing that!

📣 This is a sandbox version of a public broadcast protocol - please do not share sensitive information.

🤝 Help each other out! Respond to issues & discussions, chat in the community-run Matrix or Discord, etc.

Learn more about atproto federation

Check out the high-level view of federation.

Dive deeper with the atproto docs.

Network Services

We are running three services: PLC, BGS, and the Bluesky "App View."

PLC

Hostname: plc.bsky-sandbox.dev

Code: https://github.com/did-method-plc/did-method-plc

PLC is the default DID provider for the network. DIDs are the root of your identity in the network. Sandbox PLC functions exactly the same as production PLC, but it is run as a separate service with a separate dataset. The DID resolution client in the self-hosted PDS package is set up to talk to the correct PLC service.

BGS

Hostname: bgs.bsky-sandbox.dev

Code: https://github.com/bluesky-social/indigo/tree/main/bgs

BGS (Big Graph Service) is the firehose for the entire network. It collates data from PDSs & rebroadcasts it on one giant websocket.

BGS has to find out about your server somehow, so when we do any sort of write, we ping BGS with com.atproto.sync.requestCrawl to notify it of new data. This is done automatically in the self-hosted PDS package.

If you’re familiar with the Bluesky production firehose, you can subscribe to the BGS firehose in the exact same manner. The interface & data should be identical.
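As a rough illustration of the two BGS interactions described above, here is a minimal sketch that only builds the requests and sends nothing over the network. The hostname comes from this post. The request shape for com.atproto.sync.requestCrawl (a POST with a JSON `hostname` field) and the use of com.atproto.sync.subscribeRepos with an optional `cursor` parameter are assumptions about the lexicons, and `pds.example.com` is a placeholder.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

BGS_HOST = "bgs.bsky-sandbox.dev"  # sandbox BGS hostname from this post


def request_crawl_request(pds_hostname: str) -> Tuple[str, bytes]:
    """Build (url, json_body) for a requestCrawl call; nothing is sent here."""
    # Assumption: the method takes a JSON body with a single "hostname" field.
    url = f"https://{BGS_HOST}/xrpc/com.atproto.sync.requestCrawl"
    body = json.dumps({"hostname": pds_hostname}).encode("utf-8")
    return url, body


def firehose_url(cursor: Optional[int] = None) -> str:
    """Build the firehose websocket URL, optionally resuming from a cursor."""
    base = f"wss://{BGS_HOST}/xrpc/com.atproto.sync.subscribeRepos"
    if cursor is None:
        return base
    return f"{base}?{urlencode({'cursor': cursor})}"


url, body = request_crawl_request("pds.example.com")  # placeholder PDS hostname
print(url)
print(body.decode())
print(firehose_url(cursor=1234))
```

The self-hosted PDS package already sends the crawl request for you, so this is only useful for seeing what goes over the wire. To actually consume the firehose, pass the URL to the websocket client of your choice.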

Bluesky App View

Hostname: api.bsky-sandbox.dev

Code: https://github.com/bluesky-social/atproto/tree/main/packages/bsky

The Bluesky App View aggregates data from across the network to service the Bluesky microblogging application. It consumes the firehose from the BGS, processing it into serviceable views of the network such as feeds, post threads, and user profiles. It functions as a fairly traditional web service.

When you request a Bluesky-related view from your PDS (getProfile for instance), your PDS will actually proxy the request up to App View.
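One way to picture this proxying is as a routing decision keyed on the method name. The sketch below is not the PDS implementation: the rule "anything under `app.bsky.` goes upstream" is a simplifying assumption, the function name `upstream_for` is hypothetical, and a real PDS may use a finer-grained rule.

```python
APP_VIEW_HOST = "api.bsky-sandbox.dev"  # sandbox App View hostname from this post


def upstream_for(method_nsid: str, pds_host: str) -> str:
    """Return the host that would answer an XRPC method under the assumed rule."""
    # Assumption: Bluesky application methods (app.bsky.*) are proxied to the
    # App View; everything else (e.g. com.atproto.*) is answered by the PDS.
    if method_nsid.startswith("app.bsky."):
        return APP_VIEW_HOST
    return pds_host


print(upstream_for("app.bsky.actor.getProfile", "pds.example.com"))
print(upstream_for("com.atproto.repo.createRecord", "pds.example.com"))
```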

Feel free to experiment with running your own App View if you like!

The PDS

The PDS (Personal Data Server) is where users host their social data such as posts, profiles, likes, and follows. The goal of the sandbox is to federate many PDS together, so we hope you’ll run your own.

We’re not actually running a Bluesky PDS in sandbox. You might see Bluesky team members' accounts in the sandbox environment, but those are self-hosted too.

The PDS that you’ll be running is much of the same code that is running on the Bluesky production PDS. Notably, all of the in-pds-appview code has been torn out. You can see the actual PDS code that you’re running on the atproto/simplify-pds branch.

Feedback

We're excited for you to join us in the developer sandbox soon! Please give us feedback either in Issues (actual bugs) or Discussions (higher-level questions/discussions) on the atproto repo.

· 11 min read

The technical implementation of public blocks and some possibilities for more privacy preserving block implementations — an area of active research and experimentation.


In April, we shipped a block feature to all users. Unlike on centralized platforms, blocks on Bluesky are public and enumerable data, because all servers across the network need to know that they exist in order to respect the user’s request.

The current system of public blocks is just one aspect of our composable moderation stack, which we are actively building during our beta period. We’re working on more sophisticated individual and community-level interaction controls and moderation tooling, and we also encourage third-party community developers to contribute to this ecosystem.

In this post, we’ll share the technical implementation of public blocks and discuss some possibilities for more privacy preserving block implementations — an area of active research and experimentation. We welcome community suggestions, so if you have a proposal to share with us on how to implement private blocks after you read this post, please contribute to our public discussion here.

What are blocks?

At an abstract level, across many social media platforms, blocks between two accounts usually have the following features:

  • Symmetric: the behavior is the same regardless of which account initiated a block first
  • Mutual mute: neither account can read any content (public or private) from the other account, while logged in
  • Mutual interaction block: direct interactions between the two accounts are not allowed. This includes direct mentions resulting in a notification, replies to posts, direct messages (DMs), and follows (which normally result in notifications).

Blocks add a significant and high-impact degree of friction. There are many cases where this friction alone is sufficient to de-escalate conflict.

However, it is important to note that blocking does not prevent all possible interaction (even on centralized social networks). For example, when content is public, as it is on Bluesky, blogs, or websites, blocked people can still easily access the content by simply logging out or opening an incognito browser tab. Posts can still be screenshotted and shared either on-network or off-network. Harassment can continue to occur even without direct mentions or replies (“subtweeting,” posting screenshots, etc.).

On most existing services, the blockee can detect that they’ve been blocked, though it may not be immediately obvious. For example, if they’re able to navigate to the blocker’s profile, they may see a screen that says they’ve been blocked, or the absence of the profile is indication enough that they have been blocked. Most social apps provide each user with a list of the accounts that they have blocked.

You can read more about blocking behaviors on other platforms:

How are blocks currently implemented in Bluesky?

Blocks prevent interaction. Blocked accounts will not be able to like, reply, mention, or follow you, and if they navigate directly to your profile, they will see that they have been blocked. Like other public social networks, if they log out of their account or use a different account, they will be able to view your content. (This much is standard across centralized social networks as well.)

Currently, on Bluesky, you can view a list of your blocked accounts, and while the list of people who have blocked you is not surfaced in the app, developers familiar with the API could crawl the network to parse this information. This section will dive into the technical constraints that cause blocks to be public, and in a later section, we’ll discuss possible alternative implementations.

Blocks in Bluesky are implemented as part of the app.bsky.* application protocol, which builds on top of the underlying AT Protocol (atproto). Blocks are a record stored in account repositories. They look and behave very similarly to “follows”: the app.bsky.graph.block and app.bsky.graph.follow record schemas are nearly identical.
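To make the similarity concrete, here is a small sketch that builds both record bodies with one helper. The field names (`subject`, `createdAt`) reflect our reading of the two lexicons and should be treated as an assumption. The DID is a placeholder and `graph_record` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def graph_record(collection: str, subject_did: str, created_at: datetime) -> dict:
    """Build a follow or block record body: same fields, different $type."""
    return {
        "$type": collection,
        "subject": subject_did,  # DID of the account being followed or blocked
        "createdAt": created_at.isoformat().replace("+00:00", "Z"),
    }


now = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
block = graph_record("app.bsky.graph.block", "did:plc:example123", now)
follow = graph_record("app.bsky.graph.follow", "did:plc:example123", now)

# Collect the keys whose values differ between the two records.
differing = {key for key in block if block[key] != follow[key]}
print(block)
print(differing)
```

Because both records live in the account's public repository, anything that can read a follow can read a block in the same way.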

The block behavior is then implemented by several pieces of software. Servers and clients will index the block records and prevent actions which would have violated the intended behaviors: posts will not appear in feeds and reply threads; profile fetches will be empty or annotated with block state; creation of reply posts, quote posts, embeds, and mentions is blocked; any notifications involving the other account are additionally suppressed.
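A minimal sketch of that indexing step, under the assumption that a service keeps directed (blocker, blocked) pairs in memory and applies them symmetrically when building a view. `Post`, `BlockIndex`, and `visible_posts` are hypothetical names for illustration, not part of any atproto package.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    author: str  # DID of the author
    text: str


@dataclass
class BlockIndex:
    # Directed (blocker, blocked) pairs, as read from block records.
    pairs: set = field(default_factory=set)

    def add(self, blocker: str, blocked: str) -> None:
        self.pairs.add((blocker, blocked))

    def remove(self, blocker: str, blocked: str) -> None:
        self.pairs.discard((blocker, blocked))

    def blocked_between(self, a: str, b: str) -> bool:
        """Symmetric check: true if either account has blocked the other."""
        return (a, b) in self.pairs or (b, a) in self.pairs


def visible_posts(viewer: str, posts: list, index: BlockIndex) -> list:
    """Drop posts whose author has a block relationship with the viewer."""
    return [p for p in posts if not index.blocked_between(viewer, p.author)]


index = BlockIndex()
index.add("did:plc:alice", "did:plc:bob")  # alice blocks bob
feed = [Post("did:plc:alice", "hello"), Post("did:plc:carol", "hi")]

bob_view = visible_posts("did:plc:bob", feed, index)
index.remove("did:plc:alice", "did:plc:bob")  # alice unblocks bob
bob_view_after_unblock = visible_posts("did:plc:bob", feed, index)
print([p.author for p in bob_view])
print([p.author for p in bob_view_after_unblock])
```

Note that the posts themselves are never modified. The filter is applied at read time, which is why removing the block makes the hidden content visible again.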

One of the core principles of the AT Protocol, which Bluesky is built on, is that account holders have total control over their own data. This means that while protocol-compliant clients and servers prevent blocked accounts from creating replies or other disallowed records in each user’s data repository, it is technically possible to bypass those restrictions if a client refuses to be protocol-compliant. The act of being blocked also does not result in any change to the blockee’s repository, and any old replies or mentions remain in place, untouched. For example, in the user-facing app, if someone replies to your post and then you block them, their replies will now be hidden to you. If you later decide to unblock them, their replies to that post will appear again, because the replies themselves were not deleted.

Although blocks do not remove content from other users’ repositories, that content is not shown because blocks are primarily enforced by other nodes and services: personal data servers (PDS), App Views, and clients. One side effect that comes out of this architecture is that follow relationships are not changed due to a block, and “soft blocks” (rapid block/unblock) do not work as a mechanism to remove a follower. While a follow relationship might still exist in the graph, the block prevents any actual viewing or delivery of content. As future work, we can also ensure that details such as “like” counts and “follower” accounts are updated when block status changes.

How will blocks work with open federation?

Bluesky is a public social network built on a protocol to support public conversation, so similar to blogs and websites, you do not need a Bluesky account in order to see content posted to the app. In order to support open federation where many servers, clients, and App Views are collaborating to surface content to users, each account’s data repository — which contains information like follows and blocks — must be public. All of the servers across the network must be able to read the data. Servers must know which accounts you have blocked in order to be able to enforce that relationship.

Once we launch federation, there will be many personal data servers (PDS), clients, and App Views. The expectation is that virtually all accounts will be using clients and servers that respect blocking behavior.

It is this need for multiple parties to coordinate that necessitates blocks being public. “Mute” behavior can be implemented entirely in a client app because it only impacts the view of the local account holder. Blocks require coordination and enforcement by other parties, because the views and actions of multiple (possibly antagonistic) parties are involved.

In theory, a bad actor could create their own rogue client or interface which ignores some of the blocking behaviors, since the content is posted to a public network. But showing content or notifications to the person who created the block won’t be possible, as that behavior is controlled by their own PDS and client. It’s technically possible for a rogue client to create replies and mentions, but they would be invisible or at least low-impact to the recipient account for the same reasons. Protocol-compliant software in the ecosystem will keep such content invisible to other accounts on the network. If a significant fraction of accounts elected to use noncompliant rogue infrastructure, we would consider that a failure of the entire ecosystem.

Remember that clever bypasses of the blocking behaviors are already possible on most networks (centralized or not), and it is the added friction that matters.

Are there other ways to implement blocks in federated systems?

Yes, and we are actively exploring other implementations and novel research areas to inform our development on the AT Protocol. We also welcome community suggestions and discussions on this topic.

One example is ActivityPub, which is the protocol that Mastodon is built on. ActivityPub does not require public blocks because content there is not globally public by default — this is also why picking which server you join matters, because it limits the content that you see. Despite this, Mastodon does sometimes show block information to other parties, which is a frequent topic of discussion in the ActivityPub ecosystem.

As we currently understand it, on Mastodon, you only see content when there is an explicit follow relationship between accounts and servers, and follows require mutual consent. (In practice, most follow requests are auto-accepted, so this behavior is not always obvious to end users.) The mutual-mute behavior that blocks require can be implemented on Mastodon by first disallowing any follows between the two accounts, and then adding a regular “mute.” Similar to Bluesky, the interaction-block behavior relies on enforcement by both the server and the client. So on Mastodon too, it’s possible that a bad actor implements a server that ignores blocks and displays blocked replies in threads. Both ActivityPub and AT Protocol can use de-federation as an enforcement mechanism to disconnect from servers that don’t respect blocks.

Technical approaches we’ve considered for private blocks

One proposed mechanism to make blocks less public on Bluesky is the use of bloom filters. The basic idea is to encode block relationships in a statistical data structure, and to distribute that data structure instead of the set of actual blocks. The data structure would make it easy to check if there was a block relationship between two specific accounts, but not make it easy to list all of the blocks. Other servers and clients in the network would then use the data structure to enforce the blocking behaviors. The bloom filters could either be per-account (i.e., a bloom filter stored in a record), or per-PDS, or effectively global, with individual PDS instances submitting block relationships to a trusted central service which would publish the bloom filter lists. We considered a scheme like this before implementing blocks, but there are a few issues and concerns:

  • Bloom filters don’t fully prevent enumerating blocks, and if a bad actor was only interested in specific accounts, they could still easily find the list of blocked accounts. Bloom filters really only add a mask, and it would still be relatively easy to enumerate blocks. The full matrix of possible block relationships is NxN (where N is the number of accounts in the network, which could ultimately reach hundreds of millions), and that might be too large to test against. In reality, though, a bad actor would likely only be targeting prominent accounts or specific communities. In that case, only on the order of billions of possible relationships would need to be tested, which would be trivial on modern hardware.
  • Bloom filters are computationally expensive. While bloom filters are known for efficiently reducing the storage size for looking up a large number of hashes, they have a large overhead compared to individual hashes. In the context of blocks, every creation or deletion of a block record would potentially require the generation and distribution of a full-sized bloom filter. The storage and bandwidth overhead becomes significant at scale, especially since a significant fraction of social media accounts could have many thousands of blocks.
  • Latency problems persist in mitigations for bloom filter overhead. The above storage and bandwidth concerns could be mitigated by “batching,” or through a trusted central service. But those solutions have their own problems with latency (time until block is enforced across the network) and trust and reliability (in a central service, which would have the full enumeration of block relationships).
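The enumeration concern in the first point above can be shown with a toy per-account filter. This is a sketch under stated assumptions, not a design we have built: the `BloomFilter` class, the `pair_key` encoding, the filter size, and the account counts used in the arithmetic are all illustrative choices.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def pair_key(blocker: str, blocked: str) -> str:
    return f"{blocker}|{blocked}"


# What the network would distribute instead of the raw block records.
published = BloomFilter(size_bits=1 << 16, num_hashes=4)
actual_blocks = {("did:plc:alice", "did:plc:bob"), ("did:plc:alice", "did:plc:carol")}
for blocker, blocked in actual_blocks:
    published.add(pair_key(blocker, blocked))

# An attacker only needs a candidate list for the accounts they care about.
candidates = [f"did:plc:{name}" for name in ("bob", "carol", "dave", "erin")]
recovered = {
    ("did:plc:alice", c) for c in candidates if pair_key("did:plc:alice", c) in published
}
print(recovered)

# Illustrative scale: every pair in a 100M-account network vs. a targeted probe
# of 1,000 prominent accounts against 1,000,000 candidate accounts.
full_matrix = 100_000_000 ** 2
targeted = 1_000 * 1_000_000
print(full_matrix, targeted)
```

A bloom filter never produces false negatives, so every real block in the candidate set is recovered. The attacker may also see a few false positives, which adds noise but does not hide the real relationships.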

The team is still actively discussing this option, and it’s possible that the extra effort and resources required by bloom filters are worth the imperfect but additional friction that they provide. At the moment, it’s not entirely obvious to us that the tradeoff is worth it. While we’re currently iterating on other moderation and account safety features, we decided to initially release blocks with this simple public system as a first pass.

Some other proposals we’re exploring include:

  • Label-based block enforcement. Instead of trying to prevent all violations of blocking relationships across the network, scan for violations of them and label them.
  • Interaction gating. Place authority for post threads and quote posts in the original poster’s PDS, so block information doesn’t need to leave that server.
  • Zero-knowledge proofs. We’re aware of existing ZK approaches to distributed blocks, such as SNARKBlock, and we’re speaking with trusted advisors about this open area of research and experimentation. Perhaps this research might lead to us deploying a novel system in the future.
  • Trusted App Views. Accounts could privately register their blocks with their PDS, and then these servers would forward block metadata to a small number of “blessed” App Views.
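To give a feel for the first proposal in the list above, label-based block enforcement, here is a minimal sketch of a scanner that flags replies crossing a block relationship instead of preventing them. The input shapes, the function name `label_block_violations`, and the label value `block-violation` are all hypothetical and chosen for illustration only.

```python
def label_block_violations(replies, blocks):
    """Return labels for replies that cross a block relationship.

    replies: iterable of (reply_uri, author_did, parent_author_did)
    blocks:  set of directed (blocker_did, blocked_did) pairs
    """
    labels = []
    for uri, author, parent_author in replies:
        if (author, parent_author) in blocks or (parent_author, author) in blocks:
            # "block-violation" is a hypothetical label value for illustration.
            labels.append({"uri": uri, "val": "block-violation"})
    return labels


blocks = {("did:plc:alice", "did:plc:bob")}  # alice has blocked bob
replies = [
    ("at://did:plc:bob/app.bsky.feed.post/aaa", "did:plc:bob", "did:plc:alice"),
    ("at://did:plc:carol/app.bsky.feed.post/bbb", "did:plc:carol", "did:plc:alice"),
]
labels = label_block_violations(replies, blocks)
print(labels)
```

Downstream services could then hide or de-emphasize labeled content. The scanner itself still needs access to the block relationships, so this approach changes who must see them rather than removing that requirement.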

If you have experience here or have thoughts about how to implement private block relationships in decentralized systems, we’d love to hear from you. Please contribute to our discussion here.