2024-08-12T00:00:00.000Z

TypeScript API Package Auth Refactor

August 12, 2024 · 11 min read

Today we are merging some changes to how the TypeScript @atproto/api package works with authentication sessions. The changes are mostly backwards compatible, but some parts are now deprecated, and there are some breaking changes for advanced uses.

The motivation for these changes is the need to make the @atproto/api package compatible with OAuth session management. We don't have OAuth client support "launched" and documented quite yet, so you can keep using the current app password authentication system. When we do "launch" OAuth support and begin encouraging its usage in the near future (see the OAuth Roadmap), these changes will make it easier to migrate.

In addition, the redesigned session management system fixes a bug that could cause the session data to become invalid when Agent clones are created (e.g. using agent.withProxy()).

New Features

We've restructured the XrpcClient HTTP fetch handler to be specified during the instantiation of the XRPC client, through the constructor, instead of using a default implementation (which was statically defined).

With this refactor, the XRPC client is now more modular and reusable. Session management, retries, cryptographic signing, and other request-specific logic can be implemented in the fetch handler itself rather than by the calling code.
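As a rough illustration of moving request-specific logic into the fetch handler, the sketch below wraps any handler in a simple retry loop. The FetchHandler type is a simplified local copy of the one defined later in this post, so the snippet has no package dependency. The withRetries helper, its maxAttempts parameter, and the choice to retry on 5xx responses and thrown errors are assumptions made for this example, not part of @atproto/xrpc.

```typescript
// Simplified local copy of the handler shape (no package dependency).
type FetchHandler = (url: string, init: RequestInit) => Promise<Response>

// Hypothetical helper: retries when the server answers with a 5xx status
// or when the wrapped handler throws (for example on a network error).
function withRetries(inner: FetchHandler, maxAttempts = 3): FetchHandler {
  return async (url, init) => {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const response = await inner(url, init)
        // Responses below 500 are returned as-is, as is the final attempt.
        if (response.status < 500 || attempt === maxAttempts) return response
      } catch (err) {
        // Give up and rethrow once the last attempt has failed.
        if (attempt === maxAttempts) throw err
      }
    }
    // Only reachable when maxAttempts is smaller than 1.
    throw new Error('maxAttempts must be at least 1')
  }
}
```

A handler built this way could be passed wherever a FetchHandler is accepted. Note that this naive version re-sends the same init object, so it is only suitable for request bodies that can be read more than once.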

A new abstract class named Agent has been added to @atproto/api. This class will be the base class for all Bluesky agent classes in the @atproto ecosystem. It is meant to be extended by implementations that provide session management and fetch handling. Here is the class hierarchy:

[Figure: AT Protocol API class hierarchy]

As you adapt your code to these changes, make sure to use the Agent type wherever you expect to receive an agent, and use the AtpAgent type (class) only to instantiate your client. The reason for this is to be forward compatible with the OAuth agent implementation that will also extend Agent, and not AtpAgent.

import { Agent, AtpAgent } from '@atproto/api'

async function setupAgent(service: string, username: string, password: string): Promise<Agent> {
  const agent = new AtpAgent({
    service,
    persistSession: (evt, session) => {
      // handle session update
    },
  })

  await agent.login(username, password)

  return agent
}

import { Agent } from '@atproto/api'

async function doStuffWithAgent(agent: Agent, arg: string) {
  return agent.resolveHandle(arg)
}

import { Agent, AtpAgent } from '@atproto/api'

class MyClass {
  agent: Agent

  constructor () {
    // Example service URL; point this at your own PDS
    this.agent = new AtpAgent({ service: 'https://bsky.social' })
  }
}

Breaking changes

Most of the changes introduced in this version are backward-compatible. However, there are a couple of breaking changes you should be aware of:

Customizing fetch: The ability to customize the fetch: FetchHandler property of @atproto/xrpc's Client and @atproto/api's AtpAgent classes has been removed. Previously, the fetch property could be set to a function that would be used as the fetch handler for that instance, and was initialized to a default fetch handler. That property is still accessible in a read-only fashion through the fetchHandler property and can only be set during the instance creation. Attempting to set/get the fetch property will now result in an error.
The fetch() method, as well as WhatWG compliant Request and Headers constructors, must be globally available in your environment. Use a polyfill if necessary.
The AtpBaseClient has been removed. The AtpServiceClient has been renamed AtpBaseClient. Any code using either of these classes will need to be updated.
Instead of wrapping an XrpcClient in its xrpc property, the AtpBaseClient (formerly AtpServiceClient) class - created through lex-cli - now extends the XrpcClient class. This means that a client instance now passes the instanceof XrpcClient check. The xrpc property now returns the instance itself and has been deprecated.
setSessionPersistHandler is no longer available on the AtpAgent or BskyAgent classes. The session handler can only be set through the persistSession option of the AtpAgent constructor.
The new class hierarchy is as follows:
- BskyAgent extends AtpAgent: adds no functionality (hence its deprecation).
- AtpAgent extends Agent: adds password-based session management.
- Agent extends AtpBaseClient: an abstract class that adds syntactic sugar methods for the app.bsky lexicons. It also adds abstract session management methods and atproto-specific utilities (labelers and proxy headers, cloning capability).
- AtpBaseClient extends XrpcClient: automatically generated code that adds fully typed, lexicon-defined namespaces (instance.app.bsky.feed.getPosts()) to the XrpcClient.
- XrpcClient is the base class.
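The requirement that fetch(), Request, and Headers be globally available can be checked once at startup. The following is a minimal sketch of such a check; missingFetchGlobals is a hypothetical helper written for this example and is not provided by @atproto/api.

```typescript
// Hypothetical startup check: returns the names of the required globals
// that are missing from the given scope (globalThis by default).
function missingFetchGlobals(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): string[] {
  return ['fetch', 'Request', 'Headers'].filter(
    (name) => typeof scope[name] !== 'function',
  )
}

const missing = missingFetchGlobals()
if (missing.length > 0) {
  // Load a polyfill of your choice here, before creating any agent.
  console.warn(`Missing globals: ${missing.join(', ')}`)
}
```

Passing the scope as a parameter keeps the check easy to exercise in tests without touching the real global object.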

Non-breaking changes

The com.* and app.* namespaces have been made directly available on every Agent instance.

Deprecations

The default export of the @atproto/xrpc package has been deprecated. Use named exports instead.
The Client and ServiceClient classes are now deprecated. They are replaced by a single XrpcClient class.
The default export of the @atproto/api package has been deprecated. Use named exports instead.
The BskyAgent has been deprecated. Use the AtpAgent class instead.
The xrpc property of the AtpClient instances has been deprecated. The instance itself should be used as the XRPC client.
The api property of the AtpAgent and BskyAgent instances has been deprecated. Use the instance itself instead.
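To make the xrpc deprecation more concrete, here is a toy model of the pattern described above, in which the deprecated property simply returns the instance itself. SketchXrpcClient and SketchBaseClient are stand-ins invented for this example; they are not the real classes and only mimic the shape of the change.

```typescript
// Toy stand-ins for illustration only; these are not the real classes.
class SketchXrpcClient {
  call(nsid: string): string {
    return `called ${nsid}`
  }
}

class SketchBaseClient extends SketchXrpcClient {
  /** @deprecated Use the instance itself instead. */
  get xrpc(): this {
    return this
  }
}

const client = new SketchBaseClient()

// Old style: goes through the deprecated alias.
const viaAlias = client.xrpc.call('io.example.doStuff')

// New style: the client is itself the XRPC client.
const direct = client.call('io.example.doStuff')
```

Because the alias returns the same object, code written in the old style should keep working while it is migrated to call methods on the instance directly.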

Migration

The `@atproto/api` package

If you were relying on the AtpBaseClient solely to perform validation, use this:

Before (first snippet) and after (second snippet):

import { AtpBaseClient, ComAtprotoSyncSubscribeRepos } from '@atproto/api'

const baseClient = new AtpBaseClient()

baseClient.xrpc.lex.assertValidXrpcMessage('io.example.doStuff', {
  // ...
})

import { lexicons } from '@atproto/api'

lexicons.assertValidXrpcMessage('io.example.doStuff', {
  // ...
})

If you are extending the BskyAgent to perform custom session manipulation, define your own Agent subclass instead:

Before (first snippet) and after (second snippet):

import { BskyAgent } from '@atproto/api'

class MyAgent extends BskyAgent {
  private accessToken?: string

  async createOrRefreshSession(identifier: string, password: string) {
    // custom logic here

    this.accessToken = 'my-access-jwt'
  }

  async doStuff() {
    return this.call('io.example.doStuff', {
      headers: {
        'Authorization': this.accessToken && `Bearer ${this.accessToken}`
      }
    })
  }
}

import { Agent } from '@atproto/api'

class MyAgent extends Agent {
  private accessToken?: string
  public did?: string

  constructor(private readonly service: string | URL) {
    super({
      service,
      headers: {
        Authorization: () =>
          this.accessToken ? `Bearer ${this.accessToken}` : null,
      }
    })
  }

  clone(): MyAgent {
    const agent = new MyAgent(this.service)
    agent.accessToken = this.accessToken
    agent.did = this.did
    return this.copyInto(agent)
  }

  async createOrRefreshSession(identifier: string, password: string) {
    // custom logic here

    this.did = 'did:example:123'
    this.accessToken = 'my-access-jwt'
  }
}

If you are monkey patching the xrpc service client to perform client-side rate limiting, you can now do this in the FetchHandler function:

Before (first snippet) and after (second snippet):

import { BskyAgent } from '@atproto/api'
import { RateLimitThreshold } from "rate-limit-threshold"

const agent = new BskyAgent()
const limiter = new RateLimitThreshold(
  3000,
  300_000
)

const origCall = agent.api.xrpc.call
agent.api.xrpc.call = async function (...args) {
  await limiter.wait()
  return origCall.call(this, ...args)
}

import { AtpAgent, AtpAgentOptions } from '@atproto/api'
import { RateLimitThreshold } from "rate-limit-threshold"

class LimitedAtpAgent extends AtpAgent {
  constructor(options: AtpAgentOptions) {
    const fetch = options.fetch ?? globalThis.fetch
    const limiter = new RateLimitThreshold(
      3000,
      300_000
    )

    super({
      ...options,
      fetch: async (...args) => {
        await limiter.wait()
        return fetch(...args)
      }
    })
  }
}

If you configure a static fetch handler on the BskyAgent class - for example to modify the headers of every request - you can now do this by providing your own fetch function:

Before (first snippet) and after (second snippet):

import { BskyAgent, defaultFetchHandler } from '@atproto/api'

BskyAgent.configure({
  fetch: async (httpUri, httpMethod, httpHeaders, httpReqBody) => {

    const ua = httpHeaders["User-Agent"]

    httpHeaders["User-Agent"] = ua ? `${ua} ${userAgent}` : userAgent

    return defaultFetchHandler(httpUri, httpMethod, httpHeaders, httpReqBody)
  }
})

import { AtpAgent, AtpAgentOptions } from '@atproto/api'

class MyAtpAgent extends AtpAgent {
  constructor(options: AtpAgentOptions) {
    const fetch = options.fetch ?? globalThis.fetch

    super({
      ...options,
      fetch: async (url, init) => {
        const headers = new Headers(init.headers)

        // `userAgent` is assumed to be a string defined elsewhere in your code
        const ua = headers.get("User-Agent")
        headers.set("User-Agent", ua ? `${ua} ${userAgent}` : userAgent)

        return fetch(url, { ...init, headers })
      }
    })
  }
}

The `@atproto/xrpc` package

The Client and ServiceClient classes are now deprecated. If you need a lexicon based client, you should update the code to use the XrpcClient class instead.

The deprecated ServiceClient class now extends the new XrpcClient class. Because of this, the fetch FetchHandler can no longer be configured on the Client instances (including the default export of the package). If you are not relying on the fetch FetchHandler, the new changes should have no impact on your code. Beware that the deprecated classes will eventually be removed in a future version.

Since its use has completely changed, the FetchHandler type has also completely changed. The new FetchHandler type is now a function that receives a url pathname and a RequestInit object and returns a Promise<Response>. This function is responsible for making the actual request to the server.

export type FetchHandler = (
  this: void,
  /**
   * The URL (pathname + query parameters) to make the request to, without the
   * origin. The origin (protocol, hostname, and port) must be added by this
   * {@link FetchHandler}, typically based on authentication or other factors.
   */
  url: string,
  init: RequestInit,
) => Promise<Response>

A noticeable change that has been introduced is that the uri field of the ServiceClient class has not been ported to the new XrpcClient class. It is now the responsibility of the FetchHandler to determine the full URL to make the request to. The same goes for the headers, which should now be set through the FetchHandler function.

If you do rely on the legacy Client.fetch property to perform custom logic upon request, you will need to migrate your code to use the new XrpcClient class. The XrpcClient class has a similar API to the old ServiceClient class, but with a few differences:

The Client + ServiceClient duality was removed in favor of a single XrpcClient class. This means that:
- There is no longer a centralized lexicon registry. If you need a global lexicon registry, you can maintain one yourself using a new Lexicons instance (from @atproto/lexicon).
- The FetchHandler is no longer a statically defined property of the Client class. Instead, it is passed as an argument to the XrpcClient constructor.
The XrpcClient constructor now requires a FetchHandler function as the first argument, and an optional Lexicon instance as the second argument.
The setHeader and unsetHeader methods were not ported to the new XrpcClient class. If you need to set or unset headers, you should do so in the FetchHandler function provided in the constructor arg.

Before (first snippet) and after (second snippet):

import client, { defaultFetchHandler } from '@atproto/xrpc'

client.fetch = function (
  httpUri: string,
  httpMethod: string,
  httpHeaders: Headers,
  httpReqBody: unknown,
) {
  // Custom logic here
  return defaultFetchHandler(httpUri, httpMethod, httpHeaders, httpReqBody)
}

client.addLexicon({
  lexicon: 1,
  id: 'io.example.doStuff',
  defs: {},
})

const instance = client.service('http://my-service.com')

instance.setHeader('my-header', 'my-value')

await instance.call('io.example.doStuff')

import { XrpcClient } from '@atproto/xrpc'

const instance = new XrpcClient(
  async (url, init) => {
    const headers = new Headers(init.headers)

    headers.set('my-header', 'my-value')

    // Custom logic here

    const fullUrl = new URL(url, 'http://my-service.com')

    return fetch(fullUrl, { ...init, headers })
  },
  [
    {
      lexicon: 1,
      id: 'io.example.doStuff',
      defs: {},
    },
  ],
)

await instance.call('io.example.doStuff')

If your fetch handler does not require any "custom logic", and all you need is an XrpcClient that makes its HTTP requests towards a static service URL, the previous example can be simplified to:

import { XrpcClient } from '@atproto/xrpc'

const instance = new XrpcClient('http://my-service.com', [
  {
    lexicon: 1,
    id: 'io.example.doStuff',
    defs: {},
  },
])

If you need to add static headers to all requests, you can instead instantiate the XrpcClient as follows:

import { XrpcClient } from '@atproto/xrpc'

const instance = new XrpcClient(
  {
    service: 'http://my-service.com',
    headers: {
      'my-header': 'my-value',
    },
  },
  [
    {
      lexicon: 1,
      id: 'io.example.doStuff',
      defs: {},
    },
  ],
)

If you need the headers or service url to be dynamic, you can define them using functions:

import { XrpcClient } from '@atproto/xrpc'

const instance = new XrpcClient(
  {
    service: () => 'http://my-service.com',
    headers: {
      'my-header': () => 'my-value',
      'my-ignored-header': () => null, // ignored
    },
  },
  [
    {
      lexicon: 1,
      id: 'io.example.doStuff',
      defs: {},
    },
  ],
)
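To give a feel for how static or dynamic service and headers options could be turned into a FetchHandler, here is a minimal sketch. It is not the library's implementation: SketchOptions, resolveValue, and sketchFetchHandler are hypothetical names, the fetch option exists only so the sketch can be exercised without a network, and the rule that headers already set by the caller take precedence is an assumption of this example.

```typescript
// Simplified local copy of the handler shape (no package dependency).
type FetchHandler = (url: string, init: RequestInit) => Promise<Response>
type MaybeDynamic<T> = T | (() => T)

// Hypothetical options shape, loosely mirroring the examples above.
interface SketchOptions {
  service: MaybeDynamic<string | URL>
  headers?: Record<string, MaybeDynamic<string | null>>
  fetch?: typeof globalThis.fetch
}

// Calls the value if it is a function, otherwise returns it unchanged.
function resolveValue<T>(value: MaybeDynamic<T>): T {
  return typeof value === 'function' ? (value as () => T)() : value
}

function sketchFetchHandler(options: SketchOptions): FetchHandler {
  const { service, headers = {}, fetch = globalThis.fetch } = options
  return async (url, init) => {
    const merged = new Headers(init.headers)
    for (const [name, value] of Object.entries(headers)) {
      const resolved = resolveValue(value)
      // Null values are skipped. Assumption: headers set by the caller win.
      if (resolved != null && !merged.has(name)) merged.set(name, resolved)
    }
    // Service and headers are re-evaluated on every request.
    const fullUrl = new URL(url, resolveValue(service))
    return fetch(fullUrl, { ...init, headers: merged })
  }
}
```

Evaluating the functions on each request is what would let values such as an authorization header change over the lifetime of the client.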

2024-05-15T00:00:00.000Z

Labeling Services Microgrants

May 15, 2024 · 3 min read

We’re launching microgrants for labeling services on Bluesky!

Moderation is the backbone of healthy social spaces online. Bluesky has our own moderation team dedicated to providing around-the-clock coverage to uphold our community guidelines, and additionally, we recognize that there is no one-size-fits-all approach to moderation. No single company can get online safety right for every country, culture, and community in the world. So we’ve also been building something bigger — an ecosystem of moderation and open-source safety tools that gives communities power to create their own spaces, with their own norms and preferences.

Labeling services on Bluesky allow users and communities to participate in a stackable ecosystem of services. Users can create and subscribe to filters from independent moderation services, which are layered on top of Bluesky’s own service. You can read more about how stackable moderation works on Bluesky here.

Update: As of 2025 we are not currently accepting further grant applications.

To support the first labelers in our ecosystem and encourage more, we are launching a microgrants program for labeling services.

Program Details

For this program, we have an allocation of $10,000. We will be distributing $500 per labeling service that is approved for a grant.

The application has a rolling deadline, and we will announce both the recipients of the grants and when all of the grants have been distributed. We pay out grants via public GitHub Sponsorships.

We’ve also partnered with Amazon Web Services (AWS) to offer $5,000 in AWS Activate¹ credits to labeling services. These credits are applied to your AWS bill to help cover costs from cloud services, including machine learning, compute, databases, storage, containers, dev tools, and more. Simply check a box in your grant application if you’re interested in receiving these credits as well.

If you’re an organization interested in running a labeler but do not currently have the technical capacity to implement one, please reach out to our team at partnerships@blueskyweb.xyz. We may be able to assist in matching you with a developer.

Initial Labeling Grant Recipients

We're kicking off the program with grants to three initial recipients:

XBlock

@XBlock.aendra.dev is an attempt to help give users control over the types of content they see on Bluesky. Screenshots serve a variety of uses on social media, but quite often are intended to create discourse or drive dogpiles. By letting users toggle the visibility of screenshots from various platforms, XBlock aims to give users a "volume dial" for certain types of content.

Aegis

@aegis.blue is a volunteer-run labeling service providing community moderation predominantly to Bluesky's LGBTQIA+ and marginalized users. Featuring a diverse team of both industry and aspiring experts, Aegis lives by the motto, We Keep Us Safe. More info can be found on their website at https://aegis.blue/Home.

News Detective

News Detective fights misinformation by combining the experience of professional factcheckers with the wisdom of crowds. A crowd of volunteer factcheckers transparently investigates posts, and professional factcheckers make sure only the highest quality factchecks make it through the system. Users who use News Detective will be able to see factchecks (including explanations and sources) on posts they come across and request factchecks on posts they find questionable. They can also watch News Detectives discuss the posts and even participate in factchecking to create a more honest, democratic, and transparent internet. Incubated at MIT DesignX, MIT Sandbox, and HacksHackers.

Contact

Please feel free to leave questions or comments on the GitHub discussion for this announcement here, or on the Bluesky post here.

¹ AWS Activate Credits are subject to the program's terms and conditions.

2024-05-06T00:00:00.000Z

2024 Protocol Roadmap

May 6, 2024 · 11 min read

Discuss this post in our GitHub Discussion forums here

This roadmap is an update on our progress and lays out our general goals and focus for the coming months. This document is written for developers working on atproto clients, implementations, and applications (including Bluesky-specific projects). This is not a product announcement: while some product features are hinted at, we aren't promising specific timelines here. As always, most Bluesky software is free and open source, and observant folks can follow along with our progress week by week in GitHub.

In the big picture, we made a lot of progress on the protocol in early 2024. We opened up federation on the production network, demonstrated account migration, specified and launched stackable moderation (labeling and Ozone), shared our plan for OAuth, specified a generic proxying mechanism, built a new API documentation website (docs.bsky.app), and more.

After this big push on the protocol, the Bluesky engineering team is spending a few months catching up on some long-requested features like GIFs, video, and DMs. At the same time, we do have a few "enabling" pieces of protocol work underway, and continue to make progress towards a milestone of protocol maturity and stability.

Summary-level notes:

Federation is now open: you don't need to pre-register in Discord any more.
It is increasingly possible to build independent apps and integrations on atproto. One early example is https://whtwnd.com/, a blogging web app built on atproto.
The timeline for a formal standards body process is being pushed back until we have additional independent active projects building on the protocol.

Current Work

Proxying of Independent Lexicons: earlier this year we added a generic HTTP proxying mechanism, which allows clients to specify which onward service (eg, AppView) instance they want to communicate with. To date this has been limited to known Lexicons, but we will soon relax this restriction and make arbitrary XRPC query and procedure requests. Combined with allowing records with independent Lexicon schemas (now allowed), this finally enables building new independent atproto applications. PR for this work

Open Federation: the Bluesky Relay service initially required pre-registration before new PDS instances were crawled. This was a very informal process (using Discord) to prevent automated abuse, but we have removed this requirement, making it even easier to set up PDS instances. We will also bump the per-PDS account limits, though we will still enforce some limits to minimize automated abuse; these limits can be bumped for rapidly growing communities and projects.

Email 2FA: while OAuth is our main focus for improving account security (OAuth flows will enable arbitrary MFA, including passkeys, hardware tokens, authenticators, etc), we are rapidly rolling out a basic form of 2FA, using an emailed code in addition to account password for sign-in. This will be an optional opt-in functionality. Announcement with details

OAuth: we continue to make progress implementing our plan for OAuth. Ultimately this will completely replace the current account sign-up, session, and app-password API endpoints, though we will maintain backwards compatibility for a long period. With OAuth, account lifecycle, sign-in, and permission flows will be implementation-specific web views. This means that PDS implementations can add any sign-up screening or MFA methods they see fit, without needing support in the com.atproto.* Lexicons. Detailed Proposal

Product Features

These are not directly protocol-related, but are likely to impact many developers, so we wanted to give a heads up on these.

Harassment Mitigations: additional controls and mechanisms to reduce the prevalence, visibility, and impact of abusive mentions and replies, particularly coming from newly created single-purpose or throw-away accounts. May expand on the existing thread-gating and reply-gating functionality.

Post Embeds: the ability to embed Bluesky posts in external public websites, including oEmbed support. This has already shipped! See embed.bsky.app

Basic "Off-Protocol" Direct Messages (DMs): having some mechanism to privately contact other Bluesky accounts is the most requested product feature. We looked closely at alternatives like linking to external services, re-using an existing protocol like Matrix, or rushing out on-protocol encrypted DMs, but ultimately decided to launch a basic centralized system to take the time pressure off our team and make our user community happy. We intend to iterate and fully support E2EE DMs as part of atproto itself, without a centralized service, and will take the time to get the user experience, security, and privacy polished. This will be a distinct part of the protocol from the repository abstraction, which is only used for public content.

Better GIF and Video support: the first step is improving embeds from external platforms (like Tenor for GIFs, and YouTube for video). Both the post-creation flow and embed-view experience will be improved.

Feed Interaction Metrics: feed services currently have no feedback on how users are interacting with the content that they curate. There is no way for users to tell specific feeds that they want to see more or less of certain kinds of content, or whether they have already seen content. We are adding a new endpoint for clients to submit behavior metrics to feed generators as a feedback mechanism. This feedback will be most useful for personalized feeds, and less useful for topic or community-oriented feeds. It also raises privacy and efficiency concerns, so sending of this metadata will both be controlled by clients (optional), and will require feed generator opt-in in the feed declaration record.

Topic/Community Feeds: one of the more common uses for feed generators is to categorize content by topic or community. These feeds are not personalized (they look the same to all users), are not particularly "algorithmic" (posts are either in the feed or not), and often have relatively clear inclusion criteria (though they may be additionally curated or filtered). We are exploring ways to make it easier to create, curate, and explore this type of feed.

User/Labeler Messaging: currently, independent moderators have no private mechanism to communicate with accounts which have reported content, or with accounts against which moderation actions have been taken. All reports, including appeals, are uni-directional, and accounts have no record of the reports they have submitted. While Bluesky can send notification emails to accounts hosted on our own PDS instance, this does not work cross-provider with self-hosted PDS instances or independent labelers.

Protocol Stability Milestone

A lot of progress has been made in recent months on the parts of the protocol relevant to large-scale public conversation. The core concepts of autonomous identity (DIDs and handles), self-certifying data (repositories), content curation (feed generators), and stackable moderation (labelers) have now all been demonstrated on the live network.

While we will continue to make progress on additional objectives (see below), we feel we are approaching a milestone in development and stability of these components of the protocol. There are a few smaller tasks to resolve towards this milestone.

Takedowns: we have a written proposal for how content and account takedowns will work across different pieces of infrastructure in the network. Takedowns are a stronger intervention that complements the labeling system. Bluesky already has mechanisms to enact takedowns on our own infrastructure when needed, but some details of how inter-provider takedown requests are communicated still need to be worked out.

Remaining Written Specifications: a few parts of the protocol have not been written up in the specifications at atproto.com.

Guidance on Building Apps and Integrations: while we hope the protocol will be adopted and built upon in unexpected ways, it would be helpful to have some basic pointers and advice on creating new applications and integrations. These will probably be informal tutorials and example code to start.

Account and Identity Firehose Events: while account and identity state are authoritatively managed across the DID, DNS, and PDS systems, it is efficient and helpful for changes to this state to be broadcast over the repository event stream ("firehose"). The semantics and behavior of the existing #identity event type will be updated and clarified, and an additional #account event type will be added to communicate PDS account deletion and takedown state to downstream services (Relay, and on to AppView, feed generator, labelers, etc). Downstream services might still need to resolve state from an authoritative source after being notified on the firehose.

Private Account Data Iteration: the app.bsky Lexicons currently include a preferences API, as well as some additional private state like mutes. The design of the current API is somewhat error-prone, difficult for independent developers to extend, and has unclear expectations around providing access to service providers (like independent AppViews). We are planning to iterate on this API, though it might not end up part of the near-term protocol milestone.

Protocol Tech Debt: there are a few other small technical issues to resolve or clean up; these are tracked in this GitHub discussion

On the Horizon

There are a few other pieces of protocol work which we are starting to plan out, but which are not currently scheduled to complete in 2024. It is very possible that priorities and schedules will be shuffled; we mostly want to call these out as things we do intend to complete, but which will take a bit more time.

Protocol-Native DMs: as mentioned above, we want to have a "proper" DM solution as part of atproto, which is decentralized, E2EE, and follows modern security best practices.

Limited-Audience (Non-Public) Content: to start, we have prioritized the large-scale public conversation use cases in our protocol design, centered around the public data repository concept. While we support using the right tool for the job, and atproto is not trying to encompass every possible social modality, there are many situations and use-cases where having limited-audience content in the same overall application would be helpful. We intend to build a mechanism for group-private content sharing. It will likely be distinct from public data repositories and the Relay/firehose mechanism, but retain other parts of the protocol stack.

Firehose Bandwidth Efficiency: as the network grows, and the volume and rate of repository commits increases, the cost of subscribing to the entire Relay firehose increases. There are a number of ways to significantly improve bandwidth requirements: removing MST metadata for most use-cases; filtering by record types or subsets of accounts; batch compression; etc.
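As a purely illustrative sketch of the "filtering by record types or subsets of accounts" idea, the snippet below shows what such a filter predicate could look like. Nothing here is a shipped or specified API: SketchCommitEvent is a simplified, assumed event shape (real firehose messages carry more fields), and SketchFilter and filterEvent are hypothetical names. Applying it on the consumer side as shown would not save bandwidth by itself; the savings would come from a service applying an equivalent filter before sending events.

```typescript
// Simplified, assumed shape of a repository commit event.
interface SketchCommitEvent {
  repo: string // DID of the account whose repository changed
  ops: { action: 'create' | 'update' | 'delete'; path: string }[] // path is "<collection>/<rkey>"
}

// Hypothetical filter: both fields are optional allow-lists.
interface SketchFilter {
  collections?: string[] // record types (collection NSIDs) to keep
  repos?: string[] // accounts (DIDs) to keep
}

// Returns the event reduced to its matching operations, or null if
// nothing in the event matches the filter.
function filterEvent(
  event: SketchCommitEvent,
  filter: SketchFilter,
): SketchCommitEvent | null {
  if (filter.repos && !filter.repos.includes(event.repo)) return null
  const collections = filter.collections
  const ops = collections
    ? event.ops.filter((op) => {
        const [collection = ''] = op.path.split('/')
        return collections.includes(collection)
      })
    : event.ops
  return ops.length > 0 ? { ...event, ops } : null
}
```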

Record Versioning (Post Editing): atproto already supports updating records in repositories: one example is updating bsky profile records. And preparations were made early in the protocol design to support post editing while avoiding misleading edits. Ideally, it would also be possible to (optionally) keep old versions of records around in the repository, and allow referencing and accessing multiple versions of the same record.

PLC Transparency Log: we are exploring technical and organizational mechanisms to further de-centralize the DID PLC directory service. The most promising next step looks to be publishing a transparency log of all directory operations. This will make it easier for other organizations to audit the behavior of the directory and maintain verifiable replicas. The recent "tiling" transparency log design used for https://sunlight.dev/ (described here) is particularly promising. Compatibility with RFC 6962 (Certificate Transparency) could allow future integration with an existing ecosystem of witnesses and auditors.

Identity Key Self-Management UX: the DID PLC system has a concept of "rotation keys" to control the identity itself (in the form of the DID document). We would like to make it possible for users to optionally register additional keys on their personal devices, password managers, or hardware security keys. If done right, this should improve the resilience of the system and reduce some of the burden of responsibility on PDS operators. While this is technically possible today, it will require careful product design and security review to make this a safe and widely-adopted option.

Standards Body Timeline

As described in our 2023 Protocol Roadmap, we hope to bring atproto to an existing standards body to solidify governance and interoperability of the lower levels of the protocol. We had planned to start the formal process this summer, but as we talked to more people experienced with this process, we realized that we should wait until the design of the protocol has been explored by more developers. It would be ideal to have a couple of organizations with atproto experience collaborate on the standards process together. If you are interested in being part of the atproto standards process, leave a message in the discussion thread for this post, or email protocol@blueskyweb.xyz.

While there has been a flowering of many projects built around the app.bsky microblogging application, there have been very few additional Lexicons and applications built from scratch. Some of this stemmed from restrictions on data schemas and proxying behavior on the Bluesky-hosted PDS instances, only relaxed just recently. We hope that new apps and Lexicons will exercise the full capabilities and corner-cases of the protocol.

We will continue to participate in adjacent standards efforts to make connections and get experience. Bluesky staff will attend IETF 120 in July, and are always happy to discuss responsible DNS integrations, OAuth, and HTTP API best practices.

2024-04-23T00:00:00.000Z

Meet the second batch of AT Protocol Grant Recipients

April 23, 2024 · 3 min read

In March, we announced the AT Protocol Grant program, aimed at fostering the growth and sustainability of the atproto developer ecosystem, as well as the first three recipients.

We’re excited to share the second batch of grant recipients with you today!

In this batch, we distributed $4,800 total in grants. There is $2,200 remaining in the initial allocation of $10,000. Congratulations to all of the recipients so far, and thank you to everyone who has applied — we're very lucky to have such a great developer ecosystem.

Update: As of 2025 we are not currently accepting further grant applications.

Blacksky Algorithms — @rudyfraser.com ($1000)

Blacksky is a suite of services on AT Protocol that all serve to amplify, protect, and provide moderation services for the network’s Black users. It also doubles as a working atproto implementation in Rust. AT Protocol creates the opportunity to build community in public (e.g. custom feeds, PDS-based handles) while retaining enough agency to build protective measures (e.g. blocking viewers of a feed, third-party labeling) for the unique issues that community may face such as anti-blackness. Rudy has worked on Blacksky for the last 10 months and has built a custom firehose subscriber, feedgen, and has almost completed a PDS implementation from scratch.

SkyBridge — @videah.net ($800)

SkyBridge is an in-progress proxy web server that translates Mastodon API calls into appropriate Bluesky ones, with the main goal of making already existing Mastodon tools/apps/clients such as Ivory compatible with Bluesky. The project is currently going through a significant rewrite from Dart into Rust.

TOKIMEKI — @holybea.social ($500)

TOKIMEKI is a third-party web client for Bluesky. It is one of the most popular 3rd party clients in Japan. Development and release began in March 2023. It supports multi-column and multi-accounts like TweetDeck, and features lightweight behavior, a variety of themes, and unique features such as bookmarks.

Bluesky Case Bots — Free Law Project ($500)

Free Law Project is a small non-profit dedicated to making the legal sector more fair and competitive. We run a number of bots on Bluesky that post whenever there are updates in important American legal cases, such as the Trump indictments, cases related to AI, and much more.

Morpho — Orual ($500)

Morpho is a native Android app for Bluesky and the AT Protocol, written in Kotlin. The primary goals are to have improved performance and accessibility relative to the official app, and to accommodate de-Googled Android devices and security/privacy-minded users while retaining a full set of features.

hexPDS — nova ($500)

hexPDS is intended to be a production-grade PDS, written in Elixir/Rust. So far, some DID:PLC related operations have been completed, and MST logic and block storage are in progress. The next milestone is loading a repo’s .CAR file and parsing the records.

ATrium — @sugyan.com ($500)

ATrium is a collection of Rust libraries for implementing AT Protocol applications. It generates code from Lexicon schema and provides type definitions, etc. for use with Rust to handle XRPC requests.

SkeetStats — @ameliamnesia.xyz ($500)

SkeetStats is currently running and used actively by 400+ users. It tracks account stats for opted-in users and displays them in a variety of charts and graphs, and has a bot that allows users to opt in/out.

Thank you to everyone who has applied for a grant so far!

2024-03-15T00:00:00.000Z

Bluesky's Moderation Architecture

March 15, 2024 · 12 min read

Moderation is a crucial aspect of any social network. However, traditional moderation systems often lack transparency and user control, leaving communities vulnerable to sudden policy changes and potential mismanagement. To build a better social media ecosystem, it is necessary to try new approaches.

Today, we’re releasing an open labeling system on Bluesky. “Labeling” is a key part of moderation; it is a system for marking content that may need to be hidden, blurred, taken down, or annotated in applications. Labeling is how a lot of centralized moderation works under the hood, but nobody has ever opened it up for anyone to contribute. By building an open source labeling system, our goal is to empower developers, organizations, and users to actively participate in shaping the future of moderation.

In this post, we’ll dive into the details of how labeling and moderation work in the AT Protocol.

An open network of services

The AT Protocol is an open network of services that anyone can provide, essentially opening up the backend architecture of a large-scale social network. The core services form a pipeline where data flows from where it’s hosted, through a data firehose, and out to the various application indexes.

Data flows from independent account hosts into a firehose and then to applications.

This event-driven architecture is similar to other high-scale systems, where you might traditionally use tools like Kafka for your data firehose. However, our open system allows anyone to run a piece of the backend. This means that there can be many hosts, firehoses, and indexes, all operated by different entities and exchanging data with each other.

Account hosts will sync with many firehoses.

Why would you want to run one of these services?

You’d run a PDS (Personal Data Server) if you want to self-host your data and keys to get increased control and privacy.
You’d run a Relay if you want a full copy of the network, or to crawl subsets of the network for targeted applications or services.
You’d run an AppView if you want to build custom applications with tailored views and experiences, such as a custom view for microblogging or for photos.

So what if you want to run your own moderation?

Decentralized moderation

On traditional social media platforms, moderation is often tightly coupled with other aspects of the system, such as hosting, algorithms, and the user interface. This tight coupling reduces the resilience of social networks as businesses change ownership or as policies shift due to financial or political pressures, leaving users with little choice but to accept the changes or stop using the service.

Decentralized moderation provides a safeguard against these risks. It relies on three principles:

Separation of roles. Moderation services operate separately from other services – particularly hosting and identity – to limit the potential for overreach.
Distributed operation. Multiple organizations providing moderation services reduces the risk of a single entity failing to serve user interests.
Interoperation. Users can choose between their preferred clients and associated moderation services without losing access to their communities.

In the AT Protocol, the PDS stores and manages user data, but it isn’t designed to handle moderation directly. A PDS could remove or filter content, but we chose not to rely on this for two main reasons. First, users can easily switch between PDS providers thanks to the account-migration feature. This means any takedowns performed by a PDS might only have a short-term effect, as users could move their data to another provider. Second, data hosting services aren't always the best equipped to deal with the challenges of content moderation, and those with local expertise and community building skills who want to participate in moderation may lack the technical capacity to run a server.

This is different from ActivityPub servers (Mastodon), which manage both data hosting and moderation as a bundled service, and do not make it as easy to switch servers as the AT Protocol does. By separating data storage from moderation, we let each service focus on what it does best.

Where moderation is applied

Moderation is done by a dedicated service called the Labeler (or “Labeling service”).

Labelers produce “labels” which are associated with specific pieces of user-generated content, such as individual posts, accounts, lists, or feeds. These labels make an assertion about the content, such as whether it contains sensitive material, is unpleasant, or is misleading.

These labels get synced to the AppViews where they can be attached to responses at the client’s request.

Labels are synced into AppViews where they can be attached to responses.

The clients read those labels to decide what to hide, blur, or drop. Since the clients choose their labelers and how to interpret the labels, they can decide which moderation systems to support. The chosen labels do not have to be broadcast, except to the AppView and PDS which fulfill the requests. A user subscribing to a labeler is not public, though the PDS and AppView can privately infer which users are subscribed to which services.

In the Bluesky app, we hardcode our in-house moderation to provide a strong foundation that upholds our community guidelines. We will continue to uphold our existing policies in the Bluesky app, even as this new architecture is made available. With the introduction of labelers, users will be able to subscribe to additional moderation services on top of the existing foundation of our in-house moderation.

The Bluesky application hardcodes its labeling and then stacks community labelers on top.

For the best user experience, we suggest that clients in the AT Protocol ecosystem follow this pattern: have at least one built-in moderation service, and allow additional user-chosen mod services to be layered in on top.
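The layering described above can be sketched in a few lines. This is a minimal illustration, not the Bluesky app's actual implementation: the `Label` shape is reduced to three fields, the DIDs are placeholders, and `BUILT_IN_LABELERS`, `effectiveLabelers`, and `applicableLabels` are hypothetical names.

```typescript
// Reduced label shape for illustration: who issued it, what it is about, and its value.
interface Label {
  src: string // DID of the labeler that issued the label
  uri: string // subject the label applies to
  val: string // label value
}

// Hypothetical built-in labeler that the client always applies.
const BUILT_IN_LABELERS: string[] = ['did:example:builtin-moderation']

// Built-in labelers first, then user-chosen ones, without duplicates.
function effectiveLabelers(userChosen: string[]): string[] {
  const extra = userChosen.filter((did) => BUILT_IN_LABELERS.indexOf(did) === -1)
  const deduped = extra.filter((did, i) => extra.indexOf(did) === i)
  return BUILT_IN_LABELERS.concat(deduped)
}

// Keep only the labels issued by a labeler the client currently applies.
function applicableLabels(labels: Label[], userChosen: string[]): Label[] {
  const trusted = effectiveLabelers(userChosen)
  return labels.filter((label) => trusted.indexOf(label.src) !== -1)
}
```

Because the built-in list is always included, a user who subscribes to no additional labelers still gets the client's baseline moderation, and labels from services the user has not chosen are simply ignored.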

The Bluesky app is a space that we create and maintain, and we want to provide a positive environment for our users, so our moderation service is built-in. On top of that, the additional services that users can subscribe to create a lot of options within the app. However, if users disagree with Bluesky’s application-level moderation, they can choose to use another client on the network with its own moderation system. There are additional nuances to infrastructure-level moderation, which we will discuss below, but most content moderation happens at the application level.

How are labels defined?

A limited core set of labels are defined at the protocol level. These labels handle generic cases (“Content Warning”) and common adult content cases (“Pornography,” “Violence”).

A label can cover the content of a post with a warning.

Labelers may additionally define their own custom labels. These definitions are relatively straightforward; they give the label a localized name and description, and define the effects they can have.

interface LabelDefinition {
  identifier: string
  severity: 'inform' | 'alert'
  blurs: 'content' | 'media' | 'none'
  defaultSetting: 'hide' | 'warn' | 'ignore'
  adultContent: boolean
  locales: Record<string, LabelStrings>
}

interface LabelStrings {
  name: string
  description: string  
}

Using these definitions, it’s possible to create labels which are informational (“Satire”), topical (“Politics”), curational (“Dislike”), or moderational (“Rude”).

Users can then tune how the application handles these labels to get the outcomes they want.
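As a rough sketch of how a definition and a user's preference might fit together, the example below declares one custom label using the interfaces above and resolves the setting a client would apply. The "satire" label, the `resolveSetting` function, and the rule that adult-content labels are hidden when adult content is disabled are illustrative assumptions, not the app's actual logic.

```typescript
type LabelSetting = 'hide' | 'warn' | 'ignore'

interface LabelStrings {
  name: string
  description: string
}

interface LabelDefinition {
  identifier: string
  severity: 'inform' | 'alert'
  blurs: 'content' | 'media' | 'none'
  defaultSetting: LabelSetting
  adultContent: boolean
  locales: Record<string, LabelStrings>
}

// Hypothetical informational label published by a labeler.
const satire: LabelDefinition = {
  identifier: 'satire',
  severity: 'inform',
  blurs: 'none',
  defaultSetting: 'warn',
  adultContent: false,
  locales: {
    en: { name: 'Satire', description: 'Content intended as parody or satire.' },
  },
}

// The user's explicit choice wins; otherwise fall back to the labeler's default.
// Assumption: adult-content labels are always hidden when adult content is disabled.
function resolveSetting(
  def: LabelDefinition,
  prefs: { [identifier: string]: LabelSetting | undefined },
  adultContentEnabled: boolean,
): LabelSetting {
  if (def.adultContent && !adultContentEnabled) return 'hide'
  const chosen = prefs[def.identifier]
  return chosen !== undefined ? chosen : def.defaultSetting
}
```

With no stored preference, `resolveSetting(satire, {}, true)` falls back to the definition's `defaultSetting`; a stored preference such as `{ satire: 'ignore' }` overrides it.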

Users configure whether they want to use each label.

Learn more about label definitions in the API Docs on Labelers and Moderation.

Running a labeler

We recently open-sourced Ozone, the Labeler service that we use in-house to moderate Bluesky. This is a significant step forward in transparency and community involvement, as we're sharing the same professional-grade tooling that our own moderation team relies on daily.

Ozone is designed for traditional moderation, where a team of moderators receives reports and takes action on them, but you're free to apply other models with custom software. We recommend anyone interested in running a labeler try out Ozone, as it simplifies the process by helping labelers set up their service account, field reports, and publish labels from the same web interface used by Bluesky's own moderation team, ensuring that all the necessary technical requirements are met. Detailed instructions for setting up and operating Ozone can be found in the readme.

Beyond Ozone’s interface, you can also explore alternative ways of labeling content. A few examples we've considered:

Community-driven voting systems (great for curation)
Network analysis (e.g., for detecting botnets)
AI models

If you want to create your own labeler, you simply need to build a web service that implements two endpoints to serve its labels to the wider network:

com.atproto.label.subscribeLabels: a realtime subscription of all labels
com.atproto.label.queryLabels: for looking up labels you've published on user-generated content
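For a sense of what calling the lookup endpoint might involve, here is a minimal sketch that builds a `com.atproto.label.queryLabels` request using the usual `/xrpc/<NSID>` path convention. The parameter names (`uriPatterns`, `sources`, `limit`) and the reduced `Label` shape reflect our reading of the Lexicon and should be checked against it; `buildQueryLabelsUrl`, `queryLabels`, and `FetchLike` are hypothetical helpers, and the fetch function is injected so the sketch does not depend on any particular runtime.

```typescript
// Reduced view of a label; real responses may carry more fields.
interface Label {
  src: string // DID of the labeler
  uri: string // subject of the label
  val: string // label value
  cts: string // creation timestamp
}

interface QueryLabelsOptions {
  uriPatterns: string[] // full URIs, or prefixes ending in "*"
  sources?: string[] // optional labeler DIDs to filter by
  limit?: number
}

// Array parameters are sent as repeated query keys.
function buildQueryLabelsUrl(serviceUrl: string, opts: QueryLabelsOptions): string {
  const params: string[] = []
  for (const pattern of opts.uriPatterns) {
    params.push('uriPatterns=' + encodeURIComponent(pattern))
  }
  for (const source of opts.sources || []) {
    params.push('sources=' + encodeURIComponent(source))
  }
  if (opts.limit !== undefined) params.push('limit=' + String(opts.limit))
  const base = serviceUrl.replace(/\/+$/, '') // drop trailing slashes
  return base + '/xrpc/com.atproto.label.queryLabels?' + params.join('&')
}

// Minimal fetch signature, so any compatible implementation can be passed in.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

function queryLabels(
  fetchFn: FetchLike,
  serviceUrl: string,
  opts: QueryLabelsOptions,
): Promise<Label[]> {
  return fetchFn(buildQueryLabelsUrl(serviceUrl, opts))
    .then((res) => {
      if (!res.ok) throw new Error('queryLabels failed with status ' + res.status)
      return res.json()
    })
    .then((body) => (body as { labels?: Label[] }).labels || [])
}
```

In an environment with a global `fetch`, something like `queryLabels((url) => fetch(url), 'https://labeler.example.com', { uriPatterns: ['did:example:alice'] })` would be the typical call; the hostname and DID here are placeholders.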

Reporting content is an important part of moderation, and reports should be sent privately to whoever is able to act on them. The Labeling service's API includes a specific endpoint designed for this use case. To receive user reports, a labeler can implement:

com.atproto.moderation.createReport: files a report from a particular user

When a user submits a report, they choose which of their active Labelers to report to. This gives users the ability to decide who should be informed of an issue. Reports serve as an additional signal for labelers, and they can handle them in a manner that best suits their needs, whether through human review or automated resolution. Appeals from users are essentially another type of report that provides feedback to Labelers. It's important to note that a Labeler is not required to accept reports.

In addition to the technical implementation, your labeler should also have a dedicated Bluesky account associated with it. This account serves as your labeler's public presence within the Bluesky app, allowing you to share information about the types of labels you plan to publish and how users should interpret them. By publishing an app.bsky.labeler.service record, you effectively "convert" your account into a Bluesky labeler, enabling users to discover and subscribe to your labeling service.

More details about labels and labelers can be found in the atproto specs.

Infrastructure moderation

Labeling is the basic building block of composable moderation, but there are other aspects involved. In the AT Protocol network, various services, such as the PDS, Relay, and AppView, have ultimate discretion over what content they carry, though it's not the most straightforward avenue for content moderation. Services that are closer to users, such as the client and labelers, are designed to be more actively involved in community and content moderation. These service providers have a better understanding of the specific community norms and social dynamics within their user base. By handling content moderation at this level, clients and labelers can make more informed decisions that align with the expectations and values of their communities.

Infrastructure providers such as Relays play a different role in the network, and are designed to be a common service provider that serves many kinds of applications. Relays perform simple data aggregation, and as the network grows, may eventually come to serve a wide range of social apps, each with their own unique communities and social norms. Consequently, Relays focus on combating network abuse and mitigating infrastructure-level harms, rather than making granular content moderation decisions.

An example of harm handled at the infrastructure layer is content that is illegal to host, such as child sexual abuse material (CSAM). Service providers should actively detect and remove content that cannot be hosted in the jurisdictions in which they operate. Bluesky already actively monitors its infrastructure for illegal content, and we're working on systems to advise other services (like PDS hosts) about issues we find.

Labels drive moderation in the client. The Relay and AppView apply infrastructure moderation.

This separation between backend infrastructure and application concerns is similar to how the web itself works. The PDSs function like personal websites or blogs on the web, which are hosted by various hosting providers. Just as individuals can choose their hosting provider and move their website if needed, users on the AT Protocol can select their PDS and migrate their data if they wish to change providers. Multiple companies can then run Relays and AppViews over PDSs; these are similar to content delivery networks and search engines, serving as the backbone infrastructure that aggregates and indexes information. To provide a unified experience to the end user, application and labeling systems then provide a robust, opinionated approach to content moderation, the way individual websites and applications set their own community guidelines.

In summary

Bluesky's open labeling system is a significant step towards a more transparent, user-controlled, and resilient way to do moderation. We’ve opened up the way centralized moderation works under the hood for anyone to contribute, and provided a seamless integration into the Bluesky app for independent moderators. In addition, by open sourcing our internal moderation tools, we're allowing anyone to use, run, and contribute to improving them.

This open labeling system is a fundamentally new approach that has never been tried in the realm of social media moderation. In an industry where innovation has been stagnant for far too long, we are experimenting with new solutions to address the complex challenges faced by online communities. Exploring new approaches is essential if we want to make meaningful progress in tackling the problems that plague social platforms today, and we have designed and implemented what we believe to be a powerful and flexible approach.

Our goal has always been to build towards a more transparent and resilient social media ecosystem that can better represent an open society. We encourage developers, users, and organizations to get involved in shaping the future of moderation on Bluesky by running their own labeling services, contributing to the open-source Ozone project, or providing feedback on this system of stackable moderation. Together, we can design a more user-controlled social media ecosystem that empowers individuals and communities to create better online spaces.

Additional reading:

2024-03-06T00:00:00.000Z

Announcing AT Protocol Grants

March 6, 2024 · 3 min read

We’re excited to announce the AT Protocol Grants program, aimed at fostering the growth and sustainability of the atproto developer ecosystem.

In the first iteration of this program, we’ll distribute a total of $10,000 in microgrants of $500 to $2,000 per project based on factors like cost, usage, and more.

Update: As of 2025 we are not currently accepting further grant applications.

Program Details

Over the last few months, we’ve seen independent developers create projects ranging from browser extensions and clients to PDS implementations and atproto tooling. Many of them have become widely adopted in the Bluesky community, too! As we continue on our path toward sustainability, we’re launching this grants program to encourage and support developers building on the AT Protocol.

We will be distributing a total of $10,000, and will publicly announce all grant recipients. We have already distributed $3,000, and the recipients of those grants are detailed below. This is a rolling application, though we will announce when all $10,000 of the initial allocated amount has been distributed.

We’ll evaluate each application based on the submitted project plan and the potential impact. The project should be useful to some user group, whether it’s fellow developers or Bluesky users. To be eligible for a grant, your project must be open source. We pay out grants via public GitHub Sponsorships.

In addition, we’ve partnered with Amazon Web Services (AWS) to offer $5,000 in AWS Activate¹ credits to atproto developers. These credits are applied to your AWS bill to help cover costs from cloud services, including machine learning, compute, databases, storage, containers, dev tools, and more. Simply check a box in your atproto grant application if you’re interested in receiving these credits.

Initial AT Protocol Grant recipients

Ahead of Bluesky’s public launch in February, Bluesky PBC extended grants to three developers as a pilot program. We awarded $1,000 each to the following projects and developers:

AT Protocol Python SDK — Ilya Siamionau

AT Protocol Dart SDK — Shinya Kato

Listed on the homepage of the Bluesky API documentation site, these two SDKs have quickly become popular packages with atproto developers. We’re also especially impressed by their own documentation sites!

SkyFeed — redsolver

SkyFeed has helped bring Bluesky’s vision for custom feeds to life — now, there are more than 40,000 custom feeds that users can subscribe to, and a vast majority of them are built with SkyFeed.

Contact

We’re excited to continue to find ways to help developers make their projects built on atproto sustainable.

Please feel free to leave questions or comments on the GitHub discussion for this announcement here.

¹ AWS Activate Credits are subject to the program's terms and conditions.

2024-02-27T00:00:00.000Z

Skygaze Hackathon

February 27, 2024 · 3 min read

Cooper Edmunds

Skygaze

This is a guest blog post by Skygaze, creators of the For You feed. You can check out For You, the custom feed that learns what you like, at https://skygaze.io/feed.

Last Sunday, 70 engineers came together at the YC office in San Francisco for the first Bluesky AI Hackathon. The teams took full advantage of Bluesky’s complete data openness to build 17 pretty spectacular projects, many of which genuinely surprised us. My favorites are below. Thank you to Replicate for donating $50 of LLM and image model credits to each participant and sponsoring a $1000 prize for the winning team!

The 17 projects covered a wide range of categories: location-based feeds, feeds with dynamic author lists, collaborative image generation, text moderation, NSFW image labeling, creator tools, and more. The top three stood out for their creativity, practicality, and completeness (despite having only ~6 hours to build), and we’ll share a bit about them below.

Convo Detox

@paritoshk.bsky.social and team came in first place with Convo Detox, a bot that predicts when a thread is at high risk of becoming toxic and interjects to defuse tension. We were particularly impressed with the team’s use of a self-hosted model trained on Reddit data specifically to predict conversations that are likely to get heated. As a proof of concept they deployed it as a bot that can be summoned via mention, but in the near future this would make for a great third party moderation label.

SF IRL

This is a bot that detects and promotes tech events happening in SF. In addition to flagging events, it keeps track of the accounts posting about SF tech and serves a feed with all of the posts from those accounts. We think simple approaches to dynamic author lists are a very interesting 90/10 on customized feeds and (if designed reasonably) could be both easier for the feed maintainer and higher quality for the feed consumers.

NSFW Image Detection

On Bluesky, users can set whether they want adult content to show up in their app. Beyond this level of customization, whether or not an image is labeled as NSFW can be customized as well — people have a wide variety of preferences. This team trained a model to classify images into a large number of NSFW categories, which would theoretically fit nicely into the 3rd party moderation labeler interface. It’s neat that their choice of architecture extends naturally to processing text in tandem with images.

Other Projects

Other noteworthy projects included translation bots, deep fake detectors, a friend matchmaker, and an image generator tool that allowed people to build image generation prompts together in reply threads. It was genuinely incredibly impressive and exciting to see what folks with no previous AT Proto experience were able to put together (and often deploy!) in only a few hours.

Additional Resources

We prepared some starter templates for the hackathon, and want to share them below for anyone who couldn’t attend the event in person!

And if you’re interested in hosting your own Bluesky hackathon but don’t know where to start, please feel free to copy all of our invite copy, starter repos, and datasets.

2024-02-22T00:00:00.000Z

Early Access Federation for Self-Hosters

February 22, 2024 · 5 min read

For a high-level introduction to data federation, as well as a comparison to other federated social protocols, check out the Bluesky blog.

Update May 2024: we have removed the Discord registration requirement, and PDS instances can now connect to the network directly. You are still welcome to join the PDS Admins Discord for community support.

Today, we’re releasing an early-access version of federation intended for self-hosters and developers.

The atproto network is built upon a layer of self-authenticating data. This foundation is critical to guaranteeing the network’s long term integrity. But the protocol’s openness ultimately flows from a diverse set of hosts broadcasting this data across the network.

Up until now, every user on the network used a Bluesky PDS (Personal Data Server) to host their data. We’ve already federated our own data hosting on the backend, both to help operationally scale our service, and to prove out the technical underpinnings of an openly federated network. But today we’re opening up federation for anyone else to begin connecting with the network.

The PDS, in many ways, fulfills a simple role: it hosts your account and gives you the ability to log in, it holds the signing keys for your data, and it keeps your data online and highly available. Unlike a Mastodon instance, it does not need to function as a full-fledged social media service. We wanted to make atproto data hosting—like web hosting—into a fairly simple commoditized service. The PDS’s role has been limited in scope to achieve this goal. By limiting the scope, the role of a PDS in maintaining an open and fluid data network has become all the more powerful.

We’ve packaged the PDS into a friendly distribution with an installer script that handles much of the complexity of setting up a PDS. After you set up your PDS and join the PDS Admins Discord to submit a request for your PDS to be added to the network, your PDS’s data will get routed to other services in the network (like feed generators and the Bluesky AppView) through our Relay, the firehose provider. Check out our Federation Overview for more information on how data flows through the atproto network.

Early Access Limitations

As with many open systems, Relays will never be totally unconstrained in terms of what data they’re willing to crawl and rebroadcast. To prevent network and resource abuse, it will be necessary to place rate limits on the PDS hosts that they consume data from. As trust and reputation are established with PDS hosts, those rate limits will increase. We’ll gain capacity to increase the baseline rate limits we have in place for new PDSs in the network as we build better tools for detecting and mitigating abuse.

For a smooth transition into a federated network, we’re starting with some lower limits. Specifically, each PDS will be able to host 10 accounts and will be limited to 1,500 events per hour and 10,000 events per day. After those limits are surpassed, we’ll stop crawling the PDS until the rate limit period resets. This is intended to keep the network and firehose running smoothly for everyone in the ecosystem.
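To make the behavior concrete, here is a minimal sketch of a limiter that enforces an hourly and a daily event cap and stops accepting events until the exhausted window resets. It is an illustration only, not the Relay's actual implementation: it assumes simple fixed windows (the post does not say how the periods are measured), and `PdsEventLimiter`, `RateWindow`, and `accept` are hypothetical names.

```typescript
const HOUR_MS = 60 * 60 * 1000
const DAY_MS = 24 * HOUR_MS

interface RateWindow {
  limit: number // maximum events allowed per window
  lengthMs: number // window length in milliseconds
  start: number // timestamp at which the current window began
  count: number // events accepted in the current window
}

class PdsEventLimiter {
  private windows: RateWindow[]

  // Defaults mirror the early-access limits: 1,500 events/hour and 10,000 events/day.
  constructor(now: number, hourlyLimit = 1500, dailyLimit = 10000) {
    this.windows = [
      { limit: hourlyLimit, lengthMs: HOUR_MS, start: now, count: 0 },
      { limit: dailyLimit, lengthMs: DAY_MS, start: now, count: 0 },
    ]
  }

  // Returns true if the event is accepted, or false if a limit has been
  // reached and crawling should pause until that window resets.
  accept(now: number): boolean {
    for (const w of this.windows) {
      if (now - w.start >= w.lengthMs) {
        w.start = now // begin a fresh window
        w.count = 0
      }
    }
    for (const w of this.windows) {
      if (w.count >= w.limit) return false
    }
    for (const w of this.windows) w.count += 1
    return true
  }
}
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic; a caller would typically pass `Date.now()`.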

These are early days, and we have some big changes still planned for the PDS distribution (including the introduction of OAuth!). The software will be updating frequently and things may break. We will not be breaking things indiscriminately. However, in this early period, in order to avoid cruft in the protocol and PDS distribution, we are not making promises of backwards compatibility. We will be supporting a migration path for each release, but if you do not keep your PDS distribution up to date, it may break and render the app unusable until you do so.

Because the PDS distribution is not totally settled, we want to have a line of communication with PDS admins in the network, so we’re asking any developer that plans to run a PDS to join the PDS Admins Discord. You’ll need to provide the hostname of your PDS and a contact email in order to get your PDS added to the Relay’s allowlist. This Discord will serve as a channel where we can announce updates about the PDS distribution, relay policy, and federation more generally. It will also serve as a community where PDS admins can experiment, chat, and help each other debug issues.

Account Migration

A major promise of the AT Protocol is the ability to migrate accounts between PDS hosts. This is an important check against potential abuse, and further safeguards the fluid open layer of data hosting that underpins the network.

The PDS distribution that we’re releasing has all of the facilities required to migrate accounts between servers. We’re also opening routes on our PDS that will allow you to migrate your account off our server. However — we do not yet provide the capability to migrate back to the Bluesky PDS, so for the time being, this is a one way street. Be warned: these migrations involve possibly destructive identity operations. While we have some guardrails in place, it may still be possible for you to break your account and we will not be able to help you recover it. So although it’s technically possible, we do not recommend migrating your main account between servers yet. We especially recommend against doing so if you do not have familiarity with how DID PLC operations work.

In the coming months we will be hardening this feature and making it safer and easier to do, including creating an in-app flow for moving between servers.

Getting Started

To get started, join the PDS Administrators Discord, and check out the bluesky-social/pds repo on GitHub. The README will provide all necessary information on getting your PDS set up and running.

2023-12-04T00:00:00.000Z

Featured Community Project: Bridgy Fed

December 4, 2023 · 3 min read

Bridgy Fed is a bridge between decentralized social networks that currently supports the IndieWeb and the Fediverse, a portmanteau of “federated” and “universe” that refers to a collection of networks including Mastodon. It's a work-in-progress by Ryan Barrett (@snarfed.org), who has already added initial Bluesky support, and is planning on launching it publicly once Bluesky launches federation early next year.

Bridgy Fed is open source, and Ryan has a guide on how IDs and handles are translated between networks. He welcomes feedback!

Screenshot of Bridgy Fed

I'm a dad, San Francisco resident, and stereotypical Silicon Valley engineer who's always been interested in owning his presence online.

What is Bridgy Fed?

Bridgy Fed is a bridge between decentralized social networks. It currently supports the IndieWeb and the Fediverse, and I soon plan to add other protocols like Bluesky and Nostr.

It's fully bidirectional; from any supported network, you can follow anyone on any other network, see their posts, reply or like or repost them, and those interactions flow across to their network and vice versa. More details here.

Initial Bluesky support is complete! All interactions are working, in both directions. I'm looking forward to launching it publicly after Bluesky federation itself launches!

What inspired you to build Bridgy Fed?

The very first time I posted on Facebook, back over 20 years ago when it was just for college students, I immediately understood that I didn't control or own that space. I had no guarantees as to whether my profile and posts would stay there, who'd see them, etc. I started posting to my website/blog first, and only afterward copied those posts to social networks like Facebook.

I've been working on this stuff ever since, including tools like Granary and Bridgy classic and the IndieWeb community, historical decentralized social protocols like OpenSocial and OStatus, and most recently ActivityPub and Bluesky’s AT Protocol.

What tech stack is Bridgy Fed built on?

Bridgy Fed runs on Google's App Engine serverless platform. It's written in Python, uses libraries like Granary, and leverages standards like webmention and microformats2 in addition to ActivityPub and atproto. I'd eventually like to migrate it to asyncio, but otherwise its stack is serving it well.

What's in the future for Bridgy Fed?

I can't wait to launch Bluesky support! Nostr too. I'm also looking forward to extending the current IndieWeb support to any web site, using standard metadata like OGP and RSS and Atom feeds.

You can follow Ryan on Bluesky here, find the Bridgy Fed GitHub repo here, and keep an eye out for Bridgy Fed’s launch next year!

Note: Use an App Password when logging in to third-party tools for account security and read our disclaimer for third-party applications.

2023-11-06T00:00:00.000Z

Download and Parse Repository Exports

November 6, 2023 · 9 min read

One of the core principles of the AT Protocol is simple access to public data, including posts, multimedia blobs, and social graph metadata. A user's data is stored in a repository, which can be efficiently exported all together as a CAR file (.car). This post will describe how to export and parse a data repository.

A user's data repository consists of individual records, each of which can be accessed in JSON format via HTTP API endpoints.

The example code in this post is in the Go programming language, and uses the atproto SDK packages from indigo. You can find the full source code in our example cookbook GitHub repository.

This post is written for a developer audience. We plan on adding a feature for users to easily export their own data from within the app in the future.

Privacy Notice

While atproto data is public, you should take care to respect the rights, intents, and expectations of others. The following examples work for downloading any account's public data.

This goes beyond following copyright law, and includes respecting content deletions and block relationships. Images and other media content do not come with any reuse rights, unless explicitly noted by the account holder.

Download a Repository

On Bluesky's Main PDS Instance

You can easily construct a URL to download a repository on Bluesky's main PDS instance. In this case, the PDS host is https://bsky.social, the Lexicon endpoint is com.atproto.sync.getRepo, and the account DID is passed as a query parameter.

As a result, the download URL for the @atproto.com account is:

https://bsky.social/xrpc/com.atproto.sync.getRepo?did=did:plc:ewvi7nxzyoun6zhxrhs64oiz
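The URL construction described above can be sketched in Go using only the standard library. The helper name getRepoURL is hypothetical and not part of the indigo SDK. Note that url.Values percent-encodes the colons in the DID; that encoded form is an equivalent way of writing the same query parameter.

```go
package main

import (
	"fmt"
	"net/url"
)

// getRepoURL (hypothetical helper) builds the com.atproto.sync.getRepo
// download URL for a DID on a given PDS host.
func getRepoURL(pdsHost, did string) (string, error) {
	u, err := url.Parse(pdsHost)
	if err != nil {
		return "", err
	}
	// XRPC endpoints live under /xrpc/<lexicon method name>
	u.Path = "/xrpc/com.atproto.sync.getRepo"
	q := url.Values{}
	q.Set("did", did) // the account DID is passed as a query parameter
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	u, err := getRepoURL("https://bsky.social", "did:plc:ewvi7nxzyoun6zhxrhs64oiz")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```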

Note that this endpoint intentionally does not require authentication: content in a user's repository is public (much like a public website), and anybody can download it from the web. Such content includes posts and likes, but does not include content like mutes and list subscriptions.

If you navigate to that URL, you'll download the repository for the @atproto.com account on Bluesky. But if you try to open that file, it won't make sense yet. We'll show you how to parse the data later in this post.

On Another Instance

In the more general case, we start with any "AT Identifier" (handle or DID). We need to find the account's PDS instance. This involves first resolving the handle or DID to the account's DID document, then parsing out the #atproto_pds service entry.

The github.com/bluesky-social/indigo/atproto/identity package handles all of this for us already:

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/bluesky-social/indigo/atproto/identity"
    "github.com/bluesky-social/indigo/atproto/syntax"
)

func main() {
    // report any failure instead of silently discarding it
    if err := run(); err != nil {
        fmt.Fprintln(os.Stderr, "error:", err)
        os.Exit(1)
    }
}

func run() error {
    ctx := context.Background()

    // parse the raw string as either a handle or a DID
    atid, err := syntax.ParseAtIdentifier("atproto.com")
    if err != nil {
        return err
    }

    // resolve the identifier to the account's identity (DID document)
    dir := identity.DefaultDirectory()
    ident, err := dir.Lookup(ctx, *atid)
    if err != nil {
        return err
    }

    if ident.PDSEndpoint() == "" {
        return fmt.Errorf("no PDS endpoint for identity")
    }

    fmt.Println(ident.PDSEndpoint())
    return nil
}
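For a sense of what the last step of that resolution involves, here is a minimal standard-library sketch that pulls the #atproto_pds service entry out of a DID document. It is not how the identity package is implemented; the struct types, the pdsEndpoint helper, and the sample document are all hypothetical, and the struct models only the fields used here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// didDocument models only the fields needed here; real documents carry more.
type didDocument struct {
	ID      string    `json:"id"`
	Service []service `json:"service"`
}

type service struct {
	ID              string `json:"id"`
	Type            string `json:"type"`
	ServiceEndpoint string `json:"serviceEndpoint"`
}

// pdsEndpoint (hypothetical helper) returns the endpoint of the
// #atproto_pds service entry in a JSON DID document.
func pdsEndpoint(raw []byte) (string, error) {
	var doc didDocument
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	for _, s := range doc.Service {
		// accept the bare fragment or the fully qualified form
		if s.ID == "#atproto_pds" || s.ID == doc.ID+"#atproto_pds" {
			return s.ServiceEndpoint, nil
		}
	}
	return "", errors.New("no #atproto_pds service entry")
}

func main() {
	// made-up document for illustration only
	doc := []byte(`{
	  "id": "did:plc:example",
	  "service": [{
	    "id": "#atproto_pds",
	    "type": "AtprotoPersonalDataServer",
	    "serviceEndpoint": "https://pds.example.com"
	  }]
	}`)
	endpoint, err := pdsEndpoint(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(endpoint)
}
```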

Once we know the PDS endpoint, we can create an atproto API client, call the getRepo endpoint, and write the results out to a local file on disk:

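// this step additionally needs these imports:
//   "os"
//   "github.com/bluesky-social/indigo/xrpc"
//   comatproto "github.com/bluesky-social/indigo/api/atproto"
carPath := ident.DID.String() + ".car"

xrpcc := xrpc.Client{
    Host: ident.PDSEndpoint(),
}

// download the whole repository as CAR bytes (empty "since" means full export)
repoBytes, err := comatproto.SyncGetRepo(ctx, &xrpcc, ident.DID.String(), "")
if err != nil {
    return err
}

err = os.WriteFile(carPath, repoBytes, 0666)
if err != nil {
    return err
}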

The go-repo-export example from the cookbook repository implements this with the download-repo command:

> ./go-repo-export download-repo atproto.com
resolving identity: atproto.com
downloading from https://bsky.social to: did:plc:ewvi7nxzyoun6zhxrhs64oiz.car

We now know that the @atproto.com account's PDS instance is at bsky.social, and we have downloaded the repository's CAR file.

Parse Records from CAR File as JSON

What's a CAR File?

CAR files are a standard file format from the IPLD ecosystem. The name stands for "Content Addressable aRchives." They have a simple binary format: a series of binary (CBOR) blocks concatenated together, not dissimilar to tar files or Git packfiles. They're well-suited for efficient data processing and archival storage, but they're not the most accessible to developers.

The Repository CAR File

The repository data structure is a key-value store, with the keys being a combination of a collection name (NSID) and "record key", separated by a slash (<collection>/<rkey>). The CAR file contains a reference (CID) pointing to a (signed) commit object, which then points to the top of the key-value tree. The commit object also has an atproto repo version number (3 at the time of writing), the account DID, and a revision string.
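As a small illustration of the key format, the following sketch splits a record path into its collection and record key. The splitRecordPath helper is hypothetical and only checks the slash-separated shape; it does not validate NSID or record key syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecordPath (hypothetical helper) splits "<collection>/<rkey>"
// into its two parts and rejects paths that do not have that shape.
func splitRecordPath(path string) (string, string, error) {
	collection, rkey, ok := strings.Cut(path, "/")
	if !ok || collection == "" || rkey == "" || strings.Contains(rkey, "/") {
		return "", "", fmt.Errorf("invalid record path: %q", path)
	}
	return collection, rkey, nil
}

func main() {
	collection, rkey, err := splitRecordPath("app.bsky.feed.like/3jucagnrmn22x")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(collection, rkey) // prints app.bsky.feed.like 3jucagnrmn22x
}
```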

Let's load the repository tree structure out of the CAR file into memory, and list all of the record paths (keys).

import (
    "context"
    "fmt"
    "os"

    // the record-unpacking snippet later in this section additionally needs:
    //   "encoding/json"
    //   "path/filepath"

    "github.com/bluesky-social/indigo/atproto/syntax"
    "github.com/bluesky-social/indigo/repo"
    "github.com/ipfs/go-cid"
)

func carList(carPath string) error {
    ctx := context.Background()
    fi, err := os.Open(carPath)
    if err != nil {
        return err
    }
    defer fi.Close()

    // read repository tree into memory
    r, err := repo.ReadRepoFromCar(ctx, fi)
    if err != nil {
        return err
    }

    // extract DID from repo commit
    sc := r.SignedCommit()
    did, err := syntax.ParseDID(sc.Did)
    if err != nil {
        return err
    }
    // topDir doubles as the output directory name when unpacking records
    topDir := did.String()
    fmt.Printf("=== %s ===\n", topDir)

    // iterate over all of the records by key and CID
    err = r.ForEach(ctx, "", func(k string, v cid.Cid) error {
        fmt.Printf("%s\t%s\n", k, v.String())
        return nil
    })
    if err != nil {
        return err
    }
    return nil
}

Note that the ForEach iterator provides a record path string as a key, and a CID as the value, instead of the record data itself. If we want to get the record itself, we need to fetch the "block" (CBOR bytes) from the repository, using the CID reference.

Let's also convert the binary CBOR data into a more accessible JSON format, and write out records to disk. The following code snippet could go in the ForEach function in the previous example:

// where to write out data on local disk
recPath := topDir + "/" + k
if err := os.MkdirAll(filepath.Dir(recPath), os.ModePerm); err != nil {
    return err
}

// fetch the record CBOR and convert to a golang struct
_, rec, err := r.GetRecord(ctx, k)
if err != nil {
    return err
}

// serialize as JSON
recJson, err := json.MarshalIndent(rec, "", "  ")
if err != nil {
    return err
}

if err := os.WriteFile(recPath+".json", recJson, 0666); err != nil {
    return err
}

return nil

In the cookbook repository, the go-repo-export example implements these as the list-records and unpack-records commands:

> ./go-repo-export list-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
=== did:plc:ewvi7nxzyoun6zhxrhs64oiz ===
key record_cid
app.bsky.actor.profile/self bafyreifbxwvk2ewuduowdjkkjgspiy5li2dzyycrnlbu27gn3hfgthez3u
app.bsky.feed.like/3jucagnrmn22x    bafyreieohq4ngetnrpse22mynxpinzfnaw6m5xcsjj3s4oiidjlnnfo76a
app.bsky.feed.like/3jucahkymkk2e    bafyreidqrmqvrnz52efgqfavvjdbwob3bc2g3vvgmhmexgx4xputjty754
app.bsky.feed.like/3jucaj3qgmk2h    bafyreig5c2atahtzr2vo4v64aovgqbv6qwivfwf3ex5gn2537wwmtnkm3e
[...]

> ./go-repo-export unpack-records did:plc:ewvi7nxzyoun6zhxrhs64oiz.car
writing output to: did:plc:ewvi7nxzyoun6zhxrhs64oiz
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.actor.profile/self.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucagnrmn22x.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucahkymkk2e.json
did:plc:ewvi7nxzyoun6zhxrhs64oiz/app.bsky.feed.like/3jucaj3qgmk2h.json
[...]

If you were downloading and working with CAR files in a higher-stakes situation than just running a one-off repository export, you would probably want to confirm the commit signature using the account's signing public key (included in the resolved identity metadata). Signing keys can change over time, meaning the signatures in old repo exports will no longer validate. It may be a good idea to keep a copy of the identity metadata alongside the repository for long-term storage.

Downloading Blobs

An account's repository contains all the current (not deleted) records. These records include likes, posts, follows, etc. and may refer to images and other media "blobs" by hash (CID), but the blobs themselves aren't stored directly in the repository. So if you want a full public account data export, you also need to fetch the blobs.

It is possible to parse through all the records in a repository and extract all the blob references (tip: they all have $type: blob). But PDS instances also implement a helpful com.atproto.sync.listBlobs endpoint, which returns all the CIDs (blob hashes) for a specific account (DID).
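For the first approach, walking the exported records, a minimal sketch might look like the following. It operates on record JSON (such as the files written by the unpacking step) and collects the CID of every object tagged with $type: blob. It assumes the blob reference is stored as ref.$link, following the atproto JSON representation of blobs; the helper names and the sample record with its placeholder CID are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// collectBlobCIDs recursively walks decoded JSON and appends the CID of
// every object tagged with "$type": "blob" (assumes the ref.$link layout).
func collectBlobCIDs(v any, out []string) []string {
	switch node := v.(type) {
	case map[string]any:
		if node["$type"] == "blob" {
			if ref, ok := node["ref"].(map[string]any); ok {
				if link, ok := ref["$link"].(string); ok {
					out = append(out, link)
				}
			}
			return out
		}
		for _, child := range node {
			out = collectBlobCIDs(child, out)
		}
	case []any:
		for _, child := range node {
			out = collectBlobCIDs(child, out)
		}
	}
	return out
}

// blobCIDsFromRecord (hypothetical helper) parses one record's JSON and
// returns its blob CIDs in sorted order.
func blobCIDsFromRecord(raw []byte) ([]string, error) {
	var rec any
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, err
	}
	cids := collectBlobCIDs(rec, nil)
	sort.Strings(cids) // map iteration order is random, so sort for stable output
	return cids, nil
}

func main() {
	// made-up record with a placeholder CID, for illustration only
	rec := []byte(`{
	  "$type": "app.bsky.feed.post",
	  "text": "hello",
	  "embed": {"images": [{"alt": "", "image": {
	    "$type": "blob",
	    "ref": {"$link": "bafkreiexamplecid"},
	    "mimeType": "image/jpeg",
	    "size": 1234
	  }}]}
	}`)
	cids, err := blobCIDsFromRecord(rec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cids) // prints [bafkreiexamplecid]
}
```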

The com.atproto.sync.getBlob endpoint is used to download the original blob itself.

Neither of these PDS endpoints requires authentication, though they may be rate-limited by operators to prevent resource exhaustion or excessive bandwidth costs.

Note that the first part of the blob download function is very similar to the CAR download: resolving identity to find the account's PDS:

func blobDownloadAll(raw string) error {
    ctx := context.Background()
    atid, err := syntax.ParseAtIdentifier(raw)
    if err != nil {
        return err
    }

    // resolve the DID document and PDS for this account
    dir := identity.DefaultDirectory()
    ident, err := dir.Lookup(ctx, *atid)
    if err != nil {
        return err
    }

    // create a new API client to connect to the account's PDS
    xrpcc := xrpc.Client{
        Host: ident.PDSEndpoint(),
    }
    if xrpcc.Host == "" {
        return fmt.Errorf("no PDS endpoint for identity")
    }

    topDir := ident.DID.String() + "/_blob"
    if err := os.MkdirAll(topDir, os.ModePerm); err != nil {
        return err
    }

    // blob-specific part starts here!
    cursor := ""
    for {
        // loop over batches of CIDs
        resp, err := comatproto.SyncListBlobs(ctx, &xrpcc, cursor, ident.DID.String(), 500, "")
        if err != nil {
            return err
        }
        for _, cidStr := range resp.Cids {
            // if the file already exists, skip
            blobPath := topDir + "/" + cidStr
            if _, err := os.Stat(blobPath); err == nil {
                continue
            }

            // download the entire blob into memory, then write to disk
            blobBytes, err := comatproto.SyncGetBlob(ctx, &xrpcc, cidStr, ident.DID.String())
            if err != nil {
                return err
            }
            if err := os.WriteFile(blobPath, blobBytes, 0666); err != nil {
                return err
            }
        }

        // a cursor in the result means there are more CIDs to enumerate
        if resp.Cursor != nil && *resp.Cursor != "" {
            cursor = *resp.Cursor
        } else {
            break
        }
    }
    return nil
}

In the cookbook repository, the go-repo-export example implements list-blobs and download-blobs commands:

> ./go-repo-export list-blobs atproto.com
bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii
bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y
bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm
bafkreiebtvblnu4jwu66y57kakido7uhiigenznxdlh6r6wiswblv5m4py
[...]

> ./go-repo-export download-blobs atproto.com
writing blobs to: did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreiacrjijybmsgnq3mca6fvhtvtc7jdtjflomoenrh4ph77kghzkiii  downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreib4xwiqhxbqidwwatoqj7mrx6mr7wlc5s6blicq5wq2qsq37ynx5y  downloaded
did:plc:ewvi7nxzyoun6zhxrhs64oiz/_blob/bafkreibdnsisdacjv3fswjic4dp7tju7mywfdlcrpleisefvzf44c3p7wm  downloaded
[...]

This downloads every blob listed for the account into a local _blob directory under the account's DID, skipping any files that already exist.

A more rigorous implementation should verify the blob CID (by hashing the downloaded bytes), at a minimum to detect corruption and errors.
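One way to do that check using only the standard library is sketched below. It assumes blob CIDs are CIDv1 with the raw codec and a SHA-256 multihash, encoded as lowercase base32 with the "b" multibase prefix (which is why the CIDs above all begin with bafkrei). Any other CID shape is rejected rather than verified. The helper names are hypothetical; a production implementation would more likely use a CID library such as go-cid.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/base32"
	"errors"
	"fmt"
)

// lowercase, unpadded base32, as used by the "b" multibase prefix
var b32 = base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)

// assumed CID header: CIDv1 (0x01), raw codec (0x55), sha2-256 (0x12), 32-byte digest (0x20)
var cidPrefix = []byte{0x01, 0x55, 0x12, 0x20}

// rawSHA256CID (hypothetical helper) builds the CID string for data
// under the assumptions above.
func rawSHA256CID(data []byte) string {
	sum := sha256.Sum256(data)
	raw := append(append([]byte{}, cidPrefix...), sum[:]...)
	return "b" + b32.EncodeToString(raw)
}

// verifyBlob (hypothetical helper) checks that cidStr has the assumed
// shape and that its digest matches the SHA-256 hash of data.
func verifyBlob(cidStr string, data []byte) error {
	if len(cidStr) == 0 || cidStr[0] != 'b' {
		return errors.New("unsupported multibase prefix")
	}
	raw, err := b32.DecodeString(cidStr[1:])
	if err != nil {
		return fmt.Errorf("decoding CID: %w", err)
	}
	if len(raw) != len(cidPrefix)+sha256.Size || !bytes.Equal(raw[:len(cidPrefix)], cidPrefix) {
		return errors.New("not a CIDv1 raw sha2-256 CID")
	}
	sum := sha256.Sum256(data)
	if !bytes.Equal(raw[len(cidPrefix):], sum[:]) {
		return errors.New("blob bytes do not match CID")
	}
	return nil
}

func main() {
	data := []byte("example blob bytes")
	cidStr := rawSHA256CID(data)
	fmt.Println(cidStr)
	fmt.Println("intact:", verifyBlob(cidStr, data))
	fmt.Println("tampered:", verifyBlob(cidStr, []byte("different bytes")))
}
```

In the download loop, a call like verifyBlob(cidStr, blobBytes) could run just before the file is written, so that corrupted downloads never reach disk.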
