Observations with Alecks

hive

The beginning

For just under a year, Podping has been available and in use — through dedicated podcast hosting companies and also self-hosting individuals — to efficiently notify interested parties of updates to RSS feeds that define podcasts.

It was a bit rocky in the beginning, mostly because understanding the design simplicity offered by a decentralized message bus and defining a software interface to write to it efficiently are two different tasks. But it worked all along and we largely got through it without incident, thanks to the transparency and resiliency of the Hive blockchain.

Unfortunately, since it's on a blockchain, some of the initial experiments proving out the concept of the project at large are still around. One of the most important concepts of a project like this is that people can depend on a defined schema, even if it's implemented in a schemaless format (JSON).

Learning to communicate the intent of the Podping project overall was helped by collaborating and putting out a stable release. The 1.0 release of podping-hivewriter, the current primary software we use to manage writing “podpings” to the Hive blockchain, came out several months after the original vision of the project had been laid down in a lowly Podcasting 2.0 developer roundtable.

Intent

While a stable release contributed to our understanding of the stability of the system, we had only just begun to realize the potential we had beyond merely a simple podcast update system.

After all, the original scope of Podping was to help reduce unnecessary polling of RSS feeds when they had no change in content. We had already accomplished this by allowing anyone to announce the following data on the blockchain within a given “podping” event:

{
    "version": "0.3",
    "num_urls": 1,
    "reason": "feed_update",
    "urls": ["https://example.com/super-great-pod.xml"]
}

Simple! This tells a user that an update to the RSS feed that defines a fictitious “super great pod” podcast occurred.
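
As a minimal sketch of what a consumer does with that event, the snippet below parses the example payload (written as strict JSON) and returns the feeds worth fetching. The helper name feeds_to_refresh is hypothetical; only the field names come from the example.

```python
import json

# The example payload from above, written as strict JSON.
LEGACY_PAYLOAD = """
{
    "version": "0.3",
    "num_urls": 1,
    "reason": "feed_update",
    "urls": ["https://example.com/super-great-pod.xml"]
}
"""


def feeds_to_refresh(raw_payload: str) -> list:
    """Return the feed URLs announced by a pre-1.0 podping payload.

    Hypothetical helper: anything that is not a feed update is ignored.
    """
    payload = json.loads(raw_payload)
    if payload.get("reason") != "feed_update":
        return []
    # num_urls is informational; the list itself is what we act on.
    return list(payload.get("urls", []))


print(feeds_to_refresh(LEGACY_PAYLOAD))
```

A consumer would fetch only the feeds this returns, instead of polling everything on a timer.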

This alone is already more valuable than it would first appear. Not only can one reasonably assume they can stop polling feeds that get submitted via Podping; they also automatically obtain access to an entire history of podcasts. Without compromise.

That would be enough for most people.

Evolution

But a few things had happened since the Podping project had been started.

While the Podcasting 2.0 community had largely agreed that the movement was for more than podcasts, no one really understood how to make that work.

Eventually, I postulated that the easiest way to do this was to simply tell the consumer what medium a feed carries. Shortly after, I wrote the podcast:medium specification, which was eventually finalized in The Podcast Namespace.

In parallel, the Podcasting 2.0 community had been throwing around ways to formalize live streams within RSS feeds.

Coincidentally, both had been finalized in Phase 4. It just so happens that the type of the live stream depends on the medium.

Our intent had evolved.

Since the intention of having a medium is to tell the consumer, and it's to be expected that not all consuming applications will care about all types of mediums, we had decided it was important enough to include in the basic event types of Podping.

We had also realized decentralized live feeds weren't of much use to anyone without the ability to instantly notify consumers when a live feed actually starts with an indication of priority beyond normal feed updates.

podping-hivewriter version 1.1

Given the above information, in addition to some new context about the particularities of how Hive functions internally, we made the decision as a team to improve upon the Podping events by including the above metadata directly in the event names (known as operation IDs in Hive).

These event names are changing from podping to the format of pp_{medium}_{reason}, prefixed with pp_ to denote podping.

Where {medium} can be one of, as of this time of writing, the following:

  • podcast
  • music
  • video
  • film
  • audiobook
  • newsletter
  • blog

And {reason} can be one of, as of this time of writing, the following:

  • update
  • live

Importantly, the podping-hivewriter project will default to the podcast medium and update reason to remain compatible with the current scope of users. Official documentation for the above reasons and their meaning will be available on the podping-hivewriter GitHub project by the time 1.1 stable is released.

One may replace the pp_ prefix with pplt_ for “podping livetest,” which is what we use during development and continuous integration of the podping-hivewriter project. You can use these “livetest” events to test these changes as a consumer before anyone officially adopts them.
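
To make the naming scheme concrete, here is a rough sketch of building and parsing these operation IDs. The function names are hypothetical and this is not podping-hivewriter's own code; the defaults assume the podcast medium (the first entry in the list above) and the update reason.

```python
from typing import Optional, Tuple

# Values copied from the lists above; both may grow over time.
MEDIUMS = {"podcast", "music", "video", "film", "audiobook", "newsletter", "blog"}
REASONS = {"update", "live"}
# Maps each prefix to whether it marks a "livetest" event.
PREFIXES = {"pp": False, "pplt": True}


def build_operation_id(medium: str = "podcast", reason: str = "update",
                       livetest: bool = False) -> str:
    """Build an ID of the form pp_{medium}_{reason} (or pplt_... for livetest)."""
    if medium not in MEDIUMS:
        raise ValueError(f"unknown medium: {medium}")
    if reason not in REASONS:
        raise ValueError(f"unknown reason: {reason}")
    prefix = "pplt" if livetest else "pp"
    return f"{prefix}_{medium}_{reason}"


def parse_operation_id(operation_id: str) -> Optional[Tuple[str, str, bool]]:
    """Return (medium, reason, is_livetest), or None if the ID is not recognized."""
    parts = operation_id.split("_")
    if len(parts) != 3:
        return None
    prefix, medium, reason = parts
    if prefix not in PREFIXES or medium not in MEDIUMS or reason not in REASONS:
        return None
    return medium, reason, PREFIXES[prefix]


print(build_operation_id())
print(parse_operation_id("pplt_music_live"))
```

Note that the older podping event name does not parse under this scheme, so a consumer that wants both old and new events has to handle the old name separately.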

Definition

In addition to the event name changes above, we also decided to change the on-chain Podping format to continue to communicate intent.

In short, the new schema will use version “1.0” to help compatibility and is defined as follows:

{
    "version": "1.0",
    "medium": "<ex: podcast>",
    "reason": "<ex: update>",
    "iris": ["list", "of", "iris"]
}

Most noticeably, urls is being changed to iris. This indicates given RSS feeds can be identifiers besides HTTP URLs — perhaps IPFS CIDs or magnet links, for example — and the character set is “internationalized,” supporting any UTF-8 character. Note that this has been assumed by podping-hivewriter since the 1.0 initial release and this is merely a name change.

The addition of the medium and reason slugs to this schema is primarily for portability of data and flexibility of filtering. It is redundant to have it both in the schema and the event name, and that is intentional.
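
Consumers will see both old and new payloads on chain for a while, so one possible approach is to normalize them into a single shape. The sketch below is an assumption-heavy illustration: the Podping class and normalize function are hypothetical, and treating every pre-1.0 payload as medium podcast with reason update is my own mapping, not something the schema states.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Podping:
    """Hypothetical in-memory form shared by old and new payloads."""
    version: str
    medium: str
    reason: str
    iris: Tuple[str, ...]


def normalize(raw_payload: str) -> Podping:
    """Convert a pre-1.0 or 1.0 payload into one common structure."""
    payload = json.loads(raw_payload)
    version = str(payload.get("version", ""))
    if "iris" in payload:
        # 1.0 schema: medium and reason are carried in the payload itself.
        return Podping(version, payload["medium"], payload["reason"],
                       tuple(payload["iris"]))
    # Pre-1.0 schema: only "urls" exists. Assumed mapping: podcast/update.
    return Podping(version, "podcast", "update", tuple(payload.get("urls", [])))


old = normalize('{"version": "0.3", "num_urls": 1, "reason": "feed_update", '
                '"urls": ["https://example.com/super-great-pod.xml"]}')
new = normalize('{"version": "1.0", "medium": "music", "reason": "live", '
                '"iris": ["https://example.com/album.xml"]}')
print(old)
print(new)
```

Because medium and reason travel inside the payload as well as in the event name, a stored payload stays meaningful even after it is separated from the operation that carried it.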

Given the above additions, it's safe to say the following definition of Podping holds true and is identified by the intent of the given data:

Podping is a mechanism of using decentralized communication to relay notification of updates of RSS feeds that use The Podcast Namespace. It does so by supplying minimum relevant metadata to consumers to be able to make efficient and actionable decisions, allowing them to decide what to do with given RSS feeds without parsing them ahead of time.

Looking forward

We have some more ideas to expand upon the Podping update reasons listed above. However, many of these will require new Podcast Namespace features as outlined here by Brian of London.

For example, we want to be able to allow hosts to use Podping as a way to tell consumers when a feed is changing hosts. In order to prevent abuse, we want to be able to tell consumers to expect this type of event to come from a known Hive account set within the RSS feed.

After all, feeds already get polled to oblivion. Anyone announcing a feed update via podping is relatively harmless, even if it's not their feed. A host change, on the other hand, is another story altogether. We are trying to be cognizant of that for new features.

The <podcast:podping> proposal also allows consumers to actually know when a feed is set to update via Podping, as opposed to guessing, helping to remove ambiguity.

Conclusion

In the last year we've turned the Podping project around from an experiment that happens to work well to a full-fledged project with defined scope.

Podping doesn't just send URLs around to applications in hope that they know what to do with them, nor to funnel a user into clicking on something. It provides context as to why they were sent and how relevant the changes are to applications.

Because we don't need new ways to send people URLs. People have been trying that for the last 16 years.


#podcasting20 #rss #podping #hive #blockchain #podcasts #music #films #audiobooks #videos

RSS Delivery

No one likes polling.

Ever since the dawn of the open RSS ecosystem, applications poll RSS feeds. That's how it works, right? They check a feed for updates on some kind of schedule. Maybe, if they feel like tackling an unsolvable problem out of sheer curiosity, they add some learning algorithm to figure out when the feed actually changes to save resources.

Polling, most of which is done in regular intervals, is the most reliable method of knowing when a feed is updated within a decentralized ecosystem no one controls. Even today this is how most RSS feed aggregation runs because it's the only way to avoid missing anything. This isn't a problem in its own right, but it's inefficient.

It becomes an issue when:

  1. A feed gets extremely popular, which can easily cause unwanted bandwidth costs or even take down smaller operations.
  2. A single host manages a significant number of RSS feeds. They add up!
  3. Time-sensitive content is released. Say one wants to publish a breaking news story or a live stream. Polling will catch it late, and increased polling frequencies are frowned upon (often rate-limited) due to the above issues.

Until now, users and services have often relied on other options for updates, particularly for time-sensitive content:

  1. Outside systems like social media have always offered a means to notify users of updates, but it's not a great user experience for anyone. This helps #3, but still requires polling. Often it causes more polling with the user refreshing rapidly.
  2. Centralized systems, usually implemented by podcast services as back-ends for client applications — especially on mobile — offer a way to hide polling from users. It helps #1 & #2, and works, but it's still polling and hiding this from users can cause confusion.
  3. Otherwise entirely closed systems with their own notifications (YouTube, Twitch ... the list goes on). These usually solve the issues for users, but it's no longer open! I'm not going to discuss the philosophical implications of closed systems, but they still get real time updates wrong, because real time updates are hard.

What about WebSub?

Yes, WebSub helps with some of the above issues. It has two main problems:

  1. WebSub attempts to move the problem from polling feeds to maintaining WebSub subscriptions. It's great in theory. In practice it's just another problem to solve and it's ultimately just another unreliable link in the chain. Applications frequently fall back to polling.
  2. WebSub requires a server to obtain updates. This isn't an issue for podcast applications that run their own services, but what about standalone applications as intended by the spirit of open RSS? What about the last mile? Standalone applications still end up polling, and especially get left out of time-sensitive content.

The Last Mile

The last mile is a relatively new problem within the scope of the open RSS ecosystem. Every application used to poll for content. Live streams, outside of obscure internet radio, didn't even exist or weren't widespread enough to be relevant.

Fast forward to today, most podcast applications run their own services and/or depend on large aggregators to do the heavy lifting. Many trusted Apple for this task, until they couldn't, and the Podcast Index only stepped up as an alternative because it's still such a difficult problem. Even more, the aggregators still have to be polled! They don't have the resources to become a notification system en masse.

Podping.cloud

Enter Podping.cloud:

Podping is a blockchain based global notification system for podcasting. Feed urls are written by the publisher to the blockchain within seconds of a new episode being published. Anyone can monitor for those updates and only pull a copy of that feed when it shows up on the chain.

Podping.cloud architecture

What does that mean? It consists of two parts:

  1. An on-boarding mechanism for hosts to migrate to with ease — something similar to WebSub but simpler to manage.
  2. A standardization of data interchange mechanisms — the “podping” namespace — to broadcast podcast updates onto the Hive* blockchain.

*Note: This is really just a specification of a JSON schema. Hive is an implementation detail.

Decentralized podcast updates!

Given Podping, a standardized way to communicate podcast updates into Hive, anyone can listen for them on the Hive blockchain. An important characteristic of what the Hive community has developed is a standardized HTTP API for applications to utilize, as opposed to having to download and manage the Hive blockchain themselves (though they could if they wanted to).

What does this look like for an application? Below are a few real transactions from the Hive blockchain:

{"_id": "cabd5cc591662b4cb131fb546ae9189d104a00ee",
 "block_num": 54014222,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":3,\"reason\":\"feed_update\",\"urls\":[\"https://feeds.buzzsprout.com/981862.rss\",\"https://feeds.buzzsprout.com/1722601.rss\",\"https://feeds.buzzsprout.com/1287197.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:06Z",
 "trx_id": "ebd5b45772e119c38f75e246d4b70d60ee527716",
 "trx_num": 23,
 "type": "custom_json"}
{"_id": "6b7fe74973283bf99d76ef9b81ced193c43ebe84",
 "block_num": 54014229,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":2,\"reason\":\"feed_update\",\"urls\":[\"https://media.rss.com/njb/feed.xml\",\"https://feeds.buzzsprout.com/1749995.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:27Z",
 "trx_id": "394ecb5449f6a7d9472432915ed2241628e0716e",
 "trx_num": 20,
 "type": "custom_json"}
{"_id": "c6ed805969a3c86634f34352e44a232e907336e1",
 "block_num": 54014236,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":1,\"reason\":\"feed_update\",\"urls\":[\"https://feeds.buzzsprout.com/1575751.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:48Z",
 "trx_id": "b3a97ec7bf97604194755913953b28b6de21c403",
 "trx_num": 12,
 "type": "custom_json"}
{"_id": "73c63d127e040049f0d6c2f118c1c00e46da17da",
 "allow_curation_rewards": true,
 "allow_votes": true,
 "author": "<redacted>",
 "block_num": 54015435,
 "extensions": [{"type": "comment_payout_beneficiaries",
                 "value": {"beneficiaries": [{"account": "hiveonboard",
                                              "weight": 100},
                                             {"account": "tipu",
                                              "weight": 100}]}}],
 "max_accepted_payout": {"amount": "1000000000",
                         "nai": "@@000000013",
                         "precision": 3},
 "percent_hbd": 10000,
 "permlink": "<redacted>",
 "timestamp": "2021-05-19T03:39:54Z",
 "trx_id": "7d4012b4b7e4901bbaf910a71166a78f5ac9b186",
 "trx_num": 22,
 "type": "comment_options"}
{"_id": "502ddccc9a1e4bb86b7379497e6b17e943c869d4",
 "block_num": 54014474,
 "id": "sm_find_match",
 "json": "{\"match_type\":\"Ranked\",\"app\":\"steemmonsters/0.7.89\"}",
 "required_auths": [],
 "required_posting_auths": ["nandito"],
 "timestamp": "2021-05-19T02:51:45Z",
 "trx_id": "b41cf36a88d798d15e31b1d2b718cd7fa4176b1b",
 "trx_num": 49,
 "type": "custom_json"}
{"_id": "1dc0e544c25978fcf868e40623e930d5bbdc97fa",
 "block_num": 54014474,
 "id": "sm_submit_team",
 "json": "{\"trx_id\":\"8a965173a51cf0457264196066febf3040d9c720\",\"team_hash\":\"946c2011bd2ec8844fcdc2bbf946a738\",\"summoner\":\"C1-49-SM0LILHETC\",\"monsters\":[\"C1-64-TPHW58AV34\",\"C1-50-1AMDZHJ7OG\",\"C1-47-S6HDP0EA5S\",\"C1-62-AQOTC401J4\",\"C1-46-1BP1BKEY3K\"],\"secret\":\"TZte3wNaM7\",\"app\":\"steemmonsters/0.7.24\"}",
 "required_auths": [],
 "required_posting_auths": ["postme"],
 "timestamp": "2021-05-19T02:51:45Z",
 "trx_id": "8ff17c754ff1d9621e9e0014439e720086ab5e85",
 "trx_num": 50,
 "type": "custom_json"}

A few interesting things to note here. It's nice to have all of the information, but it's a bit of a fire hose. How can we clean this up? We can hide it within our application, which might be acceptable for some servers, but it's still all unnecessary bandwidth. That is particularly true of the last two custom_json operations, which are not Podping at all, and of the comment_options operation, which is not even custom_json.
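
One detail worth spelling out: the json attribute of a custom_json operation is itself a JSON-encoded string, so it needs a second parse before the Podping fields are usable. A minimal sketch, using a trimmed copy of one of the operations above and a hypothetical decode_podping helper:

```python
import json
from typing import Optional

# Trimmed copy of one operation shown above; unused fields are omitted.
OPERATION = {
    "block_num": 54014236,
    "id": "podping",
    "json": "{\"version\":\"0.2\",\"num_urls\":1,\"reason\":\"feed_update\","
            "\"urls\":[\"https://feeds.buzzsprout.com/1575751.rss\"]}",
    "required_posting_auths": ["hivehydra"],
    "timestamp": "2021-05-19T02:39:48Z",
    "type": "custom_json",
}


def decode_podping(operation: dict) -> Optional[dict]:
    """Return the announcing accounts, timestamp and URLs, or None if not a podping."""
    if operation.get("type") != "custom_json" or operation.get("id") != "podping":
        return None
    try:
        # Second parse: the "json" field holds a string, not an object.
        payload = json.loads(operation["json"])
    except (KeyError, json.JSONDecodeError):
        return None
    return {
        "accounts": list(operation.get("required_posting_auths", [])),
        "timestamp": operation.get("timestamp"),
        "urls": payload.get("urls", []),
    }


print(decode_podping(OPERATION))
```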

Walking the last mile

In order for this to be efficient for client applications, especially mobile applications with limited resources, there are a couple of things we need to address. Note: I am only familiar with the beem Python library at this moment in time, so it's possible these two points are moot.

  1. Server side filtering of custom_json operations. Currently the Python implementation, beem, does this on the client side, hiding it from the user. It still pulls in every block even if you tell it to only look for the custom_json operation.
  2. Server side filtering of the custom_json id — even if we pulled in only custom_json operations, we're still getting other data from other applications. Really, we only care about id='podping'.

Given those two changes, we can size down the fire hose to the following podping operations:

{"_id": "cabd5cc591662b4cb131fb546ae9189d104a00ee",
 "block_num": 54014222,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":3,\"reason\":\"feed_update\",\"urls\":[\"https://feeds.buzzsprout.com/981862.rss\",\"https://feeds.buzzsprout.com/1722601.rss\",\"https://feeds.buzzsprout.com/1287197.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:06Z",
 "trx_id": "ebd5b45772e119c38f75e246d4b70d60ee527716",
 "trx_num": 23,
 "type": "custom_json"}
{"_id": "6b7fe74973283bf99d76ef9b81ced193c43ebe84",
 "block_num": 54014229,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":2,\"reason\":\"feed_update\",\"urls\":[\"https://media.rss.com/njb/feed.xml\",\"https://feeds.buzzsprout.com/1749995.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:27Z",
 "trx_id": "394ecb5449f6a7d9472432915ed2241628e0716e",
 "trx_num": 20,
 "type": "custom_json"}
{"_id": "c6ed805969a3c86634f34352e44a232e907336e1",
 "block_num": 54014236,
 "id": "podping",
 "json": "{\"version\":\"0.2\",\"num_urls\":1,\"reason\":\"feed_update\",\"urls\":[\"https://feeds.buzzsprout.com/1575751.rss\"]}",
 "required_auths": [],
 "required_posting_auths": ["hivehydra"],
 "timestamp": "2021-05-19T02:39:48Z",
 "trx_id": "b3a97ec7bf97604194755913953b28b6de21c403",
 "trx_num": 12,
 "type": "custom_json"}

This on its own is a huge improvement, but we can do more...
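
For reference, the two filters described above boil down to a very small predicate. The sketch below applies it on the client, which saves no bandwidth at all; it only illustrates what a server would have to check. The function name and the trimmed sample data are assumptions for illustration.

```python
from typing import Iterable, Iterator

# Trimmed stand-ins for the operations shown earlier (trx_id values copied).
FIRE_HOSE = [
    {"type": "custom_json", "id": "podping",
     "trx_id": "ebd5b45772e119c38f75e246d4b70d60ee527716"},
    {"type": "comment_options",
     "trx_id": "7d4012b4b7e4901bbaf910a71166a78f5ac9b186"},
    {"type": "custom_json", "id": "sm_find_match",
     "trx_id": "b41cf36a88d798d15e31b1d2b718cd7fa4176b1b"},
]


def only_podpings(operations: Iterable[dict],
                  wanted_id: str = "podping") -> Iterator[dict]:
    """Yield only custom_json operations whose id matches wanted_id."""
    for operation in operations:
        # Filter 1: drop every operation type except custom_json.
        if operation.get("type") != "custom_json":
            continue
        # Filter 2: drop custom_json written by other applications.
        if operation.get("id") != wanted_id:
            continue
        yield operation


kept = [operation["trx_id"] for operation in only_podpings(FIRE_HOSE)]
print(kept)
```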

One step further

See the json attribute? That's our Podping schema! What if we only subscribe to one or two podcasts? We don't need the rest of the results.

Thankfully, there's a straightforward way to handle this: JMESPath

JMESPath is a query language for JSON.

Let's say we only want to get results from the podcast “Quality during Design” in the above list. We can test this quickly with the jp CLI tool and the JMESPath contains function.

echo '{"version":"0.2","num_urls":3,"reason":"feed_update","urls":["https://feeds.buzzsprout.com/981862.rss","https://feeds.buzzsprout.com/1722601.rss","https://feeds.buzzsprout.com/1287197.rss"]}' \
| jp "contains(urls, 'https://feeds.buzzsprout.com/1722601.rss')"
true

Behold! We just defined a mechanism to query Podping json from a custom_json operation. In this example, the contains function being run by jp is returning true, telling us that this example indeed includes the feed URL we care about.

Why is this important? Couldn't we just parse the json object in our code and check for the URL?

Well, yes... but JMESPath provides us a way to run this query on the API server, without the API server requiring knowledge of our dataset. Meaning, given an implementation of this query on the Hive API, we would happily only get the results we need from the API.

If we only subscribed to one podcast we could get update notifications for it without having to parse through all new Hive blockchain events.
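
Until something like that exists on the API side, the same check can be made on the client. The sketch below is a plain-Python stand-in for the contains(urls, ...) query, extended to a set of subscriptions. It does not use a JMESPath library, and matching_feeds is a hypothetical name.

```python
import json

# The payload used in the jp example above.
PAYLOAD = (
    '{"version":"0.2","num_urls":3,"reason":"feed_update","urls":['
    '"https://feeds.buzzsprout.com/981862.rss",'
    '"https://feeds.buzzsprout.com/1722601.rss",'
    '"https://feeds.buzzsprout.com/1287197.rss"]}'
)


def matching_feeds(raw_payload: str, subscriptions: set) -> list:
    """Return the announced URLs that the user is subscribed to, in payload order."""
    payload = json.loads(raw_payload)
    return [url for url in payload.get("urls", []) if url in subscriptions]


subscribed = {"https://feeds.buzzsprout.com/1722601.rss"}
print(matching_feeds(PAYLOAD, subscribed))
```

An empty result means the event can be dropped without fetching anything. The difference from a server-side query is that the whole payload still had to be downloaded first.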

This has real potential to become a whole new generation of decentralized notification system, and Podcasting 2.0 is driving it.

Hopefully some of the above ideas can be incorporated into the Hive ecosystem in a way that will benefit everyone.


#podcasting20 #podping #jmespath #rss #hive