Monitoring GitHub for new releases

Big sites like GitHub and GitLab host a lot of projects and see numerous releases a day. And while you can watch a repository on GitHub, you can't easily filter for new releases, at least not in a way that is easy to find in the interface. Checking every repository manually, because it isn't part of a build process, is too much hassle and will fail in the end. That also happened to me with highlight.js, which was updated from version 9.11.0 to 9.12.0 months ago.

Some of the solutions people describe, for example on StackOverflow, parse the HTML and use it as the basis for further actions. A quick check with curl and grep shows that the page only gives us links to releases, not structured data we can easily parse.

$ curl -s https://github.com/isagalaev/highlight.js/releases | grep -i releases\/tag
    <a href="/isagalaev/highlight.js/releases/tag/9.12.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.11.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.10.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.9.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.8.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.7.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.6.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.5.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.4.0">
    <a href="/isagalaev/highlight.js/releases/tag/9.3.0">

If we append the extension .atom to the same URL, GitHub presents the same data as an Atom feed. Now we have structured data with timestamps, URLs and descriptions.

$ curl -s https://github.com/isagalaev/highlight.js/releases.atom
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xml:lang="en-US">
  <id>tag:github.com,2008:https://github.com/isagalaev/highlight.js/releases</id>
  <link type="text/html" rel="alternate" href="https://github.com/isagalaev/highlight.js/releases"/>
  <link type="application/atom+xml" rel="self" href="https://github.com/isagalaev/highlight.js/releases.atom"/>
  <title>Release notes from highlight.js</title>
  <updated>2017-05-31T02:46:46Z</updated>
  <entry>
    <id>tag:github.com,2008:Repository/1213225/9.12.0</id>
    <updated>2017-05-31T02:46:46Z</updated>
    <link rel="alternate" type="text/html" href="/isagalaev/highlight.js/releases/tag/9.12.0"/>
    <title>9.12.0</title>
    <content type="html">&lt;p&gt;Version 9.12.0&lt;/p&gt;</content>
    <author>
      <name>isagalaev</name>
    </author>
    <media:thumbnail height="30" width="30" url="https://avatars2.githubusercontent.com/u/99931?v=4&amp;s=60"/>
  </entry>
...

This data can be consumed by a custom parser or by RSS-readers like TT-RSS, but also by platforms like IFTTT to trigger actions such as adding a release to a backlog or posting it to a Slack channel.
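As an illustration of the custom-parser route, here is a minimal Python sketch using only the standard library. It assumes the feed layout shown above, where each entry's `<title>` is the release tag, and tags that are plain dotted numbers such as `9.12.0`. The names `parse_version`, `new_releases` and `fetch_feed` are hypothetical, and the version comparison is a simplification, not a full semantic-versioning check.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so ElementTree needs the prefix.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_version(tag):
    """Turn '9.12.0' into (9, 12, 0); assumes purely numeric dotted tags."""
    return tuple(int(part) for part in tag.split("."))


def new_releases(feed_xml, current_version):
    """Return (title, updated) for each entry newer than current_version."""
    root = ET.fromstring(feed_xml)
    current = parse_version(current_version)
    found = []
    for entry in root.findall(ATOM + "entry"):
        # Assumption: the entry title is the release tag, as in the feed above.
        title = entry.findtext(ATOM + "title", default="").strip()
        updated = entry.findtext(ATOM + "updated", default="")
        try:
            if parse_version(title) > current:
                found.append((title, updated))
        except ValueError:
            # Skip tags that are not plain dotted numbers, e.g. '10.0.0-beta1'.
            continue
    return found


def fetch_feed(owner, repo):
    """Download the releases.atom feed for a GitHub repository."""
    url = f"https://github.com/{owner}/{repo}/releases.atom"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()
```

Calling `new_releases(fetch_feed("isagalaev", "highlight.js"), "9.11.0")` from a daily cron job would list every release newer than the version you currently ship, which is exactly the check that was missing for highlight.js. Comparing integer tuples rather than strings matters here: as strings, "9.9.0" would sort after "9.12.0".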

Google Reader, farewell

For years I have used Google Reader to keep up to date with websites, as I could read them when I had time and could see which sites were updated, even if they had been idle for months. A dream for everyone who has a time management plan in place. Google Reader had a web interface, an API for third-party applications like Feedly and Liferea, and an Android app so you could catch up while waiting somewhere. That is coming to an end now that Google has announced it will shut down Google Reader in the summer of 2013. Some people are trying to reverse this decision with a petition, but to be honest I'm not going to wait, which is a bit sad, as Google Reader was the reason I had a Google account.

That said, I already started experimenting with an alternative last summer, and while it is still in development and lacks some features, the time has come to switch from Google Reader to TT-RSS. As of today I have imported all feeds into TT-RSS and removed them from Google Reader. The only feeds still in Google Reader are those for Google Listen, so the time has come to search for an alternative there as well. Google Listen is a podcast application that has also been discontinued and will stop working this summer too. Maybe I'll move all podcasts back to my desktop, but it was handy to have them on my phone so I could listen to them in the car.

For now TT-RSS is a good self-hosted alternative with a web interface and Android applications, and Liferea also has an option to use your TT-RSS installation. Maybe I should soon spend some time on OPML support, just like in Liferea, so that Planet-feeds are no longer needed and message deduplication becomes easier. But for now it works, and I can only say: Google, thanks for all the fish. And to be honest, I think my Google account will meet the same fate as my Twitter account.
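Roughly, OPML support means reading the list of member feeds that a Planet publishes and subscribing to those feeds individually, so an article you already receive from the author's own feed is not delivered a second time through the Planet. A minimal Python sketch, assuming a standard OPML file where each feed is an `<outline>` element with an `xmlUrl` attribute. The function names are hypothetical and the code is not taken from TT-RSS or Liferea.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_xml):
    """Collect the xmlUrl of every <outline>, nested ones included, without duplicates."""
    root = ET.fromstring(opml_xml)
    seen = set()
    urls = []
    # iter() walks the tree in document order, so category outlines
    # (which have no xmlUrl) are simply skipped.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def new_subscriptions(opml_xml, subscribed):
    """Feeds listed in the OPML that are not already subscribed to."""
    return [url for url in feed_urls_from_opml(opml_xml) if url not in subscribed]
```

Deduplicating on the feed URL before subscribing is the cheap part; duplicates that slip through with different URLs would still need deduplication on the entry level, for example on the Atom `<id>` of each entry.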

Faulty RSS-feeds

Looking at some logs from an RSS-collector, two things raised my eyebrows. The first is how many feeds are served by FeedBurner instead of directly by the website itself. What worries me is that a lot of those feeds are about security, privacy and compliance. I think a lot of those people have something to think about in 2012.

The second thing worries me even more. It is something I discussed with WordPress developers a couple of years ago, and I know others who have raised the same point with other projects. A lot of projects have learned to do input validation, but most of them still need to learn to do output validation. Luckily, the parser I currently use appears to be very strict and drops a feed when it doesn't parse correctly. Here comes the funny part: other parsers, like the one in Google Reader, seem to be more forgiving.

When I search for “libxml exploit” on Google Search I get 1,220,000 results. I haven't yet checked which parsers are currently in use, but this doesn't look very promising. With the current hash issues in mind, how could this be used as an attack vector? Keep in mind that a lot of sites use FeedBurner to take the load off their site. And yes, judging by my current logs, FeedBurner doesn't really clean things up. So the recipe for an exploit looks simple: take a high-profile WordPress-based website with FeedBurner enabled and watch the fireworks.

So maybe 2012 is a good year to check whether the parser I'm currently using is up to standard. This can get nasty very quickly if things go wrong. And a note to others: output validation matters just as much as input validation. The JavaScript alert is still a funny one to deploy on websites.
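To make the input/output distinction concrete, here is a minimal Python sketch of both sides using only the standard library: the feed generator escapes text on output, and a strict reader drops anything that is not well-formed XML. The helper names `render_item` and `parses_strictly` are hypothetical. Note that Python's own documentation warns that its XML modules are not secure against maliciously constructed data, so a strict parser alone does not settle the libxml question above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_item(title, link):
    """Build an RSS <item>, escaping &, < and > in the text on output."""
    return (
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        "</item>"
    )


def parses_strictly(xml_text):
    """Mimic a strict reader: accept only well-formed XML, drop everything else."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A title such as `Fish & Chips <script>alert(1)</script>` pasted into a feed unescaped makes the whole feed malformed, so a strict reader drops it while a forgiving one may pass the script tag on. After `render_item` the same title parses cleanly and reads back as plain text.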

Feeds farewell and thanks for all the fish

As my window on the Internet became more and more an RSS-reader during 2011, I also started to pay attention to the content presented. So during my Christmas break I'm going to remove some feeds from my RSS-reader. As a side note, the compressed database dump now grows by 1 megabyte every 5 to 8 days.

The first feeds that have to go are websites or blogs that only present a snippet and hope you come to their website to continue reading the article. The reasons I have read for doing this are banners, or the hope that you also read other content. For the first, there are solutions to embed banners in your RSS-feed. The latter is just b*llsh*t, as that person is already subscribed to your RSS-feed; how much more commitment to reading your content do you want?

What may be a problem is the experience people have reading your RSS-feed, as a lot of sites (and yes, WordPress, I'm looking at you too) do not include the right CSS in the feed. This is something that needs to be solved and can be solved. The other argument is about notification and traffic, and the question is whether those are real issues given ping servers and distribution hubs. FeedBurner, for example, can take the load off your website or blog, load that was also there when you were forcing people towards your website.

I may sound harsh, but I have to spend my time more wisely. With 125+ feeds in my reader, a few of them OPML-feeds, it is really time to clean things up. It also makes me wonder how easy it would be to integrate certain features from Google Reader into TT-RSS, to get figures on how much you read and what you do and don't read. But first the Christmas cleaning, as it takes the backend about 30 days to stop fetching a feed after the last user has unsubscribed.
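The kind of figures meant here can be sketched in a few lines of Python. This assumes per-feed counters of received and read items are available; that is a hypothetical input, not an existing TT-RSS API, and the 10% threshold is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class FeedStats:
    name: str
    items_received: int
    items_read: int


def cleanup_candidates(stats, threshold=0.1):
    """Feeds where less than `threshold` of the received items were read, worst first."""
    # Idle feeds are skipped: a feed that published nothing costs no reading time.
    active = [s for s in stats if s.items_received > 0]
    low = [s for s in active if s.items_read / s.items_received < threshold]
    return sorted(low, key=lambda s: s.items_read / s.items_received)
```

Skipping feeds without new items is a deliberate choice: a blog that is quiet for months is exactly the kind of feed an RSS-reader is good at, so it should not end up on the cleaning list just for being idle.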

RSS changes how we consume content

For some time now I have noticed that my browsing behaviour is changing. Where I used to sit down for an hour or two to roam websites, that has dropped off sharply these days. Many websites now have an RSS or Atom feed in which they publish articles or changes, among other things to trigger search engines to index their new content. It is also an ideal way to keep up easily and to scan for content that is really interesting.

It is ideal for following the blog of someone who doesn't post regularly, but also for receiving updates from your favourite forum or news site. Some sites have finally seen the light and offer their content in such a way that their newsletter in your mailbox has become redundant. In some cases your inbox can therefore become a lot quieter and contain only the emails that matter.

Unfortunately this change also has drawbacks. Some parties only offer snippets in their RSS-feed, for various reasons. Sometimes the main reason seems to be advertising revenue, but Google AdSense, among others, also exists for RSS-feeds. Others want you to read the story on their website, because then you might become interested in other content. Wasn't that already the reason people follow your RSS-feed?

Another drawback is the danger of trying to drink from a fire hose by following too many feeds. The right tool matters, and unfortunately I keep coming back to Google Reader, every time for the same reasons: it is available wherever a decent web browser is available, its statistics give you trend analysis so you can throw out feeds you don't read, and finally it lets you add tags so you can find an item again later. In those three areas Google is simply good, and for now I haven't been able to find a FOSS alternative. But given various developments, such as the Freedom Box and Twitter having to hand over data about some users, that may well change.