Sunday, October 21, 2018

Google+ Migration - Part I: Takeout



For the last 7 years, I have been using Google+ as my primary social sharing site - with
automated link-sharing to Twitter. With Google+ going away, I am looking to migrate my public postings to a new site, where they can be presented in a similar way. As the target for the migration, I have chosen a local community-operated pod of the diaspora* network.

Migrating social media-data is particularly challenging. They are by definition an amalgamation of data from different sources: links, re-sharing, likes comments etc. - all potentially created by different users of the original social sharing platform. Also contrary to other data-sets (e.g. contact-lists, calendars or spreadsheets), there are no established, standardized data formats for exchanging social networking site activity in a platform independent way.

Without being an expert in copyright and data protection law, I am taking a very conservative approach to ownership and consent. Users of the original Google+ site were explicitly ok with the following cross-user interactions from the perspective of my post-stream:

  • re-sharing post of other users (while respecting the original scope)
  • other users liking ("plussing") my posts
  • other users commenting on my posts

Since none of these other users have ever granted my an explicit permission to replicate this content in a new form on another platform, I will only replicate my original content without any interactions, but in addition to public posts also include posts to communities, which I consider public. The purpose of this tutorial is to present some tools and methods that could be used to process and select data in a different way to implement a different policy.

For the technicalities of migration, I am making the following assumptions assumptions:

  • as input, only rely on data that is contained in the takeout-archive. This way the migration could be repeated after the original Google+ site is now longer accessible.
  • use the Python programming language for parsing, processing and re-formatting of the data.
  • use a bot with a Python API-library for diaspora* to repost (very slowly!) to the new target system.
While Python is highly portable, any examples and instructions in this tutorial will assume a unix-like operating system an be tested in particular on a current Debian GNU/Linux based system.

Ordering Takeout

For over 10 years, a Google team calling itself the Data Liberation Front, has been working on a promise that users should be able to efficiently extract any of the data they create online with Google services and take them elsewhere. The resulting service is takeout.google.com.

In order to get an archive suitable for processing, we need to request a takeout archive of the Google+ stream data in JSON format. Here are some basic instructions on how to request a new takeout archive.

For the purpose of this migration, we only need to select "Google+ Stream" in data selection. However, we need to open the extension panel and select JSON format instead of the default HTML. While the HTML export only contains the information necessary to display each post, the JSON export contains additional meta-data like access-rights in an easily machine readable format.

Given the high load on the service right now, archive creation for large streams can take a while or be incomplete. We should expect this process to become more reliable again in the next few weeks.


The next step will be to understand the structure of the data in the takeout archive.