kugelfish

Posts

The Fallacy of distributed = good

I have recently been looking for an alternative social media platform and started using Diaspora* via the diasporing.ch pod. Not unlike the cryptocurrency community, the proponents of the various platforms in the Fediverse seem to rather uncritically advocate the distributed nature of these platforms as an inherently positive property, in particular when it comes to privacy and data protection. I tend to agree with Yuval Harari, who argues in "Sapiens" that empires, or scaled, centralized forms of organization, are one of Homo Sapiens' significant cultural accomplishments. A majority of humans through history have lived as part of some sort of empire. Empires can provide prosperity and ensure lasting peace and stability - like the Pax Romana or, in my generation, the Pax Americana. We often have a love/hate relationship with empires - even many protesters who are busy burning American flags during the day secretly hope that their children some day will get into Har...

Google+ Migration - Part VIII: Export to Diaspora*

<- Part VII: Conversion & Staging

The last stage of the process is to finally export the converted posts to Diaspora*, the chosen target system. As we want these posts to appear slowly and close to the anniversary of their original post date, this process is going to be drawn out over at least one year. While we could do this by hand, it should ideally be done by some automated process. For this to work, we need some kind of server-type machine that is up and running and connected to the Internet frequently enough during a whole year. The resource requirements are quite small, except for storing the staged data, which for some users could easily run to multiple gigabytes, mostly depending on the number of posts with images. Today it is quite easy to get small & cheap virtual server instances from any cloud provider; for example, the micro-sized Compute Engine instances on Google Cloud should even be part of the free tier. I also still have a few of the small, low power Raspber...

Google+ Migration - Part VII: Conversion & Staging

<- Part VI: Location, Location, Location | Part VIII: Export to Diaspora* ->

We are now ready to put all the pieces together for exporting to Diaspora*, the new target platform. If we had some sort of "Minitrue" permissions to rewrite history on the target system, the imported posts could appear to have always been there since their original G+ posting date. However, since we only have regular user permissions, the only choice is to post them as new posts at some future point in time. The most straightforward way to upload the archive would be to re-post in chronological order as quickly as possible without causing overload. If the new account is not only used for archive purposes, we may want to maximize the relevance of the archive posts in the new stream. In this case, a better way would be to post each archive post on the anniversary of its original post date, creating some sort of "this day in history" series. This would require that the ...
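The anniversary scheduling described above can be sketched in Python roughly as follows. This is only an illustration, not the script from the series: the function name next_anniversary and the decision to move February 29 posts to February 28 in non-leap years are assumptions.

```python
from datetime import date


def next_anniversary(original: date, today: date) -> date:
    """Return the first anniversary of `original` on or after `today`.

    Hypothetical helper for illustration only.
    """
    for year in (today.year, today.year + 1):
        try:
            candidate = original.replace(year=year)
        except ValueError:
            # Original date is Feb 29 and `year` is not a leap year.
            # Assumption: re-post on Feb 28 instead.
            candidate = date(year, 2, 28)
        if candidate >= today:
            return candidate
    # The candidate in the following year is always >= today.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # A post from 15 March 2012, scheduled from 10 January 2019.
    print(next_anniversary(date(2012, 3, 15), date(2019, 1, 10)))
```

A post whose anniversary has already passed in the current year is simply pushed to the following year, which is why the whole export takes at least one year.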

Google+ Migration - Part VI: Location, Location, Location!

<- Part V: Image Attachments | Part VII: Conversion & Staging ->

Before we focus on putting all the pieces together, here is a small, optional excursion into how to make use of the location information contained in G+ posts. We should consider carefully if and how we want to include geolocation information, as there might be privacy and safety implications. For sensitive locations, it can make sense to choose the point of a nearby landmark or add some random noise to the location coordinates. Many of my public photo-sharing posts contain the location near where the photos were taken. Diaspora* posts can contain a location tag as well, but it does not seem to be very informative and the diaspy API currently does not support adding a post location. Instead, we can process the location information contained in the post takeout JSON files and transform it to extract some information which we can use to format the new posts. In particular, we want to include a location link to the corre...
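Adding random noise to coordinates, as suggested above for sensitive locations, could look roughly like this. The function name jitter_location and the default offset of 0.01 degrees (on the order of 1 km in latitude) are assumptions chosen for illustration, not values from the original post.

```python
import random


def jitter_location(lat, lon, max_offset_deg=0.01, rng=random):
    """Return (lat, lon) shifted by a uniform random offset in each axis.

    Hypothetical helper; `max_offset_deg` is an assumed default.
    """
    new_lat = lat + rng.uniform(-max_offset_deg, max_offset_deg)
    new_lon = lon + rng.uniform(-max_offset_deg, max_offset_deg)
    # Keep latitude within the valid range of -90..90 degrees.
    new_lat = max(-90.0, min(90.0, new_lat))
    # Wrap longitude back into the range -180..180 degrees.
    new_lon = ((new_lon + 180.0) % 360.0) - 180.0
    return new_lat, new_lon


if __name__ == "__main__":
    print(jitter_location(47.3769, 8.5417))
```

Uniform noise of this kind only blurs a single post. If many posts share the same true location, the average of the noisy points still converges on it, so picking a fixed nearby landmark may be the safer option in that case.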

Google+ Migration - Part V: Image Attachments

<- Part IV: Visibility Scope & Filtering | Part VI: Location, Location, Location ->

Google+ has always been rather good at dealing with photos - the photo functions were built on the foundation of Picasa and later spun out as Google Photos. It is not surprising that the platform was popular with photographers and that many posts contain photos. In the takeout archive, photos or image/media file attachments to posts are rather challenging. In addition to the .json files containing each of the posts, the Takeout/Google+ Stream/Posts directory also includes two files for each image attached to a post. The basename is the originally uploaded filename, with a .jpg extension for the image file itself and a .jpg.metadata.csv extension for some additional information about the image. If we originally attached an image cat.jpg to a post, there should now be a cat.jpg and a cat.jpg.metadata.csv file in the post directory. However, if over the years we have been unimaginative in naming files...
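The pairing of image and metadata files described above (cat.jpg with cat.jpg.metadata.csv) can be sketched as follows. The function name pair_images_with_metadata is hypothetical, and the sketch only covers the simple case where filenames are unique; it does not attempt to resolve name collisions.

```python
METADATA_SUFFIX = ".metadata.csv"


def pair_images_with_metadata(filenames):
    """Map each media filename to its metadata filename, or None if missing.

    Hypothetical helper: files ending in .json or .metadata.csv are
    treated as non-media and skipped.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if name.endswith(METADATA_SUFFIX) or name.endswith(".json"):
            continue
        metadata_name = name + METADATA_SUFFIX
        pairs[name] = metadata_name if metadata_name in names else None
    return pairs


if __name__ == "__main__":
    # Example listing; the .json name is made up for illustration.
    listing = ["cat.jpg", "cat.jpg.metadata.csv", "20180101 Post.json", "dog.jpg"]
    print(pair_images_with_metadata(listing))
```

On a real archive, the listing could come from os.listdir() on the Takeout/Google+ Stream/Posts directory.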

Google+ Migration - Part IV: Visibility Scope & Filtering

<- Part III: Content Transformation | Part V: Image Attachments ->

Circles, and with them the ability to share different content with different sets of people, were one of the big differentiators of Google+ over other platforms at the time, which typically had a fixed sharing model and visibility scope. Circles were based on the observation that most people in real life interact with several "social circles" and often would not want these circles to mix. The idea of Google+ was that it should be possible to manage all these different circles under a single online identity (which should also match the "real name" identity of our government's civil registry). It turns out that while the observation of disjoint social circles was correct, most users prefer to use different platforms and online identities to make sure they don't inadvertently mix. Google+ tried hard to make sharing scopes obvious and unsurprising, but the model remained complex, ...

Google+ Migration - Part III: Content Transformation

<- Part II: Understanding the takeout archive | Part IV: Visibility Scope & Filtering ->

After we have had a look at the structure of the takeout archive, we can build some scripts to translate the content of the JSON post description into a format that is suitable for import into the target system, which in our case is Diaspora*. The following script is a proof-of-concept conversion of a single post file from the takeout archive to a text string that is suitable for upload to a Diaspora* server using the diaspy API. Images are more challenging and will be handled separately in a later episode. There is also no verification of whether the original post had public visibility and should be re-posted publicly. The main focus is on the parse_post and format_post methods. The purpose of the parse_post method is to extract the desired information from the JSON representation of a post, while the format_post method uses this data to format the input text ...

Google+ Migration - Part II: Understanding the Takeout Archive

<- Part I: Takeout | Part III: Content Transformation ->

Once the takeout archive has been successfully generated, we can download and extract it to our local disk. At that point we should find a new directory called Takeout, with the Google+ posts located in the following directory: Takeout/Google+ Stream/Posts. This posts directory contains 3 types of files:

- Files containing the data for each post in JSON format
- Media files of images or videos uploaded and attached to posts, for example in JPG format
- Metadata files for each media file in CSV format, with an additional extension of .metadata.csv

The filenames are generated as part of the takeout archive generation process with the following conventions: the post filenames are structured as a date in YYYYMMDD format followed by a snippet of the post text, or the word "Post" if there is no text. The media filenames seem to be close to the original names of the files when they we...

Google+ Migration - Part I: Takeout

Part II: Understanding the takeout archive ->

For the last 7 years, I have been using Google+ as my primary social sharing site - with automated link-sharing to Twitter. With Google+ going away, I am looking to migrate my public postings to a new site, where they can be presented in a similar way. As the target for the migration, I have chosen a local community-operated pod of the diaspora* network. Migrating social media data is particularly challenging. It is by definition an amalgamation of data from different sources: links, re-shares, likes, comments etc. - all potentially created by different users of the original social sharing platform. Also, contrary to other data sets (e.g. contact lists, calendars or spreadsheets), there are no established, standardized data formats for exchanging social networking site activity in a platform-independent way. Without being an expert in copyright and data protection law, I am taking a very conservative approach to ownersh...

The Internship

During the summer months, our offices are buzzing with young, enthusiastic people from all over the world - a sign that it's intern season. Internships are the closest that academic professions have to the apprenticeship model still common in Germanic countries. Students get to experience professional life for a few months during semester breaks and learn some practical skills that might improve their employment prospects, while employers get to build relationships with some of the most promising students before they officially enter the job market. Some employers complain that students don't leave university with the exact skillset that they are currently looking for in their entry-level applicants. However, the most important skills a good university should teach are the ability to reason, to learn and to understand the underlying scientific foundations of a given field. Many of the practical skills needed to excel in a certain profession are best acquired on the jo...

Startup Scene: Why are there no Unicorns in Switzerland?

Since the term was coined a few years ago, unicorns have become the mythical creature of the venture capital industry: privately held (tech) startups with a valuation of more than a billion dollars.

What is a Startup?

While indeed extremely rare (about 200 globally), the concept of unicorns helps to clarify what people instinctively mean when they say "Startup" - especially when used as an anglicism in other languages. The most literal definition is a new company. But by that measure, most new companies are restaurants, gas stations and other small businesses. Or maybe being high-risk? By that definition restaurants qualify as well, as many fail within a year. What about being innovative? Most successful innovation is created by large established organizations which have large R&D budgets and which can often attract the top talent in a given field. My favorite definition of a "Startup" is the one by Steve Blank: A startup is an organization formed to se...

When you come to a fork() in the code, take it!

Linux is a multi-user, multi-tasking system, which means that even a computer as small as the Raspberry Pi can be used by multiple users simultaneously and there can be multiple processes executing (seemingly) all at once. For example, here are all the processes currently running for the user pi:

pi@raspberrypi ~ $ ps -fu pi
UID        PID  PPID  C STIME TTY          TIME CMD
pi        4792  4785  0 Mar11 ?        00:00:04 sshd: pi@pts/0
pi        4793  4792  0 Mar11 pts/0    00:00:04 -bash
pi        6137  6130  0 00:30 ?        00:00:00 sshd: pi@pts/1
pi        6138  6137  1 00:30 pts/1    00:00:01 -bash
pi        6185  4793  0 00:32 pts/0    00:00:00 tail -f /var/log/mess...