Categories
Microblog

Microsoft Office Online switches to OpenDocument as default format

So earlier today I was trying out Microsoft’s online office suite and noticed something interesting. Whenever you create a new Word, Excel, or PowerPoint file from the OneDrive interface, it automatically creates it using the OpenDocument file format (odt, ods, odp) as opposed to the Microsoft Office format (docx, xlsx, pptx). Interestingly, if you create it from Office.com it uses the Microsoft format instead.

Categories
Blog

Spotify is trying to embrace, extend, and extinguish open podcasting

I really love podcasts. Not only do they provide great entertainment value as an alternative to audiobooks, but they are also one of the last open ecosystems on the web. Anyone can start a podcast by publishing an RSS feed on their website without having to rely on a central platform (thus nobody can “ban” your podcast). Once a show is published, listeners can consume it from any RSS reader, including many specially made for podcasts like PocketCasts and Overcast.

This arrangement is beneficial to creators because it gives them full freedom of expression without having to worry about the censors on platforms like YouTube, and it gives them complete freedom of choice on how to monetize their work. It is equally beneficial to consumers who get to choose among hundreds of independently developed podcast apps to find the one with the best features for them. If a consumer wants to switch podcast players they can also do so while taking their subscriptions with them.

However, over the last few years Spotify has been making moves that could threaten this open ecosystem.

Embrace

In 2015, Spotify started embracing podcasts by enabling users to discover and subscribe to shows right in the Spotify app. The feature works just like any other podcast client and scrapes RSS feeds found on the web.

Later in 2019, Spotify acquired the podcast networks Gimlet Media, Anchor FM, and Parcast. However, they did not limit access to podcasts produced on those networks so users could still listen using their client of choice.

Extend

In May 2020, Spotify announced that it acquired an exclusive license to The Joe Rogan Experience (a popular comedy podcast) for $100 million. Starting in September 2020, Joe’s podcast will be removed from all third-party podcasting apps and made available only in Spotify’s own podcasts section.

If the Joe Rogan license is a commercial success then it seems likely that the shows from the other podcast networks that Spotify owns will also be made exclusive to their own apps.

Extinguish

If Spotify chooses to continue on their current path of exclusive content it will break interoperability with other podcast apps and force listeners of those shows to use the Spotify podcast client. I suspect that many listeners will also transfer their existing subscriptions into Spotify to avoid needing two separate podcast clients.

If Spotify gains enough market share then it will effectively become the de facto gatekeeper of podcasts (similar to how Google Play is the de facto gatekeeper of Android apps despite side loading and alternative app stores). Once that happens many of the benefits of podcasts will be destroyed. Creators will no longer have full creative freedom as they risk annoying the Spotify censors and having a large portion of their audience taken away from them. Consumers will no longer have choice in podcast clients if they want to listen to shows that are exclusive to Spotify.

I really hope that Spotify’s attempt to centralize the podcasting ecosystem around their apps is a colossal failure. However, the Embrace, Extend, and Extinguish strategy is quite effective, and thus I fear they may succeed.

As a small and feeble attempt to protest the direction that Spotify is moving in, I have decided to cancel my Spotify Premium subscription.

Categories
Microblog

Switching to exFAT

I’ve decided that I’m going to be reformatting my 25 TB of external storage capacity (for storing datasets, backups, etc.) to exFAT. Most of it is currently ext4 or NTFS.

exFAT is great because, similar to its predecessor FAT32, it has read-write compatibility with Linux, Windows, and macOS. But while FAT32 is limited to 4 GB files and partitions of at most 16 TB, exFAT can do 16 EB for files and 64 ZB for partitions. Lots more room to grow.
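
As a rough sanity check, those limits fall out of the widths of the on-disk size fields. The sketch below assumes the FAT in question is FAT32 with a 32-bit file size and a 32-bit sector count, that exFAT uses 64-bit fields for both, and that 4096 bytes is the largest sector size. It reports binary units (GiB, TiB, and so on), which is what the rounder "GB" and "TB" figures usually mean in this context.

```python
# Size limits derived from field widths. The widths and the 4096-byte
# maximum sector size are assumptions stated in the text above.

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB"]
MAX_SECTOR_BYTES = 4096


def human(n_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units, rounding down."""
    value = n_bytes
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value //= 1024
        unit += 1
    return f"{value} {UNITS[unit]}"


limits = {
    # A 32-bit size field tops out one byte short of 2**32.
    "FAT32 max file": 2**32,
    "FAT32 max volume": 2**32 * MAX_SECTOR_BYTES,
    "exFAT max file": 2**64,
    "exFAT max volume": 2**64 * MAX_SECTOR_BYTES,
}

for name, size in limits.items():
    print(f"{name}: {human(size)}")
```

These are theoretical ceilings; what a given operating system or formatting tool will actually create may be lower.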

It’ll be a slow process since I can only format one drive at a time and need to copy the data to another drive and back again. So far I’ve converted 4 TB of data.

Categories
Microblog

Jitsi open source video chat

So my university has shut down the campus for the remainder of the semester due to Coronavirus concerns and asked all students to attend classes remotely (mainly using Zoom for live-streaming lectures). I went looking for an open source, cross-platform video conferencing solution with a fast onboarding process to keep in touch with fellow students and found Jitsi to fit the bill.

It’s free, it’s FOSS, and there are no accounts required to create a chat session on their website. You just need to enter a name for your room, and they give you a link to share for people to join.

The only officially supported web browser is Google Chrome which kinda sucks. It seems to work okay in Firefox, except I couldn’t get it to detect any of my microphones (your mileage may vary). Instead, I’m using it in Falkon and it works flawlessly.

Unfortunately, it also doesn’t appear that video chats are end-to-end encrypted which means whoever runs the server can see the raw footage (but you can self-host).

Overall it’s good enough, and it looks like the public service is hosted by 8x8, which is a publicly traded VoIP company, so I’m not overly concerned about eavesdropping (due to the lack of end-to-end encryption). I’ll keep an eye out for better options but for now I’m sticking with Jitsi.

Categories
Microblog

Plasma Mobile on the PinePhone

Today, I tried out KDE Neon on my PinePhone “Brave Heart” and recorded the following video.

Here is a summary of some of the default apps:

  • Buho – the default note taking app. Notes can be tagged by color, keyword, and organized into “books”. It can also save URLs.
  • Discover – the same KDE software center available on the desktop.
  • Index – the file manager which draws inspiration from Dolphin.
  • KDE Connect – sync your Plasma Mobile phone with your Plasma Desktop.
  • Koko – the photo gallery and viewer. Has some issues with thumbnails.
  • Konsole – the same KDE terminal emulator available on the desktop.
  • Okular – the PDF reader for Plasma Mobile. It’s a different application from Okular for Plasma Desktop.
  • Phone Book – stores your contacts’ phone numbers, emails, etc.
  • Settings – settings app for Plasma Mobile which is currently missing some categories (e.g., battery).
  • Wave – the default music player which doesn’t have any sound right now.
  • Phone – the dialer app for calling numbers and contacts.
  • Angelfish – the default web browser which has support for tabs, history, bookmarks, etc.
  • Calindori – the default calendar app but I couldn’t figure out how to add events.

Categories
Blog

Which Search Engine Has the Best Results

I was recently wondering which of the popular web search engines provided the best results and decided to try to design an objective benchmark for evaluating them. My hypothesis was that Google would score the best followed by StartPage (a Google aggregator) and then Bing and its aggregators.

Usually when evaluating search engine performance there are two methods I’ve seen used:

  • Have humans search for things and rate the results
  • Create a dataset of mappings between queries and “ideal” result URLs

The problem with having humans rate search results is that it is expensive and the results are hard to replicate. Creating a dataset of “correct” webpages to return for each query makes the experiment repeatable, but it is also expensive upfront and is subject to the biases of whoever builds the dataset.

Instead of using either of those methods I decided to evaluate the search engines on the specific task of answering factual questions from humans asked in natural language. Each engine is scored by how many of its top 10 results contain the correct answer.

Although this approach is not very effective at evaluating the quality of a single query, I believe in aggregate over thousands of queries it should provide a reasonable estimation of how well each engine can answer users’ questions.

To source the factoid questions, I use the Stanford Question Answering Dataset (SQuAD) which is a popular natural language dataset containing 100k factual questions and answers from Wikipedia collected by Mechanical Turk workers.

Here are some sample questions from the dataset:

Q: How did the black death make it to the Mediterranean and Europe?

A: merchant ships

Q: What is the largest city of Poland?

A: Warsaw

Q: In 1755 what fort did British capture?

A: Fort Beauséjour

Some of the questions in the dataset are also rather ambiguous such as the one below:

Q: What order did British make of French?

A: expulsion of the Acadian

This is because the dataset is designed to train question answering models that have access to the context that contains the answer. In the case of SQuAD each Q/A pair comes with the paragraph from Wikipedia that contains the answer.

However, I don’t believe this is a huge problem since most likely all search engines will perform poorly on those types of questions and no individual one will be put at a disadvantage.
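
For readers who want to pull the questions out themselves, SQuAD is distributed as nested JSON: articles contain paragraphs, and each paragraph carries its context plus a list of question/answer entries. The sketch below is not the script used for this experiment. It parses a tiny inline sample in that layout (the context sentence and the id are made up for illustration) and yields question/answer pairs, taking the first listed answer for each question.

```python
import json

# Tiny inline sample in the SQuAD v1.1 layout. The context sentence and
# the "id" value are invented placeholders for illustration only.
SAMPLE = """
{"data": [{"title": "Warsaw", "paragraphs": [{
    "context": "Warsaw is the capital and largest city of Poland.",
    "qas": [{"id": "example-1",
             "question": "What is the largest city of Poland?",
             "answers": [{"text": "Warsaw", "answer_start": 0}]}]
}]}]}
"""


def iter_questions(squad: dict):
    """Yield (question, answer) pairs, using the first answer for each question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa["answers"]:  # skip entries that list no answer
                    yield qa["question"], qa["answers"][0]["text"]


pairs = list(iter_questions(json.loads(SAMPLE)))
print(pairs)
```

Only the question is sent to the search engine; the context paragraph is ignored, which is exactly why the ambiguous questions above are hard.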

Collecting data

To get the results from each search engine, I wrote a Python script that connects to Firefox via Selenium and performs searches just like regular users via the browser.

The first 10 results are extracted using CSS rules specific to each search engine and then those links are downloaded using the requests library. To check if a particular result is a “match” or not we simply perform an exact match search of the page source code for the correct answer (both normalized to lowercase).

Again this is not a perfect way of determining whether any single page really answers a query, but in aggregate it should provide a good estimate.
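
The matching rule itself is small enough to sketch in a few lines. This is an illustration of the rule as described, not the original script: the function names are hypothetical, and the pages are passed in as already-downloaded source strings so the example runs without a browser or network access.

```python
from typing import Iterable


def contains_answer(page_source: str, answer: str) -> bool:
    """Case-insensitive exact substring match of the answer in the page source."""
    return answer.lower() in page_source.lower()


def score_query(pages: Iterable[str], answer: str, top_n: int = 10) -> int:
    """Count how many of the first top_n result pages contain the answer."""
    return sum(contains_answer(page, answer) for page in list(pages)[:top_n])


# Made-up page sources standing in for downloaded results.
pages = [
    "<p>The plague was carried by Merchant Ships.</p>",
    "<p>Nothing relevant here.</p>",
    "<p>merchant  ships</p>",  # double space, so the exact match fails
]
print(score_query(pages, "merchant ships"))
```

The third page shows the main weakness of exact matching: small differences in whitespace or markup inside the answer are enough to miss a page that does contain it.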

Some search engines are harder to scrape due to rate limiting. The most aggressive rate limiters were Qwant, Yandex, and Gigablast. They often blocked me after just two queries (on a new IP) and thus there are fewer results available for those engines. Also, Cliqz, Lycos, Yahoo!, and YaCy were all added mid-experiment, so they have fewer results too.

I scraped results for about 2 weeks and collected about 3k queries for most engines. Below is a graph of the number of queries that were scraped from each search engine.

Crunching the numbers

Now that the data is collected there are lots of ways to analyze it. For each query we have the number of matching documents, and for the latter half of queries also the list of result links saved.

The first thing I decided to do was see which search engine had the highest average number of matching documents.

Much to my surprise Google actually came in second to Ecosia. I was rather shocked by this since Ecosia’s gimmick is that they plant trees with the money from ads, not having Google-beating search results.

Also surprising is the number of Bing aggregators (Ecosia, DuckDuckGo, Yahoo!) that all came in ahead of Bing itself. One reason may be that those engines each apply their own ranking on top of the results returned by Bing and some claim to also search other sources.

Below is a chart with the exact scores of each search engine.

Search Engine   Score               Count
Ecosia          2.82087177855552    3143
Google          2.65397815912636    3205
DuckDuckGo      2.58377701221422    3193
StartPage       2.55723270440252    3180
Yahoo!          2.51220442410374    2622
Bing            2.4809375           3200
Qwant           2.32365747460087    689
Yandex          1.92651933701657    1810
Gigablast       1.51381215469613    905
Cliqz           1.39724137931034    2900
Lycos           1.20962678758284    2867
YaCy            0.898050365556458   2462
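
Each score in the table is the average number of matching documents per query, and the count is the number of queries scraped for that engine. A minimal sketch of that aggregation follows, assuming the raw data is a list of (engine, matches) records. The engine names and numbers in the example are made up and are not taken from the collected data.

```python
from collections import defaultdict
from statistics import mean


def average_scores(records):
    """Return (engine, average matches per query) pairs, best engine first."""
    by_engine = defaultdict(list)
    for engine, matches in records:
        by_engine[engine].append(matches)
    averages = {engine: mean(counts) for engine, counts in by_engine.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records: one entry per query, with 0 to 10 matching results.
records = [
    ("EngineA", 4), ("EngineA", 2),
    ("EngineB", 1), ("EngineB", 2),
]
ranking = average_scores(records)
print(ranking)
```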

To further understand why the Bing aggregators performed so well I wanted to check how much of their own ranking was being used. I computed the average Levenshtein distance between each pair of search engines, which is the minimum number of single result edits (insertions, deletions or substitutions) required to change one results page into the other.
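
To make the metric concrete, here is a small sketch of Levenshtein distance applied to two lists of result URLs instead of two strings. It uses the standard dynamic-programming formulation and is not necessarily the implementation used for the analysis; the URLs in the example are placeholders.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of result URLs."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Placeholder result pages from two hypothetical engines.
page_one = ["example.org/a", "example.org/b", "example.org/c"]
page_two = ["example.org/a", "example.org/c"]
print(edit_distance(page_one, page_two))
```

With 10 results per page the distance ranges from 0 (identical pages) to 10 (nothing in the same position and no shared ordering to exploit), so a swap of two adjacent results already costs 2.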

Edit distance matrix of different search results

Of the three, Ecosia was the most different from pure Bing with an average edit distance of 8. DuckDuckGo was the second most different with edit distance of 7, followed by Yahoo! with a distance of 5.

Interestingly the edit distances of Ecosia, DuckDuckGo, and Yahoo! seem to correlate well with their overall rankings where Ecosia came in 1st, DuckDuckGo 3rd, and Yahoo! 5th. This would indicate that whatever modifications these engines have made to the default Bing ranking do indeed improve search result quality.

Closing thoughts

This was a pretty fun little experiment to do, and I am happy to see some different results from what I expected. I am making all the collected data and scripts available for anyone who wants to do their own analysis.

This study does not account for features besides search result quality such as instant answers, bangs, privacy, etc. and thus it doesn’t really show which search engine is “best”, just which one provides the best results for factoid questions.

I plan to continue using DuckDuckGo as my primary search engine despite it coming in 3rd place. The results of the top 6 search engines are all pretty close, so I would expect the experience across them to be similar.

Categories
Blog

Why I quit using Google

So I was recently asked why I prefer to use free and open source software over more conventional and popular proprietary software and services.

A few years ago I was an avid Google user. I was deeply embedded in the Google ecosystem and used their products everywhere. I used Gmail for email, Google Calendar and Contacts for PIM, YouTube for entertainment, Google Newsstand for news, Android for mobile, and Chrome as my web browser.

I would upload all of my family photos to Google Photos and all of my personal documents to Google Drive (which were all in Google Docs format). I used Google Domains to register my domain names for websites where I would keep track of my users using Google Analytics and monetize them using Google AdSense.

I used Google Hangouts (one of Google’s previous messaging plays) to communicate with friends and family and Google Wallet (with debit card) to buy things online and in-store.

My home is covered with Google Homes (1 in my office, 1 in my bedroom, 1 in the main living area) which I would use to play music on my Google Play Music subscription and podcasts from Google Podcasts.

I have easily invested thousands of dollars into my Google account to buy movies, TV shows, apps, and Google hardware devices. This was truly the Google life.

Then one day, I received an email from Google that changed everything.

“Your account has been suspended”

Just the thing you want to wake up to in the morning. An email from Google saying that your account has been suspended due to a perceived Terms of Use violation. No prior warning. No appeals process. No number to call. Trying to sign in to your Google account yields an error and all of your connected devices are signed out. All of your Google data, your photos, emails, contacts, calendars, purchased movies and TV shows. All gone.

I nearly had a heart attack, until I saw that the Google account that had been suspended was in fact not my main personal Google account, but a throwaway Gmail account that I created years prior for a project. I hadn’t touched the other account since creation and forgot it existed. Apparently my personal Gmail was listed as the recovery address for the throwaway account and that’s why I received the termination email.

Although I was able to breathe a sigh of relief this time, the email was a wake-up call. I was forced to critically reevaluate my dependence on a single company for all the tech products and services in my life.

I found myself to be a frog in a heating pot of water and I made the decision that I was going to jump out.

Leaving Google

Today there are plenty of lists on the internet providing alternatives to Google services, such as this and this, although the “DeGoogle” movement was still in its infancy when I was making the move.

The first Google service I decided to drop was Gmail, the heart of my online identity. I migrated to Fastmail with my own domain in case I needed to move again (hint: glad I did, now I self-host my email). Fastmail also provided calendar and contacts solutions so that took care of leaving Google Calendar and Contacts.

Here are some other alternatives that I moved to:

Migrating away from Google was not a fast or easy process. It took years to get where I am now and there are still two Google services that I depend on: YouTube and Google Home.

Eventually, my Google Homes will grow old and become unsupported, at which point hopefully the Mycroft devices will have matured and become available for purchase. YouTube may never be replaced (although I do hope for projects like PeerTube to succeed) but I find the compromise of using only one or two Google services to be acceptable.

At this point losing my Google account due to a mistake in their machine learning would largely be inconsequential and my focus has shifted to leaving Amazon which I use for most of my shopping and cloud services.

The reason that I moved to mostly FOSS applications is that it seems to be the only software ecosystem where everything works seamlessly together and I don’t have to cede control to any single company. Alternatively I could have simply split my service usage up evenly across Google, Microsoft, Amazon, and Apple but I don’t feel that they would have worked as nicely together.

Overall I’m very happy with the open source ecosystem. I use Ubuntu with KDE on all of my computers and Android (no GApps) on my mobile phone. I’ve ordered the PinePhone “Brave Heart” and hope to one day be able to use it or one of its successors as a daily driver with Ubuntu Touch or Plasma Mobile.

I don’t want to give the impression that I exclusively use open source software either; I do use a number of proprietary apps including Sublime Text, Typora, and Cloudron.

Categories
Blog

How to Easily Migrate Emails Between Accounts

If you’ve decided to move to another email provider it’s possible to take all of your old emails and folders with you. The easiest way I’ve found to do this is using the mail client Mozilla Thunderbird.

Thunderbird new account dialog. File > New > Existing mail account.

With Thunderbird installed, sign into both your old and new email accounts. This is provider dependent but in general if you are using a popular email service like Gmail, Yahoo, Outlook, etc. then Thunderbird can auto-discover the IMAP and SMTP settings. If you have two-factor authentication set up on your email account you may need to create an app password.

If you are unsure here are the instructions for a few popular services:

When you set up your old account make sure you set Thunderbird to download the entire email history not just the last few months.

Account settings where you can set how many emails Thunderbird will download. Edit > Account Settings.

Once you are signed in to both accounts you should see all of your emails and folders in the old account. You may want to wait for Thunderbird to finish downloading emails if necessary.

To move emails, simply select the inbox of your old mail account, use Ctrl + A to select all the emails, then drag them to the new inbox. You will also need to drag each of the folders from the old email account to the new one.

If you’d like to just move a couple of emails you can select them individually and drag them to the new email account.

Categories
Microblog

A Trip Through New York (1911)

I find this video just remarkable. It’s only been 109 years and yet things are so different now. I wonder what things will be like in a hundred more years. With any luck I’ll live to see.

Categories
Microblog

Orchid VPN

I have previously mentioned how I felt that Tor should offer some way for users to pay for bandwidth on its network to incentivize more nodes to join. Well, today I found out about Orchid which is a decentralized VPN that allows users to do just that.

It’s basically a marketplace for bandwidth between clients and VPN providers. Anyone can set up a node and act as an exit point. From what I’ve read it seems like exit nodes can even choose what type of content will go through them: torrents, email, specific websites, etc. can all be blocked or allowed. The app will automatically pick providers that support the type of content that you’re trying to access.

Given this dynamic I would imagine that different types of content will start to cost more. For example, bandwidth providers who allow torrents will charge a premium due to the increased legal risk. On the other hand, providers who only allow access to known safe sites like YouTube, Reddit, etc. would be much cheaper.

Orchid even supports multiple hops within the network just like Tor. There are a few concerns I have:

  • Since it’s decentralized there is no way to ban exit nodes for logging people’s traffic
  • Everything is done in OXT which is Orchid’s native currency on Ethereum so it’s kinda a pain to pay for the service
  • Orchid uses its own VPN protocol, not a standard one like OpenVPN or WireGuard

For now, I’m going to continue using Private Internet Access as my VPN, but Orchid is something I’ll keep my eye on.