Sunday, December 15, 2013 Tuesday, December 10, 2013

PyLadies December 2013 meetup summary

The PyLadies Amsterdam “Tot ziens 2013” meeting was hosted by Travelbird, in Amsterdam.

This was a live summary and has not been reviewed by the speakers.

Guido Kollerie: Data Analysis with Pandas

Guido works for a company that deals with a lot of questionnaires and other big datasets. They used to process this data in Oracle, with a lot of complex queries that didn’t work very well for them.

As they were using Python, Guido started looking at Pandas, a Python library for data analysis and manipulation. It’s built on top of NumPy and supports Python 3. Its dependencies are not declared very well, so installation is not entirely trivial.

Series

Series is one of Pandas’ data structures. Guido recommends using the IPython Notebook for easy interaction with the data. A series is basically similar to a Python dict. Note that in Pandas, values are always strongly typed: all values in a series share one type.

You can select a row by its key, like in a dictionary, but also by its position or with a slice, like in a list. In a similar way, you can run more advanced queries against your series. A powerful feature of Pandas is that you can easily apply operations to your entire dataset, like multiplying each row by a particular integer, or something more flexible with a custom Python function. You can also use this to combine multiple series. Pandas handles cases like series of different lengths very well too, which reduces the error handling code you need.
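Here is a minimal pandas sketch of the Series operations described above. It uses current pandas and made-up student numbers; the names and figures are illustrative only and are not from the talk:

```python
import pandas as pd

# Made-up student numbers: a Series behaves like a dict with typed values.
students = pd.Series({"amsterdam": 30000, "leiden": 25000, "utrecht": 31000})

by_key = students["leiden"]            # select by key, like a dict
by_position = students.iloc[0]         # select by position, like a list
large = students[students > 26000]     # boolean query over the whole series

# Operations apply to every value at once, without an explicit loop.
doubled = students * 2
labelled = students.apply(lambda n: f"{n // 1000}k")  # any Python function

# Combining series aligns them on their keys. A key missing on one side
# gives NaN in the result instead of raising an exception.
staff = pd.Series({"amsterdam": 5000, "leiden": 4000})
total = students + staff
```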

Dataframes

Dataframes are basically a combination of series, similar to a matrix. Again, everything is strongly typed. As with series, rows can be indexed by plain integers, or you can provide your own index labels, like the keys of a dict.
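Below is a small sketch of a dataframe with custom row labels, again using made-up figures. In current pandas, `loc` selects by label and `iloc` selects by position:

```python
import pandas as pd

# Made-up figures. Each column is a Series with its own type,
# and all columns share the same row index.
df = pd.DataFrame(
    {
        "name": ["UvA", "VU", "Leiden"],
        "students": [30000, 23000, 25000],
        "ratio_women": [0.58, 0.55, 0.57],
    },
    index=["uva", "vu", "leiden"],  # our own row labels instead of 0, 1, 2
)

types = df.dtypes      # one type per column
vu = df.loc["vu"]      # a row by label, like a dict
third = df.iloc[2]     # a row by integer position, like a list
```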

Reading input and running queries

Dataframes can be constructed manually, but Pandas can also read CSV files and other formats directly into dataframes. Recently, support for reading dataframes from SQL was added as well. Any dataframe can also be written out as CSV or Excel, which makes it possible to push data back into SQL too.
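The following sketch reads and writes data with current pandas functions (`read_csv`, `to_csv`, `to_sql`, `read_sql_query`). The SQL functions in the 2013 release covered by the talk may have had different names. An in-memory SQLite database stands in for a real one, and the data is made up:

```python
import io
import sqlite3

import pandas as pd

csv_text = "University,Students\nUvA,30000\nVU,23000\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV as a string.
out = df.to_csv(index=False)

# Round trip through an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("universities", conn, index=False)
big = pd.read_sql_query(
    "SELECT University FROM universities WHERE Students > 25000", conn
)
conn.close()
```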

On a dataframe, you can pick out a particular column or row quite easily. This works like selection on a series, but is a lot more powerful. The result of an operation on a series can also be added as a new column to a dataframe.

Dataframes are somewhat similar to tables, and Pandas supports several operations familiar from SQL tables. For example, joins can be done with the merge function.
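Here is a minimal example of SQL-style joins with `merge`, using hypothetical tables:

```python
import pandas as pd

students = pd.DataFrame({"university": ["UvA", "VU", "Leiden"],
                         "students": [30000, 23000, 25000]})
cities = pd.DataFrame({"university": ["UvA", "VU", "Delft"],
                       "city": ["Amsterdam", "Amsterdam", "Delft"]})

# Like: SELECT * FROM students JOIN cities USING (university)
inner = pd.merge(students, cities, on="university")

# how="left" keeps every row of the left table, like a LEFT OUTER JOIN.
# Leiden has no match in cities, so its city becomes NaN.
left = pd.merge(students, cities, on="university", how="left")
```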

Practical example

Guido shows us an example using the number of students at various universities. After reading the data from CSV, he first uses rename to change all the column names to lowercase, and then selects a subset of the columns. Pandas also supports groupby on dataframes, similar to GROUP BY in SQL, including aggregate functions such as sum().
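The rename, column selection and groupby steps might look like the sketch below. The CSV content and column names are hypothetical, because the actual dataset was not shown in detail here:

```python
import io

import pandas as pd

# Hypothetical input; the real dataset's columns may differ.
csv_text = """Institution,Study,Men,Women
UvA,Psychology,300,900
UvA,Computer Science,400,60
VU,Psychology,250,700
VU,Computer Science,350,50
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.rename(columns=str.lower)          # Institution -> institution, etc.
df = df[["institution", "men", "women"]]   # keep only the columns we need

# GROUP BY institution, with SUM() as the aggregate function.
totals = df.groupby("institution").sum()
```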

To make a nice graph, the data needs to be sorted. Dataframes have a fairly flexible sort() function. In Pandas, you choose the direction of an operation with the axis: 0 is for rows, 1 is for columns. This can be a little confusing at first. Going from CSV file to a nice-looking graph takes about six lines in total.
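Here is a sketch of the sorting step. `DataFrame.sort()` was later removed from pandas in favour of `sort_values()` and `sort_index()`, so this example uses those instead. The figures are made up:

```python
import pandas as pd

# Made-up totals per institution; the columns are deliberately out of order.
totals = pd.DataFrame(
    {"women": [960, 750, 300], "men": [700, 600, 1200]},
    index=["UvA", "VU", "TU Delft"],
)

# axis=0 (the default) works on rows: reorder them by a computed total, largest first.
ranked = (
    totals.assign(total=totals["men"] + totals["women"])
    .sort_values("total", ascending=False)
)

# axis=1 works on columns instead: reorder them alphabetically by name.
columns_sorted = totals.sort_index(axis=1)

# ranked["total"].plot(kind="barh")  # the graph itself; requires matplotlib
```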

Next, we look at the differences between the numbers of men and women in various studies. This also takes only a handful of lines: group by the name of the study, e.g. psychology, calculate the difference between the sums of men and women, sort on that difference, take the top and bottom five, and plot. The result is a plot of the ratio of men to women for the studies with the most extreme differences.
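One way to write the men/women comparison is sketched below. It assumes one row per institution and study, with made-up counts:

```python
import pandas as pd

# Made-up rows: one per (institution, study) combination.
df = pd.DataFrame({
    "study": ["psychology", "psychology", "computer science", "computer science",
              "law", "medicine", "history"],
    "men":   [300, 250, 400, 350, 500, 300, 200],
    "women": [900, 700, 60, 50, 550, 600, 210],
})

per_study = df.groupby("study")[["men", "women"]].sum()
per_study["difference"] = per_study["men"] - per_study["women"]
ranked = per_study.sort_values("difference")

# Both extremes. The talk took the top and bottom five; this tiny example takes two.
extremes = pd.concat([ranked.head(2), ranked.tail(2)])
ratio = extremes["men"] / extremes["women"]   # men per woman
# ratio.plot(kind="barh")  # requires matplotlib
```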

The example Guido shows uses a very small dataset, but Pandas also performs very well on larger datasets: queries on even millions of rows are nearly instant.

Other examples Guido shows, each in just a handful of lines, are listing all the different computer science studies and how many universities offer them, and pivoting the data to change its angle and grouping. The latter can also be used for stacking and unstacking data. And there are many more features that make working with your data much simpler and faster, and save a lot of boilerplate and error handling code.
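The sketch below counts how many institutions offer each study, pivots the data, and stacks and unstacks it, all on a hypothetical dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "institution": ["UvA", "UvA", "VU", "VU"],
    "study": ["psychology", "law", "psychology", "law"],
    "students": [1200, 1050, 950, 800],
})

# How many different institutions offer each study.
offered_by = df.groupby("study")["institution"].nunique()

# Pivot: institutions become rows, studies become columns.
wide = df.pivot(index="institution", columns="study", values="students")

# stack() moves the columns into the row index; unstack() reverses that.
long = wide.stack()
wide_again = long.unstack()
```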

The documentation can be a little brief. Guido recommends the book “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython” by Wes McKinney.

Erik Romijn: Clean Python code

I have yet to master the skill of making live summaries of my own sessions. However, the gist is quite simple: use your brain, don’t blindly follow rules and buy this awesome book. My slides are on Speaker Deck.

Besma Mcharek: On learning how to combine flask, open data, raspberry pi and google-maps

Besma started as a Python developer, but later also picked up Rails. She was inspired to look at Flask after the last PyLadies Amsterdam meetup. Flask runs easily on a Raspberry Pi, which makes it very simple for a newcomer to create a basic web application hosted on affordable hardware.

Besma also teaches Python to ordinary citizens who are eager to learn. On a limited budget, Raspberry Pis make great test platforms for her students. She was also inspired to build this application by a new Google Maps extension for Flask, and by her work on making technology and code more understandable to civil servants who publish open data.

In her application, she uses the flask-googlemaps package along with Flask itself. The extension makes it very simple to include a basic Google Map in a page, without all the usual Google Maps boilerplate. She wanted to use data about something that really matters to her. She has a four-year-old daughter and will have to choose a school for her soon. Using the existing open data for this was non-trivial, as the encoding was not UTF-8. In the end, as a first step, she managed to turn it into a basic table in a Flask application.
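The talk did not say which encoding the school data used. As an illustration only, this sketch assumes a semicolon-separated file in Windows-1252, a common legacy encoding, and falls back to it when UTF-8 decoding fails. The column names and the school name are hypothetical:

```python
import csv
import io

# Hypothetical raw bytes of a CSV saved in Windows-1252 rather than UTF-8.
raw = "naam;plaats\nBasisschool De Éénhoorn;Amsterdam\n".encode("cp1252")

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to the assumed legacy encoding; the right one depends on the source.
    text = raw.decode("cp1252")

rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
```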

Next was displaying the data on a Google Map. With the Google Maps extension, this wasn’t very difficult. However, there are 448 schools, including universities. So the first step is to filter on primary schools, of which there are 353. That’s still a lot, so selecting nearby schools is a good next step. Google Maps doesn’t support that, though, so she had to do it manually. As this only covers a very small area, she could stick to a simple algorithm. That filters the list down far enough to be usable.
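The exact algorithm Besma used was not described. One simple option for a small area is the equirectangular distance approximation, sketched here. It assumes a hypothetical `schools` list of dicts with `name`, `type`, `lat` and `lon` keys:

```python
import math

EARTH_RADIUS_KM = 6371.0


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: good enough over a few kilometres."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def nearby(schools, home, radius_km):
    """Return the schools within radius_km of home (lat, lon), nearest first."""
    lat0, lon0 = home
    with_distance = [
        (approx_distance_km(lat0, lon0, s["lat"], s["lon"]), s) for s in schools
    ]
    with_distance.sort(key=lambda pair: pair[0])
    return [s for distance, s in with_distance if distance <= radius_km]
```

Filtering on school type first, for example with `[s for s in schools if s["type"] == "primary"]`, and then calling `nearby()` mirrors the two filtering steps described above.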

She really enjoyed writing this, and it made choosing a school a lot simpler, which would have been impossible with just the CSV.

Wednesday, October 16, 2013

CocoaHeads October 2013 meetup summaries

The CocoaHeads NL October 16 meeting was hosted by Ebuddy, in Amsterdam. We had sessions on AFNetworking and Cocos2DX.

This summary has not been reviewed by the speakers.

Ebuddy - Robert Atkins

Robert tells us about Ebuddy and its products. They make XMS, a mobile messaging service that works on all platforms. They’ve just started offering APIs in the form of HTML/CSS widgets and an XMPP API. Refugees from Google Talk are welcome. More information is available on their developer site.

AFNetworking 2.0 - Mattt Thompson

(Mattt added an extra T to his name so that domains and usernames would be available everywhere.)

Mattt tells us that AFNetworking, which he built, is named after the Alamo Fire flower. That was also the earlier name of the startup now known as Gowalla, which tried to help people discover beautiful things in other places. Amongst other things, they hoped to solve the problem of many Americans not knowing the locations of countries until they invade them. Facebook bought them and Mattt moved to Heroku. AFNetworking originally came from Gowalla and was then open sourced.

AFNetworking is the most watched Objective-C project on GitHub, with 9,200 stars, and is used in over 10,000 apps. This brings a lot of responsibility, and they try to set the best example for the community. They use continuous integration with Travis and automatically generated documentation on CocoaDocs. There are a few official extensions, for example for S3 and OAuth. There is also an official, currently out-of-date Xcode template, which also sets you up with CocoaPods and other good things. Mattt strongly recommends Alcatraz for installing Xcode extensions like this. There are also some third-party extensions, and projects built on top of AFNetworking, like RestKit, NimbusKit and OctoKit.

At the last WWDC, Mattt attended the session What’s New in Foundation Networking, where NSURLSession was introduced as the successor to NSURLConnection. Amongst other improvements, it allows many settings, like caching or cookies, to be session-specific. It also introduces tasks, which make it much easier to perform work in the background.

AFNetworking 2.0

AFNetworking 2.0 is a recent update. It requires iOS 6+ or Mac OS X 10.8+ and Xcode 5, and can use both NSURLSession and NSURLConnection. The new serializers make it much simpler to implement custom serialization for requests or responses, with a lot less boilerplate than in older versions. Built-in request serializers cover HTTP, JSON and property lists. For responses, HTTP, JSON, property lists, XML and images are built in. There are extensions supporting MsgPack, CSV/TSV, vCard and vCal. This new architecture also opens up many new options for object serialization, reducing boring mapping and boilerplate code. Another example of what you could do is applying Core Image filters to all downloaded images, by using a custom image response serializer.

AFHTTPRequestOperation is now used directly, which simplifies code. The general aim is more composition and less inheritance.

All this still uses NSURLConnection. To use the fancy new NSURLSession, you use AFURLSessionManager. A challenge with NSURLSession is that it involves many delegate methods. AFURLSessionManager provides block-based callbacks for these delegate methods, including the default expected implementation. It also makes it really simple to access per-task upload and download progress callbacks, and provides convenience methods for session management.

AFHTTPClient is dead. It just did way too many things: it created requests, created and managed operations, monitored reachability, and even more. It has been split into many other classes, which can be composed together, dramatically increasing flexibility. For example, reachability has moved to the independent AFNetworkReachabilityManager.

AFHTTPSessionManager

The new AFHTTPSessionManager comes with expanded and refactored HTTP convenience methods. They are almost identical between the session and request operation managers. JSON is now the sensible default. Generally, a lot of the changes in AFNetworking 2.0 are based on what developers asked for most often. Mattt shows some code samples, and they’re really nice and simple.

AFSecurityPolicy is new as a separate class, and makes it really simple to change SSL validation settings and enable things like SSL pinning. AFNetworkReachabilityManager monitors reachability based on a domain or IP address. However, Mattt warns us that this is often misused: you should always just try the request, and handle failure gracefully. Also, Wi-Fi networks can have much worse performance than LTE.

On the UIKit side, you can use AFNetworking to show the status of calls with the network activity indicator, a UIActivityIndicatorView or a UIProgressView. Background image loading for UIImageView and UIButton is also still supported. There are also some helpful features for loading content in UIWebView, which offers rather limited functionality by itself.

Real-time networking

Mattt thinks real-time functionality will soon become very important in all apps. He strongly recommends Rocket, which is based on two standards: server-sent events with SUBSCRIBE, a W3C draft, and JSON Patch. The initial load is still performed with a normal GET request for the document. In addition, the client can subscribe to an event stream that sends small patches to the initial document.
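The following is a sketch of the client side of this idea. It is not Rocket's code, and it assumes that each event's data field carries a JSON Patch array (RFC 6902). It parses `data:` lines from an event stream and applies add, remove and replace operations to the document fetched with the initial GET. Only a subset of JSON Patch is handled:

```python
import copy
import json


def parse_pointer(pointer):
    """Split a JSON Pointer (RFC 6901) such as "/items/0" into unescaped tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(document, patch):
    """Apply the add, remove and replace operations of a JSON Patch.

    A partial sketch: move, copy and test are not supported.
    """
    doc = copy.deepcopy(document)  # leave the caller's document untouched
    for op in patch:
        *parents, last = parse_pointer(op["path"])
        target = doc
        for token in parents:
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            if kind == "add":  # "-" means "append at the end"
                target.insert(len(target) if last == "-" else int(last), op["value"])
            elif kind == "remove":
                del target[int(last)]
            elif kind == "replace":
                target[int(last)] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "add":
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            elif kind == "replace":
                if last not in target:
                    raise KeyError(last)  # replace requires an existing member
                target[last] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


def events(lines):
    """Yield the parsed JSON data of each server-sent event.

    Consecutive "data:" lines form one event; a blank line ends it.
    """
    data = []
    for line in lines:
        if line == "":
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
```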

For the real-time backend, Mattt discusses Helios. It’s written in Ruby, but also easy to set up for non-Ruby programmers. It ties in really well with Core Data and makes it very fast to get a backend up and running. It comes with many advanced features and a friendly admin panel.

AFNetworking 2.0 has a migration guide and there’s an extensive post on NSHipster about the ideas and concepts behind the update. By the way, an NSHipster book is now available for pre-order.

Apportable: cross-platform iOS development - Zac Bowling

Zac starts by telling us that AFNetworking, and all the fancy features Mattt told us about, can work on Android. And all the iOS developers in the room are now also Android developers.

Zac works on Apportable, a product that aims to make Objective-C the new cross-platform language. They think it’s plain silly to have two teams, two products and two codebases, but also to add further layers of abstraction. Various featured and Editors’ Choice Android games are actually iOS apps ported with Apportable.

Apportable is not source code translation, does not generate Java, and does not emulate iOS. You also do not write any Java. It’s based on a mix of Apple open source code and their own reimplementations. It uses the same clang as Xcode. It supports ARC, C++11, assembly, and the full Objective-C literals and runtime API, with only 6 MB of overhead. They have an extensive collection of Android devices, with a total cost of $240,000, so they can test for issues on obscure hardware as well.

It’s usually a good idea to make some small modifications for Android, which is supported with a little JSON configuration. User interfaces are still a big challenge, which is why they initially focused a lot on games. Games implement their own interface anyway, so user interface differences are not an issue there.

Apportable includes BridgeKit, which gives your Objective-C code access to many Android APIs. This means you never have to touch Java if you don’t want to, but you can if you want. Internally, BridgeKit is used to back many of the system APIs. Apportable supports a very wide range of frameworks.

Apportable has free and paid plans, and currently, using UIKit requires the $1,000/year/seat version. Zac and his partner explain that the motivation for this is that the paid plans come with support, and with the current version, support will simply be needed for UIKit: the implementation is very new and therefore still harder to use.

Cocos2D

Cocos2DX is an open source cross-platform 2D engine and is currently maintained by Apportable engineers.

Zac gives us a demo of Cocos2D’s SpriteBuilder, in which he tries to rebuild Angry Birds in as little time as possible. SpriteBuilder is a bit like Interface Builder, but for game scenes. One of the newer features is the ability to attach physics nodes directly to sprites. This makes it simple to, for example, apply gravity to sprites. The demo shows that the general style is very much like Interface Builder, which makes manipulating physics very understandable.

Wednesday, September 25, 2013

Appsterdam lunchtime lecture summary 3: VAT on Google and Apple app sales

This is part of my summary series of the Weekly Wednesday Lunchtime Lectures, an initiative to allow people in Appsterdam to talk about technology and share knowledge, allowing participants to receive training in public speaking. The lectures cover a wide range of topics related to making apps on any platform, from technical to non-technical including computer languages, modeling, testing, design, marketing, business philosophy, startups, strategizing, and more.

Today’s lecture was by Mr Victor Alting van Geusau and Mr Drs Wilbert Nieuwenhuizen, experienced professionals in management and VAT, talking about VAT for both Google and Apple App Store sales. I wrote about this before, but I never knew whether what I wrote was actually correct.

In the meantime, Appsterdam has published the video of this lecture as well.

I am not a tax lawyer. This summary was not checked afterwards by the speakers, and comes without guarantees.

European VAT rules for B2B

Wilbert explains that all app sales are electronic services for VAT purposes. The main rule is that electronic services are taxed with 21% VAT in the Netherlands. However, the treatment depends on who the buyer is, and whichever case you rely on, your records must prove that it applies. The three possibilities are:

  • Dutch buyer and Dutch seller: 21% VAT in the Netherlands, added on the invoice.
  • EU buyer and Dutch seller: the VAT can be reverse-charged. The invoice must contain the VAT numbers, full names and addresses of the parties involved. The sale is still included in the seller’s Dutch VAT return, and the seller also has to file an ICP listing.
  • Non-EU buyer: no VAT is due, as the place of supply is outside the EU. However, you do have to be able to prove that the service was supplied to a non-EU buyer.

European VAT rules for B2C

Again, when selling to Dutch consumers, you need to charge 21% Dutch VAT. The same applies to consumers elsewhere in the EU: the only exception is consumers outside the EU.
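The rules above can be summarised as a small lookup function. It encodes only the cases as they were presented in this 2013 lecture; later changes to EU VAT rules are not reflected, and it is not tax advice. The function name and its return format are hypothetical:

```python
from decimal import Decimal

DUTCH_VAT = Decimal("0.21")


def vat_treatment(buyer_location, buyer_is_business):
    """Return (rate charged by the seller, note) for a Dutch seller of
    electronic services, following the 2013 lecture.

    buyer_location is "NL", "EU" (another EU member state) or "non-EU".
    """
    if buyer_location == "non-EU":
        return Decimal("0"), "no VAT: place of supply outside the EU; keep proof"
    if buyer_location == "NL":
        return DUTCH_VAT, "21% Dutch VAT on the invoice"
    if buyer_location == "EU":
        if buyer_is_business:
            return Decimal("0"), ("reverse charge: VAT numbers, names and "
                                  "addresses on the invoice; file an ICP listing")
        return DUTCH_VAT, "21% Dutch VAT"
    raise ValueError(f"unknown buyer location: {buyer_location!r}")
```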

Misunderstandings, in general, about VAT

The first misunderstanding is that app store sellers only trade with businesses. This is not the case: the store may only be an intermediary, providing a portal. The portal sells a service to the app maker, while for VAT purposes, the app maker supplies directly to the consumer. In other words, as you are selling a service to an EU consumer, 21% VAT applies. Of course, if the consumer is outside the EU, no VAT is charged.

The second misconception concerns what your turnover is. You may only receive a certain percentage of the consumer sale price. But you supplied the app to the consumer at the full price, so the VAT should be filed based on 100% of the price. Separately, the portal charges you for its cut of the sales price.
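Here is a worked example of the turnover point. It assumes the consumer price includes 21% VAT and uses a hypothetical 30% store commission:

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.21")
STORE_SHARE = Decimal("0.30")  # hypothetical store commission, for illustration


def vat_in_price(price_incl_vat):
    """VAT contained in a price that includes 21% VAT, rounded to cents."""
    vat = price_incl_vat * VAT_RATE / (1 + VAT_RATE)
    return vat.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


consumer_price = Decimal("1.21")                  # what the consumer pays
payout = consumer_price * (1 - STORE_SHARE)       # what reaches the developer

correct = vat_in_price(consumer_price)  # VAT over the full (100%) price
too_low = vat_in_price(payout)          # the mistake: VAT over the payout only
```

In this example, calculating VAT over the payout alone would understate it by 6 cents on every €1.21 sale.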

However, Apple is different

With Apple, the situation is different because of the way their contracts are set up: you are actually trading with iTunes SARL in Luxembourg. That seems to match up completely with my earlier post about Apple and VAT.

Apple does have a list of countries for which they do not take care of taxation, like China or Russia. Dutch VAT does not apply in that case, but Chinese taxes might. It is important that you are able to prove that those sales were made outside the EU.

Google App Store

For Google, the contracts are different. In that case, you are directly selling to consumers, so the problems mentioned earlier all apply. Unfortunately, the information Google supplies you is not sufficient to file taxes correctly. It is unclear which taxes are paid, whether the buyer is a consumer or company, and if they are a company the VAT number is not provided. Google even provides incorrect advice on their website about taxation of sales.

If this is discovered in an audit, tax can be levied over the full revenue of the past five years, with added penalties. This can be far higher than the plain 21% VAT. Wilbert had a client who had to file for bankruptcy after this happened.

The safe option for the Google App Store is to just file and pay 21% VAT on all sales. Victor thinks the same problem probably also applies to the Amazon app store.

Thursday, September 5, 2013

WeerAPI: a ridiculously simple weather API

This week I was talking about open data at the Netherlands Environmental Assessment Agency, together with several others from the Appsterdam community. In the process, I found myself needing some very basic real-time weather data: what are the current wind direction and wind speed in the Netherlands?

This data is collected and published by the KNMI for 36 places in the Netherlands. However, to my surprise, there is no structured format available. All they have is an HTML table. So despite thousands of open datasets having been published in the Netherlands, there is no structured way to find the current temperature on Texel.

I found this so incredibly ridiculous that I went ahead and built it myself: the WeerAPI. It scrapes the KNMI website and, for now, has a single call for the current conditions at the 36 measuring stations. I enrich the data with the wind direction in degrees, the wind speed in Beaufort and a very rough geographical location of the measuring stations. It’s free to use and open source, but you’ll have to credit the KNMI if you use the data. The data is updated every 10 minutes.
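The WeerAPI source isn't reproduced here, but the two enrichment steps can be sketched. The sketch assumes the KNMI table gives wind direction as Dutch 16-point compass abbreviations and wind speed in m/s. The Beaufort boundaries are the standard thresholds in m/s:

```python
from bisect import bisect_right

# Lower bounds (m/s) of Beaufort forces 1 through 12.
BEAUFORT_LOWER_BOUNDS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8, 13.9,
                         17.2, 20.8, 24.5, 28.5, 32.7]

# Assumed Dutch abbreviations for the 16 compass points (O = oost, Z = zuid).
COMPASS_POINTS_NL = ["N", "NNO", "NO", "ONO", "O", "OZO", "ZO", "ZZO",
                     "Z", "ZZW", "ZW", "WZW", "W", "WNW", "NW", "NNW"]


def wind_direction_degrees(abbreviation):
    """Convert a compass abbreviation to degrees, clockwise from north."""
    return COMPASS_POINTS_NL.index(abbreviation) * 22.5


def beaufort(speed_ms):
    """Convert a wind speed in m/s to a Beaufort force from 0 to 12."""
    return bisect_right(BEAUFORT_LOWER_BOUNDS, speed_ms)
```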

Wednesday, July 3, 2013

Appsterdam lunchtime lecture summary 3: Code for Europe

This is part of my summary series of the Weekly Wednesday Lunchtime Lectures, an initiative to allow people in Appsterdam to talk about technology and share knowledge, allowing participants to receive training in public speaking. The lectures cover a wide range of topics related to making apps on any platform, from technical to non-technical including computer languages, modelling, testing, design, marketing, business philosophy, startups, strategizing, and more.

Today’s lecture was about Code for Europe, by Ohyoon Kwon, Giovanni Maggini and Piotr Steininger.

Ohyoon, Giovanni and Piotr have spent the last six months working on three different challenges, for stadsdelen Oost, West and Zuid. Code for Europe aims to solve local civic challenges with temporary agile teams, in a way that makes the solutions reusable for other cities.

In Amsterdam, they were hosted by the Waag Society and the city of Amsterdam. They worked with three runners from the municipality, who presented the challenges. The aim is to create a collaborative environment between municipalities, local communities, and developers and designers.

Stadsdeel West

In stadsdeel West, they worked on social problems in deprived neighborhoods. One of the problems for the stadsdeel is the lack of a common information system to support practices across the various organisations.

The fellows went outside, into the neighborhood, together with experts from the stadsdeel. This was followed up with brainstorming sessions, looking at all the existing tools, how they worked and how they could be used, and then making a plan for further steps. The concept was first tested using existing software, and they made paper prototypes first to make sure their implementation would fit the needs.

Stadsdeel Oost

In stadsdeel Oost, the Indische buurt has a very high demand for community space, as the community there is very active. The fellows built a web application that allows citizens to manage facilities and reserve rooms.

The Indische buurt has four active community clusters, with about 22,000 citizens involved. The existing booking system for shared spaces consisted of flyers and paper calendars posted in the shared facilities. Management was done through post-its and phone calls. The system was unreliable and unfriendly.

The fellows identified immediate needs and prototyped the system together with the users. This resulted in an open source social neighborhood platform, which currently features room booking and room management.

A challenge in projects like this is ensuring long-term maintenance. It’s open source, so other people can already contribute, and they’d like future fellows to continue the project. Next steps are to complete the booking portion, allow publishing of potential initiatives, and organize a workshop with people from the community.

Stadsdeel Zuid

In stadsdeel Zuid, the fellows worked to attract more tourists. They try to help the tourist crowd find the less commonly visited areas, outside the city center, and particularly in stadsdeel Zuid.

They built Take a Hike, which turns exploring these areas into a fun and engaging game experience: an offline playable scavenger hunt. They started with manned checkpoints and QR codes, but both had practical difficulties. Currently, it’s based on the compass. To validate the concept, they spoke to many people within the city and showed paper prototypes to people on Museumplein. It was a real challenge to finish the app in time for the June 22nd deadline, but they managed to publish it on time. The app is open source as well.

The app runs on a custom CMS backend which makes it easy to maintain the routes for the city. They also collect check-ins, to see how well the app is used and which places are most popular. The backend is built on Ruby on Rails and deployed on Heroku. Both iOS and Android are native apps, which was chosen to create a smooth experience, support proper multithreading and have simple sensor access.

On the launch day of the app, it rained most of the day. Few people were on the street, and in the end they decided to do a self-test. This still brought up several problems, like having markers inside closed buildings and problems with the Dutch version.

Tuesday, June 18, 2013

A basic guide to when and how to deploy HTTPS

Many web developers know about SSL, but it is very common to see it only partially deployed, or not deployed where it should be. This basic guide on when and how to deploy SSL will help you avoid the most common mistakes.

Key points

  • If you have any kind of confidential information, or if you have logins, even if they are just for admins, you should deploy HTTPS. The risks are not theoretical.
  • Never deploy HTTPS partially: use it for all content, or many risks are left open, like the interception of session IDs, which are almost as valuable as passwords.
  • When you deploy HTTPS, make sure all requests are served over HTTPS, by redirecting any plain HTTP requests to HTTPS URLs.
  • Enable strict transport security (HSTS) to further reduce the opportunity for attacks.
  • Set the secure flag on your cookies, like the session cookie, to make sure they don’t leak out through plain HTTP requests.

What is HTTPS?

HTTPS refers to layering HTTP on top of SSL/TLS. The result is that all HTTP traffic, including requested URLs, result pages, cookies, media and anything else sent over HTTP, is encrypted. Someone interfering with the connection can neither listen in on the traffic nor change it. In addition to encryption, the identity of the remote server is verified: after all, an encrypted connection is a lot less useful if you don’t know who’s at the other end. As a result, it becomes incredibly difficult to intercept the traffic. It might still be possible to know which websites a user is visiting, but no more than that.
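To illustrate the layering, this sketch sends a raw HTTP request over a TLS socket using Python's standard `ssl` module. `create_default_context()` enables both certificate and hostname verification, which is the identity check described above. The function name is hypothetical:

```python
import socket
import ssl


def fetch_over_https(host, path="/"):
    """Send one plain HTTP/1.1 request inside a verified TLS connection."""
    context = ssl.create_default_context()  # verifies certificate chain and hostname
    with socket.create_connection((host, 443)) as sock:
        # The TLS handshake fails if the certificate doesn't match `host`.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                       "Connection: close\r\n\r\n")
            tls.sendall(request.encode("ascii"))  # ordinary HTTP, encrypted by TLS
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```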

When and why should I deploy HTTPS?

You should deploy HTTPS whenever your website has any kind of non-public information. That includes any website that has logins: after all, if it were public information, it would not need a login. It also includes logins used only by administrators, like in your typical WordPress website.

You should deploy HTTPS because without it, even someone doing passive eavesdropping, i.e. just listening to the network traffic and not manipulating it, can read along with all HTTP traffic, including any passwords or authentication tokens.

This is not a theoretical attack. I have done this myself (with permission) several times - this is particularly easy on public hotspots. Public hotspots typically apply no wifi encryption, which makes it trivial to eavesdrop on all traffic. This is a very common setup in bars, hotels, trains, and other public places. In other words, if your users sometimes use your website from a public hotspot, and you do not use HTTPS, anyone in the vicinity can listen in on all their traffic. This isn’t the only case where eavesdropping might happen, but it is a very easy one.

What if I just use HTTPS for my login page?

No. Using HTTPS just for the login page will prevent your users’ passwords from being eavesdropped, but this is only part of the problem.

First of all, the less HTTPS on your website, the easier active interception becomes: your login link might point to an HTTPS URL, but if I change that link before the user clicks on it, that will not help you. But using HTTPS partially also leaves you open to passive interception.

Verifying a username and password is only one part of authenticating users on the web: we also need to remember that a particular user was authenticated, and which account they authenticated with. The most common method is session cookies. Typically, this means the browser stores a long random string, the session ID, in a cookie. PHP, for example, uses the PHPSESSID cookie for this. A database on the server side then knows that that random string belongs to a particular session, in which a particular user authenticated themselves. If I somehow acquire the session ID of your session after you log in, I acquire all the permissions you have: almost as good as having your password.

Knowing this risk, the session ID is very long and random, and has a limited lifetime, so it can’t simply be guessed: this is what makes it safe enough. But because of the way cookies work, the browser includes the cookie in every request it makes to your website. So even long after login, every page you request, even one that is normally public, will cause the browser to send your session cookie. If someone is eavesdropping at that point, they can still compromise your account.
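Here is a minimal sketch of server-side session handling. A plain dict stands in for the database, and the two-hour lifetime is hypothetical. The sketch also shows why a stolen cookie is so valuable: `user_for_session()` trusts anyone who presents the ID.

```python
import secrets
import time

SESSION_LIFETIME_SECONDS = 2 * 60 * 60  # hypothetical two-hour lifetime

# Session ID -> (username, expiry timestamp); a real site would use a database.
sessions = {}


def create_session(username):
    """Create a long, random, unguessable session ID after a successful login."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    sessions[session_id] = (username, time.time() + SESSION_LIFETIME_SECONDS)
    return session_id


def user_for_session(session_id):
    """Return the user for a session ID, or None if unknown or expired."""
    entry = sessions.get(session_id)
    if entry is None:
        return None
    username, expires = entry
    if time.time() > expires:
        del sessions[session_id]  # enforce the limited lifetime
        return None
    return username
```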

The same can happen when you only place the administrator part of your website behind SSL: when you log in and later visit the non-SSL public part, the browser will still be sending the session cookie.

In short: as session cookies, which allow access to the user’s account, are sent in every request, simply securing the login page is absolutely insufficient.

How do I enable HTTPS properly?

Enforce HTTPS usage

Some websites buy an SSL certificate, configure it on their web server, and assume they’re done. But that just means you enabled the option of HTTPS - which users are unlikely to notice. To make sure everyone benefits from your HTTPS support, you should redirect all requests that come in on HTTP, to HTTPS. That means any user visiting your site will automatically be switched over to HTTPS, and from that point on their traffic is secure.
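This redirect is usually configured in the web server itself. As an alternative, here is the same idea as Python WSGI middleware. It assumes the WSGI server sees the original scheme, which is not the case behind a TLS-terminating proxy:

```python
from urllib.parse import quote


def redirect_to_https(app):
    """WSGI middleware: answer every plain-HTTP request with a redirect to HTTPS."""
    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)
        host = environ.get("HTTP_HOST") or environ["SERVER_NAME"]
        # PEP 3333 passes PATH_INFO as latin-1 decoded bytes; re-encode before quoting.
        path = quote(environ.get("PATH_INFO", "").encode("latin-1"))
        url = "https://" + host + path
        query = environ.get("QUERY_STRING")
        if query:
            url += "?" + query
        start_response("301 Moved Permanently",
                       [("Location", url), ("Content-Length", "0")])
        return [b""]
    return middleware
```

With this middleware, a request for `http://example.com/login?next=/` is answered with a 301 redirect to `https://example.com/login?next=/`.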

This still leaves a small gap: the first time the user makes a request to your website, they will use plain HTTP, and they may already transmit confidential information at that time. It also leaves a small man-in-the-middle hole open.

Strict transport security

For further tightening, enable HTTP strict transport security (HSTS). This is a special header that can be sent by the server, which indicates: for a defined time period, you must not access this website over plain HTTP, or access it over HTTPS when it has a dodgy certificate. Optionally, subdomains can be included as well.

It’s a simple server header, and trivial to configure. Note though that there is no way to revert the setting before the max-age has expired, so don’t make it too long. You use HSTS next to an HTTPS redirect, not in place of it.
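Here is a sketch of adding the HSTS header in WSGI middleware. `max_age` is in seconds, and it deliberately has no default because of the warning above about long values:

```python
def add_hsts(app, max_age, include_subdomains=False):
    """WSGI middleware adding a Strict-Transport-Security header to HTTPS responses."""
    value = f"max-age={max_age}"
    if include_subdomains:
        value += "; includeSubDomains"

    def middleware(environ, start_response):
        def start_response_with_hsts(status, headers, exc_info=None):
            # Browsers ignore HSTS received over plain HTTP, so only add it on HTTPS.
            if environ.get("wsgi.url_scheme") == "https":
                headers = list(headers) + [("Strict-Transport-Security", value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_hsts)

    return middleware
```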

Secure cookies

Cookies, including the session cookie, have an optional flag called secure. This basically means: “never send this cookie over a plain HTTP connection”. Enable this flag on your cookies, and they will not be sent with the initial plain HTTP request the browser makes, but only once the connection has switched to HTTPS and can no longer be eavesdropped on.
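Setting the secure flag with Python's standard `http.cookies` module might look like this. The cookie name `sessionid` is hypothetical:

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build the value of a Set-Cookie header for a secure session cookie."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["secure"] = True  # never sent over plain HTTP
    cookie["sessionid"]["path"] = "/"
    return cookie.output(header="").strip()
```

The result is used as the value of a `Set-Cookie` response header, for example `("Set-Cookie", session_cookie_header(sid))` in a WSGI header list.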

Can I just deploy SSL for authenticated users?

No. Once you’ve followed the guidelines above, at the moment a user makes a plain HTTP connection, you do not know whether they are authenticated. That’s the whole point: they should not transmit any secret information, like their session cookie, until they are on SSL.

Although I can imagine some ways to work around this, they might break at some point. As the cost of SSL is really quite low nowadays, it’s not worth it.

Monday, June 17, 2013

A dismal tale of gathering new open data

In September 2012, I started work on Bike Like a Local, my app for biking tourists in Amsterdam. It uses open government data, and won the 2nd prize in mobility at Apps for Amsterdam 2013.

In Amsterdam, many tasks, like bike parking, are delegated to local districts, the stadsdelen. At the time, I had bike parking data from DIVV and stadsdeel Centrum, but none from the other six stadsdelen. As a small experiment, I decided not to ask any of my contacts, but to simply ask the stadsdelen for the data through their website, the way someone would do it if they were new to open data, and see what would happen.

I started by emailing every stadsdeel on the general address listed on their website, or their contact form. I told them I was working on an app for Apps for Amsterdam, that stadsdeel Centrum had already published open bike parking data, and asked them to publish their records as open data as well.

I am rather stunned by the results:

  • The first to either send usable data or inform me that they had no record of it was stadsdeel Oost, just under three months after my initial request.
  • The first to send usable data was stadsdeel Nieuw-West, 6.5 months after my initial request.
  • More than half of the stadsdelen sent no reply at all until I called my mail a WOB-request, which requires them to pay a fine if they persist in not replying. Considering this difference, I will now say “WOB-verzoek” in any question I ever ask the government again.
  • The slowest was stadsdeel Zuidoost, which took 9 months from my initial mail to inform me that they had no record of bike parking facilities.
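The day counts below are calendar days since my first mail of September 8, 2012 (stadsdeel Oost’s second request is the exception, counted from April 6). A small standard-library sketch to check them; the constant and function names are just illustrative:

```python
from datetime import date

# Date of the initial mail to every stadsdeel.
INITIAL_MAIL = date(2012, 9, 8)


def days_since_initial_mail(day: date) -> int:
    """Calendar days between the initial mail and the given date."""
    return (day - INITIAL_MAIL).days


for label, day in [
    ("Nieuw-West: first human reply", date(2013, 1, 22)),
    ("Nieuw-West: data received", date(2013, 3, 22)),
    ("Zuidoost: no record of data", date(2013, 6, 10)),
]:
    print(f"{label}: {days_since_initial_mail(day)} days")
```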

Stadsdeel Nieuw-West

Time until first human reply: 136 days (4.5 months)
Time until receiving requested data: 195 days (6.5 months)

Not having received a reply to my mail of September 8, 2012, I emailed Nieuw-West again on November 29 (82 days), saying I had not heard from them, and that Bike Like a Local had won a prize in Apps for Amsterdam in the meantime.

On December 3 (86 days after my first mail), they informed me that they had received my ‘letter’ of November 29, by writing this in a letter, printing it out on letterhead paper, scanning it, and sending me a PDF.

On January 22 (136 days) I got the first human reply. They apologised for taking so long and were quite enthusiastic about Bike Like a Local. On my suggestion, as their office is nearby, we arranged a meeting, and had a nice chat about my app, biking and tourists in this stadsdeel.

Finally, on March 22 (195 days) I received a copy of their bike parking data.

Stadsdeel West

Time until first human reply: 3 days
Time until receiving usable data: 249 days (8 months)

Stadsdeel West was the only one to respond to my initial email, with a human reply after just three days. We mailed back and forth a bit about what exactly I wanted to do.

However, it took until April 10 (214 days) before I received the data. Unfortunately, it was in the form of a giant PDF file, which was impossible to work with or integrate.

On May 15 (249 days) I received an Excel file of their bike parking facilities.

Wet openbaarheid van bestuur

Before going into the story with the other stadsdelen, I should explain a bit of background: the Wet openbaarheid van bestuur (WOB), the Dutch freedom of information law. It roughly dictates that anyone can ask the government for information it holds, and that the government may not withhold it without good reason. If a government body does not reply to a WOB-request, it may have to pay the requester a fine of up to € 1200. It has four weeks to reply; after that, the requester needs to send a reminder, and two weeks after the reminder the fine starts, beginning at € 20 per day.
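The two periods are easy to mix up, so here is a small sketch of the timeline as summarised above, assuming both periods simply run from the date the request or reminder was sent. How the law counts them exactly (for instance from receipt of a letter) may differ; the function names are hypothetical.

```python
from datetime import date, timedelta

REPLY_PERIOD = timedelta(weeks=4)    # time the government has to reply
AFTER_REMINDER = timedelta(weeks=2)  # time between reminder and start of the fine


def reply_deadline(request_date: date) -> date:
    """Date on which the four-week reply period runs out."""
    return request_date + REPLY_PERIOD


def fine_start(reminder_date: date) -> date:
    """Date on which the fine starts: two weeks after the reminder."""
    return reminder_date + AFTER_REMINDER


print(reply_deadline(date(2013, 4, 29)))
print(fine_start(date(2013, 5, 28)))
```

For stadsdeel Zuidoost’s letter of April 29, `reply_deadline` gives May 27, the day before the reminder described below was sent.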

Stadsdeel Oost

Time until reply on first request: 86 days (almost 3 months)
Time until reply on second request: 30 days (1 month)

Stadsdeel Oost sent their first reply on December 3 (86 days), translated:

As you say in your email, you have not received a reply from stadsdeel Oost. The most important reason for this is that we do not have information about the location and amount of bike parking facilities.

On April 6, I contacted them again, through their general email address, to ask whether they had designated any areas where bicycles are removed if they are not parked in a designated facility. I wanted to know this because DIVV had told me this was a real issue for tourists.

Having received no reply, on May 2 (26 days later) I asked the same question again, making it clear that this was a WOB-request. I received a full reply on May 6 (30 days after my April 6 request).

Stadsdeel Noord

Time until first human reply: 227 days (7.5 months)
Time until informing data does not exist: 227 days (7.5 months)

Stadsdeel Noord failed to respond to my mails from September 8 and November 29. On April 6 (210 days), I sent them a WOB-request, with the additional question: “have you designated any areas where bikes will be removed if they are parked outside of designated facilities?”.

On April 23 (227 days), they replied by letter in the same manner Nieuw-West initially did: printed on letterhead paper, scanned, and attached as a PDF to an email. The reason for this process is unknown to me, but at least it was a usable reply.

The letter stated that they were aware of fietsnietjes (simple bike racks) in three locations. There may be others, but they have no record of those. They have not designated any areas where bikes must be parked in facilities.

Stadsdeel Zuid

Time until first human reply: 219 days (7 months)
Time until receiving all requested data: 264 days (almost 9 months)

Stadsdeel Zuid failed to respond to my mails from September 8 and November 29. On March 26 (199 days) Jasper, who was helping me as part of the open for business program, mailed a contact in stadsdeel Zuid directly. There was no reply.

On April 15 (219 days), I sent them a WOB-request, adding the question about areas with strict bike parking enforcement, as I did for stadsdeel Noord. The same day they forwarded me their records of the strict parking enforcement areas; this was the first human reply. They informed me they were still working on the bike parking records.

On May 14 (248 days), I sent them a reminder, as the four-week deadline had expired. Finally, on May 30 (264 days), I received a copy of their bike parking records, coincidentally from the same person Jasper had mailed 65 days earlier.

Stadsdeel Zuidoost

Time until first human reply: 230 days (7.5 months)
Time until informing data does not exist: 275 days (9 months)

Stadsdeel Zuidoost failed to respond to my mails from September 8 and November 29. On April 6 (210 days), I sent them a WOB-request, adding the question about areas with strict bike parking enforcement, as I did for stadsdeel Noord.

On April 26 (230 days), they replied that stadsdeel Zuidoost does not accept WOB-requests by email, and asked me to send it by letter. The law allows this limitation, but I fail to understand how it took 20 days to come up with this reply. On April 29 (233 days), I sent them the WOB-request by letter.

On May 28 (262 days), I sent them a reminder, as the four-week deadline had expired. Finally, on June 10 (275 days), I received a letter saying they have no record of either bike parking facilities or areas with strict bike parking enforcement.

Conclusion

Other than gathering data to use in Bike Like a Local, my goal was to discover how easy it would be for someone without contacts in the open data scene to get governments to provide new open data. The results are clear: it is very difficult and extremely slow.

There is absolutely no way that someone building an independent business on open-data-based products, which is a main focus of the Dutch open data program, can wait this long every time they have a new question. Also, as far as I know, none of the data provided to me has made it onto any of the open data portals. The problem is clear, but how do we improve the situation?

Wednesday, May 29, 2013

Appsterdam lunchtime lecture summary 2: Appsolute Value

This is part of my summary series of the Weekly Wednesday Lunchtime Lectures, an initiative to allow people in Appsterdam to talk about technology and share knowledge, allowing participants to receive training in public speaking. The lectures cover a wide range of topics related to making apps on any platform, from technical to non-technical including computer languages, modelling, testing, design, marketing, business philosophy, startups, strategizing, and more.

Today’s lecture was Appsolute Value, by Michael van den Berg.

Appsolute value

Appsolute Value is an agency that focuses on multi-platform app development. Michael’s background is in large organisations, so he has seen how these organisations handle the challenges posed by the rise of mobile. For large organisations, it’s a big disruption of traditional IT: mobile technology is not just another channel.

Michael helps these organisations develop business apps. In his view, what really changed with the rise of mobile is the consumer, and what they want. Consumers have more control, and know what they want, and how and when they want it. As Michael puts it, the customer is no longer king, but a dictator. Traditionally, companies have provided information and software in a controlled way, the way the company thought was best for the customer, but now this is turning around.

Initially, Michael mostly saw companies work on pure mobile websites. This then moved towards native apps, because organisations wanted to be visible on platforms like the App Store. The native apps and mobile websites initially cost businesses millions, not so much for making the apps themselves, but for integrating them with their large legacy IT systems. Michael says it was also very expensive to create a native app for each of iOS, Android, BlackBerry and Windows Phone. Although iOS and Android take most of the market, the organisations he encounters often want all platforms.

In enterprise apps, BYOD is increasing the pressure on large organisations to let employees do their work with apps. To reduce costs, they move towards multiplatform development and hybrid web-native apps. Especially in the US, Michael sees multiplatform development tools like PhoneGap being really embraced. In his experience, this also dramatically reduces post-implementation costs for maintenance, experience and security.

Michael shows a graph from Forrester, which says that today, much of the effort in mobile is in building the initial applications. However, in a few years Forrester expects a majority of effort to go into re-inventing the business processes and backend systems.

The main requirements in mobile technology are security, performance, multi-OS support and integration. In security, it’s important to be able to secure data against both employees and customers. Performance is always a high priority for Michael’s clients, and this is where they still see issues with HTML5, though not everyone in the audience agrees, which sparks some active discussion.

Business process appification

Customers can typically use a whole range of apps, and will slowly move more from the web to apps. Michael challenges us to think of how enterprises could use our apps. He uses the example of iKringloop, which sold their app to the city of Amsterdam. If we can make the lives of enterprise employees better, the enterprises might be interested in our app. People in organisations often simply don’t know how mobile can help them, until it is shown to them.

Tuesday, May 28, 2013

CitySDK @ WaagOpenSpace

CitySDK offers an API which makes it easy to build digital services that use open data. At CitySDK @ WaagOpenSpace, developers of the API talk about the API and what they plan to do with it. This is my summary of the evening.

Job Spierings: CitySDK introduction

Job Spierings started the introduction by reiterating that open data for the CitySDK is important for transparency; releasing social and commercial value; and participation and engagement. However, many open data users run into the familiar issues: different formats, sometimes proprietary; differing refresh rates in data; potential privacy issues, like cellphone numbers for workers in road works data; and legal issues caused by datasets having many different owners with different interests.

CitySDK aims to solve these issues for developers. On Tim Berners-Lee’s five-star scale for open data, they aim for four to five stars. CitySDK does not just take data from governments, but also uses crowdsourced data, to create what they call crowd enhanced data.

The project is set up to become a one-stop shop for developers, with pan-European access. Current cities are Lisbon, Barcelona, Manchester, Amsterdam, Rome, Helsinki, Lamia and Istanbul. The whole project is also open source. Not all datasets are available in all cities. Maps, based on OpenStreetMap, are available in all cities; public transport is available in most.

They’ve built a few demo applications, like 'now', which shows departure times at nearby public transport stops. This has been done before, but what is unique about this app is that it works in cities in various European countries, using only a single API. They are also working on the Open City Dashboard, which municipalities can use to communicate the state of the city to citizens.

Bert Spaan: Meet the City SDK API

Bert Spaan takes us into the CitySDK platform. He’s one of the backend developers of the CitySDK mobility API, and uses the example of Amsterdam Central Station. There are many datasets available today, like OpenStreetMap, the NS realtime departure times, and even Foursquare. But they are all very difficult to link. By using common identifiers, as Tim Berners-Lee intended, data can be linked.

CitySDK takes these identifiers from OpenStreetMap, using it as a geographic base layer. They then map other datasets, like GVB departure times, to these OSM nodes. OSM works well for this, because it’s very complete. Bert recently went to Manchester, and the same principles apply there.

Linking datasets is incredibly hard. There are many different spellings and names, and mapping on geographical proximity alone is not sufficient. Some of the processing in CitySDK is automatic, but they still do some handwork. An example of this is the DIVV realtime travel times dataset. This data contains coordinates of the roads for which the speed is measured, but those coordinates do not exactly overlap with OpenStreetMap roads. They’ve developed algorithms that map these quite well.
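To illustrate why proximity alone is not enough, here is a simple heuristic that only links a record to an OSM node when the node is both nearby and has a similar name, leaving everything else for handwork. This is a generic sketch, not CitySDK’s actual algorithm; the thresholds, the record layout and the use of `difflib` for name similarity are all assumptions.

```python
import math
from difflib import SequenceMatcher


def distance_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def normalise(name):
    """Lower-case and strip punctuation, so spelling variants compare better."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())


def match_to_osm(record, osm_nodes, max_distance_m=150.0, min_similarity=0.6):
    """Return the id of the best-matching OSM node, or None for manual review.

    A candidate must be within max_distance_m AND have a similar name;
    proximity alone would happily link a stop to the wrong nearby object.
    """
    best_id, best_score = None, 0.0
    for node in osm_nodes:
        if distance_m(record["lat"], record["lon"], node["lat"], node["lon"]) > max_distance_m:
            continue
        score = SequenceMatcher(None, normalise(record["name"]), normalise(node["name"])).ratio()
        if score >= min_similarity and score > best_score:
            best_id, best_score = node["id"], score
    return best_id
```

Each record is assumed to be a dict with `name`, `lat` and `lon`; OSM nodes additionally carry an `id`. Anything that returns `None` would go on the list for manual checking.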

Bert gives us a demo of the map viewer, which is a really nice way to explore CitySDK data. I’m quite impressed: a nice example is linking the OpenStreetMap object of Artis to the ArtsHolland API, so that you can go directly and reliably from the OSM object to the next 'Knuffelen met kleine dieren' (cuddling small animals).

Laurens Schuurkamp: CitySDK globe

Laurens is a frontend developer at the Waag Society. A year ago, he built a tool that aggregated all 32 open datasets available for Amsterdam at that time. The biggest problem, however, was that they could not be combined.

They have now built a wonderful new visualisation, the CitySDK globe (requires WebGL). Again, the most impressive part is that most of the data they can show for Amsterdam is available for Manchester and other cities as well.

Roel Obdam

Roel is a student who works on combining external datasets with CitySDK. For example, one could take all tourism nodes from OSM, and then run another query against a different API that adds data about museums, from an API that is not included in CitySDK itself.

They use DBPedia for matching. Unfortunately, it’s not entirely reliable at this time. After querying both datasets, a user can define which attributes should be linked to each other. It’s not specific to CitySDK: it will be able to match any two datasets to each other.
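The “user defines which attributes should be linked” step can be sketched as a plain exact-match join on chosen attribute pairs. This does not model the DBPedia-based matching Roel described; the function name, the case-insensitive comparison and the record layout (dicts) are assumptions for illustration.

```python
def link_datasets(left, right, attribute_pairs):
    """Link records from two datasets on user-chosen attribute pairs.

    attribute_pairs is a list of (left_attribute, right_attribute). Two records
    are linked when every chosen pair has equal values, compared after trimming
    whitespace and ignoring case. Records missing any chosen value are skipped.
    """
    left_attrs = [a for a, _ in attribute_pairs]
    right_attrs = [b for _, b in attribute_pairs]

    def key(record, attrs):
        values = tuple(
            "" if record.get(a) is None else str(record.get(a)).strip().casefold()
            for a in attrs
        )
        return values if all(values) else None

    # Index the right-hand dataset once, so linking is one lookup per left record.
    index = {}
    for rec in right:
        k = key(rec, right_attrs)
        if k is not None:
            index.setdefault(k, []).append(rec)

    links = []
    for rec in left:
        k = key(rec, left_attrs)
        if k is not None:
            links.extend((rec, match) for match in index.get(k, []))
    return links
```

Because it works on any two lists of dicts, it is, like Roel’s tool, not specific to CitySDK; a real matcher would need fuzzier comparison than exact, case-insensitive equality.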

Jonathan Carter: Towards a multimodal API

Jonathan from Glimworm IT talks about “Towards a multimodal API”, a report for the Department of Infrastructure, Traffic and Transportation of the city of Amsterdam, which Glimworm wrote together with myself and Braxwell.

The purpose of the project was to see to what extent it is currently possible to make the perfect multimodal API, which areas need improvement, and what DIVV can do. We focused on route planning and real time monitoring of planned routes. One of the results is a proof of concept which does multimodal planning with different data sources. For example, you could go from Woerden to the Rijksmuseum by car all the way, bike all the way, OV all the way, or bike to a station and then use OV, or take a car to any of the park and rides and then OV, and so on.

The biggest challenge is prediction. If the planner sends someone to a park and ride, because that’s good for both the traveller and the city, and the park and ride is full, the traveller will be very disappointed, and might never consider this again. This problem extends to pretty much all data: realtime info focuses on the state right now, but there are no good predictions for the future. This is especially critical when a traveller has to switch between modes of transport.
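A toy version of such a planner is a shortest-path search over a graph whose edges are labelled with a transport mode. The sketch below uses Dijkstra’s algorithm and lets the caller exclude nodes, such as a park and ride that a (not modelled here) prediction says will be full. This is not the proof of concept from the report: the function, the network and all travel times are invented for illustration.

```python
import heapq

# Invented network and travel times in minutes: (from, to, mode, minutes).
EXAMPLE_EDGES = [
    ("Woerden", "Rijksmuseum", "car", 55),
    ("Woerden", "Woerden station", "bike", 8),
    ("Woerden station", "Amsterdam Centraal", "train", 35),
    ("Amsterdam Centraal", "Rijksmuseum", "tram", 15),
    ("Woerden", "Park and ride", "car", 35),
    ("Park and ride", "Rijksmuseum", "tram", 12),
]


def plan(edges, start, goal, unavailable=frozenset()):
    """Fastest route as (minutes, [(from, to, mode), ...]), or None if unreachable.

    Nodes in `unavailable` (e.g. a park and ride predicted to be full) are skipped.
    """
    graph = {}
    for a, b, mode, minutes in edges:
        graph.setdefault(a, []).append((b, mode, minutes))
    queue = [(0, start, [])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # already reached this node via a faster route
        settled.add(node)
        for nxt, mode, minutes in graph.get(node, []):
            if nxt not in unavailable and nxt not in settled:
                heapq.heappush(queue, (cost + minutes, nxt, path + [(node, nxt, mode)]))
    return None


print(plan(EXAMPLE_EDGES, "Woerden", "Rijksmuseum"))
print(plan(EXAMPLE_EDGES, "Woerden", "Rijksmuseum", unavailable={"Park and ride"}))
```

The `unavailable` set only marks where a prediction would plug in; producing a reliable prediction of whether the park and ride will be full is exactly the hard part described above.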

Rein van Oost: Expected Time of Arrival

Rein presents “Expected Time of Arrival”, their idea for an app on the CitySDK. They’ve been making open data apps since 2008. They were asked to come up with a practical application of the CitySDK data.

Their idea, “Estimated Time of Arrival”, uses the CitySDK mobility data. The concept is to share your current position, direction and destination with others. The app estimates the time of arrival at the destination, so other people don’t have to ask you when you will be home or whether you will be on time.
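The core estimate can be sketched in a few lines: take the distance from the current position to the destination and divide it by an average speed. The app itself presumably uses real routes and the CitySDK mobility data; the straight-line distance, the constant speed and the function names here are simplifying assumptions.

```python
import math
from datetime import datetime, timedelta


def distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def estimate_arrival(now, position, destination, speed_kmh):
    """Estimated arrival time, assuming a straight line at a constant average speed."""
    if speed_kmh <= 0:
        raise ValueError("speed_kmh must be positive")
    km = distance_km(position[0], position[1], destination[0], destination[1])
    return now + timedelta(hours=km / speed_kmh)


print(estimate_arrival(datetime(2013, 5, 28, 18, 0), (52.3791, 4.9003), (52.3600, 4.8852), speed_kmh=15))
```

Sharing the result instead of the raw position is what lets others see when you will be home without having to ask.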