Device emulator: Still work in progress

In a previous post I explained the reasons why all devices should have their code source available under a license that at least permits to modify it. For devices that are meant to be integrated with third party software, an emulator should also be available.

I see this problem all the time: VoIP vendors obviously want their phones integrated in whatever software VoIP provider are deploying, but do not provide any reasonable solution to repetitively test the integration of these devices. It starts at development time: A developer can certainly get on its desk the minimal 3 phones that are needed to test most call scenarios, but it gets quickly annoying to have to repetitively pick up and answer these phones to test the code. Even if all the annoyances that come with this did not matter (noise, cost, space on desk etc…), we live in a world were automatic regression testing is (or should) be mandatory, and there is not much of it that can be done when using a real phones. In the beginning of VoIP we only had VoIP adapters (into which you plug a Plain Old Telephone Service – POTS), and it was relatively easy to use a voice modem in place of the POTS to build an emulator, but now these kind of adapters are only used for fax machines and all the other devices are VoIP phones. To automatize the usage of these phone we need a way to press buttons, inject and extract audio and sometimes video streams without any user intervention. Some people develops hardware solutions based on various micro-controllers, Arduino and others, but the point is that providing a reliable emulator should be the responsibility of the hardware vendor.

And these days it is not as difficult as it was in the past to do so. More and more embedded designs are built around Embedded Linux which has its own emulator, QEMU. For example all the Android devices provided by Google can be emulated this way, making it easy to repetitively test an application for these devices. A few years back I used this to show a demo that involved three Android phones, all simulated in a laptop and real devices were only used to demo the quality of the audio. That made the demo easier to follow, and far easier to set up.

But the point of an emulator is that it should be running the exact same software than the real device, so the first thing that an vendor must do is to add its hardware into the QEMU emulator. And that is unfortunately a task that can be easily done only by the vendor, as a lot of the times the datasheet for the hardware is only available under NDA. For the Nephelion project – which will be distributed with emulator and source code for its device – we were not able to build an emulator using the same exact binary because of this reason. We have plans to switch to hardware that will be easier to emulate, but there is still another major issue that will prevent us to get an emulator that is 100% identical to the real hardware:

The Nephelion device is a USB gadget, but QEMU does not permit to connect the gadget inside the guest and make it visible in the host as if is was a real device. There was some experimental patches doing this a long time ago around the Openmoko project, but it seems that these patches were never integrated into QEMU.

We are using usbip, which makes our device emulator again different from the real thing as the code inside the device now has to detect that it is running on an emulator so it can start the usbip daemon and bind the dummy hcd to it. And that’s without even talking about the fact that usbip does not yet support USB 3.0.

So for now we have an emulator that is probably good enough for development and testing but there is too many differences with the real device to consider it a complete replacement that can detect the same range of problems.

A better analogy in Oracle vs Google: API is jargon

As a software developer I know that in Oracle vs Google, Google and Judge Alsup are right and Oracle and the White House are wrong. But perhaps we need a better analogy to explain to non-developers that APIs are not copyrightable.

A word of caution: I am definitively not a lawyer, and you will not find even the beginning of a legal argument there. This is just an attempt of explaining by analogy what I think APIs are, not only as someone who make a living from using them, implementing them and when forced to do so, inventing new one, but also as someone who will greatly suffer if the minefield that is software development is made even more dangerous if API are deemed copyrightable.

When I read some of the arguments from Oracle and in particular the book chapter titles analogy, I felt it contrived and only superficially explaining what APIs are – the analogy breaks down quickly when you try to look at it more deeply. So I came up with something a little bit better, which is that an API is a jargon.

Jargon, as defined by The Concise Oxford Dictionary is “words or expressions used by a particular profession or group that are difficult for others to understand.” Note that it is not a set of new words that are specific to a group or profession, but words that are used with a different meaning for the purpose of improving communication of ideas inside that group or profession.

As an example taken from the Computing group or profession, the word “protocol” as a very specific meaning. In the same dictionary it has this separate definition: “Computing; a set of rules governing the exchange or transmission of data between devices.” Note also that even inside a group or profession, sub-groups can develop their own jargon, making it difficult to communicate to each other, but greatly improving it inside these sub-groups: The IETF and the ITU can be a good example of that.

An API is very close to that: Ordinary words or arrangements of words (expressions) that have a different meaning. If we take the java.net package as an example, each of the word used (can be verbs or nouns, etc…) have a very different sense than is used generally: Socket.isConnected() does not mean to test that an electrical socket has a plug into it, it means something different in this jargon (namely that an identifier is associated with some state resulting of a successful protocol exchange).

And if we want to push the analogy a bit further, one can imagine a dictionary that explain each of the words and expressions used by a jargon (The New Hacker’s Dictionary is a good example of that). The definition of each word or expression would be copyrightable, and is equivalent to the implementation of an API. Someone can write a different dictionary for the same jargon, but could not copy word for word the definitions for the first one – that would be the equivalent of Google, taking the jargon known as the standard Java library (developed by Sun and many others through the JCP), and writing a new implementation.

As I said there is not a legal argument there, but the question that can now be asked is this: Although it is perfectly clear that in a jargon dictionary the text of the definitions is copyrighted, is the list of words and expressions defined by this dictionary also copyrighted? I believe that the same answer is applicable to APIs.

The better-than-open-source pledge

The choice of the license (free, open, or other) under which a piece of software is released is up to its copyright owner. But there is nothing that prevents a licensee to go beyond what was asked by the licensor.

A long time ago I was involved in discussions with a device vendor and my main requirement, as the person who had to make it work with our software, was to get the firmware source code under a license that would permit us to modify it and to install it in the devices that will be used by our customers. That seems the most obvious thing in the world: If there is a bug in the firmware, I want to be able to fix it immediately and under my own responsibility. Even if I was clear that the license I wanted would not permit any form of redistribution outside the devices that would have been paid for, this request was rejected. I never knew the real reasons but in my experience there is two main reasons companies refuse to share their code source: They do not own it in the first place, or they are ashamed of showing the source code to other developers.

Anyway, I still firmly believe that it should be illegal to sell a device without the source code under the kind of very restrictive license described above (an emulator should also be provided by the vendor, but that will be a discussion for another post). Now that I am working on my own device(see the Nephelion Project for more details), I plan to put my money where my mouth is and release all the source code under such licensing terms, so my future customers can do with it exactly what I want to do with the devices I buy. The closest license I could find is the AGPL3, so this is what I will be using.

As a producer of software, I highly value my own work, so if on the one hand I want my customers to be able to fix bugs and customize the software to their needs, on the other hand I still want some kind of payment for my work on the software. The AGPL3 license provides this as a form of “pay it forward” by forcing anybody that modified my software, for any reason, to publicly release these modifications under the same licensing terms.

Open Source licenses (as opposed to Free Software license) do not force this sharing. This is why consumers of software like these licenses – as consumers of software, we undervalue them because we do not want to spend any money or effort on it – or let investors know that we did not really write any new software.

What I think is wrong with this is that I should not ask others to release their modifications to my software and at the same time do not release my own modifications to software I use. As I said at the beginning, the license is the choice of the copyright owner, and only the copyright owner, and I am not saying that they should change this choice. But there is absolutely nothing that prevents me to treat a software released under an open source license as if it was released under a free software license, i.e. to publicly publish my modifications even if there is nothing in the license that force me to do so.

And this is why I pledge that, for any software that I will release under any FOSS license, I will treat all the modifications I made to all the dependent software used by that software as if they were licensed under the AGPL3 license, and release them publicly under a license that is compatible with the license of the dependent software.

What the Internet is doing with your packets

When working with Internet transports, either for analyzing them or designing a new one, it is important to keep in mind that the Internet can do a lot of strange things with your packets.

Disclaimer: I did a similar presentation at my former employer but I did not keep anything, notes or slides, when I left so this blog entry is a re-creation from memory.

For this discussion, we will reason about a very simplified Internet that still have all the properties of the real thing. Our first simplification will be that there is only 10 endpoints in this network, numbered from 0 to 9 (the real Internet has 6e57 possible addresses if IPv4, IPv6 and NATs are taken in account). Our second simplification will be that we can send only 26 different messages between these endpoints, from a to z (the real Internet can exchange 10e3612 different messages). With these limitations, we can now use a simple notation based on regular expressions (regex) to express sending a message from endpoint 0 to endpoint 9:

messages -->  regex

Here “messages” is a list of messages that are sent by endpoint 0 one after the other, and “regex” is a description of what endpoint 9 receives from endpoint 0.

So the question we are trying to answer here is “what regular expression can fully describe what endpoint 9 can receive if endpoint 0 sends message a, then message b to it?”

A first description could look like this:

ab --> ab

Meaning that if endpoint 0 sends message a then message b, then endpoint 9 will receive first message a and then message b.

Obviously that is not true as not only there is no guarantee that a message sent will be received, but dropping messages is one of the fundamental mechanism used by the Internet to ask the sender to reduce its sending rate. So a second description can take this in account:

ab --> a?b?

But there is also no guarantee that when sending two messages in succession they will arrive in the same order. The reason for this is that two messages can take different paths through routers, and so the first message can be delayed enough to arrive after the second one. Let’s try a better description:

ab --> (a|b|ab|ba)?

But if the Internet can drop a message, the Internet can also duplicate it. This is a rare condition that can happen for different technical reasons, but the fact is that one should be ready to receive multiple identical messages:

ab --> (a|b)*

Now that seems to cover everything: Messages can be delayed, dropped or duplicated.

The reality is that the complete answer to our question is this:

ab --> (a|b|.)*

What is means is that, in addition of messing with messages sent by endpoint 0, the Internet can make endpoint 9 receive messages from endpoint 0 that endpoint 0 never sent.

This is one of the most often forgotten rule when analyzing or designing a transport, which is that it is really easy to fake a message as originating by another endpoint.

Nothing in the Internet Protocol can be used to detect any of these conditions, so it is up to the transport built on top to take care of these. Acknowledgments, timeout and retransmissions can take care of the lost messages; sequence number and bufferization can take care of the delays and duplications; and some kind of cryptographic signature can permit to detect fake message.

Improving standard compliance with transclusion

Inserting fragments of a standard specification – IETF RFC or other – as comments in the source code that implement it seems to be a simple way to assure a good conformance. Unfortunately doing so can create legal issues if not done carefully.

These days I am spending a lot of time implementing IETF’s (and other SDO’s) protocols for the Nephelion Project. I am no stranger to network protocol implementation, as this is mostly what I am doing since more than 25 years, but this time the very specific code that is needed for this project is required to be as close as possible to the standard. So I am constantly referring to the text inside the various RFCs to verify that my code is conformant. Obviously copying the text fragments as comments would greatly simplify the development and at the end make the translation between the English text that is used to describe the protocol and the programming language I use to implement it a lot more faithful.

At this point I should insert the usual IANAL but, at least to my understanding, that is something that is simply not possible. My intent is to someday release this code under a Free Software license, but even if it was not the case, I believe that all software should be built with the goal of licensing it in the future, this license being a commercial one or a FOSS one. The issue here is that the RFCs are copyrighted and that modifying is simply not permitted by the IETF Trust and, in my opinion, rightly so as a standard that anybody can freely modify is not much of a standard. But publishing my code under a FOSS license would give everyone the right to modify it (under the terms of the license), and that would apply too to the RFC fragments inserted in the source code.

So the solution I use to at the same time keep the specification and the implementation as close as possible and to not have to worry about code licensing is to use transclusion. Here’s an example of comment in the code source for the UDP module:

% @transclude file:///home/petithug/rsync/ietf/rfc/rfc768.txt#line=48,51

The syntax follows the Javadoc (and Pldoc, and Scaladoc) syntax. The @transclude tag indicates that the text referenced by the URL must be inserted in the source code but only when displayed in a text editor. Here’s what the same code looks like when loaded in VIM (the fragment for RFC 768 is reproduced here under fair use):

% @transclude file:///home/petithug/rsync/ietf/rfc/rfc768.txt#line=48,51
% {@transcluded
% Source Port is an optional field, when meaningful, it indicates the port
% of the sending process, and may be assumed to be the port to which a
% reply should be addressed in the absence of any other information. If
% not used, a value of zero is inserted.
% @@@}

(I chose this example because, until few days ago, I did not even know that using a UDP source port of 0 was conformant).

The @transcluded inline tag is dynamically generated by a VIM plugin but this tag will never appear anywhere else than in the VIM buffer, even after saving it to the disk. The fragment syntax is from RFC 5147, and permits to select the lines that must be copied (An RFC will never changes, so hardcoding the line number in the code source cannot break in the future).

The plugin can be installed from my Debian repository with the usual “apt-get install vim-transclusion”. The plugin is kind of rough for now: only the #line=<from>,<to> syntax is supported, hardcoding the full path is not very friendly, curl is required, etc… But that is still a huge improvement over having to keep specification and implementation separate.

What I did these last 4 months: Project Nephelion

Between the end of my employment at Jive Communications and the Christmas break I did a lot of research on an idea I got four years ago after an especially frustrating network protocol debugging session. I put some notes on my lab book, wrote some code, assigned a codename (Project Nephelion) and forgot about it until I had, at Jive, yet another frustrating debugging session – although for completely different reasons. So I started analyzing everything that make this kind of debugging difficult and, after talking with friends that are in the same line of work, I decided to spend few months looking for solutions to improve the situation and see if I can build a business around it.

During these 4 months I designed the architecture for a device and its associated software and, even if I was not able to get answers to all my questions, at the beginning of 2015 I decided to build a prototype of the device and work on a first version of the software. The goal is to bring the prototype with me to the IETF meeting in Dallas (March 22 – 27, 2015) to show it to fellow protocol designers and implementers to gather feedback.

There is still a lot to do before I can explain in details what this product will do (I still have to fill at least one more patent application) but I just posted a description of the problems that this product may (or may not) solve on the website of the project:

The art of debugging network protocol problems

Meanwhile, and as a teaser, here’s the list of books that I bought and read specifically for this project during the exploratory phase back in 2014:

An Undecidable Problem in SIP

A few years back, one of my colleague at 8×8 made an interesting suggestion. We were at the time discussing a recurring problem of SIP call loops in the Packet8 service and his suggestion was to write a program that would analyze all the various forwarding rules installed in the system, and simply remove those that were the cause for the loops. I wish I remember what I responded at the time and that I had the insight to say that writing such program is, well, impossible, but that’s probably not what happened.

Now “impossible” is a very strong word and I must admit that I have spent most of my career in computers thinking that nothing was impossible to code and that, worst case, I just needed a better computer. It just happens that there is a whole class of problems that are impossible to code – not just difficult to code, or not that the best possible code will take forever to return an answer without using a quantum computer, but that it is impossible to write a program that always return correct answers for these problems. The SIP call loop problem is one of them.

To make sense of this we will need to define a lot of concepts, so lets start with a SIP call. SIP is, for better or for worse, the major session establishment protocol for VoIP. A SIP client (e.g. a phone or a PSTN gateway) establishes a call more or less like how a Web browser contacts a web site, with the difference that in the case of SIP the relationship with the server lasts for the duration of the call. One of the (deeply broken) features of SIP is that there may exist intermediate network elements called SIP Proxies that can, during the establishment of a call, redirect the call to a new destination. In this case the server, instead of answering the call itself, creates a new connection to a different destination which can itself creates a new connection and so on. This mechanism obviously can create loops, especially when the forwarding rules in these servers are under the control of the end-user – which was the problem we encountered at 8×8. There is many mechanisms a SIP proxy can use to decide if, when, and where to redirect a call, but for the sake of simplicity we will consider only a subset of all these possibilities, and assume that all the SIP proxies involved use the Call Processing Language (CPL), a standard XML-based language designed to permit end-users to control, among other things, how a SIP proxy forwards calls on their behalf. Jive’s users can think of a CPL script as equivalent to the dialplan editor, but for just one user and with the additional constraint that it is not possible to create loops inside a CPL script.

Here an example of CPL script taken from the standard (RFC 3880) that will forward incoming calls to the user’s voicemail if she does not answer the calls on her computer:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <location url="sip:jones@jonespc.example.com">
      <proxy>
        <redirection>
          <redirect/>
        </redirection>
        <default>
          <location url="sip:jones@voicemail.example.com">
            <proxy/>
          </location>
        </default>
      </proxy>
    </location>
  </incoming>
</cpl>

One interesting feature of CPL is that it is extensible, so even if only a subset of all the capabilities of a standard compliant SIP proxy can be implemented using baseline CPL, it is possible to add extensions to be able to use any legal feature of the SIP standard. As an example, the following CPL script (also taken from RFC 3880) rejects calls from specific callers by using such an extension:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <address-switch field="origin" subfield="user" xmlns:re="http://www.example.com/regex">
      <address re:regex="(.*.smith|.*.jones)">
        <reject status="reject" reason="I don't want to talk to Smiths or Joneses"/>
      </address>
    </address-switch>
  </incoming>
</cpl>

For the rest of this discussion, we will consider that we can only use CPL extensions that are legal in a standard SIP proxy.

Now that we established the perimeter of our problem, we can formally formulate the question as follows: If we consider a VoIP system made only of endpoints (phones) and SIP proxies that run CPL scripts, and with the complete knowledge of all the (legal) CPL extensions in use, is it possible to write a program that takes as input all these scripts and an initial call destination (known as a SIP URI) and that can tell us if the resulting call will loop or not?

That seems simple enough: Try to build an oriented graph of all the forwarding rules, and if the graph is not a directed acyclic graph, then it is looping.

Let’s say that my former colleague took up the challenge and wrote this program. Using Java, he would have written something like this:

boolean isLooping(Map<URI, Document> configuration, URI initial) {
  // some clever code here...
  }

The “configuration” parameter carries the whole configuration of our SIP system as a list of mappings between an Address-Of-Record (AOR, the identifier for a user) and the CPL script that is run for this user. The “initial” parameter represents the initial destination for a call (which may contain the AOR of a user). The Java function returns true if a call to this user will loop, or false if the call is guaranteed to reach someone or something, other then the originator of the call, without looping.

Now let’s change the subject for a little bit and talk about something called the Cyclic Tag System (CTS). CTS is a mechanism to create new bit strings from an initial bit string and an ordered list of bit strings called productions, using the following rules:

1. Remove the leftmost bit of the current bit string.
2. If the removed bit was equal to 1 then concatenate the next production in the list to the current bit string (restarting at the first production when all are used).
3. Continue at rule (1) unless the current bit string is empty.

The following example from Wikipedia uses an initial bit string of 11001 and a list of productions of { 010, 000, 1111 }. Following the rules above, we generate the following bit strings:

11001
1001010
001010000
01010000
1010000
010000000
10000000
...

Simple enough.

Now, let’s say that we want to implement CTS in SIP. The CPL language is not powerful enough for that but we can define a very simple extension (still conforming to the SIP standard) that permits to copy a character substring into a parameter for the next destination of a call, e.g.:

<location m:url="sip:c.org;p={destination.string:2:10}" />

Here we extract the “string” parameter from the original destination and copy ten characters starting in second position into the new destination. With this new extension, we can now write three CPL scripts that will implement the CTS example above:

Script attached to user `sip:p1@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}010” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p2@jive.com`:


<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}000” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p3@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}1111” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

On our Java function isLooping(), these three scripts would go on the first parameter (indexed by the address of the proxy) and the initial parameter would contain “`sip:p1@jive.com;bitstring=11001`”.

It is trivial to write a program that can generate the CPL scripts needed to implement any list of productions. No need to even write a program, a simple XSLT stylesheet can do the job.

The reason we implemented CTS in SIP is that in 2000, Matthew Cook published a proof that CTS is Turing-Complete. Turing-Complete is a fancy expression that means that it is computationally equivalent to a computer, which basically means that any computation that can be done on a computer can also be done by the CTS system – obviously doing it multiple orders of magnitude slower than a computer, but both a computer (or a quantum computer, or a Turing machine, or lambda calculus, or any other Turing-Complete system) and the CTS system can compute exactly the same things. Because only a Turing-Complete system can simulate another Turing-Complete system, by showing that we can implement any CTS production list in SIP we just proved that SIP is equivalent to a computer. In turn that means that it is possible to convert any existing program unto a list of CPL scripts (with our small extension) and an initial SIP URI. So knowing this and knowing that the Java bytecode is also Turing-Complete (it is trivial to implement CTS in Java), we now can write this new function:


Map<URI, Document> convert(Runnable runnable) {
  // complex, but definitively implementable
  }

The function takes a Java class (i.e. a list of bytecodes) as parameter and returns a list of { SIP URI, CPL scripts } mappings as result. The class to convert implements Runnable so we have a unique entry point in it (i.e. its run() method), an entry point that by convention becomes the SIP URI on the first mapping returned by the function.

Now that we have these two basic functions, let’s write a complete Java class that is using them:

class Paradox implements Runnable {
  boolean isLooping(Map<URI, Document> configuration, URI initial) {
    // some clever code here
    }

  Map<URI, Document> convert(Runnable runnable) {
    // complex, but definitively implementable
    }

  public void run() {
    Map<URI, Document> program = convert(this);
    if (!isLooping(program, program.keySet().iterator().next())) {
      while (true);
      }
    }
  }

What we are doing here is converting the whole class into a SIP proxy configuration, and passing it to the code that can predict if this program will loop or not. Then we do something a bit tricky with the result, which is to immediately exit if we detect that it will loop. But because this is exactly the code that was under the scrutiny of the isLooping() implementation, it should have returned false, which should have executed the “while (true);” code. But executing the “while (true);” code means that the code is looping, which again contradicts the result we have.

Both cases are impossible which can have only one explanation: The isLooping() implementation is unable to find if this program loops or not. Thus, we have shown that it is possible to write a Java program that loops but cannot detect that it is looping. And because we previously proved that SIP with CPL is Turing-Complete, we know that if it is possible to write such a program in Java, it is theoretically possible to do so in SIP configuration. Because of this, we now know that it is impossible to write a program that can reliably predict if a SIP configuration loops or not. More precisely, for any implementation of isLooping(), it is always possible to find an instance of the “configuration” and “initial” parameters for which this version of isLooping() will return an incorrect response.

Now that we have the answer to our question, let’s have a better look at what really happen in a VoIP system. A SIP call never really loop forever (well, unless the PSTN is involved, but that’s a different story), because there is a mechanism to prevent that. Each call contains a counter (Max-Forwards) that is decremented when it traverses a SIP proxy, and the call ends when this counter reaches zero. Max-Forwards is a little like CPU quota. It does not improve the quality of a program; rather it just prevents it from making things worse. Putting aside the fact that we are in the business of establishing communications between people, not finding fancy ways to prevent that, the Turing-Completeness of SIP still gets in the way as, for the same reasons than before, it is also impossible to write a program that will reliably predict if a call will fail because the Max-Forwards value will reach zero.

This is a sad state of affairs that the only way to reliably predict a possible failure is to let this failure happen, but the point of this article is that this is not really the fault of the programmer, but just the consequence of the limitations of computation in this universe.

Note that at least some SIP systems are not necessarily Turing-Complete – here we had to add an extension to our very limited system based on CPL to make it Turing-Complete. But it is far easier to prove that something is Turing-Complete than proving it is not, so even if I could not find a way to prove that a pure CPL system is Turing-Complete, that does not prove it is not. But worse, we saw that there is not much difference between a Turing-Complete system and one that is not, so even a small and seemingly irrelevant modification to a system can make it Turing-Complete. So basically writing a program that tries to reach definitive conclusions on the behavior of a system moderately complex – like SIP – is an exercise in futility.

Many thanks to Matt Ryan for his review of this article.

Keeping work and personal computers separated

I try as much as possible to keep my personal stuff separated from the work stuff. Even if both California and Utah laws are clear that what I develop on my own time and my own hardware is mine (as long as it is not related to my day job – that’s the big difference with French laws where I own what I developed on my own time, even if it is on my employer’s business or computers), that did not prevent one of my former employers to try to claim ownership on what was rightfully mine. Because it is very expensive to get justice in the USA, getting things as separated as possible from the beginning seems like a good idea.

The best way to do that is simply to not work during one’s free time on anything that could have a potential business value – these days, I spend a lot of time learning about cryptography, control system engineering and concurrent systems validation. But keeping things separated still creates some issues, like having to carry two laptops when traveling. I did this twice for IETF meetings, and it is really no fun.

The solution I finally found was to run my personal laptop as an encrypted hard drive in a virtual machine on the company laptop. My employer provided me with a MacBook, which has nice hardware but whose OS is not very good. I had to put a reminder in my calendar to reboot it each week if I did not want to see it regularly crashing or freezing. Mac OSX is a lot like like Windows, excepted that you are not ashamed to show it to your friends. Anyway here’s how to run your own personal computer on your employer’s laptop:

First you need a portable hard drive, preferably one that does not require a power supply. I use the My Passport Ultra 500GB with the AmazonBasics Hard Carrying Case. Next step is to install and configure VirtualBox on your laptop. You will need to install the Oracle VM VirtualBox Extension Pack if, like me, you need to use in your personal computer a USB device that is connected to the employer laptop (in my case, a smart-card dongle that contains the TLS private key to connect to my servers). Next step is to change the owner of your hard drive (you unfortunately will have to do that each time you plug the hard drive):

sudo chown <user> /dev/disk2

After this you can create a raw vdmk file that will reference the external hard drive:

cd VirtualBox VMs
VBoxManage internalcommands createrawvmdk -filename ExternalDisk.vmdk -rawdisk /dev/disk2

After this, you just have to create a VM in VirtualBox that is using this vdmk file. I installed Debian sid which encryption, which took the most part of the day as the whole disk as to be encrypted sector by sector. I also installed gogo6 so I could have an IPv6 connection in places that still live in the dark age. Debian contains the correct packages (apt-get install virtualbox-guest-utils) so the X server in the personal computer will adapt its display size automatically to the size of the laptop.

To restore the data from my desktop, I configured VirtualBox on it too, so I could also run the personal computer on it. Then, thanks to the same Debian packages, I was able to mount my backups as a shared folder and restore all my data in far less time than an scp command would take.

And after all of this I had a secure and convenient way to handle my personal emails without having to carry two laptops.

Things I learned last week (9)

Maildir conversion with Thunderbird

Because a full backup of my desktop takes more than 7 hours, I do only incremental backups during the week. This also permits to have more than two weeks of daily backups online, which is really helpful when a file is accidentally deleted or modified. I try to keep these incremental backup as small as possible by using various tricks. For example I use a snapshot of each of my Virtualbox instances during the week, and merge them the day before the full backup. Similarly, I do a run a garbage collection (which compress the objects into a pack file) of all my git repository just before a full backup, so an automatic garbage collection is not triggered before an incremental backup, and so on. But even with these tricks, each incremental backup is at least 10 Gb, which is mostly because of my mailer, which use mbox as storage, i.e. one big file for each folder, so receiving one email will trigger the backup of the full folder.

There is a better type of storage in this case, maildir, where each email is stored in a different file (in fact an even better storage would be to have one file per email, but to have all the current emails packed in one file when the “Compact Folders…” command is run). Changing an account storage to maildir is simple under Thunderbird: Search for the storeContractId property for the account, and change it from “@mozilla.org/msgstore/berkeleystore;1” to “@mozilla.org/msgstore/maildirstore;1”. The problem is that there is currently no automatic process to also convert the existing folders to the new format. Here’s a small script that does the conversion of an existing directory containing mbox files to a new directory containing maildir files:

#!/bin/bash
find $1 -type d -exec bash -c 'mkdir -p $0/${1#*/}' $2 {} ;
dir=$(pwd)
find $1 -type f -name *.msf -exec bash -c 'f=${2%.*}; ./mb2md -s $0/$f -d $0/$1/${f#*/}' $dir $2 {} ;

The mb2md program should be downloaded from http://www.ulduzsoft.com/2012/02/your-personal-gmail-like-mail-system-converting-emails/ but you need to apply the patch from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578084 (the mb2md program installed from the Debian package mb2md has some issues with malformed mbox).

After converting the directory, you have to copy some of the files in the original repository (msgFilterRules.dat, popstate.dat, rules.dat). Modify the storeContractId property and restart Thunderbird or Icedove and your emails should be available after the index files are rebuilt.

Things I learned last week (8)

Android devices

Murphy’s Law adapted for Android development: The udev rules of the computer you are currently using never contain the USB ID of the Android device under test. Probably because the list of Android devices is continuously growing, it is not possible to download a set of rules that stays up to date but, as Google provides a web page containing this list of USB IDs, it is easy to write a XSLT document to process the HTML table:

<!-- generate-rules.xslt -->
<?xml version="1.0" ?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="text() | comment()" />
  <xsl:template match="table">
# Generated by generate-rules.xslt
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="tr[1]" />
  <xsl:template match="tr">
# <xsl:value-of select="td[1]" />
SUBSYSTEM=="usb", ATTR{idVendor}=="<xsl:value-of select="td[2]" />", MODE="0666", GROUP="plugdev"
  </xsl:template>
</xsl:stylesheet>

The rules file can then be automatically generated, for example in a cron job, with a command like this:

curl https://developer.android.com/tools/device.html | xsltproc --html generate-rules.xslt - >51-android.rules

You probably need to reload udev after this.

Installing CPN Tools under Linux

I recently upgraded CPN Tools to version 3.9.3 and unfortunately this time the instructions for the installation under Linux were missing some steps:

  • Download the JRE for Windows from “https://www.java.com/en/download/manual.jsp” and install it under Wine with “wine jre-7u25-windows-i586.exe”. Install the Linux simulator.
  • After this download the new version of CPN Tools and install with “wine cpntools_3.9.3.exe”
  • Go to the simulator direction (~/.wine/drive_c/Program Files/CPN Tools/cpnsim) and make the right files executable with “chmod 775 cpnmld.x86-linux run.x86-linux”
  • In the CPN Tools directory modify the default.xml file so the “enterss.ogpath” contains “statespacefiles/”

There is 2 background jobs to start before the GUI, first is the simulator. To keep an eye on the log, I run the following script in a console:

<#!/bin/bash

trap 'kill $pid; wait; exit' INT TERM
cd ~/.wine/drive_c/Program Files/CPN Tools/cpnsim
./cpnmld.x86-linux -d 2098 ./run.x86-linux cpnmld.log &
pid=%+
tail -F cpnmld.log

The second background job is the simulator extensions server that can be started with the following command:

java -jar ~/.wine/drive_c/Program Files/CPN Tools/extensions/SimulatorExtensions.jar &

The CPN Tools GUI can finally be started with the following command:

wine ~/.wine/drive_c/Program Files/CPN Tools/cpntools.exe

One thing that does not work is the Save state space report: For some reason the open syscall receives the Windows name of the file (“C:…”) instead of the equivalent Linux name. But the state space queries are working fine, so that’s not really a big issue.