Introduction WebSockets for Android

09/2010

Whether in online games, online collaboration, streaming, chat, remote control or monitoring applications - real-time communication reached mobile devices long ago. HTML5 WebSockets are the ideal basis for bidirectional high-speed data exchange in real time.

Alexander Schulze - English translation by Predrag Stojadinovic

Why WebSockets?

Thanks to their many benefits, WebSockets can be expected to gradually replace traditional mechanisms such as AJAX/XHR, Comet or polling. First, the communication is streamlined, because after the initial handshake WebSockets exchange data directly over TCP instead of over HTTP. Apart from that handshake, the protocol overhead is reduced to two bytes per packet. This speeds up the application and reduces the volume of data - a cost argument that is particularly important for mobile apps.

Since WebSockets are bidirectional, client and server share a single data channel for both directions. In contrast to HTTP with its traditional request/response mechanism, where one channel is used to send and a second one is required to receive data, WebSockets provide true full-duplex communication over one TCP connection - perfect conditions for all server push and streaming services on the Internet.

In addition, because each client needs only one connection instead of two, a WebSocket server can handle roughly twice as many concurrent clients as an HTTP web server used for the same purpose - another argument for WebSockets in terms of resources and infrastructure investments.

WebSocket Frameworks

By now there is a whole series of WebSocket servers based on different platforms such as PHP, Perl or Java. Since Google uses its own Dalvik Java VM on Android devices and since Android apps are developed in Java, this article builds on the jWebSocket project.

jWebSocket is an open source framework that provides not only a Java server and a browser client but also clients for stationary and mobile devices. On the one hand this allows a quick introduction to WebSocket technology, and on the other hand a deep insight into its internals. Nevertheless, the communication methods described here are in principle valid for other frameworks too.

WebSockets are now supported by many browsers, such as the latest versions of Chrome, Safari and Firefox. The other browsers will follow, and for older browser versions jWebSocket provides a Flash plug-in which is cross-browser compatible and completely transparent to the application.

Interoperability

But of course, WebSockets are not limited to browser applications. In mobile apps under Java ME and Android, only a single library file needs to be integrated - a Java WebSocket client with an API similar to that of the web client. Once all partners agree upon a common data format like JSON or XML, even the various stationary and mobile clients can communicate with each other in real time.

WebSockets thus offer a high degree of interoperability. In terms of flexibility WebSockets are superior to conventional approaches, as they are bound neither to higher-level protocol requirements nor to specific data formats. However, what on the one hand means great freedom for the data exchange between server and client, on the other hand requires great care in processing as well as a high level of responsibility for the security of the communication.

Therefore, let's start with how to establish a connection.

Persistent connections

One thing was taken over from the existing methodology: only clients can initiate a connection to the server. The server cannot initiate a connection to a client on its own. Unlike HTTP, where a connection is typically closed after the request has been received and the result has been sent, WebSockets are designed to maintain a permanent connection.

The connection is opened explicitly by the client, and also explicitly closed again - at least in theory, because as we will see later, certain intermediaries on the route between server and client want to have a say in this matter. In the meantime, however, server and client can take a long coffee break.

WebSocket Server

Basically, the core of a WebSocket server consists of just a simple socket server, which accepts incoming TCP connections on a configured interface, i.e. IP address and port. In principle, two models are supported:

  • the traditional model, where a separate thread is created per connection
  • the non-blocking model based on Java NIO (New I/O)

The thread-based model is simpler to implement but scales only to a limited extent, while with the NIO model it is just the opposite.

jWebSocket Thread/NIO-Model

jWebSocket covers both requirements by providing its in-house TCP engine as well as the embedded JBoss Netty engine. For the application this is completely transparent.

[Figure: Thread/NIO Model]

Extensibility

The jWebSocket server itself provides only basic functions. In addition to the protocol handling, this includes the interpretation and generation of the so-called tokens, which will be covered later. User-specific business logic is implemented in the form of plug-ins and listeners. The jWebSocket package already includes several plug-ins for authentication and streaming services as well as for connection management and Remote Procedure Calls (RPC).

[Figure: jWebSocket Server Extensions]

WebSocket Client

The client is responsible for making the connection. Normally, the client uses a URL consisting of a protocol, host, port, path and optionally one or more additional parameters. The socket client extracts the host and the port from the URL and addresses the server using this information.

Instead of the standard http and https protocols, WebSocket URLs use the ws and wss protocols, with the additional s in each case referring to an encrypted SSL connection. The host is specified either by name or by IP address, and the port is separated from it by a colon. A complete WebSocket URL is structured as follows:

  ws[s]://host[:port]/[path/][arguments]

The path and the optional arguments are sent to the server in the subsequent handshake.
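As a rough illustration of how a client might split such a URL, here is a minimal sketch for Node.js. The function name parseWebSocketUrl and the shape of the returned object are hypothetical and not part of jWebSocket; the fallback ports 80 (ws) and 443 (wss) are the defaults assumed when the URL names no port.

```javascript
// Sketch only: parseWebSocketUrl is a hypothetical helper, not a jWebSocket API.
function parseWebSocketUrl(aUrl) {
  const lUrl = new URL(aUrl);
  if (lUrl.protocol !== "ws:" && lUrl.protocol !== "wss:") {
    throw new Error("Not a WebSocket URL: " + aUrl);
  }
  const lSecure = lUrl.protocol === "wss:";
  return {
    secure: lSecure,
    host: lUrl.hostname,
    // URL.port is an empty string when the URL names no port,
    // so fall back to the assumed defaults 80 (ws) and 443 (wss).
    port: lUrl.port ? Number(lUrl.port) : (lSecure ? 443 : 80),
    // Path and arguments are what the client sends in the handshake.
    path: lUrl.pathname + lUrl.search
  };
}

const lTarget = parseWebSocketUrl("ws://jwebsocket.org:8787/services/chat/;room=Foyer");
console.log(lTarget.host, lTarget.port, lTarget.path);
```

The socket is then opened to host and port only; the path travels later as part of the handshake request.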

Handshake

Once the TCP connection is successfully established, the client sends its initial handshake in accordance with the IETF WebSocket Draft 76 in the following generalized form:

  GET {path} HTTP/1.1
  Upgrade: WebSocket
  Connection: Upgrade
  Host: {hostname}:{port}
  Origin: http://{host}[:{port}]
  Sec-WebSocket-Key1: {sec-key1}
  Sec-WebSocket-Key2: {sec-key2}

  8 Bytes generated {sec-key3}

The server has to answer as follows:

  HTTP/1.1 101 WebSocket Protocol Handshake
  Upgrade: WebSocket
  Connection: Upgrade
  Sec-WebSocket-Origin: http://{hostname}[:{port}]
  Sec-WebSocket-Location: ws://{hostname}:{port}/

  16 Bytes MD5 Checksum

A client's request to a chat service of a WebSocket server for instance looks like this:

  GET /services/chat/;room=Foyer HTTP/1.1
  Upgrade: WebSocket
  Connection: Upgrade
  Host: jwebsocket.org
  Origin: http://jwebsocket.org
  Sec-WebSocket-Key1: 4 @1  46546xW%0l 1 5
  Sec-WebSocket-Key2: 12998 5 Y3 1 .P00

  ^n:ds[4U

The server responds as follows:

  HTTP/1.1 101 WebSocket Protocol Handshake
  Upgrade: WebSocket
  Connection: Upgrade
  Sec-WebSocket-Origin: http://jwebsocket.org
  Sec-WebSocket-Location: ws://jwebsocket.org/services/chat

  8jKS'y:G*Co,Wxa-

The fields Sec-WebSocket-Key1 and Sec-WebSocket-Key2 as well as the 8-byte key3 are randomly generated by the client. From these three values the server generates a 16-byte MD5 checksum to prove that it has really read the handshake. If the answer does not match the prescribed calculation, the client cancels the connection.

A detailed explanation, especially for the generation and verification of the security keys, can be found in the IETF WebSocket specification at http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76.
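The calculation defined in draft 76 can be sketched in a few lines for Node.js. The helper names keyNumber and handshakeChecksum are hypothetical. Per the draft, the digits of each key are concatenated into a number, that number is divided by the count of spaces in the key, both results are written as 32-bit big-endian values, the 8 bytes of key3 are appended, and the MD5 digest of these 16 bytes forms the answer.

```javascript
const crypto = require("crypto");

// Number hidden in a Sec-WebSocket-Key value:
// all digits concatenated, divided by the number of spaces.
function keyNumber(aKey) {
  const lDigits = aKey.replace(/[^0-9]/g, "");
  const lSpaces = (aKey.match(/ /g) || []).length;
  if (lSpaces === 0) {
    throw new Error("Key contains no spaces");
  }
  const lNumber = Number(lDigits);
  if (lNumber % lSpaces !== 0) {
    throw new Error("Key number is not a multiple of the space count");
  }
  return lNumber / lSpaces;
}

// 16-byte answer: MD5 over key1 number, key2 number (both 32-bit
// big-endian) and the 8 raw bytes of key3 (passed as a Buffer).
function handshakeChecksum(aKey1, aKey2, aKey3) {
  if (aKey3.length !== 8) {
    throw new Error("key3 must be exactly 8 bytes");
  }
  const lChallenge = Buffer.alloc(16);
  lChallenge.writeUInt32BE(keyNumber(aKey1), 0);
  lChallenge.writeUInt32BE(keyNumber(aKey2), 4);
  aKey3.copy(lChallenge, 8);
  return crypto.createHash("md5").update(lChallenge).digest();
}

const lAnswer = handshakeChecksum(
  "4 @1  46546xW%0l 1 5",
  "12998 5 Y3 1 .P00",
  Buffer.from("^n:ds[4U", "latin1")
);
console.log(lAnswer.length, lAnswer.toString("hex"));
```

Note that the spaces inside the keys are significant. Keys copied from a rendered page may have lost whitespace and then no longer reproduce a published checksum.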

Data Exchange frames

Once the handshake is completed, client and server can exchange text-based or binary data packets. Text packets are wrapped in a frame of the format:

  0x00 <UTF-8 Character data> 0xFF

Text frames must not contain any 0xFF bytes, since this value marks the end of the frame. Binary packets are not yet allowed under the current WebSocket protocol version 76. It is expected, however, that the short frame format

  0x80-0xFF <Length> <Binary data bytes with the above specified length>

will be established. At the time of writing, binary data therefore has to be encoded by the sender and decoded by the receiver. Base64 encoding is the recommended solution for this. All major programming languages including JavaScript, Java and C provide this encoding, and corresponding reference implementations can easily be found on the web. For an example, see the transfer of image data in the second part of this article.
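The text frame and the Base64 workaround for binary data can be sketched as follows for Node.js. All four function names are hypothetical helpers chosen for this illustration; a browser's WebSocket object does the framing itself.

```javascript
// Wrap a text message in a text frame: 0x00 <UTF-8 bytes> 0xFF.
function frameText(aText) {
  const lPayload = Buffer.from(aText, "utf8");
  return Buffer.concat([Buffer.from([0x00]), lPayload, Buffer.from([0xff])]);
}

// Strip the two frame bytes and decode the UTF-8 payload.
function unframeText(aFrame) {
  const lLast = aFrame.length - 1;
  if (aFrame.length < 2 || aFrame[0] !== 0x00 || aFrame[lLast] !== 0xff) {
    throw new Error("Not a complete text frame");
  }
  return aFrame.subarray(1, lLast).toString("utf8");
}

// Binary data is Base64 encoded by the sender so that it can
// travel as ordinary text inside a text frame ...
function frameBinary(aBytes) {
  return frameText(Buffer.from(aBytes).toString("base64"));
}

// ... and decoded again by the receiver.
function unframeBinary(aFrame) {
  return Buffer.from(unframeText(aFrame), "base64");
}

const lFrame = frameBinary([0xff, 0x00, 0x10]);
console.log(lFrame);
console.log(unframeBinary(lFrame));
```

Because valid UTF-8 never contains the byte 0xFF, the end marker cannot be confused with payload data, even when the original binary data contained 0xFF bytes.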

Browser Clients

Most new browsers already implement the WebSocket protocol, so the client-side web application does not need to implement the low-level protocol itself. JavaScript offers a very simple WebSocket class with three self-explanatory events and the two methods send and close:

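        // Creating the client opens the connection and starts the handshake.
        var lWebSocketClient = new WebSocket("ws://jwebsocket.org:8787");
        lWebSocketClient.onopen = function(aEvent) {
          // Data can only be sent once the connection is open.
          lWebSocketClient.send("Hello World!");
        };
        lWebSocketClient.onmessage = function(aEvent) {
          // aEvent.data holds the received message.
          lWebSocketClient.close();
        };
        lWebSocketClient.onclose = function(aEvent) {/*...*/};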
      

First, an instance of the WebSocket client is created. Internally, it automatically connects and initiates the handshake. The onopen listener is called after the connection to the WebSocket server has been established successfully and the handshake has been answered correctly. Otherwise, the onclose listener is called immediately.

Once a data packet arrives from the server, the WebSocket object delivers an event to the application via the onmessage callback. The WebSocket object removes the wrapping frame and passes the message on without the 0x00 and 0xFF bytes.

Differences from XHR

Developers who are used to XHR, Comet or polling implementations with complicated readyState or buffer handling will be pleasantly surprised by WebSockets: every packet is always delivered completely, provided a stable connection is available. If the connection breaks, which in mobile networks depends on signal strength and quality, this is reported via the onclose event. The application can then initiate a new connection.
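One possible way to react to onclose is sketched below. The function connectWithReconnect, its parameters and the attempt limit are assumptions made for this example and are not part of the WebSocket API or of jWebSocket. The socket is created by a caller-supplied function, so the sketch also runs outside a browser.

```javascript
// aCreateSocket: function returning a new WebSocket-like object
// aOnMessage:    handler for incoming messages
// aMaxAttempts:  hypothetical limit of consecutive failed attempts
function connectWithReconnect(aCreateSocket, aOnMessage, aMaxAttempts) {
  let lAttempts = 0;

  function open() {
    lAttempts++;
    const lSocket = aCreateSocket();
    lSocket.onopen = function () {
      // A successful connection resets the failure counter.
      lAttempts = 0;
    };
    lSocket.onmessage = aOnMessage;
    lSocket.onclose = function () {
      // Connection lost or never established: try again,
      // unless too many attempts failed in a row.
      if (lAttempts < aMaxAttempts) {
        open();
      }
    };
    return lSocket;
  }

  return open();
}

// Browser usage (not executed here):
// connectWithReconnect(function () {
//   return new WebSocket("ws://jwebsocket.org:8787");
// }, function (aEvent) { console.log(aEvent.data); }, 5);
```

A real application would probably wait a moment before each new attempt, for example with setTimeout, so that a server that is temporarily unreachable is not flooded with connection requests.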

Because JavaScript is single-threaded, the data packets should not be too large: any complex analysis of the data risks freezing the UI for that time. Optionally, Web Workers can be used to address this problem. When porting existing applications it is also important to note that, in contrast to synchronous data exchange via XHR, the WebSocket approach is based on a non-blocking principle, i.e. the code keeps running after sending a request without waiting for a result. Especially for sequential queries in which the subsequent code depends on the result of a previous request, some developers cheat a little by passing false for the async parameter of the open method of the XMLHttpRequest object. With WebSockets, such requests must be implemented asynchronously.

Data Formats

When the client sends a data packet to the WebSocket server, the server must not only accept the message, but also understand it. For client-to-client communication, such as a chat application, the WebSocket protocol contains no information about how the recipient of a message is specified. Nor does the WebSocket protocol define any standard commands to the server; these are implementation specific. To allow the server to process the client messages successfully, both parties need to agree on a common language.

JSON

Here, one should select a data format that is widely supported on all platforms. For the browser it makes sense to use JSON (JavaScript Object Notation). Many of the new browsers already support the JSON object natively, and for older versions there is a library at http://json.org/js. The minified version is only 4 KB in size and avoids the security risks of the eval method. For server-side code as well as for Java on Android there is also a lightweight and free JSON library available at http://json.org/java.

XML

XML is a very good alternative, especially for existing applications where users already share their data in this format and the porting effort should be kept low. In contrast to JSON, however, keep in mind the unfavorable ratio of metadata to user data in XML.

Abstraction levels

Both JSON and XML formats meet the requirements for everything from simple single values all the way to very complex data structures. In jWebSocket the incoming data packets are converted by a layer of abstraction to an internal data format, the so-called tokens. The answers, just like the requests, are also passed through the abstraction layer, which converts the tokens to the appropriate format for the client.

In this way it is possible to have clients with different data formats communicate with each other.
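The idea of such an abstraction layer can be sketched as follows. This is not jWebSocket's actual implementation: the function names, the flat token with the fields type and text, and the simple XML layout are all assumptions made for this example.

```javascript
// Incoming side: convert a packet into a token (a plain object).
function packetToToken(aFormat, aPacket) {
  if (aFormat === "json") {
    return JSON.parse(aPacket);
  }
  throw new Error("Unsupported incoming format: " + aFormat);
}

// Escape characters that have a special meaning in XML text.
function escapeXml(aValue) {
  return String(aValue)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Outgoing side: convert a flat token into the receiver's format.
function tokenToPacket(aFormat, aToken) {
  if (aFormat === "json") {
    return JSON.stringify(aToken);
  }
  if (aFormat === "xml") {
    const lFields = Object.keys(aToken).map(function (aKey) {
      return "<" + aKey + ">" + escapeXml(aToken[aKey]) + "</" + aKey + ">";
    });
    return "<token>" + lFields.join("") + "</token>";
  }
  throw new Error("Unsupported outgoing format: " + aFormat);
}

// A JSON client sends a chat message, an XML client receives it.
const lToken = packetToToken("json", '{"type":"chat","text":"a<b"}');
console.log(tokenToPacket("xml", lToken));
```

Since plug-ins and listeners would only ever see the token, support for a further format would only touch these two conversion functions.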

[Figure: jWebSocket Format-Abstraction]

Communication via tokens

In jWebSocket, tokens form the basis for the entire WebSocket network communication. After a successful connection and handshake, the server generates a welcome token that includes a Session-Id. If the client disconnects and reconnects, the server keeps this Session-Id. A new connection may require prior authentication - a mechanism similar to that in HTTP.

Apart from rare cases, a server expects its users to authenticate in order to authorize certain operations or services. The client sends a login token, a data object that contains the user name and password of the user as well as the Session-Id. Using login and logout, users can change their identity during an existing connection.

Event Broadcast

Optionally, all connection and authentication events can be broadcast through the WebSocket network using appropriate tokens. In a chat application, for example, the list of participants can be updated on all clients without delay. Here at the latest, the days of Comet and polling approaches are numbered.

Threading Models

On the jWebSocket server, client requests can be executed either sequentially or within their own threads. In the first case, the server answers one request before it fetches the next one from the receive buffer. While this ensures that the sequence of answers corresponds to the sequence of requests, it incurs a performance penalty when requests arrive at a high frequency. In the second case, a separate thread is instantiated per request. This uses the power of the server much better, but means that answers may be returned in a different order than the requests came in.

Request and Response

Unlike with HTTP, a WebSocket client cannot count on an incoming packet being a direct response to a request it has sent. After all, the server can send packets at any time, for example from streams or broadcasts of other clients, without waiting for a previous request to be processed completely. Also, the server is not necessarily the only service provider in a given WebSocket network. A request can be sent to another client that also offers certain services – and this would be completely transparent to the server.

So that the sender can correctly match the incoming answers, it assigns a unique token ID to each token. The processor of the token returns the token ID with the response, allowing the sender to make an unambiguous assignment. The jWebSocket client library provides an OnResult callback for all function calls. This means that developers do not need to worry about the order of incoming answers.
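The matching of answers to requests via a token ID might look roughly like this. The sketch does not reproduce the jWebSocket client library: createRequester, the field name tokenId and the use of JSON are assumptions made for this example.

```javascript
// aSend: function that transmits a text packet, e.g. the send
// method of a WebSocket object.
function createRequester(aSend) {
  let lNextId = 1;
  const lPending = new Map(); // token ID -> result callback

  return {
    // Send a token and remember the callback under a new token ID.
    request: function (aToken, aOnResult) {
      const lId = lNextId++;
      lPending.set(lId, aOnResult);
      aSend(JSON.stringify(Object.assign({}, aToken, { tokenId: lId })));
      return lId;
    },

    // Call this for every incoming packet. Returns true if the
    // packet was the answer to a pending request, false if it was
    // something else, e.g. a broadcast or a stream packet.
    handlePacket: function (aPacket) {
      const lToken = JSON.parse(aPacket);
      const lCallback = lPending.get(lToken.tokenId);
      if (!lCallback) {
        return false;
      }
      lPending.delete(lToken.tokenId);
      lCallback(lToken);
      return true;
    }
  };
}
```

With this approach the order in which the answers arrive no longer matters, because each answer finds its own callback via the returned token ID.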

Connection management

In mobile networks especially, disconnections are not uncommon – they can be caused, for example, by the user moving through different radio cells, by the service provider's resource sharing or even by bad weather. A developer of mobile communication apps should take appropriate precautions. Ways to handle this particular problem are described in the following sections.

Physical connection

In the jWebSocket framework, each client receives a unique Client-Id from the server. It represents the physical connection and consists of the client's IP address and remote port as seen by the server, regardless of any authentication. The Client-Id usually changes with each subsequent connection and thus cannot be used to re-identify a client, only for internal message routing.

WebSocket Sessions

The assigned Session-Id is saved on the client for a certain period of time, for example in a cookie, or just temporarily in a local variable for a single browser session. If the client reconnects within a configurable timeout period, the server can thus recognize it as a returning client and restore the client's previous status. Unlike with HTTP, the Session-Id does not need to be transmitted to the server with every request, but only when the connection is established. The connection stays open continuously, allowing many transactions over a single TCP connection.
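On the server side, the session handling described above could be sketched like this. The store, its method names and the explicit time parameter are assumptions made for this illustration and do not reflect jWebSocket's internals. The sequential Session-Ids keep the example short; a real server would use unpredictable values.

```javascript
// aTimeoutMs: how long a session survives after a disconnect.
function createSessionStore(aTimeoutMs) {
  const lSessions = new Map(); // Session-Id -> { state, disconnectedAt }
  let lCounter = 0;

  return {
    // New connection without a known Session-Id.
    open: function () {
      const lId = "session-" + (++lCounter);
      const lSession = { state: {}, disconnectedAt: null };
      lSessions.set(lId, lSession);
      return { id: lId, state: lSession.state };
    },

    // The physical connection was lost at time aNow (milliseconds).
    disconnect: function (aId, aNow) {
      const lSession = lSessions.get(aId);
      if (lSession) {
        lSession.disconnectedAt = aNow;
      }
    },

    // Reconnect with a stored Session-Id: returns the previous
    // state, or null if the session is unknown or has timed out.
    resume: function (aId, aNow) {
      const lSession = lSessions.get(aId);
      if (!lSession) {
        return null;
      }
      if (lSession.disconnectedAt !== null &&
          aNow - lSession.disconnectedAt > aTimeoutMs) {
        lSessions.delete(aId);
        return null;
      }
      lSession.disconnectedAt = null;
      return lSession.state;
    }
  };
}
```

The Session-Id is only checked in resume, i.e. once per connection, which mirrors the fact that it does not have to accompany every single request.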

Authentication

When the client transmits the authentication token to the server, it includes the login name and password of the user. With jWebSocket, it is possible to log on from multiple devices, or multiple times from one device, under the same user name. For certain accounts, such as the administrator, this may be prohibited in the configuration. If users agree that their login to a WebSocket network is public, other users can request the list of connected Client-Ids from the server and get in touch with any client. To protect yourself against unwanted contact, you can block messages from non-authenticated users or from specified users through a personal block list which is stored on the server.

Service Clients

If a single specific client is to be addressed in a WebSocket network, for example because it offers particular services, that client must register with the WebSocket server using a system-wide unique and persistent Node-Id. A node in the network is recognized via its Node-Id. Because a Node-Id must be unique, the server rejects any connection attempt from another node with the same Node-Id. The biggest advantage of such service clients is that a WebSocket network can easily be extended with new services without having to restart the server. For security, the administrator can allow only pre-configured Node-Ids and even assign a particular IP address to each Node-Id.
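The registration rules for Node-Ids can be illustrated with a small sketch. The registry, its method names and the format of the allow list are hypothetical and only meant to show the checks described above, not how jWebSocket configures them.

```javascript
// aAllowedNodes (optional): object mapping each permitted Node-Id
// to its required IP address, or to null if any address is fine.
function createNodeRegistry(aAllowedNodes) {
  const lConnected = new Set();

  return {
    // Returns true if the node was registered, false if rejected.
    register: function (aNodeId, aIpAddress) {
      if (aAllowedNodes) {
        // Only pre-configured Node-Ids are accepted.
        if (!Object.prototype.hasOwnProperty.call(aAllowedNodes, aNodeId)) {
          return false;
        }
        // Optionally, the Node-Id is bound to one IP address.
        const lRequiredIp = aAllowedNodes[aNodeId];
        if (lRequiredIp !== null && lRequiredIp !== aIpAddress) {
          return false;
        }
      }
      // A Node-Id that is already connected is rejected.
      if (lConnected.has(aNodeId)) {
        return false;
      }
      lConnected.add(aNodeId);
      return true;
    },

    // Free the Node-Id again when the node disconnects.
    unregister: function (aNodeId) {
      lConnected.delete(aNodeId);
    }
  };
}

const lRegistry = createNodeRegistry({ "weather-service": "10.0.0.5" });
console.log(lRegistry.register("weather-service", "10.0.0.5"));
console.log(lRegistry.register("weather-service", "10.0.0.5"));
```

In this usage example the first registration succeeds, while the second one is rejected because the Node-Id is already in use.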

Send vs. Broadcast

jWebSocket provides the send and broadcast methods for the transmission of tokens to other clients. While send delivers the message to one specific client, broadcast distributes it to several or even all connected clients. send expects either a Client-Id or a Node-Id and thus addresses a single connection directly. broadcast, however, addresses a group of users, or a single user with multiple connections if that user has logged on under the same name on several devices.
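The difference between the two methods can be made concrete with a minimal in-memory sketch. The hub object and its signatures are assumptions made for this example and do not mirror the jWebSocket API; each connection is represented by a callback that delivers a token to that client.

```javascript
function createHub() {
  const lClients = new Map(); // Client-Id -> { user, deliver }

  return {
    // Register a connection under its Client-Id and user name.
    connect: function (aClientId, aUser, aDeliver) {
      lClients.set(aClientId, { user: aUser, deliver: aDeliver });
    },

    // send: exactly one target connection.
    // Returns the number of deliveries (0 or 1).
    send: function (aTargetId, aToken) {
      const lClient = lClients.get(aTargetId);
      if (!lClient) {
        return 0;
      }
      lClient.deliver(aToken);
      return 1;
    },

    // broadcast: all connections, or all connections of one user
    // if aUser is given. Returns the number of deliveries.
    broadcast: function (aToken, aUser) {
      let lCount = 0;
      lClients.forEach(function (aClient) {
        if (aUser === undefined || aClient.user === aUser) {
          aClient.deliver(aToken);
          lCount++;
        }
      });
      return lCount;
    }
  };
}
```

A user who is logged on from two devices appears here as two entries with the same user name, so a broadcast to that user reaches both connections, while send reaches only the one connection named by its id.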

WebSockets under Android

Many browsers already implement WebSockets, but the Java JDK so far supports only HTTP. Anyone who wants to develop mobile apps for Android and use WebSockets must therefore either implement the entire protocol in Java or use an existing WebSocket library. The jWebSocket framework contains a Java client and is distributed under the LGPL open source license, which makes it well suited for the impatient rapid prototyper. It is equally suitable for developers who want to examine the WebSocket technology and extend it themselves.

Java implementation

The Java client in a WebSocket network performs exactly the same tasks as a web client and therefore provides the same methods and events at the lowest level. As in a browser, these make it possible to establish a connection, to pass incoming messages or connection status changes up to the application, to send or broadcast data packets and to terminate the connection. All communication mechanisms described above are therefore equally applicable to Java clients running on Android.

Dalvik Virtual Machine

Unlike older Java ME devices, the Android platform provides the entire functionality of Java 5 including Generics and Collections. Extensive additional libraries for the integration of audio, video and GPS functions, and interfaces to various Google services like Mail, Calendar or Maps, provide a quick entry into app development with Android.

With the Dalvik Virtual Machine, Google provides its own Java VM that has been optimized for mobile devices in terms of memory requirements. At first sight this is a benefit. However, anyone who believes that existing third-party Java libraries can easily be embedded in Android apps as .jar files will quickly be disappointed.

Incompatibilities of the VMs

The byte code generated by the Android SDK is considerably more compact - but not compatible. Developers with previous Java ME experience may already have noticed painfully that each platform requires its own effort, at least if its full potential is to be utilized. Android, too, ensures that the good old Java paradigm "write once, run anywhere" fades into oblivion.

Use of libraries

So if you are planning to use third-party libraries for your Android apps, you must ensure that these libraries are either provided as Android libraries or that they include the source code, together with appropriate documentation. Google's Android documentation is exemplary and can be found at http://developer.android.com/. There, in addition to the SDK download, you will also find a full reference, a developer guide and many extensive tutorials and videos. The jWebSocket framework source code is fully available and integrates only open source libraries, and can therefore be used seamlessly under Android.

Create your own applications

Anyone who has ever developed in Java and who invests a little time in working through the easy-to-understand Android tutorial will quickly find that mobile apps are very comfortable to develop. In this magazine you will find the second part of this article, which presents lots of code examples and two live WebSocket applications on Android.

Copyright © 2013 Innotrade GmbH. All rights reserved.