Lemon Digital Production

Lemon Blog

Wednesday, 25 November 2009

Opportunities to earn revenue with photo tagging service

Google-backed startup Pixazza has announced it is opening up its photo tagging service, which means that very soon any website can be turned into an online store.

Seizing on the global thirst for everything celebrity, the Mountain View based company has recently trialled its technology on a handful of celebrity gossip sites like Just Jared and Celebuzz. For example, on Just Jared you can see a photo of singer Rihanna wearing the latest outfit and instantly "Get the Look". You simply roll over the image and links appear to similar clothing products which you can then go and buy.

Pixazza has a team of taggers standing by, going through the photos on popular websites like Celebuzz, identifying the products contained in them, and 'tagging' them with information about those products.

Every time someone clicks on a link or makes a purchase, Pixazza and its partners earn money. It's a simple, obvious affiliate marketing idea, and if it takes off it could be something that we see elsewhere, not just on celebrity gossip sites. After all, there are literally billions of photos on the web today and millions more get uploaded every day. Many of these photos show something that somebody somewhere might be interested in identifying and for which there’s a related product or service that can be purchased. So there's virtually no limit to the number of websites that Pixazza and similar services could be applied to.

James Everingham, Pixazza's CTO, has stated that thousands of websites have asked to use Pixazza, although it is currently being tested on about a dozen. If those websites find that they can make money connecting web surfers and retailers through their photos, there may come a day when you no longer have to scour the internet to identify the product you want in some random picture. It might be 'tagged' for you, with a convenient link that screams "Buy me!"

Wednesday, 18 November 2009

Microsoft gets involved with HTML 5

HTML 5 is coming. It won’t be here tomorrow, but the HTML 5 specification that has been ‘under construction’ since June 2004 could be more than just the next major revision of the hypertext markup language – it could be a game-changer that makes rich internet application (RIA) plug-ins like Flash, Silverlight and JavaFX unnecessary.

As the web has evolved from a collection of “pages” to a collection of “applications”, RIA technologies like Flash have grown in prominence because the functionality and user experiences required to create increasingly sophisticated internet applications surpass what can be done with basic HTML. HTML 5 is being designed to change that and is expected to provide new capabilities, including:
  • The native display of audio and video content through a standard interface.
  • A “canvas” that supports 2D drawing on a web page.
  • Drag-and-drop support.
  • Support for running scripts in the background.
  • Local data storage permitting applications to “work offline”.
  • New form controls for common elements such as dates, times, emails and URLs.
If everything goes according to plan, supporters say that HTML 5 will not only bring HTML into the 21st century, it will reduce our reliance on proprietary technologies like Flash and Silverlight and make it easier for developers to develop sophisticated applications that work across browsers.

Of course, to accomplish this, all of the browser makers will need to play along. Earlier this month, Microsoft signaled that it’s taking internet standards more seriously: a posting it made to the W3C mailing list indicated that the Internet Explorer team is reviewing the HTML 5 specification and would “share...feedback and discuss this in the working group”.

Whether Microsoft’s participation in the HTML 5 working group truly evidences a willingness to work for standards remains to be seen. Indeed, Microsoft’s posting noted that “At this stage we have more questions than answers”.

Right now, HTML 5 is years away and therein lies the problem. By the time the HTML 5 specification has been finalized, it’s almost certain that the market will have evolved even further.

Already, proprietary technologies are entrenched. Companies have made significant investments in RIA technologies like Flash and Silverlight and, in the case of Flash in particular, penetration is so high as to make the technology ubiquitous.

Because of this, the question for consumers, developers and technology companies is whether the HTML 5 specification really matters. While its virtues are very appealing in theory, the slow speed at which the HTML 5 spec is being hammered out demonstrates that building a specification and doing it with broad-based consensus is a time-consuming process that really can’t keep up with the commercial needs of the web.

While we can hope for the best with HTML 5, the reality is that business will go on as usual and proprietary technologies will continue to be developed and adopted because the individuals and companies that use the internet can’t wait around.

Real-time on the web: addressing performance, scalability and availability - 1 of 4

The web is abuzz over 'real-time' -- the concept of websites and web applications providing instantaneous response to online interaction and access to streams of constantly-updating data and information.

Today's focus on real-time services is a reflection of the evolution of interactivity on the internet, but applications that are heavy on interaction present unique challenges for developers when it comes to performance, scalability and availability. While simple applications that primarily pull content from a database and display it to users can be made highly efficient using techniques such as caching, interactive applications that are designed to be used in real-time can be much more difficult to maintain and scale.

Twitter, arguably the purest example of a popular 'real-time' internet service, is the perfect example of that. It has been plagued by performance and downtime issues for some time now and it's not hard to see why: at any given moment, there are thousands upon thousands of Twitter users posting and pulling content, with constant polling to the web servers for updates from Twitter clients, the website and through the API. This means lots of database reads and writes and a mountain of HTTP traffic. It's a developer's worst nightmare: a steady flow of resource-intensive database writes coupled with an almost never-ending flurry of database reads.

When it comes to dealing with performance, scalability and availability for real-time web applications, developers now need to think about the following key issues (amongst others no doubt, but these are the ones I am focusing on) when designing applications:
  • HTTP polling and providing a real-time experience for users will increase the number of requests to the server unless a connection can remain open. Traditional web servers don't provide a solution for this, and opening socket connections is not really a viable solution as firewalls will typically block this from within corporate networks.
  • Database read and write performance and avoiding the locks that will ensue as a result of the high volume of writes to tables. Typical RDBMS databases are simply not ideal storage solutions when high volumes of read and write requests are required.
  • CPU or IO intensive operations that need to be queued and processed separately to ensure the web servers remain responsive to "normal" requests.
  • Autoscaling to support unexpected loads in a cost effective manner.
I have decided to address each of the issues in a separate blog post, so in this post I will aim to talk about concurrency, HTTP polling and providing a real-time experience for web visitors over an HTTP connection.

HTTP Push - Polling, Streaming and Sockets


The reason web servers struggle with real-time lies primarily with the antiquated HTTP protocol, which is unfortunately a legacy we're going to be stuck with for some time. Fortunately, Google and others are looking at solutions. Google has recently published a proposal for a new protocol called SPDY, which seems to address some of the biggest shortcomings of HTTP for today's web applications. With the current HTTP protocol, connections are not persistent (meaning every interaction with the server requires a new request, new request headers, new response headers, authentication etc.) and most of the communication is typically uncompressed. SPDY sets about addressing these two key issues, with the persistent connection being the one relevant to HTTP polling.

Currently, when a user visits a web page which provides a real-time experience, what is typically happening is that the web browser is in fact polling the web server every second or so to say "is there an update?". These requests are pretty lightweight, but immediately present a massive problem when thousands of visitors use the website at the same time. Assuming you poll the server every 2 seconds, and you have 1,000 visitors on your site at any one time, the server(s) would receive approximately 500 requests per second asking "is there any update?"
Whilst 500 is a digestible number of requests, increasing the number of visitors to 50,000 during a peak period would mean 25,000 requests per second, which would quickly bring down any small web farm. The issue again lies with the fact that HTTP does not allow data to be pushed back to the browser, so the browser has no option but to keep polling and overloading the server with unnecessary requests.
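
The arithmetic above can be written as a small Python helper. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes every visitor polls at exactly the same fixed interval.

```python
def polling_requests_per_second(visitors: int, poll_interval_seconds: float) -> float:
    """Average request rate when every visitor polls once per interval."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return visitors / poll_interval_seconds


if __name__ == "__main__":
    # The two scenarios from the text: 1,000 and 50,000 visitors polling every 2 seconds.
    for visitors in (1_000, 50_000):
        rate = polling_requests_per_second(visitors, 2)
        print(f"{visitors} visitors polling every 2s: {rate:.0f} requests/second")
```

The load grows linearly with the number of visitors and inversely with the interval, so halving the interval to get a "more real-time" feel doubles the request rate.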

A number of approaches have been taken to solve this problem, with Google's recent suggestion being the most sensible way of fixing this without "hacking" the HTTP protocol. However, the SPDY protocol is a long way away and is not something we can rely on. The common ways to work around the HTTP issues are as follows:

  • Long Polling allows the browser to open a connection to a web server and keep the connection open for an extended period of time waiting for data to be sent to the browser. As soon as data is sent, a new Long Polling request is opened to the server waiting for the next event to be sent from the server.
    Tornado is a web server built specifically to provide this type of long polling functionality; it was built by FriendFeed and is now released as an open source server. For those of you using Nginx, you can configure Nginx as a Comet server using this beta plugin. The plugin allows your standard web application to pull and push content, and lets the plugin do all the hard work of distributing data to the clients.
  • Streaming allows the browser to open the connection to the web server and keep it open for as long as the user is on the website. This solution has numerous problems around browser support and the inability to detect the state of the connection. Whilst this is an option using the iFrame or XMLHttpRequest method, we do not recommend this approach.
  • Socket Connections are achieved through the use of a plugin such as the common Adobe Flash. Flash has complete support for raw socket connections, providing a facility for your application to open a bi-directional asynchronous connection to a server; however, this is not done over HTTP. Because this is a raw socket connection, users behind strict corporate firewalls will often not be able to connect, which means a socket connection is probably only viable for consumers using the application at home. Server solutions commonly used by Flash based game developers include ElectroServer and SmartFoxServer (which is based on Red5, the open source alternative to Adobe Flash Media Server).
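
To make the long polling cycle from the first bullet concrete, here is a minimal in-process sketch in Python using asyncio. It is not how Tornado or the Nginx plugin is implemented: there is no real HTTP here, and the names LongPollChannel, wait_for_event, publish, client and demo are all hypothetical. Each "request" is simply held open until the server has an event or a timeout expires, after which the client immediately opens the next one.

```python
import asyncio
from typing import List, Optional, Tuple


class LongPollChannel:
    """Holds each waiting 'request' open until an event arrives or it times out."""

    def __init__(self) -> None:
        self._waiters: List[asyncio.Future] = []

    async def wait_for_event(self, timeout: float) -> Optional[str]:
        """Stand-in for one long-poll request: returns the event, or None on timeout."""
        future = asyncio.get_running_loop().create_future()
        self._waiters.append(future)
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            # A timed-out request must not stay registered as a waiter.
            if future in self._waiters:
                self._waiters.remove(future)

    def publish(self, event: str) -> int:
        """Answer every request currently held open; return how many were answered."""
        waiters, self._waiters = self._waiters, []
        delivered = 0
        for future in waiters:
            if not future.done():
                future.set_result(event)
                delivered += 1
        return delivered


async def client(channel: LongPollChannel, name: str,
                 received: List[Tuple[str, str]], rounds: int) -> None:
    """Browser side: as soon as one long poll completes, open the next one."""
    for _ in range(rounds):
        event = await channel.wait_for_event(timeout=1.0)
        if event is not None:
            received.append((name, event))


async def demo() -> List[Tuple[str, str]]:
    channel = LongPollChannel()
    received: List[Tuple[str, str]] = []
    clients = [asyncio.create_task(client(channel, name, received, rounds=2))
               for name in ("alice", "bob")]
    for event in ("first update", "second update"):
        await asyncio.sleep(0.05)  # give clients time to (re)open their requests
        channel.publish(event)
    await asyncio.gather(*clients)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Compared with fixed-interval polling, the server only answers when there is something to say, so an idle visitor costs one open connection rather than a steady stream of "is there an update?" requests.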

Concurrency

Once you've figured out your solution to support push from the server to the browser, your next challenge may very well be how to give your website visitors a truly real-time experience and let them interact with other visitors. A great example of this is Google Spreadsheets, which allows you to work with your Google Doc in real-time, updating the document and receiving updates in real-time (e.g. if you update a formula, you see other cells update once Google pushes the changes back to you), and also to chat to other users editing your document at the same time. Providing an application which allows this type of message queuing and dispatching between users can be extremely difficult and is typically addressed with languages that are more adept at handling concurrency.

Erlang, a language developed by Ericsson way back when to help them with virtually unlimited scale and concurrency, is designed from the ground up to deal with concurrency. There are no shared variables, which ensures there are no locking issues, the typical hell that developers need to deal with when trying to write applications that handle concurrency gracefully. Erlang avoids locking issues by supporting the idea of messages, so that each function or method simply passes messages to other functions or methods. With this message passing and queuing system as an integral part of the language, applications can easily scale by adding more servers capable of receiving and dispatching messages, and issues around concurrency will never materialise. However, having said all this, Erlang is not ideal as a web server, and typically Erlang based solutions use a proxy server to handle HTTP requests and push HTTP responses back to the browser. See Alexey's post on how he is trying to build a system which can cope with a million long poll requests using Erlang and Nginx. You'll also need to learn a new language and syntax, so have fun ;)
Facebook use Erlang to drive their chat along with a Comet solution; if it works for them I'm sure it will work for you.
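
The message passing idea can be illustrated without learning Erlang. The sketch below imitates the style in Python with a thread and queues from the standard library; it is not Erlang and says nothing about how Facebook's chat is built, and the ChatRoom class and its message tuples are hypothetical. The point it shows is that the room's state lives inside a single worker and is only ever changed in response to messages, so no locks appear in the application code.

```python
import queue
import threading
from typing import Dict, List, Tuple


class ChatRoom(threading.Thread):
    """Actor-style worker: it alone owns its state and is driven only by messages."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self.mailbox: queue.Queue = queue.Queue()

    def run(self) -> None:
        members: Dict[str, queue.Queue] = {}  # private state, never shared
        while True:
            message = self.mailbox.get()
            kind = message[0]
            if kind == "join":
                _, name, inbox = message
                members[name] = inbox
            elif kind == "say":
                _, sender, text = message
                # Dispatch to everyone except the sender.
                for name, inbox in members.items():
                    if name != sender:
                        inbox.put((sender, text))
            elif kind == "stop":
                break


def demo() -> List[Tuple[str, str]]:
    room = ChatRoom()
    room.start()
    alice_inbox: queue.Queue = queue.Queue()
    bob_inbox: queue.Queue = queue.Queue()
    # Other code talks to the room only by sending it messages.
    room.mailbox.put(("join", "alice", alice_inbox))
    room.mailbox.put(("join", "bob", bob_inbox))
    room.mailbox.put(("say", "alice", "hello"))
    room.mailbox.put(("say", "bob", "hi alice"))
    room.mailbox.put(("stop",))
    room.join()  # wait until the room has processed every message
    return [bob_inbox.get_nowait(), alice_inbox.get_nowait()]


if __name__ == "__main__":
    print(demo())
```

Because the mailbox is processed one message at a time, two users speaking at once can never corrupt the member list; scaling out means adding more such workers rather than adding locks.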

Scala is another functional language which is also well suited to concurrent programming, and it has been employed by Twitter to help them handle concurrency at scale.

Summary

Whilst there are numerous ways to address the inadequacies of the HTTP protocol and the inevitable and complex concurrency issues with most programming languages, building true real-time solutions is not easy and it's not likely to be easy for some time to come. New methods to address all of these issues are constantly being discovered and suggested, and even Google, who are clearly fed up with the restrictions imposed upon them by the protocol, are trying to find an alternative solution. Fortunately, if anyone is driven to make it work, they are, with their entire business relying on an increasingly usable web experience.