RSS feeds and XML are great things. Through the use of standard APIs, they allow you to pull data out of any website that supports these formats, and reuse the collected material in your own applications. But what if you want to do the same with a website that has no RSS feed, no XML output and no dedicated API? Well, you use dapper!

What is dapper?

dapper is a service that allows you to extract and use information from any website on the Internet. For those familiar with web services, you can think of dapper as an API maker. For the rest of you, dapper allows you to build web applications and mashups using data from any website without any programming. Basically, dapper provides an easy mechanism to process unstructured information from HTML, clean it, transform it and re-emit it as structured XML, which can then be reused in a host application.
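To make the HTML-to-XML idea concrete, here is a minimal hand-written sketch in Python. It is not dapper's code or API: the `FieldExtractor` class, the `html_to_xml` function and the sample page are all made up for illustration, and the sketch assumes the wanted fields can be recognised by their `class` attribute, which a visual tool like dapper would work out for you.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)
        self.current = None  # field currently being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.fields:
            self.current = css_class
            self.record[css_class] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.record[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field,
        # so nested markup inside a field is not handled.
        self.current = None


def html_to_xml(html, fields):
    """Turn one HTML page into one <record> element with a child per field."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    parser.close()
    root = ET.Element("record")
    for name in fields:
        ET.SubElement(root, name).text = parser.record.get(name, "").strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample page
page = '<div><h1 class="title">Blue Widget</h1><span class="price"> $9.99 </span></div>'
print(html_to_xml(page, ["title", "price"]))
# prints <record><title>Blue Widget</title><price>$9.99</price></record>
```

The point is only the shape of the transformation: messy markup goes in, a small and predictable XML record comes out.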

How dapper works

The idea behind dapper is to create an automatic, visual way of extracting information from HTML pages. It works by taking a few sample pages as input and then letting users visually specify the information that should be extracted. Each page is treated like a record in a database. The system runs a quick similarity analysis between the sample pages. Quick as it is, this analysis is powered by a non-trivial tree-matching algorithm, fine-tuned for HTML. After analyzing the pages, dapper presents the user with a highlighter tool for selecting the attributes of a record. The resulting set can then be exported in various formats (RSS, XML, plain HTML, ...) and reused at will.
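To give a rough feel for what comparing the structure of sample pages can mean, here is a deliberately naive Python sketch. It is not dapper's tree-matching algorithm, which is not described in detail here: it swaps in a much simpler technique, collecting the set of tag paths found in each page and computing their Jaccard overlap. The names `PathCollector`, `tag_paths` and `structural_similarity`, and the three sample pages, are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Record every tag path (e.g. html/body/h1) seen in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating sloppy HTML
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structural_similarity(html_a, html_b):
    """Jaccard overlap of the two pages' tag-path sets, between 0.0 and 1.0."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample pages: two product pages and one navigation page
page_a = "<html><body><h1>Blue Widget</h1><p>$9.99</p></body></html>"
page_b = "<html><body><h1>Red Widget</h1><p>$4.50</p></body></html>"
page_c = "<html><body><ul><li>Home</li><li>About</li></ul></body></html>"

print(structural_similarity(page_a, page_b))  # same layout, different text
print(structural_similarity(page_a, page_c))  # different layout
```

Two pages built from the same template score high even though their text differs, which is what lets each page be treated as one record of the same kind. A real tree matcher would also take element order, repetition and attributes into account.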

Legal considerations

Since dapper can be used to extract content from virtually any website, its terms of use state that the service cannot be held liable for any sort of copyright infringement, and that it will immediately comply with any verified request from the lawful owner of the content to stop using that content.

So remember: before you crawl the web to find some food for your apps, keep in mind that using a free app to do so doesn't mean the data is free as well. Always read the licensing terms carefully.

That said, dapper is certainly one of the most useful tools we've featured in our columns!

Get it!

» Dapper