Global information behemoth Thomson Reuters today announces the latest version of its Calais web service, delivering on earlier promises with respect to ‘Linked Data’ and firmly staking out the company’s intention to be a significant player in the shifting market for timely and authoritative information.
I’ll take a more in-depth look at the importance of authoritative sources in the emerging Linked Data ecosystem in this related post, and concentrate on the specifics of the Calais 4.0 release here.
I write about Thomson Reuters’ release of Calais 4.0 over on ZDNet today, and wanted to use this post to explore some of the broader context within which Calais should increasingly be considered.
This re-engineering of Calais will deliver the functionality that users have come to rely upon, whilst ensuring Thomson Reuters’ ability to continue to scale in a timely and cost-effective manner on the back of Amazon’s Web Services offering.
In addition to this strengthening of the core offering, Calais 4.0 includes five substantive developments. First, the company has followed through on earlier talk about ‘Linked Data,’ ensuring that any of around 25 entity types (company names, geographic areas, album titles, etc.) discovered in content submitted to Calais will now be returned to the submitter with a ‘dereferenceable URI’ that may be followed by either people or software in order to discover further information.
The URI resolves to a Calais-hosted page of RDF with pointers to the Linked Data community’s usual suspects: DBpedia, MusicBrainz, GeoNames, the CIA Factbook, etc.
More unusually, and importantly, the second development sees the document include pointers to Thomson Reuters’ own content, such as the (current) stock ticker, Board membership data, etc.
As the Press Release notes, “In keeping with its commitment to the Linked Data standard, Thomson Reuters has also made a subset of its core data assets available for public use on the Web. The collection of business information represents the first contribution to the ‘Linked Data cloud’ made by a major publisher. It enables developers to programmatically query and use fundamental facts on hundreds of thousands of publically-traded companies, including company descriptions, stock tickers, management teams, locations, boards of directors and more.”
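To make the idea of following a dereferenceable URI a little more concrete, here is a minimal Python sketch. It is not the Calais API: the example.org URIs, the sample RDF/XML body and the use of `owl:sameAs` for the outbound pointers are all assumptions made for illustration, and the request is prepared but never sent. It shows only the general Linked Data pattern of asking for RDF via an `Accept` header and then reading the links to other data sets out of the returned document.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def build_rdf_request(uri):
    """Prepare (but do not send) a GET request that asks for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def same_as_links(rdf_xml):
    """Return the owl:sameAs targets found in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}sameAs" % OWL_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target:
            links.append(target)
    return links


# Hypothetical response body (assumption); a real Calais document will differ.
SAMPLE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://example.org/entity/company-1">
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Example_Company"/>
    <owl:sameAs rdf:resource="http://example.org/other/company-1"/>
  </rdf:Description>
</rdf:RDF>
"""

if __name__ == "__main__":
    # The entity URI below is a placeholder, not a real Calais identifier.
    request = build_rdf_request("http://example.org/entity/company-1")
    print(request.get_header("Accept"))
    for link in same_as_links(SAMPLE_RDF):
        print(link)
```

In a real client the prepared request would be passed to `urllib.request.urlopen` and the response body handed to `same_as_links`; a dedicated RDF library would be a more robust choice than hand-parsing XML once documents grow beyond this simple shape.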
Thirdly, Calais 4.0 includes a ‘metadata transport layer’ to simplify the process of exposing and sharing large bodies of semantically rich data. Tague suggested that 200 to 300 million persistent and dereferenceable URIs are available today (and capable of servicing tens or hundreds of millions of hits per day) for content previously submitted to Calais, with many more to come as the service scales.
Fourth, Calais is making its first move beyond English language content, and version 4.0 now supports entity extraction in French. French-language relationship and event extraction will follow shortly, as will other languages.
Fifth, and finally, the Calais team is publishing an RDFS version of their schema, giving developers far more flexibility as to the ways in which they integrate the Calais web service into their own applications.
Latest News: Calais 4.0 has arrived! On the first anniversary of our debut, we are extremely pleased to announce Calais 4.0. With more than 9,000 of you processing more than 1 million documents per day, it was time to take Calais to the next level.
Calais: Connect. Everything. We want to make all the world's content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph - we call our piece of it Calais.
Calais is a rapidly growing toolkit of capabilities that allow you to readily incorporate state-of-the-art semantic functionality within your blog, content management system, website or application.
SemanticProxy, created by the Calais team, makes it easy to generate rich semantic metadata for individual Web pages. Simply paste the page's URL into the tool. With SemanticProxy, publishers, bloggers, developers and site owners of all kinds can:
* Build browser plugins that expose the semantic content of Web pages for their visitors
* Scan a set of Web pages and build a local RDF store for querying and display
* Watch an information-rich Web site and trigger alerts on selected events
* Embed the functionality in an online publication to enhance its search and navigation
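As a rough illustration of the "local RDF store" use case in the list above, the following Python sketch keeps subject, predicate and object triples in memory and answers simple pattern queries. It is an assumption-laden toy rather than anything SemanticProxy provides: the `TripleStore` class, the `mentions` predicate and the example.org page URLs are all hypothetical, and the triples are added by hand instead of being extracted from real pages.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one triple and index it by its subject."""
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical metadata for two scanned pages (assumption).
    store.add("http://example.org/page-a", "mentions", "Example Company")
    store.add("http://example.org/page-b", "mentions", "Example Company")
    store.add("http://example.org/page-b", "mentions", "Paris")

    # Which pages mention the company?
    for subject, _, _ in store.match(predicate="mentions", obj="Example Company"):
        print(subject)
```

A production system would more plausibly load the returned RDF into an existing triple store and query it with SPARQL; the point here is only the shape of the data and of a pattern-matching query.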