Tools

Articles

Neo4j Graph Database

06.06.2014

What is Neo4j?

Neo4j is an open-source graph database supported by Neo Technology.
Neo4j stores data in nodes connected by directed, typed relationships with properties on both, also known as a Property Graph.

Main features:

• intuitive, using a graph model for data representation
• reliable, with full ACID transactions
• durable and fast, using a custom disk-based, native storage engine
• massively scalable, up to several billion nodes/relationships/properties
• highly available, when distributed across multiple machines
• expressive, with a powerful, human-readable graph query language
• fast, with a powerful traversal framework for high-speed graph queries
• embeddable, with a few small jars
• simple, accessible by a convenient REST interface or an object-oriented Java API


What is a Graph Database?

A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way.

Let’s follow along some graphs, using them to express themselves. We’ll “read” a graph by following arrows around the diagram to form sentences.


A Graph contains Nodes and Relationships

A Graph -[:RECORDS_DATA_IN]-> Nodes -[:WHICH_HAVE]-> Properties.

The simplest possible graph is a single Node, a record that has named values referred to as Properties. A Node could start with a single Property and grow to a few million, though that can get a little awkward. At some point it makes sense to distribute the data into multiple nodes, organized with explicit Relationships.
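To make the model concrete, here is a minimal sketch using Neo4j's embedded Java API (Neo4j 2.x style; the store path, the property values, and the KNOWS relationship type are all illustrative):

```java
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Two Nodes with Properties, joined by a directed, typed Relationship
// that carries a Property of its own.
public class PropertyGraphExample {
    enum RelTypes implements RelationshipType { KNOWS }

    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("target/example-db");
        try (Transaction tx = db.beginTx()) {
            Node alice = db.createNode();
            alice.setProperty("name", "Alice");
            Node bob = db.createNode();
            bob.setProperty("name", "Bob");
            Relationship knows = alice.createRelationshipTo(bob, RelTypes.KNOWS);
            knows.setProperty("since", 2010);
            tx.success(); // commit the ACID transaction
        }
        db.shutdown();
    }
}
```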









Query a Graph with a Traversal

A Traversal -navigates-> a Graph; it -identifies-> Paths -which order-> Nodes.

A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like “what music do my friends like that I don’t yet own,” or “if this power supply goes down, what web services are affected?”
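Continuing the sketch above, a hedged example of such a Traversal with the Neo4j 2.x traversal framework, answering a "friends of my friends" question (the relationship type and the depth are illustrative):

```java
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.traversal.Evaluators;
import org.neo4j.graphdb.traversal.TraversalDescription;

// Walk outgoing KNOWS relationships from a starting Node and report the
// people found exactly two hops away: the friends of my friends.
public class TraversalExample {
    public static void printFriendsOfFriends(GraphDatabaseService db, Node start) {
        TraversalDescription friendsOfFriends = db.traversalDescription()
                .relationships(DynamicRelationshipType.withName("KNOWS"),
                               Direction.OUTGOING)
                .evaluator(Evaluators.atDepth(2));
        try (Transaction tx = db.beginTx()) {
            for (Path path : friendsOfFriends.traverse(start)) {
                System.out.println(path.endNode().getProperty("name"));
            }
            tx.success();
        }
    }
}
```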










Indexes Look Up Nodes or Relationships

An Index -maps from-> Properties -to either-> Nodes or Relationships. It -is a special-> Traversal.

Often, you want to find a specific Node or Relationship according to a Property it has. This special case of Traversal is so common that it is optimized into an Index look-up, for questions like “find the Account for username master-of-graphs.”
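In Neo4j 2.x terms, such a look-up can go through a schema index on a label, as in this sketch (the Account label and username property mirror the question above; the index itself is created once with the Cypher statement CREATE INDEX ON :Account(username)):

```java
import org.neo4j.graphdb.*;

// Fetch a Node by Label and Property value through a schema index,
// instead of traversing the whole graph.
public class IndexExample {
    public static Node findAccount(GraphDatabaseService db, String username) {
        try (Transaction tx = db.beginTx()) {
            try (ResourceIterator<Node> hits = db
                    .findNodesByLabelAndProperty(
                            DynamicLabel.label("Account"), "username", username)
                    .iterator()) {
                Node account = hits.hasNext() ? hits.next() : null;
                tx.success();
                return account;
            }
        }
    }
}
```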



NEO4J.ORG »


IKS Semantic CMS

04.07.2011

If you believe that semantics is the key to smart content — to content enriched and structured to promote findability, reuse, and task-focused knowledge extraction — and if you believe in open source, then the IKS Semantic Project may be for you.

IKS stands for Interactive Knowledge Stack. It provides a framework for semanticizing managed content. Why is that important? Because current content management systems lack the capability for semantic web enabled, intelligent content, and therefore lack the capacity for users to interact with the content at the user’s knowledge level.

IKS is open source, designed to integrate with open-source web and enterprise content management systems. The project is in early stages, the work of a consortium that consists of seven academic research groups and six industrial partners, companies active in the content management space. It is funded by European Union research program grants and was accepted into the Apache incubator program, as Apache Stanbol, in November 2010.

IKS is being developed and supported by a community of cooperating-competing project participants and is designed for adoption by those participants and by a broader set of content-management and search providers and users. The converging-diverging needs of a diverse community are best served with a strong technical and business architecture that allows for coordinated development of components and capabilities, and IKS has one.

IKS capabilities are expressed as a set of "industrial benchmarks" in the early product documentation, including:

• Semantic search
• Content creation and presentation service (intelligent authoring)
• Workflow service (business processes and content)
• Multi-channel publishing (customizing content)
• Product configuration service (complex content aggregation)
• Event distribution service (spatio-temporal, semantic content, "making events visible in ambient environments")


iks-project »


Graph databases

27.04.2011

One of the most important technology categories for the Semantic Web is NoSQL databases, i.e., document-oriented databases and graph databases.
Graph databases are based on graph theory and store information about the relationships between the data entered into them. The most obvious example is the connections between people on social networks; the links between search results and their attributes in a recommendation system are another.
It is also well known that standard relational databases are not well suited to storing data about relations between entities: the complex, convoluted queries this requires can be slow to execute and can produce unexpected results, while graph databases are designed for exactly this type of work.
Here are some graph-oriented databases and systems.

Pregel is Google's graph-processing system, used to run complex algorithms over datasets such as the graph of links between web pages.
Pregel currently handles billions of nodes and links, and those limits keep growing. Evaluating the tool is difficult, but according to Google they have not yet encountered a graph problem or practical application of graph theory that it could not resolve. Pregel processes large graphs much faster than alternative solutions, and its programming interface is easy to use and intuitive: for example, implementing PageRank required only about 15 lines of code.
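The published descriptions of Pregel center on a vertex-centric model: each vertex runs a compute step once per superstep and exchanges messages with its neighbors. The sketch below is schematic Java meant only to illustrate that model; the class and method names are invented here, not Google's actual (C++) API:

```java
import java.util.List;

// Schematic vertex-centric model: compute() runs once per superstep,
// reading the messages sent to this vertex in the previous superstep.
abstract class Vertex {
    protected double value;

    abstract void compute(List<Double> messages, int superstep);

    // These would be supplied by the framework at runtime; stubbed here.
    void sendMessageToAllNeighbors(double message) { /* framework-provided */ }
    void voteToHalt() { /* framework-provided */ }
    int numOutEdges() { return 1; /* framework-provided */ }
    long numVertices() { return 1; /* framework-provided */ }
}

// PageRank in this style: sum the incoming rank, apply the damping factor,
// then forward value/out-degree to all neighbors for a fixed number of steps.
class PageRankVertex extends Vertex {
    @Override
    void compute(List<Double> messages, int superstep) {
        if (superstep >= 1) {
            double sum = 0;
            for (double m : messages) sum += m;
            value = 0.15 / numVertices() + 0.85 * sum;
        }
        if (superstep < 30) {
            sendMessageToAllNeighbors(value / numOutEdges());
        } else {
            voteToHalt();
        }
    }
}
```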

Neo4j is one of the most popular databases in the graph category and is available as open source. It is a product of Neo Technology, which released a community version under the GPL. The database is built on the Java platform, while other languages, including Ruby and Python, can use it through drivers.

FlockDB was created by Twitter with the aim of analyzing relationships between users. There is no stable release yet, and there is some controversy over whether it is really a graph database.
The biggest difference between FlockDB and graph databases such as Neo4j and OrientDB is the approach to processing nodes: the others can pass through the graph several hops at a time (graph traversal). Instead of that approach, Twitter focuses on providing the immediate connections of a given node (account). For example, Twitter doesn't want to know who follows the people you follow; it is only interested in the people you follow directly. By trimming off graph-traversal functions, FlockDB is able to allocate resources elsewhere.

AllegroGraph is a graph-oriented database built on the W3C specifications for the Resource Description Framework (RDF). It is designed to support projects such as Linked Data and the Semantic Web, and it also supports SPARQL, RDFS++ and Prolog.
AllegroGraph is the property of Franz Inc., which has developed numerous LISP-based products and counts Ford, Kodak, NASA and the Department of Defense among its customers.

GraphDB is a graph-oriented database built for the .NET platform by the German company Sones. The publicly available version can be downloaded under the APL 2 license, while the enterprise version is commercial. The database is also available via Amazon's cloud services and Azure.

InfiniteGraph is a graph database developed by the company Objectivity. Its goal is to enable graph databases with very large scalability. According to some reports, the database is used by the CIA and the Department of Defense.


Posted by: Dejan Petrovic


AlchemyAPI

06.04.2011

AlchemyAPI uses statistical natural language processing and machine learning algorithms to analyze content and extract targeted information about people, places, companies, titles, languages, and more.

This tool can be used to analyze the content of web sites, saved HTML, or plain text files.

Extract names and entities

Identifying people, companies, organizations, cities, geographical features and other entities within standard HTML pages and text content is the main feature of this tool. It has an advanced entity-recognition system, supports multiple languages, and offers strong de-duplication capabilities that cannot be found in other solutions.
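Entity extraction is exposed as a simple REST call. The Java sketch below posts text to the named-entity endpoint; the URL follows AlchemyAPI's documented pattern at the time, but treat it as illustrative, and the API key is a placeholder obtained by registering:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Post a piece of text and print the JSON response listing the entities found.
public class AlchemyEntityExample {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // placeholder
        String text = "Neo Technology develops Neo4j in Malmo and San Francisco.";
        URL url = new URL("http://access.alchemyapi.com/calls/text/TextGetRankedNamedEntities");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        String body = "apikey=" + URLEncoder.encode(apiKey, "UTF-8")
                    + "&outputMode=json"
                    + "&text=" + URLEncoder.encode(text, "UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // JSON document with the extracted entities
            }
        }
    }
}
```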

Concept Tagging

Documents and text are tagged automatically, in a manner similar to how people do it; advanced concept abstraction makes it possible to annotate documents with great precision.

Keyword / Term Extraction

Extraction of important terms and keywords from HTML pages and text content is made possible by advanced statistical and linguistic algorithms that analyze the contents and indicate the most important words and phrases.

Sentiment Analysis

Identification of positive, negative or neutral sentiment within HTML and text content is provided by algorithms that determine the mood at the level of documents, entities, and keywords.

Topic Categorization / Text Classification

Automatic classification of web pages and text documents enables rapid, taxonomy-based categorization of unstructured content.

Automatic Language Identification

The Alchemy tool has a very robust identification system covering 97 different languages. This enables automatic routing, filtering and organization of data by language.

Text Extraction / Web Page Cleaning

Automatically cleaning a web page of links, advertisements and other unwanted content yields better indexing, relevance and searchability of the data.

Getting structured data

The Alchemy tool allows you to obtain structured information, such as prices and product descriptions, from any web page. Added visual information, such as text labels, position and structure, lets the end user easily navigate the results of the analysis.

Working with microformats and RSS/Atom feeds

Alchemy identifies the use of microformats within web pages, e.g. the hCard format, the Geo format's latitude/longitude coordinates, and so on.
The tool can also work with RSS/Atom feeds, extracting links and text from them.


AlchemyAPI »

Posted by: Dejan Petrovic


.Net Frameworks

07.03.2011

If we wanted to be cynical, we could say that the best .NET development tool for the Semantic Web and for working with ontologies is in fact IKVM (a tool for porting Java applications to the .NET platform). However, as the Semantic Web market grows, so does the need for tools outside the Java environment, such as .NET.
We want to introduce the following .NET tools; we will extend this list over time.


dotNetRDF

07.03.2011

dotNetRDF's goal is to create an open-source .NET library, built on the latest version of the .NET Framework, that offers a powerful and simple interface for working with RDF and an effective way of working with RDF data.

dotNetRDF offers support for a variety of RDF stores, from databases as simple as SQL Server and MySQL to the following:
 • AllegroGraph
 • 4store
 • Fuseki
 • Joseki
 • Sesame
 • Uniform HTTP SPARQL Protocol for RDF Graph Management compliant stores
 • Talis Platform
 • Virtuoso.

The components necessary to work with dotNetRDF are:

 • MySQL 6.0.3 Connector.Net available from the MySQL Developer site.
 • JSON.Net available from CodePlex
 • Virtuoso ADO.Net Provider available from the Virtuoso Wiki
 • HtmlAgilityPack from CodePlex.

dotNetRDF is written in the C# programming language and offers a simple and powerful interface for working with RDF (Resource Description Framework) data. It offers many classes for reading and writing RDF data, as well as for searching and querying it.
The library works primarily at the level of nodes, graphs and triples, with limited support for reasoning and for OWL.

dotNetRDF is primarily a tool for working with:
 - graphs
 - nodes
 - triples
 - triple stores


dotNetRDF »

Posted by: Dejan Petrovic


Semantic MediaWiki

15.02.2011

Semantic MediaWiki is a fully functional tool with many features that can transform a wiki's information into a powerful and flexible database. All data created through SMW can easily be published on the Semantic Web, allowing other systems to use it.
Semantic MediaWiki is an extension to the MediaWiki platform, the wiki application best known for powering Wikipedia, and it helps with searching, organizing, tagging, browsing, evaluating and sharing the wiki's information.
A traditional wiki page contains text that a computer can neither understand nor evaluate. SMW therefore adds semantic annotations that allow wiki pages to function as a database.
Semantic MediaWiki was founded in 2005; more than 10 developers currently work on it, and it is used on hundreds of sites. A number of add-ons exist for viewing, entering and reviewing data, so the term Semantic MediaWiki is often used to refer to a whole family of applications.

Semantic MediaWiki is a project that is funded as part of the EU Framework Programmes.

SMW introduces new markup into classic wiki text, allowing semantic annotations to be added; for example, an annotation such as [[Is capital of::Germany]] turns a plain wiki link into typed data. This gives the wiki a simpler data structure as well as easier navigation and retrieval.
Some of the advantages of using SMW:
 - Automatically generated lists (much more precise than lists maintained by hand on a wiki, for example a list of major cities sorted by population)
 - Various data display formats, such as calendar, timeline, graph and map views
 - Improved data structure, thanks to the semantic markup
 - Dramatically improved data search
 - Cross-language data consistency, i.e. the same information is displayed regardless of the language used in the query, and inconsistencies between languages can be detected (e.g. different population figures for the same city)
 - External use, with interfaces for exporting data in CSV, JSON and RDF formats, which allows other applications to use SMW as a data source
 - Data integration: SMW can combine external data sources (databases, RSS feeds, web services) with the wiki's existing semantic information, without requiring separate database installations, making it a central source of information for users
 


Semantic MediaWiki »

Posted by: Dejan Petrovic


S-Match, framework for semantic matching

04.02.2011

S-Match is a semantic matching tool that offers several algorithms for finding semantically corresponding pairs of data nodes. It also provides a basis for developing new algorithms in this direction.
S-Match uses structures such as database schemas, classifications and ontologies to help find correspondences between data nodes that match one another.

S-Match can be applied as a solution in many areas, such as:
 - integration of information from different sources
 - expanding and improving knowledge bases
 - sharing of information between nodes
 - integration of digital libraries
 - improvement of web services
 - web search engines
 - a variety of agents that deal with information

The version discussed here is 2011.01.10.

The current S-Match implementation contains several matching algorithms: the basic S-Match algorithm, which is versatile and adaptable to general use and many applications; a minimal semantic matching algorithm, which uses the structure of the input data and produces minimal and maximal mappings that are assessed by hand at the end; and an algorithm that preserves the existing data structure, which is useful for databases.

The interface comes in both command-line and GUI forms.


S-Match »

Posted by: Dejan Petrovic


Open Calais

22.12.2010

Open Calais is, in our opinion, the best semantic online web service for English-language content.*

This service generates a rich collection of metadata (data describing the content) for a given text. Calais uses natural language processing methods, continuous machine learning and other techniques to analyze a document and return the entities found within it. Beyond entities, Open Calais goes a step further, extracting the events and facts hidden in the meaning of the text. The results of the Calais web service can be used for searching and storing data and for clustering and categorizing news, blogs, catalogs and any other content.
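In practice the service is called over REST. A minimal Java sketch, assuming the enrich endpoint and license-key header that Calais documented at the time (the key is a placeholder):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// POST raw text with a license key header and read back the extracted metadata.
public class CalaisExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://api.opencalais.com/tag/rs/enrich");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("x-calais-licenseID", "YOUR_LICENSE_KEY"); // placeholder
        conn.setRequestProperty("Content-Type", "text/raw; charset=UTF-8");
        conn.setRequestProperty("Accept", "application/json");
        String text = "Open Calais analyzes documents submitted over HTTP.";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(text.getBytes("UTF-8"));
        }
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // entities, events and facts
            }
        }
    }
}
```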

Open Calais gives us the possibility of classifying documents by people, companies, events, products, etc.

These classifications can be viewed as maps, graphs or networks, which can significantly improve navigation on a site, offer classification of content in context, enable better tagging and organization of content, and prevent duplication of content.

Some applications of this service are:

 - Optimizing commercial web content for SEO (Search Engine Optimization)
 - Media monitoring
 - Increased reader engagement and improved navigation
 - Data management in companies
 - Assistance in content creation


An implementation of this service can be seen in our Lab.

 * Supported languages other than English are French and German.


Open Calais »

Posted by: Dejan Petrovic


Facebook's Open Graph

06.10.2010

Facebook provides a platform that allows web sites and applications to share information about users, so that a range of services and goods can be tailored to the interests and tastes of people who have not even visited the site before.

For example, if a visitor goes to a shopping site, it can offer goods and services tailored to their interests, tastes and desires.

Open Graph

On Facebook, users are connected to people they know, to public figures, and to services and products they love. The Open Graph platform enables web sites and applications to share this information with others.
For example, if you visit a site through your Facebook profile, say one that makes recommendations about entertainment and going out, the site will have access to all of your publicly available information, such as favorite foods, bands and so on, and on the basis of these data it can suggest restaurants and music you might like.

Tools and Plugins

Facebook has developed tools and plugins that can easily be incorporated into web sites and applications, enabling users to share news, information and interests with their friends without registering and logging in on those sites.
The Like button, for example, lets you express an opinion about something on a site and also lets you see which of your friends share that opinion.
In this way the links between users, groups, their interests and their opinions become information that is available through the plugins Facebook already offers.

Graph API

The Facebook Graph API is relatively simple. A single line of HTML code is enough to embed a social plugin into a page.
The Graph API allows developers to query Facebook for information about users and objects; the data is received from a URL, as shown below.
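A minimal Java sketch of that URL-based access; the object ID 19292868552 is the one Facebook's own examples used for its Platform page, so treat both it and the output as illustrative:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Fetch an object by ID from graph.facebook.com; the response is JSON.
public class GraphApiExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://graph.facebook.com/19292868552");
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(url.openStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // e.g. {"id":"19292868552","name":...}
            }
        }
    }
}
```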

List of objects:

• Group: A Facebook group
• Album: A photo album
• Link: A shared link
• Event: A Facebook event
• Post: An individual entry in a profile's feed
• Application: An individual application registered on the Facebook Platform
• Note: A Facebook note
• Checkin: A check-in made through Facebook Places
• Photo: An individual photo
• Status message: A status message on a user's wall
• Page: A Facebook Page; the Page object supports Real-Time Updates for all properties except the ones marked with '*'
• User: A user profile
• Subscription: An individual subscription from an application to get real-time updates for an object type
• Video: An individual video


Graph API overview »

Posted by: Dejan Petrovic


Ontology editors

16.09.2010

An ontology describes the concepts and relationships that are important in a particular domain, providing a vocabulary for that domain as well as a computerized specification of the meaning of terms used in the vocabulary. Ontologies range from taxonomies and classifications, database schemas, to fully axiomatized theories. In recent years, ontologies have been adopted in many business and scientific communities as a way to share, reuse and process domain knowledge. Ontologies are now central to many applications such as scientific knowledge portals, information management and integration systems, electronic commerce, and semantic web services.

Protégé:
Protégé is a free, open-source platform that provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. At its core, Protégé implements a rich set of knowledge-modeling structures and actions that support the creation, visualization, and manipulation of ontologies in various representation formats. Protégé can be customized to provide domain-friendly support for creating knowledge models and entering data. Further, Protégé can be extended by way of a plug-in architecture and a Java-based Application Programming Interface (API) for building knowledge-based tools and applications.

TopBraid Composer:
Eclipse-based and downloadable, with full support for RDFS and OWL, a built-in inference engine, a SWRL editor and SPARQL queries, visualization, and import of XML and UML; from TopQuadrant.
Three versions are available: Free Edition, Standard Edition and Maestro Edition.


Protégé » | TopBraid »

Posted by: Dejan Petrovic


Protégé-OWL

16.09.2010

The Protégé-OWL editor is an extension of Protégé that supports the Web Ontology Language (OWL). OWL is the most recent development in standard ontology languages, endorsed by the World Wide Web Consortium (W3C) to promote the Semantic Web vision. "An OWL ontology may include descriptions of classes, properties and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. These entailments may be based on a single document or multiple distributed documents that have been combined using defined OWL mechanisms".

The Protégé-OWL editor enables users to:
•Load and save OWL and RDF ontologies.
•Edit and visualize classes, properties, and SWRL rules.
•Define logical class characteristics as OWL expressions.
•Execute reasoners such as description logic classifiers.
•Edit OWL individuals for Semantic Web markup.

Protégé-OWL's flexible architecture makes it easy to configure and extend the tool. Protégé-OWL is tightly integrated with Jena and has an open-source Java API for the development of custom-tailored user interface components or arbitrary Semantic Web services.


[Screenshots: the OWLClasses, Properties, Individuals, RDFClasses, OWLViz and SWRLTab tabs]

Protégé-owl » | Protégé - Wiki »

Source: Stanford University


Protégé - Frames

16.09.2010

The Protégé-Frames editor enables users to build and populate ontologies that are frame-based, in accordance with the Open Knowledge Base Connectivity protocol (OKBC). In this model, an ontology consists of a set of classes organized in a subsumption hierarchy to represent a domain's salient concepts, a set of slots associated to classes to describe their properties and relationships, and a set of instances of those classes - individual exemplars of the concepts that hold specific values for their properties.

The Protégé-Frames editor provides a full-fledged user interface and knowledge server to support users in constructing and storing frame-based domain ontologies, customizing data entry forms, and entering instance data. Protégé-Frames implements a knowledge model which is compatible with the Open Knowledge Base Connectivity protocol (OKBC).

Features of Protégé-Frames include:
•A wide set of user interface elements that can be customized to enable users to model knowledge and enter data in domain-friendly forms.
•A plug-in architecture that can be extended with custom-designed elements, such as graphical components (e.g., graphs and tables), media (e.g., sound, images, and video), various storage formats (e.g., RDF, XML, HTML, and database back-ends), and additional support tools (e.g., for ontology management, ontology visualization, inference and reasoning, etc.).
•A Java-based Application Programming Interface (API) that makes it possible for plug-ins and other applications to access, use, and display ontologies created with Protégé-Frames.


[Screenshots: the Classes, Forms and Instances tabs]

Protégé-Frames »

Source: Stanford University


TopBraid Composer

16.09.2010

Free Edition
TopBraid Composer is a professional development environment for W3C's Semantic Web standards: RDF Schema, the OWL Web Ontology Language and the SPARQL Query Language. The Free Edition is an entry-level tool for creating and editing RDF/OWL files and running SPARQL queries over them. The Free Edition also provides support for defining business rules and integrity constraints using SPARQL Rules (SPIN). The Free Edition does not include Support and Maintenance. Users of the Free Edition can send their questions to the TopBraid Composer mailing list, but priority will be given to customers of the commercial editions.

Standard Edition
The Standard Edition greatly extends the Free Edition. It provides visual editors for RDF graphs and class diagrams. While the Free Edition can only be used to work with files, the Standard Edition provides scalable database backends (such as Jena SDB/TDB, AllegroGraph, Oracle 11g and Sesame) as well as utilities for importing from and exporting to XML, Excel, RDBMSs and other data formats. Besides SPARQL Rules (SPIN), the Standard Edition also supports other inference engines, including Jena rules and the Semantic Web Rule Language (SWRL), as well as OWLIM and Pellet.

Composer provides a comprehensive set of features to cover the whole life cycle of semantic application development. In addition to being a complete ontology editor with refactoring support, Composer also can be used as a run-time environment to execute rules, queries, reasoners and mash-ups. Based on Eclipse, Composer can also be extended with custom Java plug-ins. This supports the rapid development of semantic applications in a single platform.

Maestro Edition
TopBraid Composer - Maestro Edition (TBC-ME) is the most comprehensive version of TopBraid Composer. It is optimized for developing web applications and services based on the TopBraid Live platform.

Most notably, TBC-ME can be used to develop and execute SPARQLMotion scripts including web services for processing data chains and creating integrated data services. TBC-ME includes its own internal web server for testing applications resulting in significantly improved turn-around times in the application development. TBC can also be used to run TopBraid Ensemble and assemble Ensemble-based applications. To run TopBraid Ensemble out-of-the-box within TBC-ME, open a web browser and go to http://localhost:8083/tbl/. For other features unique to the Maestro Edition, consult the comparison table below.


Feature Comparison

Capability | Free | Standard | Maestro
Load, Edit and Save RDF/XML, N3 and N-Triples files | Yes | Yes | Yes
Define ontologies using form-based editors | Yes | Yes | Yes
Define ontologies using a graphical editor | -- | Yes | Yes
Create and execute SPARQL Queries | Yes | Yes | Yes
Create and execute SPARQL Rules (SPIN) | Yes | Yes | Yes
Create and execute SWRL and Jena rules | -- | Yes | Yes
Import and convert to RDF data from XML, UML, Spreadsheets, RSS/Atom Feeds, and Relational Databases | -- | Yes | Yes
Generate XML Schemas from RDF/OWL | -- | Yes | Yes
Roundtrip between XML and RDF/OWL (import/export) | -- | -- | Yes
Work with RDF databases | -- | Yes | Yes
Work with different reasoners and configure inference options | -- | Yes | Yes
Constraint Checking to validate user input (using SPARQL Rules (SPIN)) | Yes | Yes | Yes
Query relational databases in real time | -- | Yes | Yes
Visualize RDF data using graphs, diagrams, maps, matrixes and calendars | -- | Yes | Yes
Merge and re-factor RDF data across different namespaces and data sources | basic | Yes | Yes
Run TopBraid Ensemble and other web applications developed for the TopBraid Live platform | -- | -- | Yes
Generate arbitrary (e.g., HTML) documents from data in your ontology using an integrated semantic Java Server Pages engine | -- | -- | Yes
Define and execute RDF data processing chains (SPARQLMotion) | -- | -- | Yes
Generate business intelligence reports and insert them into web pages to drive semantically enriched web applications | -- | -- | Yes
Convert e-mails into OWL, supporting semantic analysis and classification of emails | -- | -- | Yes


TopQuadrant »

Source: TopQuadrant


Tools for development

16.09.2010

Tools for developing and working with ontologies are libraries, called frameworks, that integrate into our development environment. They have to provide the basic functions for working with the file formats used for ontologies, such as RDF, RDFS, OWL 1 and OWL 2, which are defined and standardized by the W3C.
The most common primary functions are writing, reading and parsing ontologies, as well as instantiating and serializing ontologies in memory and in databases. The best tools are written in the Java programming language, and almost all are open source under liberal licenses.

Posted by: Dejan Petrovic


SERF - Java

17.09.2010

Stanford Entity Resolution Framework

Overview
The goal of the SERF project is to develop a generic infrastructure for Entity Resolution (ER). ER (also known as deduplication, or record linkage) is an important information integration problem: The same "real-world entities" (e.g., customers, or products) are referred to in different ways in multiple data records. For instance, two records on the same person may provide different name spellings, and addresses may differ. The goal of ER is to "resolve" entities, by identifying the records that represent the same entity and reconciling them to obtain one record per entity.

In their approach, the functions that "match" records (i.e. decide whether they represent the same entity) and "merge" them are viewed as black-boxes, which permits generic, extensible ER solutions. This generic setting makes ER resemble a database join operation (of the initial set of records with itself), but there are two main differences: (a) in general, they have no knowledge about which records may match, so all pairs of records need to be compared using the match function, and (b) merged records may lead us to discover new matches, therefore a "feed-back loop" must compare them against the rest of the data set.
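To make the black-box setting concrete, here is a deliberately naive Java sketch of that match/merge feed-back loop. It is an illustration of the idea only, not SERF's actual API; the match and merge black boxes are supplied by the caller:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;

// Repeatedly compare all pairs; whenever match() fires, replace the pair with
// merge(a, b). The merged record re-enters the pool, so it is compared against
// the rest of the data set (the feed-back loop described above).
public class NaiveER {
    public static <R> Set<R> resolve(Set<R> input,
                                     BiPredicate<R, R> match,
                                     BinaryOperator<R> merge) {
        List<R> records = new ArrayList<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            outer:
            for (int i = 0; i < records.size(); i++) {
                for (int j = i + 1; j < records.size(); j++) {
                    R a = records.get(i), b = records.get(j);
                    if (match.test(a, b)) {
                        records.remove(j); // remove j first so i stays valid
                        records.remove(i);
                        records.add(merge.apply(a, b));
                        changed = true;
                        break outer; // restart with the new record in the pool
                    }
                }
            }
        }
        return new HashSet<>(records);
    }
}
```

This quadratic restart-on-merge loop is exactly what SERF's optimized algorithms improve upon, by exploiting properties of the match and merge functions.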

Some of the challenges they are addressing in the SERF project include:

Performance: Entity resolution algorithms must perform a very large number of comparisons. They identified simple and reasonable properties of the match and merge functions that enable efficient processing, and developed optimal algorithms.
 
Distribution: As ER is a compute-intensive process, they develop algorithms for distributing the ER workload across multiple processors. When available, they exploit domain knowledge in the distribution of ER.

Secondary storage: They are developing optimizations to perform ER efficiently when the dataset being resolved does not fit into a processor's main memory, fetching and writing records to disk as efficiently as possible.

Numerical confidences: They consider numerical confidences associated with data records, and extend their framework to manipulate and combine these confidences as records are matched and merged. New algorithms are needed to perform ER efficiently when confidences are involved.

Negative information: ER can be viewed as a non-monotonic incremental process where previous match or merge decisions may be reconsidered as further records are processed. Maintaining the history of record derivations is key to managing these revisions consistently and efficiently.

Blocking: They are developing iterative blocking techniques to significantly enhance the ER performance as well as accuracy. When processing a block, they exploit the ER results of previously processed blocks.

Joint ER: They are developing ER techniques that resolve multiple domains at the same time for better accuracy.

ER Measures: They explore a configurable ER measure (inspired by edit distance) that can accurately evaluate ER results.

Trio-ER: The Trio-ER system is a new variant of the Trio system tailored specifically as a workbench for entity resolution.

Evolving Rules: When writing ER applications, the rule for comparing records may change frequently with better understanding of the data, schema, and application. They investigate how to efficiently update an ER result given a new rule for comparing records.

Pay-As-You-Go ER: Many ER applications need to resolve large data sets efficiently, but do not require the ER result to be exact. They investigate techniques for maximizing the ER quality with minimal work.


SERF »


Source: Stanford University


Jena - Java

15.07.2010

Jena is a Java toolkit for developing semantic web applications based on W3C recommendations for RDF and OWL. It provides an RDF API; ARP, an RDF parser; SPARQL, the W3C RDF query language; an OWL API; and rule-based inference for RDFS and OWL.

The Resource Description Framework (RDF) recently became a W3C recommendation, taking its place alongside other Web standards such as XML and SOAP. RDF can be applied in fields that deal with ad-hoc incoming data, such as CRM, and is already being widely used in social networking and self-publishing software like LiveJournal and TypePad.

Java programmers will increasingly benefit from having the skills to work with RDF models.

The Jena Framework includes:

An RDF API
Reading and writing RDF in RDF/XML, N3 and N-Triples
An OWL API
In-memory and persistent storage
SPARQL query engine
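A minimal sketch of the RDF API in the style of the classic Jena tutorial (Jena 2.x package names; the resource URI is a placeholder):

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.VCARD;

// Build a small in-memory RDF model and serialize it as RDF/XML.
public class JenaExample {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // One resource with a vCard "formatted name" property.
        Resource johnSmith = model.createResource("http://somewhere/JohnSmith")
                                  .addProperty(VCARD.FN, "John Smith");
        // "N3" or "N-TRIPLE" work here too.
        model.write(System.out, "RDF/XML");
    }
}
```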


Jena project »

Source: Stanford University


OWL API - Java

08.06.2010

The OWL API is a Java API and reference implementation for creating, manipulating and serialising OWL Ontologies. The latest version of the API is focused towards OWL 2.

The OWL API is open source and is available under the LGPL License

The OWL API includes the following components:

• An API for OWL 2 and an efficient in-memory reference implementation
• RDF/XML parser and writer
• OWL/XML parser and writer
• OWL Functional Syntax parser and writer
• Turtle parser and writer
• KRSS parser
• OBO Flat file format parser
• Reasoner interfaces for working with reasoners such as FaCT++, HermiT, Pellet and Racer
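A quick sketch of the API in its 3.x form (the ontology file name is a placeholder):

```java
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

// Load an ontology from a local file and inspect what the parser found.
public class OWLAPIExample {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLOntology ontology =
                manager.loadOntologyFromOntologyDocument(new File("pizza.owl"));
        System.out.println("Loaded: " + ontology.getOntologyID());
        System.out.println("Classes: " + ontology.getClassesInSignature().size());
    }
}
```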


OWL API Home »

Source: W3C


Reasoners

16.09.2010

A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with.

Posted by: Dejan Petrovic


Pellet

04.08.2010

For applications that need to represent and reason about information using OWL, Pellet is the leading choice for systems where sound-and-complete OWL DL reasoning is essential. Pellet includes support for OWL2 profiles including OWL2 EL. It incorporates optimizations for nominals, conjunctive query answering, and incremental reasoning.


Pellet supports reasoning with the full expressivity of OWL-DL (SHOIN(D) in Description Logic jargon) and has been extended to support the forthcoming OWL2 specification (SROIQ(D)), which adds the following language constructs:

qualified cardinality restrictions
complex subproperty axioms (between a property chain and a property)
local reflexivity restrictions
reflexive, irreflexive, symmetric, and anti-symmetric properties
disjoint properties
negative property assertions
vocabulary sharing (punning) between individuals, classes, and properties
user-defined dataranges

Pellet also provides reasoning with the following features from OWL Full:
inverse functional datatype properties

Pellet provides all the standard inference services that are traditionally provided by DL reasoners:

Consistency checking
Ensures that an ontology does not contain any contradictory facts. The OWL2 Direct Semantics provides the formal definition of ontology consistency used by Pellet.

Concept satisfiability
Determines whether it’s possible for a class to have any instances. If a class is unsatisfiable, then defining an instance of that class will cause the whole ontology to be inconsistent.

Classification
Computes the subclass relations between every named class to create the complete class hierarchy. The class hierarchy can be used to answer queries such as getting all or only the direct subclasses of a class.

Realization
Finds the most specific classes that an individual belongs to; i.e., realization computes the direct types for each of the individuals. Realization can only be performed after classification since direct types are defined with respect to a class hierarchy. Using the classification hierarchy, it is also possible to get all the types for each individual.
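A sketch of driving these services from Java, assuming Pellet 2.x's OWL API binding (the ontology file and the class IRI are placeholders):

```java
import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import com.clarkparsia.pellet.owlapiv3.PelletReasonerFactory;

public class PelletExample {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLOntology ont = m.loadOntologyFromOntologyDocument(new File("people.owl"));
        OWLReasoner reasoner = PelletReasonerFactory.getInstance().createReasoner(ont);

        // Consistency checking
        System.out.println("Consistent: " + reasoner.isConsistent());

        // Concept satisfiability for every named class
        for (OWLClass c : ont.getClassesInSignature()) {
            if (!reasoner.isSatisfiable(c)) {
                System.out.println("Unsatisfiable: " + c);
            }
        }

        // Classification: direct subclasses of one class; realization is
        // analogous via reasoner.getTypes(individual, true)
        OWLClass person = m.getOWLDataFactory().getOWLClass(
                IRI.create("http://example.org/ontology#Person"));
        System.out.println(reasoner.getSubClasses(person, true));
    }
}
```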


Multiple Interfaces to the Reasoner

Pellet provides many different ways to access its reasoning capabilities:

A web-based demonstration page, OWLSight
A command line program (included in the distribution package)
Programmatic API that can be used in a standalone application
The reasoner interfaces in the Manchester OWL-API and Jena
Direct integration with the Protégé ontology editor
A DIG server that allows Pellet to be used with different clients such as Protégé-OWL Editor

SPARQL-DL Conjunctive Query Answering

Pellet includes an ABox query engine which supports answering conjunctive ABox queries with or without non-distinguished variables. Answering queries that contain cycles through non-distinguished variables with respect to an OWL-DL ontology is an open problem and is not presently supported. Cycles through distinguished variables can be handled and several optimizations have been implemented in Pellet’s query engine. In the presence of non-distinguished query variables, the rolling-up technique is used to answer the queries.

Queries can be formulated using SPARQL. The SPARQL query forms SELECT, CONSTRUCT, and ASK are supported; DESCRIBE isn’t, neither are OPTIONAL or FILTER. It’s possible to answer such queries using Jena’s SPARQL query engine on a Pellet-backed inference model.
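A sketch of that last route, assuming Pellet 2.x's Jena binding (the ontology file, the prefix and the query are placeholders):

```java
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import org.mindswap.pellet.jena.PelletReasonerFactory;

// Run a SELECT query over a Pellet-backed inference model; the answers
// include individuals whose types are inferred, not just asserted.
public class PelletSparqlExample {
    public static void main(String[] args) {
        OntModel model = ModelFactory.createOntologyModel(PelletReasonerFactory.THE_SPEC);
        model.read("file:family.owl");

        String queryStr =
            "PREFIX fam: <http://example.org/family#> " +
            "SELECT ?x WHERE { ?x a fam:Parent }";

        Query query = QueryFactory.create(queryStr);
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            qe.close();
        }
    }
}
```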

Datatype Reasoning

OWL allows ontologies to use simple built-in datatypes from XML Schema. XML Schema has a rich set of basic datatypes including various numeric types, strings, and date-time types.

OWL2 includes support for embedding the definitions of user-defined data ranges in OWL ontologies as in this example ontology. Pellet supports reasoning with all the built-in datatypes defined in XML Schema plus any user-defined data ranges that extend numeric or date/time derived types.

SWRL Rules Support

Pellet has an implementation of a direct tableau algorithm for a DL-safe rules extension to OWL-DL. This implementation allows one to load and reason with DL-safe rules encoded in SWRL and includes support for some SWRL built-ins.

Ontology Analysis and Repair

OWL has two major dialects, OWL DL and OWL Full—the former is a subset of the latter. All OWL ontologies are encoded as RDF graphs.

OWL DL imposes a number of restrictions on RDF graphs, some of which are substantial (e.g., that the set of class names and individual names be disjoint) and some less so (that every item have an rdf:type triple). Ensuring that an RDF document meets all the restrictions can be a relatively difficult task for authors. Many existing OWL documents are nominally OWL Full, even though their authors intend for them to be OWL DL.

Pellet incorporates a number of heuristics to detect OWL Full ontologies that can be expressed as OWL DL ontologies and can repair them accordingly.

Ontology Debugging

Detection of unsatisfiable concepts in an ontology is a simple task. However, the diagnosis and resolution of the bug is generally not supported at all. For example, no explanation is typically given as to why the error occurs (e.g., by pinpointing axioms in the ontology responsible for the clash) or how dependencies between classes cause the error to propagate (i.e., by distinguishing root from derived unsatisfiable classes).

Pellet provides support for both kinds of tasks by pinpointing axioms that cause an inconsistency and the relation between unsatisfiable concepts.

Incremental Reasoning

An important challenge for Pellet is to reason with changing knowledge bases.
Incremental reasoning means the ability of the reasoner to process updates (additions or removals) applied to an ontology without having to perform all the reasoning steps from scratch. Pellet supports two different incremental reasoning techniques: incremental consistency checking and incremental classification. These techniques are applicable under different conditions and provide benefits for different use cases.


Clark&Parsia »

Source: Clark&Parsia


Fact++

21.07.2010

FaCT++ is the new generation of the well-known FaCT OWL-DL reasoner. FaCT++ uses the established FaCT algorithms, but with a different internal architecture. Additionally, FaCT++ is implemented in C++ in order to create a more efficient software tool and to maximise portability. New optimisations have also been introduced, and some new features added.

FaCT++ partially supports OWL 2. The missing bits are:
•Top/Bottom Object and Data property semantics: only the names and the hierarchy positions are in place at the moment.
•No support for keys
•Partial datatype support. At the moment, the only supported datatypes are Literal, string, anyURI, boolean, float, double, integer, int, nonNegativeInteger.

University of Manchester » | Fact++ Home »

Source: University of Manchester


HermiT

02.06.2010

HermiT is a reasoner for ontologies written using the Web Ontology Language (OWL). Given an OWL file, HermiT can determine whether or not the ontology is consistent, identify subsumption relationships between classes, and much more.

HermiT is the first publicly available OWL reasoner based on a novel "hypertableau" calculus which provides much more efficient reasoning than any previously known algorithm. Ontologies which previously required minutes or hours to classify can often be classified in seconds by HermiT, and HermiT is the first reasoner able to classify a number of ontologies which had previously proven too complex for any available system to handle.

HermiT uses direct semantics and passes all OWL2 conformance tests for direct semantics reasoners.
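A sketch of asking HermiT those two questions through the OWL API; HermiT 1.x exposes an org.semanticweb.HermiT.Reasoner class, and the file name and IRIs below are placeholders:

```java
import java.io.File;
import org.semanticweb.HermiT.Reasoner;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

// Check consistency, then test one subsumption as an entailment.
public class HermiTExample {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLOntology ont = m.loadOntologyFromOntologyDocument(new File("pizza.owl"));
        Reasoner hermit = new Reasoner(ont);
        System.out.println("Consistent: " + hermit.isConsistent());

        OWLDataFactory df = m.getOWLDataFactory();
        OWLClass sub = df.getOWLClass(IRI.create("http://example.org/onto#Student"));
        OWLClass sup = df.getOWLClass(IRI.create("http://example.org/onto#Person"));
        // Is Student entailed to be a subclass of Person?
        OWLAxiom ax = df.getOWLSubClassOfAxiom(sub, sup);
        System.out.println("Student subclass of Person: " + hermit.isEntailed(ax));
    }
}
```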


hermit-reasoner.com »

Source: hermit-reasoner.com