Semantic Web – An Introduction
Abstract
The World Wide Web in its current form holds a lot of information that can be inferred and used only by humans. The Semantic Web is the means to give structure to the meaning contained in the web so that machines can make use of this information and create even more information and knowledge without human intervention. It is the means to give human-like intelligence, which in scientific terms is called Artificial Intelligence, to every software application running on the web. The Semantic Web in its entirety has the promise to empower software applications to draw conclusions and inferences from given pieces of information, just like humans do. Just as web technologies/standards such as HTTP, HTML and URLs provided the foundation to share information the way it is done today, Semantic Web technologies/standards provide the foundation for software applications to be more intelligent or smart.
Key concepts of the Semantic Web
URIs
The Uniform Resource Identifier was a groundbreaking concept, as it made it possible to uniquely identify an HTML document in the vast sea of the WWW. URIs led to an even more groundbreaking concept known as 'hyperlinks', which allowed navigation from one page to another. This resulted in the creation of a huge directed graph of web documents. It is this hyperlink that empowered Google to do what it is able to do today, i.e. search for relevant web pages on the web. Imagine how the world would have been without Google!
The Semantic Web extends this concept from web pages to the data contained in the web pages. The URI is the stepping-stone towards achieving the goal that the Semantic Web envisions. The Semantic Web aims to have unique identifiers for every resource that is present on the web - be it a web page, an article in the web page, a person referred to in the article, or a person's attribute (say name or date of birth) describing that person. Having this implemented in its entirety, the WWW will be transformed from a huge directed graph of documents into a huge directed and named graph of data, which is called Linked Data.
Linked Data
Every software application has some persisted data that it uses to provide meaningful information to its users. This data is conventionally stored in databases, either structured (RDBMS) or unstructured (NoSQL stores). An RDBMS stores data in the form of m×n tables with links between the tables in the form of foreign keys, while NoSQL stores keep it in tuples. These tables store raw data, while information is extracted by means of SQL queries or application logic. In both cases applications have to have a strong understanding of the data being stored, i.e. they need human intelligence. There can't be a generic query or logic built so that information can be automatically extracted without human intelligence.
Consider a database schema as follows:
Person:
Id | Name  | DOB
1  | Ram   | 30-04-1976
2  | Shyam | 30-09-1987

Location:
Id | Person (foreign key to Person Id) | Location | Date
10 | 1                                 | Miami    | 15-08-2013

Photograph:
Id  | taken_by (foreign key to Person Id) | Title | taken_at
100 | 1                                   | Pic1  | 15-08-2013
200 | 2                                   | Pic2  | 15-08-2011
With the above data one can deduce the below information:
- Ram was at Miami on 15-08-2013.
- Pic1’s photographer was Ram.
- Pic2’s photographer was Shyam.
All the above information could be extracted because there existed a foreign key (link) between Location:Person & Person:Id and between Photograph:taken_by & Person:Id. These links are based on the schema (created while designing the schema) and not on the data itself. With such a limitation of the RDBMS to create only static links, the RDBMS fails to establish many more links that are dynamic and could be established among the data. In the above example there exists a link between Photograph:taken_by[id=1] and Location:Person[id=1]. This link is nothing but the information - Pic1 was taken at Miami - which any human can deduce but which, with the current schema, applications can't.
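To make this limitation concrete, here is a minimal sketch in Python using the standard sqlite3 module and the hypothetical schema above; note that every piece of information needs its own hand-written, schema-specific query:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person     (Id INTEGER PRIMARY KEY, Name TEXT, DOB TEXT);
CREATE TABLE Location   (Id INTEGER PRIMARY KEY, Person INTEGER REFERENCES Person(Id),
                         Location TEXT, Date TEXT);
CREATE TABLE Photograph (Id INTEGER PRIMARY KEY, taken_by INTEGER REFERENCES Person(Id),
                         Title TEXT, taken_at TEXT);
INSERT INTO Person     VALUES (1, 'Ram', '30-04-1976'), (2, 'Shyam', '30-09-1987');
INSERT INTO Location   VALUES (10, 1, 'Miami', '15-08-2013');
INSERT INTO Photograph VALUES (100, 1, 'Pic1', '15-08-2013'), (200, 2, 'Pic2', '15-08-2011');
""")

# 'Where was Ram on 15-08-2013?' -- this only works because the developer knows
# that Location.Person is a foreign key to Person.Id.
print(con.execute("""
    SELECT p.Name, l.Location, l.Date
    FROM Person p JOIN Location l ON l.Person = p.Id
    WHERE p.Name = 'Ram' AND l.Date = '15-08-2013'
""").fetchall())

# 'Where was Pic1 taken?' needs yet another hand-written join; there is no generic
# way for an application to discover the Photograph/Location connection by itself.
print(con.execute("""
    SELECT ph.Title, l.Location
    FROM Photograph ph
    JOIN Location l ON l.Person = ph.taken_by AND l.Date = ph.taken_at
""").fetchall())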
Moreover, links between pieces of data (by means of foreign keys) are achieved using Ids, in the above example 1 and 2, which are understood only by the application that created the data - thus restricting the data within the boundary walls of the application. Such links can't be used by other applications.
Linked Data promises to solve these problems by giving a common standard, known as RDF, for modeling data in the form of a graph. Each node and each link between two nodes in the graph is given a unique identifier (URI). Applications can now create links on the basis of the data and not just the schema, and information can be extracted without human intelligence by writing generic logic that discovers more links in the graph using existing links. Since the links and nodes are published on the web, any application running on the web can make use of them.
In a nutshell, Linked Data provides a common standard for storing application data and breaks down data barriers.
RDF
RDF - the Resource Description Framework - is a W3C specification for modeling resources on the web. Information is modeled in the form of subject-predicate-object expressions. In RDF terminology such an expression is called a statement, also known as a triple. We as humans use natural language to pass information among each other, and there the smallest unit of information is a statement. Similarly, an RDF statement is the smallest unit of information that can be understood by a machine. For example, the statement 'Pic1's photographer was Ram' is modeled in RDF as (Pic1 - the subject, photographer - the predicate, Ram - the object). This triple means Pic1 has an attribute named 'photographer' whose value is Ram. Each resource is identified by a unique URI. In the above triple Ram, photographer and Pic1 can be identified by 'www.sush.org/ram', 'www.sush.org/vocab/photograph#photographer' and 'www.sush.org/pictures/pic1' respectively. Any other attribute that is added to Pic1 by any application gets linked to the resource (www.sush.org/pictures/pic1), thereby enabling applications to discover more attributes of the resource.
In a nutshell, RDF provides to applications what grammar provides to natural language.
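For example, the triple above can be written in Turtle (one of RDF's standard serialization formats) and read back with rdflib; the URIs are the example identifiers used in this article:

from rdflib import Graph

turtle_doc = """
@prefix pics:  <http://www.sush.org/pictures/> .
@prefix photo: <http://www.sush.org/vocab/photograph#> .
@prefix ppl:   <http://www.sush.org/> .

pics:pic1  photo:photographer  ppl:ram .
"""

g = Graph()
g.parse(data=turtle_doc, format="turtle")
for subject, predicate, obj in g:
    print(subject, predicate, obj)   # exactly one statement: subject, predicate, object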
RDFS & OWL
Grammar provides guidelines to structure statements and hence makes it easy for humans to communicate using natural language. Just having the structure is not enough; one also has to have a rich vocabulary which can be used to denote a common understanding of a concept. For example, the word 'mammal' is used to denote all animals that give birth to live young.
Similarly, RDF Schema (RDFS) is the means to create vocabularies which can be used to create statements/triples. It is the means to structure the abstraction of a concept so that it can be inferred and shared by machines. In the aforementioned example 'www.sush.org/vocab/photograph' is a vocabulary denoting the abstraction of the concept - photograph - and the property 'photographer' is one among the many attributes of the concept.
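A rough sketch of how such a vocabulary could be published with rdflib, reusing the example photograph vocabulary URI from above:

from rdflib import Graph, Namespace, Literal, RDF, RDFS

PHOTO = Namespace("http://www.sush.org/vocab/photograph#")

g = Graph()
g.add((PHOTO.Photograph,   RDF.type,     RDFS.Class))        # the concept 'photograph'
g.add((PHOTO.photographer, RDF.type,     RDF.Property))      # one of its attributes
g.add((PHOTO.photographer, RDFS.domain,  PHOTO.Photograph))  # it describes photographs
g.add((PHOTO.photographer, RDFS.comment,
       Literal("The person who took the picture.")))

# Publishing this document on the web lets any application look up what
# 'photograph' and 'photographer' mean.
print(g.serialize(format="turtle"))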
Having vocabularies defined and published in a structured manner, applications will not only be able to infer the meaning of a given piece of data but also be able to reconcile disparately represented data that has a similar meaning. In the above example the statement 'Pic1's photographer is Ram' has a similar meaning to the statement 'Pic1's shutterbug is Ram'. These two statements imply that the words photographer and shutterbug are synonyms. With RDFS (and its extension OWL, introduced below) it is possible to model such concepts and publish them on the web such that applications can infer the similarity of meaning. Applications can increase their vocabulary by using other applications' vocabularies and build a common vocabulary so that they can interact with each other without the need for human intelligence.
OWL is an extension of RDFS and aims to overcome the limitations of RDFS; its details are out of scope for this document.
In a nutshell, RDFS & OWL facilitate a common vocabulary and thesaurus to be published and consumed on the web, thereby enabling applications to increase their knowledge base.
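As a small illustration of this idea (it uses owl:equivalentProperty, an OWL term, since plain RDFS only offers the weaker rdfs:subPropertyOf), the sketch below states that the invented 'photographer' and 'shutterbug' properties mean the same thing:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

PHOTO = Namespace("http://www.sush.org/vocab/photograph#")
OTHER = Namespace("http://www.other-app.org/vocab#")   # another application's made-up vocabulary

g = Graph()
g.add((PHOTO.photographer, OWL.equivalentProperty, OTHER.shutterbug))

# An application that understands 'photographer' can now treat any 'shutterbug'
# statement published by the other application as if it were its own.
print(g.serialize(format="turtle"))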
Applications
Form Fill & Validation
Almost all web applications take input data from the user using forms. The data contained in these forms depends on the domain of the application; for example, an online ticketing system will have a form like:
1. Passenger Name : _______________
2. Date of Birth  : _______________
3. Gender         : _______________
4. Email Id       : _______________
5. Mobile No.     : _______________
6. Date of Journey: _______________
7. From           : _______________
8. To             : _______________
In this form the fields numbered 2 through 5 depend on field #1; they are more or less constant w.r.t. the passenger name. Should it not suffice for the user to fill only field #1 and have the rest filled automatically by the application, given that all such data is already available somewhere on the web? Such a thing is not possible today because applications do not publish their form data, and even when they publish it, that is done in the form of web services which can't be understood by other applications. Enabling forms semantically will empower applications to auto-fill forms. When applications publish their data using a standard ontology/vocabulary, the data will always be sane and validation code would no longer be required.
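A sketch of the auto-fill idea, assuming Ram's details are already published somewhere on the web as RDF using the (real) FOAF vocabulary; the data itself and the ppl:ram URI are made up for this example:

from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")   # the real FOAF vocabulary
EX   = Namespace("http://www.sush.org/")

g = Graph()
# In practice this would be fetched from the web, e.g. g.parse("http://www.sush.org/ram")
g.parse(data="""
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ppl:  <http://www.sush.org/> .

ppl:ram a foaf:Person ;
    foaf:name     "Ram" ;
    foaf:birthday "30-04-1976" ;
    foaf:gender   "male" ;
    foaf:mbox     <mailto:ram@example.org> ;
    foaf:phone    <tel:+919999999999> .
""", format="turtle")

# The ticketing form looks up fields 2-5 instead of asking the user for them.
form = {
    "Date of Birth": g.value(EX.ram, FOAF.birthday),
    "Gender":        g.value(EX.ram, FOAF.gender),
    "Email Id":      g.value(EX.ram, FOAF.mbox),
    "Mobile No.":    g.value(EX.ram, FOAF.phone),
}
print(form)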
Auto Tagging
The concept of the hashtag is a very good means of grouping similar objects, but tagging has to be done by humans using their intelligence. Applications can't automatically tag similar things.
Consider the below example:
Ram goes on a holiday to Miami and takes a few cool pictures at Miami using his smartphone. His smartphone has location services turned on, which detect Ram's current location as Miami and publish it as RDF on the web in the form of the triples: (Ram, location, Ram-location), (Ram-location, time, 15-08-2013) & (Ram-location, place, Miami).
Ram then takes a pic using the smartphone, which records the new pic's data in the form of the triples: (BeachPic, taken-on, 15-08-2013), (BeachPic, taken-by, Ram). The smartphone then discovers the information (Ram, location, Ram-location), (Ram-location, time, 15-08-2013) & (Ram-location, place, Miami) about Ram's location on the web. Combining all this information, the application infers that the pic was taken at Miami and hence tags the pic with the hashtag #Miami.
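A sketch of that inference with rdflib and a generic SPARQL query; the predicates mirror the informal ones above (with hyphens written as underscores) and all URIs are invented:

from rdflib import Graph, Namespace, Literal

EX = Namespace("http://www.sush.org/")
V  = Namespace("http://www.sush.org/vocab/")

g = Graph()
# Triples published by the phone's location service
g.add((EX.ram, V.location, EX.ram_location))
g.add((EX.ram_location, V.time, Literal("15-08-2013")))
g.add((EX.ram_location, V.place, Literal("Miami")))
# Triples published by the camera app
g.add((EX.beach_pic, V.taken_on, Literal("15-08-2013")))
g.add((EX.beach_pic, V.taken_by, EX.ram))

# Generic rule: a pic taken by a person on the same date as one of the person's
# recorded locations gets tagged with that place.
results = g.query("""
    PREFIX v: <http://www.sush.org/vocab/>
    SELECT ?pic ?place WHERE {
        ?pic    v:taken_by ?person ;
                v:taken_on ?date .
        ?person v:location ?loc .
        ?loc    v:time  ?date ;
                v:place ?place .
    }
""")
for pic, place in results:
    print("tag", pic, "with #" + str(place))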
Recommendation
A recommendation is an action taken on the basis of knowledge, which is nothing but information extracted from events that have already happened in the past. E-commerce sites such as amazon.com show a list of products saying 'people who bought A also bought these' when one buys A on the e-commerce platform. Such a thing is possible based on the knowledge gained by extracting information from the buying data collected over time by the e-commerce application. Such knowledge extraction requires complex database queries or application logic, which requires human intelligence.
Knowledge of the form 'if A then B' need not always be restricted to the domain of the application, as in the above example of the buying behaviour of consumers. With Semantic Web technologies, not only can such knowledge be inferred by the applications themselves by following the nodes of a graph, but knowledge outside the boundaries of the domain and the application can also be found.
Consider the following RDF statements, generated in the order listed:
- (Ram, booked, Ticket1) (Ticket1, destination, Agra)
- (Ram, visited, Tajmahal)
- (Shyam, booked, Ticket2) (Ticket2, destination, Agra)
- (Shyam, visited, Tajmahal)
- (Ganga, booked, Ticket3) (Ticket3, destination, Agra)
- (Ganga, visited, Tajmahal) can then be automatically inferred by the application, which can recommend the Tajmahal as a good place to visit when a user books a ticket to Agra (a sketch follows the list).
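A sketch of that inference, again with rdflib and SPARQL over invented URIs and predicates:

from rdflib import Graph, Namespace, Literal

EX = Namespace("http://www.sush.org/")
V  = Namespace("http://www.sush.org/vocab/")

g = Graph()
for person, ticket in [("ram", "ticket1"), ("shyam", "ticket2")]:
    g.add((EX[person], V.booked, EX[ticket]))
    g.add((EX[ticket], V.destination, Literal("Agra")))
    g.add((EX[person], V.visited, EX.tajmahal))

# Generic rule: when someone books a ticket to Agra, recommend the places that
# earlier travellers to Agra went on to visit.  (In a real application the
# destination would arrive as a query parameter rather than a hard-coded literal.)
results = g.query("""
    PREFIX v: <http://www.sush.org/vocab/>
    SELECT DISTINCT ?sight WHERE {
        ?someone v:booked      ?ticket .
        ?ticket  v:destination "Agra" .
        ?someone v:visited     ?sight .
    }
""")
for row in results:
    print("Recommend:", row.sight)   # -> http://www.sush.org/tajmahal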
Interoperability/Integration
SOA is the widely accepted approach to building web applications, wherein applications expose web services to achieve integration and interoperability. Though such a method does serve the purpose, it demands a significant development cost. By publishing data using Semantic Web technologies, applications do not have to worry about publishing web services for interoperability or integration.
Consider an example:
Ram books a ticket from Pune to Mumbai on 15-08-2013 and a return ticket from Mumbai to Pune on 17-08-2013 on an online ticketing system (OTS). Ram's calendar app, which manages Ram's appointments automatically, discovers this information published by the OTS, learns that Ram is not present at Pune from 15-08-2013 through 17-08-2013, and closes all of Ram's appointment slots at Pune during that period. There is no need to write integration/stitching code consuming the web services of various OTSes to achieve this functionality. Only certain generic rules have to be written, which will provide the desired functionality irrespective of which OTS Ram uses to book his ticket.
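A sketch of the generic rule the calendar app could run over whatever ticket triples it discovers; the predicates and URIs are invented and do not correspond to any real OTS:

from rdflib import Graph, Namespace, Literal

EX = Namespace("http://www.sush.org/")
V  = Namespace("http://www.sush.org/vocab/")

g = Graph()   # in practice these triples would be discovered on the web, published by the OTS
g.add((EX.ram, V.booked, EX.ticket_out))
g.add((EX.ticket_out,  V.travel_from, Literal("Pune")))
g.add((EX.ticket_out,  V.travel_to,   Literal("Mumbai")))
g.add((EX.ticket_out,  V.travel_date, Literal("15-08-2013")))
g.add((EX.ram, V.booked, EX.ticket_back))
g.add((EX.ticket_back, V.travel_from, Literal("Mumbai")))
g.add((EX.ticket_back, V.travel_to,   Literal("Pune")))
g.add((EX.ticket_back, V.travel_date, Literal("17-08-2013")))

# Generic rule: collect the travel dates of all tickets Ram booked and block his
# Pune appointment slots between the earliest and the latest of them.
# (Dates are kept as plain strings here purely for brevity.)
dates = sorted(str(d) for ticket in g.objects(EX.ram, V.booked)
                      for d in g.objects(ticket, V.travel_date))
print("Close Ram's Pune slots from", dates[0], "to", dates[-1])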
Conclusion
In a nutshell, Semantic Web technologies fit well for applications that take data from the user directly by means of web forms, or indirectly by logging the user's activity, and give back information extracted from the collected data. Example domains are Content Management Systems, e-learning portals, e-commerce platforms, blogging platforms, social networking platforms, etc.
If this document and Google were Semantic Web enabled, Google could answer queries such as 'document written on semantic web by sushant'.