AsTMa, Asymptotic Topic Map Notation

v0.999, 2001-07-25
Draft

Robert (\rho) Barta
Bond University

0. Changes

From 0.99 to 0.999
Syntax of associations changed: association types are now also enclosed in parentheses.
From 0.9 to 0.99
.

1. Introduction

Since the stabilisation of XTM, an XML-based notation for Topic Maps, interest in authoring Topic Maps has increased.

While topic maps can easily be generated automatically in XTM, authoring them manually is tedious and error-prone, even with XML-aware development tools such as XML editors. This may change once Topic Map authoring tools become broadly available, but that has yet to happen. Server-side solutions are certainly more powerful, but at the time of writing they are too slow and inconvenient to use.

In the following we propose a textual notation, AsTMa, rich enough to prototype medium-sized topic maps. The notation is heavily influenced by LTM, Ontopia's Linear Topic Map Notation, which already acknowledged the need for a simplified notation. AsTMa has the following design objectives:

Minimum of effort:
A converter should be able to infer the author's intention from the context, reducing the verbosity of the language.
Minimum of special characters:
Avoiding delimiters such as [(&^%$}] should increase the usability of the language. It also reduces the need to escape these characters when they occur in the information itself.
Asymptotic with regard to XTM:
The language should have no built-in syntax barrier that makes it impossible to reach the expressiveness of XTM.
Keep things together:
The author should NOT be forced to split (topic) information into several fragments that a follow-up Topic Map processor then has to merge via TNC (the topic naming constraint).

At this stage AsTMa does not fulfill all of the above objectives.

This document has no formal status. It is a technical report of Bond University.

2. Tutorial

We assume that AsTMa text is either understood directly by Topic Map processing software or converted into an XTM stream by a specialized processor.

We first present the core concepts in a short tutorial and then turn to a semi-formal language specification. The running example used throughout the tutorial can be found at the Bond Topicmap Server. You may also want to try an online converter.

2.1 Basics

AsTMa is line-oriented: a piece of information ends at the end of its line. A single line containing
   filesystem (software)

already defines a topic (as explained below). Any further information about a topic (or an association) goes on the following lines:

   filesystem (software)
   bn: File System

An empty line therefore separates items such as topics and associations. Extra whitespace on a line, such as indentation, is silently ignored. Any line can also contain a comment, like

   filesystem (software) # more information will follow

Such comments are discarded by any processor and serve only for internal documentation. A comment that should appear in the processor output MUST begin at the start of the line:

AsTMa:
# I will survive and (hopefully)
#     the line structure will not
#        be broken

XTM:
<!--  I will survive and (hopefully)
          the line structure will not
             be broken -->

Comments on consecutive lines are treated as one comment, and any non-comment line ends such a group. Any occurrence of '-->' within a comment is converted into '- - >' so that it cannot prematurely close the XML comment.
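The following Python sketch illustrates the line-level rules described so far: empty lines separate items, comments starting in column 0 are grouped and passed on as XML comments (with '-->' defused), and all other comments are dropped. It is not a reference implementation. The draft does not say how a '#' inside a value, such as a URI fragment, is told apart from a comment, so the sketch assumes that an inline comment starts at a '#' preceded by whitespace; the names `preprocess` and `INLINE_COMMENT` are hypothetical.

```python
import re

# Assumption: an inline comment starts at a '#' preceded by whitespace,
# so URI fragments such as ...language.xtm#de are not cut off.
INLINE_COMMENT = re.compile(r"\s+#.*$")


def preprocess(text):
    """Split AsTMa text into ("comment", xml) and ("item", lines) entries."""
    result = []
    item = []      # non-empty lines of the current topic or association
    comment = []   # consecutive comment lines starting in column 0

    def flush_comment():
        if comment:
            # Consecutive comment lines become one XML comment; '-->' is defused.
            body = "\n".join(comment).replace("-->", "- - >")
            result.append(("comment", "<!-- " + body + " -->"))
            comment.clear()

    def flush_item():
        if item:
            result.append(("item", list(item)))
            item.clear()

    for raw in text.splitlines():
        if raw.startswith("#"):
            comment.append(raw[1:].rstrip())  # keep relative indentation
            continue
        flush_comment()          # any non-comment line ends a comment group
        if not raw.strip():
            flush_item()         # an empty line separates items
            continue
        line = INLINE_COMMENT.sub("", raw).strip()
        if line:                 # lines holding only a comment are dropped
            item.append(line)
    flush_comment()
    flush_item()
    return result
```

Keeping the text after '#' unstripped on the left preserves the relative indentation of multi-line comments, in the spirit of the example above.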

2.2 Topics

The line
   filesystem (software)

defines a topic with id filesystem, which is an instance of another topic, software:

AsTMa:
filesystem (software)

XTM:
<topic id="filesystem">
   <instanceOf>
     <topicRef xlink:href="#software"/>
   </instanceOf>
   <baseName>
     <baseNameString>filesystem</baseNameString>
   </baseName>
</topic>

As no base name is provided, the topic id 'filesystem' is also taken as the base name. This heuristic works well for some words but not for others, say,

   linux-distribution (software)

An AsTMa processor is free to apply other heuristics, such as:

AsTMa:
linux-distribution (software)

XTM:
<topic id="linux-distribution">
  <instanceOf>
    <topicRef xlink:href="#software"/>
  </instanceOf>
  <baseName>
     <baseNameString>linux distribution</baseNameString>
  </baseName>
</topic>

substituting dashes with blanks, looking up third-party databases, or leaving the id as it is. Of course, the author can enforce a particular base name:

AsTMa:
RedHat-Linux-sparc (linux-distribution-port)
bn: RedHat Linux for SPARC

XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>RedHat Linux for SPARC</baseNameString>
  </baseName>
</topic>
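A minimal Python sketch of how a processor might read a topic item: it splits the first line into id and type, collects plain `bn:` lines and, if there are none, derives a base name from the id with the dash-to-blank heuristic mentioned above. Only unscoped `bn:` lines are handled. The function names, and treating the type as optional, are assumptions of this sketch rather than part of AsTMa.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# First line of a topic item: "id (type)". Treating the type as optional
# is an assumption; the tutorial always shows one.
HEADER = re.compile(r"^(?P<id>[^\s(]+)\s*(?:\((?P<type>[^)]+)\))?$")


def default_base_name(topic_id):
    # One possible heuristic: substitute dashes with blanks.
    return topic_id.replace("-", " ")


def parse_topic(lines):
    """Return (id, type, base names) for a topic item."""
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("not a topic header: " + lines[0])
    # Only plain, unscoped "bn:" lines are considered here.
    names = [line.split(":", 1)[1].strip()
             for line in lines[1:] if line.startswith("bn:")]
    if not names:
        names = [default_base_name(match["id"])]
    return match["id"], match["type"], names


def topic_to_xtm(lines):
    topic_id, topic_type, names = parse_topic(lines)
    out = ["<topic id=%s>" % quoteattr(topic_id)]
    if topic_type:
        out.append("  <instanceOf><topicRef xlink:href=%s/></instanceOf>"
                   % quoteattr("#" + topic_type))
    for name in names:
        out.append("  <baseName><baseNameString>%s</baseNameString></baseName>"
                   % escape(name))
    out.append("</topic>")
    return "\n".join(out)
```

An explicit `bn:` line always wins over the heuristic, so an author can override whatever default a particular processor chooses.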

Similarly, you can specify occurrences for topics:

AsTMa:
linux (os)
bn: Linux kernel
oc: http://www.kernel.org/

XTM:
<topic id="linux">
  <instanceOf>
    <topicRef xlink:href="#os"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.kernel.org/"/>
  </occurrence>
</topic>

in the case of resource references, or for inline data (aka resourceData), where a trailing backslash continues the text on the next line:

AsTMa:
linux-port-on-sparc (linux-port)
bn: SPARC Linux port
oc: http://www.sparc.org/linux.shtml
in: The kernel and kernel modules \
    are 64-bit on sparc64, \
    userland is still 32-bit, \
    and in fact the same as on sparc32.

XTM:
<topic id="linux-port-on-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>SPARC Linux port</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.sparc.org/linux.shtml"/>
  </occurrence>
  <occurrence>
    <resourceData>The kernel and kernel mod....</resourceData>
  </occurrence>
</topic>
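The continuation lines in the `in:` example can be resolved before any other parsing. This Python sketch, assuming that a trailing backslash joins a line with the next one using a single blank, merges such lines and then maps plain (untyped, unscoped) `oc:` lines to `resourceRef` and `in:` lines to `resourceData`. The function names are hypothetical, and each element is emitted on a single line rather than with the indentation shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def join_continuations(lines):
    """Join each line ending in a backslash with the line that follows it."""
    merged, pending = [], ""
    for line in lines:
        line = line.strip()
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "  # continue on the next line
        else:
            merged.append(pending + line)
            pending = ""
    if pending:  # a dangling backslash on the last line
        merged.append(pending.rstrip())
    return merged


def occurrences_to_xtm(lines):
    """Map "oc:" to resourceRef and "in:" to resourceData occurrences."""
    out = []
    for line in join_continuations(lines):
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "oc":
            out.append("<occurrence><resourceRef xlink:href=%s/></occurrence>"
                       % quoteattr(value))
        elif key == "in":
            out.append("<occurrence><resourceData>%s</resourceData></occurrence>"
                       % escape(value))
    return out
```

Splitting on the first colon only keeps URLs such as `http://...` intact in the value.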

If appropriate, you can also type topic characteristics:

AsTMa:
reiserfs (filesystem)
bn: Reiser File System, ReiserFS
oc (download): http://www.namesys.com/download.html

XTM:
<topic id="reiserfs">
  <instanceOf>
    <topicRef xlink:href="#filesystem"/>
  </instanceOf>
  <baseName>
     <baseNameString>Reiser File System, ReiserFS</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
       <topicRef xlink:href="#download"/>
    </instanceOf>
    <resourceRef xlink:href="http://www.namesys.com/download.html"/>
  </occurrence>
</topic>

To scope a characteristic, use '@' to introduce a particular context:

AsTMa:
RedHat-Linux-sparc (linux-distribution-port)
bn: RedHat Linux for SPARC
bn @ deutsch : RedHat Linux für SPARC

XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>RedHat Linux for SPARC</baseNameString>
  </baseName>
  <baseName><scope><topicRef xlink:href="#deutsch"/></scope>
     <baseNameString>RedHat Linux für SPARC</baseNameString>
  </baseName>
</topic>
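Typed and scoped characteristics share one line shape: the kind, an optional '@ scope', an optional '(type)', and the value after the colon. The Python sketch below parses that shape with a regular expression and emits the XTM element, placing instanceOf before scope, which is the element order the XTM 1.0 DTD prescribes for occurrences. It assumes a single scoping topic written before the type, that any macros have already been expanded, and (a hypothetical rule) that a reference containing ':' is a URI while anything else is a local topic id. Typed base names are rejected because XTM 1.0 baseName elements have no instanceOf.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# kind, optional "@ scope", optional "(type)", then ":" and the value.
# Assumptions: one scoping topic, written before the type (as in
# "oc @ &de; (press-release) : ..."), and macros already expanded.
CHARACTERISTIC = re.compile(
    r"^(?P<kind>bn|oc|in)\s*"
    r"(?:@\s*(?P<scope>[^\s(]+)\s*)?"
    r"(?:\((?P<type>[^)]+)\)\s*)?"
    r":\s*(?P<value>.*)$"
)


def ref(target):
    # Hypothetical rule: anything containing ':' is a URI, else a local topic id.
    href = target if ":" in target else "#" + target
    return "<topicRef xlink:href=%s/>" % quoteattr(href)


def characteristic_to_xtm(line):
    m = CHARACTERISTIC.match(line.strip())
    if m is None:
        raise ValueError("not a characteristic: " + line)
    kind, value = m["kind"], m["value"].strip()
    parts = []
    if m["type"]:
        if kind == "bn":
            raise ValueError("XTM 1.0 base names cannot be typed")
        parts.append("<instanceOf>%s</instanceOf>" % ref(m["type"].strip()))
    if m["scope"]:
        parts.append("<scope>%s</scope>" % ref(m["scope"]))
    if kind == "bn":
        parts.append("<baseNameString>%s</baseNameString>" % escape(value))
        return "<baseName>%s</baseName>" % "".join(parts)
    if kind == "oc":
        parts.append("<resourceRef xlink:href=%s/>" % quoteattr(value))
    else:  # "in": inline resource data
        parts.append("<resourceData>%s</resourceData>" % escape(value))
    return "<occurrence>%s</occurrence>" % "".join(parts)
```

Because the value is everything after the first colon that follows the optional parts, colons inside URLs or names do not need escaping.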

2.3 Associations

Associations may or may not have a particular type. In any case, they have a number of members, each playing a role:

AsTMa:
(kernel-patch-provides-feature)
feature: reiserfs
platform: i386
patch:   generic-reiserfs-patch-2.4.x

XTM:
<association>
  <instanceOf>
    <topicRef xlink:href="#kernel-patch-provides-feature"/>
  </instanceOf>
  <member>
     <roleSpec>
       <topicRef xlink:href="#feature"/>
     </roleSpec>
     <topicRef xlink:href="#reiserfs"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#platform"/>
     </roleSpec>
     <topicRef xlink:href="#i386"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#patch"/>
     </roleSpec>
     <topicRef xlink:href="#generic-reiserfs-patch-2.4.x"/>
  </member>
</association>

For better readability, you may want to indent the roles:

(kernel-patch-provides-feature)
      feature: reiserfs
      platform: i386
      patch:   generic-reiserfs-patch-2.4.x
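A short Python sketch of the association mapping: the '(type)' line becomes instanceOf, and each 'role: player' line becomes a member with a roleSpec. Indentation does not matter because only the text around the first colon is used, after stripping. The draft does not show how an untyped association is written; the sketch assumes, hypothetically, that the '(type)' line is simply left out. The function names are illustrative only.

```python
from xml.sax.saxutils import quoteattr


def topic_ref(topic_id):
    return "<topicRef xlink:href=%s/>" % quoteattr("#" + topic_id)


def association_to_xtm(lines):
    """Map a "(type)" line plus "role: player" lines to an XTM association."""
    out = ["<association>"]
    first = lines[0].strip()
    if first.startswith("(") and first.endswith(")"):
        out.append("  <instanceOf>%s</instanceOf>" % topic_ref(first[1:-1].strip()))
        member_lines = lines[1:]
    else:
        # Hypothetical: without a "(type)" line the association is untyped.
        member_lines = lines
    for line in member_lines:
        role, sep, player = line.partition(":")
        if not sep:
            raise ValueError("expected 'role: player', got: " + line)
        out.append("  <member>")
        out.append("    <roleSpec>%s</roleSpec>" % topic_ref(role.strip()))
        out.append("    %s" % topic_ref(player.strip()))
        out.append("  </member>")
    out.append("</association>")
    return "\n".join(out)
```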

2.4 Topic Maps

The very first non-empty line of the document MUST provide the name (id) of the topic map itself:

AsTMa:
sparclinux : iso-8859-1

XTM:
<?xml version="1.0" encoding="iso-8859-1"?>
<topicMap id="sparclinux"
          xmlns       = 'http://www.topicmaps.org/xtm/1.0/'
          xmlns:xlink = 'http://www.w3.org/1999/xlink'>

Optionally, you can specify a particular encoding, as in the example above; otherwise the encoding defaults to iso-8859-1.
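The following Python sketch reads the topic map header: it takes the first non-empty line, splits off an optional ': encoding' part, falls back to iso-8859-1, and produces the XTM prolog shown above. Checking the encoding name with the standard `codecs.lookup` is an addition of this sketch, as is the function name; note that Python's codec registry only approximates the IANA list of character set names.

```python
import codecs
from xml.sax.saxutils import quoteattr

DEFAULT_ENCODING = "iso-8859-1"
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def map_header_to_xtm(text):
    """Turn the first non-empty line ("id" or "id : encoding") into the XTM prolog."""
    for line in text.splitlines():
        if not line.strip():
            continue
        map_id, _, encoding = line.partition(":")
        encoding = encoding.strip() or DEFAULT_ENCODING
        codecs.lookup(encoding)  # raises LookupError for names Python does not know
        return ('<?xml version="1.0" encoding=%s?>\n'
                "<topicMap id=%s\n"
                "          xmlns=%s\n"
                "          xmlns:xlink=%s>") % (
            quoteattr(encoding), quoteattr(map_id.strip()),
            quoteattr(XTM_NS), quoteattr(XLINK_NS))
    raise ValueError("no topic map id: document is empty")
```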

2.5 Macros

For authors without access to a general macro expansion environment, the language supports a rudimentary macro facility, which comes in handy for abbreviating long strings. The idea is to first declare a macro via

  de=http://www.topicmaps.org/xtm/1.0/language.xtm#de

somewhere towards the beginning of the document and then use this definition for, say, scoping:

AsTMa:
oc @ &de; (press-release) : http://www....

XTM:
<occurrence>
  <instanceOf>
     <topicRef xlink:href="#press-release"/>
  </instanceOf>
  <scope><topicRef xlink:href="http://www.topicmaps.org/xtm/1.0/language.xtm#de"/></scope>
  <resourceRef xlink:href="http://www.s...."/>
</occurrence>

Every instance of &de; throughout the document will be expanded.

These macros cannot take parameters, but they are expanded recursively, i.e., macros can contain other macros. A processor MAY detect circular definitions.

Note that the &...; notation may collide with XML entities; it is the author's responsibility to avoid such collisions.
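Macro handling can be sketched in Python as two steps: collect the `name=value` definitions, then replace every `&name;` recursively, tracking which macros are currently being expanded so that circular definitions are reported instead of looping forever. The lexical rules for macro names are not fixed by this draft and are assumptions here, as are the function names. Unknown entities such as `&amp;` are left untouched, which sidesteps, but does not solve, the collision with XML entities.

```python
import re

# Hypothetical lexical rules (not fixed by the draft): a definition is a whole
# line "name=value", a use is "&name;", names start with a letter.
DEFINITION = re.compile(r"^([A-Za-z][\w-]*)=(.*)$")
USE = re.compile(r"&([A-Za-z][\w-]*);")


def collect_macros(lines):
    """Separate macro definitions from the remaining document lines."""
    macros, rest = {}, []
    for line in lines:
        m = DEFINITION.match(line.strip())
        if m:
            macros[m.group(1)] = m.group(2).strip()
        else:
            rest.append(line)
    return macros, rest


def expand(text, macros, _active=()):
    """Expand &name; recursively, reporting circular definitions."""
    def replace(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # unknown, e.g. an XML entity: leave as is
        if name in _active:
            chain = " -> ".join(_active + (name,))
            raise ValueError("circular macro definition: " + chain)
        return expand(macros[name], macros, _active + (name,))
    return USE.sub(replace, text)
```

Expanding lazily at use time means a macro may refer to another macro defined later in the document, as long as all definitions have been collected first.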

3. Language Definition

tbd, just thoughts at this stage.
special character in id, encoding according to www.isi.edu/in-notes/iana/assignments/character-sets, causal stream, no implicit merging (not even with ids)
conformant converter is free to produce any additional topic/assocs appropriate??
in terms of an abstract pattern processor?
causal streams (really?) , performance at bigger maps
specifying additional behavior? auto complete, defaults for assocs (types?)

4. Conclusions

First experiments with the language have shown that most situations can be disambiguated by a proper AsTMa processor. Still, a couple of issues remain:

Completeness
The language is far from complete relative to XTM. On second thought, though, matching the expressiveness of XTM should not be the goal. Since XTM lacks all kinds of application-specific constraints, AsTMa should rather develop into a language for defining appropriate ontological models.
Border cases
Because the language is declarative, there are some cases in which converters produce non-intuitive results, depending on the implementation approach taken. This has yet to be resolved.
Relationship to LTM
The existence of LTM itself shows that there is a real need for a textual notation for authors who are used to the expressiveness of textual user interfaces. Although LTM may have some shortcomings relative to AsTMa, it may prove valuable in other contexts. For instance, it is not possible in AsTMa to define topics directly within an association; on the other hand, AsTMa's line orientation is more convenient in text editors or when battling the sed-awk-perl-grep monsters.

To Be Done


© 2001 Robert Barta, Bond University