This was originally posted in the July First Thursday Topic, but really belongs here, I think.
This follows on from the 'off-topic' conversation on Friday (Oct 30) about appropedia / semantic web, and is also related to the recurring idea that Pamela and I keep touching on about 'pattern language' type approaches. I am trying to set out some basic thoughts I have about this sort of thing in a clear way, and to post some links to what I consider to be seminal material, hopefully as the beginning of a shared body of knowledge and information on which to base these conversations. All opinions and assumptions are subject to discussion and revision!
This is Pamela's topic, of course - so this is bound to be wrong / incomplete - re-statement / enlargement / clarification please!
The Ideal
Someone involved in a development project has a question. They don't have time to conduct in-depth research, but they have a clear idea of their situation / need. The system would allow them to communicate that situation, whereupon it would direct them to any relevant entries.
Implications of the Ideal
- That the knowledge base has enough content to be useful often enough (at least 70% of the time would be a starting guess).
- That information is structured in such a way as to make connections and relevance machine-discoverable from a user search.
- That machine-discovered information can be presented in a useful, human-readable format.
Unpacking the implications
1. Enough Content
This issue can essentially only be cracked in two ways: either a major investment of time and money to commission experts to build the minimal viable content [Encyclopedia Britannica], or a mechanism whereby the user base spontaneously self-compiles the information [Wikipedia].
It seems axiomatic that DadaMac should pursue the second approach.
2. Machine Discoverable
Various approaches have been proposed to allow free-form, human-readable information to be made machine-navigable. Of these, the various 'semantic' approaches would seem to be the best known. These rely on the text containing data being marked up with what are called 'triples', of the form Subject :: Predicate :: Object, such as: Germany :: has Capital :: Berlin. These triples then connect objects in chains, so that Germany would be identified as a country, countries would be identified as having capitals, capitals identified as being cities, cities as having populations, and so on. For more information, you could look here and here.
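Just to make the chaining idea concrete, here is a rough sketch in Python - purely illustrative, with names and data I have made up; real Semantic Web tooling uses formats like RDF and query languages like SPARQL rather than anything this simple:

```python
# Illustrative only: triples stored as plain tuples, and a tiny query
# function that lets us follow a chain from one object to the next.
triples = [
    ("Germany", "is a", "Country"),
    ("Germany", "has capital", "Berlin"),
    ("Berlin", "is a", "City"),
    ("Berlin", "has population", 3_700_000),
]

def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow a chain: country -> capital -> population
capital = objects_of("Germany", "has capital")[0]      # 'Berlin'
population = objects_of(capital, "has population")[0]  # 3700000
print(f"The capital of Germany is {capital}, population {population}.")
```

The point is simply that once statements are in this form, a machine can follow the links without understanding any of the surrounding prose.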
There are various problems with this, the most immediate being that the process of marking up text is clunky, onerous and fraught with problems. I will describe two serious ones:
Ambiguity / indistinct terminology - it is fairly easy to see that Berlin should be described as the capital of Germany, but there are many situations where ambiguity will be present - particularly when we are describing new modes of practice across cultural boundaries, where people may not be using their first language. Take a solar panel - is it a 'solar thermal panel', 'solar PV', 'solar photovoltaic panel', 'photovoltaic panel', 'solar cell'? Should all these be equated, or differentiated? Who decides? When? Once a decision is made, whose responsibility is it to trawl all the previously entered data to ensure compatibility? This is difficult enough for Wikipedia, one of the most valued and visited sites on the internet, with a well-funded foundation behind it, which aspires to nothing as ambitious as a discovery agent (an attempt to use the content of Wikipedia as the raw material for a semantically queryable encyclopedia was made, but seems to have been abandoned in 2012). Another example (unintentionally hilarious) is that, in different documents about Semantic Web practice, the elements of the 'triple' are also named Individual :: Property :: Individual (it's actually much worse than that).
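To show the kind of housekeeping this implies, here is a hypothetical sketch of an alias table for the solar-panel terms. Which terms are equated and which are kept distinct are entirely my own arbitrary choices - which is exactly the problem: someone has to make and maintain these decisions.

```python
# Hypothetical alias table: someone has to decide these equivalences,
# and then re-check every previously entered item against them.
CANONICAL_TERMS = {
    "solar pv": "solar photovoltaic panel",
    "photovoltaic panel": "solar photovoltaic panel",
    "solar cell": "solar photovoltaic panel",      # or is a cell only part of a panel?
    "solar thermal panel": "solar thermal panel",  # deliberately kept distinct
}

def canonicalise(term):
    """Map a free-text term to its agreed canonical form, if one exists."""
    return CANONICAL_TERMS.get(term.lower().strip(), term)

print(canonicalise("Solar PV"))      # -> 'solar photovoltaic panel'
print(canonicalise("solar panel"))   # -> 'solar panel' (unresolved: still ambiguous)
```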
Inability to 'map' real-world structures - this is more fundamental. Semantic Web techniques are founded on hierarchical, tree-like structures, where every concept is in a singular relationship to some larger concept (within a particular context, at least). Thus, within the context of 'The Beatles', 'John Lennon' can only be identified as having one role at a time, so that if we identify him as a 'Member', he cannot at the same time be a 'Guitarist', 'Singer' or 'Songwriter', let alone 'the Political One', or 'the Angry One', or ...
This is a problem that crops up time and again in attempts to formalise information structures. The tree appears to be the 'logical' approach; however, as I hope the Beatles example makes clear, the real world is complex and messy, and is impossible to map faithfully by means of such rigid, 'tree-like' structures, where each twig joins only one larger branch, and so on until you reach the trunk.
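Here is the Beatles example as a hypothetical sketch of a strict tree, where every element hangs off exactly one parent under exactly one role - recording a second role means overwriting the first:

```python
# Hypothetical sketch of a strict tree: one parent, one role per node.
class TreeNode:
    def __init__(self, name, parent=None, role=None):
        self.name = name
        self.parent = parent    # exactly one parent allowed
        self.role = role        # exactly one role allowed
        self.children = []
        if parent:
            parent.children.append(self)

beatles = TreeNode("The Beatles")
lennon = TreeNode("John Lennon", parent=beatles, role="Member")

# To record that he is also a Guitarist, Singer and Songwriter,
# we can only overwrite the single slot we have:
lennon.role = "Guitarist"
print(lennon.role)  # 'Guitarist' - the 'Member' relationship is now lost
```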
Christopher Alexander, creator of Pattern Languages, identified this issue through deep personal experience during his work on San Francisco's metro network in the mid-1960s, in a seminal article - 'A City is Not a Tree' - which I consider to be a foundational text in this area. His basic point is that the multiple, simultaneous linkages between the elements of any complex network must all be acknowledged in any model that is to be more than trivially useful. He proposes, instead, a model which is still hierarchical (a City has Districts, Districts have Areas and Roads, Areas have Buildings and Open Spaces...), but which allows for linkages across hierarchy levels (City has Buildings, Areas have Roads, Buildings have Open Spaces...).
The resulting structure is more complex and harder to navigate in an abstract way, but, crucially, it still allows us to focus on elements (each is a Pattern) one at a time when we need to, while never letting us forget the web of relationships in which each is implicated.
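The same Beatles example in a structure that permits the cross-links Alexander describes might look like the sketch below - again purely hypothetical, not a prescription. Relationships become a list of labelled links, so an element can participate in as many of them as we like, and focusing on one element still shows its whole web of relationships:

```python
from collections import defaultdict

# Hypothetical sketch of a cross-linked structure: any element can be linked
# to any other, under any number of labelled relationships at once.
links = defaultdict(list)

def link(subject, relation, obj):
    links[subject].append((relation, obj))

link("The Beatles", "has member", "John Lennon")
link("John Lennon", "is a member of", "The Beatles")
link("John Lennon", "plays role", "Guitarist")
link("John Lennon", "plays role", "Singer")
link("John Lennon", "plays role", "Songwriter")
link("John Lennon", "is remembered as", "the Political One")

# Focusing on one element shows every relationship recorded for it:
for relation, obj in links["John Lennon"]:
    print(f"John Lennon -- {relation} --> {obj}")
```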
The people who develop Semantic Web tools and approaches are not, of course, stupid, and they recognise the problems discussed above. Unfortunately, their approach seems to be to bolt additional complexity onto the fundamentally reductive framework they started with. As far as I can see, this just makes a nightmare of representing perfectly (humanly) graspable sets of relationships, such as John Lennon's to the Beatles, because the producer of the original material is required to enter more and more metadata using unnatural syntax and symbols (according to w3.org, the body responsible for web standards, including those for the Semantic Web, this abstruse 2006 document is the latest word on addressing issues like these).
3. Human Readable
This is probably the easiest requirement to manage, assuming that the content is human-produced in the first place (as opposed to being automatically discovered and produced by machine means). Nevertheless, all the usual issues of human communication are still present. Did the writer make reasonable assumptions about the existing knowledge of potential readers (too much can be as bad as too little)? Was the writer skilled at presenting argument, description, prescription and analysis in a coherent fashion? What if the originator of the material wrote in a different language from the one used as standard in the knowledge base?
On the basis of the above, I would suggest two things:
- that it is worth spending time in careful thought about the underlying structure of such a knowledge repository before launching any particular platform.
- that human editors / collators / curators will be needed for the foreseeable future.
Start where you are, and move forward
In the spirit of agile development, and knowing the value of early engagement and feedback, it is also worth starting to collect and disseminate data in a simple way as soon as possible.
If I had to map out an approach today, I would suggest that DadaMac should begin to select some candidates for Information Agent work as knowledge specialists, rather than communicator specialists. They should solicit knowledge-type information from changemakers, and mine the DadaMac archive for raw material.
Working to edit this material, and to develop a set of tags in a text-based format, will be the foundation for an increasing understanding of the breadth and nature of the material. A rough sketch of what a tagged entry might look like follows below.
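As a starting point, the tagged material need be nothing more elaborate than simple, searchable records. Here is a hypothetical sketch - every field name, title and tag is invented purely for illustration:

```python
# Hypothetical sketch of a tagged archive entry; field names and tags are invented.
entry = {
    "title": "Borehole maintenance notes",
    "source": "DadaMac archive, changemaker correspondence",
    "summary": "Practical notes on keeping a hand-pump borehole in service.",
    "tags": ["water", "borehole", "maintenance", "hand-pump", "rural"],
}

# Even a flat list of such entries can be searched by tag:
def entries_with_tag(entries, tag):
    return [e for e in entries if tag in e["tags"]]

print([e["title"] for e in entries_with_tag([entry], "water")])
```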
On the basis of this knowledge, group workshops on identifiable patterns and on pattern language structures would begin the work of developing appropriate overarching structures.
Alongside this, some software and data-structure people would be evaluating existing IT tools and deciding whether they can be adopted, adapted or used as the basis for some bespoke system. My feeling is that something based on 'Wikitect', a 'proof-of-concept' structured wiki tool, would be worth investigating.