A 21st century essay towards a real character language

Abstract

How do we design classification systems that embrace ambiguity and are highly multilingual? This dissertation presents two speculative knowledge organization systems called ‘Horapollo Ontology’ and ‘Real Character Language Ontology’. These systems use icons to represent cross-linguistic semantic primitives. The project asks two research questions. The first is: how do we determine which categories are best attested cross-linguistically? The second is: how do we represent categories with easily interpretable images? Taking a documentalist approach to the role of knowledge organization, the dissertation situates these questions within the context of Francis Bacon’s definition of “Real Character” from the history of knowledge organization, and presents a problematic that challenges the unspoken assumptions of universality in the field of library and information science, particularly in the use of authority control to manage ambiguity. Bacon’s ideas about universal language are further situated in the context of the Renaissance hieroglyphic tradition, a tradition deriving from the encyclopedia of Horapollo, from which the Horapollo Ontology takes its name. Combining research creation, a mixed methods quantitative content analysis, information visualization, and the construction of an information retrieval system, this project is grounded in contemporary linguistic linked open data (LLOD) standards. The Cross-Linguistic Linked Data (CLLD) Concepticon dataset is used as a source of cross-linguistic concepts derived from a systematic review of linguistic literature, while the Noun Project API is used to collect crowd-curated images associated with these concepts. The concept sets are coded to determine which concepts are best represented as cross-lingual images. Following this, research creation is used to visualize and dramatize the results of the research. The 26 best-performing concepts are visualized as a font, Horapollo 1.3.
The same images from the font are defined at unique URIs in the Horapollo Ontology. Fourteen of these concepts are reused to form the Real Character Language Ontology, an experimental ontology that uses fuzzy logic to operate over a multilingual vector database for an information retrieval system using Wikipedia articles. The result is a knowledge organization system that is also a work of art, intended to provoke questions about our assumptions concerning the cross-linguistic universality of knowledge and to demonstrate expanded potential for information retrieval in the era of machine learning.

Summary for Lay Audience

Can there ever be a universal language that everyone can understand, no matter what language they speak? Library and information science has historically sought to create universality through classification systems. Everyone who uses the Dewey Decimal or Library of Congress Classification systems expects to find similar books shelved under the same classes. This dissertation takes a more creative approach to how we organize our information. Drawing on the history of universal languages, specifically an often-forgotten period from the dawn of the early modern era, it first investigates the historical roots of our ideas about scientific ambiguity. Why is ambiguity so frowned upon, and what is the theoretical origin of universality? To ask these questions, it looks at the history of the Renaissance hieroglyphic tradition, whose ideas about hieroglyphs go back to ancient Egypt and a priest named Horapollo, and which influenced our ideas about universal languages. These ideas inspired the definition of “Real Characters” by Francis Bacon, which in turn influenced the search for the perfect language. The dissertation then subverts this history by creating a new, multilingual hieroglyphic font based on contemporary scientific data (the Cross-Linguistic Linked Data, or CLLD, dataset). It implements these glyphs in a special font and word list called “Horapollo”, which presents 26 highly translatable concepts as cross-linguistic images. Following this, a Real Character Language is constructed, which uses methods from artificial intelligence to link documents cross-linguistically using these categories. What makes the new classification system produced here unique is that it can do things previous library classification systems could not. For instance, every document is a member of every class to some degree, creating a non-binary system of classification. Also, the method for defining new classes across languages encourages alternative interpretations and ambiguity.
Paradoxically, a multilingual system is created by getting rid of the idea that a library classification system can ever be truly universal. Instead, a multi-perspectival approach is adopted, where the same images can be redefined in different contexts, opening up endless potential for “remixing”. All of this serves as a creative intervention that provokes us to rethink the role library classification systems can have in the 21st century.

Description

The title of this thesis is based on John Wilkins' famous 1668 work "An Essay Towards a Real Character, and a Philosophical Language." It supports a work of web art located at https://realcharacterlanguage.world

Keywords

Knowledge organization, Ontology, Universal language, Real character, Linked open data, Information retrieval

Collections