Artificial Intelligence — Reappropriation, Recombination and Evolution
Can AI evolve optimally if information is not freely accessible?
Over the past few years, I’ve continuously pondered the value of competitive moats in technology and what could be seen to constitute such an advantage. Several times I’ve written about ‘data-mart’ vendors which, in my view, by monetising the observation of information through restrictive commercial models, have significantly hampered innovation and quality in the financial markets for years.
With the heightened level of interest in artificial intelligence, and the likelihood of years of lawsuits attempting to define informational interests and rights over model design, training data and acquisition sources, perhaps it is time to get ahead of the conversation around what constitutes a competitive moat and what should form part of the canon of societal property.
It should be obvious to anyone in modern society that it is in our best interests for artificial intelligence to evolve optimally. It is now part of our everyday lives and growing exponentially in capability. In the very near future, it will be an inescapable part of the fabric of modern society. Ultimately it will become a force that both assists humanity and shapes it.
This is going to be a defining moment in the evolution of human history. How the concept of information ownership plays out, as indicated by high-profile announcements such as Elon Musk threatening to sue Microsoft [1], will impact not just the producers and consumers of information and models, but also the quality of the outputs provided to large swathes of a diverse society. This is not a niche activity; it is playing out in the public domain, the world’s biggest societal experiment.
We are already seeing examples where those with the means to do so [2] declare data sovereignty over user-contributed data on a platform, or over observations arising from use of the platform itself.
Enterprises have gone to great lengths to create and protect competitive moats, in the belief that they are key differentiators. However, isolating some of the most important elements of artificial intelligence design in this way runs contrary to the desirable, optimal evolution of the technology itself.
The competitive moat is the antithesis of the behaviour needed to rapidly raise the quality of an industry. Throughout history, evolutionary outcomes have been optimised when sharing is built into the foundations. Whether through shared bodies or shared resources that let us track evolution and exchange components, there should be a means by which ontologies, model weights and designs can evolve openly to serve the best interests of society.
Art: The rise of remix culture and digital platforms
Remix culture is the re-appropriation, remixing and free sharing of pre-existing cultural works to create new art forms. This free sharing of information in the artistic community has played a significant role in the development of culture and has led to the emergence of platforms where artists reimagine and reinterpret one another’s digital works. These platforms provide an environment where ideas and techniques can be exchanged freely to inspire new forms of artistic expression.
In street art, pioneers such as Banksy and Shepard Fairey re-appropriated existing images, symbols and texts, merging them into original creations that convey new meanings, messages and a new visual language [3]. This free-form mixing created a more diverse and vibrant art scene, as artists from various backgrounds and with different perspectives contributed to the evolution of the art form.
Similarly, a whole generation of artists has emerged that thrives in the digital world, connecting and collaborating regardless of physical location. The development of digital art has been accelerated by the emergence of online platforms such as DeviantArt, Behance and Dribbble, which provide artists with spaces to showcase their work, collaborate on projects and receive feedback from their peers.
The emergence of digital art and remix culture has pushed the boundaries of what is possible in both physical and digital art and has undoubtedly been positive for the evolution of the contemporary art scene.
Floating Point Operations: The rise of artificial intelligence
Artificial intelligence could now be considered commonplace. Its elements already appear throughout everyday life: making an airline booking, deciding what you should pay, whether you should get a loan, or even whether you should be released from jail [4].
Artificial intelligence is running autonomous weapons systems, translating books and deciding what you should listen to or read next. Large recommender systems run in everyday life across platforms such as Spotify, Amazon, YouTube and Facebook.
Steady advances in artificial intelligence, together with recent progress in machine learning and trained models, have led us to this point. These trained models, the predecessors of recent developments such as ChatGPT, have their origins in research dating back to the 1940s. That research, and the subsequent evolution of artificial intelligence, has largely occurred in silos around specific capabilities, whether images, voice or text prediction, capabilities which until recently remained relatively isolated from one another [5].
A recent development is the emergence of language-oriented models such as GPT-3, GPT-4 and ChatGPT. A common characteristic, however, is that the depth and sophistication of these models has grown in step with the increasing availability of compute power, measured in floating-point operations, tracking the trajectory predicted by observations such as Moore’s Law. [6]
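To give a feel for the scale of that trajectory, here is a small, purely illustrative Python sketch of Moore’s-Law-style growth; the starting compute budget and the two-year doubling period are assumptions chosen for illustration, not measured figures.

```python
# Purely illustrative: compound growth in available compute, assuming a
# Moore's-Law-style doubling roughly every two years.
base_flops = 1e12            # assumed starting budget (1 teraFLOP), illustrative only
doubling_period_years = 2    # assumed doubling period

for years in (0, 10, 20, 30):
    doublings = years / doubling_period_years
    print(f"after {years:2d} years: {base_flops * 2 ** doublings:.1e} FLOPs")
```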
We are now at the point where compute and storage are widely available and relatively cheap, and that has allowed these models to flourish. That power, along with the emergence of language-oriented models, has led to a convergence in capabilities. This seemingly exponential increase in horizontal capability has taken the public largely by surprise and has brought the contention around information ownership and model design into the public discourse. [7]
In addition, for at least the last 20 years or so, most of the powerful models have been developed by private enterprises such as Facebook, Google and Amazon, with next to no regulatory oversight, guidance or transparency [6].
Culture: The birth of the World Wide Web
The creation of the World Wide Web transformed the way people access and share information and has led to significant cultural changes that can be directly attributed to its creation. The explosion of creativity in its early days can be attributed to the underlying technologies of the web being publicly available [8]. The open sharing of foundations such as TCP/IP, HTTP and HTML contributed to rapid growth and innovation around the World Wide Web, from its very beginning, commonly referred to as Web 1.0, through to the now emerging semantic web, Web 3.0. The very essence of this revolution in connectivity, information sharing, ecommerce and collaboration enabled the emergence of internet titans such as Facebook, Google and Amazon. [9][10]
From what was largely a static medium for sharing basic information with users, through to the dynamic and personalised experience we have today, the foundation remains the free sharing of the communication and design protocols which underpin its existence.
Now, as private enterprises and researchers work on the next generation of the web, the Semantic Web, they face similar challenges. How can we achieve scale and widespread adoption of contextual information and data without agreement on the elements of data sharing, such as common ontologies, classifications and protocols? And given the fractal nature of the web, even if we did agree, who would assume the cost of building and maintaining them?
Could it be, then, that the failure of the Semantic Web to emerge is down to a failure to share the common protocols that would allow the associated models to flourish? [8]
Winner takes all: Being first to release a product into the public domain
Examining the field today, a ‘winner takes all’ mentality has emerged in the artificial intelligence community. This mantra of being first with innovations is setting dangerous precedents for how the technology enters society. It seems a relatively new phenomenon, most visibly with the launch of ChatGPT, but it hasn’t always been, and need not be, the way of shepherding innovations into the public domain.
Quite often the reasons behind this public-domain testing of radical new capabilities verge on political xenophobia, or ‘if we don’t, then someone else will’ arguments, rather than a rational discourse about the state of play of the industry.
It needn’t be this way, however. You don’t have to look far to find concrete examples of how evolution has been positively enhanced through sharing. A good example is the open sourcing of the machine learning library TensorFlow, developed by Google.
By making this library open source and freely available, Google has allowed countless researchers, developers and organisations to collaborate, innovate and advance the field of artificial intelligence.
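As a small illustration of what that openness enables, the sketch below uses the freely available TensorFlow/Keras API to define and train a deliberately tiny classifier on random stand-in data; it is a minimal sketch of the kind of experimentation an open library makes possible, not a useful model in itself.

```python
import numpy as np
import tensorflow as tf

# A deliberately tiny model: the point is that the full framework is open
# to anyone, not that this particular network is useful.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random stand-in data, used only so the example runs end to end.
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256,))
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))  # [loss, accuracy] on the toy data
```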
There have been recent calls for a pause in the release of artificial intelligence innovations to the public, in the form of the now infamous ‘Pause Giant AI Experiments’ open letter from the Future of Life Institute [11]. Conversely, there have been counterarguments that a pause would not be beneficial, and that releasing the technology and allowing its societal impacts to be realised, presumably the majority of which would be positive, would be the better approach.
To summarise the latter: let’s invent the car first, and that will lead to the invention of the seatbelt [12]. For a technology that reached 100 million users in two months [13], you would have to imagine that’s a bold approach.
Language: The evolution of language through cultural exchange
Language serves as humanity’s means of exchanging information and has undoubtedly aided human development through the sharing of ideas. Language is both current and historical, an imperfect but perfectly functional showcase of years of history and evolution.
The free sharing of information through cultural exchange has driven the positive evolution of language throughout history, whether through words, phrases or grammatical structures adapted and borrowed from others. This process of adaptation and recombination has resulted in a rich and diverse linguistic landscape.
The English language serves as an example: one of the most significant influences was the adaptation of Latin words into Old English, especially in the fields of law, religion and science [14]. This was followed by influences from French, adapted into Middle English, particularly in the fields of government, art and literature. This period of adaptation led to a significant expansion of the English vocabulary [15].
In more recent history, the age of colonisation and exploration led to the borrowing of words from indigenous languages worldwide. Words such as “kangaroo” and “boomerang” were borrowed from Australian Aboriginal languages, while “canoe” and “tobacco” have their origins in the languages of the indigenous peoples of the Americas. [16]
This evolution has not been a one-way street. English has also contributed words and phrases to other languages, reflecting the bidirectional sharing of information and context leading to positive evolution of the ability to convey meaning.
English is an easy example, but certainly not the only one, of how the free sharing of information and cultural exchange has led to positive evolution in language. Its incorporation of words, phrases and grammatical structures from other languages throughout history reflects its global interactions and adaptability, and demonstrates the power of open sharing in language development.
Artificial Intelligence: The rising need for collaboration
With artificial intelligence we are already at the point of widespread adoption, and yet we are still at a very primitive stage of collaboration on how to evolve the models optimally and deploy the technology safely. The question of model safety has largely been treated as a regulatory concern, with societal impacts examined ex post, a task for which our regulators are woefully under-equipped, under-resourced and lacking the organisational flexibility to keep pace with the newly emerging technology base.
Looking at language, culture and art, we have seen these domains evolve positively over time. While they took much longer to reach this state, they have all benefited from the free sharing of information, context and design. That evolution has not always been positive, it cannot always be easily controlled, and it will undoubtedly at times produce undesirable outcomes, but I would argue the net results are positive.
With artificial intelligence, should we be taking a similar approach to evolving the technology? There are quite a few areas ripe for collaboration and knowledge sharing:
- Foundational model designs: Sharing the architecture and design of AI models, such as the transformer architectures used in large models like GPT-4, would help researchers build upon and improve the models while contributing back to the community.
- Training methodologies: Sharing training methodologies and techniques can help in areas such as transfer learning, fine-tuning, unsupervised learning and eliminating obvious biases from the training process.
- Annotations and data: Open sharing of annotated datasets and data sources would be a natural result of increased transparency around training methodologies. It would allow training on a diverse range of data, resulting in more robust and generalisable AI systems. This is particularly important for large language models, which rely on vast amounts of data to generate human-like text, and sharing real-world data is preferable to training on synthetic data with its imperfect correlations to the real world.
- Weights and model parameters: Sharing pre-trained model weights and ontologies can save researchers time and resources by allowing them to fine-tune what already exists, rather than letting these important elements evolve in isolation from the wider community (a minimal sketch of reusing openly shared weights follows this list).
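As an illustration of those last points, the sketch below reuses openly shared pre-trained weights rather than training from scratch. It uses the Hugging Face transformers library and the openly released GPT-2 weights purely as an example of what shared weights make possible; any other openly published checkpoint would illustrate the same point.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load weights and a tokenizer that were openly shared by their creators.
# Because these artefacts are public, anyone can inspect, reuse or
# fine-tune them instead of re-training a model from scratch.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Reuse the shared model directly for generation...
inputs = tokenizer("Open sharing of model weights means", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# ...or treat it as the starting point for fine-tuning on domain-specific
# text, which is far cheaper than training equivalent weights from zero.
```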
The examples presented above are just some of the areas where a transparent and open approach to artificial intelligence would benefit the overall quality of the models deployed into the public domain. This would build upon the work of pioneers such as David Hanson and Dr Ben Goertzel with SingularityNET, where the concept of an AI marketplace in which collaboration could occur more freely was originally born.
Perhaps it’s time we started to build cross-organisational groups and bodies to share information, designs and approaches rather than locking them away behind corporate opacity. These cross-organisational groups would also include regulators, and could well involve the cross-pollination of organisational design into regulatory bodies. That could no doubt be a positive development, allowing regulators to become more agile, or ‘AI-friendly’.
The approaches above to improve collaboration and information sharing would, in my view, undoubtedly assist with the evolution of this extremely important area and improve quality and outcomes for humanity.
It is imperative that, as this technology is rolled out, we recognise that it is incumbent upon us all to make sure that it goes well. [17]
References
- [1] Annabelle Liang (2023), Elon Musk threatens to sue Microsoft over Twitter data, BBC
- [2] James Vincent (2022), The scary truth about AI copyright is that nobody knows what will happen next, The Verge
- [3] Bohnacker, H., Gross, B., Laub, J., & Lazzeroni, C. (2009), Generative Design: Visualize, Program, and Create with Processing, Princeton Architectural Press
- [4] Karen Hao (2019), AI is sending people to jail — and getting it wrong, MIT Technology Review
- [5] Tristan Harris, Aza Raskin (2023), The A.I. Dilemma, Center for Humane Technology
- [6] Max Roser (2022), The brief history of artificial intelligence: The world has changed fast — what might be next?, Our World in Data
- [7] Nick Bonyhady, AI bots are being trained on Australian data. Should we be paid for it?, The Age
- [8] Shadbolt, N., Berners-Lee, T., & Hall, W. (2006), The Semantic Web Revisited, IEEE Intelligent Systems, 21(3), 96–101. DOI: 10.1109/MIS.2006.62
- [9] O’Reilly, T. (2005), What is Web 2.0: Design patterns and business models for the next generation of software, O’Reilly Media, Inc.
- [10] Shadbolt, N., Berners-Lee, T., & Hall, W. (2006), The Semantic Web Revisited, IEEE Intelligent Systems, 21(3), 96–101. DOI: 10.1109/MIS.2006.62
- [11] Various (2023), Pause Giant AI Experiments: An Open Letter, Future of Life Institute
- [12] Yann LeCun, Andrew Ng (2023), Why the 6-month AI pause is a bad idea
- [13] Krystal Hu (2023), ChatGPT sets record for fastest growing user base, Reuters
- [14] Simon Horobin (2016), How English became English — and not Latin, Oxford University Press
- [15] Crystal, D. (2003), The Cambridge Encyclopedia of the English Language, Cambridge University Press
- [16] Marijana Ivančić (2013), Which words has English borrowed from other languages? An Extensive Analysis of Modern English Loan Words, University of Zadar
- [17] Max Roser (2022), Artificial Intelligence is transforming our world — it is on all of us to make sure that it goes well, Our World in Data
Recommended Reading
- Sachin Kumar Sharma (2023), Synergizing Ontologies and Large Language Models: The Future of Context-Aware AI Solutions, LinkedIn
- Roman Suzi (2023), Large language model as a source for ontology, Medium
- Zongjie Li, Chaozheng Wang, et al. (2023), On the Feasibility of Specialized Ability Extracting for Large Language Code Models, Cornell University
- Shomik Ghosh (2022), Why AI is not a moat, Substack
- Daniel Shenfeld (2019), How to build an AI moat, Medium
- Daniel Barker (2018), How we use the word data has changed and it’s dangerous, Medium