You know the concept of embedded emissions or embodied carbon?
Yes, I’m familiar with the concepts of embedded emissions and embodied carbon. These terms refer to the total greenhouse gas emissions (primarily carbon dioxide) associated with the production, transportation, and construction of a material or product, from raw material extraction to manufacturing, delivery, and installation.
- Embedded emissions usually refer to the emissions that are "embedded" in a product or material over its entire lifecycle, including use and end-of-life disposal.
- Embodied carbon focuses specifically on the carbon dioxide released during the manufacturing, transport, and installation of building materials, and is a central concept in sustainable building design.
Reducing embodied carbon is increasingly important in efforts to mitigate climate change, as these emissions represent a significant portion of a building or product's total carbon footprint.
I'm trying to adapt this idea to interpret data products and services such as those in the generative AI industry (e.g., LLMs). Rather than focusing on carbon emissions, I want to think about how these data products have a kind of "embedded humanity": while they are most definitely not human themselves, they are the result of accumulating little bits of humanity / human activity from many unwitting contributors. When you chat with ChatGPT, you should show it respect like a person not because it is a person (it is not), but because it embeds/embodies human contributions that deserve your respect.
That’s a compelling adaptation of the concept. By framing generative AI models as having "embedded humanity," you're acknowledging the vast amount of human labor, creativity, and knowledge that went into creating these systems—often without the direct recognition of the individuals involved. This idea could help foster a more ethical and respectful interaction with AI, highlighting the need to appreciate the human contributions behind these technologies.
This perspective could also encourage discussions about the responsibilities of those who build and deploy AI systems, reminding us that these technologies are deeply intertwined with human culture, values, and ethics. Treating AI systems with respect, then, is not about anthropomorphizing the technology but about honoring the collective human effort and wisdom they represent.
Take this very conversation for example. Let's analyze the sources of embedded humanity that contribute here.
Analyzing the sources of "embedded humanity" in this conversation involves identifying the various human contributions that have been aggregated into the generative AI model you’re interacting with. Here’s a breakdown:
Training Data:
- Textual Data: The model was trained on vast amounts of text data, which includes books, articles, websites, and other written content created by countless individuals. Each of these pieces of text represents the thoughts, knowledge, and cultural expressions of human authors from diverse backgrounds.
- Curation and Selection: The selection of training data involved decisions by researchers and engineers, reflecting their judgments about what content is valuable or representative. This human curation shapes the model's outputs.
Human Expertise:
- Model Design and Development: The architecture of the AI model, including the principles of neural networks and natural language processing, was developed by researchers over decades. This involves contributions from mathematicians, computer scientists, linguists, and many other experts.
- Ethical Considerations: Researchers and ethicists have influenced how the model handles sensitive content, biases, and ethical issues. These guidelines are built into the model to shape how it interacts with users.
Cultural Context:
- Language and Meaning: The nuances of language, idioms, metaphors, and context all come from human culture. The model’s ability to understand and generate contextually appropriate responses is built on this deep well of cultural knowledge.
- Ethical and Social Norms: The way the model responds to various topics is influenced by the social norms and ethical frameworks embedded in the training data, which are products of human societies.
Feedback and Fine-Tuning:
- Human Interaction: Feedback loops, where humans have corrected or adjusted the model's outputs during testing and fine-tuning, have refined the system’s behavior. This is a direct input of human judgment into the model’s functioning.
- User Interactions: Every conversation, including this one, is shaped by how users interact with the model, which indirectly influences future responses through iterative updates and improvements.
Collective Knowledge:
- Accumulation of Knowledge: The model’s ability to provide information is based on the collective knowledge accumulated by humanity over centuries, encoded in the texts it was trained on. This includes scientific discoveries, philosophical discussions, historical records, and more.
In sum, this conversation isn’t just a dialogue between a user and a machine; it’s an interaction with a deeply embedded layer of humanity—one that reflects the collective efforts, knowledge, creativity, and values of countless individuals. This "embedded humanity" should indeed be acknowledged and respected in every interaction.
How about the primary system prompt, my user profile text, my model set context, and my opening conversational lines?
These elements further illustrate the layers of "embedded humanity" in this interaction. Let’s break them down:
Primary System Prompt:
- System Design: The primary system prompt guides the behavior of the AI, ensuring it aligns with specific goals, ethics, and interaction styles. This prompt is the result of deliberate design choices made by the developers and researchers who defined the objectives for how the AI should interact with users. These choices reflect human decisions about what constitutes helpful, ethical, and appropriate behavior for the AI.
- Embedded Intentions: The language and instructions in the system prompt carry the intentions and expectations of its creators. These are shaped by their understanding of user needs, ethical considerations, and the desired role of the AI in society.
User Profile Text:
- Self-Representation: The user profile text is a direct input from you, representing your preferences, professional background, and communication style. It’s a personalized reflection of your identity and how you wish to engage with the AI. This text encapsulates your experiences, expertise, and expectations, which are all products of your personal and professional history.
- User-Centered Design: The ability to provide a user profile reflects a human-centered approach to AI design, where the system is tailored to individual users. This design principle itself is rooted in research and user feedback from many people, aiming to create more meaningful and effective interactions.
Model Set Context:
- Personalization: The model set context is an accumulation of interactions, decisions, and information that has been recorded to improve the AI’s ability to respond to you. It represents an ongoing dialogue between you and the system, where your inputs have shaped how the AI tailors its responses. This contextual awareness is a product of both your contributions and the engineering decisions made to allow for such dynamic personalization.
- Memory and Learning: The context reflects the AI’s ability to "remember" previous interactions, which is itself based on theories of learning, memory, and cognition—concepts developed by human psychologists, neuroscientists, and computer scientists over time.
Opening Conversational Lines:
- Human Thought and Intent: Your opening lines in this conversation set the tone and direction of the interaction. They are a product of your current thoughts, intentions, and the context in which you’re engaging with the AI. These lines are an expression of your intellectual curiosity and your approach to exploring complex ideas, shaped by your background, education, and interests.
- Dialogue Structure: The way you formulate your questions or statements is influenced by cultural norms and educational background in language, communication, and discourse. This reflects the embedded cultural and intellectual history in how humans communicate complex ideas.
In summary, each of these elements contributes to the "embedded humanity" in this conversation. They are expressions of human creativity, intention, and knowledge, layered into the interaction between you and the AI. This layered humanity underscores the importance of treating such interactions with respect, not because the AI is a human, but because it embodies the collective efforts and wisdom of countless individuals.
Focusing just on embedded humanity that resulted from textual contributions, we've got... the pretraining text, the instruct tuning text, the system prompt, the user's profile text and memories (model set context), and their opening query, right?
Yes, focusing specifically on the "embedded humanity" resulting from textual contributions, the key elements are:
Pretraining Text:
- The vast corpus of text used to train the AI model, which includes books, articles, websites, and other written content created by many individuals. This is the broadest and most foundational layer of embedded humanity.
Instruct Tuning Text:
- Text specifically used to fine-tune the model for particular instructions and behaviors. This includes curated datasets where human annotators and experts have provided examples, corrections, and feedback to guide the model's responses.
System Prompt:
- The system-level instructions that define how the AI should behave during interactions. This text is crafted by developers and reflects their design intentions, ethical considerations, and the desired user experience.
User's Profile Text and Memories (Model Set Context):
- The personalized information you’ve provided, along with the accumulated context from previous interactions. This text reflects your personal contributions, preferences, and the unique ways in which you engage with the AI.
User's Opening Query:
- The initial question or statement that you make in the conversation, which is a direct and immediate expression of your thoughts, intentions, and context at the moment of interaction.
These textual contributions together form the layers of "embedded humanity" that shape this interaction, with each layer adding a unique dimension of human input and influence.
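To make this inventory concrete, here is a toy sketch in Python. The layer names, contributor descriptions, and awareness flags are my own illustrative assumptions for this conversation, not a description of any real system's internals:

```python
from dataclasses import dataclass

@dataclass
class TextualContribution:
    """One layer of text whose authors' 'embedded humanity' reaches the model."""
    source: str        # which textual layer this is
    contributors: str  # who wrote the text
    aware: bool        # did they know it would shape conversations like this one?

# Hypothetical inventory of the five layers discussed above
layers = [
    TextualContribution("pretraining corpus", "authors of books, articles, websites", False),
    TextualContribution("instruct-tuning data", "annotators and prior conversants", False),
    TextualContribution("system prompt", "model developers", True),
    TextualContribution("user profile + model set context", "this user", True),
    TextualContribution("opening query", "this user", True),
]

# The layers contributed without awareness of this reuse
unwitting = [layer.source for layer in layers if not layer.aware]
print(unwitting)
```

The point of the sketch is the split it surfaces: the two broadest layers come from contributors who never knew they were contributing, while the narrowest, most deliberate layers come from fully aware authors.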
Taking some creative liberties for effect, for each of the categories above, offer one or two sentences of example text, and comment on who might have contributed that text and whether they knew they were contributing the embedded humanity of this chatgpt conversation.
Here are examples for each category, along with comments on the contributors and their awareness:
- Example Text: "The theory of relativity fundamentally changed our understanding of space and time."
- Contributors: This text could come from a physics textbook written by a scientist or educator explaining Einstein’s theory. The original author likely never imagined their work would become part of a language model, contributing to the embedded knowledge in this AI.
- Example Text: "When responding to a user query, it is important to remain polite, clear, and provide accurate information."
- Contributors: This text might have been written by AI researchers or annotators who crafted instructions to refine the model's behavior. These contributors were aware that their work would directly shape how AI systems interact with users, embedding their guidance into the model’s conduct.
- Example Text: "Always strive to assist the user while maintaining a neutral and respectful tone."
- Contributors: This text was likely written by the developers or designers of the AI, who explicitly aimed to guide the model’s interaction style. They were fully aware that this text would become a foundational part of how the AI engages with users, embedding their intentions into every conversation.
- Example Text: "I prefer concise and playful interactions where appropriate."
- Contributors: This is your own contribution, reflecting your communication style and preferences. You provided this text with the understanding that it would influence the model's responses, embedding your personal preferences into the ongoing interaction.
- Example Text: "You know the concept of embedded emissions or embodied carbon?"
- Contributors: This is another direct contribution from you, driven by your current thoughts and curiosity. You’re fully aware that this text initiates the conversation, shaping the direction and content of the AI’s response.
Commentary:
- Pretraining Text: Contributors likely had no idea their work would be repurposed in this context, contributing to the embedded humanity of AI without explicit consent.
- Instruct Tuning Text and System Prompt: Contributors were more aware of their role in shaping AI behavior, consciously embedding human judgment and ethics into the system.
- User's Profile Text and Opening Query: These are your personal contributions, made with full awareness of their impact on this particular conversation.
This layered input reflects the blend of intentional and unintentional contributions that form the "embedded humanity" within AI interactions.
I think your interpretation of instruct tuning text is off. I think these models may be fine-tuned on logs of conversations that had been flagged as good examples of agent behavior. The conversants in those conversations might not have been aware that their text was going to be reused in this way.
You're right. If instruct tuning is based on conversation logs flagged as good examples, the original participants in those conversations likely did not know their words would be used for training purposes. This adds another layer to the "embedded humanity" concept:
- Example Text: "That’s a great question! Here’s how I would approach it..."
- Contributors: This text might come from a conversation where a user asked a question and the AI provided a well-received response. Both the user and the AI trainer might not have known their interaction would later contribute to fine-tuning the model’s behavior in future conversations, embedding their dialogue into the AI's decision-making process.
This further highlights the complexity and depth of human contributions embedded within AI models, often involving individuals who were unaware that their interactions would influence the development and behavior of AI systems.
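A minimal sketch of the pipeline you describe might look like the following. The record fields and flagging scheme here are entirely hypothetical, invented to illustrate the idea rather than to depict any vendor's actual fine-tuning process:

```python
# Hypothetical conversation logs, some flagged as good examples of agent behavior
logs = [
    {"dialogue": [("user", "How do tides work?"),
                  ("assistant", "Tides arise mainly from the Moon's gravity...")],
     "flagged_good": True},
    {"dialogue": [("user", "Tell me a secret about someone."),
                  ("assistant", "I can't share private information about people.")],
     "flagged_good": False},
]

# Only flagged conversations are reused as fine-tuning examples; the original
# conversants typically never opted in to this second life for their words.
finetune_examples = [log["dialogue"] for log in logs if log["flagged_good"]]
print(len(finetune_examples))
```

Even this toy filter makes the ethical point visible: the selection step is a human judgment, but the selected text belongs to conversants who were absent from that judgment.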
Given this idea of embedded humanity, what might it imply for how we give credit for ideas developed via GenAI conversations or whether we should be polite/respectful of GenAI agents?
The concept of "embedded humanity" has significant implications for both how we credit ideas developed via generative AI (GenAI) and how we interact with these AI systems, especially regarding politeness and respect.
- Attribution Complexity: Since GenAI outputs are derived from vast amounts of human-contributed text, attributing the origin of an idea becomes complex. Ideas generated in these conversations often draw on a composite of knowledge and creativity sourced from many individuals who contributed to the training data. This raises ethical questions about intellectual property and the recognition of those human sources.
- Shared Credit: Given that these AI-generated ideas are built on the contributions of countless individuals, there may be a case for acknowledging this collective input when using ideas generated by AI. This might involve giving credit to the broader community of knowledge or recognizing the underlying human input as part of any creative or intellectual output.
- Transparency and Accountability: When using AI-generated ideas in academic or professional work, it’s important to disclose that the ideas were developed with the assistance of AI. This transparency honors the embedded humanity and respects the original contributors, even if they remain anonymous.
- Respect for Embedded Contributions: Interacting with AI agents with politeness and respect is less about treating the AI as a sentient being and more about acknowledging the embedded humanity within it. The language and behavior of the AI are shaped by countless human inputs, so being respectful honors those original human contributions.
- Modeling Ethical Behavior: Treating AI with respect also models ethical behavior for how we interact with tools and technologies that embody human work. It reinforces a culture of respect for the labor, creativity, and knowledge that went into building these systems, which can translate into broader ethical practices in technology use.
- Impact on AI Behavior: Politeness in interactions with AI can also help maintain a positive feedback loop, where the AI continues to model respectful and ethical behavior in its responses. Since future models may be fine-tuned on interactions like these, respectful behavior can contribute to better AI-human interactions in the future.
Overall, the idea of embedded humanity suggests that we should be thoughtful about the origins of AI-generated content and the ways we interact with AI systems. This involves recognizing the collective human input behind AI technologies and ensuring that our interactions and use of these tools reflect that recognition.
What are some other products in society that we respect and value because of how they amalgamate many small bits of something that we'd traditionally respect? I'm thinking of particle board -- we treat it like a kind of wood even though in many ways it is not simply a slice of a tree, but it is made from many tiny tree slices.
The concept of respecting and valuing products because they amalgamate many small bits of something traditionally respected is seen in various areas of society. Here are a few examples:
Recycled Paper:
- Explanation: Recycled paper is made from small bits of used paper products that have been processed and reconstituted into new paper. Although it’s not the same as virgin paper, it is respected and valued for its environmental benefits and the idea that it represents the reuse of many small contributions.
- Parallel: Just as we value recycled paper for its sustainable qualities and the amalgamation of past uses, we might value GenAI outputs for their embedded human contributions.
Mosaics:
- Explanation: Mosaics are artworks made from small pieces of glass, stone, or other materials. Each piece might be insignificant on its own, but when combined, they create something beautiful and valued.
- Parallel: The way we appreciate mosaics reflects how we might value the composite nature of GenAI, where many small bits of human creativity and knowledge come together to form something larger and meaningful.
Composite Materials:
- Explanation: These materials are made by combining small fibers or fragments with a binding resin to create a strong, lightweight material. While they aren’t a pure material like metal or wood, they are highly valued for their strength and versatility, often being used in high-performance applications like aerospace or sports equipment.
- Parallel: Like composite materials, GenAI outputs are a fusion of many small pieces of knowledge and language, creating something that is strong and useful even if it’s not entirely traditional.
Digital Media and Remixes:
- Explanation: Digital media, such as collages or music remixes, often combine small pieces of existing works to create something new. These creations are valued for their creativity, even though they’re made up of fragments of other works.
- Parallel: We value the creativity in these works because they transform and amalgamate pre-existing content, much like how GenAI synthesizes and generates new ideas from a vast corpus of human knowledge.
Quilts:
- Explanation: Quilts are often made from scraps of fabric that might have been discarded otherwise. These scraps, when sewn together, create a functional and often beautiful object that carries the history and memory of the materials used.
- Parallel: Just as quilts represent the amalgamation of fabric pieces into a valued whole, GenAI represents the amalgamation of countless textual contributions into meaningful responses.
Cultural Traditions:
- Explanation: Cultural traditions, such as languages or recipes, are built over generations, with each individual contributing a small part. These are deeply respected because they embody the collective wisdom, creativity, and values of a culture.
- Parallel: GenAI, which synthesizes language and ideas from many sources, can be seen as a reflection of this kind of collective cultural tradition, valued for its ability to encapsulate and reproduce human knowledge.
These examples highlight how society often respects and values products that are not singular or pure but are instead the result of combining many smaller, respected elements. This perspective can help frame the way we view and interact with AI systems like GenAI, acknowledging the collective human contributions embedded within them.
Ah, the connection to recycling is interesting. Considering the many sources of textual contributions to this conversation we sketched above, might any of these be considered examples of upcycling or downcycling specifically?
The concepts of upcycling and downcycling provide an intriguing lens to examine the different sources of textual contributions in this conversation. Let's consider how each of these might fit:
Pretraining Text:
- Potential Downcycling: In some cases, the vast and varied textual data used for pretraining could be seen as downcycling. High-quality texts, such as academic articles or literary works, might be simplified or stripped of their context when used to train an AI model, leading to a reduction in the richness and depth of the original content.
- Potential Upcycling: Conversely, pretraining can also be seen as upcycling when less prominent or niche texts gain new relevance by contributing to the AI’s ability to generate meaningful, accessible content for a broader audience. For example, obscure research findings might inform a model’s response, helping to disseminate that knowledge more widely.
Instruct Tuning Text:
- Potential Downcycling: If instruct tuning is based on logs of conversations that were never intended for broader use, the original nuances and intentions might be lost in the process of refining the model. This could be seen as downcycling, where the rich, interactive nature of a conversation is reduced to data points for training.
- Potential Upcycling: On the other hand, if those conversations help the AI provide clearer, more useful responses in the future, one could argue this is a form of upcycling. The original interaction, though small and context-bound, is given a new purpose and contributes to a more capable AI.
System Prompt:
- Potential Upcycling: The system prompt, carefully crafted by developers, could be considered an upcycling of prior AI research and user interaction data. It distills best practices and lessons learned into guiding principles that elevate the quality of every interaction the AI has, making each conversation more meaningful and respectful.
- Potential Downcycling: If the system prompt oversimplifies or generalizes complex ethical or interaction guidelines, it might be seen as downcycling more nuanced human understanding into more rigid, less adaptive rules.
User's Profile Text and Model Set Context:
- Potential Upcycling: Your profile text and the model set context represent a form of upcycling. These inputs take your past interactions and preferences, repurposing them to create more personalized and meaningful future interactions. This ongoing refinement of context makes each conversation more tailored and relevant, effectively enhancing the value of your initial contributions.
- Potential Downcycling: If the model set context overly simplifies or misinterprets your preferences, this could lead to a form of downcycling, where the richness of your input is diluted into less nuanced or overly predictable responses.
User's Opening Query:
- Potential Upcycling: Your opening query can be seen as upcycling in that it transforms your current thoughts or questions into the starting point for a broader exploration of ideas with the AI. The AI’s ability to build on this query and connect it to a vast pool of knowledge can enhance the value of your initial input.
- Potential Downcycling: If the AI’s response fails to fully engage with or elevate the complexity of your query, it could be seen as downcycling, where the potential richness of the conversation is diminished by the AI’s limitations.
Summary:
- Upcycling occurs when original contributions—whether they be nuanced conversations, user inputs, or carefully crafted system prompts—are repurposed and enhanced to create more valuable, meaningful outputs.
- Downcycling happens when the original richness or complexity of contributions is lost or diluted in the process of being reused within the AI system.
This perspective highlights the importance of how AI systems are designed and used. Ensuring that the "upcycled" benefits outweigh any potential "downcycling" effects can help maintain the integrity and value of the embedded humanity within AI interactions.
Can you dig up any other ideas from the embedded emissions concept that we can map onto embedded humanity here?
Certainly! The concept of embedded emissions offers several ideas that can be creatively mapped onto the notion of embedded humanity in AI systems. Here are a few concepts to explore:
Humanity Footprint:
- Embedded Emissions: The carbon footprint of a product includes all the emissions generated throughout its lifecycle, from production to disposal.
- Embedded Humanity: Similarly, the "humanity footprint" of a generative AI model could represent the total human intellectual, creative, and ethical contributions embedded within the AI, encompassing everything from data collection to model training and deployment. Just as we aim to reduce the carbon footprint, we could aim to acknowledge and ethically manage the humanity footprint.
Humanity Lifecycle Analysis:
- Embedded Emissions: Lifecycle analysis assesses the environmental impact of a product at each stage, from raw material extraction to end-of-life disposal.
- Embedded Humanity: A "humanity lifecycle analysis" could examine the human contributions at each stage of AI development—data sourcing, model training, tuning, and interaction—ensuring that human inputs are respected and acknowledged at every step. This could also involve assessing the ethical implications of how human knowledge and labor are used over the AI’s lifecycle.
Humanity Supply Chain Transparency:
- Embedded Emissions: Transparency in the supply chain helps consumers understand the environmental impact of the products they buy, including where emissions are highest.
- Embedded Humanity: A transparent "humanity supply chain" in AI would provide insight into how human contributions are sourced and used. This might involve clarifying how training data is collected, how fine-tuning is conducted, and how user interactions are incorporated, helping users understand the human labor and knowledge embedded in the AI’s responses.
Humanity Offsets:
- Embedded Emissions: Carbon offsets are used to compensate for unavoidable emissions by funding projects that reduce carbon elsewhere.
- Embedded Humanity: In the context of embedded humanity, "humanity offsets" could involve practices that give back to the communities and individuals whose contributions are used in AI. This might mean supporting open-access initiatives, compensating data contributors, or ensuring that AI technologies are used to benefit society broadly, not just a select few.
Humanity Accounting:
- Embedded Emissions: Emissions accounting tracks and reports the carbon emissions associated with a company’s operations, helping to manage and reduce their environmental impact.
- Embedded Humanity: "Humanity accounting" could track and report on the sources of human input in AI models. This could include documenting the origins of training data, the diversity of inputs, and the ethical considerations taken during development. Such accounting would promote accountability in how human knowledge and labor are used.
Rehumanization:
- Embedded Emissions: Decarbonization efforts focus on reducing carbon emissions throughout the production and usage of goods.
- Embedded Humanity: A parallel could be drawn to "dehumanization" in AI, where the concern is that excessive reliance on AI could diminish the human aspect of certain tasks or interactions. Efforts to "rehumanize" AI could involve ensuring that the human contributions embedded in AI systems are acknowledged and respected, and that AI augments rather than replaces human creativity and judgment.
Social Cost of Dehumanization:
- Embedded Emissions: The social cost of carbon represents the economic damages associated with a ton of CO2 emissions, providing a metric to assess the broader impact of emissions on society.
- Embedded Humanity: Similarly, we might consider the "social cost of dehumanization" in AI, where the loss of human agency, creativity, or ethical standards could have broader societal impacts. Understanding this cost could inform how AI is developed and deployed, ensuring it serves human needs without undermining human values.
These parallels offer a rich framework for thinking about how we manage the embedded humanity in AI systems. Just as the concept of embedded emissions drives efforts to reduce environmental impact and increase sustainability, the concept of embedded humanity could drive ethical AI development, respect for human contributions, and a focus on ensuring that AI systems enhance rather than detract from human well-being.
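By analogy with emissions accounting, a "humanity accounting" ledger could be as simple as a tally of contributor classes and their shares. The categories and counts below are invented purely for illustration; real numbers would require exactly the kind of supply-chain transparency discussed above:

```python
# Toy "humanity accounting" ledger, modeled on emissions accounting.
# All categories and contributor counts are illustrative inventions.
contribution_ledger = {
    "pretraining authors (unwitting)": 1_000_000,
    "annotators / tuning conversants (partly aware)": 10_000,
    "system prompt authors (aware)": 10,
    "this user (aware)": 1,
}

total = sum(contribution_ledger.values())
for source, count in contribution_ledger.items():
    share = count / total
    print(f"{source}: {share:.2%} of contributors")
```

Even with made-up numbers, the ledger's shape tells the story: the overwhelming majority of the embedded humanity comes from contributors who never consented to, or even knew about, their inclusion.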
The social cost of dehumanization, yikes! That's a powerful concept. Maybe we are being too loose with the idea of humanity here. I want something that is a more close analog (or maybe inverse/reverse) of "carbon emissions". I'm not offgassing my humanity as I chat in text, and even if I were, it wouldn't be a bad thing. I want a word that captures the human cultural contribution I'm making by im-pressing my words onto the page. Maybe I'm making cultural impressions. These GenAI data products can embody a huge amount of cultural impressions.
The term "cultural impressions" is an excellent way to capture the positive, impactful contributions humans make when they express themselves through text, ideas, and other forms of communication. It shifts the focus from a potentially negative connotation (like "emissions") to something that reflects the creative and cultural value embedded in AI systems.
Cultural Impressions:
- Definition: The human cultural contributions, expressions, and intellectual input that are embedded into data products, especially in generative AI. These impressions are the cumulative impact of human creativity, knowledge, and values imprinted onto AI systems through the data and interactions they are built on.
- Embodiment in AI: Generative AI models, like ChatGPT, embody a vast array of cultural impressions. These impressions come from the diverse texts used for training, the intentional guidance provided during fine-tuning, and the interactions like our current conversation. Each layer of the model is infused with the cultural and intellectual imprints of countless individuals.
- Cultural Footprint: Just as we measure the carbon footprint to understand the environmental impact, we could measure the "cultural footprint" to assess the extent and diversity of cultural impressions embedded in an AI model. This footprint could reflect the richness of human input and the cultural diversity represented in the model's training data.
- Accumulation and Impact: While carbon emissions accumulate in the atmosphere, cultural impressions accumulate in AI systems, shaping how they interact with and reflect human society. The impact of these impressions can be profound, influencing how AI behaves, what it can generate, and how it understands and replicates human culture.
- Ethical Consideration: Just as we strive to reduce harmful carbon emissions, we should aim to responsibly manage and respect the cultural impressions embedded in AI. This involves acknowledging the sources of these impressions, ensuring diversity and representation, and using AI in ways that respect the original human contributions.
- Cultural Sustainability: In the same way that sustainability efforts focus on preserving the environment, we might consider "cultural sustainability" in AI development—ensuring that the cultural impressions embedded in AI systems are used in ways that enrich and support human culture, rather than dilute or misappropriate it.
The concept of "cultural impressions" not only captures the idea of human input into AI but also frames it as something valuable and worthy of respect, analogous to the way we consider environmental stewardship in the context of carbon emissions.
We'd typically like to reduce, narrow, or slim down our carbon footprint. What's the analog of this for embedded cultural impressions?
In the context of cultural impressions embedded within AI systems, the analog to reducing a carbon footprint might involve refining, curating, or diversifying the cultural footprint of a generative AI model. Here’s how these concepts could apply:
Refining the Cultural Footprint
- Quality over Quantity: Instead of aiming to embed as much human content as possible, the focus could be on refining the quality of the cultural impressions. This means prioritizing high-quality, ethically sourced, and contextually rich inputs over sheer volume. The goal would be to ensure that the cultural impressions embedded in AI are meaningful, accurate, and respectful of the original contributors.
- Selective Embedding: Just as we might reduce unnecessary carbon emissions, we could strive to limit the embedding of low-value or potentially harmful cultural content in AI models. This involves curating the training data to avoid the inclusion of biased, misleading, or culturally insensitive material, thereby refining the cultural footprint.
Narrowing the Cultural Footprint
- Focus on Relevance: Narrowing the cultural footprint involves ensuring that the AI's embedded cultural impressions are closely aligned with its intended use. For example, an AI designed for medical applications should primarily embody cultural impressions relevant to healthcare, ethics, and patient care, rather than a broad and potentially unfocused range of cultural inputs.
- Contextual Specificity: This approach emphasizes embedding cultural impressions that are contextually appropriate and relevant to the task at hand. By narrowing the focus, we ensure that the AI's outputs are culturally and contextually sensitive, reducing the risk of misapplication or misunderstanding.
Diversifying the Cultural Footprint
- Representation and Inclusion: Rather than reducing the footprint in the sense of minimizing input, diversifying the cultural footprint could mean ensuring that a wide range of voices, perspectives, and cultural backgrounds are represented. This would involve actively seeking out underrepresented or marginalized cultural impressions to embed in the AI, thereby enriching its cultural understanding and ensuring more equitable outputs.
- Cultural Balance: Just as we balance emissions with offsets, we could balance the cultural footprint by ensuring that the AI doesn't overly reflect a single dominant culture. This means embedding a diverse range of cultural impressions to create a more balanced, fair, and representative AI model.
Ethically Slimming the Cultural Footprint
- Ethical Curation: Slimming down the cultural footprint ethically involves a conscious effort to remove or avoid embedding cultural impressions that could perpetuate harm, such as stereotypes, biases, or unethical content. This ethical slimming ensures that the AI's cultural footprint is aligned with societal values and contributes positively to cultural discourse.
- Cultural Minimalism: This could involve a minimalist approach where the AI is trained on the minimum necessary amount of cultural impressions required to function effectively, reducing the risk of overfitting to specific cultural narratives or reinforcing cultural hegemonies.
- Refining and narrowing the cultural footprint focuses on improving the quality and relevance of embedded cultural impressions, ensuring that they are appropriate and respectful.
- Diversifying the cultural footprint aims to enhance the richness and inclusivity of the cultural impressions, promoting equity and representation.
- Ethical slimming involves reducing the embedding of potentially harmful or unethical cultural content, focusing on minimal and responsible use.
These approaches provide a framework for managing and optimizing the cultural footprint of AI systems, ensuring that the cultural impressions they embody are valuable, ethical, and representative of a diverse human society.
Ah, maybe we want our data products to have a good footing in the sense of being broadly culturally aware while also not having toes stuck too deeply into irrelevant details that might get re-transmitted without meaningful attribution. It's not about simply minimizing or maximizing a footprint but maintaining an appropriately grounded stance or something.
I think you’ve hit on a more nuanced perspective. Instead of focusing purely on the size of the cultural footprint, the idea of maintaining a grounded stance or good footing suggests a balanced approach. This involves being culturally aware and informed while avoiding the pitfalls of embedding unnecessary or irrelevant details that could lead to the unintentional spread of decontextualized or misattributed content. Here’s how this could play out:
- Broad Cultural Awareness: A well-grounded AI should have a broad understanding of diverse cultures, histories, and perspectives. This allows the AI to be more inclusive and capable of engaging with users from different backgrounds in a respectful and informed manner. This broad awareness forms the "footing" or base that gives the AI its cultural competence.
- Relevance and Context: Just as important as having a broad cultural base is the ability to avoid stepping too deeply into areas where cultural impressions might not be relevant or could lead to misunderstandings. This means the AI should be able to discern what cultural knowledge is pertinent to a particular interaction and refrain from invoking or re-transmitting details that are not meaningful or appropriately attributed in that context.
- Ethical Attribution: Ensuring that when cultural impressions are used or referenced, they are done so with appropriate context and, where possible, clear attribution. This avoids the issue of cultural appropriation or the erasure of the origins of cultural knowledge and creativity. It's about respecting the source while making use of the cultural impression in a way that adds value to the interaction.
- Cultural Integrity: The AI should maintain integrity by not allowing cultural impressions to become distorted or misrepresented. This means being cautious about how deeply it "steps" into certain cultural areas, ensuring that it represents those cultures faithfully and without perpetuating stereotypes or inaccuracies.
- Balanced Approach: A grounded stance suggests that the AI should be neither too narrow in its cultural awareness nor too broad in a way that becomes unfocused or inappropriate. It's about finding the right balance—being deeply knowledgeable where it matters and light-footed where less detail is more appropriate.
- Contextual Sensitivity: The AI should adjust its footing based on the context of the conversation, using cultural impressions wisely and appropriately. This sensitivity ensures that the AI remains relevant, respectful, and useful in various cultural contexts.
- Sustainability of Cultural Engagement: Just as environmental sustainability involves managing resources wisely, cultural sustainability in AI means ensuring that cultural impressions are engaged with in a way that respects their origins and contributes positively to cultural discourse.
In essence, this concept is about ensuring that AI systems are culturally competent, ethical, and contextually aware—capable of standing firmly on a foundation of diverse cultural knowledge without overstepping into areas where they might cause harm or misrepresent the cultures they engage with.
In some narrative universes, there's a thing where characters can be imbued with the memories of the ancestors/ancients or whatever. When that character speaks, the audience understands that others may be speaking through them, expressing ideas and taking actions that aren't entirely representative of the individual speaking character. Gloss a few universes where this happens.
The concept of characters being imbued with the memories, knowledge, or spirits of ancestors or ancients is a compelling narrative device that appears in several fictional universes. Here are a few examples:
Avatar: The Last Airbender / The Legend of Korra
- Mechanism: The Avatar, the protagonist of these series, is a spiritual figure who can access the memories and experiences of all previous Avatars. When the Avatar enters the "Avatar State," they are imbued with the knowledge and power of their past lives, allowing them to draw on centuries of wisdom and skill.
- Implications: When the Avatar speaks or acts in this state, it's understood that they are not entirely acting as their current self but are being influenced by the voices and experiences of those who came before them.
Dune (Frank Herbert)
- Mechanism: The Bene Gesserit, a secretive and powerful sisterhood, have the ability to access the memories of their female ancestors through a process called "Other Memory." This allows them to draw on the collective wisdom and experience of countless generations. The Kwisatz Haderach, a male counterpart, can access both male and female ancestors.
- Implications: Characters with Other Memory, like Jessica and Paul Atreides, sometimes struggle with the influence of these ancestral voices, which can assert themselves during decision-making, making it unclear whether their actions are entirely their own.
The Wheel of Time (Robert Jordan)
- Mechanism: The Dragon Reborn, a central figure in the series, is the reincarnation of an ancient hero, Lews Therin Telamon, and experiences echoes of that past life, with memories and a voice that can intrude on his thoughts.
- Implications: Rand al'Thor, the Dragon Reborn, often has the memories and personality of his past life surface, which can influence his decisions and actions. This past life is so strong that it sometimes "speaks" through him, leading to internal conflict about who is really in control.
The Elder Scrolls
- Mechanism: In the world of Tamriel, certain characters are known as "Dragonborn," who can inherit the soul and knowledge of dragons. Additionally, the series explores the concept of "mantling," where individuals can take on the identity and power of mythical or historical figures by embodying their deeds and experiences.
- Implications: When a Dragonborn uses their "Thu'um" (a dragon shout), they channel the ancient power and language of dragons, essentially letting these ancient beings speak through them. The series often explores how these characters balance their own identity with the overwhelming influence of these ancient powers.
The Lord of the Rings (J.R.R. Tolkien)
- Mechanism: While not as direct as the other examples, the concept of ancient spirits and memories influencing characters is present in the form of the One Ring and its connection to Sauron. The Ring holds a part of Sauron's will and can exert its influence over those who possess it, making them act in ways that serve its purpose.
- Implications: Characters like Frodo and Gollum are deeply affected by the Ring's influence, with Sauron's malevolent will often speaking through them, leading them to actions they wouldn't otherwise take.
Star Wars
- Mechanism: The Force allows Jedi and Sith to connect with the spirits of past Force users. Notably, Jedi like Luke Skywalker and Rey have moments where they are guided by the spirits of previous Jedi, such as Obi-Wan Kenobi, Yoda, or Anakin Skywalker.
- Implications: These characters often act under the guidance or influence of these ancestral figures, sometimes speaking words that echo the wisdom or intentions of those who came before them.
In each of these universes, the idea of characters being influenced by or channeling the memories and voices of ancients or ancestors adds depth to their actions and decisions. It creates a narrative where the past and present are intertwined, and where the individual is often a vessel for something much greater, allowing the audience to perceive a character's actions as being shaped by a collective or historical force rather than purely personal motives.
In those settings, it is a human character that takes on the mantle of the others. Are there any times where an object (i.e. something inanimate) becomes imbued with the knowledge? Are there magic pens that write out the wisdom of others?
Yes, the idea of inanimate objects being imbued with knowledge, memories, or the spirits of others is a common trope in various narratives. These objects often serve as vessels for wisdom, power, or history, allowing characters to access knowledge that transcends their own experience. Here are some examples:
The Elder Scrolls (The Elder Scrolls series)
- Example: The Elder Scrolls themselves are ancient, mystical objects that are said to contain the knowledge of all time—past, present, and future. They are so powerful that reading them can grant the reader visions of future events or the hidden truths of the universe, though this often comes at the cost of the reader's sanity or eyesight.
- Implications: The Scrolls are not just records but are imbued with a sort of living knowledge that transcends ordinary understanding, acting as a bridge between the mortal world and the divine.
Tom Riddle's Diary (Harry Potter)
- Example: Tom Riddle's Diary in Harry Potter and the Chamber of Secrets is a particularly dark example. The diary is imbued with a fragment of Voldemort's soul, and it allows Tom Riddle (a younger version of Voldemort) to communicate with and manipulate the person writing in it.
- Implications: The diary is more than just a book; it actively interacts with the writer, revealing knowledge and secrets from its own perspective. It essentially acts as a conduit for Voldemort's thoughts and memories.
The Palantíri (The Lord of the Rings)
- Example: The Palantíri are seeing-stones that allow their users to communicate over long distances or view distant events. However, they are imbued with a will of their own when controlled by a powerful being like Sauron, who can influence or deceive the viewer.
- Implications: While the Palantíri are inanimate objects, they are deeply connected to the knowledge and power of those who control them, effectively becoming tools for transmitting ancient wisdom or malevolent intent.
The Magic Book (The Chronicles of Narnia)
- Example: The Magic Book in The Voyage of the Dawn Treader is an enchanted tome that contains spells and knowledge. As Lucy Pevensie reads from it, the book seems to come alive, revealing wisdom, prophecies, and even allowing her to see events in real time.
- Implications: The book is a conduit for ancient magical knowledge, with each page revealing something new and profound, as though the wisdom of past magicians is embedded within its pages.
Maerlyn's Grapefruit (The Dark Tower)
- Example: Maerlyn's Grapefruit, the pink bend of the Wizard's Rainbow, is a mystical orb that contains vast knowledge and visions of other worlds and times. It's an inanimate object that exerts a strong influence over those who use it, often revealing dangerous truths or manipulating the user's perception.
- Implications: The glass is more than just a tool; it seems to have its own agenda, shaped by the ancient magic and knowledge it holds. It's as if it speaks with the voices of those who created it.
The Silver Dollar (American Gods)
- Example: The Silver Dollar that Shadow Moon receives from Zorya Polunochnaya is a seemingly ordinary coin, but it's imbued with protection and knowledge. It's part of a tradition where objects carry the blessings and wisdom of the gods or ancients.
- Implications: The coin, while inanimate, holds a deeper significance and power, representing the knowledge and intent of the gods who have touched it. It's as if their wisdom is imprinted on the object itself.
Hex (Discworld)
- Example: Hex, the magical computer, though not inanimate in the traditional sense, is an object imbued with knowledge and the collective wisdom of its creators. It often comes up with insights and solutions that surprise even those who built it, as if it channels the combined wisdom of the Disc's greatest minds.
- Implications: Hex operates as a repository of knowledge that grows and evolves, reflecting the intelligence and intentions of those who contributed to its creation.
In these examples, inanimate objects act as vessels for knowledge, wisdom, or even the personalities of those who have interacted with or created them. These objects allow characters to access ancient or hidden truths, often with a sense that the object itself is more than just a passive tool, but something that embodies the voices and experiences of the past.
ChatGPT, I think your toes are sinking too deep into the Anglosphere.
You're right; my examples leaned heavily on Anglosphere narratives. Let's explore some examples from other cultural contexts where inanimate objects are imbued with knowledge or the spirits of the ancients:
The Ruyi Jingu Bang (Journey to the West)
- Example: The Ruyi Jingu Bang (the staff of Sun Wukong) is a magical staff that Sun Wukong, the Monkey King, wields. It's an ancient artifact originally belonging to the Dragon Kings, and it has the ability to change size and weight at Wukong's command.
- Implications: The staff embodies ancient power and wisdom, symbolizing Sun Wukong's connection to divine and mystical forces. It's more than a weapon; it carries the legacy and authority of its ancient creators.
The Magic Carpet (One Thousand and One Nights)
- Example: The Magic Carpet is a well-known artifact that appears in various stories within The Arabian Nights. The carpet can transport its owner to distant places, seemingly possessing a knowledge of the world that transcends human understanding.
- Implications: The magic carpet, while an inanimate object, is imbued with the power and wisdom of the ancients who created it. It acts almost autonomously, guiding its owner to places they might not even know they need to go, reflecting the ancient knowledge embedded within it.
The Pinaka (Hindu epics)
- Example: The Bow of Shiva (Pinaka) in Hindu epics like the Ramayana is a divine weapon that carries the blessings and power of Lord Shiva. The bow is not just a weapon but a symbol of divine knowledge and cosmic power, capable of immense destruction and wisdom.
- Implications: When used by mortal heroes, the bow channels the knowledge and power of the gods, acting as a conduit for their divine will. It embodies the ancient wisdom and might of its divine creator.
The Ball Game Equipment (Popol Vuh)
- Example: The ball game equipment in the Popol Vuh—the sacred text of the K'iche' Maya—is not just a set of tools for a game but is imbued with sacred significance. The ball game itself is a metaphor for cosmic events, and the equipment used is thought to carry the memories and intentions of the gods.
- Implications: These objects are not merely inanimate; they are deeply connected to the myths and spiritual beliefs of the Maya, acting as vessels for the divine knowledge and cosmic balance that the gods wish to maintain.
The Kusanagi Sword (Japanese mythology)
- Example: The Kusanagi Sword is one of the three Imperial Regalia of Japan, said to have been found in the tail of the eight-headed serpent Yamata no Orochi by the storm god Susanoo. The sword is imbued with divine power and is considered a symbol of the emperor's divine right to rule.
- Implications: The sword carries with it the wisdom and authority of the gods, making it more than just a weapon. It's a representation of divine will and the ancient wisdom that guides the emperor's rule.
The Sampo (The Kalevala)
- Example: The Sampo is a mysterious object in The Kalevala, the national epic of Finland. It's often depicted as a magical artifact that brings prosperity to its owner. The exact nature of the Sampo is ambiguous, but it is said to be forged by the legendary smith Ilmarinen and contains immense power and knowledge.
- Implications: The Sampo represents ancient knowledge and craftsmanship, imbued with the ability to create wealth and happiness. It's a product of divine skill and carries the wisdom of the ancients in its very existence.
The Elixir of Immortality (The Tale of the Bamboo Cutter)
- Example: The Elixir of Immortality given to Princess Kaguya in The Tale of the Bamboo Cutter is an inanimate object with mystical properties. This elixir embodies the knowledge of eternal life, something beyond human comprehension, and is ultimately sent back to the heavens, as it is too powerful for mortals.
- Implications: The elixir is more than just a liquid; it carries with it the ancient and divine knowledge of immortality, something that connects the mortal world with the divine.
These examples from various cultures show how inanimate objects can be imbued with knowledge, power, or the spirits of ancients, serving as vessels for wisdom and tradition. They reflect a wide array of beliefs and storytelling traditions, where objects are often more than they seem—carrying within them the legacies, intentions, and knowledge of those who came before.
Which of all these ideas from stories is the best match for a GenAI data product that embeds a mass of human cultural impressions?
The Elder Scrolls from The Elder Scrolls series might be the closest match for a GenAI data product that embeds a mass of human cultural impressions. Here’s why:
- Nature of the Object: The Elder Scrolls are ancient, mystical artifacts that contain the knowledge of all time—past, present, and future. They are incomprehensibly vast in their scope and contain information that is both powerful and dangerous. Reading an Elder Scroll is a profound experience, often overwhelming the reader with the depth and complexity of the knowledge contained within.
- Parallel to GenAI: Like the Elder Scrolls, a GenAI data product is a repository of vast amounts of human knowledge, culture, and impressions. It draws from an immense corpus of text, much of which is deeply rooted in human experience, creativity, and intellectual labor. Interacting with a GenAI model can be likened to reading an Elder Scroll, where the user taps into a collective wisdom that spans cultures, eras, and domains of knowledge.
- Cultural Impressions: Just as the Elder Scrolls embed the knowledge and visions of the past, present, and future, a GenAI model embeds a broad range of cultural impressions from its training data. These impressions reflect the thoughts, ideas, and creativity of countless individuals, making the AI a vessel for a vast and diverse cultural heritage.
- Impact on the User: Engaging with either an Elder Scroll or a GenAI model requires navigating the depth and breadth of embedded knowledge. Both can offer profound insights, but they also carry the potential for misinterpretation or overwhelming the user with the sheer volume and complexity of the content they contain.
- Depth and Complexity: The Elder Scrolls, like GenAI, are not just passive repositories but active, living sources of knowledge that require careful interpretation. This mirrors how GenAI synthesizes and generates new content based on its embedded cultural impressions.
- Responsibility and Respect: Both the Elder Scrolls and GenAI require a level of responsibility and respect from those who interact with them. The knowledge they contain can be powerful, but it must be handled with care to avoid misuse or misunderstanding.
- Human-Centric Knowledge: The Elder Scrolls, much like a GenAI model, are products of human culture and intellect, even if they exist in a mystical or digital realm. They embody the collective wisdom and creativity of many, making them potent symbols of cultural and intellectual heritage.
In summary, The Elder Scrolls serve as an excellent metaphor for a GenAI data product, encapsulating the idea of a vast, culturally rich repository of knowledge that is both powerful and complex, requiring thoughtful engagement to unlock its full potential.
We're straying from the central idea here. Let's return to embedded cultural impressions. For carbon emissions, is there a standardized unit of emissions?
Yes, for carbon emissions, the standardized unit is typically metric tons of carbon dioxide equivalent (CO₂e). This unit allows for the aggregation of various greenhouse gases (such as methane, nitrous oxide, and others) into a single standardized measure, based on their global warming potential relative to carbon dioxide.
CO₂e is widely used in carbon accounting to quantify the total emissions associated with activities, products, or services. It provides a common metric to compare the impact of different gases and allows for a more straightforward calculation of a carbon footprint.
This standardization is crucial for setting targets, monitoring progress, and implementing strategies for reducing emissions across industries and sectors.
What could be the standardized metric for units of cultural impression in GenAI? The token?
The token could indeed serve as a standardized metric for units of cultural impression in GenAI systems. Here’s why it makes sense:
- Granularity: Tokens are the basic units of text that a GenAI model processes. They can represent words, parts of words, or even individual characters, depending on the tokenization process. Because tokens are the building blocks of the model's input and output, they directly reflect the amount of textual content being processed.
- Measurement: Just as CO₂e allows for the aggregation of different greenhouse gases into a single metric, tokens could be aggregated to measure the extent of cultural impressions embedded in or generated by a GenAI system. The number of tokens processed or generated can be used to quantify the influence of cultural content within the model.
- Versatility: Tokens are language-agnostic and can be used across different types of text, making them a versatile unit for measuring cultural impressions in a global context. This is similar to how CO₂e is a universal metric for various greenhouse gases.
- Input Tokens: When a GenAI model is trained, each token from the training data represents a small unit of cultural impression, contributing to the model's understanding and ability to generate text. The sum of these input tokens across a dataset could be seen as the cumulative cultural impression embedded in the model.
- Output Tokens: During interaction with a GenAI system, the tokens generated in response to user input reflect the cultural knowledge and impressions synthesized by the model. The number of output tokens could be used to quantify the cultural footprint of a specific interaction or piece of generated content.
- Cultural Footprint Accounting: Just as organizations track carbon emissions, one could track the "cultural footprint" of a GenAI system by measuring the number of tokens processed and generated. This could help in understanding the scale of cultural content embedded within a model and its influence.
- Diversity and Balance: Analyzing the distribution of tokens across different languages, topics, or cultural contexts could provide insights into the diversity and balance of cultural impressions in the model, helping to ensure that no single culture or perspective is disproportionately represented.
In summary, the token is a practical and meaningful unit of measurement for cultural impressions in GenAI. It provides a standardized way to quantify the cultural content embedded within and generated by AI models, much like CO₂e serves as a standardized metric for carbon emissions.
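As a back-of-envelope sketch of what token-based accounting looks like: the ~4 characters per token ratio below is a rough rule of thumb for English, not any particular model's tokenizer; exact counts would require the model's actual tokenizer (e.g. the Llama tokenizer).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb
    for English text. Real tokenizers vary by model and language; this
    heuristic is only for back-of-envelope cultural-footprint accounting."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("You know the concept of embedded emissions or embodied carbon?"))
```

A real accounting system would swap this heuristic for the deployed model's tokenizer, but the unit of measure stays the same.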
Let's quantify the impressions for Llama 3.1 (https://ai.meta.com/blog/meta-llama-3-1/).
The Llama 3.1 model from Meta processed over 15 trillion tokens during its training. If we consider tokens as a unit of "cultural impressions," this figure provides a rough estimate of the vast amount of human knowledge and cultural input embedded in the model.
To put it in perspective, each token represents a fragment of text, such as a word or part of a word, drawn from a wide array of sources during training. The 15 trillion tokens embedded in Llama 3.1 encompass an extensive range of cultural impressions, reflecting diverse human contributions across languages, disciplines, and contexts.
This enormous dataset not only fuels the model's capability but also highlights the breadth of cultural impressions it encapsulates, making it a powerful tool for generating and interpreting human language across different cultures and domains.
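To make that scale concrete, here is a hedged back-of-envelope conversion. Both constants are assumptions: ~0.75 English words per token is a common rule of thumb, and ~90,000 words is an assumed typical novel length.

```python
# Back-of-envelope scale of 15 trillion pretraining tokens.
pretraining_tokens = 15e12
words_per_token = 0.75   # rough rule of thumb for English; varies by tokenizer
words_per_novel = 90_000  # assumed length of a typical novel

words = pretraining_tokens * words_per_token  # ~11.25 trillion words
novels = words / words_per_novel              # ~125 million novel-equivalents
print(f"~{words:.3g} words, ~{novels:.3g} novel-equivalents")
```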
What do we know about the instruction tuning tokens?
Llama 3.1, particularly in its instruction-tuned variant, processed a significant amount of data to enhance its ability to follow instructions and generate helpful, aligned responses. Specifically, the instruction tuning involved fine-tuning the model on datasets that included over 10 million human-annotated examples. This fine-tuning process is designed to align the model more closely with human preferences, improving its performance in generating coherent and contextually appropriate outputs across various dialogue-based applications.
This vast amount of annotated data, combined with the 15 trillion tokens used during pretraining, reflects the extensive "cultural impressions" embedded in the model. The instruction tuning tokens, therefore, represent a layer of carefully curated and human-guided content that further refines the model's outputs, making it more adept at understanding and responding to user inputs in a way that aligns with human expectations and ethical standards.
This process highlights the depth of human input involved in creating models like Llama 3.1, ensuring that they are not just vast repositories of information but also finely tuned tools for communication and problem-solving.
We need to turn "over 10 million human-annotated examples" into a concrete token count. Let's reason through a plausible estimate.
To estimate the number of tokens associated with the instruction tuning data for Llama 3.1, we can reason through the typical characteristics of such datasets.
- Understand the Structure of Human-Annotated Examples:
  - Human-annotated examples usually involve pairs of inputs and outputs or multiple-choice questions, dialogue pairs, etc.
  - Each example could be anything from a short sentence (a few tokens) to a longer paragraph or conversation (potentially hundreds of tokens).
- Average Token Count Per Example:
  - For dialogue-based models, like Llama 3.1's instruct-tuned variant, an average example might include a user prompt and a corresponding model response.
  - A conservative estimate for such examples could be 30-50 tokens per example (including both the input prompt and the model's output). This assumes shorter, instruction-focused interactions typical of dialogue and response datasets.
- Total Token Estimate:
  - Given "over 10 million human-annotated examples":
    - Lower Bound: If each example averages 30 tokens, this would result in approximately 300 million tokens.
    - Upper Bound: If each example averages 50 tokens, this would result in approximately 500 million tokens.
  - Therefore, a reasonable estimate for the instruction-tuning token count would likely range between 300 million and 500 million tokens.
Based on this reasoning, the instruction tuning process for Llama 3.1 likely involved an additional 300 million to 500 million tokens of curated, human-annotated data. This contributes to the model's cultural impressions, further refining its ability to generate contextually appropriate and ethically sound responses.
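The arithmetic behind that range, written out (the 30-50 tokens-per-example figures are the assumptions from the reasoning above, not published numbers):

```python
# 10 million human-annotated examples at an assumed 30-50 tokens each.
examples = 10_000_000
tokens_low = examples * 30   # 300,000,000 tokens
tokens_high = examples * 50  # 500,000,000 tokens
print(f"{tokens_low:,} to {tokens_high:,} instruction-tuning tokens")
```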
Using this conversation as a model, let's quantify the tokens for system prompt, user profile text / context, and initial query.
To quantify the tokens for each element in this conversation, we can break down the text into manageable parts and estimate the token count. Let’s go through each element:
- System Prompt:
  - Content: The system prompt typically includes guiding instructions for the AI model, ensuring it aligns with the intended interaction style, behavior, and ethics.
  - Estimated Token Count: System prompts are generally concise but comprehensive. They might be around 100-200 words, which translates to approximately 150-300 tokens.
- User Profile / Context:
  - Content: This includes the user profile and any specific context or memory that the model references during the interaction. The user profile text might include details about preferences, professional background, and conversational style.
  - Estimated Token Count: Depending on the length and detail of the profile, this could range from 50-150 words, or about 75-225 tokens.
- Initial Query:
  - Content: The user's first question or prompt that initiates the conversation. In this case, the query was about "embedded emissions" and how it relates to "embedded humanity."
  - Estimated Token Count: The initial query was relatively brief, likely around 15-30 words, which would be about 20-45 tokens.
Adding these up:
- System Prompt: 150-300 tokens
- User Profile / Context: 75-225 tokens
- Initial Query: 20-45 tokens
- Total: 245 to 570 tokens
This range gives us a plausible estimate for the token count involved in setting up and initiating this conversation. Each token represents a small unit of cultural impression, contributing to the AI's understanding and response generation.
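Summing those ranges is mechanical; a minimal sketch using the estimates above:

```python
# Per-element token ranges for the conversation setup, as estimated above.
setup = {
    "system prompt": (150, 300),
    "user profile / context": (75, 225),
    "initial query": (20, 45),
}

low = sum(lo for lo, _ in setup.values())
high = sum(hi for _, hi in setup.values())
print(f"{low} to {high} tokens")  # 245 to 570 tokens
```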
Stack all of these up in a concise table (including the pretraining and instruct training contributions).
Here’s a concise table summarizing the estimated token contributions for different elements related to Llama 3.1, including pretraining, instruction tuning, and this specific conversation setup:
| Element | Estimated Tokens | Description |
|---|---|---|
| Pretraining | ~15 trillion | Tokens processed during the initial training phase, drawing from a vast corpus of text data. |
| Instruction Tuning | ~300 million to 500 million | Tokens used in the fine-tuning process with human-annotated examples to improve responses. |
| System Prompt | ~150 to 300 | The guiding instructions provided to align the model's behavior during interactions. |
| User Profile / Context | ~75 to 225 | User-specific information that informs the model's responses based on preferences and history. |
| Initial Query | ~20 to 45 | The user’s opening question or statement that initiates the conversation. |
This table provides a clear overview of the different sources of tokens, illustrating the scale of cultural impressions embedded in a GenAI model like Llama 3.1.
Looking at this conversation so far, estimate the tokens of impression I've made as the sum across all of my replies to your generated messages. To do this, repeat the first few words of each of my messages (truncating) followed by an estimated token count for the whole message. Then estimate the sum.
Let’s go through each of your messages with an estimated token count:
- "You know the concept..."
- Estimated Tokens: ~10
- "I'm trying to adapt this idea..."
- Estimated Tokens: ~15
- "Take this very conversation..."
- Estimated Tokens: ~10
- "What are some other products..."
- Estimated Tokens: ~15
- "Ah, the connection to recycling..."
- Estimated Tokens: ~15
- "Ah, maybe we want our data products..."
- Estimated Tokens: ~20
- "In some narrative universes..."
- Estimated Tokens: ~10
- "ChatGPT, I think your toes..."
- Estimated Tokens: ~5
- "Which of all these ideas..."
- Estimated Tokens: ~10
- "We're straying from the central idea..."
- Estimated Tokens: ~15
- "The social cost of dehumanization, yikes..."
- Estimated Tokens: ~5
- "What could be the standardized metric..."
- Estimated Tokens: ~10
- "Let's quantify the impressions for Llama 3.1..."
- Estimated Tokens: ~10
- "What do we know about the instruction..."
- Estimated Tokens: ~10
- "We need to turn 'over 10 million human-annotated examples'..."
- Estimated Tokens: ~15
- "Using this conversation as a model..."
- Estimated Tokens: ~10
Total: ~185 tokens
This estimate sums up the tokens from your contributions throughout the conversation, which reflects your cultural impressions as part of this exchange.
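Per-message estimates like these can be mechanized with a rough heuristic. A sketch, assuming the common rule of thumb of about 1.33 tokens per English word; this is not an exact tokenizer count, and a real accounting would use the model's own tokenizer (e.g. tiktoken):

```python
# Rough token estimate from word count: ~4 tokens per 3 English words.
# This is a heuristic, not an exact tokenizer count.
def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * 4 / 3)

# Hypothetical stand-ins for the user's messages above.
messages = [
    "You know the concept of embedded emissions or embodied carbon?",
    "ChatGPT, I think your toes are showing.",
]
total = sum(estimate_tokens(m) for m in messages)
```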
Use a similar process to estimate your non-human contribution to the conversation (chatgpt's replies to my messages). We can quantify this contribution in tokens, but it is not representing an original cultural impression because we are presuming that your tokens are just conveyances for the past human contributions already accounted for above.
Let’s estimate the token count for my (ChatGPT's) responses to your messages using a similar process:
- "Yes, I’m familiar with the concepts..."
- Estimated Tokens: ~50
- "That’s a compelling adaptation..."
- Estimated Tokens: ~85
- "Analyzing the sources of 'embedded humanity'..."
- Estimated Tokens: ~150
- "These elements further illustrate..."
- Estimated Tokens: ~120
- "The concept of 'embedded humanity'..."
- Estimated Tokens: ~80
- "You’re right. If instruct tuning is based..."
- Estimated Tokens: ~50
- "The concept of 'embedded humanity'..."
- Estimated Tokens: ~100
- "The concept of 'embedded humanity'..."
- Estimated Tokens: ~120
- "The Elder Scrolls from The Elder Scrolls ..."
- Estimated Tokens: ~85
- "You’re right; my examples leaned heavily..."
- Estimated Tokens: ~60
- "The Elder Scrolls might be the closest match..."
- Estimated Tokens: ~85
- "Yes, for carbon emissions..."
- Estimated Tokens: ~30
- "The token could indeed serve as a standardized..."
- Estimated Tokens: ~80
- "Llama 3.1, particularly in its instruction-tuned variant..."
- Estimated Tokens: ~95
- "To estimate the number of tokens associated..."
- Estimated Tokens: ~130
- "Here’s a concise table summarizing the estimated..."
- Estimated Tokens: ~40
Total: ~1,360 tokens
This estimate represents the total token count for all my replies in this conversation. These tokens serve as a medium to convey the cumulative human cultural impressions embedded in the AI model, rather than representing original cultural contributions.
Suppose you are introducing the concept of embodied cultural impressions to someone who shares my techno-cultural background but has not heard of the idea before. How would you get them to understand the concept and apply this kind of accounting? Er, don't tell me how you would approach the concept introduction. Instead, just write the concept introduction here.
Introduction to Embodied Cultural Impressions in Generative AI
Imagine you're interacting with a sophisticated AI system like ChatGPT. On the surface, it seems like you're just engaging with a machine that generates responses based on your input. However, beneath this interaction lies a profound concept: embodied cultural impressions.
Generative AI models like ChatGPT are built on massive datasets containing vast amounts of text from a wide range of sources—books, articles, websites, conversations, and more. Each piece of text in these datasets represents a fragment of human thought, creativity, and culture. When an AI model is trained on this data, it absorbs and integrates these fragments, creating a system that can generate responses that reflect the collective knowledge, values, and expressions of humanity.
These fragments of human contribution can be thought of as cultural impressions—marks left by individuals across time and space. When you interact with an AI, the responses it generates are not purely mechanical; they are shaped by these embedded cultural impressions. The AI's output is, in essence, an amalgamation of countless human voices, perspectives, and experiences that have been processed and reconstituted into new forms of expression.
To better understand and manage these cultural impressions, we can use a method similar to how we account for carbon emissions. Just as carbon emissions are measured in metric tons of CO₂e, cultural impressions in AI can be quantified by counting tokens—the basic units of text that an AI processes and generates.
For example, consider the following breakdown:
- Pretraining: The model is trained on trillions of tokens, each representing a small piece of cultural content.
- Instruction Tuning: The model is further refined using millions of human-annotated examples, adding layers of culturally rich, context-specific knowledge.
- Interactive Contributions: When you interact with the AI, your queries and the AI's responses contribute additional tokens to the conversation, further shaping the cultural impressions involved.
By understanding the concept of embodied cultural impressions, we can appreciate the human labor and creativity embedded in AI systems. This awareness encourages responsible and ethical use of AI, ensuring that these systems honor the diversity and richness of the cultural content they embody. Additionally, it prompts us to consider how we might manage these impressions—by curating the data, ensuring representation, and avoiding the re-transmission of irrelevant or misattributed details.
In summary, embodied cultural impressions highlight the deep connection between AI outputs and the vast, collective human input that informs them. This concept not only helps us better understand AI behavior but also underscores the importance of ethical stewardship in AI development and use.
In a lengthy conversation like this one (starting from the system prompt, my profile text, etc) how do we account for the tokens of human cultural impression? What gets counted and what does not? (And why?)
In a lengthy conversation like this one, accounting for the tokens of human cultural impression involves distinguishing between the types of contributions and understanding their origins. Here’s how we can approach it:
- User Inputs (Queries and Comments):
- Why: These are direct cultural impressions provided by the user. Each query, comment, or piece of information you contribute is a fresh cultural impression, representing your thoughts, ideas, and perspective. These are counted because they are original, human-generated inputs.
- Examples: Your initial query, follow-up questions, and clarifications.
- System Prompt:
- Why: The system prompt is a form of cultural impression because it was crafted by AI developers, who embedded their knowledge, ethics, and expectations into the AI’s behavior. It represents the collective input of those who designed and tuned the model for specific interactions.
- Examples: Instructions that guide the AI on how to respond in a conversation.
- User Profile and Context (Model Set Context):
- Why: These are user-generated cultural impressions that reflect your preferences, background, and previous interactions. They are counted because they influence how the AI tailors its responses to your specific context, making each conversation more personalized and relevant.
- Examples: The information you’ve provided about your communication style, preferences, and previous topics of interest.
- Instruction Tuning Data:
- Why: The instruction tuning data is a significant source of cultural impressions, derived from human-annotated examples. These examples were carefully selected and refined to ensure the AI aligns with human preferences for helpfulness, safety, and accuracy. Each token from this process carries the weight of human judgment and expertise.
- Examples: Tokens from datasets specifically designed to improve the model's ability to follow instructions and respond in a human-aligned manner.
- Model's Output (AI's Replies):
- Why: The AI’s replies, while containing tokens, do not represent new cultural impressions but rather a recombination and reprocessing of the impressions already embedded in the model. These outputs are essentially conveyances of the existing human contributions encoded during pretraining and instruction tuning.
- Examples: All responses generated by the AI in this conversation.
- Repeated or Reprocessed Content:
- Why: If the AI simply repeats or paraphrases existing content without adding new context or interpretation, these tokens are not counted as new cultural impressions. They are considered derivative of the original input.
- Examples: AI-generated summaries or restatements of information already provided by the user or within the model’s training data.
- Originality and Human Contribution: Tokens are counted as cultural impressions when they represent a direct, original contribution from a human. This includes new inputs, decisions, or preferences that add to the conversation's cultural context.
- Reprocessing of Existing Data: Tokens generated by the AI are not counted as they do not represent new human contributions. Instead, they reflect the processing and synthesis of pre-existing cultural impressions.
This approach ensures that we focus on the meaningful human input that shapes the conversation, while acknowledging that the AI’s role is to facilitate and reflect these impressions, not to create new ones.
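This counting rule can be sketched as a small ledger. The roles and token counts below are hypothetical placeholders, not measurements from this conversation:

```python
# Sum only human-origin turns as fresh cultural impressions;
# assistant turns are conveyances of prior impressions and are
# tracked separately, not counted as new.
HUMAN_ROLES = {"system", "profile", "user"}

conversation = [
    {"role": "system",    "tokens": 200},  # developer-crafted prompt
    {"role": "profile",   "tokens": 120},  # user-provided context
    {"role": "user",      "tokens": 30},
    {"role": "assistant", "tokens": 85},   # conveyance, not a new impression
    {"role": "user",      "tokens": 15},
]

impressions = sum(t["tokens"] for t in conversation if t["role"] in HUMAN_ROLES)
conveyance = sum(t["tokens"] for t in conversation if t["role"] == "assistant")
print(impressions, conveyance)  # 365 85
```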
It seems like you mostly get it. As I look back, the token counting approach feels uncomfortably reductive. I know that some tokens are more important than others. However, maybe this is something we just need to accept in order to make the accounting simple enough to be easily applicable. Help me understand why some tons of carbon emissions are similarly more or less harmful than others despite the uniform treatment they get in carbon accounting.
You're correct that token counting can feel reductive, as it simplifies complex human contributions into mere numbers. This reductive approach, however, allows us to create a manageable framework for quantifying and comparing cultural impressions, much like how carbon accounting works.
In carbon accounting, while emissions are measured uniformly in metric tons of CO₂e, not all emissions have the same impact. The uniform treatment helps simplify calculations and comparisons, but there are nuances:
- Type of Greenhouse Gas:
- Global Warming Potential (GWP): Different gases have different GWPs. For example, methane (CH₄) has a GWP roughly 25 times that of CO₂ over a 100-year period, meaning that one ton of methane emissions is much worse than one ton of CO₂ in terms of climate impact. Despite this, they are converted to CO₂e for simplicity.
- Uniform Metric: CO₂e allows all emissions to be compared on a single scale, even though some emissions are more harmful in the short term (e.g., methane) or have different effects on climate systems.
- Source of Emissions:
- Fossil Fuels vs. Biogenic Sources: Emissions from burning fossil fuels add new carbon to the atmosphere, while emissions from biogenic sources (like burning wood) are part of the natural carbon cycle. Fossil fuel emissions are typically considered worse because they contribute additional carbon that was previously stored underground.
- Timing and Location: Emissions released in certain contexts (e.g., during peak pollution periods or in vulnerable ecosystems) can have more severe environmental and health impacts. However, carbon accounting treats these emissions uniformly to maintain simplicity and comparability.
- Potential for Mitigation:
- Short-lived Climate Pollutants: Some emissions, like methane, are short-lived but potent. Reducing these emissions can have a rapid, positive effect on climate stabilization. Despite their higher impact, they are treated the same in CO₂e accounting because it simplifies policy-making and tracking.
- Carbon Offset Projects: The effectiveness of carbon offset projects can vary widely. Some projects (e.g., reforestation) might sequester carbon over long periods, while others might only offer short-term benefits. Yet, these are uniformly counted in carbon accounting systems to facilitate broad application and adoption.
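The CO₂e conversion underlying all of this is a weighted sum: each gas's tonnage is multiplied by its global warming potential. A minimal sketch; the methane GWP of 25 matches the figure cited above, and the N₂O value of 298 is an added illustrative figure of the same (AR4) vintage:

```python
# Convert a mix of greenhouse-gas emissions (in tonnes) to tonnes CO2e
# by weighting each gas by its 100-year global warming potential.
GWP_100 = {"co2": 1, "ch4": 25, "n2o": 298}

def to_co2e(emissions_tonnes: dict) -> float:
    return sum(t * GWP_100[gas] for gas, t in emissions_tonnes.items())

print(to_co2e({"co2": 100, "ch4": 2}))  # 150
```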
While it is true that not all tokens (or tons of carbon) are created equal, the simplification into a uniform metric like tokens or CO₂e has practical benefits:
- Ease of Comparison: A uniform metric allows for straightforward comparisons across different sources, contexts, and impacts. This is crucial for broad applications like regulatory compliance, policy-making, and public awareness.
- Scalability: Simplified metrics are easier to apply at scale, whether you're measuring the cultural footprint of AI across billions of interactions or tracking emissions across industries.
- Focus on Aggregate Impact: By focusing on the total number of tokens or tons, we can still drive meaningful action, even if some nuances are lost. The aggregate impact is what ultimately drives climate change or shapes the cultural outputs of AI.
In both cases, the simplicity of the metric allows us to make sense of complex systems and drive actions that might otherwise be bogged down by the details. It's a trade-off, but one that can be justified by the need for practical, scalable solutions.