Introduction
This article explains how the two “Playing with Generative AI” sample experiences in the Intuiface Examples catalog were built. To access these samples, use the "Examples" tab of the Experiences panel in Composer. You can also download them from their online Examples pages: Windows version / Non-Windows version.
The first version of the experience is optimized for running in Player on Windows, while the other version is optimized for running in Player on all its other supported platforms.
Generative AI is the use of artificial intelligence to create new text, images, code, and more in response to a prompt. Both experiences contain the same three examples of how to use generative AI within an Intuiface experience.
- The first example uses the OpenAI DALL-E large language model (LLM) to generate images
- The second uses the OpenAI GPT LLM to respond to a question
- The third uses the OpenAI GPT and Whisper LLMs to illustrate the power of natural language prompts for the creation of "smart" experiences.
About the experience
When either experience is run, it first verifies that two keys are present: an OpenAI API key and an Intuiface credential key. If both are present, the user can choose among three options: image generation, answering a question, and using a museum wayfinder.
Image Generation
For image generation, the user interactively builds a prompt by selecting items in three separate carousels. The selections are combined into a single prompt, and an image is generated from it.
Prompt
For answering a question, the user has two options:
- Select a pre-written question in a carousel, or
- Scan a QR code with a phone, type in any prompt on that phone, and submit it to the running experience.
Using a Wayfinder
For the museum wayfinder, we've provided a variation of an actively deployed experience found in the Museum of Flight. Verbally specify what it is you want to see in the museum, and the experience will tell you which part of the museum to visit.
To use it, tap and hold the microphone icon. Using your device's default microphone, speak about a topic you'd like to learn more about - e.g., "Where can I learn more about the Space Race?". Release your mouse button (or lift your finger), and you will be presented with an explanation of which gallery to visit, accompanied by a point-of-interest icon placed over the appropriate gallery on the map.
Playing the experience
Run it in Composer
To successfully run the experience in Composer, you must select the appropriate version of Play Mode:
- "Playing with Generative AI - Windows" version
In the Composer "Project" menu, select "Simulate Player on Windows" - "Playing with Generative AI - Player Next Gen" version
In the Composer "Project" menu, select "Simulate Player on all other Platforms (Web, Android, etc.)".
How it works
This experience uses three Generative AI models built by OpenAI: 1) DALL-E for image generation, 2) GPT for answering questions (the same model used by ChatGPT), and 3) Whisper for transcribing audio into text.
Each model is accessed through a corresponding Web API:
- DALL-E is accessed via the OpenAI Images API
- GPT is accessed via the Chat Completion API
- Whisper is accessed via the Audio Transcription API
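For reference, the sketch below shows these three endpoints and the bearer-token authorization header every request to them requires. This is a generic TypeScript illustration of the underlying REST calls, not code taken from the interface assets; the key value is a placeholder.

```typescript
// The three OpenAI REST endpoints used by this experience, plus the shared
// bearer-token header each request must carry. Replace the placeholder key
// with your own (see the next section).
const OPENAI_API_KEY = "sk-...your key here...";

const OPENAI_ENDPOINTS = {
  images: "https://api.openai.com/v1/images/generations",           // DALL-E
  chat: "https://api.openai.com/v1/chat/completions",               // GPT
  transcriptions: "https://api.openai.com/v1/audio/transcriptions"  // Whisper
};

const authHeaders = { Authorization: `Bearer ${OPENAI_API_KEY}` };
```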
Specifying OpenAI Key and Intuiface Credential Keys
- To use any OpenAI large language model, you must include an OpenAI API key with the request.
- To use Intuiface's web triggers feature, which makes it possible to submit questions from a mobile phone in the "Answer a Question" scene, you must include a web trigger credential key.
Both experiences require one API key and one credential key, and you must add both before running either experience. If either key is missing, the experience will display a warning message immediately after launch.
NOTE: You must enter a valid OpenAI API key, or none of the DALL-E/GPT/Whisper calls will work. However, a valid credential key is unnecessary if you don't want to test question submission via mobile phone. In that case, just enter a random string of letters and numbers to get past the key check that occurs when running the experience.
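The key check itself is built with Intuiface triggers and bindings inside the experience; the snippet below is only a hypothetical TypeScript equivalent of what it verifies.

```typescript
// Hypothetical equivalent of the launch-time check: both fields must be
// non-empty, but only the OpenAI key has to be genuinely valid for the
// DALL-E/GPT/Whisper calls to succeed.
function keysArePresent(openAiKey: string, credentialKey: string): boolean {
  return openAiKey.trim().length > 0 && credentialKey.trim().length > 0;
}

// A random string passes the check for the credential key if you don't plan
// to test question submission from a phone.
console.log(keysArePresent("sk-...your key here...", "a1b2c3"));  // true
```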
API Key
Every new OpenAI account can create unlimited keys, which can be collectively used to process a preset number of image requests, question prompts, and audio transcriptions at no cost. This limit is measured in terms of "tokens". Additional tokens can be purchased for use beyond the free limit.
For any OpenAI account, you will find your API key(s) on the API key page.
Using Composer, paste your OpenAI API key into the interface asset named "OpenAI API Key":
Credential Key
To facilitate communication between a mobile phone and the running Intuiface experience, the experience must contain one of your Intuiface account’s credential keys. This key uniquely identifies your experience among all the experiences running worldwide, ensuring personal devices can communicate with it directly.
To create a credential key, log into your Intuiface account and visit the Credential Keys page on My Intuiface. Click the “Create new credential key” button and create a key whose scope includes (but doesn’t have to be limited to) “Web Triggers”. Copy the resulting key.
Using Composer, paste your Credential Key into the interface asset named ‘WebTrigger Credential Key’:
DALL-E Interface Asset for Image Generation
The DALL-E interface asset was created using API Explorer. This interface asset (IA) is identical in both experience versions.
As with all interface assets created using API Explorer, this IA can be edited anytime by right-clicking it and selecting "Edit in API Explorer".
As currently configured, a 512x512 image is returned.
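Under the hood, the IA calls the OpenAI Images API. The TypeScript sketch below illustrates roughly what that request looks like; the function is illustrative and is not the IA's actual implementation.

```typescript
// Rough sketch of the Images API request wrapped by the DALL-E interface
// asset: one image at the 512x512 size mentioned above.
async function generateImage(prompt: string, apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ prompt, n: 1, size: "512x512" })
  });
  const result = await response.json();
  return result.data[0].url;  // URL of the generated image
}
```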
ChatGPT Interface Asset for Question Answering
The ChatGPT interface asset was hand-coded. There are two versions of this IA:
- .NET-based, created for the Windows version of this experience. The interface asset is named ChatGPT - Player on Windows.
- TypeScript-based, created for the non-Windows version of this experience. The interface asset is named ChatGPT - Player on all other platforms.
Both ChatGPT IAs were hand-coded because the Chat Completion Web API includes arrays, a feature not yet supported in API Explorer. Both the .NET and TypeScript versions of the IA have also been compiled, so their code cannot be viewed or edited. However, if you'd like to see the code, you will find it on GitHub.
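For reference, the sketch below shows roughly what the Chat Completion request wrapped by these IAs looks like in TypeScript. The "messages" array is the part API Explorer cannot yet model; the model name used here is an assumption, and the function is illustrative rather than the compiled IA's code.

```typescript
// Rough sketch of a Chat Completion request: a "messages" array is posted and
// the generated answer is read from the first choice in the response.
async function askGpt(prompt: string, apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",  // assumed model; the IA may target another
      messages: [{ role: "user", content: prompt }]
    })
  });
  const result = await response.json();
  return result.choices[0].message.content;
}
```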
Whisper Interface Asset for Audio Transcription
The Whisper interface asset was hand-coded. There are two versions of this IA:
- .NET-based, created for the Windows version of this experience. The interface asset is named Whisper - Player on Windows.
- TypeScript-based, created for the non-Windows version of this experience. The interface asset is named Whisper - Player on all other platforms.
Both Whisper IAs were hand-coded because the Audio Transcription Web API includes arrays, a feature not yet supported in API Explorer. Both the .NET and TypeScript versions of the IA have also been compiled, so their code cannot be viewed or edited. However, if you'd like to see the code, you will find it on GitHub.
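Likewise, the sketch below shows roughly what the Audio Transcription request looks like: a multipart/form-data POST containing the recorded audio and the model name. It is illustrative only, not the compiled IA's code; the file name and audio source are assumptions.

```typescript
// Rough sketch of an Audio Transcription request: the recorded audio and the
// "whisper-1" model name are sent as multipart form data, and the transcribed
// text is read from the response.
async function transcribeAudio(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.webm");  // audio captured from the microphone
  form.append("model", "whisper-1");

  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },  // fetch adds the multipart boundary
    body: form
  });
  const result = await response.json();
  return result.text;  // the transcribed prompt
}
```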
Prompt Construction
Like all Generative AI systems, DALL-E and GPT respond to a prompt - a complete sentence (or question) specifying the desired information. These sample experiences generate prompts using a variety of methods.
DALL-E Image Generation
For DALL-E image generation, the prompt is created by combining the style, flower, and weather selections with the help of a Text Concatenation Interface Asset. The "Create Image" button then triggers an action that calls the DALL-E Web API, passing the OpenAI API key and the prompt.
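In plain code, the concatenation step amounts to something like the sketch below. The sentence template and example values are illustrative; they are not the exact strings used in the experience.

```typescript
// Illustrative equivalent of the Text Concatenation interface asset: the three
// carousel selections are joined into a single DALL-E prompt.
function buildImagePrompt(style: string, flower: string, weather: string): string {
  return `A ${style} painting of a ${flower} on a ${weather} day`;
}

console.log(buildImagePrompt("watercolor", "sunflower", "rainy"));
// "A watercolor painting of a sunflower on a rainy day"
```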
GPT Question Response
For the GPT question response, the prompt is created using one of two methods:
- Through submission of a prewritten question found in the scene.
- Through the submission of a prompt written on a mobile device.
In both cases, the question is assigned to two Text Assets:
- "Prompt - Display only", whose value is displayed onscreen.
- "Prompt Received - DO NOT DELETE", whose value is the prompt with the words "In one sentence, " added to the front. This hidden enhancement to the prompt is used to minimize token consumption by GPT when it generates a response.
The value of "Prompt Received - DO NOT DELETE" is submitted to GPT when the "Send Question" button is pressed.
Museum Wayfinder
For the wayfinder, the prompt is the text returned by Whisper after it transcribes the spoken audio. (Whisper is configured to recognize and transcribe any language, but the wayfinder response will always be in English.)
As soon as the scene is entered, the "Send system guidance" action of the GPT IA is used:
The system guidance is the elaborate prompt used to "teach" GPT how to be a wayfinder for the Museum of Flight:
This spoken prompt is sent to GPT as soon as the Whisper IA receives the transcription:
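Put together, the wayfinder's exchange with GPT looks roughly like the sketch below. The guidance text shown is a short placeholder for the experience's much longer system prompt, and the model name is an assumption.

```typescript
// Rough sketch of the wayfinder flow: the system guidance is sent when the
// scene is entered, then each Whisper transcription is forwarded as a user
// message and the English answer is read back.
const systemGuidance =
  "You are a wayfinder for the Museum of Flight. Given a visitor's interest, " +
  "reply in English with the gallery they should visit and why.";  // placeholder guidance

async function askWayfinder(transcription: string, apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",  // assumed model
      messages: [
        { role: "system", content: systemGuidance },  // "Send system guidance" action
        { role: "user", content: transcription }      // text returned by Whisper
      ]
    })
  });
  const result = await response.json();
  return result.choices[0].message.content;
}
```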
Receiving questions from a mobile device for the Prompt scene
After scanning the QR code in the GPT scene, the webpage loaded on the personal mobile device can send any prompt to the running experience using web triggers. This webpage is itself a simple experience built using Composer and deployed to the web. Our article about web triggers discusses how communication between the mobile device and the main experience is accomplished.
In Composer, the "Web Triggers" Interface Asset constantly listens for a message from a mobile device. Its "Message is received" trigger is tripped whenever a prompt sent from a mobile device is detected. This prompt is displayed onscreen and then submitted to the GPT service with the OpenAI API key.