What to Build
Develop an interactive tool or agent that leverages NYC datasets and Gen AI to address civic issues, increase transparency, and surface inequities.
Live Agents
Real-time Interaction (Audio/Vision)
Build an agent that users can talk to naturally and that can be interrupted mid-response. This could be a real-time translator, a vision-enabled customized tutor that "sees" your homework, or a customer support voice agent that handles interruptions gracefully.
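One common way to make an agent interruptible is to stream its output in small chunks and stop the moment a barge-in signal fires. A minimal asyncio sketch of that pattern (the `speak`/`demo` names and the event-based signal are illustrative, not part of any SDK):

```python
import asyncio

async def speak(chunks, interrupted: asyncio.Event):
    """Stream response chunks; stop as soon as the user barges in."""
    spoken = []
    for chunk in chunks:
        if interrupted.is_set():
            break                   # abandon the rest of the utterance
        spoken.append(chunk)
        await asyncio.sleep(0)      # yield control (stand-in for audio playback)
    return spoken

async def demo():
    interrupted = asyncio.Event()

    async def user_barge_in():
        await asyncio.sleep(0)      # let the agent get a chunk out first
        interrupted.set()           # user starts talking

    spoken, _ = await asyncio.gather(
        speak(["one ", "two ", "three "], interrupted),
        user_barge_in(),
    )
    return spoken
```

The same shape applies when the chunks come from a streaming model response: check the interruption flag between chunks and cut playback immediately rather than finishing the turn.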
Creative Storyteller
Multimodal Storytelling with Interleaved Output
Build an agent that thinks and creates like a creative director, seamlessly weaving text, images, audio, and video into a single, fluid output stream. Leverage Gemini's native interleaved output to generate rich mixed-media responses that combine narration with visuals, explanations with generated imagery, or storyboards with voiceover, all in one cohesive flow. Examples include interactive storybooks (text + generated illustrations inline), a marketing asset generator (copy + visuals + video in one go), educational explainers (narration woven with diagrams), and a social content creator (caption + image + hashtags together).
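Interleaved output can be modeled as one ordered stream of typed parts rather than separate text and media calls. A hypothetical sketch (the `Part` class and `render_story` helper are assumptions for illustration, loosely echoing the multi-part responses the Gemini API returns):

```python
from dataclasses import dataclass

@dataclass
class Part:
    kind: str   # "text", "image", or "audio"
    data: str   # text content, or a reference to a generated asset

def render_story(parts):
    """Flatten an interleaved part stream into a single display script."""
    lines = []
    for p in parts:
        if p.kind == "text":
            lines.append(p.data)
        else:
            lines.append(f"[{p.kind}: {p.data}]")  # placeholder slot for media
    return "\n".join(lines)
```

Keeping the parts in one ordered list is what preserves the "single fluid stream" feel: the narration and its illustration stay adjacent instead of arriving as disconnected payloads.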
Mandatory Tech: Projects must be hosted on Google Cloud and use the Google GenAI SDK or Agent Development Kit (ADK).
Bonus Points: Industry Use Cases
Bonus points are awarded to projects that demonstrate practical industry applications. Here are some possible use cases:
Sports
Example: Real-time game analytics agent, personalized fan engagement platform, athlete performance tracker using multimodal inputs.
Finance
Example: AI financial advisor, automated document parsing for loan approvals, fraud detection assistants, market trend analysis.
Retail
Example: Voice-powered personal shoppers, smart inventory management agents, dynamic personalized marketing asset generators.
Healthcare
Example: Medical transcription assistants, patient symptom triage agents, multimodal diet, workout, and health planners.
Engineering
Example: Code review and debugging agents, architectural diagram generation from text, interactive technical documentation tutors.
Judging Criteria
Innovation & Multimodal User Experience (40%)
- The "Beyond Text" Factor: Does the project break the "text box" paradigm? Is the interaction natural, immersive, and superior to a standard chat interface? Does the agent "See, Hear, and Speak" in a way that feels seamless?
- Fluidity: Is the experience "Live" and context-aware, or does it feel disjointed and turn-based?
Technical Implementation & Agent Architecture (30%)
- Google Cloud Native: Does the code effectively utilize the Google GenAI SDK or ADK?
- System Design: Is the agent logic sound? Does it handle errors, API timeouts, or edge cases gracefully?
- Robustness: Does the agent avoid hallucinations? Is there evidence of grounding?
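Handling API timeouts gracefully usually means retrying with exponential backoff rather than failing the turn outright. A small illustrative helper (the `call_with_retry` name, the `TimeoutError` choice, and the delay constants are assumptions, not a requirement of the judging criteria):

```python
import random
import time

def call_with_retry(fn, max_attempts=4, base_delay=0.5):
    """Retry a flaky call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise                                   # out of attempts
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)                           # back off before retrying
```

Wrapping model or tool calls this way, and surfacing a clear fallback message when all attempts fail, is the kind of edge-case handling the System Design criterion rewards.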
Demo & Presentation (30%)
- The Story: Does the video clearly define the problem and the solution?
- The Proof: Is the architecture diagram clear? Is there visual proof of Cloud deployment in the video or submission materials?
- The "Working" Factor: Does the video show the actual software working (not just mockups)?
The Judging Flow
Teams are split into 2-3 groups of equal size. Each team has 5-8 minutes to explain and demo their work to the judges assigned to its group.
Judges select the top 6 teams (2-3 from each group) to move forward to the grand finale.
Finalists present to the entire room and all judges for the final call and winner announcement.
Strictly Prohibited
- AI Mental Health/Medical Advisors
- Basic RAG apps ("Chat with my PDF")
- Standard Education Chatbots
Setup Instructions
Provision Workshop Resources
- Open a new Incognito window.
- Paste and visit your event link: goo.gle/TBD
- Accept the Google Cloud Platform Terms of Service.
Submit Your Work
Submit your team details and repository link. You can submit multiple times; only the latest submission will be counted.
Recent Submissions
| Team Name | Project Name |
|---|---|