TL;DR GPT-5 is presented as OpenAI's latest and most capable "AI system," excelling particularly in STEM subjects like coding, math, science, and research. The review showcases its impressive ability to generate complex, interactive HTML applications from single prompts, including simulations (beehive, fluid dynamics, ray tracing), games (3D racing), and practical tools (CRM dashboard, Photoshop clone, video editor, meditation guide). It also demonstrates strong research and information synthesis capabilities, especially in health-related queries, and a significantly reduced hallucination rate compared to previous models. While it shows minor flaws in some generated UIs and image consistency, GPT-5 consistently ranks #1 across various independent benchmarks for coding, creative writing, and overall performance, offering competitive pricing. The video concludes by highlighting the accelerating pace of AI development, with new state-of-the-art models emerging every few weeks.
Information Mind Map
- Introduction: Full unscripted review covering capabilities, limitations, specs, performance, and comparisons.
- Primary Focus: Excels across STEM subjects (coding, math, science, research).
- General Tasks: Capable of simple tasks like writing essays or replying to emails, but other models perform these well too.
- Sponsor: Thanks to HubSpot.
- Beehive Construction Simulation:
  - Prompt: "Make a visual simulation of a beehive construction... Include sliders for colony size and resource availability. Put everything in a standalone HTML file."
  - Outcome: Successfully generated interactive simulation with expanding hive, pollen gathering, honey storage, and functional sliders (colony size, resource availability).
  - Features: Pause/Reset buttons worked.
  - Impression: Very impressive, zero-shot generation.
- 3D Racing Game:
  - Prompt: "Make a 3D racing game through neon lit cyberpunk tracks with speed boosts and collision physics. Put everything in a standalone HTML file."
  - Outcome: Functional 3D racing game with a neon-lit cyberpunk track.
  - Features: Yellow blocks slow the car down, pink blocks give a speed boost.
  - Response Style: Short and concise, a "no BS model" similar to Grok 4.
  - Impression: Less error-prone for simple app coding than other top models.
- Fluid Dynamics Visualizer:
  - Prompt: "Create an animation of fluid dynamics, include interactivity and sliders with multiple color dyes. Put everything in an HTML file."
  - Outcome: Interactive visualizer with multiple color dyes (red, blue, yellow, green).
  - Features: Sliders for diffusion, viscosity, time step, dissipation, vorticity, dye amount. Randomize colors and clear functions. Resolution selection.
  - Impression: Simulates fluid dynamics and color mixing effectively, zero-shot.
- Real-time Ray Tracing Simulation:
  - Prompt: "Develop a real-time ray tracing simulation featuring a metallic sphere suspended above a street scene. Use any 3D street view environment and allow adjustable parameters such as reflectivity, roughness, and other material properties of the sphere."
  - Outcome: Generated a metallic sphere that reflects its environment.
  - Self-Correction: Detected and fixed its own error (via the fix bug button).
  - Features: Adjustable roughness, metalness, exposure, sphere height. The reflectivity slider did not visibly work. Clear coat and clear coat rough had subtle, unclear effects.
  - Impression: Good at ray tracing and creating physically correct objects.
- CRM Dashboard:
  - Prompt: "Create a beautiful CRM dashboard that offers real-time insights into sales, customer engagement, and marketing campaigns. Include interactive graphs and charts, etc., etc. Put everything in one HTML file."
  - Outcome: Fully interactive dashboard with total sales, conversion rate, new leads, pie chart, revenue trend (with 7-day moving average), campaign performance, engagement heat map.
  - Flaws: Sales funnel did not look perfect. Values were made up.
  - Features: Time range selection, live toggle, widget selection for display, draggable widgets.
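
The revenue trend chart above overlays a 7-day moving average. As a rough illustration of that calculation (written in Python rather than the HTML/JavaScript the model generated), here is a minimal sketch. The function name and the revenue figures are invented for the example, much like the dashboard's own placeholder values.

```python
def moving_average(values, window=7):
    """Trailing moving average.

    Positions that do not yet have `window` data points yield None,
    which a chart would simply leave blank.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
        else:
            averages.append(None)
    return averages


# Hypothetical daily revenue figures, for illustration only.
daily_revenue = [100, 120, 90, 110, 130, 150, 140, 160, 170, 155]
trend = moving_average(daily_revenue, window=7)
```

The first six entries of `trend` are empty because a full week of data is not yet available; from day seven onward each point averages the most recent seven days.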
- Photoshop Clone:
  - Prompt: "Create a clone of Photoshop with all the basic tools."
  - Outcome: Functional Photoshop-like application.
  - Features: Brush, eraser, layers (add, deselect, move), fill buckets, shapes (line, rectangle, ellipse), text tool, crop, select (drawing within selection), pan, zoom, move.
  - Image Editing: Brightness, saturation filters, fit to screen, background color (BG) change, opacity, blend modes (multiply, screen, overlay).
  - Impression: Extremely powerful for creating basic apps, none of the other top models could do this zero-shot.
- Video Effects Editor:
  - Prompt: "Create a page where I can upload a video and apply different advanced effects to it in real time. Show the uploaded video and the final video side by side."
  - Outcome: Video editor with real-time effects.
  - Effects: None, grayscale, sepia, invert, brightness and contrast, saturation, hue, sharpen, gaussian blur, edge detect, RGB split, pixelate, posterize.
  - Flaws: Vignette effect was incorrect (applied to center instead of edges).
  - Impression: All settings except vignette worked, coded zero-shot.
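
For context on the vignette flaw: a vignette normally leaves the center of the frame untouched and darkens pixels as they get closer to the edges. The sketch below shows that expected behavior on a tiny grayscale frame. It is an illustration in Python, not the code GPT-5 generated, and the function names, the quadratic falloff, and the `strength` parameter are assumptions made for the example.

```python
import math


def vignette_factor(x, y, width, height, strength=0.5):
    """Brightness multiplier: 1.0 at the center, 1 - strength at the corners."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    max_dist = math.hypot(cx, cy)
    if max_dist == 0:
        return 1.0  # a 1x1 frame has no edges to darken
    # Normalized distance: 0 at the center, 1 at the corners.
    dist = math.hypot(x - cx, y - cy) / max_dist
    return 1.0 - strength * dist ** 2


def apply_vignette(gray, strength=0.5):
    """Apply the vignette to a 2D list of grayscale values (0..255)."""
    height, width = len(gray), len(gray[0])
    return [
        [
            round(gray[y][x] * vignette_factor(x, y, width, height, strength))
            for x in range(width)
        ]
        for y in range(height)
    ]


# A uniform 5x5 test frame, so any brightness change comes from the vignette.
frame = [[200] * 5 for _ in range(5)]
result = apply_vignette(frame)
```

An inverted effect like the one seen in the review would correspond to the darkening term growing toward the center rather than toward the corners.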
- Mindfulness Meditation Guide:
  - Prompt: "Make a single interactive page for a mindfulness meditation guide. Generating calm fractal patterns that evolve with breathing exercises with sounds. Include timers and progress trackers."
  - Outcome: Interactive meditation guide.
  - Features: Adjustable length, goal breaths, theme (forest, dusk), fractal style (branches, fern) animating with breathing phases (exhale, inhale, hold). Different breathing patterns (4-7-8, calm), customizable cycles.
  - Audio: Background noise options (swell, binaural) for guided breathing. Binaural sounds were 3D.
  - Flaws: White-on-white text issue.
  - Impression: No need for paid meditation apps, zero-shot generation.
- Photo Location Guessing:
  - Task: Identify event name and location from a concert photo with minimal clues (no main stage, stripped metadata).
  - Outcome: Correctly identified "Symphony at Sunset event" at "Sunset Beach Park."
  - Impression: Scary accuracy, previous OpenAI models (o3, o4) were also good at this.
- Taxonomy Tree of Big Cat Species:
  - Prompt: "Create a taxonomy tree of big cat species, displaying their classification from family to genus to species with hover over species descriptions."
  - Outcome: Functional taxonomy tree (family Felidae, subfamilies, genus expansion).
  - Features: Hover-over descriptions appeared.
  - Flaws: Pop-up didn't appear next to the hovered item.
  - Impression: Information was accurate.
- Interactive High School Physics Course:
  - Prompt: "Create an interactive course on high school physics with visualizations and animations. Include just the first three lessons for now."
  - Lessons:
    - Motion and Kinematics: Interactive animation with adjustable initial velocity, acceleration, and displayed metrics.
    - Forces and Newton's Laws: Object movement with adjustable static friction and metrics.
    - Pendulum Dynamics: Visualizes potential and kinetic energies fluctuating, with adjustable settings and simulation speed.
  - Flaws: Labels for metrics jumbled together in lesson 2.
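
The pendulum lesson shows potential and kinetic energy trading off while the pendulum swings. A minimal Python sketch of that idea is below. It is not the generated lesson code, and the parameter values, the function name, and the choice of a semi-implicit Euler integrator are assumptions made for illustration.

```python
import math


def simulate_pendulum(theta0=0.5, length=1.0, mass=1.0, g=9.81,
                      dt=0.001, steps=2000):
    """Return a list of (kinetic, potential) energy samples for an ideal pendulum.

    theta0 is the release angle in radians; the pendulum starts at rest.
    """
    theta, omega = theta0, 0.0
    samples = []
    for _ in range(steps):
        # Semi-implicit Euler: update angular velocity first, then the angle.
        omega -= (g / length) * math.sin(theta) * dt
        theta += omega * dt
        kinetic = 0.5 * mass * (length * omega) ** 2
        potential = mass * g * length * (1 - math.cos(theta))
        samples.append((kinetic, potential))
    return samples


samples = simulate_pendulum()
# Energy at release: all potential, no kinetic.
initial_energy = 1.0 * 9.81 * 1.0 * (1 - math.cos(0.5))
```

Each sample's kinetic and potential parts change continuously, but their sum stays close to `initial_energy`, which is the fluctuation the lesson visualizes.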
- Business Intelligence Report (E-commerce Asia):
  - Prompt: "Create a comprehensive business Intel report on e-commerce growth in Asia from 2020 to 2025." (Web search enabled).
  - Outcome: Report with market size and growth, regional/country insights, growth drivers, market segmentation, challenges, summary table, key insights.
  - Style: Very short and concise, dense information with citations.
  - Comparison: Tends to produce shorter answers than GLM 4.5 or Kimi K2. Better for compact information.
- Medical Research Report (Alexander Disease):
  - Prompt: "The patient has Alexander disease... Research everything about the subject and suggest next steps or possible ideas for cures. Compile everything into a report with charts and graphs."
  - Outcome: Report with definition, current clinical management, research and experimental therapies, summary table, suggested next steps.
  - Style: Super short and dense, appropriate citations.
  - Comparison: Shorter than GLM 4.5 or Kimi K2.
- Sports Medicine Report (ACL Injury Rehab):
  - Prompt: "25-year-old athlete with ACL injury, research, rehab protocols, and return to sport timelines. Suggest preventative training in a sports medicine report with recovery phase graphs." (Web search enabled).
  - Outcome: Report broken into phases with estimated timeline, return to sport timing and risks, preventative training recommendations, rehab and prevention plan.
  - Style: Again, super short.
- Storybook Generation:
  - Prompt: "Generate a five-page story book about a frog who wants to be rich. Generate images for each page."
  - Initial Attempt: Only gave one image, then another, not a full story.
  - Agent Mode: Enabled agent mode to autonomously generate the storybook.
  - Process: Generated images step-by-step, then ran Python code to convert to PDF.
  - Outcome: PDF storybook with five pages.
  - Flaws: Pages could be formatted more nicely, no cover page. Frog character not consistent across images.
  - Image Model: Uses the 4o image model, no new image generator for GPT-5.
- Stable Diffusion 5 (Non-existent):
  - Prompt: "Give me all the details about stable diffusion 5."
  - Outcome: Correctly stated no official release or announcement for SD5, confirmed SD 3.5 as latest.
  - Impression: Pass, did not hallucinate.
- Availability: Should be out on ChatGPT for everyone, including free users.
- Free Plan: Limited number of uses per day, then falls back to a less intelligent model.
- Paid Plan: Explicitly select GPT-5.
- Observation: Free plan responses seemed not as good as paid plan responses, which suggests a smaller variant (mini or nano) for free users. All video tests used the paid plan.
- Definition: OpenAI calls GPT-5 an AI system, not just a large language model.
  - Smart router: Combines several internal models and automatically decides which model to use based on the prompt.
  - Continuous training: Router improves over time based on user feedback (model switching, preference rates, correctness).
  - Black box: Proprietary and closed source.
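
Since the router is proprietary, its actual logic is unknown. Purely to make the idea of "pick a model based on the prompt" concrete, here is a toy Python sketch. Everything in it is hypothetical: the model names, the keyword hints, and the length threshold are invented for illustration and say nothing about how OpenAI's router really decides.

```python
from dataclasses import dataclass

# Hypothetical cues that a prompt needs deeper reasoning.
# The real routing signals are not public.
REASONING_HINTS = ("prove", "step by step", "debug", "derive", "think hard")


@dataclass
class Route:
    model: str   # hypothetical model label, not a real API model name
    reason: str  # why this route was chosen


def route_prompt(prompt: str, long_prompt_chars: int = 400) -> Route:
    """Toy router: send hard-looking or long prompts to a slower 'thinking' model."""
    text = prompt.lower()
    for hint in REASONING_HINTS:
        if hint in text:
            return Route("thinking-model", f"matched hint: {hint!r}")
    if len(prompt) > long_prompt_chars:
        return Route("thinking-model", "long prompt")
    return Route("fast-model", "default")


easy = route_prompt("Reply to this email politely")
hard = route_prompt("Prove that sqrt(2) is irrational")
```

In this toy version `easy` is routed to the fast model and `hard` to the thinking model. A production router would presumably learn such decisions from feedback signals like the ones listed above rather than from fixed keywords.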
- Key Improvements: Significantly reduced hallucinations, particularly good in writing, coding, and health.
- OpenAI Reported Benchmarks (Internal Comparison):
  - AIME (Competitive Math): GPT-5 Pro (with thinking, Python usage): 100%.
    - Comparison: Grok 4 Heavy also achieved 100%, suggesting this benchmark is easy to beat.
  - Frontier Math: GPT-5 with thinking performs better than ChatGPT agent. (Cherry-picking, o3 high not shown).
  - GPQA Diamond (Graduate Science): GPT-5 on average beats o3 by a small margin.
  - Humanity's Last Exam (Obscure Science): GPT-5 (no thinking, no tools): 6.3% (pretty bad). GPT-5 (with thinking): Higher. GPT-5 Pro (Python, search): 42%.
    - Comparison: Grok 4 Heavy (Python, internet): 44% (even better, so GPT-5 is not state-of-the-art here).
  - SWE-bench Verified (Software Engineering): GPT-5 (with thinking): 74.9% (currently the best score among AI models).
    - Comparison: Claude Opus 4.1 (latest version): 74.5%.
    - Impression: Slightly better in coding, consistent with testing experience.
  - Minimal Improvements: Generally minimal improvements (e.g., ~2% better than 4o without thinking) over previous OpenAI models for many benchmarks.
  - HealthBench Hard (Challenging Health Questions): GPT-5 (non-thinking): 25% (vs. 4o at 0%). GPT-5 (with thinking): 46% (way better than o3).
  - Hallucination Rate (Health Questions): GPT-5 (with thinking): 1.6% (lowest).
    - Concern: o3 and 4o rates concerningly high (e.g., 15%).
- Independent Leaderboards (External Comparison):
  - LM Arena (Blind Test): Scores number one across all categories.
    - Categories: hard prompts, coding, math, creative writing, instruction following, longer query, multi-turn.
    - Impression: Pretty impressive.
  - LiveBench by Abacus AI: Ranked number one, slightly beating o3 Pro high.
  - Artificial Analysis: Ranked number one (high version), one point above Grok 4.
  - Creative Writing Benchmark: Ranked number one, slightly ahead of Kimi K2 and Claude Opus 4.1. Best quality for stories/novels.
  - Confabulations (Hallucination Rate):
    - Lower value is better (less hallucination).
    - GPT-5: Ranked number one (lowest hallucination rate).
    - Comparison: Better than GLM 4.5, Qwen 3, Gemini 2.5 Pro.
    - Verification: Confirms OpenAI's claim that GPT-5 is significantly less likely to hallucinate.
- Pricing:
  - GPT-5 high: $3.4 per 1 million tokens.
  - Comparison: Same as Gemini 2.5 Pro, way cheaper than Grok 4.
  - Conclusion: Pretty damn good in terms of both intelligence and cost effectiveness.
- Overall Impression: Really good at coding, less error-prone than other top models.
- Progress Pattern:
  - Gemini 2.5 Pro was the world's most powerful model a few months ago.
  - Grok 4 then became the world's best a few weeks later.
  - Now OpenAI's GPT-5 is the world's best model, ranking #1 across leaderboards.
- Rate of Progress: Absolutely insane, new models in weeks instead of months.
- Excitement: Exciting times to be alive.
- Actionable Items:
  - Subscribe to free weekly newsletter for AI updates.
  - Like, share, subscribe for more content.