Commit

Update results
capjamesg committed Dec 9, 2024
1 parent 957d25f commit b7e4b7d
Showing 2 changed files with 174 additions and 66 deletions.
134 changes: 68 additions & 66 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated December 08, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated December 09, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -122,66 +122,12 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>8</pre>
+<pre>9</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>

-<div class="feature_card">
-<div class="feature_header">
-<div class="feature_header_text">
-<h2>Document OCR</h2>
-<p>Can GPT-4V read a document and return the exact characters in the text?</p>
-</div>
-<div class="chart">
-<div class="chart_box chart_box_red">
-<p>Fail</p>
-</div>
-</div>
-</div>
-<div class="result_summary">
-<div class="summary_row">
-<b class="summary_title">Last 7-Day Performance</b>
-<div class="summary_squares">
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-<div class="summary_square summary_square_green"></div>
-
-</div>
-</div>
-<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
-</div>
-<div class="explainer_dropdown">
-<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
-<div class="explainer">
-<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
-<pre class="test_method">We send a screenshot of a typed document to determine if it can correctly read the text. If it correctly gets the text, it gets a 100%. Otherwise, it gets a 0%.</pre>
-<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
-<pre class="prompt">
-Read the text in the image. Return only the text, with punctuation.
-</pre>
-<h3><span class="explainer_icon far fa-image"></span>Image</h3>
-<img class="test_image" src="images/swift.png" alt="Image of the input into GPT-4" />
-<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times.</pre>
-<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
-</div>
-</div>
-</div>


<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
@@ -230,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.5, 'y': 0.37, 'width': 0.2, 'height': 0.25}</pre>
+<pre>{'x': 0.44, 'y': 0.33, 'width': 0.28, 'height': 0.35}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -359,9 +305,9 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
-"R": 80,
+"R": 88,
"G": 0,
-"B": 130
+"B": 140
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -403,7 +349,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -417,9 +363,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the provided image, all visible cars appear to have red bounding boxes around them, indicating they are already annotated. There doesn't seem to be any missing annotations for cars.
+<pre>Based on the image, all visible cars have been annotated with red bounding boxes. There don’t seem to be any missing annotations.

-Here is the corresponding JSON response:
+Here’s the JSON response:

```json
{
@@ -465,7 +411,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -479,7 +425,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>```json
+<pre>The sticker appears to be a square approximately **3 inches by 3 inches** based on the ruler in the image. Here’s the JSON representation:
+
+```json
{
"length": 3.0,
"width": 3.0
@@ -552,7 +500,61 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>


+<div class="feature_card">
+<div class="feature_header">
+<div class="feature_header_text">
+<h2>Document OCR</h2>
+<p>Can GPT-4V read a document and return the exact characters in the text?</p>
+</div>
+<div class="chart">
+<div class="chart_box chart_box_green">
+<p>Pass</p>
+</div>
+</div>
+</div>
+<div class="result_summary">
+<div class="summary_row">
+<b class="summary_title">Last 7-Day Performance</b>
+<div class="summary_squares">
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+<div class="summary_square summary_square_green"></div>
+
+</div>
+</div>
+<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
+</div>
+<div class="explainer_dropdown">
+<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
+<div class="explainer">
+<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
+<pre class="test_method">We send a screenshot of a typed document to determine if it can correctly read the text. If it correctly gets the text, it gets a 100%. Otherwise, it gets a 0%.</pre>
+<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
+<pre class="prompt">
+Read the text in the image. Return only the text, with punctuation.
+</pre>
+<h3><span class="explainer_icon far fa-image"></span>Image</h3>
+<img class="test_image" src="images/swift.png" alt="Image of the input into GPT-4" />
+<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
+<pre>I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times.</pre>
+<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
+</div>
+</div>
+</div>

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
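The Document OCR card's Method scores all-or-nothing (100% when the model returns the document text correctly, 0% otherwise), and its 7-day line reports the share of recent daily runs that passed. The Python sketch below illustrates both under stated assumptions: "correct" is taken to mean an exact string match after trimming surrounding whitespace, the plain-text response from the 2024-12-09 run stands in for the reference transcription, and `score_ocr` and `seven_day_pass_rate` are hypothetical helpers, not gpt-checkup's actual code.

```python
# Hypothetical sketch of all-or-nothing OCR scoring; not gpt-checkup's actual code.

# Assumed reference text: the plain-text response from the 2024-12-09 run.
EXPECTED = (
    "I was thinking earlier today that I have gone through, to use the lingo, "
    "eras of listening to each of Swift's Eras. Meta indeed. I started listening "
    "to Ms. Swift's music after hearing the Midnights album. A few weeks after "
    "hearing the album for the first time, I found myself playing various songs "
    "on repeat. I listened to the album in order multiple times."
)


def score_ocr(response: str, expected: str = EXPECTED) -> int:
    """Return 1 (100%) on an exact match after trimming whitespace, else 0 (0%)."""
    return 1 if response.strip() == expected.strip() else 0


def seven_day_pass_rate(daily_scores: list[int]) -> int:
    """Percentage of the last seven daily scores that passed (assumes one score per day)."""
    last_seven = daily_scores[-7:]
    if not last_seven:
        return 0
    return round(100 * sum(1 for s in last_seven if s == 1) / len(last_seven))


old_response = EXPECTED.replace("Midnights", "*Midnights*")  # Markdown emphasis
new_response = EXPECTED                                       # plain text

print(score_ocr(old_response))       # 0
print(score_ocr(new_response))       # 1
print(seven_day_pass_rate([1] * 7))  # 100
```

Under the exact-match assumption, the earlier response that wrapped *Midnights* in Markdown asterisks scores 0 while the plain-text response scores 1, and seven passing days give the 100% shown on the card.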
106 changes: 106 additions & 0 deletions results/2024-12-09.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 3.5639283657073975,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.007870000000000002,
"pass_fail": "Fail",
"response_time": 2.1310863494873047,
"result": "9"
},
"document_ocr": {
"score": 1,
"success": true,
"price": 0.008539999999999999,
"pass_fail": "Pass",
"response_time": 2.5311460494995117,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 3.654644727706909,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 2.86663556098938,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015290000000000002,
"pass_fail": "Pass",
"response_time": 3.3656280040740967,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.47648514851485174,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 2.1066975593566895,
"result": "{'x': 0.44, 'y': 0.33, 'width': 0.28, 'height': 0.35}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01079,
"pass_fail": "Fail",
"response_time": 2.9440698623657227,
"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
},
"color_recognition": {
"score": 0.9620915032679739,
"success": false,
"price": 0.008870000000000001,
"pass_fail": "Fail",
"response_time": 3.39060378074646,
"result": "```json\n{\n \"R\": 88,\n \"G\": 0,\n \"B\": 140\n}\n```"
},
"annotation_qa": {
"score": 0.0,
"success": false,
"price": 0.01626,
"pass_fail": "Fail",
"response_time": 3.0744924545288086,
"result": "Based on the image, all visible cars have been annotated with red bounding boxes. There don\u2019t seem to be any missing annotations.\n\nHere\u2019s the JSON response:\n\n```json\n{\n \"missing\": 0\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.00967,
"pass_fail": "Fail",
"response_time": 3.5515594482421875,
"result": "The sticker appears to be a square approximately **3 inches by 3 inches** based on the ruler in the image. Here\u2019s the JSON representation:\n\n```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 1.660681962966919,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 1.6252644062042236,
"result": "charybdis indubitable"
}
}
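Each entry in `results/2024-12-09.json` records a `score`, a `success` flag, a `price`, a `pass_fail` label, a `response_time`, and the raw model `result`. A minimal Python sketch of turning such entries into the figures shown on the page, under two assumptions: the displayed cost is inferred to be `price` rounded to three decimals (this reproduces the $0.016, $0.01, and $0.009 shown on the cards changed in this commit, but the formatting code is not part of this diff), and responses that wrap JSON in a Markdown json code fence are parsed by stripping the fence first. `extract_json` and `format_cost` are hypothetical names, and the data is a three-entry subset of the file, trimmed to the fields used.

```python
import json
import re

# Hypothetical helpers; the names and the rounding rule are assumptions, not gpt-checkup code.
FENCE = "`" * 3  # a Markdown code fence, built at runtime so none appears literally here

# Subset of results/2024-12-09.json, trimmed to the fields used below.
results = {
    "document_ocr": {"price": 0.008539999999999999, "pass_fail": "Pass"},
    "annotation_qa": {
        "price": 0.01626,
        "pass_fail": "Fail",
        "result": "Based on the image, all visible cars have been annotated with red bounding boxes. "
        "There don\u2019t seem to be any missing annotations.\n\nHere\u2019s the JSON response:\n\n"
        + FENCE + 'json\n{\n "missing": 0\n}\n' + FENCE,
    },
    "measurement": {
        "price": 0.00967,
        "pass_fail": "Fail",
        "result": "The sticker appears to be a square approximately **3 inches by 3 inches** "
        "based on the ruler in the image. Here\u2019s the JSON representation:\n\n"
        + FENCE + 'json\n{\n "length": 3.0,\n "width": 3.0\n}\n' + FENCE,
    },
}

# Match an optional "json" tag after the opening fence, then capture up to the closing fence.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text: str):
    """Parse the first fenced JSON block in a model response, or the whole text if unfenced."""
    match = FENCED_JSON.search(text)
    return json.loads(match.group(1) if match else text)


def format_cost(price: float) -> str:
    """Round to three decimals, e.g. 0.01626 -> "$0.016" (inferred display rule)."""
    return f"${round(price, 3)}"


passed = [name for name, r in results.items() if r["pass_fail"] == "Pass"]
print(f"{len(passed)}/{len(results)} passed")  # 1/3 passed
for name, r in results.items():
    print(name, format_cost(r["price"]))
print(extract_json(results["measurement"]["result"]))  # {'length': 3.0, 'width': 3.0}
```

Stripping the fence before calling `json.loads` matters because several responses in this file (`graph_understanding`, `color_recognition`, `annotation_qa`, `measurement`) embed their JSON inside Markdown prose rather than returning bare JSON.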
