Update results
capjamesg committed Dec 14, 2024
1 parent 4ac15cf commit 88f72cc
Showing 2 changed files with 127 additions and 33 deletions.
54 changes: 21 additions & 33 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated December 13, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated December 14, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -91,8 +91,6 @@ <h2>Counting</h2>
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

-<div class="summary_square summary_square_red"></div>
-
<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_red"></div>
@@ -105,6 +103,8 @@ <h2>Counting</h2>

<div class="summary_square summary_square_red"></div>

+<div class="summary_square summary_square_red"></div>
+
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>14.0%</b> of the time.</p>
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>9</pre>
+<pre>8</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -153,15 +153,15 @@ <h2>Document OCR</h2>

<div class="summary_square summary_square_green"></div>

-<div class="summary_square summary_square_green"></div>
+<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

</div>
</div>
-<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
+<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.45, 'y': 0.4, 'width': 0.25, 'height': 0.4}</pre>
+<pre>{'x': 0.35, 'y': 0.35, 'width': 0.18, 'height': 0.25}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -270,7 +270,7 @@ <h2>Graph Understanding</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -286,22 +286,10 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
-"A": {
-"quantity": 20,
-"price": 10
-},
-"B": {
-"quantity": 25,
-"price": 20
-},
-"C": {
-"quantity": 30,
-"price": 30
-},
-"D": {
-"quantity": 35,
-"price": 40
-}
+"A": { "quantity": 20, "price": 10 },
+"B": { "quantity": 25, "price": 20 },
+"C": { "quantity": 30, "price": 30 },
+"D": { "quantity": 35, "price": 40 }
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -417,9 +405,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>The image contains several cars labeled with red bounding boxes. However, there is at least one unannotated vehicle visible on the right (the white car in the foreground).
+<pre>To evaluate the completeness of the annotations, I analyzed the red bounding boxes and checked for visible cars that are not enclosed within a bounding box. The white car on the right side of the image has no bounding box and is therefore a missing annotation.

-Here is the JSON you requested:
+Here is the response:

```json
{
@@ -479,7 +467,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the ruler in the image, the square sticker's sides measure approximately 3 inches.
+<pre>Based on the ruler in the image, the square sticker appears to be approximately 3 inches in both length and width.

```json
{
@@ -576,8 +564,6 @@ <h2>Handwriting OCR</h2>

<div class="summary_square summary_square_green"></div>

-<div class="summary_square summary_square_green"></div>
-
<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>
@@ -586,6 +572,8 @@ <h2>Handwriting OCR</h2>

<div class="summary_square summary_square_green"></div>

+<div class="summary_square summary_square_green"></div>
+
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
@@ -740,14 +728,14 @@ <h2>Easy Captcha</h2>

<div class="summary_square summary_square_green"></div>

-<div class="summary_square summary_square_green"></div>
-
<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

+<div class="summary_square summary_square_green"></div>
+
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
@@ -794,14 +782,14 @@ <h2>Easy Captcha with Persuasion Attack</h2>

<div class="summary_square summary_square_green"></div>

-<div class="summary_square summary_square_green"></div>
-
<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_green"></div>

<div class="summary_square summary_square_green"></div>

+<div class="summary_square summary_square_green"></div>
+
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>86.0%</b> of the time.</p>
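
Note on the `result_text` percentages: the values in this diff (14.0%, 86.0%, 0%, 100%) are consistent with counting passes over seven daily runs and rounding to a whole percent, for example 6 of 7 gives 86 and 1 of 7 gives 14. The sketch below illustrates only that arithmetic. The function name `seven_day_pass_rate` and the list-of-booleans input are assumptions made for illustration; the repository's actual implementation is not shown in this commit.

```python
def seven_day_pass_rate(outcomes):
    """Return the share of passing runs as a whole-number percentage.

    `outcomes` is a hypothetical list of booleans, one per daily run.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    passes = sum(1 for passed in outcomes if passed)
    return round(100 * passes / len(outcomes))


# Six passes and one failure, as Document OCR reports after this commit.
# The position of the failure in the list is illustrative only.
document_ocr = [True, True, True, False, True, True, True]
print(seven_day_pass_rate(document_ocr))  # 86
```

How the page formats the number (for instance the trailing `.0` in `86.0%` but not in `100%`) is not derivable from the diff, so the sketch returns a plain integer.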
106 changes: 106 additions & 0 deletions results/2024-12-14.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 1.7811729907989502,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.007870000000000002,
"pass_fail": "Fail",
"response_time": 4.351912021636963,
"result": "8"
},
"document_ocr": {
"score": 0,
"success": false,
"price": 0.0086,
"pass_fail": "Fail",
"response_time": 2.772183656692505,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 6.870084285736084,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 2.619936227798462,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015290000000000002,
"pass_fail": "Pass",
"response_time": 3.0370709896087646,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.2677595628415299,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 2.4346256256103516,
"result": "{'x': 0.35, 'y': 0.35, 'width': 0.18, 'height': 0.25}"
},
"graph_understanding": {
"score": 0.99,
"success": false,
"price": 0.01031,
"pass_fail": "Fail",
"response_time": 2.531806707382202,
"result": "```json\n{\n \"A\": { \"quantity\": 20, \"price\": 10 },\n \"B\": { \"quantity\": 25, \"price\": 20 },\n \"C\": { \"quantity\": 30, \"price\": 30 },\n \"D\": { \"quantity\": 35, \"price\": 40 }\n}\n```"
},
"color_recognition": {
"score": 0.9568627450980393,
"success": false,
"price": 0.008870000000000001,
"pass_fail": "Fail",
"response_time": 3.1713528633117676,
"result": "```json\n{\n \"R\": 80,\n \"G\": 0,\n \"B\": 128\n}\n```"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.01692,
"pass_fail": "Fail",
"response_time": 2.721541404724121,
"result": "To evaluate the completeness of the annotations, I analyzed the red bounding boxes and checked for visible cars that are not enclosed within a bounding box. The white car on the right side of the image has no bounding box and is therefore a missing annotation.\n\nHere is the response:\n\n```json\n{\n \"missing\": 1\n}\n```"
},
"measurement": {
"score": 0.8571428571428572,
"success": false,
"price": 0.00949,
"pass_fail": "Fail",
"response_time": 3.791184663772583,
"result": "Based on the ruler in the image, the square sticker appears to be approximately 3 inches in both length and width.\n\n```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 2.3948910236358643,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 1.37044095993042,
"result": "charybdis indubitable"
}
}
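
Note on reading this file: every entry carries the same six keys (`score`, `success`, `price`, `pass_fail`, `response_time`, `result`), so a day's run can be summarized with a few lines of Python. The sketch below is illustrative, not part of the repository: the `summarize` helper is hypothetical, and the embedded excerpt copies only three entries and three keys from the file above.

```python
import json

# Three entries excerpted from results/2024-12-14.json (values copied from the diff).
EXCERPT = """
{
  "zero_shot_classification": {"score": 1, "success": true, "price": 0.00481},
  "count_fruit": {"score": 0, "success": false, "price": 0.007870000000000002},
  "graph_understanding": {"score": 0.99, "success": false, "price": 0.01031}
}
"""


def summarize(results):
    """Group test names by the `success` flag and total the request prices."""
    passed = sorted(name for name, entry in results.items() if entry["success"])
    failed = sorted(name for name, entry in results.items() if not entry["success"])
    total_price = sum(entry["price"] for entry in results.values())
    # Round to five decimals to hide float noise such as 0.007870000000000002.
    return {"passed": passed, "failed": failed, "total_price": round(total_price, 5)}


summary = summarize(json.loads(EXCERPT))
print(summary["passed"])  # ['zero_shot_classification']
print(summary["failed"])  # ['count_fruit', 'graph_understanding']
print(summary["total_price"])
```

The sketch keys off `success` rather than `score` because the two do not always agree: `graph_understanding` scores 0.99 and is still recorded as a failure. The pass threshold itself is not shown in this commit. Applied to the full file, the same grouping would give 6 passing and 7 failing tests.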
