Measure usage of Unstructured in playground v2 #264

Draft: wants to merge 8 commits into base: main

Conversation

Rindrics
Contributor

@Rindrics Rindrics commented Dec 20, 2024

Summary

Count API calls to Unstructured, along with the number of PDF pages processed.

Related Issue

This PR closes #259.

Changes

  • added measurement of the Unstructured API call in playground v2
    • removed instrumentation-related code from action.ts to clarify the business logic
  • updated the corresponding code in the prev/beta-proto agent to resolve a build error
    • that code is not refactored, because it won't be used

Testing

The metrics below arrived at the o11y backend ✅

  • number of pages (numPages)
  • request count
  • type of API (strategy)

image

Additional Information

The total cost (unit price × number of pages) will be calculated on the analytics platform, not in this app.
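
To make the cost formula concrete, here is a minimal sketch of the calculation the analytics platform would perform on the exported metrics. The `UsageRecord` shape, the `totalCost` helper, and the prices in the usage note are hypothetical and only illustrate `unit price * number of pages`; they are not taken from this PR.

```typescript
// Hypothetical shape of one exported usage record (not the app's real type).
interface UsageRecord {
  strategy: string; // type of API used for the request
  numPages: number; // number of PDF pages processed by the request
}

// Sum unit price * number of pages over all records.
// `unitPrices` maps a strategy name to its per-page price; the values are
// supplied by the caller because real prices are not part of this PR.
function totalCost(
  records: UsageRecord[],
  unitPrices: Record<string, number>,
): number {
  let total = 0;
  for (const record of records) {
    const price = unitPrices[record.strategy];
    if (price === undefined) {
      throw new Error(`no unit price for strategy "${record.strategy}"`);
    }
    total += price * record.numPages;
  }
  return total;
}
```

For example, with made-up integer prices `{ fast: 1, hi_res: 10 }`, one `fast` request of 10 pages plus one `hi_res` request of 3 pages costs 1 * 10 + 10 * 3 = 40 units.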

vercel bot commented Dec 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| giselle | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Dec 20, 2024 6:00am |

Error encountered:
> ⨯ Error: Setting up fake worker failed: "Cannot find module '/SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/pdf.worker.mjs' imported from /SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/node_modules_pdfjs-dist_build_pdf_mjs_c60773._.js".
    at /SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/node_modules_pdfjs-dist_build_pdf_mjs_c60773._.js:12706:42
digest: "3292027865"
@Rindrics Rindrics changed the title Measure usage of Unstructured at playground v2 Measure usage of Unstructured in playground v2 Dec 20, 2024
Member

@shige shige left a comment

@Rindrics I asked a question and shared my thoughts in the comments. ✍️

@@ -80,6 +80,7 @@
"next-auth": "^5.0.0-beta.20",
"next-themes": "0.3.0",
"openai": "4.64.0",
"pdf-lib": "1.17.1",
Member

IMO:

The file size is large, and I’m hesitant to introduce it solely for the purpose of counting the number of pages in the PDF.

https://www.npmjs.com/package/pdf-lib

Contributor Author

Thanks, I'll consider another approach that simply obtains the number of pages of the given PDF.

type ServiceOptions = UnstructuredOptions | VercelBlobOperationType | undefined;

function getNumPages(pdf: PDFDocument) {
return pdf.getPages().length;
Member

@Rindrics

Isn't information such as the number of pages in the PDF, or a reference for cost calculation, already included in the response from the Unstructured API, so that pdf-lib is not needed?

Contributor Author

(I should have written this as a PR author comment.) Right, PartitionResponse does not provide it.

I first tried to obtain the PDF page count from the response, but found that each element in the elements field is a text block, not a page.
So the length of elements is the number of text elements, not the number of pages.
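
As a rough illustration of the difference, the sketch below contrasts `elements.length` with a count of distinct pages. It assumes each element carries an optional `metadata.page_number`; the `PartitionElement` type and the `countPagesFromElements` helper are hypothetical and not code from this PR, and whether the field is present in the actual response would need to be verified.

```typescript
// Hypothetical, simplified element shape; the real response type may differ.
interface PartitionElement {
  type: string;
  text: string;
  metadata?: { page_number?: number };
}

// Count distinct page numbers seen across elements.
// Assumption: elements expose metadata.page_number. Pages that yield no
// elements (for example blank pages) are not counted, so this can undercount.
function countPagesFromElements(elements: PartitionElement[]): number {
  const pages = new Set<number>();
  for (const element of elements) {
    const page = element.metadata?.page_number;
    if (typeof page === "number") {
      pages.add(page);
    }
  }
  return pages.size;
}
```

With three text elements on pages 1, 1, and 2, `elements.length` is 3 while the helper returns 2, which is why the element count cannot stand in for the page count.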

@Rindrics Rindrics marked this pull request as draft December 23, 2024 02:54
Successfully merging this pull request may close these issues.

Measure usage of Unstructured in playground v2
2 participants