Measure usage of Unstructured in playground v2 #264
The error I got:
> ⨯ Error: Setting up fake worker failed: "Cannot find module '/SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/pdf.worker.mjs' imported from /SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/node_modules_pdfjs-dist_build_pdf_mjs_c60773._.js". at /SOME_PATH/giselles-ai/giselle/.next/server/chunks/ssr/node_modules_pdfjs-dist_build_pdf_mjs_c60773._.js:12706:42 digest: "3292027865"
@Rindrics I asked a question and shared my thoughts in the comments. ✍️
@@ -80,6 +80,7 @@
"next-auth": "^5.0.0-beta.20",
"next-themes": "0.3.0",
"openai": "4.64.0",
"pdf-lib": "1.17.1",
IMO:
The file size is large, and I’m hesitant to introduce it solely for the purpose of counting the number of pages in the PDF.
Thanks, I'll consider a simpler approach to obtaining the number of pages in the given PDF.
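As an illustration of what a lighter approach could look like, here is a minimal sketch that counts page objects by scanning the raw PDF bytes, with no dependency. This is only one possibility and not necessarily what the PR ends up using. The function name `countPdfPagesHeuristic` is hypothetical. The sketch assumes the page objects are stored uncompressed; PDFs that keep them inside compressed object streams will be undercounted, so treat the result as a heuristic.

```typescript
// Heuristic page count: count "/Type /Page" entries in the raw PDF bytes.
// Assumption: page dictionaries are not hidden inside compressed object streams.
function countPdfPagesHeuristic(bytes: Uint8Array): number {
  // Map each byte to one character so binary content cannot break decoding.
  let text = "";
  bytes.forEach((b) => {
    text += String.fromCharCode(b);
  });
  // Match "/Type /Page" but not "/Type /Pages" (the page tree node).
  const matches = text.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}
```

The trade-off is accuracy for size: a parser such as pdf-lib handles every PDF variant, while this scan adds no bundle weight but can miss pages in some files.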
type ServiceOptions = UnstructuredOptions | VercelBlobOperationType | undefined;

function getNumPages(pdf: PDFDocument) {
  return pdf.getPages().length;
}
Isn't information like the number of pages in a PDF or cost calculation references included in the response from the Unstructured API, without using pdf-lib?
(I should have written this as a PR author comment.) Right, PartitionResponse does not provide it.
I first tried to obtain the PDF page count from the response, but found that each element in the elements field is a text block, not a page.
The length of elements therefore gives the number of text elements, not the number of pages.
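To make the distinction concrete, here is a small sketch with made-up data. The `PartitionElement` interface is a hypothetical, simplified shape, and the `metadata.page_number` field is an assumption about what the response carries; the helper `countDistinctPages` is illustrative only. Even where a page number is available per element, pages that yield no text elements would not appear, so this can only give a lower bound.

```typescript
// Hypothetical, simplified shape of one element in a partition response.
interface PartitionElement {
  type: string;
  text: string;
  metadata: { page_number?: number }; // assumed field, may be absent
}

// Count the distinct page numbers seen across all elements.
function countDistinctPages(elements: PartitionElement[]): number {
  const pages = new Set<number>();
  for (const el of elements) {
    const page = el.metadata.page_number;
    if (page !== undefined) {
      pages.add(page);
    }
  }
  return pages.size;
}

// Made-up sample: three text blocks spread over two pages.
const sampleElements: PartitionElement[] = [
  { type: "Title", text: "Report", metadata: { page_number: 1 } },
  { type: "NarrativeText", text: "Introduction", metadata: { page_number: 1 } },
  { type: "NarrativeText", text: "Details", metadata: { page_number: 2 } },
];
```

Here `sampleElements.length` is 3 while the document has 2 pages, which is the mismatch described above.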
Summary
Count API calls for Unstructured together with the number of PDF pages processed

Related Issue
This PR closes #259

Changes
prev/beta-proto
agent to resolve build error

Testing
The metrics below arrived at the o11y backend ✅
numPages
strategy

Additional Information
The total cost, unit price * number of pages, will be calculated on the analytics platform (not in this app).
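
For reference, the formula above amounts to a single multiplication. The sketch below only illustrates that arithmetic; the calculation itself is meant to live on the analytics platform. The unit prices are placeholders, not real pricing, and the idea that the price varies by `strategy` is an assumption made for the example. Prices are kept as integer micro-USD to avoid floating-point rounding.

```typescript
// Placeholder per-page unit prices in micro-USD (NOT real pricing).
// Assumption: the unit price depends on the partition strategy.
type Strategy = "fast" | "hi_res";

const UNIT_PRICE_MICRO_USD: Record<Strategy, number> = {
  fast: 1000,
  hi_res: 10000,
};

// total cost = unit price * number of pages
function estimateCostMicroUsd(numPages: number, strategy: Strategy): number {
  if (!Number.isInteger(numPages) || numPages < 0) {
    throw new RangeError("numPages must be a non-negative integer");
  }
  return UNIT_PRICE_MICRO_USD[strategy] * numPages;
}
```

With these placeholder prices, `estimateCostMicroUsd(120, "hi_res")` evaluates to 1200000 micro-USD.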