Sandeep,
Can you please clarify this for Kishan?
hcp
Subject: Re: FOR YOUR EXPERIMENTATION
I am excited about the opportunity to contribute to the success of this project.
When it comes to parsing, what kind of file do you wish to parse (e.g. plain text (.txt), CSV, PDF, DOCX, etc.)?
On Tue, 27 Dec 2022 at 10:59, Hemen Parekh <hcp@recruitguru.com> wrote:
Kishan,
It was nice talking to you a few minutes ago.
I am glad to know that the UI which you developed (for my Digital Avatar) is built in HTML and JavaScript.
That means:
If you succeed in integrating this UI with the “QnA (Question and Answer)” API of Personal.ai, then we can actually TEST how well it works by posting some questions and getting relevant answers.
In this regard, find below the email received from Suman Kanuganti (Founder, Personal.ai):
The API key is in the settings; follow the doc here:
https://docs.personal.ai/docs/memory-api
The Message API is in the same link, on the left. You can find the API key to use in Hemen's Personal.ai account, under settings. Here is the direct link for the Message API:
https://documenter.getpostman.com/view/13134732/TzscpSjZ#4acaf2f5-d5a7-46a6-a4c4-06e015cbc959
Next:
How to parse a 40 GB Word file to:
- Separate words
- Arrange these words in descending order of “frequency of occurrence” (I suppose the frequency number will appear within brackets, next to each word)
- Remove words representing verbs, adverbs, prepositions, common nouns, hyphenation, etc. (this may need to be done manually)
Whatever final list emerges could be considered “Keywords / Topics” (my areas of knowledge).
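The three steps above can be sketched as a single shell function. This is a minimal sketch, assuming the Word file has already been converted to plain text and that you supply a stop-word file with one lowercase word per line. The function name `keyword_list` and the file name `stopwords.txt` are hypothetical. It only approximates the third step: a stop-word list removes common function words such as prepositions, but filtering by part of speech (verbs, adverbs, common nouns) would need a part-of-speech tagger or the manual review the email mentions.

```shell
# Hypothetical helper: list the words of a plain-text file by descending
# frequency, printed as "word (count)", skipping words in a stop-word file.
# $1 = plain-text file, $2 = stop-word file (hypothetical, e.g. stopwords.txt,
# one lowercase word per line).
keyword_list() {
  text_file=$1
  stop_file=$2
  # Step 1: separate words; every run of non-letters becomes one line break
  # (this also splits hyphenated words at the hyphen).
  tr -cs '[:alpha:]' '\n' < "$text_file" |
    # Fold case so "Digital" and "digital" are counted as one word.
    tr '[:upper:]' '[:lower:]' |
    # Drop the empty line tr emits when the text starts with a non-letter.
    grep -v '^$' |
    # Step 3 (approximate): drop exact whole-line matches from the stop list.
    grep -v -x -F -f "$stop_file" |
    # Step 2: count each word, then sort by count, highest first.
    sort | uniq -c | sort -nr |
    # Print the count in brackets next to each word, as the email suggests.
    awk '{ printf "%s (%d)\n", $2, $1 }'
}
```

Usage: `keyword_list output.txt stopwords.txt > keywords.txt`. Words with equal counts may come out in any order, so a manual pass over the top of the list is still needed.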
Following is the answer that ChatGPT provided:
Yes, there are several tools that can parse a large file of Word documents and arrange the parsed words in descending order of frequency. One option is to use the command-line tools "sort" and "uniq", which are available on most Unix-like operating systems (including Linux and macOS). To use them, you can first convert the Word documents to plain text files (using a tool like "catdoc" or "antiword"), then split the text into one word per line (for example with "tr") and use "sort" and "uniq" to count the words and sort them by frequency.
For example, you can use the following command to convert a Word document to a plain text file:
catdoc input.doc > output.txt
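As far as I know, "catdoc" and "antiword" read only the older binary .doc format, which matters for Kishan's question about file types. A .docx file is a ZIP archive whose main text lives in `word/document.xml`. The sketch below is a rough substitute for catdoc on .docx files, assuming `unzip` and `sed` are installed; the function names `docx_to_text` and `xml_to_text` are hypothetical. It strips XML tags crudely rather than parsing the XML, so entities such as `&amp;` are left as-is and tabs or line breaks inside a paragraph are dropped.

```shell
# Hypothetical helper: turn WordprocessingML XML (read from stdin) into rough
# plain text. Each paragraph end </w:p> becomes a space so words from
# neighbouring paragraphs do not run together; all other tags are deleted.
# XML entities (e.g. &amp;) are NOT decoded.
xml_to_text() {
  sed -e 's#</w:p># #g' -e 's/<[^>]*>//g'
}

# Hypothetical helper: extract the main document part of a .docx (a ZIP
# archive) and convert it to plain text on stdout.
docx_to_text() {
  unzip -p "$1" word/document.xml | xml_to_text
}
```

Usage: `docx_to_text input.docx > output.txt`, after which the same word-counting step applies.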
Then, you can use the following pipeline to split the text into one word per line, count each word, and sort the words by frequency:
tr -cs '[:alpha:]' '\n' < output.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr
This will output a list of words and their frequencies, sorted in descending order.
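For a 40 GB corpus, the `sort | uniq -c` step sorts every single word occurrence. `sort` can handle that using temporary files, but it may be slow. The sketch below swaps in a different technique that is not part of ChatGPT's answer: it counts words in an awk associative array, so only the distinct words are sorted. It assumes the vocabulary (the set of distinct words) fits in memory, which is usually far smaller than the text itself. The function name `word_counts_awk` is hypothetical.

```shell
# Hypothetical helper: print "count word" lines for a plain-text file,
# highest count first, counting in an awk hash instead of sorting every word.
word_counts_awk() {
  # One lowercase word per line, as in the sort/uniq pipeline.
  tr -cs '[:alpha:]' '\n' < "$1" |
    tr '[:upper:]' '[:lower:]' |
    # NF skips empty lines; count[] holds one entry per distinct word.
    awk 'NF { count[$0]++ }
         END { for (w in count) print count[w], w }' |
    # Only the distinct words reach this sort.
    sort -nr
}
```

Usage: `word_counts_awk output.txt | head -n 200` shows the 200 most frequent words.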
Another option is to use a programming language like Python to parse the Word documents and count the frequencies of the words. For this, there is the "python-docx" library (imported in code as "docx"), which reads .docx files but not the older .doc format. You can use it to extract the text from the Word documents, split the text into individual words, and then use a dictionary to count the frequencies of the words. Finally, you can sort the dictionary by value to obtain a list of words and their frequencies in descending order.
I hope this helps! Let me know if you have any other questions.
Are the above-mentioned “process instructions” sufficient for you to help me carry out this “listing of keywords”?
Regards,
Hemen Uncle