Hi Friends,

Even as I launch this today (my 80th birthday), I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now, as I approach my 90th birthday (27 June 2023), I invite you to visit my Digital Avatar (www.hemenparekh.ai) and continue chatting with me, even when I am no longer here physically.

Thursday 28 September 2023

PARSING ENTIRE BACK-UP OF MY BLOGGER SITE

Manoj,

My colleague Sandeep will send you this entire backup this morning.

For parsing (to find keywords), I wonder whether the following suggestion from ChatGPT will help. If it does, please send me the DESCENDING ORDER file, from which I will try to remove those words that have no relevance to my “Areas of Knowledge”.

Regards

Hemen

My question to ChatGPT:

Are you aware of any "text parsing" software that can parse a very large file of WORD documents and arrange the parsed words in descending order of frequency?

Answer given by ChatGPT:

Yes, there are several tools that can parse a large file of Word documents and arrange the parsed words in descending order of frequency. One option is to use standard command-line tools ("tr", "sort" and "uniq"), which are available on most Unix-like operating systems (including Linux and macOS). To use them, you first convert the Word documents to plain text files (using a tool like "catdoc" or "antiword", both of which read the older .doc format), then split the text into individual words, count each word with "uniq -c", and sort the counts with "sort -nr".

For example, you can use the following command to convert a Word document to a plain text file:

catdoc input.doc > output.txt
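Since the question concerns a very large collection of Word files, a batch version of this step may help. The sketch below is a minimal illustration, not a tested production script: it assumes all the files sit in one directory and that catdoc is installed for legacy .doc files, and it handles .docx files with unzip and sed, because catdoc and antiword read only the older binary format. The function names docx_to_text and corpus_from_dir are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions: catdoc is installed for .doc files, unzip for
# .docx files, and all Word files sit in a single directory.

# A .docx file is a ZIP archive whose body text lives in word/document.xml.
# Paragraph ends, tabs and line breaks become spaces; all other XML tags are
# dropped. XML entities such as &amp; are left as they are.
docx_to_text() {
    unzip -p "$1" word/document.xml |
        sed -e 's#</w:p># #g' -e 's#<w:tab/># #g' -e 's#<w:br/># #g' \
            -e 's#<[^>]*>##g'
}

# Print the plain text of every .doc and .docx file in directory $1,
# with a blank line between documents.
corpus_from_dir() {
    for f in "$1"/*.doc "$1"/*.docx; do
        [ -e "$f" ] || continue    # pattern matched no files
        case "$f" in
            *.docx) docx_to_text "$f" ;;
            *.doc)  catdoc "$f" ;;
        esac
        echo
    done
}
```

For example, running corpus_from_dir ./backup > output.txt (the directory name is hypothetical) would produce one plain-text file that the counting command can read.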

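Then, you can use the following pipeline to convert the text to lowercase, split it into one word per line, count each distinct word and sort the words by frequency:

tr '[:upper:]' '[:lower:]' < output.txt | tr -cs '[:alpha:]' '\n' |
  grep -v '^$' | sort | uniq -c | sort -nr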

This will output a list of words and their frequencies, sorted in descending order.

Another option is to use a programming language like Python to parse the Word documents and count the frequencies of the words. Several libraries are available for working with Word documents in Python, such as "python-docx" (imported in code as docx), which reads the newer .docx format. You can use such a library to extract the text from the Word documents, split the text into individual words, and then use a dictionary to count the frequencies of the words. Finally, you can sort the dictionary by value to obtain a list of words and their frequencies in descending order.
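The dictionary approach can also be sketched without Python, using the associative arrays built into awk. This is a swapped-in technique, not the python-docx method: it assumes the documents have already been converted to plain text, it treats only the ASCII letters a-z as word characters, and the function name word_counts is hypothetical.

```shell
#!/bin/sh
# Sketch only: counts words with an awk associative array (the "dictionary").
# Assumption: input is plain text on stdin; only ASCII letters a-z count as
# word characters, so accented letters split words.

word_counts() {
    awk '
    {
        line = tolower($0)
        gsub(/[^a-z]+/, " ", line)      # punctuation and digits become spaces
        n = split(line, words, " ")     # " " means split on runs of whitespace
        for (i = 1; i <= n; i++)
            count[words[i]]++           # dictionary: word -> number of occurrences
    }
    END {
        for (w in count)
            print count[w], w
    }' | sort -k1,1nr -k2,2             # highest count first, ties alphabetical
}
```

For example, word_counts < output.txt > descending_order.txt (a hypothetical output file name) would write one "count word" line per distinct word, most frequent first.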

I hope this helps! Let me know if you have any other questions.
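The pruning step requested in the note to Manoj (removing words with no relevance to the “Areas of Knowledge”) could be partly automated once a list of irrelevant words has been drawn up by hand. A minimal sketch, assuming the DESCENDING ORDER file holds one "count word" line per word and the irrelevant words sit one per line in a separate file; both file layouts and the function name prune_freq are assumptions.

```shell
#!/bin/sh
# Sketch only. Assumptions: $1 lists irrelevant words, one per line;
# $2 is a frequency file with one "count word" line per word.

prune_freq() {
    # While reading the first file, store each word (lowercased) in "drop";
    # while reading the second file, print only lines whose word is not in it.
    # Testing FILENAME (not NR == FNR) keeps this correct if $1 is empty.
    awk 'FILENAME == ARGV[1] { drop[tolower($1)]; next }
         !(tolower($2) in drop)' "$1" "$2"
}
```

For example, prune_freq irrelevant.txt descending_order.txt > relevant.txt (hypothetical file names) keeps the original descending order, so the most frequent relevant words stay at the top.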
