Emails (Others): RE: USEFUL

Thursday, 28 September 2023

RE: USEFUL

Manoj,

I will follow up with Sandeep as soon as he reaches the office.

At this stage, it is worth considering that a large part of that 40 GB is NOT Word files.

In Blogger, we have the following files which are NOT Word files :

# 1500 + photos

# 1,000 + Poem Videos / Poems embedded in Pictures ( includes even ENGLISH poems which are embedded in PICTURES / VIDEOS )

# 650 + Handwritten letter IMAGES ( scanned ) – Sweet Memories

# Hundreds of “scanned” handwritten notes

# Scans of dozens of JOB ADVTS ( released by 3P in newspapers for its clients )

# E Mail database of MPs / MLAs

IN THE FIRST INSTANCE, WE SHOULD REMOVE / ELIMINATE THESE BEFORE STARTING THE PARSING PROCESS

This elimination would substantially bring down the size of the backup ( in GB )

Sandeep > Could these be eliminated BEFORE sending to Manoj ?

The WORD files which are likely to contain “Keywords / topics” ( my areas of knowledge ) are mainly :

# Blogs

# Notes to Colleagues ( only those which are TYPED / there are many which are SCANNED IMAGES of my handwritten notes )

# LinkedIn posts

# Letters to L&T Employees

# L&T Story

# E Mails to Colleagues

# E Mails to Others
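The clean-up proposed above could be sketched in Python roughly as follows. This is only an illustration: the list of extensions treated as text is an assumption about what the backup contains, and the names split_backup and total_gb are made up for this example. It walks a local copy of the backup, sets aside everything that is not a text-bearing file, and reports how many GB a group of files occupies.

```python
from pathlib import Path

# Assumption: these extensions hold typed text worth parsing.
# Everything else (photos, videos, scans) is set aside.
TEXT_EXTENSIONS = {".doc", ".docx", ".txt", ".htm", ".html"}


def split_backup(root):
    """Return (keep, skip): text files to parse, and all other files."""
    keep, skip = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in TEXT_EXTENSIONS:
            keep.append(path)
        else:
            skip.append(path)
    return keep, skip


def total_gb(paths):
    """Total size of the given files in GB (1 GB = 1024**3 bytes)."""
    return sum(p.stat().st_size for p in paths) / 1024 ** 3
```

Running split_backup on the backup folder and comparing total_gb for the two groups would show how far the 40 GB shrinks before anything is sent for parsing.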

Regards,

hemen

Subject: RE: USEFUL

Hi Hemen Sir,

I have not received any credentials or download link yet.

40 GB is too much to download, and I am not sure whether the files are in Gmail or whether credentials will help.

Maybe the files are available to download from Google Drive or something similar. It would be better to provide a download link or access to Google Drive, so that I can reach the individual folders/files.

Sandeep, please share the download link or give me access to the Google Drive folders/files.

Once we download some subset of the data, we can definitely try the solutions recommended by ChatGPT and see whether the keyword output helps.

It seems the GREP command may not be useful, since it returns the lines of text in which particular keywords are found. However, we can try the Python NLP option.

Regards,

Manoj Hardwani

Subject: USEFUL

Manoj,

I suppose you managed to get Gmail credentials from my colleague Sandeep, to enable you to download the 40 GB backup of my Blogger site ( from which Personal.ai has generated 98,000 memory blocks )

When you finish parsing these files, it is quite likely that this will throw up a MILLION words ( hopefully, in descending order of frequency of occurrence )
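A list of words in descending order of frequency can be produced with Python's standard library alone. The sketch below is only an illustration; the function name word_frequencies and the rule for what counts as a word are assumptions, not something agreed in this thread.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally joined by
# apostrophes or hyphens, so hyphenated terms stay in one piece.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return Counter(words).most_common()
```

Keeping hyphenated words whole at this stage means they can still be recognised and dropped in a later filtering step.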

Question :

From this huge list, how can we eliminate verbs / adverbs / adjectives / prepositions / common nouns / hyphenations etc., so that the few ( maybe 1,000 ? ) which remain are “Keywords / topics” ( denoting my “areas of knowledge” ) ?

I posed this question to ChatGPT which replied as follows :

Question :

From a file containing a million words, is there any software which can eliminate words which are verbs / adverbs / adjectives / prepositions / hyphenations / common nouns etc ?

Answer :

Yes, there are several tools that can help you eliminate certain types of words from a large text file. One option is to use a command-line tool called "grep," which is available on most Unix-like operating systems (including Linux and macOS). "grep" allows you to search for patterns in text files and extract only the lines that match the pattern.

For example, you can use "grep" to extract only the lines that contain verbs by using a regular expression that matches verbs. Similarly, you can use regular expressions to match other types of words, such as adverbs, adjectives, prepositions, hyphenations, or common nouns.

Here is an example of how you can use "grep" to extract only the lines that contain verbs from a text file:

grep -E '\b(am|are|is|was|were|be|being|been)\b' input.txt > output.txt

This will create a new text file called "output.txt" that contains only the lines from "input.txt" that contain verbs. You can use similar regular expressions to extract other types of words.
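To make clear what the command above actually does, here is a rough Python equivalent. Two things stand out: the pattern covers only forms of "to be" rather than verbs in general, and grep keeps whole lines rather than removing individual words. The function names are hypothetical; strip_matches shows the word-level removal that the original question is really asking for.

```python
import re

# Same pattern as the grep command: only forms of "to be",
# not verbs in general.
TO_BE = re.compile(r"\b(am|are|is|was|were|be|being|been)\b")


def matching_lines(text):
    """What grep does: keep whole lines that contain a match."""
    return [line for line in text.splitlines() if TO_BE.search(line)]


def strip_matches(line):
    """What the question asks for: drop the matched words themselves."""
    return " ".join(TO_BE.sub(" ", line).split())
```

A fixed word list like this cannot cover every verb, adverb or adjective in a million-word file, which is why a part-of-speech tagger is the more practical route.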

Another option is to use a programming language like Python to process the text file and eliminate certain types of words. There are several libraries available for natural language processing in Python, such as the "NLTK" (Natural Language Toolkit) library, which can help you identify the part of speech of each word in a text. You can use the "NLTK" library to tag the words in the text file with their part of speech, and then filter out the words that are verbs, adverbs, adjectives, prepositions, hyphenations, or common nouns.
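The tagging-and-filtering idea could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested solution for the 40 GB backup: the tag sets use the Penn Treebank labels produced by NLTK's default English tagger, the function names are hypothetical, and tag_with_nltk needs NLTK and its tagger model data installed. The filter itself works on any list of (word, tag) pairs.

```python
# Penn Treebank tags, as used by NLTK's default English tagger.
DROP_PREFIXES = ("VB", "RB", "JJ")   # verbs, adverbs, adjectives
DROP_TAGS = {"IN", "NN", "NNS"}      # prepositions, common nouns


def tag_with_nltk(words):
    """Tag words with NLTK. Needs `pip install nltk` plus the tagger
    model data (fetched with nltk.download); not run in this sketch."""
    import nltk
    return nltk.pos_tag(words)


def keep_candidates(tagged_words):
    """Drop the unwanted parts of speech and any hyphenated word."""
    kept = []
    for word, tag in tagged_words:
        if "-" in word:
            continue
        if tag in DROP_TAGS or tag.startswith(DROP_PREFIXES):
            continue
        kept.append(word)
    return kept
```

Taggers rely on sentence context, so tagging full sentences and then counting the surviving words is likely to be more accurate than tagging an isolated frequency list. Pronouns, determiners and numbers would also survive this filter, so the drop lists would probably need extending.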

I hope this helps! Let me know if you have any other questions.

I hope this helps you

Regards,

Hemen
