I have a database of over 5 million job advts, downloaded over the past 6 / 7 years from various job portals of India.
Each job advt in the database consists of:
Ø Advt ID
Ø Designation (being advertised)
Ø Company Name (Advertiser)
Ø Job Description
Ø Desired Profile
Ø Compensation Offered
Ø Experience (desired), in Years
Ø Industry Type
Ø Educational Qualification (Min)
Ø Location (Posting City)
Ø Keywords
Ø Advt Posting Date
Ø Expiry Date
Some years back (when our website, www.World-Wide-Jobs.com, was up and running), we had developed a feature to analyze this database and display the findings visually, in different ways.
We were displaying PIE-CHARTS of:
Ø Industry-wise Jobs
Ø City-wise Jobs
You will observe that, with a much larger database available now, it is possible to analyze / display the “No of Jobs” in many more ways.
Not only that, it should be possible to analyze this huge database to predict the future expected PATTERN of the occurrence of jobs, in many different ways!
At any given time, the number of jobs getting advertised is an important Economic Indicator.
If the economy is booming and company Order Books are getting fatter, then more jobs will get advertised, and vice versa.
Hence, a time-series of the number of new jobs getting posted on job portals should have a near straight-line relationship with the state of the economy (a high coefficient of correlation).
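The coefficient of correlation mentioned above can be computed along the following lines. This is only a minimal sketch: the function names, the ISO date format and the sample figures are my assumptions for illustration, and the index values are made up, not real economic data.

```python
from collections import Counter
from math import sqrt

def monthly_counts(posting_dates):
    """Count advts per month from ISO dates such as '2013-07-19' (assumed format)."""
    return Counter(d[:7] for d in posting_dates)

def pearson(xs, ys):
    """Pearson coefficient of correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)

# Made-up posting dates and a made-up economic index, for illustration only.
dates = ["2013-01-05", "2013-01-20", "2013-02-11",
         "2013-03-02", "2013-03-15", "2013-03-28"]
index_by_month = {"2013-01": 101.0, "2013-02": 99.5, "2013-03": 103.0}

counts = monthly_counts(dates)
months = sorted(index_by_month)
r = pearson([counts[m] for m in months], [index_by_month[m] for m in months])
print(round(r, 3))
```

A value of r close to +1 would support the straight-line relationship; a value near 0 would not.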
Apart from that, can Data mining of 5 million jobs answer (even partially) the following questions?
Ø Who (which Companies) is advertising, and when?
Ø What jobs / vacancies / positions are being advertised?
Ø What is the frequency with which a particular job gets advertised? By the entire industry? By a given Company?
Ø Which regions / cities have the max / min number of new jobs?
Ø What are regional disparities due to?
Ø Which Industries are advertising most, and so creating most jobs?
Ø What Educational Qualifications are in max demand?
Ø What kind of jobs demand what kind of Educational Qualifications?
Ø What is the level of correlation between Position and the years of Experience demanded?
Ø For identical positions being advertised, how much do “Job Descriptions / Desired Profiles” differ from company to company?
Ø Are there significant differences in the “No of years of Experience” being demanded for identical positions?
Ø What is the probability of finding the “Keywords” in the “Job Description / Desired Profile”?
Ø What is the extent of duplication (redundancy?) between “Job Description” and “Desired Profile”?
Ø What percentage of Advts fail to make any mention of Compensation Offered?
Ø When a company posts an advt for the same / identical position at different points of time, are there any differences in the field values?
Ø From an analysis of all the advts posted by a given Company (over the past 7 years), can any conclusion be reached as to the changing nature of that company’s business (by correlating the “Skills related Keywords”)?
Ø Can the algorithm predict what job a company will advertise next, and when?
Ø Is there any correlation between “Designation / Position” and the “Keywords”?
Ø From analyzing this huge data, can software auto-generate a complete / editable job advt, as soon as a Recruiter simply types the “Designation / Position”?
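Two of the questions above (the percentage of advts with no Compensation, and the duplication between “Job Description” and “Desired Profile”) are simple enough to sketch. The field names and sample advts below are invented for illustration, and word overlap (Jaccard similarity) is just one possible way of measuring duplication, chosen here as an assumption.

```python
import re

def tokens(text):
    """Lower-cased set of words in a free-text field."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def jaccard(a, b):
    """Shared words divided by all distinct words (0 = disjoint, 1 = identical)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def share_missing(advts, field):
    """Fraction of advts where `field` is absent or blank."""
    if not advts:
        return 0.0
    missing = sum(1 for a in advts if not str(a.get(field) or "").strip())
    return missing / len(advts)

# Invented sample records; the key names are hypothetical.
advts = [
    {"job_description": "Analyse business requirements and write specs",
     "desired_profile": "Able to analyse business requirements",
     "compensation": ""},
    {"job_description": "Manage plant production",
     "desired_profile": "Engineering degree",
     "compensation": "8 lakh per annum"},
]

print(share_missing(advts, "compensation"))
for a in advts:
    print(round(jaccard(a["job_description"], a["desired_profile"]), 3))
```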
I believe that, so far, no one has undertaken such a Data mining project.
If carried out diligently, I am sure the outcome would be of immense benefit to:
Ø HR Managers: for Manpower Planning / Compensation Planning
Ø Recruiting Managers: for framing Man Specifications / Job Description Manuals
Ø Educationists: for deciding what Educational Qualifications are in demand and tailoring the Courses
Ø Students: to figure out what “Skills” are in demand by Industry, and prepare
Ø Planning Commission (NITI Aayog): for allocating Resources to States / Regions, based on imbalances
Ø HRD Ministry: for long term Macro-Planning in respect of Education
Ø National Skills Development Commission: for chalking out Skills Development Programs in collaboration with Companies / Industries
If undertaken, and executed seriously, this Data mining project has the potential to place the Ministry of Statistics and Programme Implementation on the Centre-Stage of the National Education Planning Scenario.
What can / will such a project yield ?
Without exaggerating, it would be safe to assume that this vast database of job advts would contain:
Ø 50 million phrases / sentences
Ø 500 million words
Obviously, taken together, these words / phrases / sentences are nothing more than a “Database of Intentions” of the Employer Companies (to borrow from John Battelle’s well-researched book about Google).
Our goal shall be to make this (Data mining Algorithm) a dynamic / continuous “Process”, so that we can measure the changing nature of these “Intentions” over a long, long period.
And we must enable a “Researching Visitor” (of the web site) to benefit from these trends / patterns.
Even though 5 million job advts may contain 500 million “words”, these are not Unique.
Most of these are used again and again, hundreds or thousands of times.
Through data mining, it is not difficult to compute their “Frequency of Usage”.
And then, these frequencies can be graphically plotted against any particular time-period.
Such Graphical Representations can be further broken up by:
Ø City Names
Ø Company Names
Ø Industry Names
Ø Function Names
Ø Designations (Vacancy Names), etc.
And such graphical analysis can be done not only for “Keywords” but even for “Key Phrases” and “Sentences”!
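The “Frequency of Usage” computation, broken up by time-period or by City, could look roughly like this. It is a sketch under my own assumptions: the field names, the ISO date format and the three sample advts are invented, and the plotting step is left out.

```python
from collections import Counter, defaultdict
import re

WORD = re.compile(r"[a-z0-9+#]+")

def frequency_by(advts, group_of, text_field="job_description"):
    """Return {group: Counter(word -> occurrences)}.

    `group_of` is any function that maps an advt to its group,
    e.g. its posting year, its city or its industry.
    """
    freq = defaultdict(Counter)
    for advt in advts:
        freq[group_of(advt)].update(WORD.findall(advt[text_field].lower()))
    return freq

# Invented sample records; the key names are hypothetical.
advts = [
    {"posting_date": "2012-05-01", "city": "Mumbai",
     "job_description": "Java developer with SQL"},
    {"posting_date": "2013-06-10", "city": "Pune",
     "job_description": "Python developer, SQL and Python testing"},
    {"posting_date": "2013-09-02", "city": "Mumbai",
     "job_description": "Python analyst"},
]

by_year = frequency_by(advts, lambda a: a["posting_date"][:4])
by_city = frequency_by(advts, lambda a: a["city"])

for year in sorted(by_year):
    print(year, by_year[year]["python"])
```

The same function handles “Key Phrases” if the tokenizer is swapped for one that emits word pairs or triples instead of single words.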
Take a look at this project paper (NOT ENCLOSED).
It is all about data mining of some 150 million records (location points) and about uncovering “trends / patterns” of physical movements of 300 human volunteers, over a “period of time”.
I quote from an article in the Times of India (19 July 2013):
“..the first system of its kind to predict long term human mobility in a unified way , parse the data. " Far Out " does not need to be told exactly what to look for --- it automatically discovered regularities in the data”
“Do you know precisely where you’ll be 285 days from now at 2 pm? Researchers have developed a new tracking software that can tell you exactly where you will be on a precise time and date, years into the future”
What we want to do with the 5 million job advts database is quite similar, viz; predict:
WHO (which Company / Industry) will advertise
WHAT (vacancies / positions / designations), and
WHEN (time)
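As a very crude first cut at the WHEN, one could look at the gaps between past postings of the same position by the same company. The sketch below uses the median gap; that rule, the function name and the sample dates are purely my assumptions for illustration, not a tested prediction model.

```python
from datetime import date, timedelta
from statistics import median

def predict_next_posting(posting_dates):
    """Guess the next posting date as: last date + median gap between past postings.

    `posting_dates` holds the dates on which one company advertised one
    designation. Returns None when there is too little history.
    """
    days = sorted(set(posting_dates))
    if len(days) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return days[-1] + timedelta(days=round(median(gaps)))

# Invented history of one company advertising one position.
history = [date(2012, 1, 10), date(2012, 4, 9), date(2012, 7, 10), date(2012, 10, 8)]
print(predict_next_posting(history))
```

A real model would also have to look at seasonality, the industry-wide pattern and the state of the economy; the median gap only shows that the WHEN is computable from the Advt Posting Dates alone.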
I am talking about developing an “Expert System”, through discovery of specific “Correlations” amongst the various Data Fields of 5 million job advts.
Eg:
Ø What is the Correlation between any given “Designation / Vacancy-Name / Advertised Position” and Educational Qualifications?
Here are some examples:
Ø Any designation such as “Production Manager” would call for an “Engineering Degree / Diploma” (but never a CS / CA)
Ø Any designation in the “Finance Function” will require:
· B Com
· M Com
· CA etc
But never a BE (M) / BE (Chem)
Ø Any designation at Manager level will call for a minimum experience of 5 years (but never a Fresh Graduate with NIL experience)
Ø MBA / BBA / MMS etc are the most preferred Educational Qualifications for positions in Marketing
Ø No vacancy in an Automobile Manufacturing Company will call for a degree in Pharmacy
Ø No Electrical Machinery Manufacturing company will ever demand a Medical Degree (MBBS)
To a human mind, these (rules) are so obvious!
But no human mind can write down ALL such RULES in 2 minutes! That is something your Data mining Software can, and will, do in 5 seconds!
All that you need, after computing “Frequencies of Occurrences”, is to:
Ø Plot the Coefficients of Correlation between the various Fields (of job advts)
Ø Compute Probabilities for each, and create hundreds of Probability Tables
And, since a thousand new job advts are getting added to our Job Advt Database daily, the SAMPLE SIZE is perpetually increasing, thereby increasing the Accuracy of your Predictions!
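One such Probability Table, kept up to date as each day's new advts arrive, might be sketched as follows. The class name, field names and sample advts are assumptions made up for this illustration; the probabilities are plain relative frequencies.

```python
from collections import Counter, defaultdict

class ProbabilityTable:
    """Running estimate of P(target value | given value) for two advt fields."""

    def __init__(self, given_field, target_field):
        self.given_field = given_field
        self.target_field = target_field
        self.counts = defaultdict(Counter)

    def add(self, advt):
        """Fold one more advt into the counts (e.g. each day's new postings)."""
        given = advt.get(self.given_field)
        target = advt.get(self.target_field)
        if given and target:
            self.counts[given][target] += 1

    def probability(self, given, target):
        """Relative frequency of `target` among advts with this `given` value."""
        row = self.counts.get(given, Counter())
        total = sum(row.values())
        return row[target] / total if total else 0.0

# Invented sample records; the key names are hypothetical.
table = ProbabilityTable("designation", "education")
for advt in [
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "Diploma"},
    {"designation": "Accounts Manager", "education": "CA"},
]:
    table.add(advt)

print(table.probability("Production Manager", "BE"))
print(table.probability("Production Manager", "CA"))
```

A “never” rule, such as a Production Manager never needing a CA, then shows up as a probability of zero over a large enough sample.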
Having done this, imagine the following scenario:
A Recruitment Officer of Wipro comes to our “Post Job” page and, in the field for “Designation”, simply types
“Business Analyst”
And Presto!
The entire Job Advt Form gets auto-filled with the MOST PROBABLE values!
Would not that amaze her?
All that our software has done is analyze the job advts of all “Software Companies” (an Industry), and of WIPRO, for the position of Business Analyst, and fill in the most probable values.
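The auto-fill described above can be sketched as “pick the most frequent past value of each field”. Everything specific below is an assumption for illustration: the field names, the rule of falling back from the company's own advts to the whole industry when the company has too few, and the sample advts, which are invented and are not real Wipro or Infosys postings.

```python
from collections import Counter

FIELDS = ["education", "experience_years", "location"]  # hypothetical field names

def most_probable_values(advts, designation, company=None, industry=None,
                         min_company_advts=2):
    """Suggest the most frequent past value of each field for a designation.

    Uses the company's own advts when it has at least `min_company_advts`
    of them for this designation; otherwise uses all matching advts
    (restricted to `industry` when one is given).
    """
    same_job = [a for a in advts
                if a["designation"] == designation
                and (industry is None or a["industry"] == industry)]
    own = [a for a in same_job if a["company"] == company]
    pool = own if len(own) >= min_company_advts else same_job
    suggestion = {}
    for field in FIELDS:
        values = Counter(a[field] for a in pool if a.get(field) not in (None, ""))
        if values:
            suggestion[field] = values.most_common(1)[0][0]
    return suggestion

def advt(company, education, years, location):
    """Build one invented Business Analyst advt in the Software industry."""
    return {"designation": "Business Analyst", "company": company,
            "industry": "Software", "education": education,
            "experience_years": years, "location": location}

advts = [
    advt("Wipro", "MBA", 3, "Bangalore"),
    advt("Wipro", "MBA", 5, "Bangalore"),
    advt("Wipro", "BE", 3, "Pune"),
    advt("Infosys", "BE", 4, "Mysore"),
]

print(most_probable_values(advts, "Business Analyst", company="Wipro"))
```

Free-text fields such as Job Description would need something more than a most-frequent-value rule, for example assembling the most frequent Key Phrases.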
This is no rocket science !
We had actually partially attempted it, albeit in a crude way, in our earlier web site, www.IndiaRecruiter.net
What surprises me is how come no one has attempted this so far!
Especially Naukri / TimesJobs / MonsterIndia, who have accumulated millions of job advts!
Anyway, the fact that they have, so far, ignored this Line of Examination will work to the advantage of the Ministry of Statistics and Programme Implementation, making YOU the very first person in the entire world to come up with a PREDICTION MODEL in the area of JOBS.
However, without applying some simple data mining tool, it would not be possible to answer the following questions:
Where is the greatest decline of jobs being advertised?
How much is the percentage decline?
Ø In which Industry?
Ø In which Company?
Ø In which City?
Ø In which Region?
Ø In which Skills?
Ø For which Positions?
Ø For which Education Levels? … etc
With a data mining tool, such individual graphs could emerge (within a fraction of a second) at the click of a button!
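The percentage decline, by any one of these dimensions, comes down to comparing counts between two periods. Below is a minimal sketch, assuming ISO posting dates and a year-on-year comparison; the field names and sample advts are invented.

```python
from collections import Counter

def percentage_change(advts, field, earlier, later):
    """Percent change in advt count per value of `field` between two years.

    Only values that appeared in the earlier year are reported, since a
    percentage change from zero is undefined.
    """
    before = Counter(a[field] for a in advts if a["posting_date"][:4] == earlier)
    after = Counter(a[field] for a in advts if a["posting_date"][:4] == later)
    return {value: 100.0 * (after[value] - n) / n for value, n in before.items()}

def greatest_decline(changes):
    """Value with the most negative change, or None if nothing declined."""
    value = min(changes, key=changes.get, default=None)
    return value if value is not None and changes[value] < 0 else None

# Invented sample: Software falls from 4 advts to 1, Pharma stays at 2.
advts = (
    [{"industry": "Software", "posting_date": "2012-03-01"}] * 4
    + [{"industry": "Pharma", "posting_date": "2012-05-01"}] * 2
    + [{"industry": "Software", "posting_date": "2013-03-01"}]
    + [{"industry": "Pharma", "posting_date": "2013-05-01"}] * 2
)

changes = percentage_change(advts, "industry", "2012", "2013")
print(changes)
print(greatest_decline(changes))
```

Passing "city", "company" or any other field name in place of "industry" answers the same question along that dimension.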
One could even correlate these graphs with other publicly available statistical data such as:
Ø IIP (Index of Industrial Production)
Ø Stock Market Index
Ø Currency Exchange Rate (eg; declining Rupee)
Ø Decline in GDP / Increasing Fiscal Deficit
Ø CAD (Current Account Deficit)
Ø Foreign Investments
Ø Primary Bank Rates of RBI, etc.
With proper correlations, one could even predict how much the job market will further shrink, or grow, over the next 6 months!
Such a “Predictive Model of Job Market” would be of immense interest not only to the economists but also to the HRD Ministry / Planning Commission / Educational Institutions and, of course, the students themselves.
hemen parekh
Marol, Mumbai, India
(M) +91 - 98,67,55,08,08