Saturday, April 26, 2014

Personality Test by Talentoday

Psychometrics has always been an amazing field of Psychology, and I have always been inspired to analyse collected data to understand how human behaviour works, at least in Organizational Psychology. Talentoday offers a win-win situation: they collect data for their research, and you, as the test taker, can explore how you behave, especially in your professional career.

Radar Chart
Scores are standardized against a norm group, so z-scores and sten (standard ten) scores are used to interpret the results across test takers. I also noticed a pattern in the questions that is common in these kinds of tests, where validity is measured by how well the indicators predict an outcome. The rest of the data collection and calibration process is described on the website once you have finished the test. The radar chart above is interactive.
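For readers unfamiliar with sten scores, a common convention rescales a z-score as 2z + 5.5, rounds it, and clips the result to the range 1 to 10. This gives a scale with mean 5.5 and standard deviation 2. The sketch below assumes that convention and a known norm-group mean and standard deviation. The raw scores and norm values are made up for illustration and are not Talentoday's actual norms or scoring procedure.

```r
# Convert raw scores to z-scores, then to sten (standard ten) scores.
# norm_mean and norm_sd are hypothetical norm-group parameters.
to_sten <- function(raw, norm_mean, norm_sd) {
  z <- (raw - norm_mean) / norm_sd   # standardize against the norm group
  sten <- round(2 * z + 5.5)         # sten scale: mean 5.5, standard deviation 2
  pmin(pmax(sten, 1), 10)            # clip to the 1-10 range
}

raw_scores <- c(5, 20, 28, 35)       # hypothetical raw scores
to_sten(raw_scores, norm_mean = 24, norm_sd = 6)
# [1] 1 4 7 9
```

Rounding 2z + 5.5 places the sten boundaries at half-standard-deviation steps of z. That is why a z-score of about -3.2 ends up clipped to 1.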

The results are divided into 5 clusters. They are listed below in descending order of my scores, from my highest score of 6.5 to my lowest of 4. These clusters are the main personality dimensions that the test presents as essential in the professional world.
  • Dare
  • Excel
  • Manage
  • Adapt
  • Communicate
The test also produces a motivation scale, which shows the things that drive you to achieve your goals and the things you may need to work on. Obviously, mine is communication. Each of the clusters above also receives an ordinal rating from 1 to 10, which gives your score more meaning at a lower level.

Motivation Scale

Lastly, you are presented with your Talent ID, which lists the empowering attitudes that make you unique.

Once done, you can even receive a PDF of the summary and a detailed report based on your answers. After 6 months, you can reassess, as part of the test-retest method, to see whether your outlook has changed. I have not yet compared my results with my friends' results, or with those of people who have taken the test within the same organization. An option to compare results within the same industry or job specification would be interesting. My guess is that they are still collecting more data, even though the data have not been gathered scientifically: test takers select themselves, so the observations may be neither independent nor random.
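The test-retest method is usually summarized as the correlation between the two administrations: the closer it is to 1, the more stable the scores. Below is a minimal sketch with hypothetical scores for five people. These are not real Talentoday data.

```r
# Hypothetical scores for the same five people, six months apart
first  <- c(6.5, 5.0, 4.0, 7.0, 5.5)
retest <- c(6.0, 5.5, 4.5, 7.0, 5.0)

# Pearson correlation between the two administrations = test-retest reliability
reliability <- cor(first, retest)
reliability
```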

Talentoday Personal Book by Adrian Cuyugan

Join and take your personality exam. Let's compare!

Friday, April 25, 2014

Starbucks Malaysia Half-Priced Frappuccino

To celebrate the new Frappuccino flavors that Starbucks will be announcing soon, they are offering any Frappuccino at half price when you pay with your Starbucks Card, every Friday from 5 PM to 7 PM until May 30, 2014.

Starbucks Half-Priced Frappuccino

Well, amid the hype, I would still prefer my Americano Grande.

Sunday, April 20, 2014

WordCloud Twitter Text Analysis on CSC using R

These past few days, I have been reading a lot about non-parametric tests on natural language, because one of my current projects involves natural language processing via machine learning. This is very advanced, and the Data Science course on Coursera has not even started yet. So I am relying fully on what I have been reading on fora and in some blog articles. Acquiring tweets from Twitter requires a few packages because of the OAuth authentication that Twitter has implemented. If you have not installed these packages, you need them before you can reproduce my code.
install.packages(c("twitteR", "ROAuth", "RCurl", "tm", "wordcloud", "RColorBrewer"))

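Then, load the libraries, as usual.
library(twitteR)
library(ROAuth)
library(RCurl)
library(tm)
library(wordcloud)
library(RColorBrewer)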

Set RCurl's options so that it uses the CA certificate bundle (cacert.pem) that ships with the RCurl package.
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

Download the CA certificate bundle that the Twitter OAuth handshake will use, and save it to your working directory.
download.file(url = "", destfile = "cacert.pem")

For easier code reading, assign these links to objects in your environment. Note that the consumerKey and consumerSecret values are redacted because they are unique to your own Twitter API application. You need to create a developer account on Twitter to get your own keys.
reqURL <- ""
accessURL <- ""
authURL <- ""
consumerKey <- "MVdO2NE****************"
consumerSecret <- "oRZ9ff2yWvf9*************************c"
twitCred <- OAuthFactory$new(consumerKey = consumerKey, consumerSecret = consumerSecret, requestURL = reqURL, accessURL = accessURL, authURL = authURL)
twitCred$handshake(cainfo = "cacert.pem")
save(twitCred, file = "credentials.RData") # Keep the credentials for later sessions

After you run the code above, R will print a link in your console, similar to the one below:
> twitCred$handshake(cainfo = "cacert.pem")
To enable the connection, please direct your web browser to:
When complete, record the PIN given to you and provide it here: 

Copy and paste the link into your browser and click accept to allow Twitter to give you access to the API. R pauses here until you enter the PIN given by Twitter. If it is successful, as usual, R being the introvert, it will not give you any message. On the other hand, if your PIN is wrong, it will give you an error like Unauthorized. You can then register the credentials and check whether you can now access the Twitter API:
registerTwitterOAuth(twitCred)

This is where you can start scraping tweets. Twitter only allows you to extract a maximum of 1,500 tweets, and only from a limited number of past days. If you want a continuous feed, you may need to build a custom function to do it for you. For this analysis, let us just take a sample of the most recent tweets at the time I wrote this code.
csc <- searchTwitter("@CSC", n = 1500)
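For a continuous feed, one possible shape for such a custom function is sketched below. It assumes a hypothetical `fetch` function that you would write yourself (for example, a wrapper around searchTwitter()) and that returns a data frame with `id` and `text` columns. The helper calls it repeatedly and keeps only tweets it has not seen before. A mock fetcher stands in for the Twitter API so that the logic runs offline.

```r
# Hypothetical helper: call `fetch` several times and keep only unseen tweets.
# `fetch` is an assumption: any function returning a data frame with
# character columns `id` and `text`.
collect_feed <- function(fetch, rounds = 3, pause = 0) {
  feed <- data.frame(id = character(0), text = character(0),
                     stringsAsFactors = FALSE)
  for (i in seq_len(rounds)) {
    batch <- fetch()
    new_rows <- batch[!(batch$id %in% feed$id), , drop = FALSE]  # skip duplicates
    feed <- rbind(feed, new_rows)
    if (pause > 0) Sys.sleep(pause)  # wait between calls, e.g. for rate limits
  }
  feed
}

# Mock fetcher that returns overlapping batches, standing in for the API
make_mock_fetch <- function(batches) {
  i <- 0
  function() {
    i <<- i + 1
    batches[[i]]
  }
}

batches <- list(
  data.frame(id = c("1", "2"), text = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(id = c("2", "3"), text = c("b", "c"), stringsAsFactors = FALSE),
  data.frame(id = c("3", "4"), text = c("c", "d"), stringsAsFactors = FALSE)
)
feed <- collect_feed(make_mock_fetch(batches), rounds = 3)
feed$id
```

Deduplicating on the tweet ID matters because consecutive searches overlap. The `pause` argument is a placeholder for whatever wait your rate limit requires.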

Check the first tweet that was collected.
csc[[1]]

Or check the head and tail of your list.
head(csc); tail(csc)

Once you have checked that you have a good number of tweets, prepare your data and convert it to a corpus.
csc.frame <- do.call('rbind', lapply(csc, as.data.frame))
csc.corpus <- Corpus(VectorSource(csc.frame))
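Building csc.frame relies on stacking the list of tweets row by row. The usual do.call('rbind', lapply(...)) idiom converts each list element into a one-row data frame and then binds the rows together. Below, it runs on a toy list of plain R lists standing in for tweet objects. The two field names are illustrative, and real twitteR status objects carry many more fields.

```r
# Toy list standing in for downloaded tweets (field names are illustrative)
tweets <- list(
  list(text = "Happy birthday CSC", retweetCount = 2),
  list(text = "Cloud analytics at CSC", retweetCount = 0)
)

# Turn each element into a one-row data frame, then stack them row-wise
tweet_frame <- do.call("rbind", lapply(tweets, as.data.frame,
                                       stringsAsFactors = FALSE))
tweet_frame
```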

Also, normalize the text by converting it to lowercase and removing stop words, punctuation marks and numbers. Take note that I added a few more words to be removed, because these come from the other fields of the data set when we downloaded the tweets (for example, the true/false flags and the source links).
csc.corpus <- tm_map(csc.corpus, content_transformer(tolower)) # Convert to lowercase
csc.corpus <- tm_map(csc.corpus, removePunctuation) # Remove punctuation
csc.corpus <- tm_map(csc.corpus, removeNumbers) # Remove numbers
csc.corpus <- tm_map(csc.corpus, removeWords, c(stopwords('english'), 'false', 'buttona', 'hrefhttptwittercomtweetbutton', 
'relnofollowtweet', 'true', 'web', 'relnofollowtwitter', 'april', 'hrefhttptwittercomdownloadiphone', 'iphonea', 
'relnofollowtweetdecka', 'via', 'hrefhttpsabouttwittercomproductstweetdeck', 'hrefhttpwwwhootsuitecom', 'httptcoqqqiaipk', 
'androida', 'cschealth', 'cscanalytics', 'csccloud', 'relnofollowhootsuitea', 'cscmyworkstyle', 'cscaustralia', 'hrefhttptwittercomdownloadandroid')) # Remove stop words
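To see roughly what these tm_map() steps do to a single tweet, here is an approximate base R equivalent that works on a character vector. It is only an approximation of tm's behaviour. The stop-word list is a tiny hypothetical subset, not stopwords('english'), and removePunctuation and removeNumbers are imitated with regular expressions.

```r
# Rough base R imitation of the cleaning steps (not tm's exact behaviour)
clean_text <- function(x, stop_words) {
  x <- tolower(x)                        # convert to lowercase
  x <- gsub("[[:punct:]]+", "", x)       # remove punctuation
  x <- gsub("[[:digit:]]+", "", x)       # remove numbers
  words <- strsplit(x, "[[:space:]]+")   # split each string into words
  vapply(words, function(w) {
    # drop empty tokens and stop words, then glue the rest back together
    paste(w[nzchar(w) & !(w %in% stop_words)], collapse = " ")
  }, character(1))
}

stop_words <- c("the", "is", "of", "via")  # hypothetical mini stop-word list
clean_text("Happy birthday, @CSC! 2014 is the year of the cloud via #CSCcloud",
           stop_words)
# [1] "happy birthday csc year cloud csccloud"
```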

Prepare the document-term matrix.
csc.dtm <- DocumentTermMatrix(csc.corpus)
csc.dtm.matrix <- as.matrix(csc.dtm)

Or the term-document matrix, whichever you prefer.
csc.tdm <- TermDocumentMatrix(csc.corpus)
csc.tdm.sum <- sort(rowSums(as.matrix(csc.tdm)), decreasing = T) # Sum of frequency of words
csc.tdm.sum <- data.frame(keyword = names(csc.tdm.sum), freq = csc.tdm.sum) # Convert keyword frequency to DF
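The two matrices are transposes of each other, and the keyword frequencies are just the row sums of the term-document matrix. The sketch below builds both by hand in base R for three toy documents, without tm, so the shapes are easy to see. tm's own matrices are sparse objects, so this only illustrates the idea.

```r
docs <- c("csc cloud analytics", "csc birthday", "cloud csc")  # toy documents
tokens <- strsplit(docs, " ")
terms <- sort(unique(unlist(tokens)))

# Term-document matrix: one row per term, one column per document
tdm <- sapply(tokens, function(tk) table(factor(tk, levels = terms)))
colnames(tdm) <- paste0("doc", seq_along(docs))

dtm <- t(tdm)                                  # document-term matrix is the transpose
freq <- sort(rowSums(tdm), decreasing = TRUE)  # total frequency of each term
freq_df <- data.frame(keyword = names(freq), freq = freq)
freq_df
```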

Plot the wordcloud.
cloudcolor <- brewer.pal(8, "Paired")
wordcloud(csc.tdm.sum$keyword, csc.tdm.sum$freq, scale=c(8,.2), min.freq=1, max.words=Inf, random.order=T, rot.per=.3, colors=cloudcolor)

Yes! It is CSC's birthday this April! In my next few posts, I will perform some sentiment analysis on this data set, in which false stood out as the most frequently used keyword.

Monday, April 14, 2014

Paired T-test: A Parametric Test of the Null Hypothesis on Paired Data

Last Friday, I was working on a sample data set to check whether it was really worth spending time on, because someone had requested an analysis that I have revised many times. One of the frustrations I have encountered so far is translating these statistical tests into business language. That is another topic I need to rant about.

Anyway, two separate sets of data were collected. You could think of them as pre- and post-tests in a sense, but the background of the data is that the same measure was taken again after two weeks. I will start by encoding them in R.

# Load the ggplot2 package. Install it first if necessary:
# install.packages("ggplot2")
library(ggplot2)

# Creating a data frame of the paired data (paired.data is an illustrative name)
paired.data <- data.frame(Test = as.character(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), Score = c(0.54, 0.573, 0.575, 0.589, 0.639, 
    0.624, 0.64, 0.565, 0.694, 0.605, 0.632, 0.535, 0.556, 0.533, 0.516, 0.575, 
    0.57, 0.608, 0.58, 0.502))

As usual, I am a fan of the subset function. I could use square-bracket indexing, but I am very comfortable with subset(); it gets the job done.
# Subset
test1 <- subset(paired.data, Test == 1)
test2 <- subset(paired.data, Test == 2)
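Here is a quick check that subset() and square brackets select the same rows. It re-creates the scores in a data frame under the illustrative name paired.data. One difference is worth knowing: subset() drops rows where the condition is NA, while square brackets return those as rows of NAs.

```r
# paired.data is an illustrative name for the scores entered earlier
paired.data <- data.frame(
  Test  = as.character(rep(c(1, 2), each = 10)),
  Score = c(0.54, 0.573, 0.575, 0.589, 0.639, 0.624, 0.64, 0.565, 0.694, 0.605,
            0.632, 0.535, 0.556, 0.533, 0.516, 0.575, 0.57, 0.608, 0.58, 0.502))

via_subset   <- subset(paired.data, Test == 1)        # subset() version
via_brackets <- paired.data[paired.data$Test == 1, ]  # square-bracket version
all.equal(via_subset, via_brackets)
```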

Now that we have subset the data, let us look at how far apart the two tests are. Most people are intimidated by boxplots. I will not discuss further how to read and interpret them, but you can see the difference between the mean, which is the small dot inside each box, and the median, which is the straight line across each box.

My question is: is the difference between these two data sets statistically significant, so that we can say they are different from each other?
ggplot(data = paired.data, aes(x = Test, y = Score)) + stat_boxplot(geom = "errorbar") + 
    geom_boxplot(aes(fill = Test)) + stat_summary(fun = mean, geom = "point", 
    aes(group = 1)) + ylab("Scores") + xlab("Test") + theme(legend.position = "none")
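To read the two markers on each box as numbers, you can compute the group means (the dots) and medians (the middle lines) directly with tapply(). The data frame is re-created here under the illustrative name paired.data.

```r
# paired.data is an illustrative name for the scores entered earlier
paired.data <- data.frame(
  Test  = as.character(rep(c(1, 2), each = 10)),
  Score = c(0.54, 0.573, 0.575, 0.589, 0.639, 0.624, 0.64, 0.565, 0.694, 0.605,
            0.632, 0.535, 0.556, 0.533, 0.516, 0.575, 0.57, 0.608, 0.58, 0.502))

# Group means (the dots) and medians (the middle lines of the boxes)
means   <- tapply(paired.data$Score, paired.data$Test, mean)
medians <- tapply(paired.data$Score, paired.data$Test, median)
rbind(mean = means, median = medians)
```

For Test 1, the mean (0.6044) sits above the median (0.597). For Test 2, it sits slightly below (0.5607 vs 0.563).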

Of course, let us rely on the simple Student's paired t-test.
t.test(x = test1$Score, y = test2$Score, alternative = "two.sided", paired = T, 
    conf.level = 0.95)

##  Paired t-test
## data:  test1$Score and test2$Score
## t = 2.018, df = 9, p-value = 0.07432
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.00528  0.09268
## sample estimates:
## mean of the differences 
##                  0.0437

If you need me to compute these manually, I would love to. Start from the standard deviation of the paired differences, then compute the standard error and the degrees of freedom, and finally get the p-value from the t statistic. If I plot this on a t distribution with 9 degrees of freedom, the two-tailed probability beyond a t value of 2.018 is about 0.07.
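Here is that manual computation as a base R sketch, using the same scores as above. It should reproduce the t statistic, p-value and confidence interval reported by t.test().

```r
test1 <- c(0.54, 0.573, 0.575, 0.589, 0.639, 0.624, 0.64, 0.565, 0.694, 0.605)
test2 <- c(0.632, 0.535, 0.556, 0.533, 0.516, 0.575, 0.57, 0.608, 0.58, 0.502)

d      <- test1 - test2                # paired differences
n      <- length(d)
d_bar  <- mean(d)                      # mean of the differences
s_d    <- sd(d)                        # standard deviation of the differences
se     <- s_d / sqrt(n)                # standard error of the mean difference
t_stat <- d_bar / se                   # t statistic
dof    <- n - 1                        # degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), dof)    # two-sided p-value from the t distribution
ci     <- d_bar + c(-1, 1) * qt(0.975, dof) * se  # 95% confidence interval

round(c(t = t_stat, df = dof, p = p_val, lower = ci[1], upper = ci[2]), 5)
```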

At the 95% confidence level, I cannot say that they are different from each other, because the p-value is greater than 0.05. Why? Let's construct the hypothesis statements first.
H0: Test 1 = Test 2, the mean difference between Test 1 and Test 2 is zero (two-sided)
HA: Test 1 ≠ Test 2, the mean difference between Test 1 and Test 2 is not zero (two-sided)
Given a p-value of 0.07 and a significance level of 0.05, we fail to reject the null hypothesis. Therefore, we conclude that the data do not provide enough evidence that the two tests differ. With all of this language I speak, what does it really mean?
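You can reach the same decision with a critical value instead of the p-value: reject H0 only if |t| exceeds the two-sided critical value of the t distribution. Below is a minimal sketch that plugs in the values from the t.test() output above.

```r
alpha  <- 0.05
t_stat <- 2.018     # t value from the t.test() output above
dof    <- 9         # degrees of freedom from the same output
p_val  <- 0.07432   # p-value from the same output

t_crit <- qt(1 - alpha / 2, dof)        # two-sided critical value
reject_by_p    <- p_val < alpha         # p-value rule
reject_by_crit <- abs(t_stat) > t_crit  # critical-value rule
c(t_crit = t_crit, reject_by_p = reject_by_p, reject_by_crit = reject_by_crit)
```

Both rules agree: 2.018 falls short of the critical value of about 2.262, so H0 is not rejected.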

If you look at the means of the two data sets, they are different: 0.6044 and 0.5607, respectively. However, the difference is not statistically significant, so I can say that the request I am working on is not worth looking into at a lower level. This is where decision errors come into play: what would be the implication if I continued looking for answers, compared with deciding to stop because it is not worth it? Decision errors are another topic, maybe for the next few posts.
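One way to put a number on the risk of a Type II error (missing a real difference) is a power calculation with base R's power.t.test(). The sketch below treats the observed mean difference and standard deviation of the differences as if they were the true values. That is an assumption, so read the output as a rough guide rather than a verdict on the request.

```r
# Paired differences between Test 1 and Test 2
d <- c(0.54, 0.573, 0.575, 0.589, 0.639, 0.624, 0.64, 0.565, 0.694, 0.605) -
     c(0.632, 0.535, 0.556, 0.533, 0.516, 0.575, 0.57, 0.608, 0.58, 0.502)

# Power of a two-sided paired t-test with 10 pairs, assuming the observed
# difference and spread are the true ones
pw <- power.t.test(n = length(d), delta = mean(d), sd = sd(d),
                   sig.level = 0.05, type = "paired",
                   alternative = "two.sided")$power

# Number of pairs needed for 80% power under the same assumption
n_needed <- power.t.test(delta = mean(d), sd = sd(d), sig.level = 0.05,
                         power = 0.8, type = "paired",
                         alternative = "two.sided")$n

c(power = pw, pairs_needed = ceiling(n_needed))
```

A power well below 0.8 means a real difference of this size could easily go undetected with only 10 pairs.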

April 2014 Fitness Check

Last year, God knows how much junk food I ate while living the YOLO life. I gained a lot of weight and lost track of the hard work I had done trying to show 8 lines along my abdomen. By the time I came back from my Christmas vacation, I already knew that I needed to get back on track. This time, I know what to do: the same diet and exercise with some modifications, because I have introduced BCAAs into my supplements apart from the usual protein blend, with OEP on the side when needed.

In less than two months, I am going back to the Philippines for a short vacation, straight to the white beaches of Boracay, and yes, I need to prepare for this. With the time left, I don't think I can achieve the body I am aiming for. Nevertheless, I'll look decent, at least in my opinion.

What do you think?

Posted from WordPress for Android