Notice: This site contains creative material intended to make an impression on you.
Welcome to my GitHub Pages.
This site contains links to GitHub Pages that share my project work, R Shiny apps, and presentations. Allow me to introduce myself. I am part of the competitively selected Booz Allen Hamilton Data Science Cohort and have been working in the data science space for over 14 years. I am a data storyteller who provides answers to client questions through data in an engaging and accessible approach. Listen to some data stories on my YouTube Channel for Data. Four C's describe me: Creative, Critical Thinking to deliver a quality product on time, Collaborative, and Contribute to the client and the organization.
I have a rich and diverse background of professional and self-driven experiences with qualifications in conducting technical analysis, running the business operations of a software development company, and connecting the dots between technical and business people. I am a speaker, artist, author, and Coursarian (lifelong learner). Technically, I am multilingual and can conduct analyses and program on numerous platforms.
Visit Some of My Projects
Here is a sample of projects I have enjoyed completing for various users including clients, readers, and friends. They represent a nice spectrum of different analytic methods.
Here is a k-means cluster analysis I built to provide an interactive tool that lets users design marketing campaigns. It is a capstone project for my book Introduction to R for Business Intelligence. This link to the codebase gives you insight into all the code for the book. For instance, Chapter 5 focuses on clustering techniques. In Chapter 4, I teach linear regression, and the reader later has the chance to build an online app to show a very simple prediction tool based on a model they also create.
Something worth mentioning is that there is a lot of focus on prediction these days, but what about forecasting? It is a challenging technique in and of itself, and frankly, it is not readily accessible even at the graduate school level. I don’t see it presented very often in discussion groups or meetups. How often are we challenged with time-based problems that could be served well by forecasting techniques? Because of that, Chapter 6 of the book specifically looks at the topic of Time Series Analysis. One may argue that it does not belong in an introductory book. It is true that this is a challenging topic, but my philosophy is that an introductory awareness of a difficult topic is better than perfect ignorance of it.
One of my favorite machine learning projects applied random forests to data from smartphone sensors, an early type of wearable technology. Predicting Human Activity Using Practical Machine Learning Techniques provides a full explanation of what I did to clean, explore, and model an open data source to evaluate the predictive power of a reproducible model. I have also provided presentations on how an artificial neural network does its magic.
The value of text-based information continues to increase. Alan Turing (1950) opens his influential article "Computing Machinery and Intelligence" with the statement, "I propose to consider the question, 'Can machines think?'" (p. 433). This Turing Test has become a basis of natural language processing. My job in this project was to make sense of over 3 million tweets, blogs, and news stories to build an online predictive model application giving a user a prediction of what word they will use next based on their input.
How did it go? Well, this project was modeled on the SwiftKey technology. When I did this work, my model had an accuracy of about 20%. This compared well to the accuracy of SwiftKey at approximately 30-40%. Read all the details of how I developed the model in the paper Natural Language Processing: A Model to Predict a Sequence of Words presented during the MODSIM 2015 World Conference.
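To give a feel for what next-word prediction involves, here is a minimal Python sketch of a bigram frequency model. It is only an illustration of the general idea and is not the model described in the paper; the function names and the toy corpus are invented for this example.

```python
from collections import Counter, defaultdict


def build_bigram_model(sentences):
    """Count how often each word follows another across the corpus."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for current_word, next_word in zip(words, words[1:]):
            model[current_word][next_word] += 1
    return model


def predict_next(model, text, k=3):
    """Return up to k of the most frequent followers of the last word typed."""
    words = text.lower().split()
    if not words:
        return []
    followers = model.get(words[-1])
    if not followers:
        return []  # The last word was never seen in the corpus.
    return [word for word, _ in followers.most_common(k)]


# Toy corpus, invented purely for illustration.
corpus = [
    "thank you for the follow",
    "thank you for the update",
    "thank you very much",
    "see you for the game",
]
model = build_bigram_model(corpus)
print(predict_next(model, "Thank you"))
```

A model trained on millions of documents would normally use longer word sequences and a fallback strategy for unseen input, which this sketch leaves out.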
Another type of text analysis involves text mining to find associations and patterns hidden within documents. In one project, I discerned trends from five years of workshop reports to assess predictions made by that group. A summary explanation was presented at the MODSIM 2014 World Conference as well as the ITEC conference in Germany. It shows the workflow and variety of analysis methods used. The project was fun and, most importantly, it answered key business questions. More details on the analysis are available in the full paper I presented at MODSIM.
Exploratory Data Analysis
John Tukey wrote the famous book Exploratory Data Analysis as well as this little-known report for the Army called "Exploratory Data Analysis: Past, Present, and Future". He highlights the importance of understanding the data - even if precise results cannot be discerned. Exploratory data analysis is a fascinating area to me because it blends the art of conversation, the skills of data science, and aspects of the domain studied. It is a structured process - it really is - where you can discover information about the data characteristics and relationships among variables. Here are just a couple of my projects showing that wonderful blend.
When severe weather strikes, it can cause a great amount of damage. This exploratory analysis provides my look at open source storm data from the National Oceanic and Atmospheric Administration collected from 1950 to 2011. It looks at 48 categories of severe weather events and their effects on population health and economic indicators. The idea behind this work was to see if an exploratory analysis could provide information to state, county, and city planners so they can prioritize resources and improve their emergency preparedness programs.
Preliminary results of this analysis indicate that weather impacts abide by the Pareto Principle - a small number of events (between four and eight) are responsible for approximately 80% of weather impacts. This insight allows planners to focus on the weather events most likely to occur in their region.
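A Pareto check like this comes down to ranking event types by impact and counting how many are needed to reach a target share of the total. The Python sketch below shows that calculation under stated assumptions: the function name is hypothetical and the impact figures are made up for illustration, not taken from the storm data.

```python
def events_covering_share(impact_by_event, share=0.8):
    """Return the highest-impact event types that together reach `share` of the total."""
    total = sum(impact_by_event.values())
    if total <= 0:
        return []
    # Rank event types from largest to smallest impact.
    ranked = sorted(impact_by_event.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for event, impact in ranked:
        selected.append(event)
        running += impact
        if running / total >= share:
            break
    return selected


# Made-up impact figures, for illustration only.
toy_impacts = {
    "Tornado": 500,
    "Flood": 200,
    "Heat": 120,
    "Hail": 80,
    "Lightning": 50,
    "Fog": 30,
    "Dust Storm": 20,
}
top_events = events_covering_share(toy_impacts)
print(f"{len(top_events)} of {len(toy_impacts)} event types cover 80% of impact: {top_events}")
```

Running the same calculation separately for health and economic measures, or per region, would show whether the same few event types dominate each one.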
Some of my analysis is to help friends and colleagues. I built this quick-look online exploratory app to share some data I found about how data science salaries are affected by the type and location of an organization. This was more fun to share than a link to a report. Other exploratory work has included my use of Hadoop technologies like Hadoop Streaming on a Cloudera platform to cull through millions of lines of open data about economic development. I did this to support my local entrepreneur community and inform the group about business and location trends.
Process Mining and Simulation
Process mining is a form of process modeling introduced by Prof. Wil van der Aalst in the Department of Mathematics & Computer Science at the Eindhoven University of Technology. I took his course and it was amazing. It blends the disciplines of data mining with model-based process analysis to discover insights about processes. In this project, I programmed a Python simulation to create synthetic data for use in open source process mining software. You can read more about it in Improve Test Results by Piloting: How to Simulate Field Data in the Lab that I presented at the MODSIM 2016 World Conference.
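Process mining software generally works from an event log in which each row records a case identifier, an activity name, and a timestamp. As a rough sketch of how a simulation can produce such data, the Python below generates a small synthetic log. The activities, the rework probability, and the function name are all hypothetical and are not taken from the paper.

```python
import random
from datetime import datetime, timedelta


def simulate_event_log(n_cases, seed=42, rework_prob=0.3):
    """Generate a synthetic event log as a list of dicts (one dict per event)."""
    rng = random.Random(seed)  # Seeded so the log is reproducible.
    start = datetime(2016, 1, 1, 8, 0)
    events = []
    for case_id in range(1, n_cases + 1):
        clock = start + timedelta(hours=case_id)  # Stagger case arrivals.
        trace = ["Receive item", "Run test"]
        # Each failed test triggers a rework step followed by a retest.
        while rng.random() < rework_prob:
            trace += ["Rework", "Run test"]
        trace.append("Approve")
        for activity in trace:
            events.append(
                {
                    "case_id": case_id,
                    "activity": activity,
                    "timestamp": clock.isoformat(),
                }
            )
            # Hypothetical activity duration of 5 to 60 minutes.
            clock += timedelta(minutes=rng.randint(5, 60))
    return events


log = simulate_event_log(5)
for event in log[:4]:
    print(event["case_id"], event["timestamp"], event["activity"])
```

Writing these rows to a CSV file with the standard csv module would give a file in the case, activity, timestamp layout that process mining tools commonly import.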
I also build models in tools like AnyLogic to conduct "what-if" analysis and then optimize possible decisions. I like how it provides a realism to simulation by incorporating process-oriented discrete event simulation with the actions of individuals in agent-based simulation and wrapping it all in a global environment that allows for abstractions through system dynamics.
Creating Community and Helping Others
My favorite secular quote is The Man in the Arena by Theodore Roosevelt. I believe in staying active in my community - whether that is down the street or around the world. I just launched an initiative to build a community of data science managers to help one another when management challenges arise. The site Data Science Management is in beta and I am inviting data science professionals to help as thought leaders for the community. Things are moving nicely and the community will officially launch this fall. Perhaps you would like to join this community as a thought leader? Yes! I am interested in helping grow our data science professional managers.
Community takes on multiple dimensions for me. I founded try.py - Learn Python to teach people how to code Python. Further from home, my activities include peer-to-peer mentoring of graduate students and providing insights in feature interviews. I have also designed predictive analytic hackathons to recruit college students and trained new learners in LinkedIn groups. This presentation describes a fun hackathon I designed where participants predict which aliens are friendly in a space exploration scenario. The slides were created in R Slidy and are part of a full package that provides the data, R and Python code, and instructions to run the hackathon. I am also a member of the Leadership Committee of MODSIM World Conference. I am proud to serve as the 2017 Deputy Conference Chair and design an event that highlights the benefits and uses of simulation and how they complement data analytics.
Have a question or comment about any of the work on these Pages? How can I help? Let's talk.