WeRateDogs @ Twitter Data Exploration
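This is the second project of Udacity's Data Analyst Advanced course. The goal is to gather, assess, and clean data from different sources (existing data, data fetched over HTTP, and JSON data from an API), and to use visualization to uncover valuable information in the data. The collected data includes basic information about each tweet, plus neural-network predictions of what the tweet images contain (which dog breed appears in the picture).
The visualizations produced some interesting results, which I share here.
Changes in the Number of WeRateDogs Tweets
The figure shows that the account owner started running this Twitter account in November 2015. The number of tweets peaked in the second month, averaging more than 12 original tweets per day (the pioneering phase was hard work indeed). It then declined gradually, leveled off from the second quarter of 2016 at about 2 to 3 tweets per day, and has stayed there since.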
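The per-day averages quoted above can be derived from raw tweet timestamps roughly as follows. This is a minimal sketch using only the standard library, not the code used in the original project: the function name `tweets_per_day_by_month` and the sample timestamps are made up for illustration, and it assumes retweets and replies have already been filtered out.

```python
from calendar import monthrange
from collections import Counter
from datetime import datetime


def tweets_per_day_by_month(timestamps):
    """Return {(year, month): average number of tweets per calendar day}."""
    # Count tweets in each (year, month) bucket.
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    # Divide each monthly count by the number of days in that month.
    return {
        (year, month): n / monthrange(year, month)[1]
        for (year, month), n in sorted(counts.items())
    }


# Made-up timestamps, purely illustrative (not the WeRateDogs data).
sample = [
    datetime(2015, 12, 1, 9, 0),
    datetime(2015, 12, 1, 18, 30),
    datetime(2015, 12, 2, 12, 0),
    datetime(2016, 4, 10, 8, 15),
]
averages = tweets_per_day_by_month(sample)
for (year, month), avg in averages.items():
    print(f"{year}-{month:02d}: {avg:.2f} tweets/day")
```

Dividing by the full length of the month understates the rate for a month in which the account was only active for part of the time, such as the month it was created.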
W...
Git and GitHub
Git: Commands
Create Repo
git init: initialize a folder as a new repo, tracking all modifications under it
.git/config: configuration that applies only to this repo
git clone <path> [<new dir name>]
avoid creating a repo nested inside another repo, so check your pwd first
git status
...
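The advice about checking the working directory before `git init` or `git clone` can be automated. The sketch below is one possible approach, not a Git feature: the helper `find_enclosing_repo` is a hypothetical name, and it simply walks up the directory tree looking for a `.git` entry. The temporary directory and the hand-made `.git` folder only stand in for a real repository.

```python
import tempfile
from pathlib import Path


def find_enclosing_repo(start):
    """Return the nearest directory at or above `start` that holds a `.git` entry, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # `.git` is usually a directory, but can be a file (e.g. in submodules).
        if (candidate / ".git").exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()  # stand-in for what `git init` would create
    nested = root / "docs" / "drafts"
    nested.mkdir(parents=True)
    found = find_enclosing_repo(nested)
    print("already inside a repo:", found == root)
```

If the function returns a path, running `git init` in the current folder would create a nested repo.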
Linux Command Line Basic and Shell
Go Into the Shell
Environment: VirtualBox + Vagrant + Git Bash
Terminal and Shell:
Terminal (emulator) displays your keyboard input and the output, but does not itself know how to handle your input
Shell accepts the input passed on by the Terminal, runs the command, and then sends the output back to the Terminal to display
Default shell ...
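The division of labor between terminal and shell can be imitated from Python. In this sketch the Python program plays the terminal's role: it forwards a line of text to the system shell and displays whatever comes back. The helper name `run_in_shell` is hypothetical, and the example assumes a POSIX-like system where `echo` is available.

```python
import subprocess


def run_in_shell(command_line):
    """Hand a command line to the system shell and return (exit code, stdout)."""
    # shell=True makes the shell, not Python, parse and execute the text.
    completed = subprocess.run(
        command_line, shell=True, capture_output=True, text=True
    )
    return completed.returncode, completed.stdout


code, output = run_in_shell("echo hello")
print(code, repr(output))
```

The Python side never interprets `echo hello`; it only passes the text along and shows the result, which is the behavior described for the terminal.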
Sequence Models - Deep Learning Specialization 5
deeplearning.ai by Andrew Ng on Coursera
W1: Recurrent Neural Networks
Building Sequence Model
Notation:
Model Architecture:
Why doesn't a standard network work well?
Inputs and outputs can have different lengths in different samples
Doesn’t share features learned across different positions of text
CNN learns f...
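For contrast with the limitations listed above, here is a minimal sketch of a recurrent step with a scalar hidden state. It is an illustration, not the course's implementation: the function name `run_rnn` and the weight values are arbitrary assumptions. The same parameters are reused at every position, so one model handles sequences of any length.

```python
import math


def run_rnn(xs, w_ax=0.5, w_aa=0.8, b=0.0):
    """Run a scalar RNN over a sequence and return the hidden state after each step."""
    a = 0.0  # initial hidden state
    states = []
    for x in xs:
        # The same w_ax, w_aa and b are shared across all positions.
        a = math.tanh(w_aa * a + w_ax * x + b)
        states.append(a)
    return states


short = run_rnn([1.0, -1.0])
long = run_rnn([0.2, 0.4, 0.6, 0.8, 1.0])
print(len(short), len(long))
```

Both calls use identical weights even though the inputs have lengths 2 and 5, which a fixed-size feed-forward network cannot do without padding.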
Machine Learning - Andrew Ng @ Coursera
Week 1: Introduction
Application of ML
Database mining
large datasets from the growth of automation/web
Applications that can’t be programmed by hand
handwriting recognition, NLP, computer vision
Self-customizing programs
recommendations
Understanding human learning (brain, real AI)
Definition of ML...
Python Packages for Data Science
This post records Python packages for data science that I come across in daily practice or reading, covering the whole machine learning process from visualization and pre-processing to model training and deployment.
This post is updated on an ongoing basis.
Visualization
Scikit-plot
The quickest and easiest way to plot machine learning results, bui...
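Create a Personal Blog in 15 Minutes @ GitHub Pages
As the first post on this new blog, I will start by describing how I created it. Contrary to the title, it took me N times 15 minutes to get this blog running. Guided by the idea of "solve the core problem and avoid extra cognitive load", I eventually settled on this simple and stable setup. This also matches the "rough, then deep, then refined" learning process of approaching something new.
Basic Approach and Prerequisites
Use the free blog generation system of GitHub Pages, set up the necessary site file structure in a GitHub repository, and finally write posts in Markdown with StackEdit.
To set up and use the whole blog, we need to complete the following:
A GitHub account
Pick a style you like from Jekyll Themes and copy it to your own GitHub
Set the site's basic information
Create a post
Write the post with StackEdit
This mentions Mark...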