[CS20-TF4DL] 00 Course Overview

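CS20: Tensorflow for Deep Learning Research is a course on how to do deep learning with Tensorflow. We often pick up a good deal of theory, but actually putting it into practice depends on frameworks such as Tensorflow, PyTorch, and MXNet. As the saying goes, a craftsman who wants to do good work must first sharpen his tools. Tensorflow is currently the most popular of these frameworks, and although it is not especially pleasant to use, it is still worth learning.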

Update History

2019.08.02: First draft completed

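Environment Setup

The materials provided by the course are fairly dated, so I start from scratch here rather than following the original.

Base environment (configured on a Mac)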

Python 3.6.8
Tensorflow 1.14
scipy 1.2.0
pandas 0.24.2
numpy 1.16.4
matplotlib 3.1.0

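The virtual environment is created with virtualenv, using the command virtualenv -p /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 --no-site-packages py36. The exact python path depends on how Python was installed; if it was installed with brew, for example, the location will be different.

Note: adding -i https://pypi.tuna.tsinghua.edu.cn/simple to pip commands can speed up downloads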
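
Once the packages are installed, it can be handy to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the course material: the helper names installed_version and report are hypothetical, and the check only compares version prefixes, so 1.14.0 counts as matching 1.14.

```python
import importlib

# Versions listed in this post; the mapping is only an illustration.
EXPECTED = {
    "tensorflow": "1.14",
    "scipy": "1.2.0",
    "pandas": "0.24.2",
    "numpy": "1.16.4",
    "matplotlib": "3.1.0",
}


def installed_version(name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report(expected=EXPECTED):
    """Build one status line per package, comparing by version prefix."""
    lines = []
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif found.startswith(wanted):
            status = "ok"
        else:
            status = "differs"
        lines.append("{:<12} wanted {:<8} found {:<10} {}".format(
            name, wanted, str(found), status))
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Run it with the interpreter inside the py36 environment; a package reported as missing or differs is a hint to reinstall it with the pinned version.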

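What I Will Write

For this open-course series, the posts are mainly study notes. They will mostly cover:

Core concepts
Updates to some of the outdated code and descriptions
Summaries of the points I personally found hard to grasp while learning, to help everyone understand them
Source code published on Github, including the assignments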

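Some writing habits:

Technical terms such as Graph/Session are left untranslated throughout; please don't complain about the English mixed into the Chinese text
The code posted in the articles is very likely an excerpt, but the complete source code is on github; help yourself if you need it
I don't split hairs: if a usage is explicitly discouraged, I simply ignore it (for example, using multiple Graphs is discouraged; if you insist on using several, sorry, you are on your own)
The static blog has no comments; you can get in touch through Weibo, email, and similar channels
Corrections are greatly appreciated and will be credited in the original post; if you would rather not have your name listed, please say so in your correction
This list may grow at any time depending on my mood; if you don't like it, just close the page, no need to tell me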

I hope that, while learning myself, I can also help everyone else

Article Index

Course Outline

Overview of Tensorflow
1. Why Tensorflow?
2. Graphs and Sessions
3. Check out TensorBoard
Operations
1. Basic operations, constants, variables
2. Control dependencies
3. Data pipeline
4. TensorBoard
Linear and Logistic Regression
1. Tensorflow’s Optimizers
2. tf.data
3. Example: Birth rate - life expectancy, MNIST dataset
Eager execution
1. Example: word2vec, linear regression
Variable sharing and managing experiments
1. Interfaces
2. Name scope, variable scope
3. Saver object, checkpoints
4. Autodiff Example: word2vec
Introduction to ConvNet
Convnet in TensorFlow
1. Example: image classification
Convolutional Neural Networks
1. Example: Style Transfer
Variational Auto-Encoders
Recurrent Neural Networks
1. Example: Character-level Language Modeling
Seq2seq with Attention
1. Example: Neural machine translation
Beyond RNNs - Transformer, Tensor2Tensor
Dialogue agents
Reinforcement Learning in Tensorflow
Keras

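Deep Learning Notation Conventions

In the rest of the course, formulas follow the conventions below. The general conventions are:

The superscript $(i)$ denotes the i-th training example
The superscript $[l]$ denotes the l-th layer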

Size

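$m$ - number of examples in the dataset
$n_x$ - input size
$n_y$ - output size (or number of classes)
$n_h^{[l]}$ - number of neurons in layer l
In a for loop, the first and last layers can also be written as $n_x=n_h^{[0]}$ and $n_y=n_h^{[\text{number of layers}+1]}$
$L$ - number of layers in the network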

Objects

$X \in \mathbb{R}^{n_x \times m}$ - input matrix
$x^{(i)} \in \mathbb{R}^{n_x}$ - the i-th example, represented as a column vector
$Y \in \mathbb{R}^{n_y\times m}$ - label matrix
$y^{(i)} \in \mathbb{R}^{n_y}$ - label of the i-th example
$W^{[l]} \in \mathbb{R}^{\text{number of units in next layer} \times \text{number of units in previous layer}}$ - weight matrix; l indicates the layer it belongs to
$b^{[l]} \in \mathbb{R}^{\text{number of units in next layer}}$ - bias vector of layer l
$\hat{y} \in \mathbb{R}^{n_y}$ - output vector of the network, also written $a^{[L]}$, where L is the number of layers in the network
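
To make the shape conventions concrete, here is a small NumPy sketch. The sizes (5 examples, 4 inputs, 3 hidden units, 2 classes) and the random data are made up purely for illustration, and each example is kept as an explicit column of shape (n_x, 1) so the column-vector convention stays visible.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
m, n_x, n_h, n_y = 5, 4, 3, 2

rng = np.random.RandomState(0)

# Input matrix X: one column per training example, shape (n_x, m).
X = rng.randn(n_x, m)

# Label matrix Y: one one-hot column per example, shape (n_y, m).
labels = rng.randint(n_y, size=m)
Y = np.eye(n_y)[:, labels]

# The first example and its label, kept as column vectors.
x_i = X[:, [0]]   # shape (n_x, 1)
y_i = Y[:, [0]]   # shape (n_y, 1)

# Weights are (units in next layer, units in previous layer).
W1 = rng.randn(n_h, n_x)
b1 = np.zeros((n_h, 1))

# The pre-activation of the first layer then has one row per hidden unit.
z1 = W1 @ x_i + b1

print(X.shape, Y.shape, x_i.shape, W1.shape, z1.shape)
```

With this layout, W1 @ X applies the layer to all m examples at once, and the bias column is broadcast across the columns.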

Forward Propagation

$a = g^{[l]}(W_xx^{(i)} +b_1)=g^{[l]}(z_1)$

Here $g^{[l]}$ denotes the activation function of layer l

$\hat{y}^{(i)}=\mathrm{softmax}(W_hh+b_2)$

General activation formula $a_j^{[l]}=g^{[l]}(\sum_k w_{jk}^{[l]}a_k^{[l-1]}+b_j^{[l]})=g^{[l]}(z_j^{[l]})$
Loss function $J(x, W, b, y)$ or $J(\hat{y},y)$
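
The forward pass above can be sketched in NumPy for a network with a single hidden layer. This is only an illustration under a few assumptions: ReLU stands in for the unspecified activation $g$, the hidden activation is called h as in the softmax formula, and the weights are random rather than trained.

```python
import numpy as np


def relu(z):
    """Element-wise ReLU, used here as a stand-in for the activation g."""
    return np.maximum(z, 0.0)


def softmax(z):
    """Column-wise softmax; subtracting the column max avoids overflow in exp()."""
    shifted = z - z.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


def forward(X, W_x, b_1, W_h, b_2, g=relu):
    """Compute y_hat for every column of X in one shot."""
    z_1 = W_x @ X + b_1               # shape (n_h, m)
    h = g(z_1)                        # hidden activations, shape (n_h, m)
    y_hat = softmax(W_h @ h + b_2)    # shape (n_y, m)
    return y_hat


# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.RandomState(0)
n_x, n_h, n_y, m = 4, 3, 2, 5
X = rng.randn(n_x, m)
y_hat = forward(
    X,
    rng.randn(n_h, n_x), np.zeros((n_h, 1)),
    rng.randn(n_y, n_h), np.zeros((n_y, 1)),
)

print(y_hat.shape)        # (2, 5)
print(y_hat.sum(axis=0))  # every column sums to 1, up to rounding
```

Each column of y_hat is a probability distribution over the n_y classes for one example, which is what the loss functions expect as input.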

Examples of loss functions

$J_{CE}(\hat{y},y)=-\sum_{i=0}^m y^{(i)}\log \hat{y}^{(i)}$
$J_1(\hat{y},y)=\sum_{i=0}^m|y^{(i)}-\hat{y}^{(i)}|$
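
As a quick numeric check of the cross-entropy and L1 losses listed above, here is a small sketch in plain Python. The function names and the toy data are hypothetical. For readability each example is stored as one inner list (one row per example), which differs from the column layout used for $X$ and $Y$, and the product of $y^{(i)}$ and $\log \hat{y}^{(i)}$ is read as a dot product over classes.

```python
import math


def cross_entropy(y_hat, y):
    """Sum over examples of -y . log(y_hat), with one inner list per example."""
    total = 0.0
    for y_hat_i, y_i in zip(y_hat, y):
        # Skip classes with a zero label so that log(0) is never evaluated.
        total -= sum(t * math.log(p) for p, t in zip(y_hat_i, y_i) if t != 0)
    return total


def l1_loss(y_hat, y):
    """Sum over examples and classes of |y - y_hat|."""
    total = 0.0
    for y_hat_i, y_i in zip(y_hat, y):
        total += sum(abs(t - p) for p, t in zip(y_hat_i, y_i))
    return total


# Two toy examples with one-hot labels and made-up predictions.
y = [[1, 0], [0, 1]]
y_hat = [[0.5, 0.5], [0.25, 0.75]]

print(cross_entropy(y_hat, y))  # -log(0.5) - log(0.75), roughly 0.98
print(l1_loss(y_hat, y))        # 1.5
```

With one-hot labels, only the predicted probability of the true class contributes to the cross-entropy, while the L1 loss penalizes the error on every class.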

Network Representation

Nodes represent inputs, activation functions, or outputs
Edges represent weights or biases

2019: 小土刀 sets off again