torch
is_tensor
as_tensor
,从arraylike创建tensor,共享内存,默认开启梯度计算from_numpy
,从ndarray拷贝tensorzeros(size)
zeros_like()
ones(size)
ones_like()
随机生成tensor:
bernoulli()
normal(mean,std)
poisson()
\(out_i \sim Poisson(input_i)\)rand(size)
U(0,1)rand_like()
randn(size)
N(0,1)randn_like()
randperm()
0到n-1的随机排列
Tensor
API
与ndarray一致,可互相转化,不同在于Tensor可以在GPU上运行
常用不同数据类型的Tensor:
- torch.FloatTensor, torch.DoubleTensor
- torch.ShortTensor, torch.IntTensor, torch.LongTensor
默认为FloatTensor
构造函数
torch.tensor(data,dtype,device,requires_grad,pin_memory)
参数列表:
data
list,ruple或ndarray,或tensordtype(torch.dtype)
tensor的类型,不填的话默认data的类型,可选项:torch.float
,torch.int
,其他device(torch.device)
tensor的存储位置,不填若data是tensor则在同位置,否则默认在CPU上。常用值为:torch.device('cuda:0')
。(笔者注:此参数是在CPU上训练模型用的)requires_grad(bool)
给autograd
用的,默认False
,若需要使用,可以调用requires_grad_()
来修改值pin_memory(bool)
只对CPU类的Tensor有效,分配张量到固定内存(Pinned Memory),或称页锁定内存(Page-locked Memory),默认为False
,开启后Tensor读取速度增加,需要大内存的计算机才能使用该参数。
对于numpy转tensor,使用torch.as_tensor()
避免数据拷贝
Tensor.item()
只对有一个元素的tensor有效,使用中括号索引tensor得到的结果依然是tensor,需要使用item()才能转成python的数字。
>>> x = torch.tensor([1.0])>>> x.item()1.0
Data
Dataset
torch.utils.data.Dataset
内的抽象类,需要重载__len__
和__getitem__
两个函数。
Map-style Dataset
A map-style dataset is one that implements the __getitem__()
and __len__()
protocols, and represents a map from (possibly non-integral) indices/keys to data samples.
For example, such a dataset, when accessed with dataset[idx]
, could read the idx-th image and its corresponding label from a folder on the disk.
Iterable-style datasets
An iterable-style dataset is an instance of a subclass of IterableDataset
that implements the __iter__()
protocol, and represents an iterable over data samples. This type of datasets is particularly suitable for cases where random reads are expensive or even improbable, and where the batch size depends on the fetched data.
For example, such a dataset, when called iter(dataset)
, could return a stream of data reading from a database, a remote server, or even logs generated in real time.
数据加载
对于迭代类型的数据,数据加载顺序由用户定义的__iter__()
决定。
对于映射型数据,使用torch.utils.data.Sampler
来加载数据。
注:这块内容torch的docs讲得不清楚,还需结合具体项目学习
Dataloader
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, *, prefetch_factor=2, persistent_workers=False)
dataset (Dataset)
– 数据集batch_size (int, optional)
– 数据分批,每批多少个shuffle (bool, optional)
– 是否打乱collate_fn (callable, optional)
– 从关联式数据集分批读取时使用,可填default_collate
nn.Module
模型的抽象类,模板:
import torch.nn as nnimport torch.nn.functional as Fclass Model(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x))
torch.optim
里面有很多优化算法,需要先建立一个optimizer
对象然后它会根据计算出的梯度来更新参数。
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)optimizer = optim.Adam([var1, var2], lr=0.0001)
支持算法:Adadelta,Adagrad,Adam,AdamW,SparseAdam,Adamax,ASGD,LBFGS,NAdam,RAdam,RMSprop,Rprop,SGD
模型加载和保存
保存的对象是model类
torch.save(model,'./model.pth')
load_model = torch.load('model.pth')