C++: Quick and dirty way to profile your code

What method do you use when you want to get performance data about specific code paths?

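This method has several limitations, but I still find it very useful. I'll list the limitations (that I know of) up front and let whoever wants to use it do so at their own risk.

The original version I posted over-reported the time spent in recursive calls (as pointed out in the comments on the answer).

It's not thread safe. It wasn't thread safe before I added the code to ignore recursion, and it's even less thread safe now.

Although it's very efficient when it's called many times (millions), it has a measurable effect on the outcome, so the scopes you measure will take longer than the ones you don't.

I use this class when the problem at hand doesn't justify profiling all my code, or when I get some data from a profiler that I want to verify. Basically, it sums up the time you spent in a specific block and, at the end of the program, outputs it to the debug stream (viewable with DbgView), including how many times the code was executed (and the average time spent).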

#pragma once
#include <tchar.h>
#include <windows.h>
#include <sstream>
#include <boost/noncopyable.hpp>

namespace scope_timer {
    // Accumulates the total time and the call count for one named scope.
    class time_collector : boost::noncopyable {
        __int64 total;
        LARGE_INTEGER start;
        size_t times;
        const TCHAR* name;
        bool in_use; // true while a one_time is timing this scope (used to ignore recursion)

        double cpu_frequency()
        { // cache the CPU frequency, which doesn't change.
            static double ret = 0; // store as double so division later on is floating point and not truncating
            if (ret == 0) {
                LARGE_INTEGER freq;
                QueryPerformanceFrequency(&freq);
                ret = static_cast<double>(freq.QuadPart);
            }
            return ret;
        }

    public:
        time_collector(const TCHAR* n)
            : total(0) // initializers are listed in declaration order
            , start(LARGE_INTEGER())
            , times(0)
            , name(n)
            , in_use(false)
        {
        }

        ~time_collector()
        {
            std::basic_ostringstream<TCHAR> msg;
            msg << _T("scope_timer>") << name << _T(" called:");

            double seconds = total / cpu_frequency();
            double average = times ? seconds / times : 0.0; // avoid dividing by zero if the scope never ran

            msg << times << _T(" times total time:") << seconds << _T(" seconds ")
                << _T(" (avg") << average << _T(")\n");
            OutputDebugString(msg.str().c_str());
        }

        void add_time(__int64 ticks)
        {
            total += ticks;
            ++times;
            in_use = false;
        }

        // Returns false if the scope is already being timed (a recursive call).
        bool acquire()
        {
            if (in_use)
                return false;
            in_use = true;
            return true;
        }
    };

    // Times a single pass through a scope and reports it to the collector on destruction.
    class one_time : boost::noncopyable {
        LARGE_INTEGER start;
        time_collector* collector;
    public:
        one_time(time_collector& tc)
        {
            if (tc.acquire()) {
                collector = &tc;
                QueryPerformanceCounter(&start);
            }
            else
                collector = 0;
        }

        ~one_time()
        {
            if (collector) {
                LARGE_INTEGER end;
                QueryPerformanceCounter(&end);
                collector->add_time(end.QuadPart - start.QuadPart);
            }
        }
    };
}

// Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number)
#define TIME_THIS_SCOPE(name) \
    static scope_timer::time_collector st_time_collector_##name(_T(#name)); \
    scope_timer::one_time st_one_time_##name(st_time_collector_##name)
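
For anyone who wants to try the same idea outside Windows, here is a minimal sketch of the collector/guard pair built on std::chrono instead of QueryPerformanceCounter. This is not the class above: the namespace portable_scope_timer, the times() and seconds() accessors, and the use of stderr in place of the debug stream are all assumptions made for illustration. Like the original, it is not thread safe.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

namespace portable_scope_timer { // hypothetical name, not part of the original answer

// Accumulates the total time and the call count for one named scope.
class time_collector {
public:
    explicit time_collector(const char* n) : name_(n) {}
    time_collector(const time_collector&) = delete;
    time_collector& operator=(const time_collector&) = delete;

    // Report once, when the collector is destroyed (stderr stands in for the debug stream).
    ~time_collector() {
        const double avg = times_ ? seconds() / static_cast<double>(times_) : 0.0;
        std::fprintf(stderr, "scope_timer> %s called: %zu times, total time: %f seconds (avg %f)\n",
                     name_, times_, seconds(), avg);
    }

    // Returns false when the scope is already being timed (recursion), so it is counted once.
    bool acquire() {
        if (in_use_) return false;
        in_use_ = true;
        return true;
    }

    // Must only be called after a successful acquire().
    void add_time(std::chrono::steady_clock::duration elapsed) {
        assert(in_use_);
        total_ += elapsed;
        ++times_;
        in_use_ = false;
    }

    std::size_t times() const { return times_; }
    double seconds() const { return std::chrono::duration<double>(total_).count(); }

private:
    const char* name_;
    std::chrono::steady_clock::duration total_ = std::chrono::steady_clock::duration::zero();
    std::size_t times_ = 0;
    bool in_use_ = false;
};

// Times one pass through a scope; does nothing if the collector is already in use.
class one_time {
public:
    explicit one_time(time_collector& tc)
        : collector_(tc.acquire() ? &tc : nullptr),
          start_(std::chrono::steady_clock::now()) {}
    one_time(const one_time&) = delete;
    one_time& operator=(const one_time&) = delete;

    ~one_time() {
        if (collector_) collector_->add_time(std::chrono::steady_clock::now() - start_);
    }

private:
    time_collector* collector_;
    std::chrono::steady_clock::time_point start_;
};

} // namespace portable_scope_timer
```

To use it, keep a static time_collector next to the code being measured and create a one_time at the top of the scope. A nested one_time on the same collector is ignored, which is how the recursion case is kept from being counted twice.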

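Note that everything below is written specifically for Windows.

I also have a timer class that I wrote, which uses QueryPerformanceCounter() to get high-precision timings, but with a slight difference. My timer class doesn't dump the elapsed time when the Timer object falls out of scope. Instead, it accumulates the elapsed times into a collection. I added a static member function, Dump(), which creates a table of elapsed times, sorted by timing category (specified in Timer's constructor as a string), along with some statistical analysis such as the mean elapsed time, standard deviation, max and min. I also added a Clear() static member function (it appears as Reset() in the listing below), which clears the collection and lets you start over again.

How to use the Timer class (pseudocode):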

int CInsertBuffer::Read(char* pBuf)
{
    // TIMER NOTES: Avg Execution Time = ~1 ms
    Timer timer("BufferRead");
    : :
    return -1;
}

Sample output:

Timer Precision = 418.0095 ps

=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    AddTrade           500       7 ms      14 us     12 us     24 us
    BufferRead         511       1:19.25   0.16 s    621 ns    2.48 s
    BufferWrite        516       511 us    991 ns    482 ns    11 us
    ImportPos Loop     1002      18.62 s   19 ms     77 us     0.51 s
    ImportPosition     2         18.75 s   9.38 s    16.17 s   13.59 s
    Insert             515       4.26 s    8 ms      5 ms      27 ms
    recv               101       18.54 s   0.18 s    2603 ns   1.63 s
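
The per-category numbers in a table like this boil down to a few lines of arithmetic. The sketch below is an assumption-laden illustration rather than the Timer code itself: TimingStats and summarize() are hypothetical names, the middle value of the sorted samples is reported as a median (which is what the "Mean Time" column appears to hold, since the listing takes it from the sorted times), and the standard deviation is the sample standard deviation about the arithmetic average.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of the elapsed times (in seconds) recorded for one timing category.
struct TimingStats {
    std::size_t trials;
    double total;
    double average;
    double median;
    double stddev;
};

// Takes the samples by value because they are sorted to find the median.
inline TimingStats summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());

    const std::size_t n = samples.size();
    const double total = std::accumulate(samples.begin(), samples.end(), 0.0);
    const double average = total / static_cast<double>(n);

    // Median: the middle element, or the mean of the two middle elements for an even count.
    const std::size_t mid = n / 2;
    const double median = (n % 2 == 0) ? (samples[mid - 1] + samples[mid]) / 2.0
                                       : samples[mid];

    // Sample standard deviation (n - 1 in the denominator); zero for a single trial.
    double sumSq = 0.0;
    for (double s : samples) sumSq += (s - average) * (s - average);
    const double stddev = n > 1 ? std::sqrt(sumSq / static_cast<double>(n - 1)) : 0.0;

    return TimingStats{n, total, average, median, stddev};
}
```

A large gap between the average and the median, as in the BufferRead and recv rows, usually means a few very slow trials dominate the total.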

File Timer.inl:

#include <windows.h>
#include <algorithm>
#include <functional>
#include <iterator>
#include <map>
#include <numeric>
#include <set>
#include <string>
#include <vector>
#include <math.h>
#include "x:\utils\stlext\stringext.h"
#include "x:\utils\stlext\algorithmext.h"

class Timer
{
public:
    // Starts timing; name is the category the elapsed time is filed under.
    Timer(const char* name)
    {
        label = std::safe_string(name);
        QueryPerformanceCounter(&startTime);
    }

    // Stops timing and records the elapsed time (in seconds) under the label.
    virtual ~Timer()
    {
        QueryPerformanceCounter(&stopTime);
        __int64 clocks = stopTime.QuadPart-startTime.QuadPart;
        double elapsed = (double)clocks/(double)TimerFreq();
        TimeMap().insert(std::make_pair(label,elapsed));
    }

    static std::string Dump(bool ClipboardAlso=true)
    {
        static const std::string loc = "Timer::Dump";

        if( TimeMap().empty() )
        {
            return "No trials\r\n";
        }

        std::string ret = std::formatstr("\r\nTimer Precision = %s\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str());

        // get a list of keys
        typedef std::set<std::string> keyset;
        keyset keys;
        std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

        size_t maxrows = 0;

        typedef std::vector<std::string> strings;
        strings lines;

        // int, not size_t: a '*' width or precision in a printf format expects an int argument
        static const int tabWidth = 9;

        std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===", tabWidth*2, tabWidth*2, "Item", tabWidth, tabWidth, "Trials", tabWidth, tabWidth, "Ttl Time", tabWidth, tabWidth, "Avg Time", tabWidth, tabWidth, "Mean Time", tabWidth, tabWidth, "StdDev");
        ret += std::formatstr("\r\n%s\r\n", head.c_str());
        if( ClipboardAlso )
            lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");
        // dump the values for each key
        for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
        {
            time_type ttl = 0;
            ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
            size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
            if( num > maxrows )
                maxrows = num;
            time_type avg = ttl / num;

            // compute the median of the sorted times (shown in the "Mean Time" column)
            std::vector<time_type> sortedTimes;
            std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
            std::sort(sortedTimes.begin(), sortedTimes.end());
            size_t mid = num / 2;
            double median = ( (num % 2) == 0 ) ? (sortedTimes[mid-1]+sortedTimes[mid])/2.0 : sortedTimes[mid];
            // compute variance about the arithmetic average
            double sum = 0.0;
            if( num > 1 )
            {
                for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt )
                    sum += pow(*timeIt-avg,2.0);
            }
            // compute std dev
            double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

            std::string count = std::formatstr("%u", (unsigned)num);
            ret += std::formatstr("    %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n", tabWidth*2, tabWidth*2, key->c_str(), tabWidth, tabWidth, count.c_str(), tabWidth, tabWidth, format_elapsed(ttl).c_str(), tabWidth, tabWidth, format_elapsed(avg).c_str(), tabWidth, tabWidth, format_elapsed(median).c_str(), tabWidth, tabWidth, format_elapsed(stddev).c_str());
            if( ClipboardAlso )
                lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n", key->c_str(), count.c_str(), format_elapsed(ttl).c_str(), format_elapsed(avg).c_str(), format_elapsed(median).c_str(), format_elapsed(stddev).c_str()));
        }
        ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str());

        if( ClipboardAlso )
        {
            // dump header row of data block
            lines.push_back("");
            {
                std::string s;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        s.append("\t");
                    s.append(*key);
                }
                s.append("\r\n");
                lines.push_back(s);
            }

            // blow out the flat map of time values to a separate vector of times for each key
            typedef std::map<std::string, std::vector<time_type> > nodematrix;
            nodematrix nodes;
            for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                nodes[time->first].push_back(time->second);

            // dump each data point
            for( size_t row = 0; row < maxrows; ++row )
            {
                std::string rowDump;
                for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                {
                    if( key != keys.begin() )
                        rowDump.append("\t");
                    if( nodes[*key].size() > row )
                        rowDump.append(std::formatstr("%f", nodes[*key][row]));
                }
                rowDump.append("\r\n");
                lines.push_back(rowDump);
            }

            // dump to the clipboard
            std::string dump;
            for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
            {
                dump.append(*s);
            }

            if( OpenClipboard(0) )
            {
                EmptyClipboard();
                HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
                if( hg != 0 )
                {
                    char* buf = (char*)GlobalLock(hg);
                    if( buf != 0 )
                    {
                        std::copy(dump.begin(), dump.end(), buf);
                        buf[dump.length()] = 0;
                        GlobalUnlock(hg);
                        SetClipboardData(CF_TEXT, hg);
                    }
                }
                CloseClipboard();
            }
        }

        return ret;
    }

    static void Reset()
    {
        TimeMap().clear();
    }

    // Formats a duration given in seconds using a unit that suits its magnitude.
    static std::string format_elapsed(double d)
    {
        if( d < 0.00000001 )
        {
            // show in ps with 4 digits
            return std::formatstr("%0.4f ps", d * 1000000000000.0);
        }
        if( d < 0.00001 )
        {
            // show in ns
            return std::formatstr("%0.0f ns", d * 1000000000.0);
        }
        if( d < 0.001 )
        {
            // show in us
            return std::formatstr("%0.0f us", d * 1000000.0);
        }
        if( d < 0.1 )
        {
            // show in ms
            return std::formatstr("%0.0f ms", d * 1000.0);
        }
        if( d <= 60.0 )
        {
            // show in seconds
            return std::formatstr("%0.2f s", d);
        }
        if( d < 3600.0 )
        {
            // show in min:sec (seconds zero-padded to width 5, e.g. 1:05.25)
            return std::formatstr("%01.0f:%05.2f", floor(d/60.0), fmod(d,60.0));
        }
        // show in h:min:sec
        return std::formatstr("%01.0f:%02.0f:%05.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
    }

private:
    // Performance counter frequency in ticks per second, queried once and cached.
    static __int64 TimerFreq()
    {
        static __int64 freq = 0;
        static bool init = false;
        if( !init )
        {
            LARGE_INTEGER li;
            QueryPerformanceFrequency(&li);
            freq = li.QuadPart;
            init = true;
        }
        return freq;
    }
    LARGE_INTEGER startTime, stopTime;
    std::string label;

    typedef std::string key_type;
    typedef double time_type;
    typedef std::multimap<key_type, time_type> Times;
    // static Times times;
    static Times& TimeMap()
    {
        static Times times_;
        return times_;
    }

    struct extract_key : public std::unary_function<Times::value_type, key_type>
    {
        std::string operator()(Times::value_type const & r) const
        {
            return r.first;
        }
    };

    struct extract_val : public std::unary_function<Times::value_type, time_type>
    {
        time_type operator()(Times::value_type const & r) const
        {
            return r.second;
        }
    };

    struct match_key : public std::unary_function<Times::value_type, bool>
    {
        match_key(key_type const & key_) : key(key_) {}
        bool operator()(Times::value_type const & rhs) const
        {
            return key == rhs.first;
        }
    private:
        match_key& operator=(match_key&) { return * this; }
        const key_type key;
    };

    struct accum_key : public std::binary_function<time_type, Times::value_type, time_type>
    {
        accum_key(key_type const & key_) : key(key_), n(0) {}
        time_type operator()(time_type const & v, Times::value_type const & rhs) const
        {
            if( key == rhs.first )
            {
                ++n;
                return rhs.second + v;
            }
            return v;
        }
    private:
        accum_key& operator=(accum_key&) { return * this; }
        const Times::key_type key;
        mutable size_t n;
    };
};

File stringext.h (provides the formatstr() function):

#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>
#include <string>

namespace std
{
    /* ---

    Formatted Print

        template<class C>
        int strprintf(basic_string<C>* pString, const C* pFmt, ...);

        template<class C>
        int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args);

    Returns :

        # characters printed to output

    Effects :

        Writes formatted data to a string. strprintf() works exactly the same as sprintf(); see your
        documentation for sprintf() for details of operation. vstrprintf() also works the same as sprintf(),
        but instead of accepting a variable parameter list it accepts a va_list argument.

    Requires :

        pString is a pointer to a basic_string<>

    --- */

    // bufferSize is the size of the buffer in characters
    template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr);

    template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr)
    {
#   ifdef SECURE_VSPRINTF
        return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#   else
        return _vsnprintf(buffer, bufferSize-1, format, argptr);
#   endif
    }

    template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr)
    {
#   ifdef SECURE_VSPRINTF
        return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#   else
        return _vsnwprintf(buffer, bufferSize-1, format, argptr);
#   endif
    }

    template<class Type, class Traits>
    inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args)
    {
        // prologue
        static const size_t ChunkSize = 1024;
        size_t curBufSize = 0;
        outStr.erase();

        if( !format )
        {
            return 0;
        }

        // keep trying to write the string to an ever-increasing buffer until
        // either we get the string written or we run out of memory
        // (note: args is reused on each retry; strictly portable code would va_copy it first)
        for( ;; )
        {
            // allocate a local buffer
            curBufSize += ChunkSize;
            std::ref_ptr<Type> localBuffer = new Type[curBufSize];
            if( localBuffer.get() == 0 )
            {
                // we ran out of memory -- nice goin'!
                return -1;
            }
            // format output to local buffer; the size is passed in characters, not bytes
            int i = vprintf_generic(localBuffer.get(), curBufSize, format, args);
            if( -1 == i )
            {
                // the buffer wasn't big enough -- try again
                continue;
            }
            else if( i < 0 )
            {
                // something weird happened -- bail
                return i;
            }
            // if we get to this point the string was written completely -- stop looping
            outStr.assign(localBuffer.get(),i);
            return i;
        }
    }

    // provided for backward-compatibility
    template<class Type, class Traits>
    inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args)
    {
        return vstringprintf(*outStr, format, args);
    }

    template<class Char, class Traits>
    inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(outString, format, args);
        va_end(args);
        return retval;
    }

    // old function provided for backward-compatibility
    template<class Char, class Traits>
    inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(*outString, format, args);
        va_end(args);
        return retval;
    }

    /* ---

    Inline Formatted Print

        string formatstr(const char* Format, ...);

    Returns :

        Formatted string

    Effects :

        Writes formatted data to a string. formatstr() works the same as sprintf(); see your
        documentation for sprintf() for details of operation.

    --- */

    template<class Char>
    inline std::basic_string<Char> formatstr(const Char * format, ...)
    {
        // match the character type of the format string (works for both char and wchar_t)
        std::basic_string<Char> outString;

        va_list args;
        va_start(args, format);
        vstringprintf(outString, format, args);
        va_end(args);
        return outString;
    }
}

File algorithmext.h (provides the transform_if() function):

/* ---

Transform
25.2.3

    template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate>
    OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred)

    template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
    OutputIterator transform_if(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator result, BinaryOperation binary_op, Predicate pred)

Requires:

    T is of type EqualityComparable (20.1.1)
    op and binary_op have no side effects

Effects :

    For every iterator i in [first1, last1) for which pred(*i) is true, assigns through result, in order, a value equal to one of:
        1: op( *i )
        2: binary_op( *i, *(first2 + (i - first1)) )

Returns :

    result advanced by the number of elements for which pred was true

Complexity :

    At most last1 - first1 applications of op or binary_op

--- */

template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first,
                            InputIterator last,
                            OutputIterator result,
                            UnaryFunction f,
                            Predicate pred)
{
    for (; first != last; ++first)
    {
        if( pred(*first) )
            *result++ = f(*first);
    }
    return result;
}

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
OutputIterator transform_if(InputIterator1 first1,
                            InputIterator1 last1,
                            InputIterator2 first2,
                            OutputIterator result,
                            BinaryOperation binary_op,
                            Predicate pred)
{
    for (; first1 != last1 ; ++first1, ++first2)
    {
        if( pred(*first1) )
            *result++ = binary_op(*first1,*first2);
    }
    return result;
}

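I do my profiling by creating two classes: cProfile and cProfileManager.

cProfileManager holds all the data that results from cProfile.

cProfile has the following requirements:

cProfile has a constructor which initializes the current time.
cProfile has a destructor which sends the total time the class was alive to cProfileManager.

To use these profile classes, I first make an instance of cProfileManager. Then I put the code block that I want to profile inside curly braces. Inside the curly braces, I create a cProfile instance. When the code block ends, cProfile sends the time it took for the block to finish to cProfileManager.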

Example Code
Here is an example of the code (simplified):

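class cProfile
{
public:
    cProfile()
    {
        TimeStart = GetTime(); // GetTime() is not shown in this simplified example
    }

    ~cProfile()
    {
        // ProfileManager points at the cProfileManager instance (not shown)
        ProfileManager->AddProfile (GetTime() - TimeStart);
    }

private:
    float TimeStart;
};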

To use cProfile, I would do something like this:

int main()
{
    printf("Start test");
    {
        cProfile Profile;
        Calculate();
    }
    ProfileManager->OutputData();
}

or this:

void foobar()
{
    cProfile ProfileFoobar;

    foo();
    {
        cProfile ProfileBarCheck;
        while (bar())
        {
            cProfile ProfileSpam;
            spam();
        }
    }
}

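Technical Note

This code is actually an abuse of the way scoping, constructors and destructors work in C++. cProfile exists only inside the block scope (the code block we want to test). Once the program leaves the block scope, cProfile records the result.

Additional Enhancements

You can add a string parameter to the constructor so you can do something like this:
cProfile Profile("Profile for complicated calculation");
You can use a macro to make the code look cleaner (be careful not to abuse this. Unlike our other abuses of the language, macros can be dangerous to use).

Example:

#define START_PROFILE { cProfile Profile;
#define END_PROFILE }
cProfileManager can check how many times a block of code is called. But you would need an identifier for the block of code. The first enhancement can help identify the block. This can be useful when the code you want to profile is inside a loop (like the second example above). You can also add the average, fastest and longest execution time the code block took.
Don't forget to add a check so that profiling can be skipped based on whether you are in debug mode.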
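
Since the manager itself is never shown, here is a rough sketch of how the enhancements above could fit together. Everything in it is an assumption for illustration: the names ProfileManagerSketch, ScopedProfile and BlockStats are made up, the manager is passed by reference instead of through a global pointer, and std::chrono::steady_clock stands in for GetTime().

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <map>
#include <string>
#include <utility>

// Per-block statistics: call count plus total, fastest and slowest time in seconds.
struct BlockStats {
    std::size_t calls = 0;
    double total = 0.0;
    double fastest = std::numeric_limits<double>::max();
    double slowest = 0.0;
    double average() const { return calls ? total / static_cast<double>(calls) : 0.0; }
};

// Hypothetical stand-in for cProfileManager: collects timings keyed by a block label.
class ProfileManagerSketch {
public:
    void AddProfile(const std::string& label, double seconds) {
        assert(seconds >= 0.0);
        BlockStats& s = stats_[label];
        ++s.calls;
        s.total += seconds;
        s.fastest = std::min(s.fastest, seconds);
        s.slowest = std::max(s.slowest, seconds);
    }

    // Throws std::out_of_range if nothing was recorded under the label.
    const BlockStats& Get(const std::string& label) const { return stats_.at(label); }

    void OutputData() const {
        for (const auto& kv : stats_) {
            std::printf("%s: %zu calls, avg %f s, fastest %f s, slowest %f s\n",
                        kv.first.c_str(), kv.second.calls, kv.second.average(),
                        kv.second.fastest, kv.second.slowest);
        }
    }

private:
    std::map<std::string, BlockStats> stats_;
};

// Hypothetical stand-in for cProfile: reports its own lifetime when the block ends.
class ScopedProfile {
public:
    ScopedProfile(ProfileManagerSketch& manager, std::string label)
        : manager_(manager), label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}
    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

    ~ScopedProfile() {
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start_;
        manager_.AddProfile(label_, elapsed.count());
    }

private:
    ProfileManagerSketch& manager_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Because every ScopedProfile created with the same label lands in the same BlockStats entry, a profile object declared inside a loop body ends up with one call counted per iteration.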

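Well, I have two code snippets. In pseudocode they look like this (it's a simplified version; I actually use QueryPerformanceFrequency):

First snippet:

Timer timer = new Timer
timer.Start

Second snippet:

timer.Stop
show elapsed time

A bit of hot-key kung fu, and I can say how much time this piece of code stole from my CPU.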
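
In C++ the two snippets might expand to something like the sketch below. It is only a guess at the shape: the Stopwatch class and its member names are hypothetical, and std::chrono::steady_clock is used here instead of the QueryPerformanceFrequency/QueryPerformanceCounter pair mentioned above.

```cpp
#include <cassert>
#include <chrono>

// Minimal Start/Stop timer; Stop() returns the elapsed time in seconds.
class Stopwatch {
    using clock = std::chrono::steady_clock;

public:
    void Start() {
        start_ = clock::now();
        running_ = true;
    }

    double Stop() {
        assert(running_); // Stop() without a matching Start() is a usage error
        elapsed_ = std::chrono::duration<double>(clock::now() - start_).count();
        running_ = false;
        return elapsed_;
    }

    // Result of the most recent Start()/Stop() pair.
    double ElapsedSeconds() const { return elapsed_; }

private:
    clock::time_point start_{};
    bool running_ = false;
    double elapsed_ = 0.0;
};
```

The first snippet then becomes a Stopwatch declaration followed by Start(), and the second becomes Stop() followed by printing ElapsedSeconds().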

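I wrote a simple cross-platform class called nanotimer for this reason. The goal was to be as lightweight as possible, so that it does not interfere with the performance of the actual code by adding too many instructions and thereby affecting the instruction cache. It can get microsecond accuracy on Windows, Mac and Linux (and some Unix variants).

Basic usage: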

plf::nanotimer t;
t.start();

// stuff

double elapsed = t.get_elapsed_ns(); // Get nanoseconds

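start() also restarts the timer when necessary. "Pausing" the timer can be achieved by storing the elapsed time, then restarting the timer when "unpausing" and adding the new reading to the stored result the next time you check the elapsed time.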
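
The pause/unpause recipe just described can be sketched as a small wrapper. This is an illustration under assumptions, not part of nanotimer: PausableTimer and its pause()/unpause() members are hypothetical names, and the sketch keeps time with std::chrono::steady_clock so that it stands on its own.

```cpp
#include <cassert>
#include <chrono>

// Stores the time accumulated before each pause and adds the running
// segment on top of it whenever the elapsed time is queried.
class PausableTimer {
    using clock = std::chrono::steady_clock;

public:
    // Starts (or restarts) the timer and discards any stored time.
    void start() {
        stored_ = clock::duration::zero();
        begin_ = clock::now();
        running_ = true;
    }

    // "Pause": bank the time elapsed so far.
    void pause() {
        if (running_) {
            stored_ += clock::now() - begin_;
            running_ = false;
        }
    }

    // "Unpause": restart the underlying clock reading, keeping the banked time.
    void unpause() {
        if (!running_) {
            begin_ = clock::now();
            running_ = true;
        }
    }

    // Banked time plus the current running segment, in nanoseconds.
    double get_elapsed_ns() const {
        clock::duration total = stored_;
        if (running_) total += clock::now() - begin_;
        assert(total >= clock::duration::zero());
        return std::chrono::duration<double, std::nano>(total).count();
    }

private:
    clock::duration stored_ = clock::duration::zero();
    clock::time_point begin_{};
    bool running_ = false;
};
```

While paused, repeated calls to get_elapsed_ns() return the same value; after unpause() the reading continues from where it stopped.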

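I have a quick-and-dirty profiling class that can be used for profiling even in the tightest inner loops. The emphasis is on extremely light weight and simple code. The class allocates a two-dimensional array of fixed size. I then add "checkpoint" calls all over the place. When checkpoint N is reached immediately after checkpoint M, I add the elapsed time (in microseconds) to the array item [M,N]. Since this is designed for profiling tight loops, I also have a "start of iteration" call that resets the "last checkpoint" variable. At the end of the test, the dumpResults() call produces the list of all pairs of checkpoints that followed each other, along with the total time accounted for and unaccounted for.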
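
A minimal sketch of that checkpoint matrix follows, under a few assumptions: the class name CheckpointProfiler, the template parameter for the number of checkpoints and the hits()/micros() accessors are all made up for illustration, std::chrono::steady_clock is the time source, and the accounted/unaccounted totals are left out to keep it short.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// N is the number of distinct checkpoint ids (0 .. N-1).
template <std::size_t N>
class CheckpointProfiler {
    using clock = std::chrono::steady_clock;
    static constexpr std::size_t kNone = N; // marks "no previous checkpoint"

public:
    // Call at the top of each iteration so the last checkpoint of the previous
    // iteration is not paired with the first checkpoint of this one.
    void iterationStart() { last_ = kNone; }

    // Adds the time since the previous checkpoint M to the cell [M][id].
    void checkpoint(std::size_t id) {
        assert(id < N);
        const clock::time_point now = clock::now();
        if (last_ != kNone) {
            micros_[last_][id] +=
                std::chrono::duration<double, std::micro>(now - lastTime_).count();
            ++hits_[last_][id];
        }
        last_ = id;
        lastTime_ = now;
    }

    double micros(std::size_t from, std::size_t to) const { return micros_[from][to]; }
    std::size_t hits(std::size_t from, std::size_t to) const { return hits_[from][to]; }

    // Lists every pair of checkpoints that followed each other at least once.
    void dumpResults() const {
        for (std::size_t m = 0; m < N; ++m)
            for (std::size_t n = 0; n < N; ++n)
                if (hits_[m][n])
                    std::printf("%zu -> %zu: %zu hits, %.3f us\n",
                                m, n, hits_[m][n], micros_[m][n]);
    }

private:
    std::array<std::array<double, N>, N> micros_{};
    std::array<std::array<std::size_t, N>, N> hits_{};
    std::size_t last_ = kNone;
    clock::time_point lastTime_{};
};
```

The per-checkpoint cost is one clock read and two array updates, which is what keeps this usable inside tight loops. The storage grows with the square of the number of checkpoints, so N is best kept small.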

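The Code Profiler and Optimizations article has a lot of information about C++ code profiling, and it also has a free download link to a program/class that shows you a graphical representation of the different code paths/methods.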