Overview

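Experiment name and content:
Computing sine values with multiple threads
Method: compute the result with the Taylor series expansion of the sine function, parallelized with Pthreads. The inputs are the angle in radians, the number of terms in the expansion, and the number of threads.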

Formula (the program sums the terms n = 0, 1, ..., N): sin(x) ≈ Σ_{n=0}^{N} (-1)^n · x^(2n+1) / (2n+1)!
Experimental environment
CPU: 2 × Intel(R) Xeon(R) Gold 5218
Memory: 256 GB
Disk: 3 × 600 GB in a RAID 5 array, 1.2 TB usable
Operating system: CentOS 7
Build environment: GCC 4.8.5, OpenMPI 1.10.7
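Problem analysis
This is a compute-intensive numerical problem, so it can be parallelized by partitioning the data.
The N terms are divided evenly among the threads. Each thread steps through the terms with a fixed stride equal to the thread count.
Each thread first computes a local sum val, then merges it into the global res under a mutex (you have to add the lock yourself).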

For filling in the actual lab report, I recommend using an agent.
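Design
1. Overall idea: the parameters are passed on the command line. The main thread parses them and creates the worker threads. Each worker computes a partial sum and adds it to a global shared variable, and the main thread then prints the result res.
2. Flowchart: omitted here.
3. Pseudocode: give the .cpp and .pbs files to an agent and have it derive the pseudocode (remember to paste in the images).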
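Implementation:
1. cal_sin.cpp (paste screenshot)
2. cal_sin.pbs (paste screenshot)
  1. Request 1 node
  2. Change the thread count by changing the ppn value
  3. Merge standard output and standard error into one file
  4. Change to the directory from which the job was submitted
  5. Get the number of allocated cores (equal to ppn)
  6. Run the program, time it, and save the result to run.log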
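Results and analysis
1. Data:
  1. Without lock (optional): ppn = 16, 8, 4, 2, 1
    Screenshots of cal_sin.oxxx for each ppn value
    Screenshot of the data (10 runs, 10 real times)
    Speedup data
    Speedup curve
  2. With lock: ppn = 16, 8, 4, 2, 1
    Screenshots of cal_sin.oxxx for each ppn value
    Screenshot of the data (10 runs, 10 real times)
    Speedup data
    Speedup curve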
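2. Analysis: as the thread count grows, job time improves substantially at first. Beyond a certain thread count the gains level off and the overhead grows. One possible cause is heavier lock contention as more threads compete for the mutex, which creates a bottleneck. Each thread acquires the lock only once, however, so thread creation and scheduling overhead may also contribute.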
Personal summary: leave it to the agent.

Procedure

Connect with SSH (the password is your student ID). PowerShell works directly.

ssh -p 9922 username@172.28.9.54

After logging in, run:

cd ~
ls
mkdir -p data # create it if ls does not list it
cat > cal_sin.cpp

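Open the local cal_sin.cpp in an editor, copy the code, and paste it into PowerShell. Then press Ctrl+D on an empty line to end the input and save the file.
Verify the result: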

cat cal_sin.cpp

FTP to the cluster does not seem to connect directly, and I didn't want to download FileZilla, so I used copy-paste instead.

With lock (cal_sin.cpp):
The .cpp provided with the lab has no lock, so I recommend adding a mutex:

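#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

double res;              // global result, shared by all threads
pthread_mutex_t mutex;   // protects updates to res

// Arguments passed to each worker thread
typedef struct Args {
    double radian;   // angle in radians (double: a float would lose precision)
    long max_n;      // index of the last term, terms 0..max_n are summed
    long begin;      // first term handled by this thread (= thread index)
    long step;       // stride between terms (= number of threads)
} Args;

void *cal(void *arg);

int main(int argc, char *argv[]) {
    if (argc != 4) {
        printf("Parameters error: radian N threads!\n");
        exit(1);
    }

    long i;
    double radian = atof(argv[1]);
    long max_n = atol(argv[2]);
    int threads = atoi(argv[3]);
    Args *arg;

    if (threads <= 0) {
        printf("Parameters error: threads must be > 0!\n");
        exit(1);
    }

    pthread_t *pid;
    pid = (pthread_t *)malloc(threads * sizeof(pthread_t));
    if (pid == NULL) {
        perror("malloc");
        exit(1);
    }
    pthread_mutex_init(&mutex, NULL);
    res = 0.0;
    // Thread i handles terms i, i+threads, i+2*threads, ...
    for (i = 0; i < threads; i++) {
        arg = (Args *)malloc(sizeof(Args));   // freed by the worker thread
        arg->radian = radian;
        arg->max_n = max_n;
        arg->begin = i;
        arg->step = threads;
        if (pthread_create(&pid[i], NULL, cal, (void *)arg) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }
    // Wait for every thread before reading res
    for (i = 0; i < threads; i++) {
        pthread_join(pid[i], NULL);
    }
    printf("%lf\n", res);
    pthread_mutex_destroy(&mutex);
    free(pid);
    return 0;
}

void *cal(void *_arg) {
    long i, j;
    double val, tmp;
    Args *arg = (Args *)_arg;

    val = 0.0;   // local partial sum, needs no locking
    for (i = arg->begin; i <= arg->max_n; i += arg->step) {
        // tmp = x^(2i+1) / (2i+1)!, built factor by factor to avoid overflow
        tmp = 1.0;
        for (j = 1; j <= 2*i+1; j++) {
            tmp *= arg->radian / j;
        }
        // Sign (-1)^i: odd terms are subtracted, even terms are added
        if (i % 2 > 0) {
            val -= tmp;
        } else {
            val += tmp;
        }
    }
    // Merge the partial sum into the global result under the mutex
    pthread_mutex_lock(&mutex);
    res += val;
    pthread_mutex_unlock(&mutex);
    free(arg);
    return NULL;
}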

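Compile and test

g++ -pthread -o cal_sin cal_sin.cpp
./cal_sin 1.57 10 1     # sin(1.57), prints about 1.000000
./cal_sin 3.14 100 2    # sin(3.14), prints about 0.001593
./cal_sin 0.78 1000 4   # sin(0.78), prints about 0.703279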

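Submit the job and record the data

Use the .pbs script provided with the lab as-is. Upload it to the cluster the same way as the .cpp file.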

qsub cal_sin.pbs # submit
qstat # state C means the job has completed
cat run.log # view the computed result
cat cal_sin.oxxx # view the run times; the job ID follows the "o"

Only the real time needs to be recorded. Run each ppn value 10 times and take the average.

The script defaults to ppn=8. With the lock, the average real time was about 7.88 s.

Change the ppn value in the script (I used 1, 2, 4, 8, and 16) and repeat the job-submission step above.

vim cal_sin.pbs

Compute the speedup

Speedup = single-thread run time (ppn = 1) / multi-thread run time, using the average real time of each ppn value.