Q超越兔子的蜗牛O--逸云沙鸥Linux


逸云沙鸥


A Great Experiment: A Perl Program That Used 27 GB of Memory

2009-12-24 17:35:34 | Category: Bioinformatics

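Problem description (a little involved): lxc wants to process a batch of data in MATLAB. chr1.depth holds site coordinates (with gaps) and the mapped coverage depth at each site; chr1.methy_rate holds site coordinates and the methylation probability at each site. First, sites in chr1.methy_rate with a probability greater than 0.7 are taken as positive samples, and sites with a probability of zero as negative samples. Each sample coordinate is then looked up in chr1.depth; for every site whose positions 50 bp upstream and downstream are all contiguous, the coverage depths of those positions are collected and written in this format:

1) Positive sample match: +1 1:2 2:2 3:2 4:2 5:2 6:2 7:2 8:2 9:2 10:2 ...

2) Negative sample match: -1 1:2 2:2 3:2 4:2 5:2 6:2 7:2 8:2 9:2 10:2 ...

Each sample point contributes 100 values, written as one line of the result file. The awkward part is that both input files are large, about 700 MB each. To trade memory for speed, I used hashes and read the files into memory in one pass. I wrote the script and submitted it to the cluster, and the computation finished in about half an hour. The positive and negative sample files came out as follows: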

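(Old L's program, written with the C++ standard library, has been running for 16 hours and has so far produced only a dozen or so MB of output.)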

124M Dec 24 15:46 sample.po

1.3G Dec 24 15:46 sample.ng

[Sigh]: I now truly appreciate how necessary and how useful a cluster is for bioinformatics data analysis!
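
The selection rule and the output format described above can be sketched in a few lines of Perl. This is only an illustration: the helper names `classify_site` and `format_sample` are hypothetical and do not appear in the original script, and the sample depths are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: label a site by its methylation probability.
# Returns '+1' (positive sample), '-1' (negative sample), or undef (ignored).
sub classify_site {
    my ($rate) = @_;
    return '-1' if $rate == 0;
    return '+1' if $rate > 0.7;
    return;
}

# Hypothetical helper: build one output line, "label 1:d1 2:d2 ...".
sub format_sample {
    my ($label, @depths) = @_;
    my @pairs;
    for my $i (0 .. $#depths) {
        push @pairs, ($i + 1) . ":$depths[$i]";
    }
    return join ' ', $label, @pairs;
}

# Made-up example: a site with rate 0.85 and three depth values.
my $label = classify_site(0.85);
print format_sample($label, 2, 2, 3), "\n" if defined $label;
```

In the real task each line carries 100 depth values rather than three.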

The key script is attached below as a memento:

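#!/usr/bin/perl
# compare_qunero.pl
use strict;
use warnings;

if (4 != @ARGV) {
    die "usage: compare_qunero.pl depth_file rate_file sample_p sample_n\n";
}

my ($depth_file, $rate_file, $pos_file, $neg_file) = @ARGV;
print "$depth_file $rate_file $pos_file $neg_file\n";

open my $DEPTH, '<', $depth_file or die "cannot open $depth_file: $!";
open my $RATE,  '<', $rate_file  or die "cannot open $rate_file: $!";
open my $POS,   '>', $pos_file   or die "cannot open $pos_file: $!";
open my $NEG,   '>', $neg_file   or die "cannot open $neg_file: $!";

# %index maps a coordinate to its line number in the depth file;
# %depth maps a coordinate to its coverage depth.
# The depth file is assumed to be sorted by ascending coordinate.
my (%index, %depth);
my $loc = 0;
while (<$DEPTH>) {
    my ($pos, $val) = split;
    next unless defined $val;    # skip blank or malformed lines
    $index{$pos} = $loc++;
    $depth{$pos} = $val;
}
close $DEPTH;

# Positive samples: rate > 0.7; negative samples: rate == 0.
my (%rate_p, %rate_n);
while (<$RATE>) {
    my ($key, $value) = split;
    next unless defined $value;  # skip blank or malformed lines
    if ($value == 0) {
        $rate_n{$key} = 0;
    } elsif ($value > 0.7) {
        $rate_p{$key} = $value;
    }    # everything else is ignored
}
close $RATE;

# Write one line per site whose window key-50 .. key+50 has no gaps:
# the label, then 100 index:depth pairs for positions key-50 .. key+49.
# Sites are visited in numeric coordinate order.
sub write_samples {
    my ($out, $label, $sites) = @_;
    foreach my $key (sort { $a <=> $b } keys %$sites) {
        my ($lo, $hi) = ($key - 50, $key + 50);
        # Both ends must be present; an index difference of 100 then
        # means no coordinate in between is missing.
        next unless exists $index{$lo} && exists $index{$hi};
        next unless 100 == $index{$hi} - $index{$lo};
        my $start = $key - 51;
        print {$out} "$label ";
        for my $i (1 .. 100) {
            print {$out} "$i:$depth{$start + $i} ";
        }
        print {$out} "\n";
    }
}

write_samples($POS, '+1', \%rate_p);
write_samples($NEG, '-1', \%rate_n);

close $POS or die "cannot close $pos_file: $!";
close $NEG or die "cannot close $neg_file: $!";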
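
The script tests whether a window is free of gaps by comparing line indices in the depth file rather than walking every position. A minimal sketch of that idea follows, using a flank of 2 instead of 50 and made-up coordinates; the helper name `window_is_contiguous` is hypothetical, and the sketch assumes, as the script does, that coordinates are unique and sorted in ascending order.

```perl
use strict;
use warnings;

# Toy depth table: sorted coordinates with a gap between 14 and 17.
my @coords = (10, 11, 12, 13, 14, 17, 18, 19, 20, 21);
my %index;
@index{@coords} = (0 .. $#coords);    # coordinate -> line index

# A window centre-$flank .. centre+$flank has no gaps exactly when both
# ends are present and their line indices differ by 2 * $flank.
sub window_is_contiguous {
    my ($centre, $flank) = @_;
    my ($lo, $hi) = ($centre - $flank, $centre + $flank);
    return 0 unless exists $index{$lo} && exists $index{$hi};
    return $index{$hi} - $index{$lo} == 2 * $flank ? 1 : 0;
}

print window_is_contiguous(12, 2), "\n";    # window 10 .. 14, all present
print window_is_contiguous(15, 2), "\n";    # window 13 .. 17, 15 and 16 missing
print window_is_contiguous(14, 2), "\n";    # window 12 .. 16, upper end missing
```

Checking that both ends exist before subtracting matters: a missing end would otherwise count as index 0 and could pass the comparison by accident.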
