搜索
您的当前位置:首页正文

sum

2020-08-15 来源:易榕旅网
BIOINFORMATICSAPPLICATIONSNOTE

Vol.19no.102003,pages1294–1295DOI:10.1093/bioinformatics/btg135

Longestbiasedintervalandlongestnon-negativesuminterval

LloydAllison

SchoolofComputerScienceandSoftwareEngineering,MonashUniversity,Clayton,Victoria,Australia3800

ReceivedonOctober16,2002;revisedonNovember17,2002;acceptedonJanuary22,2003

ABSTRACT

Summary:Describedisanalgorithmtofindthelongestintervalhavingatleastaspecifiedminimumbiasinasequenceofcharacters(bases,aminoacids),e.g.‘atleast0.95(A+T)-rich’.Itisbasedonanalgorithmtofindthelongestintervalhavinganon-negativesuminasequenceofpositiveandnegativenumbers.Inpractice,itrunsinlineartime;thiscanbeguaranteedifthebiasisrational.Availability:Javacodeofthealgorithmcanbefoundathttp://www.csse.monash.edu.au/∼lloyd/tildeProgLang/Java2/Biased/

Contact:lloyd@bruce.cs.monash.edu.au

Supplementaryinformation:ExamplesofapplicationstoPlasmodiumfalciparumgenomicDNAcanbefoundattheaboveURL.

INTRODUCTION

Regionsoflowinformationcontentinbiologicalse-quencescanbeofconsiderableinterest,forexampleinconnectionwiththecentromeresofPlasmodiumfalci-parum‘Elevenofthe14chromosomescontainedasingleregionof2–3kbwithextremelyhigh(A+T)content(0.97)’Gardneretal.(2002).

Here,analgorithmisdescribedtofindthelongestbiasedinterval(LBI),havingaspecifiedminimumbias,inasequenceofcharacters.Itreliesonanalgorithmwhich,givenasequenceofnumbers,findsthelongestnon-negativesuminterval.

Theproblemconsideredhereisrelatedtothemaximumsuminterval(MSI)problem,thatisgivenasequenceofnumbers,x[0...n−1],findaninterval,x[i...j],suchthatthesumofthenumbersintheintervalisaslargeaspossible.Bentley(1986)attributesthe2DversionofthatproblemtoU.Grenanderandthesolutiontothe1DproblemtoJ.Kadane.TheMSIproblemissolvedbyconsideringcumulativesumsinthesequencefromx[0]tox[k](solidline,Fig.1).AnMSIcorrespondstoamaximumincreaseinsumsforx[0...i−1]tox[0...j]wherei󰀁j+1.Below,arelatedideaisusedtosolvethenewproblem.

1294

NON-NEGATIVESUMINTERVALS

Givenasequenceofnumbers,x[0...n−1],thelongestnon-negativesuminterval(LNNSI)problemistofindaninterval,x[i...j],that(a)sumstoanon-negativetotaland(b)isaslongaspossible.Ittoocanbesolvedbyconsideringcumulativesumsfromx[0]tox[k].Butinthiscase,anycandidateintervalcorrespondstoanon-decreaseinsumsforx[0...i−1]tox[0...j]wherei󰀁j+1.Thecumulativesumiscalculatedinalefttorightscanofthesequence.Auxiliaryarraysareusedtostore:(a)firstoccurrencesofnewminimainthesuccessivecumulativesumsand(b)thepositionsofthosefirstoccurrences(dottedline,Fig.1);thedashedlineinthefigureindicatesthelastpointatwhichthesumisatoraboveeachvalue.Ateachscanposition,j,thealgorithmconsidersthebestcandidateinterval,x[i...j],overiin0...j+1.Thecumulativesumforx[0...i−1]mustbenomorethanthatforx[0...j],infactimustbethesmallestsuchindex;imaginejadvancingalongthesolidlineanditrackingbackandforthalongthedottedlineinthefigurewhilestayingatorjustbelowj’sheight.Thebestoutofallsuchcandidateintervals,x[i...j],isrememberedduringthescantoyieldthesolution.

BIASEDINTERVALS

Givenasequenceofcharacters,s[0...n−1],aselectionofpreferredcharacters,andaminimumproportion(bias),f,ofpreferredcharacters,thelongestbiasedinterval(LBI)problemistofindaninterval,s[i...j],suchthat:(a)theintervalcontainsatleastthespecifiedproportionofpreferredcharacters;and(b)theintervalisaslongaspossible.

Preferredcharactersare‘good’andaregivenapositivescore,1−f.Othercharactersof‘bad’andaregivenanegativescore,−f.Onlycandidateintervalshavenon-negativetotalscores:Foranintervalofggoodandbbadcharacters,g/(g+b)󰀂fisequivalenttog󰀂(g+b)·fandtog·(1−f)−b·f󰀂0,soLBIisequivalenttoanLNNSIproblem.

AsforLNNSI,thesequence(ofcharacters)isscannedfromlefttoright,computingcumulativesumsofscores,

cOxfordUniversityPress2003;allrightsreserved.Bioinformatics19(10)󰀇

LBIandLNNSI

Fig.1.Cumulativesums:solidline:cumulativesum;dottedline:firstbelow(newminima):dashedline:lastabove.

andcandidateintervals,s[i...j].Thealgorithmrunsinlinear-timeinpracticebecausewhenjisincreasedtoj+1,ialmostalwayschangesbyonlyasmallamountunderreasonableconditions:Ifthesumtoj+1increases,thenimay‘retreat’afewstepstoanearlierandhigherminimum(dottedline,Fig.1).Ifthesumtoj+1decreases,thenimay‘advance’byafewstepstoalaterandlowerminimum.Forpathologicalsequencesand‘suitable’choicesoff,thealgorithmcouldinprinciplerunmoreslowly.Inpractice,ascanofchromosome-14(3mb)ofP.falciparumtakeslessthan1s:Javacode,just-in-timecompileranda900MHzprocessor.Itisthuspracticaltoscanevenverylongsequencesformanydifferentvaluesofbias,f.

Thereisanalternativecodingofthealgorithmwhichisguaranteedtoruninlinear-timeiftheminimumbias,f,isarationalnumber:Integerscores,+pand−qsay,canthenbeusedandthelocationsofnewminimastoredinanarrayindexedbyvalue.Thefirstoccurrenceofagivencumulative-sumvaluecanthenbefoundusingrandomaccess,andthestartpositionofthecandidateintervaladjustedinO(1)-time.Worst-casespace-complexityislinearbutthemultiplieristhelargerofpandq.

RuzzoandTompa(1999)gavealinear-timealgorithmtofindallmaximumsumintervals(MSIs)ofasequence,i.e.everyintervalthatcannotbeextendedwithoutdecreasingitssum.Forourproblem,itispossibletohavetwooverlappingbiasedsequences,yetneithercanbeextendedwithoutfallingbelowtheminimumbias:e.g.Consider(A)-rich,biasf=0.6andsequence‘ATAT...ATAT’;anyinstanceof‘ATATA’hasthreeAsoutoffivecharactersandsatisfiestheconditionbutnoextensiondoes.Thedefinitionofalocal,ratherthanaglobal,biasconditionisthereforeproblematic.Howeveradivideandconquerapproachwouldusuallyfindregionsofinterestinpractice:FindaglobalLBI,andrepeatrecursivelyonthesectionsofthesequencetoitsleftanditsright—O(n·logn)-timeonaverageifthesplitsareoftenreasonablyeven,O(n2)-timeatworstifthesplitsarealmostalwaysunbalanced.

REFERENCES

Bentley,J.(1986)ProgrammingPearls.Addison-Wesley,Reading,MA.

Gardner,M.J.,Hall,N.,Fung,E.,White,O.,Berriman,M.,Hy-man,R.W.,Carlton,J.M.,Pain,A.,Nelson,K.E.,Bowman.,S.etal.(2002)GenomesequenceofthehumanmalariaparasitePlas-modiumfalciparum.Nature,419,498–511.

Ruzzo,W.L.andTompa,M.(1999)Alineartimealgorithmforfind-ingallmaximalscoringsequences.7thInt.Conf.onIntelligentSystemsforMolecularBiology(ISMB).Heidelberg,Germany,pp.234–241.

1295

因篇幅问题不能全部显示,请点此查看更多更全内容

Top