Correlation Analysis of Spatial Time Series Datasets A Filter-and-Refine Approach
- 格式:pdf
- 大小:280.96 KB
- 文档页数:12
CorrelationAnalysisofSpatialTimeSeries
Datasets:AFilter-and-RefineApproach
PushengZhang,YanHuang,ShashiShekhar,andVipinKumar
ComputerScience&EngineeringDepartment,UniversityofMinnesota,200UnionStreetSE,Minneapolis,MN55455,U.S.A.[pusheng|huangyan|shekhar|kumar]@cs.umn.edu
Abstract.Aspatialtimeseriesdatasetisacollectionoftimeseries,eachreferencingalocationinacommonspatialframework.Correlationanalysisisoftenusedtoidentifypairsofpotentiallyinteractingelementsfromthecrossproductoftwospatialtimeseriesdatasets.However,thecomputationalcostofcorrelationanalysisisveryhighwhenthedimen-sionofthetimeseriesandthenumberoflocationsinthespatialframe-worksarelarge.Thekeycontributionofthispaperistheuseofspatialautocorrelationamongspatialneighboringtimeseriestoreducecompu-tationalcost.Afilter-and-refinealgorithmbasedonconing,i.e.groupingoflocations,isproposedtoreducethecostofcorrelationanalysisoverapairofspatialtimeseriesdatasets.Cone-levelcorrelationcomputationcanbeusedtoeliminate(filterout)alargenumberofelementpairswhosecorrelationisclearlybelow(orabove)agiventhreshold.Elementpaircorrelationneedstobecomputedforremainingpairs.Usingexperimen-talstudieswithEarthsciencedatasets,weshowthatthefilter-and-refineapproachcansavealargefractionofthecomputationalcost,particularlywhentheminimalcorrelationthresholdishigh.
1Introduction
Spatio-temporaldatamining[14,16,15,17,13,7]isimportantinmanyappli-
cationdomainssuchasepidemiology,ecology,climatology,orcensusstatistics,
wheredatasetswhicharespatio-temporalinnatureareroutinelycollected.The
developmentofefficienttools[1,4,8,10,11]toexplorethesedatasets,thefocus
ofthiswork,iscrucialtoorganizationswhichmakedecisionsbasedonlarge
spatio-temporaldatasets.
Aspatialframework[19]consistsofacollectionoflocationsandaneighbor
relationship.Atimeseriesisasequenceofobservationstakensequentiallyin
Thecontactauthor.Email:pusheng@cs.umn.edu.Tel:1-612-626-7515ThisworkwaspartiallysupportedbyNASAgrantNo.NCC21231andbyArmyHighPerformanceComputingResearchCentercontractnumberDAAD19-01-2-0014.Thecontentofthisworkdoesnotnecessarilyreflectthepositionorpolicyofthegovernmentandnoofficialendorsementshouldbeinferred.AHPCRCandMinnesotaSupercomputerInstituteprovidedaccesstocomputingfacilities.time[2].Aspatialtimeseriesdatasetisacollectionoftimeseries,eachrefer-
encingalocationinacommonspatialframework.Forexample,thecollection
ofglobaldailytemperaturemeasurementsforthelast10yearsisaspatialtime
seriesdatasetoveradegree-by-degreelatitude-longitudegridspatialframework
onthesurfaceoftheEarth.D11D21D1
D31D4
1D2
D22D12
D21D12tt
tt
ttt
t
Fig.1.AnIllustrationoftheCorrelationAnalysisofTwoSpatialTimeSeriesDatasets
Correlationanalysisisimportanttoidentifypotentiallyinteractingpairsof
timeseriesacrosstwospatialtimeseriesdatasets.Astronglycorrelatedpairof
timeseriesindicatespotentialmovementinoneserieswhentheothertimeseries
moves.Forexample,ElNino,theanomalouswarmingoftheeasterntropical
regionofthePacific,hasbeenlinkedtoclimatephenomenasuchasdroughts
inAustraliaandheavyrainfallalongtheEasterncoastofSouthAmerica[18].
Fig.1illustratesthecorrelationanalysisoftwospatialtimeseriesdatasetsD1
andD2.D1has4spatiallocationsandD2has2spatiallocations.Thecross
productofD1andD2has8pairsoflocations.Ahighlycorrelatedpair,i.e.
(D12,D21),isidentifiedfromthecorrelationanalysisofthecrossproductofthe
twodatasets.
However,acorrelationanalysisacrosstwospatialtimeseriesdatasetsiscom-
putationallyexpensivewhenthedimensionofthetimeseriesandnumberof
locationsinthespacesarelarge.Thecomputationalcostcanbereducedbyre-
ducingtimeseriesdimensionalityorreducingthenumberoftimeseriespairsto
betested,orboth.Timeseriesdimensionalityreductiontechniquesincludedis-
creteFouriertransformation[1],discretewavelettransformation[4],andsingular
vectordecomposition[6].
Thenumberofpairsoftimeseriescanbereducedbyacone-basedfilter-and-
refineapproachwhichgroupstogethersimilartimeserieswithineachdataset.
Afilter-and-refineapproachhastwologicalphases.Thefilteringphasegroups
similartimeseriesasconesineachdatasetandcalculatesthecentroidsand
boundariesofeachcone.Theseconeparametersallowcomputationoftheupper
andlowerboundsofthecorrelationsbetweenthetimeseriespairsacrosscones.
ManyAll-TrueandAll-Falsetimeseriespairscanbeeliminatedattheconelevel
toreducethesetoftimeseriespairstobetestedbytherefinementphase.We
proposetoexploitaninterestingpropertyofspatialtimeseriesdatasets,namely
spatialauto-correlation[5],whichprovidesacomputationallyefficientmethodto
determinecones.ExperimentswithEarthsciencedata[12]showthatthefilter-
and-refineapproachcansavealargefractionofcomputationalcost,especially