Ecole GridUse, 23 June 2004
Global Computing, Desktop Grids and XtremWeb
Franck Cappello
PCRI / INRIA Grand Large, LRI, Université Paris-Sud
fci@lri.fr, www.lri.fr/~fci
Outline
• Introduction: Desktop Grids, foundations for a Large Scale Operating System
• Some architecture issues
• User interfaces (XtremWeb)
• Fault tolerance (XtremWeb)
• Security (trust)
• Final remarks (what we have learned so far)
Several types of GRID
Two kinds of large-scale distributed systems:
• Computing "GRIDs": large sites (computing centers, clusters)
• "Desktop GRIDs" / "Internet Computing" / peer-to-peer systems: PCs running Windows or Linux

Node features:

                   Computing GRID           Desktop GRID / P2P
Number of nodes    < 100                    ~100,000
Stability          stable                   volatile
Security           individual credentials   no authentication
Trust              trusted nodes            untrusted nodes
DGrid for
Large Scale Distributed Computing
• Principle
– Millions of PCs
– Cycle stealing
• Examples
– SETI@HOME
  → Search for Extra-Terrestrial Intelligence
  → 33.79 Teraflop/s (versus 12.3 Teraflop/s for the ASCI White!)
– DECRYPTHON
  → protein sequence comparison
– RSA-155
  → breaking encryption keys
DGrid for
Large Scale P2P File Sharing
• Direct file transfer after index consultation
– Client and server establish a direct connection
– Consulting the index gives the client the IP address of the server
• File storage
– Every server stores entire files
– For fairness, clients work as servers too
• Data sharing
– Non-mutable data
– Several copies, no consistency check
• Interest of the approach
– Proven to scale up to millions of users
– Resilient file access
• Drawbacks of the approach
– Centralized index
– Privacy violated

[Figure: user A and user B, each acting as client + server, exchange files directly after consulting a central index that stores the file-to-IP-address association.]
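A minimal sketch of the centralized-index model above (Napster-style). All class and method names are illustrative, not taken from any real system: peers publish the files they hold, a client asks the index for a server address, and the transfer itself happens directly between the two peers.

    import java.util.*;

    // Hypothetical centralized index: maps a file name to the peers serving it.
    class FileIndex {
        private final Map<String, List<String>> peersByFile = new HashMap<>();

        // A peer acting as a server registers a file it stores entirely.
        void publish(String peerAddress, String fileName) {
            peersByFile.computeIfAbsent(fileName, k -> new ArrayList<>())
                       .add(peerAddress);
        }

        // A peer acting as a client consults the index; the index only
        // returns an address, the file transfer itself is peer-to-peer.
        Optional<String> lookup(String fileName) {
            List<String> peers = peersByFile.getOrDefault(fileName, List.of());
            return peers.isEmpty() ? Optional.empty()
                                   : Optional.of(peers.get(0)); // e.g., first replica
        }
    }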
DGrid for
Large Scale Data Storage/Access
• Principle
– Millions of PCs
– "Disk space" stealing
• Providing global-scale persistent data
– Files are stored and accessed on participant nodes
– Files are stored as segments; segments are replicated for availability
– Examples: Freenet, Intermemory
• Distributed data integration
– Collecting and integrating data coming from numerous devices
– Example: project US (ubiquitous storage)
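The segment-and-replicate scheme above can be pictured with a short sketch. This is illustrative only; the names and the round-robin placement policy are assumptions, not the actual policy of Freenet, US or Intermemory:

    import java.util.*;

    class SegmentPlacer {
        // Cut a file into fixed-size segments.
        static List<byte[]> split(byte[] file, int segmentSize) {
            List<byte[]> segments = new ArrayList<>();
            for (int off = 0; off < file.length; off += segmentSize)
                segments.add(Arrays.copyOfRange(file, off,
                        Math.min(off + segmentSize, file.length)));
            return segments;
        }

        // Replicate segment i on 'replicas' distinct participant nodes,
        // here chosen round-robin, so one node's departure loses no segment.
        static List<Integer> placement(int segment, int replicas, int nodeCount) {
            List<Integer> nodes = new ArrayList<>();
            for (int r = 0; r < replicas; r++)
                nodes.add((segment + r) % nodeCount);
            return nodes;
        }
    }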
DGrid for Networking
• A set of PCs on the Internet (lab networks), coordinated for networking experiments
• A set of PCs on the Internet (ADSL), coordinated for measuring communication performance
• NETI@home: collects network performance statistics from end systems
Computational/Networking DGrids

For which application domains are DGrids used?

[Figure: a client application sends parameters to a coordinator/resource discovery node, which dispatches them to PCs across the network and collects the results.]

A central coordinator schedules tasks and coordinates actions on a set of PCs.

• Computational applications
– SETI@Home, Décrypthon (France)
– Folding@home, Genome@home
– ClimatePrediction, etc.
• Networking applications
– PlanetLab (protocol design, etc.)
– "La grenouille" (DSL performance evaluation)
– Porivo (web server performance testing)
• Research projects
– Javelin, Bayanihan, JET
– Charlotte (based on Java)
– Condor, XtremWeb, P3, BOINC
• Commercial platforms
– Datasynapse, GridSystems
– United Devices, Platform (AC)
– Cosm
Communication/Storage DGrids (P2P)

For which application domains are DGrids used?

[Figure: a client sends a request to a resource discovery/coordination node, which puts it in relation with service providers across the network.]

A resource discovery/lookup engine establishes:
– a relation between a client and server(s)
– a communication channel between two participants

• Communication applications
– Jabber, etc. (instant messaging)
– Napster, Gnutella, Freenet, Kazaa (information sharing)
– Skype (phone over IP)
• Storage applications
– OceanStore, US, etc. (distributed storage)
– Napster, Gnutella, Freenet, Kazaa (information sharing)
• Research projects
– Globe (Tann.), Cx (Javelin), Farsite
– Pastry, Tapestry/Plaxton, CAN, Chord
– XtremWeb
• Other projects
– Cosm, WebOS, WOS
– JXTA (Sun), PtPTL (Intel)
Historical perspective

[Figure: timeline from 1995 to today, with three converging tracks.]

• Distributed systems → clusters: NOW, Beowulf, then clusters of clusters
• Meta-computing → GRID: I-WAY; Globus, Ninf, Legion, Netsolve; DataGrid, GGF; today OGSA and WSRF
• Cycle stealing (Condor) → global computing / P2P: SETI@Home and Internet computing; Javelin, CX, Atlas, Charlotte; Napster, Gnutella, Freenet; DHTs (Pastry, Tapestry, CAN, Chord); XtremWeb (research project started in 1999) → XW2; today BOINC, P3, COSM
Outline
• Introduction: Desktop Grids, foundation for a Large Scale Operating System
• Some architecture issues
• User interfaces
• Fault tolerance
• Security (trust)
• Final remarks (what we have learned so far)
Computational Desktop Grids

[Figure: clients (PCs) send requests and receive results; a coordination/matchmaking/scheduling/fault-tolerance service dispatches work to servers (PCs), which accept tasks and provide results; potential direct communications between servers support parallel applications.]

• Any node may play different roles (client, server, system infrastructure)
• A request may concern computations or data
• An accept likewise concerns computation or data

A very simple problem statement, but one that raises many research issues: scheduling, security, fairness, race conditions, message passing, data storage. Large scale enlarges the problem further: volatility, confidence, etc. A minimal sketch of the request/accept cycle follows.
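The sketch below models the cycle of the figure with two queues; all names are hypothetical, and a real coordinator must additionally handle volatility, scheduling policy and security:

    import java.util.concurrent.*;

    // Minimal request/accept cycle: clients push task requests, workers
    // pull ("accept") them, then provide results that clients collect.
    class MiniCoordinator {
        private final BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
        private final BlockingQueue<String> results = new LinkedBlockingQueue<>();

        void request(String task)   { tasks.add(task); }      // client side
        String accept() throws InterruptedException {         // worker side
            return tasks.take();                              // blocks until a task exists
        }
        void provide(String result) { results.add(result); }  // worker side
        String result() throws InterruptedException {         // client side
            return results.take();
        }
    }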
A study made by IMAG

[Figure: two plots of performance versus rank for the participating teams.]

• Performance of participating teams according to their ranking
• Follows a Zipf law: Performance(rank) = C / rank → the "90%/10%" law
• Up to 4 orders of magnitude between the extremes

Credit: O. Richard, G. Da Costa
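A back-of-the-envelope check of the Zipf claim: with Performance(rank) = C / rank, a small top fraction of the teams contributes most of the aggregate power. The numbers below are illustrative, not the IMAG measurements:

    public class ZipfShare {
        public static void main(String[] args) {
            int teams = 100_000;
            double total = 0, topTenth = 0;
            for (int rank = 1; rank <= teams; rank++) {
                double perf = 1.0 / rank;  // the constant C cancels in the ratio
                total += perf;
                if (rank <= teams / 10) topTenth += perf;
            }
            // Prints roughly 81%: the top 10% of teams deliver ~80% of the power.
            System.out.printf("Top 10%% of teams deliver %.0f%% of the power%n",
                              100 * topTenth / total);
        }
    }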
Some facts about the users

A study made by IMAG: user characteristics of French ADSL.
One week of ADSL as seen by La grenouille.

[Figure: number of connected users and accumulated number of users over the week; connections increase during daytime up to a maximum around noon-2 PM (Sunday). A second plot shows the class sizes.]

• More than 4373 users observed
• Users behave very differently: taking a vector of hours (1 if connected, 0 if not), the 4373 users of Jan 7 belong to 1710 different classes!
• Even classifying connection vectors up to a Hamming distance of 1 between vectors still leaves 508 classes

Credit: O. Richard, G. Da Costa
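The classification above can be sketched as follows: each user is a 24-bit connection vector (one bit per hour), and two users join the same class when their vectors differ in at most one hour. This greedy grouping is purely illustrative, not the IMAG code:

    import java.util.*;

    class ConnectionClasses {
        static int hamming(int a, int b) { return Integer.bitCount(a ^ b); }

        // Group users whose 24-bit connection vectors are within
        // Hamming distance 1 of a class representative.
        static List<List<Integer>> classify(int[] userVectors) {
            List<List<Integer>> classes = new ArrayList<>();
            outer:
            for (int v : userVectors) {
                for (List<Integer> c : classes)
                    if (hamming(c.get(0), v) <= 1) { c.add(v); continue outer; }
                classes.add(new ArrayList<>(List.of(v))); // start a new class
            }
            return classes;
        }
    }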
A study made by IMAG
One week of ADSL as seen by La grenouille:

                                        Mean    Standard dev.
Number of different users per day       4175    203
Number of different users per hour      1402    432
Number of connected days per user       4.3     2.2
Number of connected hours per user      34      40

• The number of users is quite stable across week days
• In one hour, up to ? of the users may change
• Connection frequency to ADSL is quite disparate among users

Credit: O. Richard, G. Da Costa
Architecture
Fundamental mechanisms and scheduling modes

Fundamental components:
• Client
• Resource discovery + coordination (centralized or distributed)
• Worker: agent + engine

• Agent-coordinator interaction mode: PUSH or PULL (non-permanent connection versus connected)
[Figure: jobs are pulled by the agent (XtremWeb) or pushed to it (Datasynapse, Condor).]

• Transport layer must cope with firewalls, NAT and proxies:
– XtremWeb, BOINC: ad hoc
– P3: JXTA
[Figure: two PCs behind firewalls reach a resource across the Internet through a tunnel.]

A sketch of the PULL interaction mode follows.
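Below is a hedged sketch of the PULL mode; the names are illustrative. In PULL mode (XtremWeb, BOINC) the agent initiates every connection, which traverses firewalls and NAT; in PUSH mode the coordinator must be able to open a connection to the agent.

    // The agent repeatedly pulls work over connections it opens itself.
    interface JobSource {
        String nextJob() throws InterruptedException; // blocks until a job is queued
    }

    class PullAgent {
        private final JobSource coordinator;
        PullAgent(JobSource coordinator) { this.coordinator = coordinator; }

        void serve() throws InterruptedException {
            while (true) {
                String job = coordinator.nextJob(); // outbound call: firewall-friendly
                execute(job);                       // run the task locally
                // the result travels back on the same agent-initiated channel
            }
        }
        private void execute(String job) { /* launch the engine on the job */ }
    }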
Architecture
Data transfers and resource types

Data transfer modes: P2P, data server, or coordinator
[Figure: three patterns —
• coordinator (XtremWeb): the client submits job + data to the coordinator; the worker gets job + data from it;
• data server (BOINC): the client puts the data on a data server and submits the job; the worker gets the job from the coordinator and the data from the data server;
• P2P (Datasynapse, P2P systems): the worker gets the job from the coordinator and the data directly from peers.]

What kinds of resources can be harnessed?
• PC: 1 agent, 1 thread
• Dual-processor PC: 1 agent, X threads
• PC cluster: agent compliant with a third-party scheduler
Resource Discovery/Coordination
Resource discovery and orchestration

Resource discovery:
• 1st generation: centralized / hierarchical
• 2nd generation: fully distributed (a search query floods from peer to peer, then the client gets the file directly)
• 3rd generation: DHT (e.g., Chord)

[Figure: peers flooding a search query and then issuing a direct "GET file"; a Chord ring with identifiers 0-7 and nodes 0, 1 and 3, each holding a finger table.]

Finger table of node 1:          Finger table of node 3:
start  interval  successor       start  interval  successor
  2    [2,3)        3              4    [4,5)        0
  3    [3,5)        3              5    [5,7)        0
  5    [5,1)        0              7    [7,3)        0
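A hedged sketch of what the DHT provides, using the ring above (identifier space 0..7, nodes 0, 1 and 3): a key is handled by its successor, the first node clockwise from the key. Real Chord reaches that node in O(log N) hops through the finger tables; here we simply compute the destination:

    import java.util.*;

    class ChordRing {
        private final TreeSet<Integer> nodes = new TreeSet<>(List.of(0, 1, 3));

        // Successor of a key: first node id >= key, wrapping around the ring.
        int successor(int key) {
            Integer n = nodes.ceiling(key);
            return (n != null) ? n : nodes.first();
        }
    }
    // successor(2) == 3 and successor(5) == 0, matching the "successor"
    // columns of the finger tables above.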
Action coordination: centralized or fully distributed
[Figure: in the centralized case, clients submit jobs to a resource discovery + action-order service; in the fully distributed case, peers forward action orders among themselves.]
The Software Infrastructure
of SETI@home II
David P. Anderson
Space Sciences Laboratory
U.C. Berkeley
Goals of a PRC platform

[Figure: several projects (research lab X, university Y, public project Z) run their applications over a shared resource pool.]

• Participants install one program, select projects, specify constraints; all else is automatic
• Projects are autonomous
• Advantages of a shared platform:
– Better instantaneous resource utilization
– Better resource utilization over time
– Faster/cheaper for projects; software is better
– Easier for projects to get participants
– Participants learn more
Distributed computing platforms
• Academic and open-source
– Globus
– Cosm
– XtremWeb
– JXTA
• Commercial
– Entropia
– United Devices
– Parabon
Goals of BOINC
(Berkeley Open Infrastructure for Network Computing)
• Public-resource computing/storage
• Multi-project, multi-application
– Participants can apportion their resources
• Handle fairly diverse applications
• Work with legacy apps
• Support many participant platforms
• Small, simple

Credit: David Anderson
Anatomy of a BOINC project

[Figure: on the project side (resource discovery + coordination): a scheduling server (C++), the BOINC DB (MySQL), a project work manager, data servers (HTTP) and web interfaces (PHP). On the participant side (worker/agent/engine): a core agent (C++) running one app agent per application.]
Principle of Personal Power Plant (P3)

[Figure: clients and workers/agents/engines cooperate without any central component.]

P3 has no central component:
• Resource discovery → JXTA
• Coordination → the client
23 June 2004γλGrand Large
XtremWeb

[Figure: client PCs send requests and jobs to the coordinator (resource discovery + coordination) and get results back; worker PCs request jobs from the coordinator and return results.]

• For research and production
• Multi-application, multi-user
• Multiple execution formats (binary, Java)
• Multi-platform (Linux, Windows, Mac OS X)
• Secured (sandbox + certificates)
• Fault tolerant
XW: Client

• A Java API
→ XWRPC (virtual rexec)
→ job submission
→ result collection
→ monitoring/control
• Interfaces
→ Java API
→ command line (scripts)

[Figure: the client configures the experiment, launches it and collects results through the coordinator; a launcher thread ("EP Launcher") submits tasks while a result thread ("EP Result") gathers results.]

Client-side code from the slide, reconstructed (the loop bound was lost in extraction; N stands for the number of tasks):

    ClientRegister();
    createSession();
    createGroup();
    for (i = 0; i < N; i++) {     // launcher thread
        Ti = createWork();
        submitTask(Ti);
    }
    while (!finished) {           // result thread
        getGroupResult();
        reduction();
    }
    closeSession();

XW: Worker

[Figure: the worker exchanges hostRegister, workRequest, workResult and workAlive messages with the coordinator.]

• Protocol: firewall bypass; RMI/XML-RPC and SSL authentication and encryption
• Applications
→ binary (legacy HPC codes in Fortran or C)
→ Java (recent codes, object codes)
• OS: Linux, SunOS, Mac OS X, Windows
• Auto-monitoring: trace collection

Worker architecture

[Figure: four parallel threads around a producer/consumer job pool:
• communications: duplex link to the coordinator (fault tolerance), fetching jobs and returning results;
• execution launcher: executes waiting jobs in a sandbox (multi-processor capable);
• activity monitor + activity trigger: use and release the computing resource;
• trace logging.]

XW: Coordinator architecture

[Figure: a communication layer (XML-RPC, SSL, TCP) serves client and worker requests; a request collector feeds the scheduler (task selector + priority manager) and a result collector; a volatility detector tracks workers; a database set stores applications, tasks, results and statistics.]

XtremWeb software technologies

Installation prerequisites: database (MySQL), web server (Apache), PHP, Java JDK 1.2.
• Database: SQL, Perl DBI, Java JDBC
• Server: Java
• Communication: XML-RPC, RMI, SSL
• HTTP server: PHP 3-4
• Installation: GNU autotools
• Worker and client: Java

Demo

Distributed image rendering by ray tracing with PovRay.

[Figure: from a submission web page, the user selects parameters; the rendering is decomposed into 100 tasks submitted to the coordinator; the 100 resulting image blocks are composed back into the final image and displayed.]
→ Demo!

XtremWeb application: Pierre Auger Observatory

Understanding the origin of very-high-energy cosmic rays:
• AIRES (Air Showers Extended Simulation): sequential, Monte Carlo; one run takes 5 to 10 hours on a 500 MHz PC
• Estimated number of PCs: ~5000
• Trivial parallelism, master/worker paradigm

[Figure: a PC client reads the air-shower parameter database (Lyon, France) and submits simulations through the XtremWeb server to PC workers over the Internet and LAN.]

Deployment example

Application: AIRES (Auger). Deployment:
• Coordinator at LRI (lri.fr)
• Madison, Wisconsin: 700 workers, Pentium III, Linux (500 MHz + 933 MHz), Condor pool
• Grenoble icluster: 146 workers (733 MHz), PBS
• LRI: 100 workers, Pentium III and Athlon, Linux (500 MHz, 733 MHz, 1.5 GHz), Condor pool

[Figure: the XW coordinator at LRI federates the LRI Condor pool, other labs on the U-psud network, the Grenoble icluster (PBS) and the Madison Condor pool across the Internet.]

Performance evaluation

[Figure: node utilization during the experiments for the configurations WISC-97, WL-113, G-146, WLG-271 and WLG-451; runs last 2 to 4 hours with up to 500 nodes.]

[Figure: task execution times, sorted in decreasing order; a task takes about 18 minutes on a PIII 733 MHz.]
[Figure: task execution time for WISC-97 on PIII 551 MHz and PIII 900 MHz machines, in the 20-40 minute range.]

Performance evaluation

• 1024 tasks executed correctly
• +18 minutes to get the first result
• Between 1 and 4 hours to obtain the 1024 results
• Speed-up for G-146: 126.5

[Figure: number of results versus arrival time from the first one, for WISC-97, WL-113, G-146, WLG-271 and WLG-451; curves end between 2 h and 4 h.]

Execution with a massive fault (disconnection of 150 nodes)

• WLG-300 with 150 faults versus WLG-270
• Loss of the icluster
• Delay in getting the first results (because of the massive fault)

[Figure: node utilization and result arrival time (minutes) under the massive fault.]

Outline
• Introduction: Desktop Grids, foundation for a Large Scale Operating System
• Some architecture issues
• User interfaces
• Fault tolerance
• Security (trust)
• Final remarks (what we have learned so far)

User interfaces for DGrids

There is a demand for applications expressed as:
• bags of independent tasks
• workflows (imperative program execution)
• dataflow graphs (data triggering the execution)
→ and for using existing applications without modification

There is a demand for several user interfaces:
• batch-scheduler-like
• programming APIs
– RPC-like (non-blocking), master/worker, recursive
– MPI → MPICH-V

XtremWeb execution model

→ application + command line + file archive + stdin
→ result: file archive + stdout + stderr

Client job submission: no shared file system!
→ messages are sent to the servers

[Figure: the client's working directory is cloned to the worker over SSL across the Internet; new files come back as diffs over SSL.]

XtremWeb batch mode

    xwhelp
    xwformat [ html | csv | xml ]   (specify the output format)
    xwapps                          (list installed applications)
    xwaddapp                        (add a new application)
    xwrmapp                         (remove an application)
    …                               (get the workers list)
    xwstatus [jobUID [...]]         (retrieve user job status)
    xwget | --xwresult [--xwnoextract] [--xwrmzip] [--xwerase]
           [--xwoverride] [jobUID [...]]                 (retrieve user job results)
    xwremove | --xwdelete | --xwrm | --xwdel [jobUID [...]]   (remove user jobs)
    xwsubmit | --xwjob              (create a new job)

XtremWeb low-level client API (GridRPC-like)

    String submitJob(MobileWork job)
    boolean deleteJob(MobileWork job)
    int jobStatus(MobileWork job)
    MobileWork getJob(MobileWork job)
    MobileResult getJobResult(MobileWork job)
    boolean deleteJobs(Vector jobs)
    Vector getAllJobs()
    Vector getAllResults()
    Vector getAllResults(Vector jobs)
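A hedged usage sketch of the low-level API above. Only the method signatures come from the listing; the interface name, the placeholder types, the status constant and the polling loop are assumptions for illustration:

    // Assumed container for the calls listed above.
    interface ClientAPI {
        String submitJob(MobileWork job);
        int jobStatus(MobileWork job);
        MobileResult getJobResult(MobileWork job);
        boolean deleteJob(MobileWork job);
    }

    class MobileWork   { /* task description: application, args, input files */ }
    class MobileResult { /* result archive, stdout, stderr */ }

    class SubmitAndWait {
        static final int DONE = 3;  // hypothetical status code

        static MobileResult runOne(ClientAPI api, MobileWork job)
                throws InterruptedException {
            api.submitJob(job);                  // non-blocking, GridRPC style
            while (api.jobStatus(job) != DONE)   // poll the coordinator
                Thread.sleep(5_000);
            MobileResult result = api.getJobResult(job);
            api.deleteJob(job);                  // clean up on the coordinator
            return result;
        }
    }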