
[ale] Syntax problem



Thanks everyone for the help!  The syntax matters less now that I know about the different commands.  I asked because I am learning how to manage our cluster, and I need to learn the Torque/Maui batch queuing setup.  The mpirun script works fine, but it doesn't clean up very well if something goes wrong.  TeamHPC has an mpiexec script that wraps jobs for Torque and Maui and is supposed to clean up processes on the nodes, but I haven't learned that system yet.

There is a "fornodes" script that runs a command on all the nodes, and I was trying to pass it the command y'all have been working on.  Passing the command through ssh didn't work, but the same command worked when I ssh'd in and ran it at the shell prompt, so I knew I was quoting the command wrong somehow.  Thanks for all the options!
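A likely cause of that ssh behavior is that the local shell expands backticks or `$(...)` before ssh sends the command, so `pidof namd` runs on the head node instead of on each compute node. Single-quoting the remote command defers that expansion to the remote shell. The sketch below is only a hypothetical stand-in for `fornodes` (its actual implementation isn't shown here); it assumes passwordless ssh, and the `node01`...`node19` hostnames and the `SSH`/`NODES` variables are placeholders.

```shell
#!/bin/sh
# Sketch: run one command on every compute node over ssh.
# NODES defaults to hypothetical hostnames node01..node19; override it
# with your real host list. Set SSH=echo to preview what would be sent.
SSH=${SSH:-ssh}
NODES=${NODES:-$(printf 'node%02d ' $(seq 1 19))}

run_on_nodes() {
    for node in $NODES; do
        # "$1" travels as a single argument. If the caller single-quoted
        # it, $(...) and backticks are still unexpanded here, and only
        # the remote shell on $node evaluates them.
        $SSH "$node" "$1"
    done
}
```

For example, `run_on_nodes 'kill -9 $(pidof namd)'` runs pidof on each node. Written with double quotes instead, `"kill -9 $(pidof namd)"` would be expanded once on the head node, and every node would receive the head node's PID list (or an empty one).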

Just to pass on some good news:  The cluster uses Opteron 250 CPUs in a dual-CPU 1U configuration.  We have 19 slave nodes and one head node.  The switch is an SMC 8624T, and the special fast interconnect is the Ammasso 1100 GigE RDMA NIC.

Using the molecular dynamics package NAMD with its apoa1 benchmark input (the ApoA1 protein system, about 92K atoms), we can complete a full nanosecond of simulation time in 0.96 days with the Ammasso MPI libraries.  If I scale up using standard TCP communication instead of the Ammasso MPI, a full nanosecond takes 1.31 days.  So even on our smaller cluster of 19 nodes, we benefit considerably from the Ammasso fast interconnect.  We save 0.35 days (about 504 minutes) per simulated nanosecond, and we need a lot of nanoseconds to actually see changes, so the fast interconnect shows its worth.  I compiled with gcc 3.4.3, so once I recompile with the PathScale compiler we should see roughly another 10-15% improvement.

The larger the cluster, the more the interconnect's cost and latency matter.  Our budget was too small to justify the typical fast interconnects like InfiniBand or Myrinet.  We needed something cheaper, and Ammasso came in at $450 per card; it works as a regular GigE card and as a low-latency RDMA interconnect at the same time.  Jeffrey Layton was the guy who pointed me to the Ammasso product, and I am so glad he did.  Because of it, we spent about 1/7 of our budget on the interconnect instead of 1/3.  I would say our interconnect cost was about $500 per node once you account for the GigE switch.
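The per-nanosecond savings follow directly from the two apoa1 timings: 1.31 - 0.96 = 0.35 days, and 0.35 days × 1440 minutes/day = 504 minutes. A small sketch that does this conversion with awk's floating-point math (the `ns_savings` name is hypothetical) and also reports the relative speedup:

```shell
#!/bin/sh
# Convert NAMD wall-clock times (days per simulated nanosecond) into
# the time saved per nanosecond and the relative speedup.
# Usage: ns_savings FAST_DAYS SLOW_DAYS
ns_savings() {
    awk -v fast="$1" -v slow="$2" 'BEGIN {
        saved = slow - fast    # days saved per simulated ns
        printf "saved: %.2f days = %.0f minutes\n", saved, saved * 24 * 60
        printf "speedup: %.2fx (%.0f%% less wall time)\n", slow / fast, 100 * saved / slow
    }'
}

# Ammasso MPI vs. plain TCP, using the timings quoted above.
ns_savings 0.96 1.31
```

This prints `saved: 0.35 days = 504 minutes` followed by `speedup: 1.36x (27% less wall time)`.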

I'll quit rambling and get to work,
Dow




-----Original Message-----
From: "James P. Kinney III" <jkinney at localnetsolutions.com>
To: ale at ale.org
Sent: Aug 24, 2005 4:25 PM
To: Atlanta Linux Enthusiasts <ale at ale.org>
Subject: Re: [ale] Syntax problem

On Wed, 2005-08-24 at 16:00 -0400, Randy C. Ramsdell wrote:

> > > I realize that, but it kills the process nevertheless. But, kill
> > > doesn't take into account multiple processes whose name is namd. I
> > > thought he said kill all PIDs named namd.   
> > 
> > Hmm. My kill seems to require PIDs and not names. 
> 
> As does mine. I am not sure, but maybe what I wrote isn't clear. I think
> you may have thought I meant to leave the "pidof" command in place.
> After reading my paragraph, I can see that I was not clear about that.
> Anyway, both iterations work. One kills a single PID, using pidof, and
> the other kills multiple processes called "namd", without pidof. 

Yes. killall -9 namd will do the same thing as kill -9 `pidof namd`.
pidof generates a list of all the PIDs for a named process, and kill can
take a space-delimited list of PIDs.

example:
#/sbin/pidof spamd
29061 29060 29059 29058 29057 29000 16155

(I need to find a way to fix this: spamassassin is used by Evolution,
and it keeps spawning new children on each use until the system runs
out of memory.)

#pgrep spamd
16155
29000
29054
29057
29058
29059
29060
29061

So the output of pidof is suitable for dumping into kill directly,
while pgrep's one-PID-per-line output is better for feeding a for loop
in a shell script.
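Both styles can be sketched as small shell functions. The helper names `kill_by_name` and `report_pids` are hypothetical, and the sketch assumes Linux `pidof` and procps `pgrep`/`ps`. The first feeds pidof's single line straight to kill, relying on pidof's non-zero exit status when nothing matches so kill is never run with an empty PID list. The second reads pgrep's one-PID-per-line output in a loop so each PID can be inspected before anything is killed.

```shell
#!/bin/sh
# Hypothetical helper: signal every process named $1 (e.g. namd).
# $2 is the signal name and defaults to TERM; pass KILL for kill -9.
kill_by_name() {
    sig=${2:-TERM}
    # pidof prints all matching PIDs on one line and exits non-zero
    # when nothing matches, so kill never gets an empty argument list.
    if pids=$(pidof "$1"); then
        # $pids is unquoted on purpose: kill needs one argument per PID.
        kill -s "$sig" $pids
    else
        echo "no $1 processes found" >&2
        return 1
    fi
}

# Hypothetical helper: print PID, owner and elapsed time for each
# process named exactly $1, one line per PID, without killing anything.
report_pids() {
    pgrep -x "$1" | while read -r pid; do
        # A trailing '=' on each -o column suppresses the ps header.
        # The PID may have exited since pgrep ran; skip it if so.
        info=$(ps -o user= -o etime= -p "$pid") || continue
        echo "$pid $info"
    done
}
```

A typical sequence would be `report_pids namd` to review what is running, then `kill_by_name namd KILL`. Defaulting to TERM rather than KILL is a deliberate choice here: it gives processes a chance to exit cleanly, with KILL kept for the ones that don't.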

But as was posted earlier, pkill and pgrep look to be even more
efficient, and they have extra options (such as matching PIDs by user
ID) that make them even more useful.
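The user-ID filtering can be sketched with procps `pkill` and `pgrep`: `-u` restricts matches to the given effective user ID and `-x` requires an exact process-name match, so cleaning up your own stray namd processes on a shared node leaves other users' jobs alone. The helper names `preview_mine` and `kill_mine` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the caller's own processes named exactly
# $1 as "PID name" lines, to preview what kill_mine would hit.
preview_mine() {
    pgrep -l -u "$(id -u)" -x "$1"
}

# Hypothetical helper: send KILL to the caller's own processes named
# exactly $1; same-named processes owned by other users are skipped.
# pkill's exit status (1 = nothing matched) is passed through.
kill_mine() {
    pkill -KILL -u "$(id -u)" -x "$1"
}
```

Run as an ordinary user, `preview_mine namd` followed by `kill_mine namd` only touches that user's namd processes, even on a node where other users are running namd too.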
> 
> > So to restate the above ssh string:
> > 
> > ssh machine "killall -9 namd"
> > 
> > would also work.

-- 
James P. Kinney III          \Changing the mobile computing world/
CEO & Director of Engineering \          one Linux user         /
Local Net Solutions,LLC        \           at a time.          /
770-493-8244                    \.___________________________./
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7


No sig.