Jacqueline Chen PANEL “Unleashing the Power of Computing and Data at Scale”

– So welcome. I think we have an exciting panel today. At Purdue we like to do things at scale, so today our panel is going to discuss compute and data at scale: how can we unleash the power of this technology? I’ll start by briefly introducing our panelists and then we’ll jump in. So we have our distinguished
visitor from Sandia National Labs, Dr. Jackie Chen. She’s a distinguished member of the technical staff at Sandia and a member of the National Academy of Engineering. She leads a group at Sandia developing direct numerical simulation of fluid flow and combustion. We had the pleasure of listening to her talk yesterday; if you missed it, it’s available online, so please go check it out. She’s also a fellow of the Combustion Institute and the APS. Our next panelist is Carlo Scalo, an assistant professor
of mechanical engineering here at Purdue. His work is on acoustics and turbulent flow, and he’s the founder of a startup company called HySonic. Next on the panel is Arezoo Ardekani, an associate professor of mechanical engineering here at Purdue. She’s interested in complex fluids and multiphase flow. She has won several awards from the Society of Women Engineers, and she holds a CAREER award from the National Science Foundation and the Presidential Early Career Award, which President Obama gave her. John August is our next panelist. He’s an associate professor of aeronautics and
astronautics here at Purdue. He’s interested in
experimental, computational, and theoretical fluid dynamics; he runs the whole range. He’s a fellow of ASME and of the American Institute of Aeronautics and Astronautics. And last but not least is Professor Charlie Bouman. He’s the Showalter Professor of Electrical and Computer Engineering and Biomedical Engineering here at Purdue. His interest is in computational imaging and sensing, and his group developed the first commercial model-based reconstruction system for medical tomography applications. He’s a fellow of several societies. So without further ado, let’s
get started on the topic. I’m going to sit down, and we’re going to talk about the opportunities and the challenges presented by the convergence of cyberinfrastructure, including high performance computing systems, communications, the ability to do cloud computing, and data repositories, together with the software that can make use of this infrastructure. What we want is to talk about the future, and the opportunities and challenges these present. I want to organize this around a few themes, and I’m looking forward to
thought-provoking comments and maybe disagreements and back-and-forth between the panelists. Okay, so let’s get started. Roughly speaking, there are three themes. I’d like to discuss the technology we have today and what opportunities it enables, the ability to compute at scale; the challenges of democratizing this technology, that is, how we can put these tools in the hands not just of academics and a few research groups but of a large group of folks who can use them to benefit society; and the challenges in education, how to use this technology to develop the next generation of scientists and the next generation of engineers.
So let me start briefly: the fastest supercomputer in the world in 1997 was ASCI Red at Sandia National Labs. I’m sure Jackie used that computer. That was the first computer to break a teraflop, 10 to the 12th floating point operations per second, and that was in ’97. Today we have teraflop power on our desktops; we have millions of those computers installed in the world. The next level up, a thousand times more powerful, took about 10 years to develop: that was the Roadrunner computer at Los Alamos, in 2008. Now we have about 1,000 petaflop-class computers in the world, and we’re going to the exascale. So as leadership computing goes to the exascale, we have a thousand or so petaflop systems, and we have millions of teraflop computers.
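Those milestones imply a simple scaling, which is worth sanity-checking. Here is a minimal sketch in Python using only the figures quoted above; nothing else is assumed:

    # FLOPS milestones mentioned above.
    TERA = 1e12  # ASCI Red, 1997: first machine past one teraflop
    PETA = 1e15  # Roadrunner, 2008: first machine past one petaflop
    EXA = 1e18   # exascale: the next thousandfold step

    print(PETA / TERA)          # 1000.0: each step is a thousandfold
    years = 2008 - 1997
    print(1000 ** (1 / years))  # ~1.87: implied growth factor per year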
But my first theme, what I’d like to hear from the panelists, is this: what opportunities do these enable in terms of compute and data? So maybe we can start with Jackie. – So my area of expertise is in combustion and turbulent reacting flows. And I think the current machines of about 200 petaflops (I think Summit is the world’s fastest machine on the Top500 as of a couple of months ago) are allowing us to enlarge the dynamic range of turbulent scales that we can simulate and to include some degree of complexity in terms of multiphysics. And not only can we start
to simulate detailed reaction kinetics that are
relevant to the practical fuels that we use in our cars and
our automotive engines and in power plants that generate electricity. But also, I think having that kind of capability lets us compute, with extremely high fidelity, scenarios or configurations that are relevant to experimental laboratory flames as well as starting to become relevant to industry: configurations that represent combustion processes in IC engines or in gas turbines. And so this is a really
great opportunity to combine experimentation and high
performance computing and simulation and design these
numerical and physical experiments from the ground
up as a group, collectively, so that we can glean more physical insights and provide data benchmarks, both computational and experimental, so that industry and students and folks from other institutions can use those data sets and benchmarks for validating their models or for their own purposes. So things like developing
portals and gateways that are accessible by
the broader community and having access to the data
as well as software tools to manipulate the data, is something that we’re just starting to do now. – So let me bring in Charlie for a second. Can we talk about access
to distributed computing, not necessarily the leadership-type computing that Jackie was mentioning, but lots of powerful systems distributed around the world that you can have access to, maybe in the medical
field, and have an impact. – Yeah, sure. And I think Purdue is actually a good example of that, because, at the risk of advertising some of the things we’ve done, we’ve developed this community cluster computing system, which has very much democratized computing on campus by allowing individual faculty members to buy into a kind of integrated cluster and really reduce their costs. So the cost of computing is really crucial in terms of getting it out to users, and it allows people who are non-experts to derive a lot of benefit. We’ve done that with both CPU-based systems and GPU systems. And healthcare is a good example of where high performance
computing has played a huge role and will continue to play a huge role, and where cost is an important factor; unlimited cost is not practical in health care. But in the health care community, technologies like CT and MRI really bring together physicists, people who develop algorithms, computer scientists, biologists, and medical experts to solve problems. And more and more they’re realizing that computation is a really key piece of that. And across the industry, a key direction is machine
learning and what they call AI, which is playing a huge role. And it’s going to have
huge computational demands and require different kinds
of computing platforms. It’s definitely the case, because I work closely with various commercial healthcare companies, that there’s a huge push to integrate AI into imaging applications in healthcare. And the last thing I want to leave you with is inverse problems, as opposed to forward modeling. Professor Chen’s work really was beautiful work in the simulation of combustion; ultimately, though, engineering problems are about designing systems that solve problems, and inevitably you need to solve inverse problems, which is a particular interest of mine, of course. So that’s going to be, I think, another big push: how you can put these forward models into larger computing systems that solve the inverse problems required for sensing and design. – So, I guess more on
the controversial side: you mentioned the explosion of computing power and improving capabilities, and that is hardware technology that is hard to keep up with. It’s overwhelming, from a computational physicist’s standpoint, to keep up with it. As Dr. Chen mentioned yesterday, it’s hard; you need to have a team of computer scientists. It becomes a different job at some point. And what I think is more humbling is to think of a supercomputer perhaps like the human brain: they say we only use a certain percentage of it, and I have the feeling that our codes do the same, right? There’s a recent effort, I cannot recall the name of the ETH group, that completely re-engineered their codes so that they could use every single cycle of the CPU, because otherwise our standard legacy codes only use maybe 20%, and for the rest the CPUs were dormant. So these are very exciting times, but they’re also terrifying from our perspective, I mean from a computational physicist’s perspective. There’s a lot of power out there, a lot of potential, but I don’t think we’re tapping into it. And it’s becoming almost distracting from our work, because it’s very hard to keep up. So in a way, it’s a word of caution that I might raise here. – Yeah, so in some ways, to
take a controversial standpoint: supercomputers are getting worse. In terms of traditional computational fluid dynamics techniques, we are seeing a dropoff in per-processing-element speed. Very large computers are great for us in the sense of capturing a separation of spatial scales; you get more and more turbulent scales in a computer simulation. But we have a lot of problems that are stiff in time, and the only way to crack those problems is a faster and faster processing element. Modern computers have been getting a bit slower in the last few years: we’ve seen slower per-core speeds in order to rein in power consumption. So we need, like Carlo said, to rethink from scratch the approaches we use to make use of this hardware, because we’re not going to see a return to faster and faster core speeds, just because of this cost in terms of power. – I agree with all that you said. And something that I
wanted to add: as we have moved from megaflop computations in the 1970s all the way to now, talking about exascale computing, what also matters is what the most important and toughest problems in the world are that we want to tackle. We’ve heard about healthcare. We’ve heard about combustion and how that relates to emissions and other challenges we are dealing with. But there are two areas I wanted to touch on where exascale computing can make an impact. One is climate: forecasting and prediction of hurricanes, air pollution, ocean pollution. The current models have resolutions on the order of 50 kilometers or 100 kilometers, where they develop basically Earth system models, which include chemistry, physics, geochemical evolution, and everything together. But we now know that clouds, low clouds, convection processes, and ocean eddies all contribute to those climate models. So it becomes important to use these exascale computing powers to go to resolutions down to even one kilometer or below to include some of those effects. The other area that I wanted
to mention is biology, where resolving different scales becomes very important, going from atoms to DNA to cells to organs to organisms, and where understanding metabolic processes, cellular processes, and complex interactions becomes important. It has been only a few years since million-atom computations in biology became possible, even though those were possible in materials science and other areas decades before. That’s because we still need mathematical tools to be able to include these complex processes. And just to add a question in that regard: what is needed is basically rigorous coarse-graining to upscale these processes. Where are we in that regard, and what are the challenges we are facing in developing those coarse-grained models? – Yeah, so it seems clear
that we need teams of people. And the national labs are
particularly good at this, where they can put together
teams of computer scientists, domain experts,
experimentalists for validation, and that’s really what would allow us to solve these problems we’re discussing. My field is materials science, and we have the
same multi-scale problems, the same time-scale problems where you cannot easily parallelize time, which is what John was mentioning. But there are new algorithms coming up where you can use statistical methods to achieve that, parallel replica approaches and whatnot. So it’s really a combination of hard work, new algorithms, and new ways of
thinking about old problems that’s going to really
help us make use of these leadership-type computers. But moving on, maybe, to the second theme, democratizing access to these tools: so far we have these teams of expert computer scientists and domain experts, and I’d like to think a little bit
or discuss a little bit about the end users. If I have to offer a self-criticism of our field: we often end up being the end users of our own products. Leadership-type computing serves a relatively small group of people around the world, and we have thousands of petascale systems and millions of terascale systems that could be better utilized. We all have smartphones; you don’t need a manual to use them, and it’s very sophisticated technology. You don’t lose files, you don’t have to create directories by hand the way our students do when they use HPC systems. So there’s a whole set of
technology in the commercial sector, microservices even, with Netflix and a bunch of companies developing very sophisticated systems, that I think our field could benefit from. It could make us better developers and better users, and also let us transfer our technology to end users, maybe engineers, maybe doctors, who can benefit from these tools without knowing the inner workings of the physics and the application, the same way one can drive a car without knowing the inner workings of the engine. So, thoughts on that
democratization of the tools? – So I think I completely
agree with what you said. I think it does take a team of experts, possibly interdisciplinary
experts, to run on these hero types of machines. But the data that’s generated, and there are communities that are very much doing this already, like the climate community, is data that many, many people from across the world can use. And there are machine learning tools and AI tools that are allowing people to develop surrogate models or digital twin models that are far simpler, less complex, smaller, and faster to run once they’re trained on a few selected experimental datasets or hero simulations. Using those on more ubiquitous machines, terascale today or petascale in a few years, would open up access to the data and information and allow much larger parametric sweeps and optimization problems that industry, for example, would care about if they’re trying to design a product or, in my field, an engine.
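A minimal sketch of that surrogate workflow, assuming scikit-learn is available; the run_hero_simulation function and the parameter ranges here are hypothetical stand-ins for an expensive high-fidelity code:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def run_hero_simulation(params):
        # Placeholder for an expensive high-fidelity solve; here it is
        # just a cheap nonlinear function of the inputs.
        return np.sin(params[0]) * np.exp(-params[1])

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 2))  # a few hundred "hero runs"
    y = np.array([run_hero_simulation(p) for p in X])

    surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)

    # The trained surrogate is cheap enough for large parametric sweeps.
    sweep = rng.uniform(0.0, 1.0, size=(100_000, 2))
    predictions = surrogate.predict(sweep)

Once trained, each surrogate evaluation costs microseconds rather than machine-hours, which is what makes sweeps of this size feasible on more ubiquitous machines.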
So I think having those tools, and then having workflows, matters. I mean, part of the problem with sharing data in the past has been that composing dynamic workflows and bringing the community on board only works if the software’s not buggy. Most people are willing to give it a try, but if they’re stuck with things not working as they’re supposed to, they’ll only do it once or twice, and then they move on and give up. So having hardened software tools, and, like you said, learning a lot from the commercial sectors that are already doing that, data as a service and all that kind of thing, and applying it to the science fields is something that we need to get a better handle on. – Let me note that we’re going to have questions from the audience, especially students, so start thinking about those questions. We’d like to open the floor in a minute. – Yeah, I’d just like to pick up on the comment about machine learning, because I totally agree with that. I gave a talk a week or two ago titled “Fear of the Deep”; deep learning, everybody knows about. And it’s been amazing to me, very surprising, that for a lot of problems where I thought you needed highly accurate physical models, these machine learning methods, particularly deep learning, can often replicate fidelity that’s comparable; and when you consider how much faster they are, you can incorporate more facts and actually get better results. So in other words, you can
basically train the machine learning models with a
higher-fidelity model so that in practice it may give you
higher fidelity than a physics-based model. So I mean that’s sort of all shaking out, and maybe it won’t quite work
out that way but I think that we need to be sort of cognizant
of that and how it can be leveraged, maybe where we have, as you suggested, very high-fidelity models that run on a few computers, and that data is distributed widely for use in training machine learning models. I’m not exactly sure, but it
just seems like this is kind of a game-changing technology. – Right. And I think we’re
still kind of in the early days. Maybe we’ll be incorporating more physics-informed machine learning to constrain it; and there are also new math methods to help identify rare events or extrema, data out in the tails of PDFs. So all of that still needs to be done. – I think people are still
grappling with how to incorporate physics in these machine learning
models and there’s likely to be a lot of innovation in that space. – So to add on what Jackie and
Charlie said, and to play devil’s advocate here: what I sometimes think about is that we rely very heavily on machine learning techniques, treat physical processes as a black box, and come up with surrogate models without including physics-based models. Are we losing our fundamental understanding of the physics of more complicated processes? So I think it becomes very important to look at both aspects together as we evolve
on newer and newer methods. – To add to that: machine learning basically interpolates between the physical model cases that you have computed. So if you step outside the box in which you’ve calibrated your machine learning model, your model will not be correct; you have no idea what the error bounds are on a prediction outside the box, and it could be anything.
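The simplest illustration of that danger is a bounding-box check against the training data. This sketch only flags queries outside the calibrated region; it is a toy illustration, not a full uncertainty-quantification method:

    import numpy as np

    # Training inputs define the "box" the model was calibrated in.
    X_train = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 3))
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)

    def is_extrapolating(x):
        # Flag any query outside the per-feature range of the training set.
        return bool(np.any(x < lo) or np.any(x > hi))

    print(is_extrapolating(np.array([0.5, 0.5, 0.5])))  # False: inside the box
    print(is_extrapolating(np.array([0.5, 0.5, 2.0])))  # True: no error bounds here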
Machine learning in that sense is extremely dangerous, and if you use it for something critical like medical applications; I’m a little bit worried about diagnosis by machine learning. – Society has done many extremely dangerous things when it really seemed called for. – So why not another? – I totally agree with what
you’re saying and that’s why I titled this talk Fear of the
Deep, but on the other hand you can’t ignore the reality. The other comment I wanted to make is that I have a sign in my office that says deep learning leads to shallow thinking. Physical science has for a long time been very experimentally oriented in much of its endeavors, but information science has been much more theoretical-analytical, and it’s moving away from that towards more experimental, information-based science. And that’s gonna be something
we’re gonna have to get more and more comfortable with. – Yeah and I think if the data
was out there, you could train these models with the data that’s available. Machine learning can actually tell you about outliers, and about the model that, like Jackie was saying, for whatever reason is giving you the wrong values. And it seems to me that as a research community we sometimes spend millions of dollars, and all that data ends up in a PDF in a paper that’s not discoverable. You cannot query it, and it contributes very little to the knowledge. Now we have the infrastructure to actually do this, and I think that would allow us to train the machine learning models better, and also assess uncertainties and assess where we’re extrapolating into dangerous territory. – So one aspect of machine
learning that I’m enthusiastic about is using it to
find surprising things in large data sets. So one of my problems is
I have large data sets. I recently did a calculation with a 3.2-terabyte restart file, and every timestep I regenerated it. I couldn’t possibly save all the data, so I have to take a guess, before the computation, at what will be interesting and save that. A machine learning algorithm could tell me, well, maybe you should be saving this. And that, I think, is extremely valuable. – [Host] That could be done in line, as Jackie mentioned yesterday. – Yeah, I think computational steering is a really nice use of machine learning, and things like anomaly detection. Taking advantage of information, for example, in the higher moments beyond the mean and standard deviation might allow you to identify rare events or something that is an anomaly. And that might steer increased I/O, or in our combustion field maybe it tells me I need to inject more fuel to get things to light up faster.
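A minimal sketch of such a higher-moment trigger in NumPy; the kurtosis threshold of 3 is an arbitrary illustrative choice:

    import numpy as np

    def anomaly_flag(window, kurtosis_threshold=3.0):
        # Standardized fourth moment (excess kurtosis): heavy tails
        # beyond what the mean and standard deviation capture.
        z = (window - window.mean()) / window.std()
        excess_kurtosis = np.mean(z**4) - 3.0
        return excess_kurtosis > kurtosis_threshold

    rng = np.random.default_rng(2)
    quiet = rng.normal(size=10_000)
    spiky = np.concatenate([quiet, rng.normal(scale=20.0, size=50)])

    print(anomaly_flag(quiet))  # False: Gaussian window, no rare events
    print(anomaly_flag(spiky))  # True: tail events could steer extra I/O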
– Related to the democratization of high performance computing: there’s clearly a spectrum of problems you can run. On one end, things that are called embarrassingly parallel, where the end users could be anyone; that includes parts of machine learning, say Monte Carlo simulations, or data mining. Then it goes all the way to something very technically specific with a narrower reach, such as eigenvalue solvers for massively parallel architectures, or even DNS, which might seem obscure to most of the community out there.
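As an illustration of the embarrassingly parallel end of that spectrum, here is a sketch of a Monte Carlo estimate of pi whose independent batches could be farmed out to any number of machines with no communication between them; the batch count is arbitrary:

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def batch_estimate(seed, n=1_000_000):
        # One independent batch: the fraction of random points inside the
        # unit quarter-circle, times 4, estimates pi.
        rng = np.random.default_rng(seed)
        x, y = rng.random(n), rng.random(n)
        return 4.0 * np.mean(x * x + y * y <= 1.0)

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            estimates = list(pool.map(batch_estimate, range(16)))
        print(np.mean(estimates))  # ~3.1416, batches run fully independently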
And so when it comes to democratization, I think there’s a good chance of making HPC resources more accessible and democratic. Think of cloud computing, where there’s a level of abstraction such that the end users, which could be companies or somebody on an iPhone, don’t have to worry about the hardware in the future; there’s a layer of abstraction. But that’s one side of the spectrum. On the other end of the spectrum there are the computational physicists who have big-data fever and wanna dig into the hardware; they wanna optimize their code down to the last bit. And so I think it is important to keep in mind that there’s a spectrum. – It kinda goes along with the
software stack, from the application on down to the hardware. – Yes. And so to answer the question of how and when HPC will ever be more democratic and more accessible: it depends. You need to work your way from one end of the spectrum to the other. But I don’t think we’re there yet; there should be no reason an iPhone couldn’t make use of cloud computing nowadays, but I don’t think we’re there yet. – So I have to put in a flag here for nanoHUB, by the way, which has about 500 simulation tools that are all web-enabled and run on HPC resources, but where you can run a simulation with a few clicks from your iPhone or your tablet. – [Panelist] This was not planned. – We didn’t plan for this. – [Jackie] So can you say
something about how that happened? Was it organically grown from the ground up, or? – This grew out of an idea Mark Lundstrom had: a colleague requested to run a simulation, and he had a student put it up and make it available. And then it was a combination of Mark’s idea and some folks at the National Science Foundation who had the guts to think big and say this could be turned into a global cyberinfrastructure. Now it serves millions of visitors every year and tens of thousands of simulation users, including classroom usage of really sophisticated tools that are simplified in terms of the interaction, abstracting away all the computer science and the algorithms. You need to know the
physics to execute it. Let’s see if we have a
question from the audience. Any questions from
faculty or students? Tim. – First, nanoHUB’s great. I wanna say that democratizing technology, all throughout history, went from the few to the people: we went from three computers in the world, each taking up a whole room, and no one could see exactly how that would change a lot of people’s lives as we went along. And that trend has continued, so it’s inherently hard to predict. But I’d like to put it to all of your expertise and imaginations: how do you think democratized petascale, and maybe even someday exascale, computing for everyone might open up things that are hard to
imagine now but could bear a really deep impact on civilization? – So we gave each of you a crystal ball. I’d like you to pull it out at this point. – Someone else start. – Maybe I can continue. If you
look at how cloud computing is done, and I did some research on that yesterday, because I have not been involved in cloud computing. I work in my own little world as a computational physicist, with my cluster and classic architectures, and the first thing I thought, without reading anything, was: oh my God, heterogeneous architectures. A computation runs a little bit in Africa, a little bit in Asia, a little bit in Japan, and then the data’s collected together. I don’t wanna deal with that. But obviously there’s a nice layer of abstraction; you don’t see it. What is presented to you is a virtual machine, so you don’t even know what’s going on; it’s Amazon’s problem, or somebody else’s problem. So can exascale reach that point where the user doesn’t know, doesn’t wanna know, and should not know? Probably. They already do it with cloud computing, and I would have said that was impossible to do. So the future’s bright, I guess. – Going back to what Tim said, I think it’s great to do
the democratization, give everyone access, and unleash the power of exascale computing for many different purposes that we can’t even imagine now. But again, playing devil’s advocate here: what if people who don’t have good intentions access it and use it for things that are not good for humanity? That’s the other side of it to think about. – [Host] That would never happen. – I think security and authentication will become even more important as we move into the future. – From a lab standpoint, I
was checking, and the amount of compute power in the commercial sector and at cloud service providers is increasing. I don’t have a crystal ball, but you would imagine that the times in which national labs completely dominated HPC might be passing, and there may be serious competitors in terms of HPC in the commercial sector. So do you imagine, and this is true also for companies, what would be required for an organization like a national lab to run maybe-not-super-sensitive computing outside, on cloud resources? And if anyone has some insight from a commercial standpoint, from a company standpoint: what are the barriers to doing some of this computing in the cloud? – Well, I think some algorithms
are more amenable to cloud computing, where you have looser connections between processes, where they’re embarrassingly parallel. And there are other problems, like solving systems of partial differential equations, that require much tighter coupling and probably won’t work very well in the cloud unless they have a very fast interconnect. – I think they are developing those. – Yeah, and I should say one
of the things that has to happen is that even as computing gets faster and faster, a thousandfold every decade or so, the networks also have to increase in speed to accommodate the higher throughput and bandwidth. Right now, things like the Energy Sciences Network that DOE puts up run at maybe a couple hundred gigabits per second in terms of data transmission within a large part of the US and to Europe. But as more people hop on to this and do computing in the cloud or stream data, you’re gonna have traffic in the network that’s gonna be the bottleneck, and not the computing itself. So that’s another infrastructure issue.
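A quick back-of-the-envelope check puts those two numbers side by side; the 3.2 TB figure is the restart file mentioned earlier, and the link speed is the couple-hundred-gigabit figure just quoted:

    # Moving one hero-run restart file over a fast research network.
    file_bytes = 3.2e12       # 3.2 TB restart file
    link_bits_per_s = 200e9   # ~200 Gb/s, ESnet-class link

    seconds = file_bytes * 8 / link_bits_per_s
    print(seconds)            # ~128 s for one file at full line rate

A simulation that regenerates a file like that every timestep saturates such a link on its own, which is why the network, not the computing, becomes the bottleneck.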
– There’s an interesting challenge, for example, in healthcare, in MR and CT reconstruction. People have been talking about using the cloud for years, but the big challenge there is that you have huge data sets that have to be moved around. So it’s sort of moving in that direction; I think you’ll start to see local servers in hospitals that do that sort of thing, so they get better utilization. The disadvantage of having embedded computing, high performance embedded computing with each device, is that the thing is sitting idle most of the time, so it’s very inefficient. But the advantage is that you don’t have to move so much data around. One example where it’s had a big
impact is on our cell phones with speech recognition. Siri and what’s the other one, I forget. But anyway that’s an example
of a combination of local and cloud computing that’s enabled a very impactful application. But the challenge there is to balance the amount of communication, and also the security issues of moving data; people don’t necessarily want their secure data distributed all over the world. So it’s gonna be a real challenge, and the cost right now, I think, is high. For instance, we’ve looked at cloud services on Amazon for secure computing, and it’s very expensive. – I think it also opens up the
door for lots of interesting, innovative data compression
methods and methods for multi-resolution
visualization, for example, and reconstruction of data
so that you can ship small packets and then only
the critical information. – [Panelist] Lossy methods, which identify the critical information and transfer just that, or even parameters of it.
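A minimal sketch of that kind of lossy, keep-only-what-matters compression, using a simple Fourier threshold in NumPy; the 1% keep-fraction is an arbitrary illustrative choice:

    import numpy as np

    def compress(signal, keep_fraction=0.01):
        # Keep only the largest-magnitude Fourier coefficients;
        # everything else is dropped before shipping.
        coeffs = np.fft.rfft(signal)
        k = max(1, int(keep_fraction * len(coeffs)))
        keep = np.argsort(np.abs(coeffs))[-k:]
        return keep, coeffs[keep], len(signal)

    def decompress(keep, values, n):
        coeffs = np.zeros(n // 2 + 1, dtype=complex)
        coeffs[keep] = values
        return np.fft.irfft(coeffs, n)

    t = np.arange(4096) / 4096.0
    signal = np.sin(2 * np.pi * 5 * t)
    signal += 0.1 * np.random.default_rng(3).normal(size=t.size)
    restored = decompress(*compress(signal))
    rms_error = np.sqrt(np.mean((signal - restored) ** 2))
    print(rms_error)  # ~0.1: essentially just the discarded noise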
– That’s probably interesting, because you need those types of techniques in leadership-type computing too, given that you cannot store everything. And so maybe there’s an opportunity there where you can use similar techniques on both ends: do some local processing, ship the data out for more powerful analysis in the cloud, and then ship it back post-processed, similar to what you do in a leadership-type environment. – This communications problem
actually also brings in the question of reliability at scale. If you’re running on millions of processors and your chance of losing a processor is one in a million, you’re probably gonna lose some processors. So being able to proceed with a calculation having lost some of it, and maybe doing some kind of error correction to keep up, might also be a big breakthrough. It would also help with the communication.
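The arithmetic behind that intuition, as a quick sketch; the per-processor failure probability is the one-in-a-million figure from the comment, with the time window left abstract:

    import math

    n_procs = 1_000_000  # processors in the machine
    p_fail = 1e-6        # chance a given processor fails in the window

    expected_failures = n_procs * p_fail
    p_at_least_one = 1.0 - (1.0 - p_fail) ** n_procs

    print(expected_failures)                   # 1.0: on average one processor lost
    print(p_at_least_one)                      # ~0.63, i.e. 1 - exp(-1)
    print(1.0 - math.exp(-expected_failures))  # same number, Poisson view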
– I think along those lines: developing math algorithms that are resilient to failure, or iterative methods that can keep progressing when there are errors, or asynchronous methods. – [Panelist] So some combination of local computations done speculatively, and then a correction from the global network of computation.
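A toy sketch of an iterative method that keeps progressing through lost updates: a Jacobi-style relaxation in which a random fraction of node updates is dropped each sweep, standing in for failed processors (the 5% drop rate is arbitrary):

    import numpy as np

    # Solve the 1-D Laplace equation by Jacobi sweeps while tolerating
    # "failed" updates: dropped entries simply keep their old value.
    rng = np.random.default_rng(4)
    u = np.zeros(100)
    u[0], u[-1] = 1.0, 0.0  # fixed boundary values

    for sweep in range(20_000):
        new_interior = 0.5 * (u[:-2] + u[2:])
        alive = rng.random(98) > 0.05  # ~5% of updates are lost each sweep
        u[1:-1] = np.where(alive, new_interior, u[1:-1])

    # Still converges to the straight-line solution despite the losses.
    print(np.abs(u - np.linspace(1.0, 0.0, 100)).max())  # small (~1e-4)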
– Yeah, I think we have another question from the audience. Another student, I think. – Always a student. So I was
fascinated by the two bookends that have come up in the discussion: on one side accessibility and pervasive computing, and on the other side the leadership computational tools you’ve been talking about. And I couldn’t help but think of the path that most disruptive technologies have followed. One that I’ve been reading about recently is gene drives, for example. One interesting thing you see is that the development of what you would call leadership kinds of systems, really expensive, heroic-computation kinds of stuff, is always sold based on potential impact on key, really important problems. So gene drives are very much sold as: gee, we might be able to create a
mosquito that knocks off all the other mosquitoes that spread malaria. So there’s a grand challenge problem. Everyone gets it. Investment follows, but
then, as gene drive prices come tumbling down and accessibility becomes easier, everyone is able to apply them to their locally-defined challenges. So it’s probably very clear that as high performance computing, whether cloud-based or otherwise, hybrid systems, goes down into the hands of users, we will define our own ways, in different places, to drive change and transformation. But at the leading edge of it, exascale: all of you have been in situations talking about it. What are some of those grand-challenge impact areas that you think of, something like, hey, we can get rid of malaria? That level of impact: what can we say in the different fields that you’re in? Some examples of those kinds of impacts. – What can we solve at the exascale, with a few petascale systems? I’m sure Jackie is particularly- – Well, exascale sounds like a lot, but it’s only a thousandfold bigger than petascale, just to calibrate. It doesn’t solve the world’s problems. But what I think, in our field, in combustion, it will
do is, like I said before, allow us to incorporate coupled simulation and experiment to provide detailed information in engineering spaces as relevant and practical as possible, information that can inform industry, people designing gas turbines and engines. And then maybe, through machine learning and other types of methods, provide surrogate models and better modeling that industry can then use to run millions of calculations to help optimize their design of both the fuel and the combustor. – Yeah, it’s interesting. I mean,
I’m gonna actually push back against that hypothesis a little bit, which is that we tend to wanna look for the big application wave, and those are very important,
has impact is just making something simple more efficient. And I think the CPU was a
huge revolution in computing because it sort of
commoditized the computation. We took a lot of complicated
flip-flops and gates and so forth, the bit-slice
processors and we said, okay we’re gonna have this
single generic CPU with a defined instruction set and we’ll just
keep making it cheaper and cheaper and cheaper and so
more and more people have it. A similar thing happened
with machine learning. One of the reasons I think it’s
so hot right now is a lot of the technology’s existed
for quite a while, but they took a few key modules that were well understood and made them really efficient and fast and cheap and easy to use. So I kinda think with high
performance computing maybe what needs to be done is identify
a few core operations that are sort of very commoditized
and easy to use and then they could be sort of democratized
across a lot of applications and then we’d find out what
the big wins were when people tried a lot of different things. So it’s hard to say. I’m not saying that that’s right, but it’s not necessarily wrong. – Along those lines, as I was saying: using these hero machines with different types of heterogeneous processors (tensor processing units, GPUs, CPUs), what we’re seeing more of at these large-scale runs is combining physics runs with machine learning in situ, where you devote the tensor processing units maybe to convolutional neural networks, do your PDE solves on the CPUs, and maybe chemistry solves on the GPUs. And as was pointed out earlier, we’re only using a fraction of the machine to do the actual science solves, and there’s all this extra real estate on the machine to do these analytics and machine learning types of things and couple them together. And to couple it together effectively, you need to have a runtime or a programming model that sits on top of all of it and can holistically orchestrate the data movement and decide dynamically which resources different computations should be computed on. – [Panelist] I totally agree.
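A toy sketch of that kind of orchestration, with thread pools standing in for the CPU, GPU, and TPU partitions; the three solver functions are placeholders, not any real runtime’s API:

    from concurrent.futures import ThreadPoolExecutor

    # Stand-ins for real kernels; each would target its own hardware.
    def pde_solve(state):        return state + 1       # "CPU" work
    def chemistry_solve(state):  return state * 2       # "GPU" work
    def cnn_analytics(state):    return state % 5 == 0  # "TPU" in-situ ML

    # One pool per resource partition; a real runtime would map these onto
    # actual devices and schedule the data movement between them.
    with ThreadPoolExecutor(4) as cpu, ThreadPoolExecutor(2) as gpu, \
         ThreadPoolExecutor(1) as tpu:
        state = 0
        for step in range(10):
            flagged = tpu.submit(cnn_analytics, state)  # analytics run concurrently
            state = cpu.submit(pde_solve, state).result()
            state = gpu.submit(chemistry_solve, state).result()
            if flagged.result():
                print(f"step {step}: analytics flagged the state for extra I/O")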
of software stack and runtime system I think at the leading
edge will trickle down to, well this is a forecast. I don’t know if it will. To more common things that
we’re all gonna be doing and trying to grapple with. – As Sarbin was saying I think
this does trickle down and it changes the way you do computing. – [Jackie] People do things.
Doing data-centric computing, maybe, I don’t like to bash NPI but it sits at a low level and
maybe it’s seen its better days given the changes in
hardware and use modes. – Another question from a student. – So this could comment on two phrases: analog computing, one phrase. And also comment on another phrase, artificial intelligence. – So we’ve talked a little bit
about artificial intelligence with Charlie, so maybe we can discuss neural computing a little bit: neural processing units that are analog. They’re on devices today to do parts of the computing. – It’s really hard to say. I mean, it’s interesting: I started my career at Lincoln Lab working in analog signal processing groups that were using surface acoustic wave devices and SOFETs. But then the pendulum swung, and people said, oh gosh, you wanna digitize things as close to the sensor as possible and process everything digitally, because speeds were so much higher and it’s so much easier to program than trying to design a hardware device. But maybe that’s gonna swing back. Although if it does, in my opinion we need to define very well-defined modules into which you would be able to plug that analog computing, and it has to be a big win, because the complexity of it is high. As far as AI goes, it’s interesting. We have this huge burgeoning
interest in AI and I think that’s a great thing but really most of it comes down to machine learning. There’s an alternative argument
to make, which is that we’re just scratching the surface of AI. These really hard problems like object recognition: we thought, oh, if we could finally crack that, AI would be so much easier. So we cracked that problem, but now we realize: gee, okay, so we can recognize objects in a room, but how do we make decisions about how to do things, the way humans do? And we suddenly realize once again that there’s a huge number of problems
in artificial intelligence that we have no idea how to do. So I think we had a big breakthrough, but we’re just starting a journey. – Excellent. So maybe as a final theme: we’re at a university. Purdue is a large university; we educate a lot of students, and we’d like to reach out to more students and certainly all our alums. What are the implications in terms of education, not just in educating the next generation of students? How can we use this technology to make education better? What do we need to teach our students? – I think we should integrate large-scale computing into the curriculum. The student who’s going to school now, even a non-specialist, will probably encounter large-scale computing in the work environment when that person gets out. So we need to develop the sort of skills to use large-scale computing sensibly when they get out. And I’d like to add, again
as devil’s advocate: most large-scale computing is used very badly. If you look at the queue on a large-scale supercomputer, there are a lot of small jobs running on it at any one time. That’s a total waste of money and electricity; they should be running very large problems. So learning to use these kinds of resources sensibly should be a part of the curriculum. – And the point I would
make, again to be a bit controversial: I remember a colleague of mine, I’m not going to mention who it was, saying, well, we’re trying to teach these students computational physics, and we opened an xterm, and they were all surprised; they didn’t know what to do with an xterm. Maybe we shouldn’t open an xterm to begin with. Maybe we need to think about using better tools and educating students with more modern tools, not the way we used to do computational science in the ’70s, creating directories by hand, and then vi, and all that. And I love that; that’s how I do it. But I don’t think our students should do it the same way I did. – But I think there’s a
preliminary step to be taken. Sometimes we as instructors have a push from above to, for example, train students on commercial software, to have them be end users of software, whereas I would like to teach coding as soon as possible. Even Python; why? Python is like reading a book sometimes. So we start with high-level languages, and after they digest and understand high-level languages, we go to low-level languages. And then we can talk about high performance computing. But we’re missing that preliminary step, I believe, in training the students. Sometimes coding is considered taboo. I think every degree should have a computer science class, a hardcore computer science class, as early as possible, and some degrees are still lacking that. So I would argue that we need a step before all this. – I certainly think not every
degree has to have a hardcore computer science class, because we can use these tools without being computer scientists. I think programming to some degree, an introductory programming course, is interesting, but I don’t think that every engineer needs to be a programmer. They certainly need to be good expert users of codes, like finite element codes: if you’re a mechanical engineer, you have to know how to use sophisticated codes. – But programming is not
just scientific computing. It could be something else. There are folks in the humanities; there are programs that combine programming and algorithmic development with humanities studies. So I think we can treat coding education more broadly: coding education before high performance computing, I guess. – That’s something that
universities are missing in our curricula: large-scale, massively parallel computation. There are of course courses offered, but there isn’t a very directed curriculum that includes these advanced computing courses, data analysis, and all of those in a sequence where a student gets the training and knowledge needed for doing exascale computing, and that becomes very important. It doesn’t necessarily mean that everyone needs to take those courses. – I think as part
of that course, we can also develop good software engineering practices in the student: regression testing, code repositories. And nothing can replace actually writing code hands-on, as mini-projects, to practice and get better. – So, as a consumer of our products, you hire a lot of PhDs in your group. Over the years, how do you
see a difference in the knowledge and the type of
education that our students have over the years? Do you see an evolution in the type of background your staff members have when they join Sandia? – Well, I look for students that have a lot of interest in both the physical sciences and in computer science, computational science, and algorithms. A lot of them are hackers to begin with who happen to like the physical sciences too. – Is it harder to find folks with both? – It’s hard to find people like that, with a leg in each field. There are a number of universities that teach that kind of mixed curriculum: for example, UT Austin has a very strong program in computational sciences, and Utah as well. There are a couple of other places where the students get exposure to computer science, applied math, and the physical sciences, and not just a stovepiped education. – This has been a fascinating
discussion, but I have to come back down on the side of teaching the students computing, and I would say, really at the risk of being controversial, all engineering students should learn to program. It’s interesting: like 25 years ago there was a big swing of the pendulum where people were saying, well, electrical engineers really shouldn’t need to know how to program; it’s not necessary, because there’ll be some sort of point-and-click environment where they do this. And it’s gone exactly the opposite way. More and more companies and employers just really want students to be able to program, because that’s how you actually implement your ideas. And they want them to have good coding skills. – I agree with that. What I
meant is that we don’t necessarily need all the students to do exascale computing. – No, I took yours to be in line with my view, actually. – [Host] Or exascale, or computer science. – But I don’t think coding education should be bypassed in favor of a user-friendly exascale tool that maybe we can train them on. I think they should know what’s under the hood.
