ERDDAP is a data server that gives you a simple, consistent way to download
subsets of scientific datasets in common file formats and make graphs and maps.
Without ERDDAP, when a person (or a computer program) looks on the Internet for a
specific type of scientific data (for example, satellite sea surface temperature data),
there are problems ...
- Interesting datasets are hard to find because they are at many different websites.
- Each site requires a different protocol to request the data
(for example, HTTP GET, XML, SOAP+XML, OPeNDAP, WCS, WFS, SOS, or an HTML form).
- Each site returns the data in a different format (for example, XML, SOAP+XML, OPeNDAP binary
data stream, ASCII text, HDF 4, HDF 5, NetCDF, ...) and it isn't the common file format
that you want.
- Data from different sites is hard to compare because the dates+times are expressed in
different formats (for example, "Jan 2, 1985", "02-JAN-1985", "1/2/85", "2/1/85",
"1985-01-02", or days since Jan 1, 1980, or ...).
- For a quick introduction to ERDDAP,
watch the first half of this
video. (5 minutes)
In it, a scientist downloads ocean currents forecast data from ERDDAP to model a toxic
spill in the ocean using
NOAA's GNOME software (in 5 minutes!).
This video shows:
Thanks to Rich Signell.
(One tiny error in the video:
when searching for datasets, don't use AND between search terms. It is implicit.)
- ERDDAP
can get data from local (on the server's hard drive) and
remote (accessed via the web) data sources.
See the
list
of types of data sources
that ERDDAP can access.
- ERDDAP
can serve many types of scientific data, not just oceanographic data.
ERDDAP is a data server that was written at NOAA NMFS SWFSC ERD.
The ERDDAP server at ERD serves oceanographic data,
but ERDDAP (the program) can access and serve any gridded or tabular data.
- ERDDAP
offers several ways to search for interesting datasets.
For example,
full text search,
search by category
(also known as faceted search), and
Advanced Search.
Advanced Search combines all of the search techniques
and adds searches for datasets that have data within longitude, latitude, and time ranges,
so you can search for datasets based on many different criteria simultaneously.
- ERDDAP lets you request data in a standardized way,
regardless of the data source's request protocol.
ERDDAP also provides Data Access Forms (web pages) which help humans create the OPeNDAP
requests. OPeNDAP's
Data Access Protocol (DAP)
is one of
NOAA's Data Access Technical Recommendations
and a
NASA Earth Science Data and Information System (ESDIS) standard.
(OPeNDAP is great!)
ERDDAP translates your request from the OPeNDAP, WMS, or SOS format to the data source's
request format and converts the response to one of ERDDAP's internal data structures.
Then ERDDAP reformats the data into the common file format of your choice (for example, as an
.html table, ESRI .asc, Google Earth .kml, .mat, .nc, ODV .txt, .csv, .tsv, .json, .xhtml, .png, .pdf)
and sends the file to you.
See the list of
griddap file types
and the list of
tabledap file types.
Other protocols for requesting the data (for example,
WCS)
may be added in the future.
ERDDAP is structured for these additions and there don't seem to be any impediments.
- Requests for gridded data can be made in user units.
Although requests for gridded data in ERDDAP can be made with array indices
(following the OPeNDAP specification), requests can also be in user units
(for example, degrees east), using
a parentheses notation,
since users think in those units,
not indices.
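As an illustration of the parentheses notation, here is a minimal Python sketch that builds a griddap request in user units. It assumes that the jplMURSST41 dataset (mentioned later on this page) has a variable named analysed_sst with [time][latitude][longitude] axes; the function name griddap_url is hypothetical, and the time and coordinate ranges are placeholders.

```python
def griddap_url(base, dataset_id, file_type, variable, constraints):
    """Build a griddap URL whose constraints are all in user units.

    Each constraint is either a single value, (start, stop), or
    (start, stride, stop). Values are wrapped in parentheses so ERDDAP
    treats them as axis values (e.g., degrees_east) rather than array
    indices; the stride stays an index step, as in OPeNDAP.
    """
    parts = []
    for c in constraints:
        if isinstance(c, tuple) and len(c) == 2:
            start, stop = c
            parts.append(f"[({start}):({stop})]")
        elif isinstance(c, tuple) and len(c) == 3:
            start, stride, stop = c
            parts.append(f"[({start}):{stride}:({stop})]")
        else:
            parts.append(f"[({c})]")
    return f"{base}/griddap/{dataset_id}.{file_type}?{variable}{''.join(parts)}"


# Hypothetical request: one time step, a 1x1 degree box off central California.
url = griddap_url(
    "https://coastwatch.pfeg.noaa.gov/erddap", "jplMURSST41", "csv", "analysed_sst",
    ["2020-01-01T09:00:00Z", (36.0, 37.0), (-123.0, -122.0)],
)
print(url)
```

The function does not percent-encode the query; depending on the HTTP client, characters such as [ ] ( ) may need to be encoded before the request is sent.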
- ERDDAP sends results in common data file formats.
The results can be returned in any of several common data file formats
(for example, .html table, ESRI .asc, Google Earth .kml, .mat, .nc, ODV .txt, .csv, .tsv, .json, .xhtml),
instead of just the original format or just the OPeNDAP transfer format (which has no
standard file manifestation). These files are created on-the-fly.
Since there are few internal data structures, it is easy to add additional file-type drivers.
See the complete list of
grid file types
and table file types.
- ERDDAP standardizes the variable names and units for longitude, latitude, altitude,
depth, and time in the results.
To facilitate comparisons of data from different datasets, the requests and results in ERDDAP
use standardized space/time axis units:
- longitude is always in degrees_east.
- latitude is always in degrees_north.
- altitude is always in meters with positive=up.
- depth is always in meters with positive=down.
- time, when formatted as a number, is always in "seconds since 1970-01-01T00:00:00Z"
(known as Unix time or epoch seconds,
which is
UDUNITS-compatible)
and, when formatted as a string, is formatted according
to the
ISO 8601:2004 "extended" format
standard (yyyy-MM-ddTHH:mm:ssZ, for example,
"1985-01-02T00:00:00Z").
(You can convert numeric times to/from ISO string times with ERDDAP's
time converter.)
Also, to avoid time zone and daylight saving time confusion, time values are always
converted to the Zulu (UTC, GMT) time zone.
This makes it easy to specify constraints in requests without having to worry about the altitude
data format (are positive values up or down? in meters or fathoms?) or the time data format
(a nightmarish realm of possible formats, for example, "Jan 2, 1985",
"02-JAN-1985", "1/2/85", "2/1/85", "1985-01-02", or days since Jan 1, 1980).
This makes the results from different data sources easy to compare.
For more details, see
How ERDDAP Deals with Time.
Because the longitude, latitude, altitude, and time variables are specifically recognized,
ERDDAP is aware of the geo/temporal features of each dataset.
This is useful when making images with maps or time-series, and when saving data
in geo-referenced file types (e.g., .esriAscii, .geoJson, and .kml).
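The time and altitude conventions above can be made concrete with a small Python sketch using only the standard library. It converts between ISO 8601 strings and "seconds since 1970-01-01T00:00:00Z", converts a "days since Jan 1, 1980" source value (one of the formats listed above) into that standard, and turns a positive-down depth into a positive-up altitude. The function names are hypothetical and this is not ERDDAP's own code; like Unix time, it ignores leap seconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 "extended" format, always UTC (Zulu)


def iso_to_epoch_seconds(iso):
    """'1985-01-02T00:00:00Z' -> seconds since 1970-01-01T00:00:00Z."""
    dt = datetime.strptime(iso, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return (dt - EPOCH).total_seconds()


def epoch_seconds_to_iso(seconds):
    """Seconds since 1970-01-01T00:00:00Z -> ISO 8601 string in UTC."""
    return (EPOCH + timedelta(seconds=seconds)).strftime(ISO_FORMAT)


def days_since_to_epoch_seconds(days, base_iso):
    """Convert a 'days since <base_iso>' source value to epoch seconds."""
    return iso_to_epoch_seconds(base_iso) + days * 86400


def depth_to_altitude(depth_m):
    """Depth (meters, positive=down) -> altitude (meters, positive=up)."""
    return -depth_m


print(iso_to_epoch_seconds("1985-01-02T00:00:00Z"))                  # 473472000.0
print(days_since_to_epoch_seconds(1828, "1980-01-01T00:00:00Z"))      # same instant
```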
Two common standards for writing units of measure are:
- UDUNITS - from Unidata, which is used in COARDS, CF, and NetCDF data files.
For example, UDUNITS has many options for degrees Celsius, including degree_C and degC.
- UCUM - the Unified Code for Units of Measure. OGC services such as SOS, WCS, and WMS
often refer to UCUM as UOM (Units Of Measure). For example, UCUM has just one
case-sensitive option for degrees Celsius: "Cel".
Although ERDDAP doesn't require the use of either units standard, most ERDDAP installations
favor one or the other.
(ERDDAP administrators: you can specify this with the <units_standard>
tag in setup.xml.)
You can convert UDUNITS to/from UCUM units with ERDDAP's
units converter.
When you request data or a graph from a tabledap dataset,
you can append &units("UDUNITS")
or &units("UCUM") to the end of the URL to request UDUNITS or UCUM units.
(more information)
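Here is a minimal Python sketch of appending &units("UCUM") (or &units("UDUNITS")) to a tabledap request. The base URL, the dataset ID myTableDataset, and the variable names are hypothetical placeholders; the optional encoding step uses urllib.parse.quote to percent-encode characters such as ( ) " > : while leaving the & = , separators intact.

```python
from urllib.parse import quote


def tabledap_url(base, dataset_id, file_type, variables, constraints=(),
                 units=None, encode=False):
    """Build a tabledap URL, optionally requesting UDUNITS or UCUM units."""
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + constraint
    if units is not None:
        if units not in ("UDUNITS", "UCUM"):
            raise ValueError("units must be 'UDUNITS' or 'UCUM'")
        query += f'&units("{units}")'
    if encode:
        # Keep the query separators readable; encode everything else that needs it.
        query = quote(query, safe="&=,")
    return f"{base}/tabledap/{dataset_id}.{file_type}?{query}"


raw = tabledap_url("https://example.org/erddap", "myTableDataset", "csv",
                   ["time", "sea_water_temperature"],
                   ["time>=2020-01-01T00:00:00Z"], units="UCUM")
print(raw)
```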
- ERDDAP can add or modify metadata.
Many data sources have little or no
metadata
(for example,
CF metadata)
describing the data.
ERDDAP lets (and encourages) the administrator to describe metadata which will be added
to datasets and their variables on-the-fly.
See the
addAttributes section
of the
directions for administrators.
- ERDDAP lets you request .png and .pdf image files with graphs and maps
of the data in addition to the actual data. And ERDDAP's Make A Graph lets you customize the images.
Some special uses of these images are:
- Requesting Compressed Files
ERDDAP doesn't offer results stored in compressed (e.g., .zip or .gz) files
unless the source file is already compressed.
Instead, ERDDAP looks for
accept-encoding
in the HTTP GET request header sent by the client.
If a supported compression type (gzip, x-gzip, or deflate) is found in the
accept-encoding list, ERDDAP includes content-encoding in the HTTP response header
and compresses the data as it transmits it.
It is up to the client program to look for content-encoding and decompress the data.
Compressed responses are often 3-10 times faster, although there is no benefit
to requesting compressed .png files since the files' contents are already compressed.
Browsers and OPeNDAP clients do this by default. They request compressed data and
decompress the returned data automatically.
Other clients (e.g., Java programs) have to do this explicitly.
With curl, add --compressed to the command line to tell curl to
request a compressed response and automatically decompress it.
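Python's urllib.request is one of the clients that has to handle compression explicitly. The sketch below sends an Accept-Encoding header and decompresses the body according to the Content-Encoding that comes back. The helper names are hypothetical; the deflate branch first tries the zlib-wrapped form that HTTP specifies and falls back to raw deflate, because some servers send that instead.

```python
import gzip
import urllib.request
import zlib


def make_request(url):
    """Ask the server for a compressed response."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def decode_body(body, content_encoding):
    """Decompress a response body based on its Content-Encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)                    # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)   # raw deflate stream
    return body  # server chose not to compress


def fetch(url):
    """Download url, letting the server compress the transfer if it can."""
    with urllib.request.urlopen(make_request(url)) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```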
- ERDDAP
makes different types of data servers (OPeNDAP, OBIS, SOS, WMS, ...) interoperable.
Different types of data servers are used in different scientific communities.
In the foreseeable future, it is unlikely that any one type will become dominant and
replace the others. So ERDDAP acts as a bridge between different types of
client programs (web browsers, IDV, Matlab, netCDF programs, ODV, WMS clients, etc.)
and the different types of data servers.
- ERDDAP accepts client requests for data in different formats (e.g., OPeNDAP, WMS).
- ERDDAP converts a given request into the request format used by the source data server
(e.g., OPeNDAP, SOS, OBIS, ...) and sends that to the source data server.
- ERDDAP converts the response data from the source data server into an internal format,
including converting all time data to a common format: "seconds since 1970-01-01T00:00:00Z".
- ERDDAP converts the data from the internal format into the file format requested by the client
(e.g., .csv, Google Earth .kml, .htmlTable, .dods, .mat, .nc, ODV .txt, .png).
Clients don't have to worry about, or know about, the type of the source data server.
They just get the data they want, in the file format they want.
- ERDDAP uses just two basic data structures to hold data.
- Since it is difficult for human clients and computer clients to deal with a complex set of
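possible dataset structures, ERDDAP uses just two basic data structures:
gridded data (multidimensional arrays, served by griddap) and tabular data (tables, served by tabledap).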
- Certainly, not all data can be expressed in these structures, but much of it can.
Tables, in particular, are very flexible data structures (look at the phenomenal success
of
relational database programs).
- This makes data queries easier to construct.
- This makes data responses have a simple structure, which makes it easier to serve the data
in a wider variety of standard file types (which often just support simple data structures).
This is the main reason that we set up ERDDAP this way.
- This, in turn, makes it very easy for us (or anyone) to write client software which works
with all ERDDAP datasets.
- This makes it easier to compare data from different sources, for example for an
Integrated Ecosystem Analysis (IEA).
- We are very aware that if you are used to working with data in other data structures
you may initially think that this approach is simplistic or insufficient.
But all data structures have tradeoffs. None is perfect.
Even the do-it-all structures have their downsides: working with them is complex and
the files can only be written or read with special software libraries.
If you accept ERDDAP's approach enough to try to work with it, you may find that it has
its advantages (notably the support for multiple file types that can hold the data responses).
The
original ERDDAP slide show
(particularly the
data
structures slide)
talks about these issues.
- And even if this approach sounds odd to you, most ERDDAP clients will never notice --
they will simply see that all of the datasets have a nice simple structure and they will
be thankful that they can get data from a wide variety of sources returned in a wide
variety of file formats.
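To show why tables are so flexible, the following Python sketch flattens a small gridded variable into table rows, one row per grid cell, with a column for each axis plus one for the data value. It is an illustration of the idea under hypothetical names (grid_to_table, sst) and made-up values, not ERDDAP's internal representation.

```python
import itertools


def grid_to_table(axes, data_name, data):
    """Flatten a grid into rows.

    axes: list of (axis_name, axis_values) in dimension order.
    data: nested lists indexed like data[i0][i1]...
    """
    rows = []
    index_ranges = [range(len(values)) for _, values in axes]
    for idx in itertools.product(*index_ranges):
        value = data
        for i in idx:          # walk down the nested lists to one cell
            value = value[i]
        row = {name: values[i] for (name, values), i in zip(axes, idx)}
        row[data_name] = value
        rows.append(row)
    return rows


axes = [("time", [0, 1]), ("latitude", [10.0, 20.0])]
table = grid_to_table(axes, "sst", [[1.5, 2.5], [3.5, 4.5]])
for row in table:
    print(row)
```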
- ERDDAP offers
email/URL and
RSS
subscription services,
so you can be notified whenever a dataset changes.
- ERDDAP is very good at detecting changes to gridded datasets because it can detect
when the axis values (e.g., the time values) change.
- ERDDAP is not very good at detecting changes to tabular datasets because there are usually
no changes to the metadata when new data is added.
- ERDDAP will detect if a dataset becomes unavailable (but perhaps not immediately).
- ERDDAP will detect when that dataset becomes available again.
- ERDDAP makes no promises about the suitability or accuracy of these services
(see ERDDAP's DISCLAIMERS).
Email/URL Subscriptions
(not available at some ERDDAP installations)
Whenever a dataset changes, the email/URL subscription system will immediately
send you an email or contact a URL that you specify.
To set up an email/URL subscription, click on one of the envelope icons
that appear at the far right on ERDDAP web pages with lists of datasets
(example)
and on the Data Access Forms and Make A Graph web pages for individual datasets
(example)
if this ERDDAP installation supports email/URL subscriptions.
(Computer programmers: if you write web services, you can use the URL system
to have ERDDAP notify your web service immediately whenever a dataset changes.)
RSS Subscriptions
RSS is a standard system for notifying users when the content at a website has changed.
Modern web browsers have an RSS client built in or you can use a separate
RSS Reader.
ERDDAP offers a separate RSS 2.01 feed for each dataset so that you can find out
when interesting datasets have changed.
To subscribe to a dataset's RSS feed, click on one of the RSS icons
that appear at the far right on ERDDAP web pages with lists of datasets
(example)
or on the Data Access Forms and Make A Graph web pages for individual datasets
(example).
Comparison
The RSS service may be just what you are looking for. It is a nice standard.
But if you need to know as soon as possible when a dataset changes, use the
email/URL system, not RSS. RSS clients periodically (for example, every hour) request and read
the RSS XML document to look for changes.
So typically, an RSS client will not detect a change to a dataset quickly (with hourly checks, the delay averages about 30 minutes).
In contrast, the email/URL subscription system acts immediately whenever ERDDAP detects
a change to a dataset.
The more proactive approach of the email/URL system is also much more efficient:
You may be able to set your RSS client to check for changes every minute (don't do it!),
but that would just lead to lots of unnecessary requests to the ERDDAP server
and it still wouldn't detect changes immediately.
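The "average 30 minutes" figure follows from a simple assumption: if a dataset change happens at a uniformly random moment within a polling interval of length $T$, the delay $D$ before the RSS client notices it is uniform on $[0, T]$. Network latency and server processing time are ignored in this sketch.

```latex
E[D] = \int_0^T t \,\frac{1}{T}\, dt = \frac{T}{2},
\qquad T = 60\ \text{min} \;\Rightarrow\; E[D] = 30\ \text{min},
\qquad \max D = T.
```

Shrinking $T$ lowers the delay only linearly while multiplying the number of requests to the server by $1/T$, whereas a push notification makes $D$ close to zero with no polling at all.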
- ERDDAP is a
web application
(web pages with forms for humans using browsers)
and a
web service
(with services for computer programs).
In fact, the forms on ERDDAP's web pages just generate specially formed URLs
that are then submitted to ERDDAP's web services.
- ERDDAP has
REST-
and
ROA-style
links to make its services available to computer programs.
These features can be used to build another web service on top of ERDDAP
(making ERDDAP do all the work!).
ERDDAP is not intended to be a high-level data exploration/graphing service.
Instead, ERDDAP is intended to provide services for such websites and programs.
So if you have an idea for a better interface to the data that ERDDAP serves,
we encourage you to build your own web application or web service, and use ERDDAP
as the foundation. Read more about ERDDAP's
Services for Computer Programs.
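As a small example of a program built on ERDDAP's web services, the Python sketch below parses a .csv response. It relies on ERDDAP's .csv layout, in which line 1 holds the column names and line 2 holds the units. The function name is hypothetical, and the sample text is made up rather than taken from a real dataset.

```python
import csv
import io


def parse_erddap_csv(text):
    """Split an ERDDAP .csv response into names, units, and data rows."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)   # line 1: column names
    units = next(reader)   # line 2: units (may be blank for some columns)
    rows = [dict(zip(names, record)) for record in reader]
    return names, dict(zip(names, units)), rows


sample = (
    "time,latitude,sst\n"
    "UTC,degrees_north,degree_C\n"
    "2020-01-01T00:00:00Z,36.5,12.3\n"
)
names, units, rows = parse_erddap_csv(sample)
print(names, units["latitude"], rows[0]["sst"])
```

Values come back as strings; a real client would convert numeric columns (and could parse the time column with a standard ISO 8601 parser).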
- Security - By default, ERDDAP runs as an entirely public server with no login
system and no restrictions to data access.
However, an ERDDAP administrator can configure ERDDAP to restrict access to some
or all datasets to users who log in and have been assigned certain roles.
ERDDAP has built-in methods for authentication (logging in).
If an ERDDAP installation has authentication turned on, there will be a "log in" link
at the top of each web page.
Users never have to log in to access the publicly available datasets.
Users who have logged in can access public datasets and the private datasets to which
they are allowed access.
Users must use https: (HTTP secured with SSL/TLS) to log in and to access private datasets.
Datasets can be configured to have graphs and maps publicly accessible,
but have data only accessible to authorized users.
(more information)
- ERDDAP processes data in chunks.
To save memory (a big issue) and make responses start sooner, ERDDAP processes
data requests in chunks — repeatedly getting a chunk of data from the source,
cleaning it up (for example, adding
metadata),
and sending that to the client.
For many data sources, this means that the first chunk of data (for example, from the
first sensor) gets to the client in seconds instead of minutes (for example, after data
from the last sensor has been retrieved), reassuring the client that the data is coming.
From a memory standpoint, this allows numerous large requests (each larger than available
memory) to be handled simultaneously.
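The chunked approach can be sketched with a Python generator: it pulls a fixed number of records from the source, cleans them, and yields them before reading more, so only one chunk is held in memory and the first chunk can be sent right away. The names stream_in_chunks and clean are hypothetical stand-ins for ERDDAP's own processing.

```python
from itertools import islice


def stream_in_chunks(source, chunk_size, clean):
    """Yield cleaned chunks of at most chunk_size records from source."""
    it = iter(source)
    while True:
        chunk = list(islice(it, chunk_size))  # read only the next chunk
        if not chunk:
            return
        yield [clean(record) for record in chunk]


for chunk in stream_in_chunks(range(5), 2, lambda r: r * 10):
    print(chunk)  # each chunk could be written to the client immediately
```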
- ERDDAP has a modular structure.
ERDDAP is structured so that it is easy to add different components
(for example, a class to request data from an SOS server and store it as a table).
The new component then gains all the features and capabilities of the parent
(for example, support for OPeNDAP requests and the ability to save the data in several
common file formats).
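The idea that a new component inherits the parent's capabilities can be illustrated with a small Python class hierarchy. This is a hypothetical sketch, not ERDDAP's Java classes: the subclass only says how to obtain rows, and the output formats come from the parent class for free.

```python
import csv
import io
from abc import ABC, abstractmethod


class TableDataset(ABC):
    """Hypothetical parent: subclasses only implement data access."""

    @abstractmethod
    def column_names(self): ...

    @abstractmethod
    def fetch_rows(self): ...

    def _write(self, delimiter):
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
        writer.writerow(self.column_names())
        writer.writerows(self.fetch_rows())
        return buf.getvalue()

    # Every subclass gets these output formats without extra code.
    def to_csv(self):
        return self._write(",")

    def to_tsv(self):
        return self._write("\t")


class InMemoryTable(TableDataset):
    """A new 'source' component: it only knows where its rows come from."""

    def __init__(self, names, rows):
        self._names = list(names)
        self._rows = list(rows)

    def column_names(self):
        return self._names

    def fetch_rows(self):
        return iter(self._rows)


table = InMemoryTable(["station", "wtmp"], [("A", 12.5), ("B", 13.0)])
print(table.to_csv())
```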
- Data Dissemination / Data Distribution Networks: Push and Pull Technology
Normally, ERDDAP acts as an intermediary: it takes a request from a user;
gets data from a remote data source; reformats the data; and sends it to the user.
Pull Technology:
But ERDDAP also has the ability to actively get all of the available data
from a remote data source and
store
a local copy of the data.
Push Technology:
By using ERDDAP's subscription services,
other data servers can be notified
as soon as new data is available so that they can request the data (by pulling the data).
ERDDAP's
EDDGridFromErddap and EDDTableFromErddap use ERDDAP's subscription services and
flag system
so that they will be notified immediately when new data is available.
You can combine these to great effect:
if you wrap an EDDGridCopy around an EDDGridFromErddap dataset
(or wrap an EDDTableCopy around an EDDTableFromErddap dataset),
ERDDAP will automatically create and maintain a local copy of another ERDDAP's dataset.
Because the subscription services work as soon as new data is available,
push technology disseminates data very quickly (within seconds).
This architecture puts each ERDDAP administrator in charge of determining where the data
for his/her ERDDAP comes from. Other ERDDAP administrators can do the same.
There is no need for coordination between administrators.
If many ERDDAP administrators link to each other's ERDDAPs,
a data distribution network is formed.
Data will be quickly, efficiently, and automatically disseminated from data sources
(ERDDAPs and other servers) to data redistribution sites (ERDDAPs) anywhere in the network.
A given ERDDAP can be both a source of data for some datasets and a redistribution site
for other datasets.
The resulting network is roughly similar to data distribution networks set up with programs
like
Unidata's IDD/LDM,
but less rigidly structured.
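The push/pull combination can be sketched in a few lines of Python: when a notification arrives (push), the local copy compares what the remote server offers with what it already has and requests only the missing pieces (pull). The class and method names, and the idea of identifying pieces by simple IDs, are hypothetical simplifications of what EDDGridCopy and EDDTableCopy do.

```python
class LocalCopy:
    """Hypothetical local mirror that is updated when notified of changes."""

    def __init__(self, fetch):
        self._fetch = fetch  # function that pulls one piece of data by ID
        self.store = {}

    def on_change_notification(self, remote_ids):
        """Pull only the pieces the local copy doesn't have yet."""
        missing = sorted(set(remote_ids) - set(self.store))
        for item_id in missing:
            self.store[item_id] = self._fetch(item_id)
        return missing


copy = LocalCopy(lambda item_id: f"data-{item_id}")
print(copy.on_change_notification(["2020-01-01", "2020-01-02"]))
print(copy.on_change_notification(["2020-01-01", "2020-01-02", "2020-01-03"]))
```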
DAP? OPeNDAP? DODS? ERDDAP? What's the difference?
My (Bob's) understanding is:
DODS (Distributed Oceanographic Data System) was created in the 1990s,
before there was http: (!). The DODS system created and used the dods: protocol on the Internet.
When HTTP came along and was so successful, they switched from dods: to http:.
At some point, they realized the system was useful for more than just oceanographic data.
So they dropped the DODS name (although it lives on in some code),
formed a small organization called
OPeNDAP
and wrote the
DAP (Data Access Protocol) specification,
which standardizes the format of the requests for metadata and/or data,
and the responses with the metadata and/or data.
OPeNDAP (the organization) still shepherds DAP (the specification) and is the author of Hyrax (the data server which
is often mistakenly referred to as OPeNDAP).
Hyrax, THREDDS, GrADS, ERDDAP and others are data servers (software) which implement DAP.
They each implement a subset of DAP but do other things very differently.
ERDDAP uses code (in the "dods" directory) (actually written by Jake Hamby at NASA JPL)
for some features of reading data from external DAP servers.
ERDDAP uses its own code to write out DAP responses.
Is ERDDAP a solution to everyone's data distribution / data access problems?
No. ERDDAP tries to find a sweet spot that is a really good solution to most of the
data distribution problems that we confronted.
ERDDAP takes a middleware approach:
It can get data from lots of different types of remote data servers
and it can give that data to clients in lots of different file formats.
It is designed to be an agnostic solution which seeks to make other data servers
(OPeNDAP, SOS, OBIS, WMS, ...) interoperable.
Is there one perfect data server that meets everyone's needs perfectly? We don't think so.
And even if you think there is or will be, it will be a long time before everyone switches
to it, if ever. Until then, ERDDAP is available today to make other data servers
interoperable and to serve data.
ERDDAP can handle many/most datasets as is, but not all.
It isn't that the remaining datasets (e.g., model data using a cubed sphere projection)
aren't important. It's just that ERDDAP's goal of returning data in common file formats
(some of which are pretty simple) precludes a more complex internal data structure.
Groups of researchers working with more complex data structures often already have specialized
data servers and specialized client software which are customized to their community's needs.
ERDDAP, as a general purpose data server, doesn't try to compete with these specialized data servers.
They are customized to the needs of their community and do a great job.
However, those datasets are often only "understood" by the specialized software in that community.
A Work-Around for Complex Datasets - ERDDAP has a way to handle complex datasets that it
can't handle directly. Just as a
relational database
can store a complex dataset by using just
one simple data structure (a table), ERDDAP can serve the data from more complex datasets by
breaking the source dataset into a few ERDDAP datasets, each with similar, simple data structures.
For example, some gridded environmental model datasets can be stored in ERDDAP by
putting the sea surface variables ([time][latitude][longitude]) in one ERDDAP dataset,
and by putting the variables with altitude ([time][altitude][latitude][longitude])
in another ERDDAP dataset. We know this isn't ideal, but it is necessary to allow ERDDAP
to return data in common file formats (some of which are pretty simple).
Another approach to dealing with complex datasets (e.g., for model data using a cubed
sphere projection) is to also offer a reprojected version of the dataset
([time][altitude][latitude][longitude]) which ERDDAP can work with easily.
These simpler data structures aren't meant to replace the original data structures,
but they can be a useful way to distribute the data to a wider audience.
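The splitting work-around described above amounts to grouping a source dataset's variables by their dimensions, so each group shares one simple structure and can become its own ERDDAP dataset. Here is a minimal Python sketch with hypothetical variable names; it only decides the grouping and does not generate any ERDDAP configuration.

```python
from collections import defaultdict


def group_by_dimensions(variables):
    """Map each distinct dimension tuple to the variables that use it.

    variables: dict of variable name -> tuple of dimension names.
    Each resulting group is a candidate for a separate ERDDAP dataset.
    """
    groups = defaultdict(list)
    for name, dims in variables.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


model_vars = {
    "sst": ("time", "latitude", "longitude"),
    "temp": ("time", "altitude", "latitude", "longitude"),
    "salt": ("time", "altitude", "latitude", "longitude"),
}
for dims, names in group_by_dimensions(model_vars).items():
    print(dims, names)
```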
How sustainable is the ERDDAP project?
ERDDAP is very sustainable.
Some people are surprised and disappointed to hear that ERDDAP is mostly
developed by one person (me, Bob Simons).
[By the way, the opinions on this web page are my personal opinions and
do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.]
They fear that if something happens
to me, that will be the end of ERDDAP. That is simply not true.
ERDDAP's positioning for long-term sustainability is excellent,
and close to the best it could possibly be.
Yes, I am the main developer of ERDDAP. I am a fully funded federal employee.
My funding isn't "soft" money, so I don't receive or rely on grants.
I spend more than half my time developing ERDDAP.
The rest of my time is spent managing datasets.
That work is useful for ERDDAP because I need to work with real datasets in order to
know in detail what ERDDAP needs to do. My bosses fully support my work on
ERDDAP because it does what I was hired to do: make it easier for fisheries
scientists (primarily, but really everyone) to get scientific data from diverse
sources.
The miraculous
thing about software is that it costs nothing to duplicate.
So to do my job, I write ERDDAP for use at ERD. I think that is the best possible way
for me to do my job. That reason alone justifies the
expense of developing ERDDAP. (I think it could be shown that ERDDAP has saved
more NOAA scientists' time than I have spent developing ERDDAP. Time=Money.)
But the side benefit is that any other organization can
download, install, and use ERDDAP
for free to distribute their scientific data.
Over 90 organizations in at least 14 countries use ERDDAP. Maybe there is such a thing as a free lunch.
ERDDAP is a Java program. The source code
for every version is on
GitHub,
the most commonly used system for collaborative
software projects.
So far, these groups/people have contributed code to ERDDAP:
- MergeIR
EDDGridFromMergeIRFiles.java was written and contributed by
Jonathan Lafite and Philippe Makowski of R.Tech Engineering
(license: copyrighted open source). Thank you, Jonathan and Philippe!
- TableWriterDataTable
.dataTable (TableWriterDataTable.java) was written and contributed by Roland Schweitzer of NOAA
(license: copyrighted open source). Thank you, Roland!
- json-ld
The initial version of the
Semantic Markup of Datasets with json-ld (JSON Linked Data)
feature (and thus all of the hard work in designing the content) was
written and contributed (license: copyrighted open source)
by Adam Leadbetter and Rob Fuller of the Marine Institute in Ireland.
Thank you, Adam and Rob!
- orderBy
The code for the
orderByMean filter
in tabledap and the extensive changes to the code to
support the
variableName/divisor:offset notation
for all orderBy filters
was written and contributed (license: copyrighted open source)
by Rob Fuller and Adam Leadbetter of the Marine Institute in Ireland.
Thank you, Rob and Adam!
- Borderless Marker Types
The code for three new marker types (Borderless Filled Square, Borderless Filled Circle,
Borderless Filled Up Triangle) was contributed by Marco Alba of ETT / EMODnet Physics.
Thank you, Marco Alba!
- Translations of messages.xml
The initial version of the code in TranslateMessages.java which uses
Google's translation service to translate messages.xml into various languages
was written by Qi Zeng, who was working as a Google Summer of Code intern.
Thank you, Qi!
- orderBySum
The code for the
orderBySum filter
in tabledap (based on Rob Fuller and Adam Leadbetter's orderByMean) and
the Check All and Uncheck All buttons on the EDDGrid Data Access Form
were written and contributed (license: copyrighted open source)
by Marco Alba of ETT Solutions and EMODnet.
Thank you, Marco!
- Out-of-range .transparentPng Requests
ERDDAP now accepts requests for .transparentPng's when
the latitude and/or longitude values are partly or fully out-of-range.
(This was ERDDAP GitHub Issue #19, posted by Rob Fuller -- thanks for posting that, Rob.)
The code to fix this was written by Chris John.
Thank you, Chris!
- Display base64 image data in .htmlTable responses
The code for displaying base64 image data in .htmlTable responses
was contributed by Marco Alba of ETT / EMODnet Physics.
Thank you, Marco Alba!
- nThreads Improvement
The nThreads system for EDDTableFromFiles was significantly improved.
These changes lead to a huge speed improvement (e.g., 2X speedup when nThreads is set to 2 or more)
for the most challenging requests (when a large number of files must be processed to gather the results).
These changes will also lead to a general speedup throughout ERDDAP.
The code for these changes was contributed by Chris John. Thank you, Chris!
I hope others will contribute code in the future.
If something happens to me, my bosses will hire a replacement with the specific goal that
s/he continues the development of ERDDAP.
Further, I try to write very clean code. I write Java Doc comments. I write
comments in the code. I choose variable names carefully. I follow the Java formatting guidelines.
All of this is an effort
to make the code more readable, for other programmers who want to understand
and/or change it, and for me, because, in a year or two, I will have forgotten
the details of how and why the code was written the way it was.
Clean code with good comments makes my ongoing work on ERDDAP easier, so I have a
great incentive to write clean code with good comments.
But all of my answers so far are not very important.
Only one thing is really important. Only one thing guarantees sustainability
for ERDDAP or any software project: that ERDDAP is
Free and Open Source Software (FOSS).
Specifically, ERDDAP uses
Apache-compatible software licenses,
so anyone can do anything they want with the code.
Why is that important? One might think that software will be reliably available
in the future because a big
company is behind it. But Google, for example, has discontinued numerous projects
(here's a list).
I don't want to pick on Google because I really like Google and they
fund a large number of great, open-source projects. Microsoft has
discontinued projects. Apple has discontinued projects. ...
The point is that just having the backing of a large company is no assurance
that the project will continue.
The users of that software are out of luck,
unless the software was (and therefore, always is) Free and Open Source Software
(FOSS). Then, whenever there is interest by even one developer, the project can and will
continue to evolve. FOSS is an insurance policy. In fact, FOSS is the only
insurance policy, the only assurance, that matters. FOSS ensures that there
is always a way forward for the software. That is a right that no one can
take away, ever.
One might also think that software that has a large team of developers
will be more sustainable than software with one main developer. But
lots of developers usually need lots of funding. I know a famous, reasonably
large project with 10 developers (I won't embarrass them by naming them) that is
in constant serious danger of stopping the project because they don't have
enough funding. They rely on grants. They always run a deficit. Their patron has always bailed them
out at the last minute, but is getting really tired of bailing them out.
So if they can't raise a million dollars a year in grant money
(or the patron gets too tired of bailing them out), they'll stop. And the group can't
conceive of having fewer than 10 developers. Each developer has a role to play
in their group. In light of that, it seems to me that it is a great sign
that ERDDAP can be, and is, actively developed by just one main developer (who is
fully funded) with the unofficial assistance of a few others.
In fact, it would be a bad sign if ERDDAP required multiple
developers. That ERDDAP has just one main developer means that it isn't a huge
task that requires massive ongoing funding; it is a relatively small task
that requires minimal effort and funding. That is more sustainable, not less.
One might think that hiring a contracting company
to write software is a good idea.
For a fee, they'll provide developers and promise continuity (which
is good unless/until they go out of business). But they
also have you over a barrel: you must pay them what they request or there is
no more development, unless the software is FOSS and you're just paying them
to work on the code. With FOSS, you always have choices about how to move forward.
Because ERDDAP is FOSS, contractors are always a good option for you or anyone
with regard to ERDDAP: if anything happens to me (the one main developer), or if I don't have time to
make some change that you want, or I retire and you don't like my replacement's work,
you can always hire a contracting company to make the changes you want (or make them yourself).
In summary,
ERDDAP has the two sustainability features that matter most:
- ERDDAP is a small project (small enough to be handled by one main developer
with the unofficial assistance of a few others), so it doesn't require massive resources.
- ERDDAP is Free and Open Source Software, so no one can ever stop you or anyone else
from working on ERDDAP.
I cannot think of a better situation.
I hope that relieves any fears you (or anyone else) had about ERDDAP's sustainability.
If you hear people questioning or discouraging the use of ERDDAP because there is just
one main developer, please set them straight by pointing them to the above discussion at this URL:
https://coastwatch.pfeg.noaa.gov/erddap/information.html#sustainable .
How to Cite a Dataset in a Paper
It is important to let readers know how you got the data that you used in your paper.
For each dataset that you used, please look at the dataset's metadata in the
Dataset Attribute Structure section at the bottom of the .html page
for the dataset, e.g.,
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.html .
The metadata sometimes includes a required or suggested citation format for the dataset.
The "license" metadata sometimes lists restrictions on the use of the data.
To generate a citation for a dataset:
If you think of the dataset as a scientific article, you can generate a citation
based on the author (see the "creator_name" or "institution" metadata),
the date that you downloaded the data, the title (see the "title" metadata),
and the publisher (see the "publisher_name" metadata).
If possible, please include the specific URL(s) used to download the data.
If the dataset's metadata includes a
Digital Object Identifier (DOI),
please include that in the citation you create.
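The citation recipe above can be sketched as a small Python function that reads the metadata attributes named in the text (creator_name or institution, title, publisher_name) and adds the download date, the request URL(s), and an optional DOI. The function name, the exact punctuation, and the choice to pass the DOI in separately (since where it appears in the metadata varies) are assumptions; follow any citation format that the dataset's own metadata requires or suggests.

```python
from datetime import date


def make_citation(attrs, downloaded, urls=(), doi=None):
    """Build a simple dataset citation from ERDDAP-style metadata attributes."""
    author = attrs.get("creator_name") or attrs.get("institution") or "Unknown author"
    pieces = [author, attrs.get("title", "Untitled dataset")]
    if attrs.get("publisher_name"):
        pieces.append(attrs["publisher_name"])
    pieces.append(f"Accessed {downloaded.isoformat()}")
    pieces.extend(urls)
    if doi:
        pieces.append(f"https://doi.org/{doi}")
    # Avoid doubled periods when a piece already ends with one.
    return ". ".join(p.rstrip(".") for p in pieces) + "."


example_attrs = {  # made-up metadata for illustration only
    "creator_name": "A. Author",
    "title": "Example SST",
    "publisher_name": "Example Publisher",
}
print(make_citation(example_attrs, date(2024, 5, 1),
                    ["https://example.org/erddap/griddap/x.csv"], doi="10.1234/example"))
```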
How to Cite ERDDAP in a Paper
If you want to cite ERDDAP itself in a scientific paper, please use something like
Simons, R.A. 2022. ERDDAP. https://coastwatch.pfeg.noaa.gov/erddap . Monterey, CA: NOAA/NMFS/SWFSC/ERD.
What does the acronym "ERDDAP" stand for?
"ERDDAP" used to be an acronym, but it outgrew that original description.
Now, please just think of it as a name, not an acronym.
Guidelines for Data Distribution Systems
Bob's opinions about the design
and evaluation of data distribution systems can be found
here.
You can
Set Up Your Own ERDDAP Server
and serve your own data.
- The small effort to set up ERDDAP brings many benefits.
- If you already have a web service for distributing your data, you can set up ERDDAP
to access your data via the existing service or via the source files or database.
Then, people will have another way to access your data and will be able to download
the data in additional file formats or as graphs or maps.
- If you have datasets that are in high demand, you can install
multiple ERDDAPs
that work together to scale up and meet the needs of a large data
distribution center.
If you have questions, suggestions, or comments about ERDDAP in general (not this specific
ERDDAP installation), please send an email to
erd dot data at noaa dot gov
and include the ERDDAP URL directly related to your question or comment.
Or,
you can join the ERDDAP Google Group / Mailing List by visiting
https://groups.google.com/forum/#!forum/erddap
and clicking on "Apply for membership".
Once you are a member, you can post your question there
or search to see if the question has already been asked and answered.
DISCLAIMER: The opinions on this web page are Bob Simons' personal opinions and
do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.