Hi, I'm looking to open a topography file with the name topography_km62_201209.ieeei4
I've never opened this file type, and not much is coming up online that is helpful. Does anyone know a quick way to read these files with python? Thanks in advance!
Mira Berdahl said:
Hi, I'm looking to open a topography file with the name topography_km62_201209.ieeei4
I've never opened this file type, and not much is coming up online that is helpful. Does anyone know a quick way to read these files with python? Thanks in advance!
Looks like a binary file extension specific to the NCAR climate modeling world.
I don't have an example (some others around here might), but you could try using something like numpy.fromfile.
Yeah, I was just testing out the arguments but I think you want
import numpy as np
data = np.fromfile('topography_km62_201209.ieeei4', dtype='>i')
the i
in dtype = '>i'
tells numpy to read in 4-byte (32-bit) integers, and the >
tells it to use big-endian (the file name looks like a CESM input file, which are all big-endian by convention). I would then look at np.min(data)
and np.max(data)
to make sure the range looks reasonable -- if they look off, try dtype='i'
instead. I think little-endian is more common for binary files outside of the CESM world. You should also check np.shape(data)
to make sure it is the size you were expecting.
Note that this will give you a 1D array of data, so you may need to reshape it (data2 = np.reshape(data, (...))
, where (...)
is an array of the new dim sizes).
This is great @Michael Levy & @Katelyn FitzGerald thanks for the fast responses!
I just tried and can open the file, but its 1D as you suggested. Unfortunately, I don't know what dimensions the data ought to be though, so can't reshape it. Any idea how to glean this info?
Mira Berdahl said:
I just tried and can open the file, but its 1D as you suggested. Unfortunately, I don't know what dimensions the data ought to be though, so can't reshape it. Any idea how to glean this info?
Can you give a little background on what this file is and how it is used? Also, how big is the 1D array?
@Michael Levy , the file is a bathymetry file used in a 0.1deg ocean CESM simulation. The shape of the binary file is (8640000,). Thanks!
The 0.1 degree grid is 3600 x 2400 -- Iwould try np.reshape(data, (3600, 2400))
and then plot it -- the region where the value is 0 should look like the continents, but if it doesn't then you'll want np.reshape(data, (2400, 3600))
@Michael Levy This works, thanks. It needed to be of the shape (2400,3600). I am a bit confused, though, since I assumed this file would actually show the bathymetry (in meters), but instead it looks a lot like the KMT field, where the values are 0-62, corresponding to depth levels of the POP grid. Do you have any thoughts on this? Thanks.
@Mira Berdahl the data is all integer and it corresponds to the number of active ocean levels at that location. For the 1 degree grid, there is a 1D array of cell-center depths associated with each layer and then you can use the index contained in the bathymetry file to find the depth at that column. For the 0.1 degree ocean simulation, we typically run with "partial bottom cells" turned on and that requires a separate 2D file containing the actual depth. If you have access to the case root for the CESM run itself, look in ${CASE_ROOT}/CaseDocs/pop_in
and see what bottom_cell_file
is set to. For the runs I've done with that grid, the default value is ${DIN_ROOT_LOC}/ocn/pop/tx0.1v3/grid/dzbc_pbc_s2.0_20171019.ieeer8
, and you can read it in with data = np.reshape(np.fromfile('/path/to/file.ieeer8', '>d'), (2400, 3600))
(>
for big-endian, and d
instead of i
since this is double-precision real values instead of integer values). Again, double check the min and max to make sure it read in correctly, and plot it to make sure the reshape was ordered right (but it should list columns in the same order as the bathymetry file). I think you're right about the units, but the max value will also make it obvious if the units are centimeters instead of meters
@Mira Berdahl to fill in anyone who stumbles on this thread later, it sounds like what you really want to do is read the bathymetry used in a POP run. If you have access to any of the POP history (or time series) output files, there is a field HT
that will give you ocean depth at a given cell. So bathymetry = xr.open_dataset(pop_netcdf_output)['HT']
should do the trick.
If you want to construct the bathymetry ahead of time, we were on the right track. The first file you read from the CESM input data repository (the .ieeei4
file) tell us how many active ocean layers each grid cell has. The second file you read in (the .ieeer8
file) is also necessary, but I didn't describe it correctly -- the dzbc_pbc
file actually tells you the layer thickness (in centimeters) of the bottom most active layer. So the last thing we need is to know the expected layer thicknesses: then, for each cell, we replace the bottom-most thickness (index based on the value in the bathymetry
file) with the value from the dzbc_pbc
file and then sum over the thicknesses of the active cells to get the total depth.
The thicknesses of each layer (not considering the partial bottom cells) come from a file in the POP repository. Rather than processing it yourself, you can trust me when I tell you
dz = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1019.6808, 1056.4484, 1105.9951, 1167.807, 1242.4133, 1330.9678, 1435.141, 1557.1259, 1699.6796, 1866.2124, 2060.9023, 2288.8521, 2556.2471, 2870.575, 3240.8372, 3677.7725, 4194.0308, 4804.2236, 5524.7544, 6373.1919, 7366.9448, 8520.8926, 9843.6582, 11332.4658, 12967.1992, 14705.3438, 16480.709, 18209.1348, 19802.2344, 21185.957, 22316.5098, 23186.4941, 23819.4492, 24257.2168, 24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291, 24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094, 25000.0, 25000.0]
So I think something like
num_levels = np.reshape(np.fromfile('topography_km62_201209.ieeei4', dtype='>i'), (2400, 3600))
dzbc = np.reshape(np.fromfile('/path/to/file.ieeer8', '>d'), (2400, 3600))
dz = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1019.6808, 1056.4484, 1105.9951, 1167.807, 1242.4133, 1330.9678, 1435.141, 1557.1259, 1699.6796, 1866.2124, 2060.9023, 2288.8521, 2556.2471, 2870.575, 3240.8372, 3677.7725, 4194.0308, 4804.2236, 5524.7544, 6373.1919, 7366.9448, 8520.8926, 9843.6582, 11332.4658, 12967.1992, 14705.3438, 16480.709, 18209.1348, 19802.2344, 21185.957, 22316.5098, 23186.4941, 23819.4492, 24257.2168, 24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291, 24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094, 25000.0, 25000.0]
bathymetry = np.zeros((2400, 3600))
for k in range(62):
bathymetry = bathymetry + np.where(k<num_levels-1, dz(k), np.where(k == num_levels-1, dzbc, 0))
Note that my network connection is a little flaky so I haven't had a chance to test the above, but the general gist should be right (and it should match HT
from a history file). And both of those values will be in cm, not m.
Despite @Keith Lindsay reminding me to be careful, I'm pretty sure the above code snippet has an off-by-one error because Fortran (and topography_km62_201209.ieeei4
start counting at 1 but the python array indices start counting at 0... now that my network is more stable, I'll whip up a quick jupyter notebook that runs on Casper and makes sure everything is correct
Okay, it looks like my code above is correct, except I have a dz(k)
that should really be dz[k]
:) Here's a nicer formatting:
inputdata = os.path.join(os.sep,
os.environ['CESMDATAROOT'],
'inputdata',
'ocn',
'pop',
'tx0.1v3',
'grid',
)
# Set dz
dz = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0,
1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0,
1019.6808, 1056.4484, 1105.9951, 1167.807, 1242.4133, 1330.9678,
1435.141, 1557.1259, 1699.6796, 1866.2124, 2060.9023, 2288.8521,
2556.2471, 2870.575, 3240.8372, 3677.7725, 4194.0308, 4804.2236,
5524.7544, 6373.1919, 7366.9448, 8520.8926, 9843.6582, 11332.4658,
12967.1992, 14705.3438, 16480.709, 18209.1348, 19802.2344, 21185.957,
22316.5098, 23186.4941, 23819.4492, 24257.2168, 24546.7793, 24731.0137,
24844.3281, 24911.9746, 24951.291, 24973.5938, 24985.9609, 24992.6738,
24996.2441, 24998.1094, 25000.0, 25000.0]
# Read topography file and dzbc
num_levels = np.reshape(np.fromfile(os.path.join(inputdata, 'topography_20170718.ieeei4'), dtype='>i'), (2400, 3600))
dzbc = np.reshape(np.fromfile(os.path.join(inputdata, 'dzbc_pbc_s2.0_20171019.ieeer8'), '>d'), (2400, 3600))
# Compute bathymetry
bathymetry = np.zeros((2400, 3600))
for k in range(62):
bathymetry = bathymetry + np.where(k<num_levels-1, dz[k], np.where(k == num_levels-1, dzbc, 0))
I also read in HT
from an earlier run and verified that HT - bathymetry
is zero everywhere
This is so helpful, thanks so much @Michael Levy !
Last updated: May 16 2025 at 17:14 UTC