Stream: python-questions
Topic: Replace Empty String value with Floating Point?
Muntaha Pasha (Oct 01 2020 at 17:19):
Hello all! I'm a student from CU Boulder working with Dr. Shields this semester to look at and study anemometer data from the Sonic versus the WXT. I just wanted to ask a question about some list conversion issues I'm having. To keep it very brief, Time, Wind Speed, and Wind Direction are being read in as a list of strings.
For instance, Wind Speed looks something like ['2.0', '1.98', '3.04'] etc.
Now I have wanted to convert Wind Speed into a list of floating points, however... not all the data in my list is actually floats. There's missing data.
So in reality, Wind Speed looks like ['3.40', ' ', '9.02', ' ', '6.78'], and so on.
I've tried a bunch of list comprehensions, I've tried .replace and all sorts of other things, but I can't for the life of me figure out how to replace all these ' ' values with some temporary float value like '99.99'. As you can see from my code, I've tried this lambda thing, but for some reason that messed up my data and started conjoining floats together, so I'd end up with 99.99899.99599.99 and weird data like that.
Does anyone have any idea how I can replace these empty string values with a float so that I can convert my list to floats? (Also, you can ignore the CWindSpd2 array; that was something I deleted because the lambda function wasn't properly cleaning my list.) Capture.PNG
Kevin Paul (Oct 01 2020 at 17:27):
Welcome, @Muntaha Pasha! Thanks for the question.
This kind of problem happens a lot with data. There are always errors or unexpected values in the data that need to be dealt with in post-processing. It is the messiest thing about data analysis.
Try changing your CWindSpd calculation to the following:
CWindSpd = list(map(lambda s: float(s) if s else None, WindSpd))
Kevin Paul (Oct 01 2020 at 17:29):
The lambda function that I wrote will return the string s converted to a float if the string evaluates to True, otherwise it will return None. You could also replace None with your "missing value" of 99.99.
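A minimal sketch of the same idea as a named function, in case it is easier to read than the lambda. The helper name to_float and its missing parameter are hypothetical, not part of the original code. It also calls strip() first, on the assumption that some of the missing entries are a single space (' ') rather than a truly empty string (''), since float(' ') would raise a ValueError.

```python
def to_float(s, missing=None):
    """Convert a string to float; return `missing` for empty or whitespace-only strings."""
    s = s.strip()  # ' ' becomes '', which evaluates to False
    return float(s) if s else missing


# Sample data modeled on the question: a mix of numbers, blanks, and empty strings
WindSpd = ['3.40', ' ', '9.02', '', '6.78']

# Missing readings become None
CWindSpd = [to_float(s) for s in WindSpd]

# Or use a sentinel "missing value" such as 99.99 instead of None
CWindSpd_sentinel = [to_float(s, missing=99.99) for s in WindSpd]

print(CWindSpd)
print(CWindSpd_sentinel)
```

None is usually the safer choice, because a sentinel like 99.99 can be mistaken for a real wind speed in later calculations.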
Muntaha Pasha (Oct 01 2020 at 17:31):
@Kevin Paul Thank you so much! That did the trick.
Kevin Paul (Oct 01 2020 at 17:31):
Note that in Python, it is typical that anything that contains multiple "items" in it (strings, lists, dicts, etc.) will evaluate to False if it is "empty" (i.e., len(x) == 0). So, the line if s else None is equivalent to if len(s) > 0 else None.
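A quick way to check this rule in an interpreter is sketched below. The variable names are only for illustration. One caveat worth seeing for yourself: a string holding a single space has length 1, so it evaluates to True even though it looks empty.

```python
# Empty containers have length 0 and evaluate to False
empty_values = ['', [], {}, ()]
all_empty_are_false = all(len(x) == 0 and not x for x in empty_values)

# A single space is NOT empty: len(' ') == 1, so it evaluates to True
blank = ' '
blank_is_truthy = bool(blank)

# Stripping whitespace first turns it into '', which evaluates to False
stripped_is_truthy = bool(blank.strip())

print(all_empty_are_false, blank_is_truthy, stripped_is_truthy)
```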
Muntaha Pasha (Oct 01 2020 at 17:32):
Ohh, I see. Yeah, that's a better way to go about it. At first I tried crazy for loops to iterate through empty spots and then try and replace the values there, but this is definitely more condensed and makes a lot more sense to me. Thanks again for the help!
Kevin Paul (Oct 01 2020 at 17:35):
You're very welcome.
Since you are reading your data from a CSV file, you might consider working with Pandas. Pandas has a builtin function read_csv that will automatically read a CSV file and return a "spreadsheet-like" data structure for you. The advantage of Pandas is that it will automatically try to detect the type of data. In your case, it will automatically detect floats.
It also allows you to do computations with missing values, which it will replace with NaNs.
Kevin Paul (Oct 01 2020 at 17:37):
You can also easily extract "columns" from your Pandas "spreadsheet" (called a DataFrame) into NumPy arrays, which are great for computation.
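A rough sketch of what that could look like. The column names time and wind_speed and the inline CSV text are assumptions made up for illustration (the real file layout isn't shown here), and the missing readings are assumed to be empty fields. With a real file you would pass its path to read_csv instead of the StringIO object.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real CSV file; empty fields mark missing wind speeds
csv_text = "time,wind_speed\n0,3.40\n1,\n2,9.02\n3,\n4,6.78\n"

# read_csv infers column types; empty fields become NaN
df = pd.read_csv(io.StringIO(csv_text))

# Pandas skips NaN by default when computing statistics such as the mean
mean_speed = df["wind_speed"].mean()

# Extract the column as a NumPy array for further computation
wind_speed = df["wind_speed"].to_numpy()
n_missing = int(np.isnan(wind_speed).sum())

print(df["wind_speed"].dtype, mean_speed, n_missing)
```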
Kevin Paul (Oct 01 2020 at 17:37):
...But start with what you know and go from there once you have it working the way you want. :smiley:
Last updated: Jan 30 2022 at 12:01 UTC