Stream: python-questions

Topic: Replace Empty String value with Floating Point?


Muntaha Pasha (Oct 01 2020 at 17:19):

Hello all! I'm a student from CU Boulder working with Dr. Shields this semester to look at and study anemometer data from the Sonic versus the WXT. I just wanted to ask a question about some list conversion issues I'm having. To keep it very brief, Time, Wind Speed, and Wind Direction are being read in as a list of strings.
For instance, Wind Speed looks something like ['2.0', '1.98', '3.04'] etc.
I want to convert Wind Speed into a list of floats; however, not all the data in my list is actually floats. There's missing data.
So in reality, Wind Speed looks like ['3.40', ' ', '9.02', ' ', '6.78'], and so on.
I've tried a bunch of list comprehensions, .replace, and all sorts of other things, but can't for the life of me figure out how to replace all these ' ' values with some temporary float value like '99.99'. As you can see from my code, I've tried this lambda thing, but for some reason that messed up my data and started joining floats together, so I'd end up with 99.99899.99599.99 and weird data like that.
Does anyone have any idea how I can replace these empty string values with a float so that I can convert my list to floats? (Also, you can ignore the CWindSpd2 array; that was something I deleted because the lambda function wasn't properly cleaning my list.) Capture.PNG

Kevin Paul (Oct 01 2020 at 17:27):

Welcome, @Muntaha Pasha! Thanks for the question.

This kind of problem happens a lot with data. There are always errors or unexpected values in the data that need to be dealt with in post-processing. It is the messiest thing about data analysis.

Try changing your CWindSpd calculation to the following:

CWindSpd = list(map(lambda s: float(s) if s else None, WindSpd))

Kevin Paul (Oct 01 2020 at 17:29):

The lambda function that I wrote will return the string s converted to a float if the string evaluates to True, otherwise it will return None. You could also replace None with your "missing value" of 99.99.
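
As a minimal sketch of the same idea written as a list comprehension: the sample list below is modeled on the one in the question, and the helper name to_float and the constant MISSING are hypothetical names introduced only for illustration. The .strip() call is an added assumption so that a whitespace-only entry such as ' ' is treated like an empty string; the lambda above handles only truly empty strings.

```python
# Sample data modeled on the question; '' and ' ' both stand for missing readings.
WindSpd = ['3.40', '', '9.02', ' ', '6.78']

MISSING = 99.99  # the temporary "missing value" mentioned in the thread


def to_float(s, missing=None):
    """Return float(s), or `missing` if s is empty or only whitespace."""
    s = s.strip()  # assumption: whitespace-only entries also count as missing
    return float(s) if s else missing


# Missing readings become None ...
CWindSpd = [to_float(s) for s in WindSpd]
# ... or the 99.99 placeholder, if a numeric sentinel is preferred.
CWindSpd_filled = [to_float(s, MISSING) for s in WindSpd]

print(CWindSpd)         # [3.4, None, 9.02, None, 6.78]
print(CWindSpd_filled)  # [3.4, 99.99, 9.02, 99.99, 6.78]
```

Keeping None (rather than 99.99) makes it harder to mistake a placeholder for a real measurement later on, at the cost of having to skip None values before doing arithmetic.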

Muntaha Pasha (Oct 01 2020 at 17:31):

@Kevin Paul Thank you so much! That did the trick.

Kevin Paul (Oct 01 2020 at 17:31):

Note that in Python, it is typical that anything that contains multiple "items" in it (strings, lists, dicts, etc.) will evaluate to False if it is "empty" (i.e., len(x) == 0). So, the line if s else None is equivalent to if len(s) > 0 else None.
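
A short sketch of that truthiness rule, with one caveat worth knowing: a string that contains only a space is not empty, so it evaluates to True, and passing it to float() raises a ValueError. The names below (samples, truthiness, is_blank) are hypothetical and exist only for this illustration.

```python
# Empty containers are falsy; anything with at least one item is truthy.
samples = {"empty str": "", "empty list": [], "empty dict": {}, "one space": " "}
truthiness = {name: bool(value) for name, value in samples.items()}
print(truthiness)
# {'empty str': False, 'empty list': False, 'empty dict': False, 'one space': True}


def is_blank(s):
    """True if s is empty or contains only whitespace."""
    return not s.strip()


# A whitespace-only string passes an `if s` check but cannot be converted.
try:
    float(" ")
    space_converts = True
except ValueError:
    space_converts = False

print(is_blank(""), is_blank(" "), is_blank("3.40"))  # True True False
print(space_converts)                                 # False
```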

Muntaha Pasha (Oct 01 2020 at 17:32):

Ohh, I see. Yeah, that's a better way to go about it. At first I tried crazy for loops to iterate through empty spots and then try and replace the values there, but this is definitely more condensed and makes a lot more sense to me. Thanks again for the help!

Kevin Paul (Oct 01 2020 at 17:35):

You're very welcome.

Since you are reading your data from a CSV file, you might consider working with Pandas. Pandas has a built-in function, read_csv, that reads a CSV file and returns a "spreadsheet-like" data structure for you. The advantage of Pandas is that it automatically tries to detect the type of data in each column. In your case, it will detect floats.

It also allows you to do computations with missing values, which it will replace with NaNs.
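
A rough sketch of what that could look like. The actual file from the question is not shown in the thread, so the CSV text and the column names (time, wind_speed, wind_dir) below are made up for illustration, and an in-memory string stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents; empty fields stand for missing readings.
csv_text = """time,wind_speed,wind_dir
17:19:00,3.40,180
17:19:01,,185
17:19:02,9.02,
17:19:03,6.78,190
"""

# read_csv accepts a path or a file-like object; StringIO keeps this self-contained.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)                # wind_speed and wind_dir are detected as float64
print(df["wind_speed"].isna())  # True marks the row with the missing reading

# Aggregations such as mean() skip NaN by default.
mean_speed = df["wind_speed"].mean()
print(mean_speed)               # about 6.4, the mean of the three valid readings
```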

Kevin Paul (Oct 01 2020 at 17:37):

You can also easily extract "columns" from your Pandas "spreadsheet" (called a DataFrame) into NumPy arrays, which are great for computation.
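
For instance, a small sketch of pulling one column out as a NumPy array. The DataFrame here is built by hand with a hypothetical wind_speed column rather than read from the real file, and np.nanmean is one way (not the only one) to ignore the NaN entries.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the data; NaN marks the missing readings.
df = pd.DataFrame({"wind_speed": [3.40, np.nan, 9.02, np.nan, 6.78]})

# Extract the column as a plain NumPy array of floats.
speeds = df["wind_speed"].to_numpy()
print(type(speeds), speeds.dtype)

# Ordinary reductions propagate NaN; the nan* variants skip it.
plain_mean = np.mean(speeds)    # NaN, because the array contains NaN
nan_mean = np.nanmean(speeds)   # about 6.4, the mean of the valid readings
print(plain_mean, nan_mean)
```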

Kevin Paul (Oct 01 2020 at 17:37):

...But start with what you know and go from there once you have it working the way you want. :smiley:


Last updated: Jan 30 2022 at 12:01 UTC